
Commit

Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)

**Description**
Add a barrier before 'destroy_process_group' to fix a bug that appears when one model benchmark runs multiple models. On ROCm 4.x, when running bert_models, some processes had not yet finished with the previous model's process group while others were already initializing the process group for the next model, so that initialization failed.

**Major Revision**
- Add a barrier before 'destroy_process_group'.
Yuting Jiang authored and abuccts committed Sep 24, 2021
1 parent 3553381 commit c9fb724
Showing 1 changed file with 1 addition and 0 deletions.
superbench/benchmarks/model_benchmarks/pytorch_base.py: 1 addition, 0 deletions
```diff
@@ -174,6 +174,7 @@ def _postprocess(self):
 
         try:
             if self._args.distributed_impl == DistributedImpl.DDP:
+                torch.distributed.barrier()
                 torch.distributed.destroy_process_group()
         except BaseException as e:
             self._result.set_return_code(ReturnCode.DISTRIBUTED_SETTING_DESTROY_FAILURE)
```
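With this change, each rank calls `torch.distributed.barrier()` before `torch.distributed.destroy_process_group()`. No rank can tear down a model's process group until every rank has finished using it. The sketch below is a minimal, torch-free analogy of that ordering. It assumes threads stand in for ranks and `threading.Barrier` stands in for `torch.distributed.barrier()`. The world size, model names, and phase labels are hypothetical. It records an ordered event log and checks that, for each model, every "benchmark" event precedes the first "destroy" event.

```python
import threading

WORLD_SIZE = 4                    # hypothetical number of simulated ranks
MODELS = ["model_a", "model_b"]   # hypothetical models benchmarked back to back


def run():
    """Run all simulated ranks and return the ordered (model, rank, phase) log."""
    events = []
    events_lock = threading.Lock()
    barrier = threading.Barrier(WORLD_SIZE)  # reusable: resets after each release

    def record(model, rank, phase):
        with events_lock:
            events.append((model, rank, phase))

    def worker(rank):
        for model in MODELS:
            record(model, rank, "init")       # stands in for init_process_group()
            record(model, rank, "benchmark")  # stands in for the benchmark steps
            barrier.wait()                    # stands in for torch.distributed.barrier()
            record(model, rank, "destroy")    # stands in for destroy_process_group()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


def destroy_after_all_done(log, model):
    """True if every rank finished benchmarking `model` before any rank destroyed it."""
    last_bench = max(i for i, (m, _, p) in enumerate(log) if m == model and p == "benchmark")
    first_destroy = min(i for i, (m, _, p) in enumerate(log) if m == model and p == "destroy")
    return last_bench < first_destroy


if __name__ == "__main__":
    log = run()
    for model in MODELS:
        print(model, destroy_after_all_done(log, model))
```

Removing the `barrier.wait()` line lets a fast "rank" record "destroy" while a slow one is still benchmarking. That is the kind of interleaving the commit guards against. The sketch does not model ROCm-specific behavior.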
