Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)

**Description**
Add a barrier before 'destroy_process_group' to fix a bug that occurs when a single model benchmark runs multiple models: some processes had not yet finished with the previous process group, while others failed to initialize the new process group for the next model. The bug was observed on ROCm 4.x when running bert_models.

**Major Revision**
-  Add barrier before 'destroy_process_group'.
Yuting Jiang authored Sep 13, 2021
1 parent 1f9de77 commit 7a3a450
Showing 1 changed file with 1 addition and 0 deletions.
superbench/benchmarks/model_benchmarks/pytorch_base.py
```diff
@@ -174,6 +174,7 @@ def _postprocess(self):

         try:
             if self._args.distributed_impl == DistributedImpl.DDP:
+                torch.distributed.barrier()
                 torch.distributed.destroy_process_group()
         except BaseException as e:
             self._result.set_return_code(ReturnCode.DISTRIBUTED_SETTING_DESTROY_FAILURE)
```
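To illustrate the ordering a barrier before teardown enforces, here is a minimal, stdlib-only analogy. It is not SuperBench or PyTorch code. Threads stand in for ranks, `threading.Barrier` stands in for `torch.distributed.barrier()`, and log entries stand in for creating and destroying one process group per model. The functions `run_rank` and `simulate` and the model names are hypothetical. The sketch shows that no rank tears down a model's group, or starts the next model, until every rank has finished the current one.

```python
import threading


def run_rank(rank, models, barrier, log, lock):
    """Simulate one rank running several models back to back (hypothetical helper)."""
    for model in models:
        with lock:
            log.append((rank, "init", model))  # stands in for init_process_group()
        # ... benchmarking work for `model` would happen here ...
        with lock:
            log.append((rank, "done", model))
        # Analogous to torch.distributed.barrier(): block until every rank
        # has finished the current model.
        barrier.wait()
        with lock:
            log.append((rank, "destroy", model))  # stands in for destroy_process_group()


def simulate(world_size, models):
    """Run `world_size` simulated ranks and return the ordered event log."""
    barrier = threading.Barrier(world_size)  # reusable across models
    log, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=run_rank, args=(r, models, barrier, log, lock))
        for r in range(world_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

For example, `simulate(3, ["bert-base", "bert-large"])` returns 18 events in which, for each model, every `"done"` entry precedes every `"destroy"` entry. Without the `barrier.wait()` call, a fast thread could destroy its group and start the next model while a slower one is still working on the current model, which mirrors the failure described above.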