Commit
an option to raise exception if oom happens during fairseq.trainer.train_step (facebookresearch#2)

Summary:
Pull Request resolved: fairinternal/fairspeq#2
Pull Request resolved: facebookresearch#689

We found that not raising OOM during trainer.train_step causes various issues, including NCCL hangs and gloo sync errors, because gradients are not synced properly. Until we find the root cause, let's give users an option to raise OOMs.

Reviewed By: jmp84
Differential Revision: D15170357
fbshipit-source-id: 3e15e4e111a8380612157955509c39821a216ec4
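
For illustration only, here is a minimal sketch of the behaviour the message describes: a training step that either swallows an out-of-memory error and skips the batch, or re-raises it when asked to. The names `train_step`, `forward_backward`, `raise_oom`, and `fake_step` are assumptions made for this sketch and are not taken from the diff; the OOM is simulated with a plain `RuntimeError` so the example runs without a GPU.

```python
def train_step(forward_backward, batch, raise_oom=False):
    """Run one step; on an out-of-memory error either re-raise or skip the batch.

    `forward_backward` and `raise_oom` are hypothetical names for illustration.
    """
    try:
        return forward_backward(batch)
    except RuntimeError as e:
        if "out of memory" not in str(e):
            raise  # unrelated error: always propagate
        if raise_oom:
            raise  # surface the OOM so the failure is visible to the caller
        # Skipping silently: in a multi-worker run, this worker would then
        # not take part in the gradient sync for this step.
        print("| WARNING: ran out of memory, skipping batch")
        return None


def fake_step(batch):
    # Stand-in for a forward/backward pass that runs out of memory on large batches.
    if len(batch) > 2:
        raise RuntimeError("CUDA out of memory (simulated)")
    return sum(batch)
```

With `raise_oom=False` the oversized batch is skipped and the caller gets `None`; with `raise_oom=True` the same batch raises `RuntimeError`, which is the opt-in behaviour the commit message asks for.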