[bugfix] OSS no reduce loss (#133)
* bugfix
* adjust the default non-regression reference loss, which is no longer all_reduced
blefaudeux committed Oct 10, 2020
1 parent 5220f89 commit 177151e
Showing 3 changed files with 1 addition and 5 deletions.
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -100,7 +100,7 @@ run_oss_benchmark: &run_oss_benchmark
   - run:
       name: Run OSS Benchmark
       command: |
-        python benchmarks/oss.py --check_regression --world_size 4 --reference_speed 13.7 --reference_memory 4390 --reference_loss 0.595
+        python benchmarks/oss.py --check_regression --world_size 4 --reference_speed 13.7 --reference_memory 4390 --reference_loss 0.152

 run_oss_gloo: &run_oss_gloo
   - run:
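The new reference value is roughly the old one divided by the 4-rank world size (0.595 / 4 ≈ 0.15), consistent with the benchmark now checking a rank-local loss that is already divided by world_size instead of an all_reduced sum. As a purely hypothetical illustration of such a gate (the actual check inside benchmarks/oss.py is not shown in this diff):

    import argparse

    # Hypothetical regression gate, not the actual benchmarks/oss.py code.
    parser = argparse.ArgumentParser()
    parser.add_argument("--check_regression", action="store_true")
    parser.add_argument("--reference_loss", type=float, default=0.152)
    args = parser.parse_args()

    measured_loss = 0.150  # placeholder: would come from the final training step

    if args.check_regression:
        # The loss compared here is rank-local (already divided by world_size),
        # hence the smaller reference value.
        assert abs(measured_loss - args.reference_loss) < 0.05, "loss regression"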
2 changes: 0 additions & 2 deletions benchmarks/oss.py
@@ -124,8 +124,6 @@ def closure():
             loss /= world_size
             loss.backward()

-            dist.all_reduce(loss, op=dist.ReduceOp.SUM)
-
             if use_sdp:
                 ddp.reduce()  # Send the gradients to the appropriate shards
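For context, a sketch of how the benchmark closure reads after this commit. Only the lines in the hunk above come from the file; the make_closure wrapper and the model, criterion and batch names are illustrative scaffolding added here so the snippet is self-contained.

    def make_closure(model, criterion, batch, world_size, ddp=None):
        # Illustrative reconstruction: the rank-local loss is scaled by the
        # world size and backpropagated, but no longer all_reduced across ranks.
        def closure():
            model.zero_grad()
            outputs = model(batch["inputs"])
            loss = criterion(outputs, batch["label"])
            loss /= world_size
            loss.backward()

            if ddp is not None:  # ShardedDataParallel path (use_sdp in the benchmark)
                ddp.reduce()     # Send the gradients to the appropriate shards
            return loss

        return closure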
2 changes: 0 additions & 2 deletions docs/source/tutorials/oss.rst
@@ -42,7 +42,6 @@ Let's suppose that your trainer looks like
         loss = loss_fn(outputs, target)
         loss /= world_size
         loss.backward()
-        torch.distributed.all_reduce(loss, op=torch.distributed.ReduceOp.SUM)
         optimizer.step()
@@ -90,7 +89,6 @@ Then sharding the optimizer state is merely a matter of wrapping your optimizer
         loss = loss_fn(outputs, target)
         loss /= world_size
         loss.backward()
-        torch.distributed.all_reduce(loss, op=torch.distributed.ReduceOp.SUM)
         optimizer.step()
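Putting the tutorial snippet together, a minimal sketch of the training step after this commit, with the all_reduce on the loss gone. The train_step helper and the learning rate are illustrative additions; the OSS import and constructor follow the fairscale tutorial that this diff touches.

    import torch
    from fairscale.optim.oss import OSS  # fairscale's sharded optimizer wrapper

    def train_step(model, loss_fn, batch, target, optimizer, world_size):
        # The loss stays rank-local: the torch.distributed.all_reduce call
        # removed above is intentionally absent.
        optimizer.zero_grad()
        outputs = model(batch)
        loss = loss_fn(outputs, target)
        loss /= world_size
        loss.backward()
        optimizer.step()
        return loss

    # Assumed usage once the process group is initialized (lr is illustrative):
    # optimizer = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=1e-3)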
