[minor] OSS: bring DDP in the benchmark #130

blefaudeux · 2020-10-08T18:50:59Z

Before submitting

Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
Did you read the contributor guideline?
Did you make sure to update the docs?
Did you write any new necessary tests?

What does this PR do?

A suggestion from @msbaines on another PR made me realize that the benchmark job we had was not using DDP, so it was not very realistic. The fake pictures were also identical on every rank, which can hide some synchronization issues. This PR fixes both.
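One common way to make fake inputs differ per rank while keeping each rank reproducible is to offset a base seed by the rank. The sketch below assumes PyTorch. The helper name `make_fake_batch`, the seed, and the image and batch sizes are hypothetical illustration values, not what `benchmarks/oss.py` actually uses.

```python
import torch


def make_fake_batch(rank: int, batch_size: int = 8, num_classes: int = 10,
                    image_size: int = 32, base_seed: int = 42):
    """Hypothetical helper: (images, labels) that differ across ranks but repeat per rank."""
    # Offsetting the seed by the rank gives each process its own random stream,
    # so a gradient sync bug cannot be masked by every rank seeing the same data.
    generator = torch.Generator().manual_seed(base_seed + rank)
    images = torch.rand(batch_size, 3, image_size, image_size, generator=generator)
    labels = torch.randint(0, num_classes, (batch_size,), generator=generator)
    return images, labels


images_r0, labels_r0 = make_fake_batch(rank=0)
images_r1, labels_r1 = make_fake_batch(rank=1)
print(torch.equal(images_r0, images_r1))  # different ranks, different pictures
```

If every rank fed the same pictures, the per-rank gradients would already be identical before any all-reduce, so a broken sync step could still produce matching results.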

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

blefaudeux · 2020-10-08T18:51:33Z

(The values in the benchmark regression test will need to be updated; that's expected. The current commit will fail because of that.)

min-xu-ai

LGTM!

min-xu-ai · 2020-10-08T19:57:37Z

benchmarks/oss.py

@@ -91,6 +94,7 @@ def train(
        optimizer = ddp.optimizer
        model = ddp
    else:
+        model = DDP(model, device_ids=[rank])


should we toggle find_unused_parameters to True/False and see its impact?

Ahh, good point! It was not up for review yet actually; looks like there's something wrong with SDP, looking into that :) I'll test this option.

oops, sorry, didn't notice that! :-)
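For context on the reviewer's question: with `find_unused_parameters=True`, DDP traverses the autograd graph after each forward pass to mark parameters that will not receive gradients, which is required when some parameters go unused but adds per-step overhead. A minimal sketch for comparing the two settings, assuming a single-process CPU `gloo` group so it runs without a launcher; the toy model and the helper names `init_single_process_group` and `time_ddp_steps` are hypothetical, and CPU timings on a toy model say little about the real benchmark.

```python
import os
import tempfile
import time

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def init_single_process_group() -> None:
    """Start a 1-process gloo group (assumption: CPU only, no launcher)."""
    if not dist.is_initialized():
        init_file = os.path.join(tempfile.mkdtemp(), "pg_init")
        dist.init_process_group("gloo", init_method=f"file://{init_file}",
                                rank=0, world_size=1)


def time_ddp_steps(find_unused_parameters: bool, steps: int = 5) -> float:
    """Hypothetical helper: time a few training steps of a toy DDP model."""
    init_single_process_group()
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    # device_ids is omitted because this runs on CPU; on GPU it would be [rank].
    ddp_model = DDP(model, find_unused_parameters=find_unused_parameters)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    inputs = torch.randn(16, 32)
    targets = torch.randint(0, 10, (16,))

    start = time.perf_counter()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()  # DDP all-reduces gradients during backward
        optimizer.step()
    return time.perf_counter() - start


for flag in (False, True):
    print(f"find_unused_parameters={flag}: {time_ddp_steps(flag):.4f}s")
```

Recent PyTorch versions may warn when the flag is set but no parameter was actually unused, which is itself a hint that the extra traversal can be dropped.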

blefaudeux · 2020-10-08T21:08:18Z

looks like something is wrong in SDP depending on the world size, and this benchmark change exposed that


blefaudeux added 2 commits October 8, 2020 11:34

more realistic benchmark

3a069d5

randomize pictures per rank, better sanity test

cf5bbcc

facebook-github-bot added the CLA Signed label Oct 8, 2020

regression test adjustment to the fact that DDP is now running

bf6e85c

min-xu-ai approved these changes Oct 8, 2020


Merge remote-tracking branch 'upstream/master' into oss_realistic_bench

60648aa

blefaudeux marked this pull request as draft October 8, 2020 21:07

blefaudeux added 3 commits October 8, 2020 15:19

Merge remote-tracking branch 'upstream/master' into oss_realistic_bench

f4ddc78

broadcast buffers, needed for BN correctness

f0eb243
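Why buffers matter for BatchNorm: running statistics (`running_mean`, `running_var`, `num_batches_tracked`) are buffers, not parameters, so gradient all-reduce never syncs them and ranks drift apart when their data differs. PyTorch DDP's `broadcast_buffers=True` (the default) broadcasts them from rank 0 at the start of each forward. The commit does not show which code path it touched, so the sketch below only illustrates the idea manually, assuming a single-process CPU `gloo` group; `broadcast_module_buffers` is a hypothetical helper name.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn


def broadcast_module_buffers(module: nn.Module, src: int = 0) -> int:
    """Hypothetical helper: copy every buffer from rank `src` to all ranks, in place."""
    count = 0
    for buf in module.buffers():
        dist.broadcast(buf, src=src)  # overwrites buf on every non-src rank
        count += 1
    return count


if not dist.is_initialized():
    init_file = os.path.join(tempfile.mkdtemp(), "pg_init")
    dist.init_process_group("gloo", init_method=f"file://{init_file}",
                            rank=0, world_size=1)

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8))
model(torch.randn(4, 8))  # a training-mode forward updates BN running stats
print(broadcast_module_buffers(model))  # number of buffers synced
```

With more than one rank, every process would end up holding rank 0's running statistics, so evaluation results no longer depend on which rank reports them.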

Switching SDP regression test off for now so that the benchmark code is correct wrt DDP, but SDP needs to be fixed

940f431

blefaudeux marked this pull request as ready for review October 9, 2020 04:39

blefaudeux mentioned this pull request Oct 9, 2020

[OSS-SDP] Accuracy bug - results differ from DDP and OSS #132

Closed

blefaudeux merged commit bfd88ca into master Oct 9, 2020

blefaudeux deleted the oss_realistic_bench branch October 9, 2020 04:47
