[feat] Gossip/SlowMo #378

Merged on Nov 8, 2021 (123 commits)
Changes from 2 commits
Commits
268f2f8
Add latest version of gossip code from branch latest_master of vtanti…
vtantia Jan 5, 2021
f152379
Add code for importing GossipDataParallel in fairscale
vtantia Jan 5, 2021
bbeab4a
Add tests (currently in wrong location so will need to be moved)
vtantia Jan 5, 2021
4616722
Remove extra ad_psgd file
vtantia Jan 5, 2021
ed7b866
Add change in gitignore to ignore vscode config
vtantia Jan 5, 2021
89d865f
Perform formatting (black, isort, flake8)
vtantia Jan 5, 2021
8157603
Add scripts to load environment and format code
vtantia Jan 5, 2021
5d458f9
Add stubs for fairscale script
vtantia Jan 5, 2021
9fdd823
[Temp] Comment out a line in stubs to fix error message
vtantia Jan 5, 2021
3a09576
Remove remaining adpsgd code
vtantia Jan 6, 2021
96fec9e
Remove unnecessary function
vtantia Jan 19, 2021
d32d384
Add mypy typing to GossipDataParallel
vtantia Jan 19, 2021
015537f
Fix formatting
vtantia Jan 19, 2021
9b5aff7
Make format.sh a script
vtantia Jan 19, 2021
cc83b84
Make flaky test log message clearer
vtantia Jan 21, 2021
9c78976
Fix minor bug in mypy implementation
vtantia Jan 21, 2021
3068a34
Add tests for SGP
vtantia Jan 21, 2021
dbb4eb3
Minor mypy changes
vtantia Jan 21, 2021
4b4c373
Fix errors with multiple process groups by synchronizing appropriately
vtantia Jan 21, 2021
ab01f16
Remove deprecated file
vtantia Jan 21, 2021
993c6ff
Fix mypy in utils/helpers.py
vtantia Jan 21, 2021
5e30d5d
Finish mypy typing for distributed.py
vtantia Feb 2, 2021
00c1ff2
Add typing to and format test files
vtantia Feb 2, 2021
7d75ab2
Fix mypy errors including those for switching to Python 3.6
vtantia Feb 2, 2021
7c1e998
Temporary commit - cleaning up parameters
vtantia Feb 2, 2021
92aef32
Remove single process support to make code cleaner
vtantia Feb 2, 2021
0e6f6ea
Change localsgd to be set as an option
vtantia Feb 2, 2021
98f9d36
Refactor perform_additional_optimizer_actions function
vtantia Feb 2, 2021
9427e12
Clean up
vtantia Feb 2, 2021
b1c66c7
Factor out sgp_int
vtantia Feb 2, 2021
43efe01
Add temporary comments to prevent auto-formatting of argument separation
vtantia Feb 9, 2021
70ab95f
Rename sgp functions. Move sgp and slowmo functions together
vtantia Feb 9, 2021
383dfbc
Factorize creation of process groups in SlowMo
vtantia Feb 9, 2021
8f8a275
Remove extra variable
vtantia Feb 9, 2021
d5d4108
Change default value of localsgd_frequency to 3
vtantia Feb 9, 2021
9b99cbc
Factorize initialization of process groups
vtantia Feb 9, 2021
24bd02c
Minor name change
vtantia Feb 9, 2021
aa63481
Minor formatting change
vtantia Feb 9, 2021
242272a
Add a todo
vtantia Feb 9, 2021
2380783
Make distributed_broadcast_coalesced more generalizable
vtantia Feb 9, 2021
b67ef2c
Fix pre-commit errors (mainly mypy)
vtantia Feb 9, 2021
7c57b58
Formatting changes in scripts
vtantia Feb 9, 2021
01b34c3
Missed renaming change
vtantia Feb 9, 2021
b43d859
Precommit formatting
vtantia Feb 9, 2021
1c6549b
Add changes for fairseq fp16 optimizer
vtantia Feb 9, 2021
49db45e
Change slowmo_world_size to slowmo_num_shards
vtantia Feb 9, 2021
33a39bb
Fix flaky test and change parameter names
vtantia Feb 9, 2021
84fe38d
Fix minor bugs
vtantia Feb 9, 2021
669c90b
Fairscale pyproject change. Not sure why this happens
vtantia Feb 9, 2021
aed0595
Add a no sharding version of SlowMo. Add tests for the no sharding ve…
vtantia Feb 10, 2021
65d3861
Clean up SGP conditions
vtantia Feb 10, 2021
ed8b219
minor tweaks, seems to run fine
blefaudeux Feb 11, 2021
2bbc373
lint
blefaudeux Feb 11, 2021
e6c1b7f
Merge branch 'master' into slowmo_ben
blefaudeux Feb 11, 2021
2d7eff3
removing some changes which slipped in
blefaudeux Feb 11, 2021
ebfc864
changing the cudnn deterministic setting, seems that running all test…
blefaudeux Feb 11, 2021
898cc55
moving all the tests to pytest, would probably need a second cleanup …
blefaudeux Feb 13, 2021
ee5f94c
fix an assert on a parameter list
blefaudeux Feb 14, 2021
1675f39
small test refactor, not perfect but a bit more readable I presume
blefaudeux Feb 16, 2021
79ea7f8
does not look like setting files manually is a good idea
blefaudeux Feb 16, 2021
70e40bb
destroy process groups when done
blefaudeux Feb 17, 2021
1b74cc5
fixing unit tests firing consecutive process groups
blefaudeux Feb 19, 2021
152e004
Formatting changes
vtantia Feb 9, 2021
61d501f
Changes in documentation
vtantia Feb 10, 2021
94c6757
Add documentation for slowmo_memory_efficient
vtantia Feb 19, 2021
589a609
Make private methods start with underscore. Minor name changes
vtantia Feb 19, 2021
c10ada7
Move sgp related functions together
vtantia Feb 19, 2021
d0dece9
Minor flake8 fix
vtantia Feb 19, 2021
b9a7d8a
Remove enum SlowmoBaseAlgorithm. Use string instead
vtantia Feb 20, 2021
ecb558c
Remove extra parameter
vtantia Feb 20, 2021
4fd8be9
Change license header on all the files
vtantia Feb 20, 2021
ce50089
Rename function
vtantia Feb 20, 2021
8f7dc6e
Add tutorial for slowmo (very slightly modified from tutorial_oss.py)
vtantia Feb 20, 2021
c13e287
Fix broken tests on > 2 GPU machines
vtantia Feb 20, 2021
5d6dc69
Add SlowMo to init
vtantia Feb 20, 2021
3fa7593
Remove extra imports
vtantia Feb 20, 2021
cba3829
Minor addition missed 2 commits before
vtantia Feb 20, 2021
75b8cba
moving gossip to experimental
blefaudeux Mar 10, 2021
4ee9577
Merge branch 'master' into slowmo_ben
blefaudeux Mar 10, 2021
451cf6d
removing a change which slipped in
blefaudeux Mar 10, 2021
ef86bb1
Merge branch 'main' into slowmo_ben
blefaudeux Oct 18, 2021
b4a798f
code review + fixing an issue with model parallel tests
blefaudeux Oct 18, 2021
43ac702
removing private torch variable which seemed broken on nightly
blefaudeux Oct 18, 2021
76e87b4
addressing some more comments
blefaudeux Oct 18, 2021
fa214b7
tentatively debugging the unit tests, the interface is not too nice
blefaudeux Oct 19, 2021
1bd1b71
Fix a couple of bugs related to spawning processes
vtantia Oct 22, 2021
06f5af2
Fix a bug by ensuring that data is the same on all GPUs at setup time
vtantia Oct 22, 2021
5d025d0
Resolve comments on PR - misc
vtantia Oct 28, 2021
8fc366b
Resolve comments on PR - break rank and world_size into 2 variables
vtantia Oct 28, 2021
c89bdc9
Refactor to clean up _maybe_create_process_groups
vtantia Oct 28, 2021
1110eff
Fix non-deterministic behaviour in a clean way
vtantia Oct 28, 2021
7bf8017
Merge branch 'main' into slowmo_ben
vtantia Oct 28, 2021
da5357d
Fix bug by removing residual option
vtantia Oct 29, 2021
4d29165
Migrate list to deque to prevent future memory leak
vtantia Oct 29, 2021
e1aca67
Address PR comments
vtantia Oct 29, 2021
20f55ed
Minor formatting fixes
vtantia Oct 29, 2021
8d98b4d
Change slowmo_base_algorithm from string to Enum
vtantia Oct 29, 2021
cc3e829
Remove extra cast in the code
vtantia Oct 29, 2021
f3d91ec
Address PR comments
vtantia Oct 29, 2021
3eeef73
Update documentation to include SlowMo. Add tutorial. Remove tutorial…
vtantia Oct 27, 2021
e2a9d13
Modify docs to add custom sections
vtantia Nov 2, 2021
318dd91
Address comments in PR in docs and tutorials
vtantia Nov 2, 2021
f83d5ad
Convert class and methods to abstract to address PR review
vtantia Nov 2, 2021
fb8383d
Address further comments in PR in docs and tutorials
vtantia Nov 2, 2021
e19cc2a
Fix minor typo
vtantia Nov 2, 2021
da5bb69
Fix backticks linter error
vtantia Nov 2, 2021
ebe1196
Minor refactor - Rename an argument to remove Sphinx error
vtantia Nov 2, 2021
a19323e
Minor renaming in docs
vtantia Nov 3, 2021
306dbef
Merge branch 'main' into slowmo_ben
vtantia Nov 3, 2021
39383bd
Minor addition to CHANGELOG.md
vtantia Nov 3, 2021
5bd07f9
Merge branch 'main' into slowmo_ben
vtantia Nov 3, 2021
c6d0273
Add deep dive for SlowMo
vtantia Nov 4, 2021
b59a835
Modify deep dive and tutorial to address recommendations in code review
vtantia Nov 5, 2021
d9765a7
Minor refactor - name change
vtantia Nov 5, 2021
122e082
Modify deep dive to make condition for using SlowMo clearer
vtantia Nov 5, 2021
b325371
Modification to CHANGELOG.md to address review comments
vtantia Nov 5, 2021
45830c1
Add changes in documentation to address code review
vtantia Nov 5, 2021
c7242de
Fix minor linter error
vtantia Nov 5, 2021
22efbaa
Fix missing parameter in docs
vtantia Nov 5, 2021
68ff8f1
Fix link in docs
vtantia Nov 5, 2021
d0d94d0
Fix missing parameter in docs
vtantia Nov 5, 2021
67f6003
Modification to tutorials to address code review comments
vtantia Nov 8, 2021
9cf9153
Merge branch 'main' into slowmo_ben
vtantia Nov 8, 2021
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [FSDP]: limited support of shared weights between FSDP wrappers. This allows large parameter
and gradient memory to be sharded despite being needed from different layers due to
weight sharing. [#836]
- SlowMoDistributedDataParallel[feature][experimental] ([#378])

## [0.4.1] - 2021-09-17
### Fixed