Multi-node + multi-GPU papers100m + GCN example #8070
Conversation
Wondering if it is possible to merge this with the initial multi-GPU example, or whether there is a strong reason not to do this.
The setup is fairly different. The old example uses mp.spawn for single-node multi-GPU; for multi-node multi-GPU we are using torch.distributed with the NCCL backend, so I think for now we should keep them separate. Additionally, the single-node multi-GPU example is very simple to run, just |
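For context, a minimal sketch of the distinction described above (hypothetical code, not the actual example in this PR): the single-node example launches its own workers via mp.spawn, while the multi-node setup expects one process per GPU to be launched externally (e.g. via torchrun or srun) and initializes torch.distributed with the NCCL backend from environment variables.

```python
# Hypothetical sketch of the two launch styles; not the code from this PR.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def single_node_worker(rank: int, world_size: int):
    # mp.spawn style: the parent process creates one worker per local GPU.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "12355")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    # ... training loop ...
    dist.destroy_process_group()


def multi_node_setup():
    # torchrun/srun style: every process is launched externally, so rank,
    # world size, and local rank are read from the environment.
    dist.init_process_group(backend="nccl", init_method="env://")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return dist.get_rank(), dist.get_world_size(), local_rank


if __name__ == "__main__":
    # Single-node case: run this script directly and spawn one worker per GPU.
    # In the multi-node case, each externally launched process would instead
    # call multi_node_setup() at the start of training.
    world_size = torch.cuda.device_count()
    mp.spawn(single_node_worker, args=(world_size,), nprocs=world_size)
```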
This will be a very nice addition! 🚀
Haven't tried running this example in a multi-node multi-GPU environment myself, but the code looks good to me. Just like Matthias minimised the other example in #7954, I think we should keep this example minimal, too, where it makes sense.
@akihironitta @rusty1s Let me know if anything else is needed to merge.
Thanks @puririshi98. I cleaned it up a bit. I really like the example. One thing we should definitely improve is to allow distributed evaluation as well, but that is not blocking this PR.
Working with the NVIDIA PyG container.