PyTorch Synthetic Benchmark #545
Branch updated from cc8b87e to 36618c9.
```python
# Horovod: broadcast parameters & optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
# TODO: needs bugfix
#hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```
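`hvd.broadcast_parameters(model.state_dict(), root_rank=0)` copies rank 0's parameter values to every other worker, so all replicas start training from identical weights. The single-process simulation below illustrates only that semantics; plain dicts of lists stand in for each rank's `state_dict()`, and `simulate_broadcast` is a hypothetical helper, not part of Horovod.

```python
import copy
from typing import Dict, List

# Hypothetical stand-in: each rank's "state_dict" maps a parameter name to a list of floats.
StateDict = Dict[str, List[float]]


def simulate_broadcast(per_rank_states: List[StateDict], root_rank: int = 0) -> None:
    """Overwrite every non-root rank's values with a copy of the root rank's values.

    After the call all replicas hold identical weights, which is the effect a
    parameter broadcast from root_rank is meant to have. Keys are assumed to
    match across ranks.
    """
    root = per_rank_states[root_rank]
    for rank, state in enumerate(per_rank_states):
        if rank == root_rank:
            continue
        for name, value in root.items():
            # Deep copy so ranks never share the same list object.
            state[name] = copy.deepcopy(value)


# Three "workers" that start from different initial weights.
ranks = [
    {"fc.weight": [0.1, 0.2], "fc.bias": [0.0]},
    {"fc.weight": [0.5, -0.3], "fc.bias": [0.9]},
    {"fc.weight": [-0.7, 0.4], "fc.bias": [0.2]},
]
simulate_broadcast(ranks, root_rank=0)
print(ranks[2]["fc.weight"])  # [0.1, 0.2]
```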
@tgaddair, we should fix this bug before landing. `optim.SGD` without momentum and weight decay causes the following error in `broadcast_optimizer_state`:
```
Traceback (most recent call last):
  File "pytorch_synthetic_benchmark.py", line 64, in <module>
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)
  File "/usr/local/lib/python2.7/site-packages/horovod/torch/__init__.py", line 213, in broadcast_optimizer_state
    param_state = state_dict['state'][pid]
KeyError: 4592867280
```
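The lookup `state_dict['state'][pid]` fails because plain SGD without momentum keeps no per-parameter buffers, so the `state` mapping has no entry for that parameter id. The pure-Python sketch below reproduces the failure and shows one possible guard (skipping parameters that have no state). It assumes a dict shaped like the output of PyTorch's `Optimizer.state_dict()`; `collect_param_states` is a hypothetical helper, and this is not necessarily how #548 resolves the issue.

```python
from typing import Any, Dict

# Assumed to mirror the shape of torch.optim.Optimizer.state_dict():
#   {"state": {param_id: {...per-param buffers...}},
#    "param_groups": [{"lr": ..., "params": [param_id, ...]}, ...]}
OptimizerStateDict = Dict[str, Any]


def collect_param_states(state_dict: OptimizerStateDict) -> Dict[int, Dict[str, Any]]:
    """Return the per-parameter state for every parameter that actually has some.

    A stateless optimizer (e.g. SGD with momentum=0) leaves "state" empty, so
    indexing state_dict["state"][pid] directly raises KeyError. Parameters
    without an entry are skipped here instead.
    """
    collected = {}
    for group in state_dict["param_groups"]:
        for pid in group["params"]:
            param_state = state_dict["state"].get(pid)
            if param_state is None:
                continue  # nothing per-parameter to broadcast
            collected[pid] = param_state
    return collected


# Plain SGD: hyperparameters only, no per-parameter buffers.
sgd_no_momentum = {"state": {}, "param_groups": [{"lr": 0.01, "momentum": 0, "params": [101, 102]}]}
# SGD with momentum: one momentum buffer per parameter.
sgd_momentum = {
    "state": {101: {"momentum_buffer": [0.0]}, 102: {"momentum_buffer": [0.0]}},
    "param_groups": [{"lr": 0.01, "momentum": 0.9, "params": [101, 102]}],
}

try:
    sgd_no_momentum["state"][101]  # the unguarded lookup from the traceback
except KeyError as err:
    print("KeyError:", err)  # KeyError: 101

print(collect_param_states(sgd_no_momentum))  # {}
print(sorted(collect_param_states(sgd_momentum)))  # [101, 102]
```

Skipping is only one option; the group hyperparameters in `param_groups` still exist and could be synchronized separately.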
Done #548.
@alsrgv I'm getting the following error when running pytorch_synthetic_benchmark.py:
I get the error when running in parallel as well, with the same output repeated k times. For example:
Gives:
I've included the referenced call from /torch/init.py below.
Yes. I'll check it. Thanks.
It worked. Thanks!!
@alsrgv Awesome!!! So nice to have this. Again, thanks!!!
@alsrgv How does the code decide when to stop warmup and proceed with the test? Just curious. Also, would switching to fp16 have any effect?
@bapriddy, warmup runs for
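A common pattern for separating warmup from measurement in a synthetic benchmark is to run a fixed number of untimed steps first, so one-time costs such as lazy initialization and memory allocation are excluded, and then time several iterations and report mean throughput. The sketch below assumes that pattern; `run_benchmark`, `num_warmup_steps`, `num_iters`, `steps_per_iter`, and `fake_step` are hypothetical names, and this is not the benchmark script's own code.

```python
import statistics
import timeit
from typing import Callable, List


def run_benchmark(
    step_fn: Callable[[], None],
    batch_size: int,
    num_warmup_steps: int = 10,
    num_iters: int = 10,
    steps_per_iter: int = 10,
) -> List[float]:
    """Return images/sec for each timed iteration after an untimed warmup."""
    # Warmup: steps are executed but not recorded.
    for _ in range(num_warmup_steps):
        step_fn()

    rates = []
    for _ in range(num_iters):
        # timeit.timeit calls step_fn `number` times and returns the total seconds.
        elapsed = timeit.timeit(step_fn, number=steps_per_iter)
        rates.append(batch_size * steps_per_iter / elapsed)
    return rates


calls = {"n": 0}


def fake_step() -> None:
    # Stand-in for forward/backward/optimizer.step(); just burns a little CPU.
    calls["n"] += 1
    sum(i * i for i in range(1000))


rates = run_benchmark(fake_step, batch_size=32, num_warmup_steps=5, num_iters=3, steps_per_iter=4)
print(f"{statistics.mean(rates):.1f} +- {statistics.stdev(rates):.1f} img/sec")
```

With these settings the step runs 5 + 3 × 4 = 17 times, and only the last 12 calls contribute to the reported rate.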
@alsrgv Is it possible to modify pytorch_synthetic_benchmark.py to run resnet18, resnet101, or other ImageNet models? I did this with pytorch_imagenet_resnet50.py by changing line 114.
@bapriddy, yeah, you can just pass
@alsrgv Got it!
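One common way to make a benchmark model-agnostic is to look up the constructor by name with `getattr`, e.g. `getattr(torchvision.models, name)()` for torchvision's `resnet18`, `resnet50`, or `resnet101`. The sketch below assumes that approach and uses a stand-in namespace in place of `torchvision.models` so it runs without PyTorch; the `--model` flag and the `build_model` helper are hypothetical here.

```python
import argparse
import types

# Stand-in for torchvision.models: each attribute is a zero-argument constructor.
fake_models = types.SimpleNamespace(
    resnet18=lambda: "ResNet-18",
    resnet50=lambda: "ResNet-50",
    resnet101=lambda: "ResNet-101",
)


def build_model(models_module, name: str):
    """Look up a model constructor by name and call it."""
    constructor = getattr(models_module, name, None)
    if constructor is None or not callable(constructor):
        raise ValueError(f"unknown model: {name!r}")
    return constructor()


parser = argparse.ArgumentParser()
parser.add_argument("--model", default="resnet50", help="model name (hypothetical flag)")
# Parse an explicit list instead of sys.argv so the example is self-contained.
args = parser.parse_args(["--model", "resnet101"])
print(build_model(fake_models, args.model))  # ResNet-101
```

Selecting by name keeps the rest of the script unchanged; only the string passed on the command line differs between runs.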