-
Notifications
You must be signed in to change notification settings - Fork 880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
check port availability only in main deepspeed/torchrun launcher #2078
check port availability only in main deepspeed/torchrun launcher #2078
Conversation
This is great @Jingru, can we also do this for the |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. |
Sure thing. Already updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This looks great to me.
need_port_check = num_machines <= 1 or int(args.machine_rank) == 0 | ||
if need_port_check and is_port_in_use(main_process_port): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small nit though thinking on this more, can we add a comment here just mentioning how/why we should only check on the main node for this, in case others want to learn why as well :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @Jingru :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @Jingru :)
@muellerzr already add some comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from Zach's comment, LGTM, thanks.
add comments
comments added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Jingru for fixing this issue, left a comment.
What does this PR do?
Currently,
main_process_port
availability is checked on all the deepspeed launchers (for multi_node senario), but this check is only necessary for main launcher whose machine rank is 0.Who can review?