Introduce the torchrun entrypoint #64049

Closed

cbalioglu
Contributor

@cbalioglu cbalioglu commented Aug 26, 2021

This PR introduces a new torchrun entrypoint that simply "points" to python -m torch.distributed.run. It is shorter and less error-prone to type, and gives a nicer syntax than the rather cryptic python -m ... command line. Along with the new entrypoint, the documentation is updated: places that mention torch.distributed.run now use torchrun instead.
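For readers unfamiliar with console-script entrypoints, here is a minimal sketch of how such a declaration maps a command name to a Python callable. The exact spec string and the function name main are assumptions for illustration; the description above only says that torchrun "points" to python -m torch.distributed.run. The helper names parse_console_script and resolve are hypothetical.

```python
import importlib

# Assumed shape of the console-script declaration in setup.py.
# The "main" attribute is an assumption, not something stated in this PR text.
CONSOLE_SCRIPTS = ["torchrun = torch.distributed.run:main"]


def parse_console_script(spec):
    """Split 'name = package.module:attr' into (name, module, attr)."""
    name, _, target = spec.partition("=")
    module, _, attr = target.strip().partition(":")
    return name.strip(), module.strip(), attr.strip()


def resolve(module, attr):
    """Import the module and look up the (possibly dotted) attribute,
    which is roughly what the generated wrapper script does before calling it."""
    obj = importlib.import_module(module)
    for part in attr.split("."):
        obj = getattr(obj, part)
    return obj


if __name__ == "__main__":
    print(parse_console_script(CONSOLE_SCRIPTS[0]))
```

Under these assumptions, the installed torchrun command would import the named module with whichever interpreter the package was installed for and call the named function, which is why it behaves like python -m torch.distributed.run.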

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu @gcramer23

@facebook-github-bot
Contributor

facebook-github-bot commented Aug 26, 2021

As of commit 6e17981 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@facebook-github-bot
Contributor

@kiukchung has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@kiukchung merged this pull request in 65e6194.

@kiukchung kiukchung modified the milestone: 1.9.1 Aug 30, 2021
@vadimkantorov
Contributor

vadimkantorov commented Sep 20, 2021

One potentially controversial thing about it is that it somewhat hides which Python distribution is used to run the script.

It might be good for it to print the path of the Python interpreter in use, along with an explanation of how to change it if needed, e.g. for the case where python is aliased to the system's python2 and the right one is python3.

It may also be good for the script to have some guards against old or unrelated Pythons.
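A minimal sketch of what such a banner and guard could look like, using only the standard library. Nothing here is taken from PyTorch: the names interpreter_banner and check_python, the message wording, and the MIN_PYTHON threshold are all hypothetical.

```python
import sys

# Hypothetical minimum version, chosen for illustration only.
MIN_PYTHON = (3, 6)


def interpreter_banner(executable=sys.executable, version=sys.version_info):
    """Describe which interpreter is running the launcher."""
    return "torchrun is using Python {}.{} at {}".format(
        version[0], version[1], executable
    )


def check_python(version=sys.version_info, minimum=MIN_PYTHON):
    """Raise if the interpreter is older than the (assumed) minimum."""
    if tuple(version[:2]) < tuple(minimum):
        raise RuntimeError(
            "Python {}.{} is too old (need >= {}.{}). To pick another "
            "interpreter, run: <python> -m torch.distributed.run ...".format(
                version[0], version[1], minimum[0], minimum[1]
            )
        )


if __name__ == "__main__":
    check_python()
    print(interpreter_banner())
```

The hint in the error message relies on the point made in the PR description: invoking the module through a specific interpreter with python -m makes the choice of Python explicit.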

Also, I could not find the source of torchrun in the PyTorch sources. Is it exactly equivalent to python -m torch.distributed.run? If so, that should be clearly stated in the docs. Currently, they may leave the impression that torchrun has some extended functionality / magic (especially the phrasing that torchrun is a superset of python -m torch.distributed.run). That, along with the difficulty of finding the source code, raises questions.
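One way to check this locally is to ask the installed package metadata what a console script resolves to. The sketch below uses importlib.metadata from the standard library (Python 3.8+); the helper name find_console_script is hypothetical, and whether a torchrun entry exists depends on the installed PyTorch build.

```python
import shutil
from importlib import metadata


def find_console_script(name):
    """Return the 'module:attr' target registered for a console script,
    or None if no installed distribution declares it."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        candidates = eps.select(group="console_scripts")
    else:
        # Python 3.8 / 3.9 return a dict keyed by group name.
        candidates = eps.get("console_scripts", [])
    for ep in candidates:
        if ep.name == name:
            return ep.value
    return None


if __name__ == "__main__":
    # Path of the generated wrapper script on PATH, if any.
    print("wrapper:", shutil.which("torchrun"))
    # Target the wrapper dispatches to, if the entry is registered.
    print("target:", find_console_script("torchrun"))
```

If the reported target is a function inside torch.distributed.run, that would explain why there is no separate torchrun source file to find: the wrapper is generated at install time from the entry-point declaration.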

@cbalioglu cbalioglu deleted the balioglu-torchrun branch October 8, 2021 17:34
Labels
cla signed, Merged, oncall: distributed (Add this issue/PR to distributed oncall triage queue)
Development

Successfully merging this pull request may close these issues.

[RFC] Add torch.distributed.run as a console script in pytorch's setup.py
4 participants