[RFC] Add support for device extension autoloading #127074
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/127074
✅ No failures as of commit 45fd7ad with merge base 6b5fbc5. (This comment was automatically generated by Dr. CI and updates every 15 minutes. Links to docs will display an error until the docs builds have been completed.)
Thanks for the pull request. Could you please add some tests?
cc @bsochack
Sure, I'm willing to do this!
@shink, @jgong5 This is nice work that directly reflects the proposal (RFC). We tried to test it against our device_extension and there was a crash: the problem seems to be that torch._dynamo has code that checks all symbols exported from torch/__init__.py.
@bsochack @jczaja Nice work! Thanks so much!
Yes, we also met this crash.
@jgong5 @bsochack @bkowalskiINTEL @FFFrog @hipudding Please have a look at this unit test. See b0e08d3
```ini
[torch.backends]
device_backend = backend_pkg:autoload
```
In fact, this file should be ignored; the reason it is here is to declare the entry point of the backend package.
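As a sketch (package and entry names taken from this test layout), the same entry point could also be declared programmatically in the package's setup.py; the dict below is what would be passed as the `entry_points` argument to `setuptools.setup()`:

```python
# Hypothetical fragment of the backend package's setup.py. The
# "torch.backends" entry-point group is what torch scans at import time.
entry_points = {
    "torch.backends": [
        # "<entry name> = <module>:<callable>"
        "device_backend = backend_pkg:autoload",
    ],
}
```

Either form (this dict or the declarative config file above) registers the hook in the installed package's metadata, which is all that entry-point discovery needs.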
torch/__init__.py (Outdated)

```python
try:
    # just load the plugin without calling
    backend.load()
except Exception:
```
I think it would be better to show failure information when an exception is encountered.
torch/__init__.py (Outdated)

```python
def is_device_backend_autoload_enabled() -> bool:
    var = os.getenv("TORCH_DISABLE_DEVICE_BACKEND_AUTOLOAD")
```
`getenv` with a default value would be better.
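Applied to the negative flag shown above, passing a default to `getenv` removes the `None` case entirely (a hypothetical rewrite illustrating the suggestion, not the code that was merged):

```python
import os


def is_device_backend_autoload_enabled() -> bool:
    # With a default of "0" the variable is never None: autoloading stays
    # enabled unless TORCH_DISABLE_DEVICE_BACKEND_AUTOLOAD is set to "1".
    return os.getenv("TORCH_DISABLE_DEVICE_BACKEND_AUTOLOAD", "0") != "1"
```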
test/autoload/test_autoload.py (Outdated)

```python
class TestAutoload(TestCase):
    def test_load_plugins(self):
        device_backend_path = os.path.abspath(
```
It would be better to wrap this path operation in a context manager.
Fixed, please take a look again.
Please add @bkowalskiINTEL and @jczaja as co-authors. Let's also continue addressing all code review here.
Sure, thanks so much, and any suggestions are welcome.
test/autoload/test_autoload.py (Outdated)

```python
def test_autoload(self):
    # after importing the extension, the value of this environment variable should be true
    torch.import_device_backends()
    value = os.getenv("IS_CUSTOM_DEVICE_BACKEND_IMPORTED", "false")
```
Besides checking the environment variable, are you going to add test to autoload an extension (maybe a mock one)?
Thanks for your review. Please have a look at the code I just updated.
```bash
test/autoload/device_backend
├── README.md
├── backend_pkg  # The backend package
```
Would it be possible to reuse the existing modules we already have for testing in test/cpp_extension/setup.py to test this without having to create a brand new extension?
Thank you so much for your review. I'm testing this idea now.
Could you please have a look at this commit: 1e7cf76?
test/autoload/test_autoload.py (Outdated)

```python
# Test the function defined in backend_pkg/__init__.py
import backend_pkg
self.assertTrue(hasattr(backend_pkg, "apply_patch"))
self.assertEqual(backend_pkg.apply_patch(), "success")
```
I'm not sure what this is testing?
It's testing a function defined in the backend extension.
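For context, a minimal `backend_pkg/__init__.py` matching what the test asserts might look like this (a sketch reconstructed from the assertions above, not the actual test package):

```python
import os

# Importing the package has a visible side effect the autoload test checks for.
os.environ["IS_CUSTOM_DEVICE_BACKEND_IMPORTED"] = "true"


def apply_patch():
    # Stand-in for applying device-specific patches to torch.
    return "success"
```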
@shink Seems there are still lint errors. You may reproduce with `lintrunner -a` locally.
```python
# When importing this package, set this environment variable to true
os.environ["IS_CUSTOM_DEVICE_BACKEND_IMPORTED"] = "true"


def _autoload():
    # Do nothing in this entry point
    pass
```
Should we do things inside this entrypoint instead to make sure it is called?
test/test_cpp_extensions_aot.py (Outdated)

```diff
@@ -367,5 +367,13 @@ def f(a: bool, b: bool):
         self.assertIn("torch_library::logical_and", str(s.graph))
+
+
+class TestDeviceBackendAutoload(common.TestCase):
+    def test_autoload(self):
```
Not sure if I understand it correctly, but the cpp extension is explicitly imported from this test file, not autoloaded implicitly. Is that correct?
The reason for putting this test case here is that I want to reuse the existing modules in test/cpp_extension/setup.py.
The extension is installed here (lines 667 to 671 in 7efaeb1):

```python
# Build the test cpp extensions modules
shell_env = os.environ.copy()
shell_env["USE_NINJA"] = str(1 if use_ninja else 0)
cmd = [sys.executable, "setup.py", "install", "--root", "./install"]
return_code = shell(cmd, cwd=cpp_extensions_test_dir, env=shell_env)
```
Small things only!
test/test_autoload.py (Outdated)

```python
class TestDeviceBackendAutoload(TestCase):
    def test_autoload(self):
        switch = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", None)
```
```diff
- switch = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", None)
+ switch = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", False)
```

If you do this, you won't need the `if switch` below, right?
Thanks for your review, I will test this change.
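One thing worth noting when testing this change (an observation, not part of the review thread): `os.getenv` returns the raw string whenever the variable is set, so with a bare truthiness check even `"0"` counts as enabled, because any non-empty string is truthy:

```python
import os

# Any non-empty string from the environment is truthy, including "0",
# so `if switch:` alone cannot distinguish "0" from "1".
os.environ["TORCH_DEVICE_BACKEND_AUTOLOAD"] = "0"
switch = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", False)
print(bool(switch))  # True, even though the user asked to disable
```

The `False` default only helps for the unset case; a string comparison is still needed to honor the value itself.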
```python
    return _test_autoload(test_directory, options, enable=False)


def _test_autoload(test_directory, options, enable=True):
```
FYI @clee2000 for change to run_test.py
torch/__init__.py (Outdated)

```python
    """
    # enabled by default
    is_enable = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", "1")
    return is_enable.strip().lower() in {"1", "true", "yes", "on", "y"}
```
nit: we usually do "0" and "1" only for these for simplicity.
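Restricting the accepted values to "0" and "1" as the nit suggests collapses the check to a one-liner (a sketch of the suggested simplification, not necessarily the merged code):

```python
import os


def is_device_backend_autoload_enabled() -> bool:
    # Enabled by default; only TORCH_DEVICE_BACKEND_AUTOLOAD=0 disables it.
    return os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", "1") == "1"
```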
@albanD is this PR ready to merge?
""" | ||
Whether autoloading out-of-the-tree device extensions is enabled. | ||
The switch depends on the value of the environment variable | ||
`TORCH_DEVICE_BACKEND_AUTOLOAD`. |
@drisspg which .rst file should we add this new env variable to for proper documentation?
The main env var RST is here: https://github.com/pytorch/pytorch/blob/main/docs/source/torch_environment_variables.rst, and it branches out to the others.
Co-authored-by: albanD <desmaison.alban@gmail.com>
@albanD any comments on this PR?
Small nit on moving the doc to another file, but SGTM otherwise!
```diff
@@ -26,3 +26,4 @@ If you find anything in this documentation that is missing, incorrect, or could
 miscellaneous_environment_variables
 logging
 torch_nccl_environment_variables
+privateuse1_environment_variables
```
Could you please move this to the miscellaneous section? I do expect all users to be interested in this env variable, not just privateuse1 devs.
- Under some conditions, autograd threads can hang on shutdown; therefore we do not wait for them to shut down indefinitely but rely on a timeout, which defaults to ``10`` seconds. This environment variable can be used to set the timeout in seconds.
@albanD Sure, it has been moved. Please take a look again. Thanks so much for your patient review.
@pytorchbot merge
Merge failed. Reason: this PR needs a label; to add one, you can comment to pytorchbot. For more information, see the details for the Dev Infra team raised by the workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes pytorch#122468

- Load device extensions at the end of `torch/__init__.py`
- Enabled by default, or you can disable it with `TORCH_DEVICE_BACKEND_AUTOLOAD=0`

Run test:

```bash
python test/run_test.py -i test_autoload_enable
python test/run_test.py -i test_autoload_disable
```

doc: https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html

co-author: @jgong5 @bsochack @bkowalskiINTEL @jczaja @FFFrog @hipudding

Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Jiong Gong <jiong.gong@intel.com>

Pull Request resolved: pytorch#127074
Approved by: https://github.com/albanD, https://github.com/jgong5