[RFC] Add support for device extension autoloading #127074
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/127074
✅ No failures as of commit 45fd7ad with merge base 6b5fbc5. (This comment was automatically generated by Dr. CI and updates every 15 minutes. Links to docs will display an error until the docs builds have been completed.)
Thanks for the pull request. Could you please add some tests?
cc @bsochack
Sure, I'm willing to do this!
@shink, @jgong5 This is nice work that directly reflects the proposal (RFC). We tried to test it against our device_extension and there was a crash: the problem seems to be that torch._dynamo has code that checks all symbols exported from torch/__init__.py.
@bsochack @jczaja Nice work! Thanks so much!
Yes, we also met this crash.
@jgong5 @bsochack @bkowalskiINTEL @FFFrog @hipudding Please have a look at this unit test. See b0e08d3
```ini
[torch.backends]
device_backend = backend_pkg:autoload
```
In fact, this file should be ignored; the reason it is here is to declare the entry point of the backend package.
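As a sketch (package and entry names taken from this test layout), the same entry point could also be declared programmatically in the package's setup.py; the dict below is what would be passed as the `entry_points` argument to `setuptools.setup()`:

```python
# Hypothetical fragment of the backend package's setup.py. The
# "torch.backends" entry-point group is what torch scans at import time.
entry_points = {
    "torch.backends": [
        # "<entry name> = <module>:<callable>"
        "device_backend = backend_pkg:autoload",
    ],
}
```

Either form (this dict or the declarative config file above) registers the hook in the installed package's metadata, which is all that entry-point discovery needs.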
torch/__init__.py (Outdated)

```python
try:
    # just load the plugin without calling
    backend.load()
except Exception:
```
I think it would be better to show failure information when an exception is encountered.
torch/__init__.py (Outdated)

```python
def is_device_backend_autoload_enabled() -> bool:
    var = os.getenv("TORCH_DISABLE_DEVICE_BACKEND_AUTOLOAD")
```
`getenv` with a default value would be better.
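Applied to the negative flag shown above, passing a default to `getenv` removes the `None` case entirely (a hypothetical rewrite illustrating the suggestion, not the code that was merged):

```python
import os


def is_device_backend_autoload_enabled() -> bool:
    # With a default of "0" the variable is never None: autoloading stays
    # enabled unless TORCH_DISABLE_DEVICE_BACKEND_AUTOLOAD is set to "1".
    return os.getenv("TORCH_DISABLE_DEVICE_BACKEND_AUTOLOAD", "0") != "1"
```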
test/autoload/test_autoload.py (Outdated)

```python
class TestAutoload(TestCase):
    def test_load_plugins(self):
        device_backend_path = os.path.abspath(
```
It would be better to wrap this path operation in a context manager.
Fixed, please take a look again.
Please add @bkowalskiINTEL and @jczaja as co-authors. Let's also continue addressing all code review here.
Sure, thanks so much, and any suggestions are welcome.
test/autoload/test_autoload.py (Outdated)

```python
def test_autoload(self):
    # after importing the extension, the value of this environment variable should be true
    torch.import_device_backends()
    value = os.getenv("IS_CUSTOM_DEVICE_BACKEND_IMPORTED", "false")
```
Besides checking the environment variable, are you going to add test to autoload an extension (maybe a mock one)?
Thanks for your review. Please have a look at the code I just updated.
```bash
test/autoload/device_backend
├── README.md
├── backend_pkg  # The backend package
```
Would it be possible to reuse the existing modules we already have for testing in test/cpp_extension/setup.py to test this without having to create a brand new extension?
Thank you so much for your review. I'm testing this idea now.
Could you please have a look at this commit: 1e7cf76?
test/autoload/test_autoload.py (Outdated)

```python
# Test the function defined in backend_pkg/__init__.py
import backend_pkg
self.assertTrue(hasattr(backend_pkg, "apply_patch"))
self.assertEqual(backend_pkg.apply_patch(), "success")
```
I'm not sure what this is testing?
It's testing a function defined in the backend extension.
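For context, a minimal `backend_pkg/__init__.py` matching what the test asserts might look like this (a sketch reconstructed from the assertions above, not the actual test package):

```python
import os

# Importing the package has a visible side effect the autoload test checks for.
os.environ["IS_CUSTOM_DEVICE_BACKEND_IMPORTED"] = "true"


def apply_patch():
    # Stand-in for applying device-specific patches to torch.
    return "success"
```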
@shink Seems there are still lint errors. You may reproduce with `lintrunner -a` locally.
```python
# When importing this package, set this environment variable to true
os.environ["IS_CUSTOM_DEVICE_BACKEND_IMPORTED"] = "true"


def _autoload():
    # Do nothing in this entry point
    pass
```
Should we do things inside this entrypoint instead to make sure it is called?
test/test_cpp_extensions_aot.py (Outdated)

```diff
@@ -367,5 +367,13 @@ def f(a: bool, b: bool):
         self.assertIn("torch_library::logical_and", str(s.graph))
+
+
+class TestDeviceBackendAutoload(common.TestCase):
+    def test_autoload(self):
```
Not sure if I understand it correctly, but the cpp extension is explicitly imported from this test file, not autoloaded implicitly. Is that correct?
The reason for putting this test case here is that I want to reuse the existing modules in test/cpp_extension/setup.py.
The extension is installed here (lines 667 to 671 in 7efaeb1):

```python
# Build the test cpp extensions modules
shell_env = os.environ.copy()
shell_env["USE_NINJA"] = str(1 if use_ninja else 0)
cmd = [sys.executable, "setup.py", "install", "--root", "./install"]
return_code = shell(cmd, cwd=cpp_extensions_test_dir, env=shell_env)
```
Small things only!
test/test_autoload.py (Outdated)

```python
class TestDeviceBackendAutoload(TestCase):
    def test_autoload(self):
        switch = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", None)
```
```diff
- switch = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", None)
+ switch = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", False)
```

If you do this, you won't need the `if switch` below, right?
Thanks for your review, I will test this change.
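One thing worth noting when testing this change (an observation, not part of the review thread): `os.getenv` returns the raw string whenever the variable is set, so with a bare truthiness check even `"0"` counts as enabled, because any non-empty string is truthy:

```python
import os

# Any non-empty string from the environment is truthy, including "0",
# so `if switch:` alone cannot distinguish "0" from "1".
os.environ["TORCH_DEVICE_BACKEND_AUTOLOAD"] = "0"
switch = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", False)
print(bool(switch))  # True, even though the user asked to disable
```

The `False` default only helps for the unset case; a string comparison is still needed to honor the value itself.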
```python
    return _test_autoload(test_directory, options, enable=False)


def _test_autoload(test_directory, options, enable=True):
```
FYI @clee2000 for change to run_test.py
torch/__init__.py (Outdated)

```python
    """
    # enabled by default
    is_enable = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", "1")
    return is_enable.strip().lower() in {"1", "true", "yes", "on", "y"}
```
nit: we usually do "0" and "1" only for these for simplicity.
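Restricting the accepted values to "0" and "1" as the nit suggests collapses the check to a one-liner (a sketch of the suggested simplification, not necessarily the merged code):

```python
import os


def is_device_backend_autoload_enabled() -> bool:
    # Enabled by default; only TORCH_DEVICE_BACKEND_AUTOLOAD=0 disables it.
    return os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", "1") == "1"
```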
@albanD is this PR ready to merge?
""" | ||
Whether autoloading out-of-the-tree device extensions is enabled. | ||
The switch depends on the value of the environment variable | ||
`TORCH_DEVICE_BACKEND_AUTOLOAD`. |
@drisspg which .rst file should we add this new env variable to for proper documentation?
The main env var RST is here: https://github.com/pytorch/pytorch/blob/main/docs/source/torch_environment_variables.rst, and it branches out to the others.
Co-authored-by: albanD <desmaison.alban@gmail.com>
@albanD any comments on this PR?
Small nit on moving the doc to another file, but SGTM otherwise!
```diff
@@ -26,3 +26,4 @@ If you find anything in this documentation that is missing, incorrect, or could
 miscellaneous_environment_variables
 logging
 torch_nccl_environment_variables
+privateuse1_environment_variables
```
Could you please move this to the miscellaneous section? I do expect all users to be interested in this env variable, not just privateuse1 devs.
- Under some conditions, autograd threads can hang on shutdown; therefore we do not wait for them to shut down indefinitely but rely on a timeout, which defaults to ``10`` seconds. This environment variable can be used to set the timeout in seconds.
@albanD Sure, it has been moved. Please take a look again. Thanks so much for your patient review.
@pytorchbot merge
Merge failed. Reason: this PR needs a label; to add one, you can comment to pytorchbot. For more information, see the details for the Dev Infra team raised by the workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes pytorch#122468

- Load device extensions at the end of `torch/__init__.py`
- Enabled by default, or you can disable it with `TORCH_DEVICE_BACKEND_AUTOLOAD=0`

Run test:

```bash
python test/run_test.py -i test_autoload_enable
python test/run_test.py -i test_autoload_disable
```

doc: https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html

co-author: @jgong5 @bsochack @bkowalskiINTEL @jczaja @FFFrog @hipudding

Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Jiong Gong <jiong.gong@intel.com>

Pull Request resolved: pytorch#127074
Approved by: https://github.com/albanD, https://github.com/jgong5