Conversation

EikanWang
Collaborator

@EikanWang EikanWang commented Mar 7, 2024

This PR is a follow-up of RFC #115545.

In this PR, we intend to provide a registration API dedicated to eager-through-torch.compile. The major workflow of this API will be as follows.

  • Load cache
  • Check cache according to the input tensors
    • Cache Hit: Run the cached kernel directly
    • Cache Miss: Run AOTI to produce a kernel, then run the produced kernel. If AOTI fails to produce the kernel, invoke the Python fallback function.

Currently, this PR always falls back to the Python kernel; the cache mechanism will be implemented in another PR - #116368
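The load/check/hit/miss flow above can be sketched in plain Python. This is an illustrative mock, not the actual PyTorch implementation: `KernelCache`, `register_eager_kernel`, and the dict-based fake tensors are hypothetical stand-ins for the real cache, the AOTI compiler, and the aten registration machinery.

```python
# Illustrative sketch of the eager-through-torch.compile dispatch flow.
# All names here are hypothetical stand-ins, not the real PyTorch API.

class KernelCache:
    """Caches compiled kernels keyed on input tensor metadata
    (here: shapes and dtypes of dict-based fake tensors)."""

    def __init__(self):
        self._entries = {}

    def _key(self, tensors):
        return tuple((tuple(t["shape"]), t["dtype"]) for t in tensors)

    def lookup(self, tensors):
        return self._entries.get(self._key(tensors))

    def store(self, tensors, kernel):
        self._entries[self._key(tensors)] = kernel


def register_eager_kernel(aoti_compile, python_fallback):
    cache = KernelCache()  # Step 1: load (here: create) the cache

    def kernel(*tensors):
        cached = cache.lookup(tensors)  # Step 2: check cache by input tensors
        if cached is not None:
            return cached(*tensors)     # Cache hit: run the cached kernel
        try:
            compiled = aoti_compile(tensors)  # Cache miss: try AOTI
            cache.store(tensors, compiled)
            return compiled(*tensors)
        except RuntimeError:
            # AOTI failed to produce a kernel: invoke the Python fallback
            return python_fallback(*tensors)

    return kernel
```

A second call with tensors of the same shape and dtype hits the cache and skips recompilation; when the (mock) AOTI compiler raises, the Python fallback handles the call instead.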

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang

Differential Revision: D57164385


pytorch-bot bot commented Mar 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121387

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (10 Unrelated Failures)

As of commit 56b60d7 with merge base 8cad88e (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

This PR is a follow-up of RFC #115545.

In this PR, we are trying to provide a registration mode to implement a single aten operation on top of `torch.compile` and then register it back to aten.




[ghstack-poisoned]
EikanWang added a commit that referenced this pull request Mar 7, 2024
ghstack-source-id: ea1bcf0
Pull Request resolved: #121387
@EikanWang EikanWang changed the title [WIP] Add registration API for torch.compile-eager Add registration API for torch.compile-eager Mar 21, 2024
@kit1980
Contributor

kit1980 commented Apr 30, 2024

I need to revert this as it's failing internally

ERROR: test_torch_compile_override_registration_cuda (caffe2.test.inductor.test_torchinductor.GPUTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/dev/shm/uid-30083/32adf917-seed-nspid4026541534_cgpid12437421-ns-4026541785/caffe2/test/inductor/test_torchinductor.py", line 9819, in new_test
    return value(self)
  File "/dev/shm/uid-30083/32adf917-seed-nspid4026541534_cgpid12437421-ns-4026541785/caffe2/test/inductor/test_torchinductor.py", line 808, in test_torch_compile_override_registration
    res_array.append(getattr(torch, unary_op_name)(x))
RuntimeError: Error in dlopen: /re_tmp/tmpxk4fyq1k/cw74bdbcowopv7pivuo7czemlte5oieduln75gcpslbqehcdy7q2/cstf4iodqtmpabtysxkoezwgtt5d5nv6ooe6q4vaztopatoszyl7.so: undefined symbol: aoti_torch_device_type_cpu

I think the new file needs to be added to the buck dependencies.
@zou3519 @jansel see D56736862 if you want to help re-land this.
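An `undefined symbol` failure like the one above only surfaces when the generated `.so` is dlopen'd. A generic way to reproduce it outside the test harness is to load the library directly with `ctypes`; this is a hedged diagnostic sketch, not part of the PR:

```python
import ctypes

def try_dlopen(path):
    """Attempt to dlopen a shared library. Returns None on success,
    or the dynamic loader's error message on failure
    (e.g. 'undefined symbol: aoti_torch_device_type_cpu')."""
    try:
        ctypes.CDLL(path)
        return None
    except OSError as e:
        return str(e)
```

Running this against the generated kernel `.so` would surface the same loader message the test reported, which points at a symbol missing from the link (here, because the new file was not listed in the buck dependencies).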

@kit1980
Contributor

kit1980 commented Apr 30, 2024

@pytorchbot revert -m "breaking internal builds" -c ghfirst

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@EikanWang your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Apr 30, 2024
This reverts commit 61e937f.

Reverted #121387 on behalf of https://github.com/kit1980 due to breaking internal builds
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024

Pull Request resolved: #121387
Approved by: https://github.com/desertfire, https://github.com/jansel, https://github.com/zou3519, https://github.com/jgong5
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
fathnd pushed a commit to fathnd/homomorphic that referenced this pull request May 5, 2024
@EikanWang EikanWang closed this May 6, 2024
@EikanWang EikanWang reopened this May 6, 2024
@atalman
Contributor

atalman commented May 9, 2024

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@atalman
Contributor

atalman commented May 10, 2024

@pytorchmergebot merge -f "Already landed in fbcode"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Status: Done

10 participants