Conversation

EikanWang
Collaborator

@EikanWang EikanWang commented Mar 7, 2024

This PR is a follow-up of RFC #115545.

In this PR, we intend to provide a registration API dedicated to eager-through-torch.compile. The major workflow of this API will be as follows.

  • Load cache
  • Check cache according to the input tensors
    • Cache Hit: Run the cached kernel directly
    • Cache Miss: Run AOTI to produce a kernel, then run the produced kernel. If AOTI fails to produce the kernel, invoke the Python fallback function.

Currently, this PR always falls back to the Python kernel; the cache mechanism will be implemented in another PR - #116368
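The load/check/hit/miss flow above can be sketched in plain Python. This is an illustrative mock, not the actual PyTorch implementation: `KernelCache`, `register_eager_kernel`, and the dict-based fake tensors are hypothetical stand-ins for the real cache, the AOTI compiler, and the aten registration machinery.

```python
# Illustrative sketch of the eager-through-torch.compile dispatch flow.
# All names here are hypothetical stand-ins, not the real PyTorch API.

class KernelCache:
    """Caches compiled kernels keyed on input tensor metadata
    (here: shapes and dtypes of dict-based fake tensors)."""

    def __init__(self):
        self._entries = {}

    def _key(self, tensors):
        return tuple((tuple(t["shape"]), t["dtype"]) for t in tensors)

    def lookup(self, tensors):
        return self._entries.get(self._key(tensors))

    def store(self, tensors, kernel):
        self._entries[self._key(tensors)] = kernel


def register_eager_kernel(aoti_compile, python_fallback):
    cache = KernelCache()  # Step 1: load (here: create) the cache

    def kernel(*tensors):
        cached = cache.lookup(tensors)  # Step 2: check cache by input tensors
        if cached is not None:
            return cached(*tensors)     # Cache hit: run the cached kernel
        try:
            compiled = aoti_compile(tensors)  # Cache miss: try AOTI
            cache.store(tensors, compiled)
            return compiled(*tensors)
        except RuntimeError:
            # AOTI failed to produce a kernel: invoke the Python fallback
            return python_fallback(*tensors)

    return kernel
```

A second call with tensors of the same shape and dtype hits the cache and skips recompilation; when the (mock) AOTI compiler raises, the Python fallback handles the call instead.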

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang

Differential Revision: D57164385


pytorch-bot bot commented Mar 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121387

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (10 Unrelated Failures)

As of commit 56b60d7 with merge base 8cad88e (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

This PR is a follow-up of RFC #115545.

In this PR, we are trying to provide a registration mode to implement a single aten operation on top of `torch.compile` and then register it back to aten.




[ghstack-poisoned]
EikanWang added a commit that referenced this pull request Mar 7, 2024
ghstack-source-id: ea1bcf0
Pull Request resolved: #121387
@EikanWang EikanWang changed the title [WIP] Add registration API for torch.compile-eager Add registration API for torch.compile-eager Mar 21, 2024
@kit1980
Contributor

kit1980 commented Apr 30, 2024

I need to revert this as it's failing internally

ERROR: test_torch_compile_override_registration_cuda (caffe2.test.inductor.test_torchinductor.GPUTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/dev/shm/uid-30083/32adf917-seed-nspid4026541534_cgpid12437421-ns-4026541785/caffe2/test/inductor/test_torchinductor.py", line 9819, in new_test
    return value(self)
  File "/dev/shm/uid-30083/32adf917-seed-nspid4026541534_cgpid12437421-ns-4026541785/caffe2/test/inductor/test_torchinductor.py", line 808, in test_torch_compile_override_registration
    res_array.append(getattr(torch, unary_op_name)(x))
RuntimeError: Error in dlopen: /re_tmp/tmpxk4fyq1k/cw74bdbcowopv7pivuo7czemlte5oieduln75gcpslbqehcdy7q2/cstf4iodqtmpabtysxkoezwgtt5d5nv6ooe6q4vaztopatoszyl7.so: undefined symbol: aoti_torch_device_type_cpu

I think the new file needs to be added to the buck dependencies.
@zou3519 @jansel see D56736862 if you want to help re-land this.
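An `undefined symbol` failure like the one above only surfaces when the generated `.so` is dlopen'd. A generic way to reproduce it outside the test harness is to load the library directly with `ctypes`; this is a hedged diagnostic sketch, not part of the PR:

```python
import ctypes

def try_dlopen(path):
    """Attempt to dlopen a shared library. Returns None on success,
    or the dynamic loader's error message on failure
    (e.g. 'undefined symbol: aoti_torch_device_type_cpu')."""
    try:
        ctypes.CDLL(path)
        return None
    except OSError as e:
        return str(e)
```

Running this against the generated kernel `.so` would surface the same loader message the test reported, which points at a symbol missing from the link (here, because the new file was not listed in the buck dependencies).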

@kit1980
Contributor

kit1980 commented Apr 30, 2024

@pytorchbot revert -m "breaking internal builds" -c ghfirst

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@EikanWang your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Apr 30, 2024
This reverts commit 61e937f.

Reverted #121387 on behalf of https://github.com/kit1980 due to breaking internal builds
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024

Pull Request resolved: #121387
Approved by: https://github.com/desertfire, https://github.com/jansel, https://github.com/zou3519, https://github.com/jgong5
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
fathnd pushed a commit to fathnd/homomorphic that referenced this pull request May 5, 2024
@EikanWang EikanWang closed this May 6, 2024
@EikanWang EikanWang reopened this May 6, 2024
@atalman
Contributor

atalman commented May 9, 2024

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@atalman
Contributor

atalman commented May 10, 2024

@pytorchmergebot merge -f "Already landed in fbcode"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Status: Done

10 participants