
[inductor] optimize isa dry compile time. #124602

Closed · wants to merge 10 commits

Conversation

xuhancn
Collaborator

@xuhancn xuhancn commented Apr 22, 2024

Fixes #100378
The original issue is that the startup dry compile costs almost 1 second.

This PR adds the compiler version, ISA build options, and PyTorch version to the test binary path hash, so runs with the same compiler, same ISA, and same PyTorch version can skip the dry compile.
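The fingerprinting idea can be sketched roughly like this; a hypothetical illustration only, with made-up names (`isa_fingerprint`) rather than the actual torch/_inductor/codecache.py code:

```python
import hashlib

def isa_fingerprint(compiler_version: str, isa_build_flags: list[str], torch_version: str) -> str:
    """Combine the compiler version string, the ISA build flags, and the
    PyTorch version into one stable short hash. If a previously built test
    binary exists under a path containing this hash, the dry compile can
    be skipped, since the same inputs would produce the same binary."""
    key = "|".join([compiler_version, " ".join(isa_build_flags), torch_version])
    return hashlib.sha256(key.encode()).hexdigest()[:16]
```

Because the hash is deterministic, a second process with identical compiler, flags, and PyTorch version computes the same path and finds the cached binary.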

Local test:

First time: we need to compile all the C++ modules, which costs 16.5s.
(screenshot)

Second time: we skip the dry compile because the ISA fingerprint matches; it costs only 0.36s.
(screenshot)

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

@xuhancn xuhancn requested a review from EikanWang April 22, 2024 09:14

pytorch-bot bot commented Apr 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124602

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit fe29f70 with merge base 87079f5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@xuhancn xuhancn marked this pull request as ready for review April 22, 2024 09:24
@xuhancn xuhancn requested a review from jgong5 April 22, 2024 09:24
@xuhancn
Collaborator Author

xuhancn commented Apr 22, 2024

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased xu_optimize_isa_dry_compile onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xu_optimize_isa_dry_compile && git pull --rebase)

@xuhancn xuhancn added the intel label Apr 22, 2024
@colesbury colesbury added the triaged label Apr 22, 2024
torch/_inductor/codecache.py (review thread: outdated, resolved)
@xuhancn xuhancn requested a review from ezyang April 24, 2024 07:13
@xuhancn xuhancn changed the title optimize isa dry compile time. [inductor] optimize isa dry compile time. Apr 24, 2024
@ezyang
Contributor

ezyang commented Apr 24, 2024

I guess we can land this as is, but you're still doing multiple(!) subprocess calls, and I would hope that there was some way to architect this where those were not needed either.

@xuhancn
Collaborator Author

xuhancn commented Apr 24, 2024

> I guess we can land this as is, but you're still doing multiple(!) subprocess calls, and I would hope that there was some way to architect this where those were not needed either.

There are already some lru_cache decorators caching the results. What further optimization do you have in mind?

@xuhancn
Collaborator Author

xuhancn commented Apr 24, 2024

@ezyang In 3e09156 I added a cache to pick_vec_isa, which reduces the number of calls into the underlying detection functions.

@xuhancn xuhancn requested a review from malfet April 24, 2024 16:54
@xuhancn
Collaborator Author

xuhancn commented Apr 24, 2024

@malfet Any comments on the next step?

@xuhancn xuhancn added the intel priority label Apr 24, 2024
@ezyang
Contributor

ezyang commented Apr 24, 2024

lru_cache doesn't help, you need a cache that persists across processes.

@xuhancn
Collaborator Author

xuhancn commented Apr 24, 2024

> lru_cache doesn't help, you need a cache that persists across processes.

I guess you mean that multiple processes call pick_vec_isa to get the ISA for the C++ build. What I could do is write the ISA info to a temp file. But there is still a problem: we always need the compiler version information to verify that the compiler's capabilities cover the ISA level, and getting the compiler version still requires a subprocess call. So I think we may not be able to eliminate the subprocess calls entirely.

@xuhancn
Collaborator Author

xuhancn commented Apr 25, 2024

@ezyang we can land this PR first.
If there are other optimization ideas, let's create a new PR for them.

@ezyang
Contributor

ezyang commented Apr 25, 2024

ci fails look real

@xuhancn
Collaborator Author

xuhancn commented Apr 25, 2024

> ci fails look real

Let me debug it.

@xuhancn
Collaborator Author

xuhancn commented Apr 25, 2024

> ci fails look real

@ezyang the CI failed because the lru_cache on pick_vec_isa cached a value that did not match the test case's expected result. After reverting that code, it works well.

@xuhancn
Collaborator Author

xuhancn commented Apr 25, 2024

@pytorchmergebot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Apr 25, 2024
@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@xuhancn
Collaborator Author

xuhancn commented Apr 25, 2024

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing label Apr 25, 2024
@xuhancn
Collaborator Author

xuhancn commented Apr 25, 2024

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@xuhancn xuhancn deleted the xu_optimize_isa_dry_compile branch April 26, 2024 01:23
carmocca pushed a commit to carmocca/pytorch that referenced this pull request Apr 29, 2024
Pull Request resolved: pytorch#124602
Approved by: https://github.com/jgong5, https://github.com/ezyang
andoorve pushed a commit to andoorve/pytorch that referenced this pull request May 1, 2024
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
Labels
ciflow/inductor, ciflow/trunk, intel priority, intel, Merged, module: inductor, open source, topic: not user facing, triaged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VecISA.__bool__ is very expensive (nearly a second) on startup
6 participants