[inductor] Slightly faster memory allocation on CPU #118171

jansel · 2024-01-24T04:59:46Z

Stack from ghstack (oldest at bottom):

Based on `python benchmarks/dynamo/microbenchmarks/overheads.py`:

- Before: `12.2us`
- After: `10.5us`
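The thread does not show how `overheads.py` takes its measurement. As a generic stand-in (not that script's method), per-call overhead in microseconds can be measured with the standard `timeit` module. The same snippet shows that going from 12.2us to 10.5us removes about 14% of per-call time. `per_call_us` and `relative_saving` are illustrative helper names.

```python
import timeit


def per_call_us(fn, number=100_000, repeat=5):
    """Best-of-`repeat` average time of one fn() call, in microseconds."""
    # timeit.repeat returns total seconds for `number` calls, once per run;
    # the minimum is the run least disturbed by other processes.
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best / number * 1e6


def relative_saving(before_us, after_us):
    """Fraction of per-call time removed by a change."""
    return (before_us - after_us) / before_us


if __name__ == "__main__":
    print(f"no-op call: {per_call_us(lambda: None):.3f} us")
    # Numbers reported in this PR: 12.2us before, 10.5us after.
    print(f"saving: {relative_saving(12.2, 10.5):.1%}")
```

Taking the minimum over repeats, rather than the mean, is the usual way to suppress scheduler noise in micro-benchmarks of this size.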

This is inspired by a2c17a2 -- but in Python rather than C++
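Here is a rough picture of the Python side. Inductor's wrapper codegen (`torch/_inductor/codegen/wrapper.py` is among the touched files) emits one allocation line per buffer, and a CPU buffer can be routed to a cheaper entry point such as the `_empty_strided_cpu` function added in `torch/csrc/dynamo/guards.cpp`. This is a hypothetical sketch: `allocation_line` and the emitted helper name `empty_strided_cpu` are illustrative, not the exact code this PR generates.

```python
def allocation_line(name, size, stride, device, dtype):
    """Return one line of generated Python that allocates buffer `name`.

    Hypothetical: CPU buffers call an assumed low-overhead helper named
    `empty_strided_cpu`; other devices keep the general torch.empty_strided.
    """
    if device == "cpu":
        # Positional arguments only, which avoids keyword-argument parsing
        # on every call of the generated code.
        return f"{name} = empty_strided_cpu({size!r}, {stride!r}, {dtype})"
    return (
        f"{name} = empty_strided({size!r}, {stride!r}, "
        f"device='{device}', dtype={dtype})"
    )


print(allocation_line("buf0", (4, 4), (4, 1), "cpu", "torch.float32"))
print(allocation_line("buf1", (8,), (1,), "cuda", "torch.float16"))
```

Keeping the device check in codegen means the choice is made once at compile time, so the generated code pays no per-call branching cost for it.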

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler

[ghstack-poisoned]

pytorch-bot · 2024-01-24T04:59:49Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118171


✅ No Failures

As of commit 8f31079 with merge base d59c2d6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


jgong5

Do we have a test that covers allocating tensors with more than 8 dimensions (>8D)?
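The thread does not say why 8 dimensions is the threshold. One plausible reason, which is an assumption not shown in the excerpt, is that the fast path keeps sizes and strides in a small-buffer container with inline room for 8 entries, so rank 9 and up would take a different code path. A test for that would sweep ranks on both sides of the boundary. The stdlib-only sketch below shows the shape of such a sweep, with a hypothetical `contiguous_strides` standing in for the real allocator and checked against a direct reference.

```python
from math import prod


def contiguous_strides(sizes):
    """Row-major (C-contiguous) strides, in elements, for `sizes`."""
    strides = [1] * len(sizes)
    for i in range(len(sizes) - 2, -1, -1):
        # Each stride is the product of all trailing sizes; zero-length
        # dims count as 1 so the strides stay well-defined.
        strides[i] = strides[i + 1] * max(sizes[i + 1], 1)
    return tuple(strides)


def check_rank(ndim):
    """Check contiguous strides for one rank against a direct reference."""
    sizes = tuple(i % 3 + 1 for i in range(ndim))  # (1, 2, 3, 1, 2, ...)
    strides = contiguous_strides(sizes)
    assert len(strides) == ndim
    for i in range(ndim):
        # Reference: stride of dim i equals the product of the sizes after it.
        assert strides[i] == prod(sizes[i + 1:]), (ndim, i)


# Sweep ranks on both sides of the 8-dim boundary (0-d through 10-d).
for ndim in range(11):
    check_rank(ndim)
print("ranks 0-10 ok")
```

In a real test the reference would be the existing `torch.empty_strided` path, with the fast helper's output compared against it at each rank.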

jgong5 · 2024-01-24T09:59:10Z

torch/csrc/dynamo/guards.cpp

+
+static PyObject* _empty_strided_cpu(PyObject* dummy, PyObject* args) {
+  // at::empty_strided is surprisingly slow.  This is a lower-overhead
+  // version that saves ~2us on every allocation.  Though it is
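Every `empty_strided` implementation, fast or not, has to work out how much backing storage a size/stride pair needs. The sketch below assumes non-negative strides and a zero storage offset, and uses the textbook formula. The C++ fast path is assumed to compute the equivalent, but that part of the diff is not shown here, and `storage_nbytes` is an illustrative name.

```python
def storage_nbytes(sizes, strides, itemsize):
    """Bytes of backing storage needed by an empty_strided-style allocation.

    Assumes non-negative strides and a storage offset of 0.
    """
    if len(sizes) != len(strides):
        raise ValueError("sizes and strides must have the same length")
    # Any zero-length dimension means the tensor has no elements at all.
    if any(n == 0 for n in sizes):
        return 0
    # The highest reachable element offset is sum((size - 1) * stride);
    # add 1 to count the element at offset 0.
    numel_needed = 1 + sum((n - 1) * st for n, st in zip(sizes, strides))
    return numel_needed * itemsize


print(storage_nbytes((4, 4), (4, 1), 4))  # contiguous float32: 64
print(storage_nbytes((4, 4), (8, 1), 4))  # rows padded to 8 elements: 112
print(storage_nbytes((4, 0), (1, 4), 4))  # empty tensor: 0
```

A stride of 0 (an expanded dimension) needs no extra storage under this formula, which is why it is computed from strides rather than from the product of sizes.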


I guess the extra overhead comes from ATen dispatching, which should apply to GPU too; GPU just hides it better, because host-side work can overlap with device compute.
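The overlap point can be made concrete with a toy timing model. It is an illustration only, assuming fixed per-op costs, a single in-order device queue, and no synchronization. On CPU, host-side overhead such as dispatch adds directly to every op. On GPU, the host can run ahead launching work while the device is busy, so the overhead stays mostly hidden unless it exceeds the per-op device time.

```python
def serial_time(n_ops, host_us, device_us):
    """CPU-style execution: each op's host overhead and compute run back to back."""
    return n_ops * (host_us + device_us)


def overlapped_time(n_ops, host_us, device_us):
    """GPU-style execution: the host enqueues ops asynchronously, and the
    device starts each op once it is enqueued and the previous op is done."""
    host_clock = 0.0
    device_free_at = 0.0
    for _ in range(n_ops):
        host_clock += host_us                    # host finishes launching this op
        start = max(host_clock, device_free_at)  # wait for launch and prior op
        device_free_at = start + device_us
    return device_free_at


# 2us of overhead per op, 10us of compute per op, 100 ops.
print(serial_time(100, 2, 10))      # 1200
print(overlapped_time(100, 2, 10))  # 1002.0
```

With 10us kernels, only the first 2us launch is exposed in the overlapped case. If the overhead grew past the kernel time, the host would become the bottleneck and the saving from a cheaper allocator would show up on GPU as well.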


jansel · 2024-01-25T05:42:07Z

@pytorchbot merge

pytorchmergebot · 2024-01-25T05:44:42Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


pytorchmergebot · 2024-01-25T05:53:04Z

Merge failed

Reason: 1 job has failed: trunk / macos-12-py3-arm64 / build


jansel · 2024-01-25T16:52:29Z

@pytorchbot merge

pytorchmergebot · 2024-01-25T16:54:32Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


[inductor] Slightly faster memory allocation on CPU

bd78601

[ghstack-poisoned]

This was referenced Jan 24, 2024

[dynamo] Optimize BACKEND_MATCH guard #118065

Closed

[dynamo] Optimize overheads from _TorchDynamoContext #118070

Closed

jansel added a commit that referenced this pull request Jan 24, 2024

[inductor] Slightly faster memory allocation on CPU

08b7133

ghstack-source-id: 22c9c87d75ed07ed3288ffa9d908d281afb7e10c Pull Request resolved: #118171

github-actions bot added module: inductor module: dynamo ciflow/inductor labels Jan 24, 2024

jgong5 approved these changes Jan 24, 2024


peterbell10 reviewed Jan 24, 2024


torch/csrc/dynamo/guards.cpp (outdated, resolved)

torch/_inductor/codegen/wrapper.py (outdated, resolved)

torch/csrc/dynamo/guards.cpp (outdated, resolved)

Update on "[inductor] Slightly faster memory allocation on CPU"

2a1c138

Based on `python benchmarks/dynamo/microbenchmarks/overheads.py`: - Before `12.2us` - After `10.7us` This is inspired by a2c17a2 -- but in Python rather than C++ [ghstack-poisoned]

jansel added a commit that referenced this pull request Jan 25, 2024

[inductor] Slightly faster memory allocation on CPU

9abdd1a

ghstack-source-id: 6b6f39e6a4d82437dce4d04b7a22e4ac6395fbeb Pull Request resolved: #118171

jansel mentioned this pull request Jan 25, 2024

[inductor] Slightly faster memory allocation on CUDA #118255

Closed

peterbell10 approved these changes Jan 25, 2024


torch/csrc/dynamo/guards.cpp (outdated, resolved)

Update on "[inductor] Slightly faster memory allocation on CPU"

2ee2bb4

Based on `python benchmarks/dynamo/microbenchmarks/overheads.py`: - Before `12.2us` - After `10.5us` This is inspired by a2c17a2 -- but in Python rather than C++ [ghstack-poisoned]

pytorch-bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Jan 25, 2024

jansel added the "topic: not user facing" label (topic category) Jan 25, 2024

pytorchmergebot added the merging label Jan 25, 2024

pytorchmergebot removed the merging label Jan 25, 2024

pytorchmergebot added the merging label Jan 25, 2024

pytorchmergebot closed this in 817debe Jan 25, 2024

pytorchmergebot added Merged and removed merging labels Jan 25, 2024

pytorchmergebot pushed a commit that referenced this pull request Jan 25, 2024

[inductor] Slightly faster memory allocation on CUDA (#118255)

2de24c1

Pull Request resolved: #118255 Approved by: https://github.com/peterbell10 ghstack dependencies: #118065, #118070, #118171

facebook-github-bot deleted the gh/jansel/224/head branch January 29, 2024 15:23
