[compiled autograd] move inputs to cuda with non_blocking=True #129181
Conversation
[ghstack-poisoned]
looks good, one question
```diff
  in_compiled_autograd_region = True
  for i in runtime_inputs_to_move:
-     inputs[i] = inputs[i].cuda()
+     inputs[i] = inputs[i].pin_memory().cuda(non_blocking=True)
```
Should we put a numel() limit on this? The general guidance for pin memory is to avoid over-allocating.
i guess it doesn't matter rn since every input in runtime_inputs_to_move would have numel=1
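For reference, a size guard along the lines of the question above might look like the sketch below; `PIN_NUMEL_LIMIT` and `move_input` are illustrative names only, not code in this PR.

```python
import torch

# Hypothetical threshold: only pin small host tensors so we don't over-allocate
# page-locked memory (the concern raised above).
PIN_NUMEL_LIMIT = 128

def move_input(t: torch.Tensor) -> torch.Tensor:
    # Small tensors: pin first so the host-to-device copy can be asynchronous.
    if t.numel() <= PIN_NUMEL_LIMIT:
        return t.pin_memory().cuda(non_blocking=True)
    # Larger tensors: fall back to a plain blocking copy.
    return t.cuda()
```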
Stack from ghstack (oldest at bottom):
non_blocking=True requires the source tensor to be pinned first, which shouldn't be a problem given that the inputs being moved are CPU scalars.
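As a minimal sketch (not code from this PR) of why the pinning matters: `non_blocking=True` only lets the host-to-device copy overlap with other work when the source CPU tensor lives in pinned (page-locked) memory; from pageable memory it effectively behaves like a blocking copy.

```python
import torch

if torch.cuda.is_available():
    # A CPU scalar, like the runtime inputs compiled autograd moves to CUDA.
    x = torch.tensor(3.0)

    # From pageable memory, non_blocking=True gives little benefit.
    y_pageable = x.cuda(non_blocking=True)

    # Pin first; the copy can then be issued asynchronously and overlap host work.
    y_pinned = x.pin_memory().cuda(non_blocking=True)

    # Synchronize before comparing results on the host.
    torch.cuda.synchronize()
    assert torch.equal(y_pageable.cpu(), y_pinned.cpu())
```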
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang