[Inductor cutlass backend] Robust Precompilation / Autotuning / Retuning in subprocesses #115654

kadeng · 2023-12-12T18:34:37Z

Stack from ghstack (oldest at bottom):

Makes autotuning in subprocesses more robust, specifically against benchmarked functions that run for a long time or crash,
which could otherwise corrupt the CUDA context of the entire process.

This diff ensures that precompilation works well with autotuning in subprocesses, and gives
autotuning subprocesses robust timeouts after which they are killed.
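A minimal, hypothetical sketch of the isolation idea described above follows. It is not the code from this PR. Each candidate is benchmarked in a fresh Python interpreter via `subprocess.run`, which kills the child when `timeout` expires, and any timeout or non-zero exit is scored as `math.inf` so that candidate is never selected. The names `benchmark_in_subprocess` and `pick_fastest`, and the use of Python source strings as stand-ins for compiled kernels, are illustrative assumptions.

```python
import math
import subprocess
import sys
from typing import Dict

# Hypothetical child harness: compiles the candidate statement, times one
# execution, and prints the elapsed milliseconds as the last stdout line.
_CHILD_TEMPLATE = (
    "import time\n"
    "code = compile({stmt!r}, '<candidate>', 'exec')\n"
    "start = time.perf_counter()\n"
    "exec(code, {{}})\n"
    "print((time.perf_counter() - start) * 1000.0)\n"
)


def benchmark_in_subprocess(stmt: str, timeout_s: float) -> float:
    """Time `stmt` in a fresh interpreter and return milliseconds.

    Returns math.inf if the child exceeds `timeout_s` (it is killed) or
    exits with a non-zero status; a misbehaving candidate cannot take
    down the parent because it runs in a separate process.
    """
    script = _CHILD_TEMPLATE.format(stmt=stmt)
    try:
        result = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # on expiry, subprocess.run kills the child
        )
    except subprocess.TimeoutExpired:
        return math.inf
    if result.returncode != 0:
        # Uncaught exception, os._exit(...), or death by signal.
        return math.inf
    lines = result.stdout.strip().splitlines()
    try:
        return float(lines[-1])
    except (IndexError, ValueError):
        return math.inf


def pick_fastest(candidates: Dict[str, str], timeout_s: float = 5.0) -> str:
    """Benchmark every candidate in isolation and return the fastest name."""
    timings = {
        name: benchmark_in_subprocess(stmt, timeout_s)
        for name, stmt in candidates.items()
    }
    return min(timings, key=timings.__getitem__)


if __name__ == "__main__":
    candidates = {
        "ok": "sum(range(1000))",
        "hangs": "while True: pass",  # killed after the timeout
        "crashes": "import os; os._exit(1)",  # hard exit, no cleanup
    }
    print(pick_fastest(candidates, timeout_s=2.0))  # expected to print "ok"
```

Starting one interpreter per candidate is slow, and a real tuning setup would more likely reuse long-lived worker processes, but the design choice shown is the essential one: a hang or hard crash (including one that would corrupt a CUDA context) stays confined to the child, and the parent only ever sees a timing or `math.inf`.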


pytorch-bot · 2023-12-12T18:34:40Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/115654

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 20 New Failures, 12 Unrelated Failures

As of commit f7c2cf2 with merge base afe6d27:

NEW FAILURES - The following jobs have failed:

Lint / lintrunner / linux-job (gh)
>>> Lint for torch/_inductor/scheduler.py:
pull / linux-docs / build-docs-python-false (gh)
Process completed with exit code 2.
pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 1, 5, linux.4xlarge.nvidia.gpu) (gh)
test_public_bindings.py::TestPublicBindings::test_modules_can_be_imported
pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 1, 5, linux.g5.4xlarge.nvidia.gpu) (gh)
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_bmm_multiple_dynamic_abi_compatible_cpu
pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 2, 5, linux.g5.4xlarge.nvidia.gpu) (gh)
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_addmm_multiple_dynamic_abi_compatible_cpu
pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 5, 5, linux.g5.4xlarge.nvidia.gpu) (gh)
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_addmm_multiple_dynamic_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
pull / linux-focal-py3.11-clang10 / test (crossref, 1, 2, linux.2xlarge) (gh)
test_public_bindings.py::TestPublicBindings::test_modules_can_be_imported
pull / linux-focal-py3.11-clang10 / test (default, 1, 3, linux.2xlarge) (gh)
test_public_bindings.py::TestPublicBindings::test_modules_can_be_imported
pull / linux-focal-py3.8-clang10 / test (crossref, 1, 2, linux.2xlarge) (gh)
RuntimeError: inductor/test_debug_trace 1/1 failed
pull / linux-focal-py3.8-clang10 / test (crossref, 2, 2, linux.2xlarge) (gh)
RuntimeError: inductor/test_torchinductor 1/1 failed
pull / linux-focal-py3.8-clang10 / test (default, 1, 3, linux.2xlarge) (gh)
RuntimeError: inductor/test_debug_trace 1/1 failed
pull / linux-focal-py3.8-clang10 / test (default, 2, 3, linux.2xlarge) (gh)
RuntimeError: inductor/test_max_autotune 1/1 failed
pull / linux-focal-py3.8-clang10 / test (default, 3, 3, linux.2xlarge) (gh)
RuntimeError: inductor/test_torchinductor_dynamic_shapes 2/2 failed
pull / linux-focal-py3.8-clang10 / test (dynamo, 1, 2, linux.2xlarge) (gh)
Process completed with exit code 1.
pull / linux-jammy-py3.10-clang15-asan / test (default, 1, 6, linux.4xlarge) (gh)
test_public_bindings.py::TestPublicBindings::test_modules_can_be_imported
pull / linux-jammy-py3.8-gcc11 / test (default, 1, 3, linux.2xlarge) (gh)
RuntimeError: inductor/test_debug_trace 1/1 failed
pull / linux-jammy-py3.8-gcc11 / test (default, 2, 3, linux.2xlarge) (gh)
RuntimeError: inductor/test_max_autotune 1/1 failed
pull / linux-jammy-py3.8-gcc11 / test (default, 3, 3, linux.2xlarge) (gh)
RuntimeError: inductor/test_torchinductor_dynamic_shapes 2/2 failed
pull / linux-jammy-py3.8-gcc11 / test (distributed, 1, 2, linux.2xlarge) (gh)
RuntimeError: distributed/test_inductor_collectives 1/1 failed
pull / linux-jammy-py3.8-gcc11 / test (distributed, 2, 2, linux.2xlarge) (gh)
RuntimeError: distributed/_spmd/test_transformation 1/1 failed

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (cpu_inductor_huggingface, 1, 1, linux.12xlarge) (gh)
YituTechConvBert
inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (cpu_inductor_timm, 1, 2, linux.12xlarge) (gh)
mixer_b16_224
inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (cpu_inductor_timm, 2, 2, linux.12xlarge) (gh)
xcit_large_24_p8_224
inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (cpu_inductor_torchbench, 1, 2, linux.12xlarge) (gh)
hf_T5_base
inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (cpu_inductor_torchbench, 2, 2, linux.12xlarge) (gh)
yolov3
inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (dynamic_cpu_inductor_huggingface, 1, 1, linux.12xlarge) (gh)
YituTechConvBert
inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (dynamic_cpu_inductor_timm, 1, 2, linux.12xlarge) (gh)
mixer_b16_224
inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (dynamic_cpu_inductor_timm, 2, 2, linux.12xlarge) (gh)
xcit_large_24_p8_224
inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 1, 2, linux.12xlarge) (gh)
hf_T5_base
inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 2, 2, linux.12xlarge) (gh)
yolov3
pull / linux-focal-py3_8-clang9-xla / test (xla, 1, 1, linux.12xlarge) (gh)
ModuleNotFoundError: No module named 'torch.version'
pull / linux-focal-py3.8-clang10 / test (dynamo, 2, 2, linux.2xlarge) (gh)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.


kadeng · 2023-12-15T10:15:22Z

Moved to a (draft) feature branch, see #115919

kadeng mentioned this pull request Dec 12, 2023

[Inductor] Fix debug_str method of FusedSchedulerNode #113365 (closed)

github-actions bot added module: inductor ciflow/inductor labels Dec 12, 2023

kadeng mentioned this pull request Dec 14, 2023

[Inductor cutlass backend] Fixed workspace resize issue #115877 (closed)

kadeng closed this Dec 15, 2023

facebook-github-bot deleted the gh/kadeng/40/head branch January 14, 2024 15:23
