Upsample bilinear2d backward for AMD #165802
Conversation
(cherry picked from commit 3d102a0)
(cherry picked from commit cb98724)
…_rcpf(x) instead of 1.f/x (#1800)

Cherry-pick of #1688

Co-authored-by: Michael Halkenhäuser <michaelhalk@web.de>
Co-authored-by: Hashem Hashemi <hashem.hashemi@amd.com>
(cherry picked from commit f8544af)
(cherry picked from commit ed48754)
(cherry picked from commit d62a39e)
(cherry picked from commit b26ddb8)
Related to c7a1e32. Fixes https://ontrack-internal.amd.com/browse/SWDEV-537835.

Not a Navi-specific failure:

```
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1412, in only_fn
    return fn(slf, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/jenkins/pytorch/test/test_binary_ufuncs.py", line 1671, in test_cuda_tensor_pow_scalar_tensor
    self._test_pow(base, exp)
  File "/var/lib/jenkins/pytorch/test/test_binary_ufuncs.py", line 1482, in _test_pow
    self.assertEqual(actual, expected)
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 4052, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: The values for attribute 'dtype' do not match: torch.float32 != torch.float64.
```

Using `.to(actual)` without specifying dtype/device assumes `actual` is a tensor or tensor-like, which may fail silently or promote. Fixed by explicitly matching dtype and device. Follows pytorch#107302.

Fix verified:

```
root@ubb4-rack-22:/var/lib/jenkins/pytorch# TEST_CONFIG=default HIP_VISIBLE_DEVICES=0 PYTORCH_TEST_WITH_ROCM=1 python test/test_binary_ufuncs.py TestBinaryUfuncsCUDA.test_cuda_tensor_pow_scalar_tensor_cuda
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
Running tests...
----------------------------------------------------------------------
.
----------------------------------------------------------------------
Ran 1 test in 0.141s

OK

Generating XML reports...
root@ubb4-rack-22:/var/lib/jenkins/pytorch# pip list | grep numpy
numpy                     2.1.2
```

(cherry picked from commit a4d60fa)
(cherry picked from commit 9f11871)
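For illustration, a minimal sketch of the explicit conversion; the tensors here are hypothetical stand-ins for the test's `actual`/`expected` values, not the actual diff:

```python
import torch

# Hypothetical stand-ins: the real test compares a pow() result against a
# reference computed in a different dtype.
actual = torch.tensor([1.0, 2.0], dtype=torch.float32)
expected = torch.tensor([1.0, 2.0], dtype=torch.float64)

# Fragile: .to(actual) relies on `actual` being a tensor; if it is a scalar
# or array-like instead, the conversion can promote or fail silently.
# expected = expected.to(actual)

# Explicit: match dtype and device so the comparison sees identical metadata.
expected = expected.to(dtype=actual.dtype, device=actual.device)
assert expected.dtype == actual.dtype
```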
This PR fixes the unit test `test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction`, which failed as follows:

```
test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction FAILED [0.1163s]

Traceback (most recent call last):
  File "/var/lib/jenkins/pytorch/test/test_cuda.py", line 471, in test_set_per_process_memory_fraction
    tmp_tensor = torch.empty(application, dtype=torch.int8, device="cuda")
RuntimeError: Trying to create tensor with negative dimension -5681285432: [-5681285432]
```
This error occurs only on the gfx1101 arch. It comes from an integer overflow: when another unit test, `test/test_cuda.py::TestCuda::test_randint_generation_for_large_numel`, creates a tensor with a huge numel, it drives torch.cuda.max_memory_reserved() higher, and that larger value overflows the size computation when test_set_per_process_memory_fraction runs afterward. To avoid this, we call torch.cuda.empty_cache() and torch.cuda.reset_peak_memory_stats() to clean up the CUDA state between tests.

JIRA: https://ontrack-internal.amd.com/browse/SWDEV-535295
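A minimal sketch of that cleanup, assuming a plain helper called at the start of the affected test (the exact placement in test_cuda.py may differ):

```python
import torch

def reset_cuda_memory_state() -> None:
    # Hypothetical helper: release cached allocator blocks and reset the
    # peak-memory counters so a prior test's high-water mark cannot leak
    # into this test's memory-fraction arithmetic.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

# Hypothetical usage at the start of the affected test:
# def test_set_per_process_memory_fraction(self):
#     reset_cuda_memory_state()
#     ...
```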
(cherry picked from commit f86d184)
(cherry picked from commit 1b44228)
Adds initial autotuning for foreach support, required for https://ontrack-internal.amd.com/browse/SWDEV-539076. Gives a 4x improvement for some kernels.

Before:

```
triton_for_fused_18.kd 🔍 | 4.986 ms | 4.986 ms | 2.493 ms | 2 |
triton_for_fused_6.kd  🔍 | 0.098 ms | 0.098 ms | 0.049 ms | 2 |
triton_for_fused_7.kd  🔍 | 0.036 ms | 0.036 ms | 0.018 ms | 2 |
```

After:

```
triton_for_fused_18.kd 🔍 | 1.273 ms | 1.273 ms | 0.636 ms | 2 |
triton_for_fused_6.kd  🔍 | 0.044 ms | 0.044 ms | 0.022 ms | 2 |
triton_for_fused_7.kd  🔍 | 0.024 ms | 0.024 ms | 0.012 ms | 2 |
```

(cherry picked from commit f07b7f7)
(cherry picked from commit ed0d0a7)
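For context, a hand-written sketch of the underlying mechanism: Triton autotuning over block sizes and warp counts. The kernel and config values here are hypothetical; the PR's actual tuning happens in Inductor's foreach codegen:

```python
import torch
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({"BLOCK_SIZE": bs}, num_warps=nw)
        for bs in (128, 256, 512, 1024)
        for nw in (2, 4, 8)
    ],
    key=["n_elements"],  # re-tune when the problem size changes
)
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    # The grid is derived from whichever BLOCK_SIZE the autotuner picks.
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n)
    return out
```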
Relands #2416 with caching fix. Upstream equivalent: pytorch#159146

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
(cherry picked from commit f0aebdc)
(cherry picked from commit 9c429dd)
Perf improvement for triton tanh
(cherry picked from commit 4febbd8)
Fixes SWDEV-543698 (https://ontrack-internal.amd.com/browse/SWDEV-543698). Cherry-picked from #2502.

This PR fixes errors like the one below:

```
[rank3]: RuntimeError: The following operation failed in the TorchScript interpreter.
[rank3]: Traceback of TorchScript (most recent call last):
[rank3]: RuntimeError: /tmp/comgr-28f951/input/CompileSourceACC062:67:7: error: unknown type name 'uint32_t'; did you mean '__hip_internal::uint32_t'?
[rank3]:    67 |       uint32_t int32;
[rank3]:       |       ^~~~~~~~
[rank3]:       |       __hip_internal::uint32_t
```

Earlier, uint32_t was defined in the std namespace by the HIP headers. As of ROCm 7.0, it has moved to the __hip_internal namespace.

(cherry picked from commit b2fb688)
The original PR (#2417) had incorrect indentation. Updated the PR so that autotune always adds the tiny configs; otherwise only the hinted configs are used.

Tested locally on test_torchinductor:

```
Ran 894 tests in 952.242s
FAILED (failures=1, skipped=28)
```

And completed autotune runs for the microbench models:

```
Microbenchmark for network : resnet152
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 0.09107530117034912
Throughput [img/sec] : 702.7152167226226
```

(cherry picked from commit db3ba66)
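A sketch of the selection policy described above, using hypothetical names (not Inductor's actual API):

```python
from typing import Iterable, List

def select_autotune_configs(
    hinted_configs: Iterable,
    tiny_configs: Iterable,
    autotune_enabled: bool,
) -> List:
    # Hypothetical policy mirroring the description above: when autotuning,
    # always sweep the tiny configs in addition to the hinted ones;
    # otherwise compile with the hinted configs only.
    if autotune_enabled:
        return [*hinted_configs, *tiny_configs]
    return list(hinted_configs)
```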
Cherry-pick of pytorch@2aadcea
(cherry picked from commit bd74018)
cherry-pick of pytorch#163869 (cherry picked from commit dfd386f)
[AUTOGENERATED] release/2.9_IFU_2025-10-14
Cherry-pick of #2693 Co-authored-by: Gheorghe-Teodor Bercea <gt.bercea@gmail.com>
Cherry-pick of #2710 Co-authored-by: Jerry Mannil <65309407+jerrymannil@users.noreply.github.com>
…2722) These changes from upstream result in a breakage when loading an external library:

```
61170: calling init: /opt/venv/lib/python3.12/site-packages/torchvision/_C.so
61170: terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Fatal Python error: Aborted

Current thread 0x00007f229fb36080 (most recent call first):
  File "/usr/lib/python3.12/ctypes/__init__.py", line 379 in __init__
  File "/pytorch/torch/_ops.py", line 1488 in load_library
  File "/opt/venv/lib/python3.12/site-packages/torchvision/extension.py", line 34 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 995 in exec_module
  File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "/opt/venv/lib/python3.12/site-packages/torchvision/__init__.py", line 9 in <module>
```

This was already reverted in rocm/7.1_internal_testing; we need to investigate whether upstream needs a fix.
These changes are currently in the process of being upstreamed. Bringing them into release 2.9 for customer model perf improvement.

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
Co-authored-by: Sampsa Riikonen <sriikone@amd.com>
Co-authored-by: Nichols A. Romero <165712832+naromero77amd@users.noreply.github.com>
Co-authored-by: AmdSampsa <sampsa.riikonen@amd.com>
Latest updates from triton
Ref.: #164572
cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben