[ROCm] topk and sort fixes #12337
Conversation
* Topk part 1: fix intrinsics for a 64-lane wavefront (#224). 64 lanes in a wavefront require an intrinsics change.
* Disable in-place sorting on ROCm (#237). It is known to hang; use the Thrust fallback. Skip one test that fails with the fallback.
* Topk fixes (#239), illustrated in the sketch after this list:
  * The PTX spec (https://docs.nvidia.com/cuda/pdf/ptx_isa_6.3.pdf), Sec. 9.7.1.19 (bfe) and 9.7.1.20 (bfi), requires pos and len to be limited to 0...255.
  * The same spec, Sec. 9.7.1.19, requires the extracted bits to be placed in the LSBs.
  * Correct the logic for getLaneMaskLe; the previous logic returned 0x0 instead of 0xffffffffffffffff for lane 63.
  * Round up blockDim.x (ceil-divide by the warp size) to prevent a negative index into smem.
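A minimal sketch (hypothetical names, not the actual PyTorch/THC code) of two of the fixes above: clamping bfe's pos/len operands to 0...255 with the result placed in the LSBs, and the getLaneMaskLe off-by-one, where shifting 1 left by lane + 1 is undefined for lane 63 on a 64-lane wavefront and came out as 0x0 instead of all ones:

```cpp
#include <cstdint>

// Sketch of a software bit-field extract following PTX bfe semantics
// (Sec. 9.7.1.19): pos and len are limited via their low 8 bits, and the
// extracted field is returned in the least-significant bits.
// Hypothetical helper, not the actual THC implementation.
__device__ inline uint32_t bfeSketch(uint32_t val, uint32_t pos, uint32_t len) {
  pos &= 0xff;                        // limit pos to 0...255
  len &= 0xff;                        // limit len to 0...255
  if (len == 0) return 0;
  if (pos >= 32) return 0;            // field starts past the MSB
  val >>= pos;                        // move the field into the LSBs
  if (len >= 32) return val;          // whole remaining word
  return val & ((1u << len) - 1);     // mask off bits above the field
}

// getLaneMaskLe: all-ones up to and including the caller's lane.
// Buggy version: for lane == 63, 1ull << 64 is undefined behavior and
// evaluates to 0 here, so the mask became 0x0.
__device__ inline uint64_t laneMaskLeBuggy(unsigned lane) {
  return (1ull << (lane + 1)) - 1;
}

// Fixed version: shift all-ones right instead; the shift amount stays in
// 0...63, so lane 63 correctly yields 0xffffffffffffffff.
__device__ inline uint64_t laneMaskLeFixed(unsigned lane) {
  return ~0ull >> (63 - lane);
}
```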
@pytorchbot retest this please
Just FYI, when you submit these PRs, please use full URLs for issues; they are cross-linking to the wrong issues now.
```diff
@@ -207,7 +213,7 @@ __device__ void exclusiveBinaryPrefixScan(T* smem, bool in, T* out, T* carry, Bi
   *out -= (T) in;

   // The outgoing carry for all threads is the last warp's sum
-  *carry = smem[(blockDim.x / SCAN_UTILS_WARP_SIZE) - 1];
+  *carry = smem[THCCeilDiv<int>(blockDim.x, SCAN_UTILS_WARP_SIZE) - 1];
```
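The ceil-division matters when blockDim.x is smaller than the wavefront size (64 on ROCm): plain integer division yields 0, so the smem index becomes -1. A sketch of the round-up idiom THCCeilDiv uses, assuming positive operands:

```cpp
// Round-up integer division (sketch of the THCCeilDiv idiom; assumes a, b > 0).
template <typename T>
__host__ __device__ T ceilDivSketch(T a, T b) {
  return (a + b - 1) / b;
}

// Example with a 32-thread block on a 64-lane ROCm wavefront:
//   old: (32 / 64) - 1             == -1  -> negative smem index
//   new: ceilDivSketch(32, 64) - 1 ==  0  -> the single partial-warp sum
```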
@pytorchbot retest this please
ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
* Topk part 1: fix intrinsics for a 64-lane wavefront (#224). 64 lanes in a wavefront require an intrinsics change.
* Disable in-place sorting on ROCm (#237). It is known to hang; use the Thrust fallback. Skip one test that fails with the fallback.
* Topk fixes (#239):
  * The PTX spec (https://docs.nvidia.com/cuda/pdf/ptx_isa_6.3.pdf), Sec. 9.7.1.19 (bfe) and 9.7.1.20 (bfi), requires pos and len to be limited to 0...255.
  * The same spec, Sec. 9.7.1.19, requires the extracted bits to be placed in the LSBs.
  * Correct the logic for getLaneMaskLe; the previous logic returned 0x0 instead of 0xffffffffffffffff for lane 63.
  * Round up blockDim.x to prevent a negative index into smem.

@bddppq @ezyang Note the one additional skipped test resulting from using the Thrust sort fallback for all sizes. We are working on getting bitonic sort to work properly (and always); until then, this test needs to be skipped on ROCm.

Pull Request resolved: pytorch/pytorch#12337
Differential Revision: D10259481
Pulled By: ezyang
fbshipit-source-id: 5c8dc6596d7a3103ba7b4b550cba895f38c8148e
@bddppq @ezyang
Note the one additional skipped test resulting from using the Thrust sort fallback for all sizes. We are working on getting bitonic sort to work properly (and always); until then, this test needs to be skipped on ROCm.