
Fix cuda launch error in reflection_pad2d #56451

Closed
wants to merge 5 commits

Conversation

xwang233
Collaborator

Fix #55222

@facebook-github-bot
Contributor

facebook-github-bot commented Apr 20, 2021

💊 CI failures summary and remediations

As of commit 808dd2e (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test1 (1/1)

Step: "Test"

======================================================================
FAIL [4.760s]: test_cudnn_multiple_threads_same_device (__main__.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 439, in wrapper
    fn(*args, **kwargs)
  File "test_cuda.py", line 2505, in test_cudnn_multiple_threads_same_device
    (2048 - test_iters) * (2048 - test_iters))
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 1371, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Scalars failed to compare as equal! Comparing 1890625.0 and 1098304 gives a difference of 792321.0, but the allowed difference with rtol=1.3e-06 and atol=1e-05 is only 1.4278052!

----------------------------------------------------------------------
Ran 159 tests in 78.603s

FAILED (failures=1, skipped=67)

Generating XML reports...
Generated XML report: test-reports\python-unittest\test_cuda\TEST-TestCuda-20210421073255.xml
Generated XML report: test-reports\python-unittest\test_cuda\TEST-TestCudaComm-20210421073255.xml

This comment was automatically generated by Dr. CI.


@xwang233 xwang233 requested a review from ngimel April 20, 2021 05:40
@xwang233 xwang233 requested a review from ptrblck April 20, 2021 05:44

@ngimel ngimel left a comment


Looks good, small nit about legacy header.

for (int64_t block_z = 0; block_z < size_z; block_z += 65535) {
int64_t block_z_size = std::min(size_z - block_z, static_cast<int64_t>(65535));

dim3 grid_size(THCCeilDiv(output_plane_size, static_cast<int64_t>(256)), block_y_size, block_z_size);

please use cuda::ATenCeilDiv here, don't include legacy header
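The loop in the diff above exists because CUDA caps `gridDim.y` and `gridDim.z` at 65535; a `reflection_pad2d` input large enough to exceed that limit in a single launch is what triggered the "invalid configuration argument" error in #55222. Below is a minimal host-side sketch of just the chunking arithmetic. The `ceil_div` helper is a self-contained stand-in for the `cuda::ATenCeilDiv` the reviewer asks for, and `z_chunk_sizes` is a hypothetical name introduced here for illustration, not a function from the PR:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Stand-in for at::cuda::ATenCeilDiv, reimplemented so this sketch
// compiles on its own.
int64_t ceil_div(int64_t a, int64_t b) {
  return (a + b - 1) / b;
}

// CUDA limits gridDim.y and gridDim.z to 65535, so a launch covering a
// large size_z must be split into successive launches whose z extent is
// at most 65535 blocks, exactly as the loop in the diff does.
std::vector<int64_t> z_chunk_sizes(int64_t size_z) {
  std::vector<int64_t> chunks;
  for (int64_t block_z = 0; block_z < size_z; block_z += 65535) {
    chunks.push_back(std::min(size_z - block_z, static_cast<int64_t>(65535)));
  }
  return chunks;
}
```

Each chunk size would then feed the third component of `dim3 grid_size(...)` for one kernel launch, with `block_z` passed to the kernel so it can offset its indexing into the full tensor.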

@facebook-github-bot
Contributor

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in 3ec6bf5.

krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
Fix pytorch#55222

Pull Request resolved: pytorch#56451

Reviewed By: malfet

Differential Revision: D27912184

Pulled By: ngimel

fbshipit-source-id: 3fc80273c30a68a247289d3fb698f99b92931731

Successfully merging this pull request may close these issues.

RuntimeError: CUDA error: invalid configuration argument when using PyTorch code
4 participants