@Krovatkin Krovatkin commented Sep 27, 2020

This PR flips the switch to enable PE + TE (the profiling executor and the TensorExpr fuser) by default.
Next PR: #45397
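For context, a minimal sketch of how these switches are typically flipped from Python, assuming the internal `torch._C` toggles correspond to the default this PR changes in C++ (exact internals may differ; these are private APIs):

```python
import torch

# Hedged sketch: enable the profiling executor (PE) and the TensorExpr
# fuser (TE) via the internal JIT toggles. This PR changes the defaults
# in C++; these Python hooks flip the same switches at runtime.
torch._C._jit_set_profiling_executor(True)
torch._C._jit_set_profiling_mode(True)
torch._C._jit_set_texpr_fuser_enabled(True)
```

These are private, underscore-prefixed APIs and may change between releases.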

@dr-ci

dr-ci bot commented Sep 27, 2020

💊 CI failures summary and remediations

As of commit 391de07 (more details on the Dr. CI page):


  • 1/1 failures possibly introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.


@ZolotukhinM ZolotukhinM left a comment


🎆

Contributor

@facebook-github-bot facebook-github-bot left a comment


@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Sep 28, 2020
Summary:
Disable a few tests in preparation for enabling PE+TE
Next PR: #45396

Pull Request resolved: #44815

Reviewed By: ZolotukhinM

Differential Revision: D23948445

Pulled By: Krovatkin

fbshipit-source-id: 93e641b7b8a3f13bd3fd3840116076553408f224


@facebook-github-bot
Contributor

@Krovatkin merged this pull request in dddb685.

@t-vi
Collaborator

t-vi commented Sep 29, 2020

This breaks ROCm. In the short term, we could either skip the failing CI tests (leaving things broken for any real users of those functions) or keep the default off on ROCm for now.
@jeffdaily preferences?
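The second option, keeping the default off under ROCm, could be sketched like this (hedged; `torch.version.hip` and the internal fuser toggle are assumed to be the relevant knobs, not necessarily how a real fix would be wired up):

```python
import torch

# Hedged sketch of the "leave the default off on ROCm" option: detect a
# ROCm/HIP build and disable the TensorExpr fuser there until the HIP
# codegen is fixed. torch.version.hip is None on CUDA/CPU builds.
if torch.version.hip is not None:
    torch._C._jit_set_texpr_fuser_enabled(False)
```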

@mruberry
Collaborator

Unlanding. As @t-vi said this does break ROCm. Relevant snippet:

11:18:14 ======================================================================
11:18:14 ERROR: test_triplet_margin_with_distance_loss_cuda (__main__.TestNNDeviceTypeCUDA)
11:18:14 ----------------------------------------------------------------------
11:18:14 Traceback (most recent call last):
11:18:14   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 827, in wrapper
11:18:14     method(*args, **kwargs)
11:18:14   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 273, in instantiated_test
11:18:14     result = test_fn(self, *args)
11:18:14   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 651, in only_fn
11:18:14     return fn(self, device, *args, **kwargs)
11:18:14   File "test_nn.py", line 12810, in test_triplet_margin_with_distance_loss
11:18:14     a, p, n), (anchor, positive, negative)))
11:18:14   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 322, in gradcheck
11:18:14     func_out = func(*tupled_inputs)
11:18:14   File "test_nn.py", line 12810, in <lambda>
11:18:14     a, p, n), (anchor, positive, negative)))
11:18:14   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
11:18:14     result = self.forward(*input, **kwargs)
11:18:14   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 129, in prof_meth_call
11:18:14     return prof_callable(meth_call, *args, **kwargs)
11:18:14   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 123, in prof_callable
11:18:14     return callable(*args, **kwargs)
11:18:14 RuntimeError: /tmp/comgr-b2ca31/input/CompileSource:16:12: error: unknown type name '__global__'
11:18:14 extern "C" __global__
11:18:14            ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:19:13: error: use of undeclared identifier 'blockIdx'
11:18:14   if (512 * blockIdx.x + threadIdx.x<5 ? 1 : 0) {
11:18:14             ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:19:26: error: use of undeclared identifier 'threadIdx'
11:18:14   if (512 * blockIdx.x + threadIdx.x<5 ? 1 : 0) {
11:18:14                          ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:20:33: error: use of undeclared identifier 'blockIdx'
11:18:14     double v = __ldg(t0 + 512 * blockIdx.x + threadIdx.x);
11:18:14                                 ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:20:46: error: use of undeclared identifier 'threadIdx'
11:18:14     double v = __ldg(t0 + 512 * blockIdx.x + threadIdx.x);
11:18:14                                              ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:21:35: error: use of undeclared identifier 'blockIdx'
11:18:14     double v_1 = __ldg(t1 + 512 * blockIdx.x + threadIdx.x);
11:18:14                                   ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:21:48: error: use of undeclared identifier 'threadIdx'
11:18:14     double v_1 = __ldg(t1 + 512 * blockIdx.x + threadIdx.x);
11:18:14                                                ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:22:35: error: use of undeclared identifier 'blockIdx'
11:18:14     double v_2 = __ldg(t2 + 512 * blockIdx.x + threadIdx.x);
11:18:14                                   ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:22:48: error: use of undeclared identifier 'threadIdx'
11:18:14     double v_2 = __ldg(t2 + 512 * blockIdx.x + threadIdx.x);
11:18:14                                                ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:23:35: error: use of undeclared identifier 'blockIdx'
11:18:14     double v_3 = __ldg(t0 + 512 * blockIdx.x + threadIdx.x);
11:18:14                                   ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:23:48: error: use of undeclared identifier 'threadIdx'
11:18:14     double v_3 = __ldg(t0 + 512 * blockIdx.x + threadIdx.x);
11:18:14                                                ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:24:35: error: use of undeclared identifier 'blockIdx'
11:18:14     double v_4 = __ldg(t1 + 512 * blockIdx.x + threadIdx.x);
11:18:14                                   ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:24:48: error: use of undeclared identifier 'threadIdx'
11:18:14     double v_4 = __ldg(t1 + 512 * blockIdx.x + threadIdx.x);
11:18:14                                                ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:25:35: error: use of undeclared identifier 'blockIdx'
11:18:14     double v_5 = __ldg(t2 + 512 * blockIdx.x + threadIdx.x);
11:18:14                                   ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:25:48: error: use of undeclared identifier 'threadIdx'
11:18:14     double v_5 = __ldg(t2 + 512 * blockIdx.x + threadIdx.x);
11:18:14                                                ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:26:27: error: use of undeclared identifier 'blockIdx'
11:18:14     aten_clamp_flat[512 * blockIdx.x + threadIdx.x] = (v - (minimum(v_1,v_2))) + 1<0 ? 0 : (v_3 - (minimum(v_4,v_5))) + 1;
11:18:14                           ^
11:18:14 /tmp/comgr-b2ca31/input/CompileSource:26:40: error: use of undeclared identifier 'threadIdx'
11:18:14     aten_clamp_flat[512 * blockIdx.x + threadIdx.x] = (v - (minimum(v_1,v_2))) + 1<0 ? 0 : (v_3 - (minimum(v_4,v_5))) + 1;
11:18:14                                        ^
11:18:14 17 errors generated when compiling for gfx900.
11:18:14 Error: Failed to compile opencl source (from CL or HIP source to LLVM IR).
11:18:14 
11:18:14 nvrtc compilation failed: 
11:18:14 
11:18:14 #define NAN __int_as_float(0x7fffffff)
11:18:14 #define POS_INFINITY __int_as_float(0x7f800000)
11:18:14 #define NEG_INFINITY __int_as_float(0xff800000)
11:18:14 
11:18:14 template<typename T>
11:18:14 T maximum(T a, T b) {
11:18:14   return isnan(a) ? a : (a > b ? a : b);
11:18:14 }
11:18:14 
11:18:14 template<typename T>
11:18:14 T minimum(T a, T b) {
11:18:14   return isnan(a) ? a : (a < b ? a : b);
11:18:14 }
11:18:14 
11:18:14 extern "C" __global__
11:18:14 void func_1(double* t0, double* t1, double* t2, double* aten_clamp_flat) {
11:18:14 {
11:18:14   if (512 * blockIdx.x + threadIdx.x<5 ? 1 : 0) {
11:18:14     double v = __ldg(t0 + 512 * blockIdx.x + threadIdx.x);
11:18:14     double v_1 = __ldg(t1 + 512 * blockIdx.x + threadIdx.x);
11:18:14     double v_2 = __ldg(t2 + 512 * blockIdx.x + threadIdx.x);
11:18:14     double v_3 = __ldg(t0 + 512 * blockIdx.x + threadIdx.x);
11:18:14     double v_4 = __ldg(t1 + 512 * blockIdx.x + threadIdx.x);
11:18:14     double v_5 = __ldg(t2 + 512 * blockIdx.x + threadIdx.x);
11:18:14     aten_clamp_flat[512 * blockIdx.x + threadIdx.x] = (v - (minimum(v_1,v_2))) + 1<0 ? 0 : (v_3 - (minimum(v_4,v_5))) + 1;
11:18:14   }
11:18:14 }
11:18:14 }
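For reference, the failing test exercises `gradcheck` on `TripletMarginWithDistanceLoss`; a CPU-only analogue (hypothetical shapes, not the exact test from test_nn.py) looks like:

```python
import torch
from torch.autograd import gradcheck

# CPU analogue of the failing CUDA test above: gradcheck requires
# double-precision inputs with requires_grad=True.
torch.manual_seed(0)
anchor = torch.randn(5, 3, dtype=torch.double, requires_grad=True)
positive = torch.randn(5, 3, dtype=torch.double, requires_grad=True)
negative = torch.randn(5, 3, dtype=torch.double, requires_grad=True)
loss = torch.nn.TripletMarginWithDistanceLoss()

# gradcheck returns True on success (raises otherwise).
assert gradcheck(lambda a, p, n: loss(a, p, n),
                 (anchor, positive, negative))
```

On ROCm the CUDA variant of this test triggers the fused-kernel compilation shown in the log, which is where the `__global__`/`blockIdx` errors come from.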

@t-vi
Collaborator

t-vi commented Sep 29, 2020

Just a quick shout that I'm working on fixing the codegen quirks for ROCm. In fact, I already have the failing test compiling and passing, but I'll need a few hours before it is ready for a PR.

@t-vi t-vi mentioned this pull request Sep 29, 2020
facebook-github-bot pushed a commit that referenced this pull request Sep 29, 2020
Summary:
This might be an alternative to reverting #45396 .
The obvious rough edge is that I'm not really seeing the work group limits that TensorExpr produces.

Pull Request resolved: #45506

Reviewed By: zhangguanheng66

Differential Revision: D23991410

Pulled By: Krovatkin

fbshipit-source-id: 11d3fc4600e4bffb1d1192c6b8dd2fe22c1e064e
@t-vi
Collaborator

t-vi commented Sep 30, 2020

With ROCm hopefully working with PE + TE, should we have another go at this?

facebook-github-bot pushed a commit that referenced this pull request Oct 1, 2020
Summary:
This PR removes redundant profiling jobs since, after the switch (#45396), PE will now be running by default.

Pull Request resolved: #45397

Reviewed By: zhangguanheng66

Differential Revision: D23966890

Pulled By: Krovatkin

fbshipit-source-id: ef184ca5fcf079580fa139b6653f8d9a6124050e

Labels

Merged, oncall: jit (add this issue/PR to the JIT oncall triage queue)
