Always warn on cpu fallback #318

dvrogozh · 2024-05-24T18:36:21Z

5bf9e0c ("Register operator's implementation lazily") disabled the warning printout on explicit CPU fallback. I believe that users and customers will benefit from these warnings in all cases. Note that "explicit fallback" seems to be an internal Intel classification of supported/unsupported operations that is unlikely to be known to others. Thus, non-Intel users will likely care about CPU fallback in general, regardless of its type. This PR adds the warning back for all CPU fallback cases. We discussed in pytorch/pytorch#126488 that the printout might not be needed in Release builds. I think otherwise and added the printout for all build modes. Let's discuss this in the PR review.
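
The proposed behaviour can be illustrated with a minimal Python sketch. It is not the torch-xpu-ops implementation, and `DEVICE_KERNELS`, `CPU_KERNELS`, and `dispatch` are hypothetical names used only to show a warning being emitted on every CPU fallback, whatever its classification or build mode.

```python
import warnings

# Hypothetical kernel tables: ops implemented natively on the device
# and ops that only have a CPU implementation.
DEVICE_KERNELS = {"add": lambda a, b: a + b}
CPU_KERNELS = {"add": lambda a, b: a + b, "all": lambda xs: all(xs)}


def dispatch(op_name, *args):
    """Run op_name on the device if possible, otherwise fall back to CPU and warn."""
    kernel = DEVICE_KERNELS.get(op_name)
    if kernel is not None:
        return kernel(*args)
    # Warn on every fallback, regardless of build mode or fallback type.
    warnings.warn(
        f"{op_name}: no device kernel registered, falling back to CPU",
        RuntimeWarning,
        stacklevel=2,
    )
    return CPU_KERNELS[op_name](*args)
```

In this sketch, `dispatch("add", 1, 2)` runs silently, while `dispatch("all", [True, False])` still returns a result but emits a `RuntimeWarning` first.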

CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5

EikanWang

LGTM

EikanWang · 2024-05-30T01:39:00Z

@dvrogozh, do you have permission to add a reviewer?

dvrogozh · 2024-05-30T02:47:50Z

No, unfortunately I do not.

dvrogozh · 2024-05-30T14:16:40Z

@EikanWang @fengyuan14. Some of the pull / preci-ut tests fail. Are these known failures? I can hardly attribute them to the change in this PR.

fengyuan14 · 2024-06-03T12:37:12Z

Retrigger the preci. Skipped cases failed due to PyTorch daily update.

Fixes: intel#262 Fixes: 5bf9e0c ("Register operator's implementation lazily") Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

dvrogozh · 2024-06-04T01:13:30Z

@fengyuan14 : tests are still failing. Should this be retriggered again?

fengyuan14 · 2024-06-04T01:44:16Z

These are two new errors that we have not encountered before. Let me try to retrigger.

FAILED test_autograd_xpu.py::TestAutogradDeviceTypeXPU::test_copy_r_to_c_xpu - AssertionError: False is not true : 
FAILED test_autograd_xpu.py::TestAutogradDeviceTypeXPU::test_to_r_to_c_xpu - AssertionError: False is not true :
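
When a log contains many such lines, the failing test identifiers can be pulled out mechanically. The following is a small sketch that assumes the `FAILED file::Class::test - message` layout shown above; `FAILED_RE` and `parse_failed_lines` are hypothetical helper names, not part of pytest.

```python
import re

# Matches summary lines such as
# "FAILED file.py::Class::test_name - AssertionError: ..."
FAILED_RE = re.compile(r"^FAILED\s+(\S+?)::(\S+?)::(\S+)\s+-\s+(.*)$")


def parse_failed_lines(log_text):
    """Return (file, class, test, message) tuples for each FAILED line."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_RE.match(line.strip())
        if match:
            results.append(tuple(part.strip() for part in match.groups()))
    return results
```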

fengyuan14 · 2024-06-04T03:25:28Z

I checked the failures again. They are most likely caused by the change in this PR.

The test cases assert that no warning is emitted, but we now warn in the CPU fallback code path.
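
To make the failure mode concrete, here is a hedged sketch of a "no warnings" check using only the standard `warnings` module. `run_op_with_fallback` and `emits_no_warnings` are hypothetical stand-ins, not the actual PyTorch test utilities.

```python
import warnings


def run_op_with_fallback():
    """Hypothetical op that falls back to CPU and therefore warns."""
    warnings.warn("op fell back to CPU", UserWarning)
    return True


def emits_no_warnings(fn):
    """Return True if calling fn() raises no warnings at all."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        fn()
    return len(caught) == 0
```

A test that asserts `emits_no_warnings(run_op_with_fallback)` fails as soon as the fallback path starts warning, even though the op itself still produces a result.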

dvrogozh · 2024-06-04T04:54:39Z

@fengyuan14: OK, so there are some tests in PyTorch which don't allow warnings. However, the failing tests actually run only a few basic ops. One might argue that these ops should not fall back and should run on the appropriate device, i.e. that these tests were effectively broken and reporting an incorrect status.

There are a few ways to proceed with this PR:

1. Actually fix the tests. This means identifying and implementing the missing xpu ops.
2. Silence the warning conditionally, based on something like a release build or an environment variable that is disabled by default.

I need your opinions on this. From my side, I will check tomorrow which ops are not implemented. So far I have checked only one test:

TestAutogradDeviceTypeXPU.test_copy_r_to_c_xpu - need aten::all.all_out
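
For reference, the second option (gating the warning on an environment variable that is off by default) could look roughly like the sketch below. The variable name `EXAMPLE_WARN_ON_CPU_FALLBACK` and both helper functions are hypothetical and chosen only for illustration. They do not correspond to any existing PyTorch or torch-xpu-ops setting.

```python
import os
import warnings

# Hypothetical variable name, chosen only for this example.
ENV_VAR = "EXAMPLE_WARN_ON_CPU_FALLBACK"


def fallback_warnings_enabled(environ=None):
    """Warnings are off unless the variable is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "0").strip().lower() in {"1", "true", "on", "yes"}


def warn_on_fallback(op_name, environ=None):
    """Emit a fallback warning only when enabled; return whether it was emitted."""
    if not fallback_warnings_enabled(environ):
        return False
    warnings.warn(f"{op_name} fell back to CPU", RuntimeWarning, stacklevel=2)
    return True
```

With this design the existing "no warnings" tests keep passing by default, at the cost of users not seeing fallbacks unless they opt in.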

fengyuan14 · 2024-06-04T05:53:20Z

I prefer the first option. It looks like it won't take much effort. aten::all.all_out is already in our development plan, so it requires no extra work; we can simply prioritize the op.

fengyuan14 · 2024-06-04T07:12:06Z

#368

dvrogozh · 2024-06-04T14:15:32Z

> From my side, I will check tomorrow which ops are not implemented.

As far as I can tell from the log, two tests are failing, and both require the same op:

TestAutogradDeviceTypeXPU.test_copy_r_to_c_xpu - need aten::all.all_out
TestAutogradDeviceTypeXPU.test_to_r_to_c_xpu - need aten::all.all_out

dvrogozh mentioned this pull request May 24, 2024

xpu: provide a way to debug explicit CPU fallback pytorch/pytorch#126488

Closed

EikanWang approved these changes May 30, 2024

Always warn on cpu fallback

2e28e6b

Fixes: intel#262 Fixes: 5bf9e0c ("Register operator's implementation lazily") Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

dvrogozh force-pushed the fixes branch from 417d557 to 2e28e6b on June 3, 2024 17:52

Merge branch 'main' into fixes

4b31380

Merge branch 'main' into fixes

bc67928

fengyuan14 added this pull request to the merge queue Jun 5, 2024

Merged via the queue into intel:main with commit 49ab162 Jun 5, 2024
2 checks passed
