Provide a way to debug explicit CPU fallback #262

Closed
dvrogozh opened this issue May 17, 2024 · 2 comments · Fixed by #318

@dvrogozh
Contributor

@fengyuan14 - Commit 5bf9e0c muted the debug logs for "explicit" CPU fallbacks. This complicates debugging for third-party contributors trying to evaluate XPU backend capabilities: right now I am forced to revert the noted commit just to see which operations are not yet implemented for XPU. Please:

  1. Explain what "explicit CPU fallback" means; this appears to be an internal xpu-team classification that is unclear and confusing to outsiders.
  2. Extend PYTORCH_DEBUG_XPU_FALLBACK=1 to track any CPU fallback happening in the XPU backend. Note: I am fine with the "explicit" fallback being muted by default, but I really need a way to track it.
commit 5bf9e0cc768f7a3b13d829118683275f324399f1 (origin/meng_max_2d)
Author: Feng Yuan <feng1.yuan@intel.com>
Date:   Mon Apr 29 13:05:51 2024 +0800

    Register operator's implementation lazily. (#177)

    1. Avoid dangling operator's implementation (m.impl(torchvision::nms) is
    ahead of `import torchvision` sometime)
    2. Mute debug log of explicit CPU fallback.
    3. Add torchvision.roi_align/_roi_align_backward example case

CC: @jgong5 @mingfeima @XiaobingSuper @ashokei @jingxu10 @gujinghui @EikanWang @fengyuan14 @guangyey
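For context, the requested workflow is to enable the fallback debug flag before the backend initializes. A minimal sketch (the exact logging behavior for "explicit" fallbacks is what this issue asks for, not something the flag guarantees today):

```python
import os

# The variable must be in the environment before torch / torch-xpu-ops
# initializes the backend, so set it prior to `import torch`.
os.environ["PYTORCH_DEBUG_XPU_FALLBACK"] = "1"

# In an XPU build of PyTorch, running an op that lacks an XPU kernel
# would then emit a CPU-fallback warning; whether "explicit" fallbacks
# are also reported is exactly the behavior requested here.
```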

@dvrogozh
Contributor Author

Also filed pytorch/pytorch#126488 for visibility at the PyTorch level.

@fengyuan14
Contributor

Thanks for your feedback. Replied in pytorch/pytorch#126488.

dvrogozh added a commit to dvrogozh/torch-xpu-ops that referenced this issue May 24, 2024
Fixes: intel#262
Fixes: 5bf9e0c ("Register operator's implementation lazily")
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
dvrogozh added a commit to dvrogozh/torch-xpu-ops that referenced this issue Jun 3, 2024
Fixes: intel#262
Fixes: 5bf9e0c ("Register operator's implementation lazily")
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
github-merge-queue bot pushed a commit that referenced this issue Jun 5, 2024
Fixes: #262, pytorch/pytorch#126488

5bf9e0c ("Register operator's implementation lazily") disabled the warning printout on explicit CPU fallback. I believe users and customers benefit from these warnings in all cases. Note that "explicit fallback" appears to be an internal Intel classification term for supported/unsupported operations that is unlikely to be known to others; non-Intel users will care about CPU fallback in general, regardless of its type. This PR adds the warning back for all CPU fallback cases. We discussed in pytorch/pytorch#126488 whether the printout is needed in Release builds; I think it is, and have enabled it for all build modes. Let's discuss this in the PR review.
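Independent of the backend's own log channel, any fallback warnings that do get emitted can be captured programmatically with Python's standard warnings machinery. This is a generic stdlib sketch, not the torch-xpu-ops mechanism itself, and the warning text below is purely illustrative:

```python
import warnings

def collect_fallback_warnings(fn):
    """Run fn() and return the messages of any warnings mentioning 'fallback'."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even duplicates
        fn()
    return [str(w.message) for w in caught
            if "fallback" in str(w.message).lower()]

# Synthetic stand-in for a real CPU-fallback notice from an XPU workload:
msgs = collect_fallback_warnings(
    lambda: warnings.warn("op aten::nms: falling back to CPU (fallback)"))
```

In practice `fn` would run the XPU workload with PYTORCH_DEBUG_XPU_FALLBACK=1 set, letting a test suite assert that no unexpected fallbacks occurred.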

CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: Feng Yuan <feng1.yuan@intel.com>