[Reopen] [Intel GPU] Set higher tolerance for some models only on XPU Device #144756
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144756
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 76195b4 with merge base b4cee2b.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@EikanWang please help to review this reopened PR. The previous PR #134192 with the same change was closed due to stale status.
benchmarks/dynamo/torchbench.py (Outdated)
Suggested change:

```python
# before
def xpu_higher_tolerance(self, current_device, name):
    return (
        current_device == "xpu" and name in self._tolerance["higher_fp16_bf16_xpu"]
    )

# after
def xpu_higher_tolerance(self, current_device):
    return self._tolerance["higher_fp16_bf16_xpu"] if current_device == "xpu" else []
```
Thanks for the suggestions; the code has been modified accordingly.
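The refactor suggested above changes the helper from a boolean predicate into a function that returns the XPU-specific model list (or an empty list on other devices), so callers can simply concatenate it with other tolerance lists. A minimal sketch of that behavior, with a placeholder class and model names (the actual entries live in `torchbench.py` and are not reproduced here):

```python
# Sketch of the suggested helper; class and dict contents are illustrative,
# not the real torchbench.py values.
class TorchBenchmarkRunner:
    def __init__(self):
        self._tolerance = {
            "higher_fp16": ["model_a"],           # placeholder model names
            "higher_fp16_bf16_xpu": ["model_b"],  # placeholder model names
        }

    def xpu_higher_tolerance(self, current_device):
        # Return the XPU-specific list only when running on an XPU device;
        # every other device gets an empty list, leaving its behavior unchanged.
        return self._tolerance["higher_fp16_bf16_xpu"] if current_device == "xpu" else []

runner = TorchBenchmarkRunner()
print(runner.xpu_higher_tolerance("xpu"))   # ['model_b']
print(runner.xpu_higher_tolerance("cuda"))  # []
```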
benchmarks/dynamo/torchbench.py (Outdated)
Suggested change:

```python
# before
if name in self._tolerance["higher_fp16"] or self.xpu_higher_tolerance(
    current_device, name
):

# after
if name in self._tolerance["higher_fp16"] + self.xpu_higher_tolerance(current_device):
```
benchmarks/dynamo/torchbench.py (Outdated)
Suggested change:

```python
# before
if name in self._tolerance["higher_bf16"] or self.xpu_higher_tolerance(
    current_device, name
):

# after
if name in self._tolerance["higher_bf16"] + self.xpu_higher_tolerance(current_device):
```
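With the list-returning helper, both the fp16 and bf16 checks reduce to a membership test over concatenated lists. A self-contained sketch of that pattern (all model names are placeholders, not real benchmark entries):

```python
# Illustrative tolerance table; real entries are defined in torchbench.py.
tolerance = {
    "higher_fp16": ["model_f16"],
    "higher_bf16": ["model_bf16"],
    "higher_fp16_bf16_xpu": ["model_x"],
}

def xpu_higher_tolerance(current_device):
    # Empty list on non-XPU devices, so concatenation is a no-op there.
    return tolerance["higher_fp16_bf16_xpu"] if current_device == "xpu" else []

def needs_higher_fp16(name, current_device):
    # The XPU list only contributes on "xpu"; CUDA behavior is unchanged.
    return name in tolerance["higher_fp16"] + xpu_higher_tolerance(current_device)

print(needs_higher_fp16("model_x", "xpu"))   # True
print(needs_higher_fp16("model_x", "cuda"))  # False
```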
Force-pushed from 9e954c9 to 8373c17.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased; force-pushed from 8373c17 to d3f2c6e.
Force-pushed from d3f2c6e to f82ee57.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
@desertfire @EikanWang Could you please help review and merge this PR? It introduces a tolerance setting for the XPU device in the Dynamo benchmark test.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased; force-pushed from f82ee57 to 7566eff.
@pytorchbot rebase -b main
@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here.
Successfully rebased; force-pushed from 7566eff to 76195b4.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
… Device (pytorch#144756)

Reopen the previous stale closed PR pytorch#134192. We need to increase the tolerance slightly to ensure that certain models pass the accuracy check on the XPU device. This pull request preserves the original tolerance threshold for the CUDA device and introduces a new key, higher_fp16_bf16_xpu, which only impacts the XPU device.

Pull Request resolved: pytorch#144756
Approved by: https://github.com/chuanqi129, https://github.com/EikanWang, https://github.com/desertfire
@pytorchbot revert -m "Broke rocm torch bench runs with TypeError: unsupported operand type(s) for |: 'set' and 'list'" -c nosignal
@pytorchbot successfully started a revert job. Check the current status here.
…y on XPU Device (#144756)

This reverts commit 300e0ee. Reverted #144756 on behalf of https://github.com/malfet due to: Broke rocm torch bench runs with TypeError: unsupported operand type(s) for |: 'set' and 'list'
@retonym your PR has been successfully reverted. |
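The revert reason above is a plain Python type error: the `|` (union) operator is not defined between a `set` and a `list`, which is what happens when one tolerance collection is a set and the helper returns a list. A minimal reproduction with two possible fixes, sketched on placeholder data rather than the actual benchmark code:

```python
higher_fp16 = {"model_a"}   # a set in some code path (e.g. the ROCm run)
xpu_extra = ["model_b"]     # the new helper returns a list

try:
    combined = higher_fp16 | xpu_extra  # set | list is not supported
except TypeError as e:
    print(e)  # unsupported operand type(s) for |: 'set' and 'list'

# Fix 1: normalize the list to a set before taking the union.
combined = higher_fp16 | set(xpu_extra)

# Fix 2: use set.union, which accepts any iterable.
combined = higher_fp16.union(xpu_extra)
print(sorted(combined))  # ['model_a', 'model_b']
```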
This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Reopens the previously closed (stale) PR #134192.
We need to increase the tolerance slightly to ensure that certain models pass the accuracy check on the XPU device.
This pull request preserves the original tolerance threshold for the CUDA device and introduces a new key, higher_fp16_bf16_xpu, which only impacts the XPU device.
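Conceptually, the accuracy check compares eager and compiled outputs within a tolerance that is widened only for the listed models on XPU, so CUDA keeps its original threshold. A hedged sketch of that selection logic; the tolerance values and model names here are illustrative, not the ones used by the benchmark suite:

```python
def pick_tolerance(name, current_device, tolerance_table):
    # Widen the threshold only for models listed under the new XPU key
    # and only when running on an XPU device. Values are illustrative.
    if current_device == "xpu" and name in tolerance_table["higher_fp16_bf16_xpu"]:
        return 1e-2
    return 1e-3

table = {"higher_fp16_bf16_xpu": ["model_b"]}
print(pick_tolerance("model_b", "xpu", table))   # 0.01
print(pick_tolerance("model_b", "cuda", table))  # 0.001
```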
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames