
[cuDNN][cuDNN V8 API] Fix benchmark_limit ignoring failed kernels in FIND #91032

Closed
wants to merge 1 commit

Conversation

@eqy (Collaborator) commented Dec 16, 2022

Currently, the torch.backends.cudnn.benchmark_limit setting ignores the validity/status of proposed cuDNN frontend execution plans, because we do not know whether a plan will complete successfully until execution is attempted. However, there are rare cases where the majority of execution plans fail and a fallback plan is needed (e.g., when the input tensors have extremely small pointer alignment). If the limit is too small to include a working fallback plan, we currently bail out prematurely without checking the plans exhaustively.

The fix is to defer applying the benchmark_limit setting until we are sure that plans will execute successfully, but this requires changes to the cuDNN frontend timing function. This PR adds a hacked version of the cuDNN frontend timing function for now, with the intent that we can switch to the upstream cuDNN frontend implementation once this functionality is added.
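
As a rough illustration of the intended control flow (a minimal sketch only; the Plan type and the time_plan_successfully/benchmark_plans helpers below are hypothetical, not the actual PyTorch or cuDNN frontend API):

    #include <cstddef>
    #include <vector>

    // Hypothetical stand-ins: a wrapper around a cuDNN frontend execution plan and a
    // helper that runs and times it, returning false if execution fails.
    struct Plan { /* wraps a cuDNN frontend execution plan */ };
    bool time_plan_successfully(const Plan& plan);

    // Sketch of the idea: count plans toward benchmark_limit only after they execute
    // successfully, so failing plans no longer exhaust the budget before a working
    // fallback plan is reached. benchmark_limit == 0 is treated as "no limit".
    void benchmark_plans(const std::vector<Plan>& candidates, std::size_t benchmark_limit) {
      std::size_t successfully_timed = 0;
      for (const auto& plan : candidates) {
        if (!time_plan_successfully(plan)) {
          continue;  // a failed plan does not consume the benchmark_limit budget
        }
        // ... record the measured time for this plan ...
        if (benchmark_limit != 0 && ++successfully_timed >= benchmark_limit) {
          break;  // stop only once enough *working* plans have been timed
        }
      }
    }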

CC @ptrblck @ngimel

@pytorch-bot (bot) commented Dec 16, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91032

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 542df68:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eqy eqy requested a review from ngimel December 16, 2022 20:07
@eqy (Collaborator, Author) commented Dec 16, 2022

@pytorchmergebot rebase

@pytorchmergebot (Collaborator) commented:

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator) commented:

Successfully rebased cudnn_v8_fix_find_fallback onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cudnn_v8_fix_find_fallback && git pull --rebase)

@eqy eqy added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 16, 2022
@eqy (Collaborator, Author) commented Dec 17, 2022

@pytorchmergebot rebase

@pytorchmergebot (Collaborator) commented:

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator) commented:

Successfully rebased cudnn_v8_fix_find_fallback onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cudnn_v8_fix_find_fallback && git pull --rebase)

Review discussion on the following lines of the added timing-loop code:

                break;
            }
        } else {
            final_time_ms = i == (maxIterCount / 2) ? time_ms : final_time_ms;
Reviewer (Collaborator) commented:

nit: maxIterCount >= 2 is more readable

@eqy (Collaborator, Author) replied:

I'm not sure I understand which condition should be changed. Is it the part using the median time (3/2 = 1, 1/2 = 0)?

This is taken from the cuDNN frontend: https://github.com/NVIDIA/cudnn-frontend/blob/81a041a68245cd8f871c43bbbbd5b6b627979a30/include/cudnn_frontend_find_plan.h#L94 which, on first reading, looks like it throws out the last iteration in "median of three", which seems unexpected.

Reviewer (Collaborator) replied:

Ah, OK, so it's not really MEDIAN_OF_THREE; it's just the time on the second iteration. Fine, leave it like this.
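
For context, the loop under discussion has roughly the following shape (a simplified sketch modeled loosely on the cuDNN frontend timing code linked above; run_and_time_once and execution_failed are assumed helpers, not the actual upstream API). With maxIterCount == 3, the condition i == maxIterCount / 2 keeps the time from i == 1, i.e. the second run, rather than a true median of three:

    // Hypothetical helpers: one timed execution of a plan, and a check for failure.
    float run_and_time_once();
    bool execution_failed();

    // Simplified sketch of the timing loop being discussed, not the real implementation.
    float time_plan_sketch() {
      constexpr int maxIterCount = 3;  // "median of three" style repetition count
      float final_time_ms = 0.0f;
      for (int i = 0; i < maxIterCount; ++i) {
        float time_ms = run_and_time_once();
        if (execution_failed()) {
          break;  // the plan failed; stop timing, no valid measurement is kept
        } else {
          // with maxIterCount == 3 this keeps the time from i == 1 (the second iteration)
          final_time_ms = (i == maxIterCount / 2) ? time_ms : final_time_ms;
        }
      }
      return final_time_ms;
    }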

@eqy (Collaborator, Author) commented Dec 19, 2022

@pytorchmergebot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request May 3, 2023
… for cuDNN v8 API (#100287)

`cudnn-frontend` (bumped in #99674) has added support for limiting the number of kernels to benchmark, so we can remove the workaround introduced in #91032.

CC @ngimel @ptrblck

Pull Request resolved: #100287
Approved by: https://github.com/ngimel
Labels: ciflow/trunk (Trigger trunk jobs on your pull request), Merged, open source

4 participants