
[cuDNN][cuDNN V8 API] Fix benchmark_limit ignoring failed kernels in FIND #91032

Closed
wants to merge 1 commit

Conversation

@eqy (Collaborator) commented Dec 16, 2022

Currently, the torch.backends.cudnn.benchmark_limit setting ignores the validity/status of proposed cuDNN frontend execution plans, because we do not know whether a plan will complete successfully until execution is attempted. However, there are rare cases where the majority of execution plans fail and a fallback plan is needed (e.g., when the input tensors have extremely small pointer alignment). If the limit is too small to include a working fallback plan, we currently bail out prematurely without checking the plans exhaustively.

The fix is to defer applying the benchmark_limit setting until we are sure that plans will execute successfully, but this requires changes to the cuDNN frontend timing function. This PR adds a hacked version of the cuDNN frontend timing function for now, with the intent that we can switch to the upstream cuDNN frontend implementation once this functionality is added.
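
As a rough illustration of the intended control flow (a minimal sketch only; the Plan type and the time_plan_successfully/benchmark_plans helpers below are hypothetical, not the actual PyTorch or cuDNN frontend API):

    #include <cstddef>
    #include <vector>

    // Hypothetical stand-ins: a wrapper around a cuDNN frontend execution plan and a
    // helper that runs and times it, returning false if execution fails.
    struct Plan { /* wraps a cuDNN frontend execution plan */ };
    bool time_plan_successfully(const Plan& plan);

    // Sketch of the idea: count plans toward benchmark_limit only after they execute
    // successfully, so failing plans no longer exhaust the budget before a working
    // fallback plan is reached. benchmark_limit == 0 is treated as "no limit".
    void benchmark_plans(const std::vector<Plan>& candidates, std::size_t benchmark_limit) {
      std::size_t successfully_timed = 0;
      for (const auto& plan : candidates) {
        if (!time_plan_successfully(plan)) {
          continue;  // a failed plan does not consume the benchmark_limit budget
        }
        // ... record the measured time for this plan ...
        if (benchmark_limit != 0 && ++successfully_timed >= benchmark_limit) {
          break;  // stop only once enough *working* plans have been timed
        }
      }
    }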

CC @ptrblck @ngimel

@pytorch-bot (bot) commented Dec 16, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91032

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 542df68:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eqy eqy requested a review from ngimel December 16, 2022 20:07
@eqy (Collaborator, Author) commented Dec 16, 2022

@pytorchmergebot rebase

@pytorchmergebot (Collaborator) commented:

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator) commented:

Successfully rebased cudnn_v8_fix_find_fallback onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cudnn_v8_fix_find_fallback && git pull --rebase)

@eqy eqy added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 16, 2022
@eqy (Collaborator, Author) commented Dec 17, 2022

@pytorchmergebot rebase

@pytorchmergebot (Collaborator) commented:

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator) commented:

Successfully rebased cudnn_v8_fix_find_fallback onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cudnn_v8_fix_find_fallback && git pull --rebase)

Review discussion on the following lines of the added timing-loop code:

                break;
            }
        } else {
            final_time_ms = i == (maxIterCount / 2) ? time_ms : final_time_ms;
Reviewer (Collaborator) commented:

nit: maxIterCount >= 2 is more readable

@eqy (Collaborator, Author) replied:

I'm not sure I understand which condition should be changed. Is it the part using the median time (3/2 = 1, 1/2 = 0)?

This is taken from the cuDNN frontend: https://github.com/NVIDIA/cudnn-frontend/blob/81a041a68245cd8f871c43bbbbd5b6b627979a30/include/cudnn_frontend_find_plan.h#L94 which, on first reading, looks like it throws out the last iteration in "median of three", which seems unexpected.

Reviewer (Collaborator) replied:

Ah, OK, so it's not really MEDIAN_OF_THREE; it's just the time on the second iteration. Fine, leave it like this.
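
For context, the loop under discussion has roughly the following shape (a simplified sketch modeled loosely on the cuDNN frontend timing code linked above; run_and_time_once and execution_failed are assumed helpers, not the actual upstream API). With maxIterCount == 3, the condition i == maxIterCount / 2 keeps the time from i == 1, i.e. the second run, rather than a true median of three:

    // Hypothetical helpers: one timed execution of a plan, and a check for failure.
    float run_and_time_once();
    bool execution_failed();

    // Simplified sketch of the timing loop being discussed, not the real implementation.
    float time_plan_sketch() {
      constexpr int maxIterCount = 3;  // "median of three" style repetition count
      float final_time_ms = 0.0f;
      for (int i = 0; i < maxIterCount; ++i) {
        float time_ms = run_and_time_once();
        if (execution_failed()) {
          break;  // the plan failed; stop timing, no valid measurement is kept
        } else {
          // with maxIterCount == 3 this keeps the time from i == 1 (the second iteration)
          final_time_ms = (i == maxIterCount / 2) ? time_ms : final_time_ms;
        }
      }
      return final_time_ms;
    }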

@eqy (Collaborator, Author) commented Dec 19, 2022

@pytorchmergebot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request May 3, 2023
… for cuDNN v8 API (#100287)

`cudnn-frontend` (bumped in #99674) has added support for limiting the number of kernels to benchmark, so we can remove the workaround introduced in #91032.

CC @ngimel @ptrblck

Pull Request resolved: #100287
Approved by: https://github.com/ngimel
Labels: ciflow/trunk (Trigger trunk jobs on your pull request), Merged, open source

4 participants