Fix CUDNN convolution workspace: Use dedicated function to determine workspace size for algorithm #118

Merged
merged 1 commit into from Jan 22, 2019

TE-StephenTiedemann commented Jan 17, 2019

The workspace size reported for the fastest profiling result may not be the workspace size required when actually running that algorithm. This can happen if the same algorithm was profiled with different math types (FP32 and FP16) that require different workspace sizes. This fix makes an additional call to a cuDNN function to get the maximum required workspace size for the selected algorithm.

Use dedicated function to determine workspace size for algorithm.
The workspace size reported for the fastest profiling result may not
be the workspace size required when running the same algorithm. This
can happen if the same algorithm was profiled with different math
types (FP32 and FP16) requiring different workspace sizes. This fix
uses the simple approach of getting the max required workspace size
from a cuDNN function (another approach would be to coalesce the
profile results).
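
For illustration only, here is a minimal sketch of the alternative mentioned above, coalescing the profile results, written without any cuDNN dependency. It is not the code in this PR, which instead queries cuDNN for the workspace size. The `PerfResult` and `Selection` structs and the `select_with_coalesced_workspace` function are hypothetical names; the fields only loosely mirror what a cuDNN profiling result carries (algorithm, math type, time, workspace bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one profiling result. This is NOT the cuDNN
// type; it only carries the fields needed to show the idea.
struct PerfResult {
  int algo;             // algorithm id
  bool tensor_op_math;  // true if profiled with Tensor Core (FP16) math
  float time_ms;        // measured run time
  std::size_t memory;   // workspace bytes reported for this entry
};

// Hypothetical result of the selection step.
struct Selection {
  int algo;
  std::size_t workspace;
};

// Pick the fastest entry, then coalesce: use the largest workspace reported
// by any entry of the same algorithm, regardless of math type.
inline Selection select_with_coalesced_workspace(
    const std::vector<PerfResult> &results) {
  assert(!results.empty());
  const PerfResult *best = &results[0];
  for (const auto &r : results) {
    if (r.time_ms < best->time_ms) best = &r;
  }
  Selection sel{best->algo, best->memory};
  for (const auto &r : results) {
    if (r.algo == best->algo && r.memory > sel.workspace) {
      sel.workspace = r.memory;
    }
  }
  return sel;
}
```

With entries such as `{1, false, 0.5f, 1024}` and `{1, true, 0.8f, 4096}`, the fastest entry alone would suggest 1024 bytes, while the coalesced selection reserves 4096 bytes so the algorithm can run under either math type.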

TE-KazukiYoshiyama commented Feb 4, 2019

Not sure whether this is a cuDNN issue, but the error that occurs when using Tensor Cores does not happen with this PR.

TE-TakuyaNarihira changed the title from "Use dedicated function to determine workspace size for alogorithm." to "Fix CUDNN convolution workspace: Use dedicated function to determine workspace size for alogorithm" on Feb 4, 2019