
AMD ROCm: torch.backends.cudnn.benchmark should be set to False by default on ROCm #2552

Open
asumagic opened this issue May 21, 2024 · 0 comments · May be fixed by #2558
Assignees
Labels
bug Something isn't working

Comments

@asumagic (Collaborator)

Describe the bug

With AMD HIP, torch.backends.cudnn.benchmark defaults to True. On CUDA, it defaults to False.

While this is an upstream PyTorch decision (or bug), defaulting benchmark to True is probably a bad idea in our case, because our input shapes vary wildly. Since convolution kernel benchmarking is performed on a per-shape basis, the slow benchmark runs happen very often, so training times are worsened rather than improved.

We should probably add a common per-backend quirks file to SB for this kind of change?

Expected behaviour

On AMD ROCm, benchmark should be set to False, probably with a warning since we override the default.
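A minimal sketch of what such a per-backend quirk could look like. The helper name `apply_rocm_quirks` and the structure are assumptions, not an existing SpeechBrain API; the ROCm check relies on `torch.version.hip`, which is a version string on ROCm builds and None on CUDA builds. The module is passed in as a parameter only to keep the sketch testable without a real torch install:

```python
import warnings


def apply_rocm_quirks(torch_module):
    """Disable cuDNN/MIOpen benchmarking on ROCm builds, with a warning.

    `torch_module` is the imported `torch` module (hypothetical helper;
    in practice this would just use the global `torch` import).
    """
    # torch.version.hip is a string like "5.7" on ROCm, None on CUDA.
    if getattr(torch_module.version, "hip", None) is not None:
        if torch_module.backends.cudnn.benchmark:
            warnings.warn(
                "ROCm detected: overriding torch.backends.cudnn.benchmark "
                "to False. Per-shape kernel benchmarking slows training "
                "when input shapes vary widely."
            )
            torch_module.backends.cudnn.benchmark = False
```

This keeps the CUDA default untouched and only intervenes (loudly) on ROCm.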

To Reproduce

No response

Environment Details

No response

Relevant Log Output

No response

Additional Context

No response

@asumagic asumagic added the bug Something isn't working label May 21, 2024
@asumagic asumagic self-assigned this May 21, 2024