Reliable way to identify RuntimeErrors (CUDA) #29710
Labels
- enhancement — Not as big of a feature, but technically not a bug. Should be easy to fix.
- module: cuda — Related to torch.cuda, and CUDA support in general.
- triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module.
🚀 Feature
Reliable way to check for CUDA out of memory (and CUDA Runtime errors in general).
Motivation
Currently I see no way to reliably check for a CUDA out-of-memory error except parsing the exception message for the substring `CUDA out of memory`. (After a quick grep of the PyTorch sources, this appears to work at the moment.)
Since this message text may change in the future, I am not comfortable with this workaround; it is bound to break.
Reliably detecting such an error in application code seems crucial to me.
If there is already a way to do this and I simply did not find it, this issue may be a good place to document it.
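For concreteness, here is a minimal sketch of the string-matching workaround described above. The `RuntimeError` is raised manually to stand in for a real CUDA allocation failure; the message text mirrors what PyTorch currently emits, but that text is exactly the unstable detail this issue is about.

```python
def is_cuda_oom(err: RuntimeError) -> bool:
    """Heuristic check: inspect the exception message for the
    substring PyTorch currently uses. Fragile by design — this is
    the workaround, not a stable API."""
    return "CUDA out of memory" in str(err)


def allocate():
    # Stand-in for something like torch.empty(huge, device="cuda");
    # raised manually here so the sketch runs without a GPU.
    raise RuntimeError("CUDA out of memory. Tried to allocate 20.00 GiB")


try:
    allocate()
except RuntimeError as e:
    if is_cuda_oom(e):
        outcome = "oom"   # e.g. free caches, reduce batch size, retry
    else:
        raise             # unrelated RuntimeError: do not swallow it
```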
Pitch
A fairly standard solution would be, e.g., RuntimeError subclasses or an error code attached to the exception.
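To illustrate the subclass variant of the pitch: the class names below (`CudaError`, `CudaOutOfMemoryError`) are hypothetical, not existing PyTorch classes. Because they derive from `RuntimeError`, existing `except RuntimeError` handlers keep working, while new code can catch the specific error without string matching.

```python
class CudaError(RuntimeError):
    """Hypothetical base class for CUDA runtime failures."""


class CudaOutOfMemoryError(CudaError):
    """Hypothetical subclass raised when a CUDA allocation fails."""


try:
    # A failing allocation would raise the specific subclass.
    raise CudaOutOfMemoryError("CUDA out of memory. Tried to allocate 2.00 GiB")
except CudaOutOfMemoryError:
    result = "oom"        # caught by type, no message parsing needed
except RuntimeError:
    result = "other"      # old-style handlers still catch it too
```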
@soumith @albanD How do you folks think about this?
cc @ngimel