
Reliable way to identify RuntimeErrors (CUDA) #29710

Open
c-hofer opened this issue Nov 13, 2019 · 1 comment
c-hofer commented Nov 13, 2019

🚀 Feature

A reliable way to check for CUDA out-of-memory errors (and CUDA runtime errors in general).

Motivation

Currently I see no way to reliably check for a CUDA out-of-memory error except parsing the exception message for

CUDA out of memory.

(After a quick grep of the PyTorch sources, this seems to work at the moment.)
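
For concreteness, this is roughly what the workaround looks like today (a minimal sketch; the matched string is exactly the fragile part, and `is_cuda_oom` is just an illustrative helper name):

```python
import torch

def is_cuda_oom(exc: RuntimeError) -> bool:
    # Fragile: "CUDA out of memory" is an implementation detail of the
    # error message and may change between PyTorch releases.
    return "CUDA out of memory" in str(exc)

try:
    x = torch.empty(2**40, device="cuda")  # deliberately huge allocation
except RuntimeError as e:
    if is_cuda_oom(e):
        torch.cuda.empty_cache()  # e.g. free cached blocks, then retry smaller
    else:
        raise
```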

Since this message text may change in the future, I am not comfortable with this workaround; it is practically guaranteed to break at some point.
Reliably detecting such errors in application code seems crucial to me.

If there already is a way to do this and I simply did not find it, this issue may be a good place to document it.

Pitch

A fairly standard solution would do, e.g., RuntimeError subclasses or an error code attached to the exception.
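
For illustration, a minimal sketch of what such an API could look like. `CudaError` and `CudaOutOfMemoryError` are hypothetical names, not existing PyTorch classes:

```python
# Hypothetical API sketch -- neither class exists in PyTorch today.
class CudaError(RuntimeError):
    """Base class for errors raised by the CUDA runtime."""
    def __init__(self, message: str, cuda_error_code: int):
        super().__init__(message)
        self.cuda_error_code = cuda_error_code  # raw cudaError_t value

class CudaOutOfMemoryError(CudaError):
    """Raised when a CUDA allocation fails (cudaErrorMemoryAllocation = 2)."""

# Application code could then catch the specific type instead of
# string-matching the message:
try:
    raise CudaOutOfMemoryError("CUDA out of memory.", 2)
except CudaOutOfMemoryError as e:
    print("caught OOM, CUDA error code", e.cuda_error_code)
```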

@soumith @albanD What do you folks think about this?

cc @ngimel

@mruberry added the module: cuda, triaged, and enhancement labels on Nov 13, 2019
@vadimkantorov (Contributor) commented:
Currently it seems that people are checking exception messages: https://github.com/pytorch/fairseq/blob/3655cf266e32a2272d6deac6069a594977880084/fairseq/trainer.py#L615

It would indeed be good to have a separate exception type for out-of-memory errors.
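
To illustrate the pattern used in the linked fairseq code, here is a hedged sketch (not the actual fairseq implementation) of the string-matching recovery that projects resort to today: catch the generic RuntimeError, match on the message, free cached memory, and skip the batch.

```python
import torch

def train_step(model, batch, optimizer):
    """One training step that skips the batch on CUDA OOM (sketch)."""
    try:
        loss = model(batch).sum()
        loss.backward()
        optimizer.step()
    except RuntimeError as e:
        # Today's only option: match on the message text.
        if "out of memory" in str(e):
            print("| WARNING: ran out of memory, skipping batch")
            optimizer.zero_grad()
            torch.cuda.empty_cache()  # release cached blocks before continuing
        else:
            raise
```

With a dedicated exception type, the `except` clause would shrink to a single, version-stable `except CudaOutOfMemoryError:` line.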
