
CUDA OOM when training on V100 #260

Answered by rwightman
xuzhao9 asked this question in Q&A

@xuzhao9 there isn't really enough information here for me to say whether this is expected, but these models are very memory hungry. I train on two RTX Titans or 3090s (24 GB cards) and am often in the 12–24 per-card batch-size range for the lower-end models.

Always train with AMP. Memory use also grows quickly as you increase model size, so stick to the D0 model for tests.
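For reference, "AMP" here means PyTorch's native automatic mixed precision. Below is a minimal sketch of an AMP training step using `torch.cuda.amp` with a toy model; the model, data, and hyperparameters are placeholders, not the actual effdet training loop, and AMP is enabled only when a CUDA device is available (it is a no-op on CPU).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy model standing in for the real detection model.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"
model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# GradScaler rescales the loss so fp16 gradients don't underflow;
# with enabled=False it passes everything through unchanged.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

def train_step(x, y):
    optimizer.zero_grad()
    # autocast runs the forward pass in mixed precision on CUDA,
    # roughly halving activation memory for most layers.
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = F.mse_loss(model(x), y)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales grads, then optimizer.step()
    scaler.update()                # adjusts the scale factor for next step
    return loss.item()

x = torch.randn(8, 16, device=device)
y = torch.randn(8, 4, device=device)
loss = train_step(x, y)
```

The activation-memory savings from autocast are what make the 12–24 per-card batch sizes mentioned above feasible on 24 GB cards.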


Answer selected by xuzhao9