Option to recompute forward activations during the backward pass.
The setting is an int, so that 0 = don't recompute anything, and 1, 2, 3, 4... will (in the future) recompute more and more.
This trades the latency of a single fwd/bwd pass for VRAM: we do more computation, but we keep fewer activations around, so we use less memory.
The big upside is that the VRAM savings let you crank up the batch size, which can actually end up as a net win in token throughput during training.
For example, on my A100 40GB, with -r 0 I can only fit batch size 10 for the biggest GPT-2 model. With -r 1 (recompute GeLU) I can fit batch size 12, and the larger batch gives a net win in token throughput.
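To make the idea concrete, here is a minimal CPU-only sketch of what recompute level 1 does, not the actual CUDA kernels in this PR: at -r 0 the GeLU output of every layer is kept between the forward and backward pass, while at -r 1 only the GeLU input is kept and the output is re-derived into one shared scratch buffer when backward needs it. The buffer names (`pre`, `post`, `scratch`) and the toy sizes are illustrative assumptions.

```c
/*
 * Toy illustration of activation recompute (a sketch, not llm.c's real code).
 * recompute == 0: keep the GeLU output of every layer for the backward pass.
 * recompute == 1: keep only the GeLU input; re-run gelu_forward in backward.
 */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define L 4   /* number of layers (toy size) */
#define N 8   /* elements per activation buffer (toy size) */

static void gelu_forward(float* out, const float* inp, int n) {
    const float s = 0.7978845608f; /* sqrt(2/pi) */
    for (int i = 0; i < n; i++) {
        float x = inp[i];
        out[i] = 0.5f * x * (1.0f + tanhf(s * (x + 0.044715f * x * x * x)));
    }
}

int main(int argc, char** argv) {
    int recompute = (argc > 1) ? atoi(argv[1]) : 0; /* mimics the -r flag */

    float pre[L][N];           /* GeLU inputs: always kept (backward needs them) */
    float scratch[N];          /* one shared buffer, reused when recompute == 1 */
    float (*post)[N] = NULL;   /* per-layer GeLU outputs, allocated only at -r 0 */
    if (recompute == 0) post = malloc(sizeof(float) * L * N);

    /* forward pass: fill inputs with dummy data and apply GeLU */
    for (int l = 0; l < L; l++) {
        for (int i = 0; i < N; i++) pre[l][i] = (float)(l + 1) * 0.1f * (float)i;
        float* out = (recompute == 1) ? scratch : post[l];
        gelu_forward(out, pre[l], N);
        /* ... out would feed the next matmul here ... */
    }

    /* backward pass: the projection matmul's weight gradient needs the GeLU output */
    for (int l = L - 1; l >= 0; l--) {
        const float* gelu_out;
        if (recompute == 1) {
            gelu_forward(scratch, pre[l], N); /* redo the forward GeLU */
            gelu_out = scratch;
        } else {
            gelu_out = post[l];               /* read the stored activation */
        }
        printf("layer %d uses gelu_out[0] = %f\n", l, gelu_out[0]);
    }

    printf("recompute=%d: %s\n", recompute,
           recompute ? "kept only GeLU inputs, re-ran gelu_forward in backward"
                     : "kept both GeLU inputs and outputs for all layers");
    free(post);
    return 0;
}
```

Run with `./a.out 0` or `./a.out 1` to compare the two paths; the gradients are identical either way, only the extra gelu_forward calls and the dropped per-layer output buffers differ, which is exactly the compute-for-VRAM trade described above.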