
Further improvements to attention backward #170

Merged 4 commits into karpathy:master from bwd-att-coarsened on Apr 18, 2024

Conversation

@ngc92 (Contributor) commented Apr 18, 2024

Backward kernel where threads reuse data in registers to reduce memory transfers.

This PR is built on top of my previous PRs, which should be merged first. Once that is done, I'll rebase and remove the draft status here. I need the changes to the backward-pass memory allocation, because otherwise I get OOMs and cannot profile the backward pass; I also need to be able to assume that the kernel writes (=) its gradients instead of accumulating (+=) them.
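
(The kernel source isn't quoted in this thread. As a rough illustration of the register-reuse idea ngc92 describes, here is a minimal thread-coarsened CUDA sketch; the kernel name, the datt/key/dquery tensors, and the layout are hypothetical stand-ins, not the PR's actual code. It assumes D is a multiple of COARSEN.)

```cuda
#include <cuda_runtime.h>

#define COARSEN 4  // outputs per thread; each loaded value is reused this many times

// Illustrative reduction shaped like one piece of attention backward:
// dquery[t, d] = sum_{t2 <= t} datt[t, t2] * key[t2, d], for one head.
// Each thread owns COARSEN consecutive d indices, so the datt value it
// loads per t2 step is reused COARSEN times from a register instead of
// being re-fetched from global memory. Assumes D % COARSEN == 0.
// Launch with a 2D grid: gridDim.y = T, gridDim.x covering D / COARSEN threads.
__global__ void dquery_coarsened(float* dquery, const float* datt,
                                 const float* key, int T, int D) {
    int t = blockIdx.y;                                      // query position
    int d0 = (blockIdx.x * blockDim.x + threadIdx.x) * COARSEN;
    if (d0 >= D) return;

    float acc[COARSEN] = {0.0f};                             // lives in registers
    for (int t2 = 0; t2 <= t; ++t2) {                        // causal mask
        float a = datt[t * T + t2];                          // one global load...
        for (int u = 0; u < COARSEN; ++u) {
            acc[u] += a * key[t2 * D + d0 + u];              // ...reused COARSEN times
        }
    }
    for (int u = 0; u < COARSEN; ++u) {
        dquery[t * D + d0 + u] = acc[u];                     // write (=), not accumulate (+=)
    }
}
```

Note the final store is a plain =, which is why the sketch (like this PR) depends on the earlier change letting kernels overwrite rather than accumulate their gradient outputs.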

@karpathy (Owner) commented:

I merged the previous PR, so this one should be ready.

ACK on using = instead of += in the backward pass. I didn't even realize originally that this would have dramatic performance impacts, but it makes sense in retrospect. There are only a few tensors in the graph where += is necessary, where gradients have to add: at the residuals, and for the wte tensor, which is used both for the token embeddings and for the final matmul due to the weight-sharing scheme. Otherwise it's okay to just set them, given the graph we have.
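
(To make the = vs += point concrete, here is a hypothetical pair of elementwise backward kernels, not code from the repo. The overwriting form needs no prior zeroing of dx and skips a read-modify-write per element; the accumulating form is only needed where two paths in the graph contribute to the same gradient, as with the residuals and wte above.)

```cuda
// Hypothetical elementwise backward for y = w * x (all names illustrative).

// Overwrite: this kernel is the sole writer of dx, so it can use =.
// dx needs no cudaMemset beforehand, and each element costs one store.
__global__ void scale_backward_overwrite(float* dx, const float* dy,
                                         const float* w, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dx[i] = w[i] * dy[i];
}

// Accumulate: needed only when dx already holds another branch's gradient
// (e.g. a residual connection). Costs an extra load per element and
// requires dx to be initialized (zeroed or pre-filled) before launch.
__global__ void scale_backward_accumulate(float* dx, const float* dy,
                                          const float* w, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dx[i] += w[i] * dy[i];
}
```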

@karpathy (Owner) commented:

Also one possible request: I think a lot of people will come to dev/cuda to learn CUDA. If you're able to comment some of the kernels, I think it could be really valuable to a lot of people (including myself!)

@ngc92 ngc92 marked this pull request as ready for review April 18, 2024 22:02
@karpathy karpathy merged commit d95e624 into karpathy:master Apr 18, 2024
@karpathy (Owner) commented:

So cool, I went down from 400ms/iter -> 200ms/iter.

@ngc92 ngc92 deleted the bwd-att-coarsened branch April 28, 2024 08:43