Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example for the dtype change for gelu kernels #250

Closed
wants to merge 1 commit into from

Conversation

ChrisDryden
Copy link
Contributor

@ChrisDryden ChrisDryden commented Apr 25, 2024

By changing the type of data that is being read from memory, in a single memory operation it is possible to read up to 128 bits of data. For memory constrained kernels it is beneficial to wrap all of the sequential reads in the format shown in this PR so that the effects of the latency of the memory reads is reduced.

For future kernel changes that also require coalesced memory reads it is now possible to use this struct that has methods to easily access the underlying data using the [] operation.

This changes the iter speed for me on my A100 from:

total average iteration time: 45.785989 ms

to around:
total average iteration time: 44.766054 ms

Co-authored-by: @ngc92 - Came up with the memory struct idea

@karpathy karpathy marked this pull request as draft April 25, 2024 15:24
@karpathy
Copy link
Owner

converted to draft per comment

@karpathy
Copy link
Owner

Can you explain briefly what is happening? It looks really wrong that these are int4, for floats.

@ChrisDryden
Copy link
Contributor Author

So the high level issues previously is that there is too much latency from the read from memory instruction to the calculation and the return write. At least on my A100 you can in one instruction read float4 or int4 in a single read instruction, so this CR changes the kernel to read the memory as if it is a Int4 or a Float4 to read 8 at a time and write 8 at a time in a single instruction so theres less waiting for memory before doing calculations.

@ChrisDryden ChrisDryden marked this pull request as ready for review April 28, 2024 02:23
@ChrisDryden
Copy link
Contributor Author

Went with different approach described in the PR here: #298

@ChrisDryden ChrisDryden closed this May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants