utilities for mixed-precision tests/benchmarks #352

Closed
ngc92 wants to merge 3 commits.

Conversation

ngc92 (Contributor) commented May 4, 2024

This allows us to compile a single executable that can serve as test/benchmark for f32, f16, and bf16 versions of the kernels. So far, I've updated only those test files which already defined a BF16 macro.

Caveat:
This will try to compile float, half, and bfloat16 versions into a single exe, so compilation fails if any of these is unavailable. This is something we need to improve at some point, once we have a general strategy in place for handling older hardware.
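To make the approach concrete, here is a minimal sketch of the single-executable idea, with the precision chosen at runtime. Everything here (run_test, the argv convention) is illustrative rather than the PR's actual code; only the type names come from the CUDA headers:

```cuda
// Sketch only: one binary containing float, half, and bfloat16 instantiations
// of a test, with the precision picked at runtime. run_test and the argv
// convention are hypothetical; the type names come from the CUDA headers.
#include <cstdio>
#include <cstring>
#include <cuda_fp16.h>
#include <cuda_bf16.h>

template <typename floatX>
void run_test() {
    // ... allocate floatX buffers, launch the kernel under test,
    // validate against an fp32 reference, time it ...
    printf("testing with sizeof(floatX) = %zu\n", sizeof(floatX));
}

int main(int argc, char** argv) {
    const char* prec = (argc > 1) ? argv[1] : "f32";
    if      (strcmp(prec, "f32")  == 0) run_test<float>();
    else if (strcmp(prec, "f16")  == 0) run_test<half>();
    else if (strcmp(prec, "bf16") == 0) run_test<__nv_bfloat16>();
    else { fprintf(stderr, "unknown precision: %s\n", prec); return 1; }
    return 0;
}
```

Because all three instantiations end up in one binary, a single unsupported type breaks the whole build, which is exactly the caveat above.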

ngc92 mentioned this pull request May 4, 2024
karpathy (Owner) commented May 5, 2024

This complicates dev/cuda quite a bit, with templates and macros, both a bit scary. What is the problem that it is trying to solve? Isn't it the case that our CI could just compile all the kernels separately for all precisions we care about and test them one by one?

ngc92 (Contributor, Author) commented May 6, 2024

It's less about automated testing and more about human testing and profiling, where I find it quite convenient not to have to recompile the tests for each precision. It's also about reducing duplication between the different kernel test files, so things don't drift out of sync.

Personally, I find the template solution much cleaner than moving the ifdefs into common.h and having floatX magically appear from there, but that would also be a solution to the problem.
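For contrast, the common.h alternative mentioned above would look roughly like this. A sketch: the BF16 macro is the one already used in the test files, while the FP16 spelling is an assumed counterpart:

```cuda
// Sketch of the common.h alternative: floatX is fixed per build via macros,
// so each precision is a separate compile (e.g. nvcc -DBF16 ...).
// BF16 is the macro mentioned above; FP16 is an assumed counterpart.
#include <cuda_fp16.h>
#include <cuda_bf16.h>

#if defined(BF16)
typedef __nv_bfloat16 floatX;
#elif defined(FP16)
typedef half floatX;
#else
typedef float floatX;
#endif
```

Each test file would then use floatX directly, at the cost of one build per precision instead of one binary covering all three.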

If you don't like the PR in its entirety, there are still individual pieces that should be merged; e.g., all the napkin math needs to be updated to actually reflect the size of the floatX type.
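A sketch of what that napkin-math fix looks like for a bandwidth estimate; the function name, parameters, and accounting convention here are hypothetical:

```cuda
// Sketch: estimate achieved bandwidth from the size of the element type
// actually under test, instead of hard-coding sizeof(float).
// All names and the accounting convention here are illustrative.
template <typename floatX>
float achieved_bandwidth_gbs(long num_elements, int accesses_per_element,
                             float elapsed_ms) {
    double bytes = (double)num_elements * accesses_per_element * sizeof(floatX);
    return (float)(bytes / elapsed_ms / 1e6); // bytes/ms -> GB/s
}
```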

karpathy (Owner) commented

will avoid for now, closing.

karpathy closed this May 10, 2024