-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NSight Compute ranges, use CUDA events for timings #273
Add NSight Compute ranges, use CUDA events for timings #273
Conversation
This is really nice! It's a fair bit of boilerplate, maybe it could be wrapped into a tiny tiny class with a constructor/destructor calling push/pop, since the push/pop are nearly always at { }. EDIT: Just noticed NVIDIA already provides that: https://nvidia.github.io/NVTX/doxygen-cpp/index.html - not sure about Andrej's stylistic preferences here but it feels slightly nicer to me, especially "NVTX3_FUNC_RANGE()" described at the very end. |
agree quite eager to get this in on a high level, but is there a way to skip all the added LOC around push/pop? |
what I mentioned above with NVTX3_FUNC_RANGE() means you only need 1 line for the push, instead of 2 lines (push+pop). I don't think we can get it to anything less than that. Also maybe the string manipulation could be done more cleanly in less code with C++ rather than sprintf. |
@PeterZhizhin are you able to incorporate the above into this PR? I can also take a look if no. Also this seems to have merge conflicts now bleh |
(Otherwise pretty eager to merge this!) |
I can, spend my time working on a video yesterday. Will do it today. |
981652f
to
9751477
Compare
Hello, @karpathy! I've rebased the PR and now I use a C++ class with a constructor-desctructor. Using native NVTX3_FUNC_RANGE probably requires us to use CMake (https://github.com/NVIDIA/NVTX?tab=readme-ov-file#cmake). Seems like it's not bundled in with CUDA? On my installation I get this error:
|
@PeterZhizhin - we can pull that header down from the GitHub repo (or others) in the Makefile. Do you want to pull the "c" directory in with the headers from that repo? |
oh i see, i didn't realize. in that case i would be ok with just keeping the two calls like before, that sounds gnarly, no pulling of anything |
@PeterZhizhin I think your latest PR solves the 2 calls without needing any additional headers, right? So we should be able to use that, it seems clean enough to me (since we control the name I guess we could rename it something nicer than "NVTX_RANGE_FN" if we wanted to as well). |
daffc27
to
3b466c2
Compare
Rebased and fixed the build errors. Good to merge. |
3b466c2
to
bcf7d4f
Compare
nvtxRangePush
/nvtxRangePop
Adds simple stack traces to NSight Systems:nsys profile mpirun --allow-run-as-root -np 2 ./train_gpt2cu
You'll see something like this in NSight for GPU profile: