[PERFORMANCE] grads for bitwise ops + indexing #1272

Closed
sachinraja13 opened this issue Jul 17, 2024 · 1 comment

Comments

@sachinraja13

Training DeformableDETR, which takes gradients through indices, is very slow (GPU utilisation ~60%). When I stop the gradients at the sampling points before calling the grid_sample function, GPU utilisation rises to ~99%.

Reference Issue : Grid Sample

As far as I understand, the current implementation of grads for bitwise ops + indexing runs on the CPU only. Would it be possible to extend it to use a GPU kernel as well?

@sachinraja13
Author

My apologies, the performance bottleneck seems to be the random memory access during the sampling operation itself.
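
To illustrate the point about memory access, here is a minimal pure-Python sketch of bilinear sampling at a single point. It is not the library's grid_sample implementation; the function name `bilinear_sample`, the nested-list image layout, and the pixel-unit coordinates are all assumptions made for this example. It shows that the forward pass already needs four data-dependent reads per sampling point, and that the gradient with respect to the sampling coordinates is built from those same four values.

```python
import math


def bilinear_sample(image, x, y):
    """Bilinearly sample image[row][col] at fractional pixel coordinates (x, y).

    Hypothetical helper for illustration only. Returns a tuple
    (value, dvalue_dx, dvalue_dy). The image must be at least 2x2 and the
    point must lie inside it.
    """
    height, width = len(image), len(image[0])
    if height < 2 or width < 2:
        raise ValueError("image must be at least 2x2")
    if not (0.0 <= x <= width - 1 and 0.0 <= y <= height - 1):
        raise ValueError("sampling point lies outside the image")

    # Top-left neighbour; capped so the 2x2 patch stays inside the image.
    x0 = min(int(math.floor(x)), width - 2)
    y0 = min(int(math.floor(y)), height - 2)
    x1, y1 = x0 + 1, y0 + 1
    fx, fy = x - x0, y - y0  # fractional offsets inside the patch

    # Four gathers whose addresses depend on the sampling point itself.
    # On a GPU, neighbouring threads may read far-apart addresses here.
    v00 = image[y0][x0]
    v01 = image[y0][x1]
    v10 = image[y1][x0]
    v11 = image[y1][x1]

    value = (
        v00 * (1 - fx) * (1 - fy)
        + v01 * fx * (1 - fy)
        + v10 * (1 - fx) * fy
        + v11 * fx * fy
    )
    # Gradient with respect to the sampling coordinates. Stopping gradients
    # at the sampling points amounts to discarding these two terms.
    dvalue_dx = (v01 - v00) * (1 - fy) + (v11 - v10) * fy
    dvalue_dy = (v10 - v00) * (1 - fx) + (v11 - v01) * fx
    return value, dvalue_dx, dvalue_dy


if __name__ == "__main__":
    # The image encodes f(x, y) = x + 2*y, so the gradient should be (1, 2).
    demo = [[0.0, 1.0], [2.0, 3.0]]
    print(bilinear_sample(demo, 0.5, 0.5))
```

In this sketch the four neighbour reads happen whether or not the coordinate gradient is kept, which matches the observation that the sampling itself, rather than the gradient through the indices, can dominate the cost.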
