[PERFORMANCE] grads for bitwise ops + indexing #1272

sachinraja13 · 2024-07-17T05:41:08Z

Training DeformableDETR, which uses gradients through indices, is very slow (GPU utilisation ~60%). When I stop the gradients at the sampling points, before calling the grid_sample function, GPU utilisation rises to ~99%.

Reference issue: Grid Sample

As far as I understand, the current implementation of grads for bitwise ops + indexing works on the CPU only. Is it possible to extend this functionality to use a GPU kernel as well?
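
For readers unfamiliar with what "gradients through the sampling points" involves, here is a minimal pure-Python sketch of bilinear sampling at a single point, returning the value together with its derivatives with respect to the sampling coordinates. It is an illustration only, not the grid_sample implementation discussed in this issue: the function name `bilinear_sample`, the list-of-lists image, and the lack of padding handling are all assumptions made for the example.

```python
import math

def bilinear_sample(img, x, y):
    """Sample a 2D list-of-lists `img` at fractional pixel coordinates (x, y).

    Returns (value, d_value/dx, d_value/dy). The point is assumed to lie
    strictly inside the image, so no padding or clamping is done.
    """
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0

    # Gather the four neighbouring pixels selected by the integer indices.
    v00, v01 = img[y0][x0], img[y0][x1]
    v10, v11 = img[y1][x0], img[y1][x1]

    # Interpolate along x on the two rows, then along y between them.
    top = v00 * (1 - wx) + v01 * wx
    bottom = v10 * (1 - wx) + v11 * wx
    value = top * (1 - wy) + bottom * wy

    # The gradient w.r.t. the sampling point flows through the weights;
    # the integer indices themselves are piecewise constant.
    d_dx = (v01 - v00) * (1 - wy) + (v11 - v10) * wy
    d_dy = bottom - top
    return value, d_dx, d_dy

# Hypothetical 3x3 image where pixel (x, y) holds x + 3*y.
img = [[0.0, 1.0, 2.0],
       [3.0, 4.0, 5.0],
       [6.0, 7.0, 8.0]]
value, d_dx, d_dy = bilinear_sample(img, 0.5, 0.25)
```

Stopping the gradients at the sampling points corresponds to discarding `d_dx` and `d_dy` in the backward pass, so only the gradient with respect to the image values is propagated.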

sachinraja13 · 2024-07-20T18:45:51Z

My apologies, the performance bottleneck seems to be the random memory access during the sampling operation itself.
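
To make the random-access pattern concrete, the following sketch lists the flat row-major offsets that bilinear sampling would read for a few sampling points. It is a simplified model under stated assumptions (a single-channel, row-major buffer; the helper name `touched_offsets`, the feature-map width, and the sample points are hypothetical), not a measurement of the actual kernel.

```python
import math

def touched_offsets(points, width):
    """Return, per (x, y) point, the flat row-major offsets of the four
    pixels that bilinear sampling reads from a buffer `width` pixels wide."""
    offsets = []
    for x, y in points:
        x0, y0 = math.floor(x), math.floor(y)
        offsets.append([
            y0 * width + x0,            # top-left
            y0 * width + x0 + 1,        # top-right
            (y0 + 1) * width + x0,      # bottom-left
            (y0 + 1) * width + x0 + 1,  # bottom-right
        ])
    return offsets

# Hypothetical sampling points on a 1024-pixel-wide feature map.
points = [(10.3, 2.7), (900.1, 500.5), (3.9, 77.2)]
offsets = touched_offsets(points, width=1024)
```

Consecutive sampling points land on offsets that are far apart, and even the two rows read for a single point are a full `width` apart, so the reads are scattered whether or not gradients flow through the coordinates.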

sachinraja13 closed this as completed Jul 20, 2024
