
I'd like to make some performance-related changes #356

Open
TRex22 opened this issue Nov 5, 2022 · 7 comments
TRex22 commented Nov 5, 2022

Hey there,

In my research I need to generate a lot of Grad-CAM heat-maps, which I then do further processing on. I'm using the GradCAM class as I don't need the other implementations right now, although I may decide to use some in the future. I've performance-tuned my work about as much as I can. Using the torch profiler, I discovered that the greatest cost comes from moving data between the CUDA device and the CPU.
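For reference, a minimal sketch of how that transfer cost shows up under torch.profiler (a stand-in forward pass plus the device-to-host copy that a NumPy result forces; the model and shapes here are just placeholders):

```python
import torch
import torchvision.models as models
from torch.profiler import profile, ProfilerActivity

# Sketch: profile a forward pass plus the device-to-host copy that a NumPy
# result forces, to see where the transfer cost shows up.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.resnet50(weights=None).to(device).eval()
batch = torch.randn(8, 3, 224, 224, device=device)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        out = model(batch)
    result = out.cpu().numpy()  # the copy back to the host shows up here

# Device-to-host copies appear as "Memcpy DtoH" / "aten::to" entries.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```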

I'm already batching the GradCAM calls (a batch size of 8 seemed to be the most performant; larger batches showed diminishing returns).

Where I think there is room for improvement is in allowing the GradCAM class (and maybe others) to use the torch equivalents of the NumPy functions, to limit when data is moved off the device. I believe many of the operations could be done before moving the results back, which should give a nice speedup. In my case, getting back a tensor result on the device rather than a NumPy result on the CPU would also directly help my work.
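To illustrate the idea (a sketch only; the `output_as_tensor` keyword is hypothetical and not part of the current API, and constructor details vary across library versions):

```python
import torch
import torchvision.models as models
from pytorch_grad_cam import GradCAM

# Sketch only: `output_as_tensor` is a hypothetical opt-in, not a real kwarg.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.resnet50(weights=None).to(device).eval()
cam = GradCAM(model=model, target_layers=[model.layer4[-1]])
batch = torch.randn(8, 3, 224, 224, device=device)

heatmaps_np = cam(input_tensor=batch)                      # today: NumPy on the CPU
heatmaps = cam(input_tensor=batch, output_as_tensor=True)  # hypothetical: CUDA tensor
```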

I'm happy to investigate implementing this as an optional / configurable change, if that's more agreeable, so that you have to opt in somehow. I just wanted to know whether that's something others would like before I fork and do it, as I don't want to maintain a fork that diverges from the base work.

I'd have to make changes to the BaseCAM and GradCAM classes at a minimum.

My time right now is also very limited, which is why I wanted to ask whether there is an appetite for these changes; I might get a working concept that does what I need but not be able to change all the other CAM implementations too.

jacobgil (Owner) commented Nov 5, 2022

Hey,
I'm definitely open to changing everything to be done in pure torch on the device instead of numpy.
But before investing in this large change, should we get some data points on how much faster Grad-CAM would be this way? A quick-and-dirty implementation would be fine, just to build motivation and see whether it's worth it.
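A minimal timing harness along these lines could work (a sketch; `cam` and `batch` are placeholders for either branch, and the `torch.cuda.synchronize()` calls matter, since GPU timings are misleading without them):

```python
import time
import torch

def time_cam(cam, batch, warmup=3, iters=20):
    """Rough wall-clock timing for a CAM call on a CUDA batch (sketch)."""
    for _ in range(warmup):      # warm up kernels and the allocator first
        cam(input_tensor=batch)
    torch.cuda.synchronize()     # drain queued GPU work before timing
    start = time.perf_counter()
    for _ in range(iters):
        cam(input_tensor=batch)
    torch.cuda.synchronize()     # ensure all timed work has finished
    return (time.perf_counter() - start) / iters
```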


TRex22 commented Nov 5, 2022

Okay, awesome. I'll put together a very rough fork and see if I can time the difference between master and the branch.


TRex22 commented Nov 5, 2022

Cool, so I have made a proof of concept. It's quite crude.
It appears to be somewhat faster in my research code out of the box (I'll post more concrete results later).

I will continue this later. It's very late where I live, so I'm calling it a night.

I have tried to add comments / documentation as I go. My aim in the draft PR is to get just enough working that I can build some proper before-and-after benchmarks, so we can see whether this is worth implementing.

One potential change is around the use of cv2.resize. I have not found a one-to-one torch equivalent, and at least for my particular use case I managed to remove it. This will have to be looked at more closely when completing the PR.
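For what it's worth, `torch.nn.functional.interpolate` is probably the closest torch-native candidate, though it is not bit-for-bit identical to cv2.resize (border handling and `align_corners` semantics differ):

```python
import torch
import torch.nn.functional as F

# Sketch: resize a batch of (N, H, W) heat-maps on-device with torch instead
# of cv2. Not bit-identical to cv2.resize(..., interpolation=cv2.INTER_LINEAR).
cams = torch.rand(8, 7, 7)                 # dummy low-resolution CAM batch
resized = F.interpolate(
    cams.unsqueeze(1),                     # interpolate expects (N, C, H, W)
    size=(224, 224),
    mode="bilinear",
    align_corners=False,
).squeeze(1)                               # back to (N, H, W)
```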


TRex22 commented Nov 5, 2022

I'm thinking of benchmarking a dataset like Cityscapes on a default ResNet50 or something similar. I'll post the code in the PR when I write it, too.

I want it to be as generic as possible and to make use of the torchvision stuff, so that it's a fair before-and-after comparison.
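Perhaps something along these lines, paired with the timing helper above (a sketch; the Cityscapes root path is a placeholder, and the weights string and CAM constructor details vary across torchvision and library versions):

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import Cityscapes
from pytorch_grad_cam import GradCAM

# Sketch of a generic before-and-after benchmark; paths are placeholders.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.resnet50(weights="IMAGENET1K_V2").to(device).eval()
cam = GradCAM(model=model, target_layers=[model.layer4[-1]])

transform = T.Compose([T.Resize((224, 224)), T.ToTensor()])
dataset = Cityscapes(root="path/to/cityscapes", split="val", transform=transform)
loader = DataLoader(dataset, batch_size=8, num_workers=4)

for images, _ in loader:
    heatmaps = cam(input_tensor=images.to(device))  # time this loop per branch
```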


TRex22 commented Nov 6, 2022

Okay, some good and some bad news.
My benchmark shows it's faster, but only marginally.

I'll do a more extensive benchmark across some more models later.

The reason my own code is noticeably faster is possibly that I use the tensor on the GPU and don't have to convert the NumPy result back into a tensor after computing it, removing all of those operations.
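In other words, the round trip below disappears (a sketch continuing the earlier examples; the on-device variant is hypothetical):

```python
# Today: the result arrives as NumPy on the CPU, so GPU post-processing
# pays for a host-to-device copy on top of the device-to-host one.
heatmaps_np = cam(input_tensor=batch)                # CUDA -> CPU copy inside
heatmaps = torch.from_numpy(heatmaps_np).to(device)  # CPU -> CUDA copy back

# With an on-device result (hypothetical), both copies disappear:
heatmaps = cam(input_tensor=batch)                   # stays a CUDA tensor
```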

I'll create some nice-looking Markdown tables and maybe some graphs later.

I also want to make the benchmark more advanced and try scaling the batch size and other bits to see what happens. Luckily, I have access to a fairly good compute machine (and can schedule far bigger machines if need be).


TRex22 commented Nov 6, 2022

Also, the tests are failing badly right now 😭 ... but that's to be expected.


TRex22 commented Dec 1, 2022

Sorry about going quiet. I'm working full time while writing my thesis draft + some papers, and presenting at a conference in two weeks ...

But my experiments are running slower than expected ... by days, so I will be investing more time into this soonish.
