I'd like to make some performance-related changes #356
Comments
Hey,
Okay, awesome. I'll put together a very rough fork and see if I can time the difference between master and the branch.
Cool, so I have made a proof of concept. It's quite crude; I will continue this later. It's very late where I live, so I'm calling it a night. I have tried to add comments / documentation as I go. My aim in the draft PR is to get just enough working that I can build some proper benchmarks before and after, so we can see whether this is worth implementing. One potential change is around the use of …
I'm thinking of benchmarking a dataset like Cityscapes on a default ResNet50 or something similar. I'll post the code in the PR when I write it too. I want it to be as generic as possible and make use of the torchvision stuff, so that it's a fair before-and-after comparison.
Okay, some good and some bad news. I'll do a more extensive benchmark across some more models later. The reason my own code is noticeably faster is possibly that I use the tensor on the GPU and don't have to convert the NumPy result back into a tensor after computing it, removing all those operations. I'll create some nice-looking markdown tables and maybe some graphs later. I also want to make the benchmark more advanced and try scaling batch size and other bits to see what happens. I luckily have access to a fairly good compute machine (and can schedule way bigger machines if need be).
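The before/after timing described above could be done with a small harness along these lines (a sketch with made-up workloads, not the benchmark from the PR; `benchmark` is a hypothetical helper name):

```python
import statistics
import time

def benchmark(fn, warmup=3, repeats=10):
    """Return the median wall-clock seconds per call of fn(), after warmup calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Stand-in workloads; with real CAM code each callable would run one full
# forward pass + CAM computation. Note: for CUDA work the callable must end
# with torch.cuda.synchronize(), otherwise the async GPU time is not counted.
baseline = benchmark(lambda: sum(i * i for i in range(20_000)))
candidate = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"baseline={baseline:.6f}s candidate={candidate:.6f}s")
```

Using the median rather than the mean keeps one slow outlier run (e.g. a GC pause) from skewing the comparison.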
Also, the tests are badly failing right now 😭 ... but that's to be expected.
Sorry about going quiet. I'm working full time and need to write my thesis draft plus some papers, and I'm presenting at a conference in two weeks ... My experiments are also slower than expected ... by days, so I will be investing more time into this soonish.
Hey there,
In my research I need to generate a lot of GradCAM heat-maps, which I then do further processing on. I'm using the `GradCAM` class, as I don't need the other implementations right now, although I may decide to use some in the future. I've performance-tuned my work as much as I can. Using the torch profiler, I discovered the greatest cost is due to moving data between the CUDA device and the CPU. I'm already batching the GradCAM calls (and found a batch size of 8 seemed to be the most performant; more has diminishing returns).
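For context, the post-copy work being discussed is roughly of this shape: once activations and gradients have been pulled to the host with `.cpu().numpy()`, the heatmap is assembled with NumPy ops. A simplified sketch (not the library's exact code):

```python
import numpy as np

def cam_from_numpy(activations, grads):
    """Simplified GradCAM combination on CPU NumPy arrays.

    activations, grads: (batch, channels, H, W) arrays that were already
    copied off the GPU -- the device-to-host copy this issue proposes to avoid.
    """
    weights = grads.mean(axis=(2, 3))                            # (batch, channels)
    cam = (weights[:, :, None, None] * activations).sum(axis=1)  # (batch, H, W)
    cam = np.maximum(cam, 0)                                     # ReLU
    # per-image min-max normalisation into [0, 1]
    flat = cam.reshape(cam.shape[0], -1)
    mins = flat.min(axis=1)[:, None, None]
    maxs = flat.max(axis=1)[:, None, None]
    return (cam - mins) / (maxs - mins + 1e-7)

acts = np.random.rand(8, 16, 7, 7).astype(np.float32)
grads = np.random.rand(8, 16, 7, 7).astype(np.float32)
heatmaps = cam_from_numpy(acts, grads)
print(heatmaps.shape)  # (8, 7, 7)
```

Every one of these array operations has a direct torch counterpart, which is what makes the on-device proposal below plausible.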
Where I think there is room for improvement is allowing the `GradCAM` class, and maybe others, to use the torch versions of the NumPy functions, to limit when data is moved off the device. I believe many of the operations could be done before moving the results back, which should give a nice speedup. And in my case, getting back a tensor result on the device rather than a NumPy result on the CPU would help my work. I'm happy to investigate implementing this as an optional / configurable change, if that's more agreeable, so that you have to opt in somehow. I just wanted to know whether that's something others would like before I fork and do it, as I don't want to have to maintain a fork that diverges from the base work.
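As a sketch of what "torch versions of the NumPy functions" could look like, here is the same simplified GradCAM combination written with torch ops only, so it runs on whatever device the tensors already live on (assumed shapes; not the library's actual implementation):

```python
import torch

def cam_from_torch(activations, grads):
    """Simplified GradCAM combination in pure torch ops.

    Because no .cpu() / .numpy() call appears, the result stays on the
    input tensors' device (CPU here, CUDA if the inputs are on CUDA).
    torch mirrors most of the NumPy API used: mean -> Tensor.mean,
    np.maximum(x, 0) -> torch.clamp_min, reshape -> Tensor.flatten.
    """
    weights = grads.mean(dim=(2, 3))                             # (batch, channels)
    cam = (weights[:, :, None, None] * activations).sum(dim=1)   # (batch, H, W)
    cam = torch.clamp_min(cam, 0)                                # ReLU
    flat = cam.flatten(1)
    mins = flat.min(dim=1).values[:, None, None]
    maxs = flat.max(dim=1).values[:, None, None]
    return (cam - mins) / (maxs - mins + 1e-7)

acts = torch.rand(8, 16, 7, 7)
grads = torch.rand(8, 16, 7, 7)
out = cam_from_torch(acts, grads)
print(out.shape, out.device)
```

Only the final consumer (e.g. visualisation code) would then need a single `.cpu().numpy()` call, instead of one copy per intermediate step.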
I'd have to make changes to the `BaseCam` and `GradCAM` classes at a minimum. My time right now is also very limited, which is why I wanted to ask whether there is an appetite for these changes, as I might get a working concept that does what I need but not be able to change all the other CAM implementations too.
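The opt-in mentioned above could stay backwards compatible with a single flag: by default the result is still copied to a NumPy array, and only callers who ask for it get the on-device tensor. A minimal illustration (the class, the flag name, and the tensor stub are all hypothetical, not pytorch-grad-cam's API):

```python
import numpy as np

class _TensorStub:
    """Stand-in for a torch tensor so this sketch runs without torch installed."""
    def __init__(self, data):
        self.data = data
    def cpu(self):
        return self
    def numpy(self):
        return self.data

def compute_cam(result, output_as_tensor=False):
    """Hypothetical opt-in: the default keeps today's NumPy-on-CPU output;
    output_as_tensor=True returns the on-device tensor untouched."""
    if output_as_tensor:
        return result            # no device-to-host copy
    return result.cpu().numpy()  # legacy behaviour, unchanged for existing users

cam = _TensorStub(np.zeros((8, 7, 7), dtype=np.float32))
legacy = compute_cam(cam)                       # NumPy array, as today
fast = compute_cam(cam, output_as_tensor=True)  # tensor stays on device
print(type(legacy).__name__, type(fast).__name__)
```

Defaulting the flag to `False` means no existing caller's code changes, which matches the "you have to opt in somehow" constraint from the proposal.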