Some models will calculate 'nan' #36
Comments
Hi @LuckGuySam, Thanks for reporting this! I just tried on my end and managed to reproduce the issue. As soon as I manage to find the source of the problem, I'll let you know!
Hey there @LuckGuySam, The problem should be taken care of on master now 👌 Let me know if you encounter the problem again!
Hey @frgfm, I have a follow-up question about forcing models into eval mode. I think eval mode is necessary because many models still have dropout layers, like the VGG models. But if I use a VGG model, I'd need to modify the forward function of the architecture, and that's not a good method, right?
@LuckGuySam, are you still having the error? I tried on my end, and the problem is solved. Generally speaking, the problem with staying in training mode is that some layers update their buffers in that mode (batch norm, for instance). As you saw yourself, switching to eval only impacts some methods (namely the ScoreCAMs). Whenever you can, switch the model to eval mode before extracting the CAM. It's not a software design choice; it's based on the theoretical aspect, since training mode changes the model's behaviour. But again, it depends on what you're doing.
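The batch-norm point above can be illustrated with a minimal pure-Python mock (not PyTorch's actual implementation): in training mode, a forward pass mutates the running statistics as a side effect, while eval mode leaves them untouched.

```python
class MockBatchNorm:
    """Toy stand-in for batch norm: tracks a momentum-style running mean."""

    def __init__(self, momentum=0.1):
        self.momentum = momentum
        self.running_mean = 0.0
        self.training = True

    def eval(self):
        self.training = False
        return self

    def forward(self, batch):
        if self.training:
            batch_mean = sum(batch) / len(batch)
            # Training mode: the buffer is updated as a side effect
            self.running_mean = ((1 - self.momentum) * self.running_mean
                                 + self.momentum * batch_mean)
        # Normalization uses the stored statistics (variance omitted for brevity)
        return [x - self.running_mean for x in batch]


bn = MockBatchNorm()
bn.forward([1.0, 2.0, 3.0])   # training mode: running_mean moves toward 2.0
print(bn.running_mean)         # 0.2
bn.eval()
bn.forward([1.0, 2.0, 3.0])   # eval mode: running_mean stays put
print(bn.running_mean)         # still 0.2
```

This is why extracting a CAM in training mode is not just a style choice: every forward pass quietly changes the model's state.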
Additionally, VGG is a cumbersome fellow for old CAM methods (because it lacks global pooling), so you won't be able to use base CAM on it. During my implementations, I had some time to compare the speed of each method, and I'd argue that, using the default paper parameters of each method, SmoothGradCAMpp is the best option: no problem with models that lack a global pooling layer, freaking fast, and it doesn't require a bunch of forward passes to fit in memory (unlike the ScoreCAMs). I hope this helps!
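For context on why base CAM needs global pooling: its class weights come from the fully connected layer that directly follows a global average pooling layer, and the CAM itself is just a weighted sum of the last feature maps. VGG-style networks have no such GAP-then-FC head, so those weights simply don't exist for them. A minimal pure-Python sketch with made-up values (not the torchcam implementation):

```python
def base_cam(feature_maps, fc_weights):
    """Base CAM: weighted sum of the last conv feature maps, where the
    weights are the FC-layer weights for the target class. Only valid when
    the FC layer sits right after global average pooling."""
    num_maps = len(fc_weights)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [
            sum(fc_weights[c] * feature_maps[c][i][j] for c in range(num_maps))
            for j in range(width)
        ]
        for i in range(height)
    ]


# Two hypothetical 2x2 feature maps and per-map class weights
maps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
print(base_cam(maps, [1.0, 0.5]))  # [[1.0, 1.0], [1.0, 1.0]]
```

Gradient-based variants like SmoothGradCAMpp estimate the map weights from gradients instead, which is why they work on architectures without global pooling.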
@frgfm, I tried your new code and still get NaN on resnet34 with SSCAM and ISCAM. Then I looked at your code, and it has a NaN check only on Grad-CAM, is that right? Your files "cam.py" and "gradcam.py" have different modification times, did you update the new "cam.py" file? Thank you for the answer about the mode choice, I think you are correct on that part!!
@LuckGuySam I'll investigate resnet34 again this weekend, but I had no issues at all with resnet18 & resnet50 (didn't check back on resnet34, I must admit) after changing the mode forcing in the script! Not exactly, I added a NaN check for all CAMs in the unittests. The only CAM that consistently produced NaNs was Grad-CAM, so I fixed that issue, which was due to normalization. I'm not sure what you mean by modifying cam.py 😅 If you want to open a PR, I'll review it happily, but I really don't see what you mean. I'll check resnet34, but again, for mobilenet, resnet18, and resnet50, everything is working well on my end. (For obvious time constraints, I can't run unittests on each CAM for every torchvision model.)
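For illustration, the kind of normalization bug that yields NaNs, and a common epsilon-style guard against it, can be sketched in pure Python (the actual fix in the repository may differ, and the epsilon value here is hypothetical):

```python
EPS = 1e-8  # hypothetical epsilon; the value used in the repo may differ


def normalize(cam):
    """Min-max normalize a flat CAM to [0, 1].

    Without the epsilon guard, a constant map gives (x - min) / 0, which is
    exactly the kind of 0/0 division that produces NaN on float tensors.
    """
    lo, hi = min(cam), max(cam)
    return [(x - lo) / (hi - lo + EPS) for x in cam]


print(normalize([0.0, 0.5, 1.0]))  # ~[0.0, 0.5, 1.0]
print(normalize([0.3, 0.3, 0.3]))  # all zeros instead of NaN
```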
@frgfm First of all, thank you for your help! I know where I made a mistake! Second, can you give some recommendations for when I run into NaN again? For some reasons, I need to test models that aren't official torch models, is it good practice to simply ignore NaN in the calculation?
@LuckGuySam you're very welcome! Don't get me wrong, I prefer GitHub issues that leave room for fixes/improvements rather than praise haha It strongly depends on the framework that is being used, but here is how I see things:
Again, this heavily relies on the assumption that there is no implementation error. On the question of what to do with NaN in CAMs:
I hope this helped!
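On the "should I just ignore NaNs" question, one hedged middle ground, sketched in pure Python (the function name and warning behaviour are illustrative, not part of torchcam): zero out the NaNs so the CAM can still be visualized, but count them so an upstream bug (e.g. a zero-range normalization) doesn't go unnoticed.

```python
import math


def sanitize_cam(cam, warn=True):
    """Replace NaN entries with 0.0 so the CAM can still be visualized.

    Silently ignoring NaNs can mask an upstream bug, so at minimum the
    occurrences are counted and reported.
    """
    nan_count = sum(1 for v in cam if math.isnan(v))
    if warn and nan_count:
        print(f"warning: {nan_count} NaN value(s) zeroed out")
    return [0.0 if math.isnan(v) else v for v in cam]


print(sanitize_cam([0.2, float("nan"), 0.8]))
```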
A late note, but I also had this problem. It is caused by the sum() within _cam. Not sure why, but I replaced it with torch.nansum. I think it is caused by an overflow, and instead of returning zeros it shows us NaN.
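For readers unfamiliar with the workaround above: torch.nansum treats NaN entries as zero when summing, whereas a plain sum propagates any NaN to the result. A pure-Python equivalent makes the difference explicit (note this silences the symptom rather than fixing whatever produced the NaNs):

```python
import math


def nansum(values):
    """Pure-Python analogue of torch.nansum: skip NaN entries when summing."""
    return sum(v for v in values if not math.isnan(v))


vals = [1.0, float("nan"), 2.0]
print(sum(vals))     # nan -- plain sum propagates the NaN
print(nansum(vals))  # 3.0 -- NaN entries are skipped
```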
I tried your code with your example picture on resnet34 and resnet50, and ScoreCAM, SSCAM, and ISCAM calculate 'nan'.
Can you help me solve this problem?
I tried on torch 1.5.