Hi, I want to know how you profile the CLIP models in https://github.com/mlfoundations/open_clip/blob/main/docs/model_profile.csv, because I can't match the profile results with the tools I tried (e.g. torchsummaryX, thop, and torchinfo). In fact, I got very different results. Among them, I think the result closest to the FLOPs plotted in the CLIP paper Learning Transferable Visual Models From Natural Language Supervision (figure below) is the one from torchinfo, which is 14.04 GFLOPs (mult-adds). I also tried the code provided by @jongwook (openai/CLIP#143 (comment)), but it gave a result of over 161 GFLOPs. According to the model profile log provided by this repo, the computational complexity of CLIP with ViT-B/16 should be 41.09 GFLOPs.
Which profiling tool or library do you use to obtain these profile results? Kindly help me solve this problem.
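For reference, a minimal torchinfo invocation looks roughly like this (a sketch only; the model creation via open_clip.create_model_and_transforms and the tokenizer call are assumptions about the setup, not the exact commands I ran):

```python
# Rough sketch of a torchinfo-based count for open_clip ViT-B/16.
# Random-init weights are fine for mult-add counting.
import torch
import open_clip
from torchinfo import summary

model, _, _ = open_clip.create_model_and_transforms("ViT-B-16")
tokenizer = open_clip.get_tokenizer("ViT-B-16")

image = torch.randn(1, 3, 224, 224)     # one 224x224 image
text = tokenizer(["a photo of a cat"])  # (1, 77) token ids

# torchinfo reports "Total mult-adds"; note that 1 mult-add = 2 FLOPs,
# and matmuls inside nn.MultiheadAttention may not be counted at all,
# which could explain a low number.
summary(model, input_data=[image, text], depth=3)
```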
This was used: https://github.com/mlfoundations/open_clip/blob/main/src/training/profiler.py ... BUT it's not plug and play. With the torch MultiheadAttention module and/or F.sdpa being used, you have to hack/disable things or modify fvcore (which isn't really being maintained anymore) so that the correct values are counted for the attention. Also, not all papers mean FLOPs when they say FLOPs; sometimes it's actually GMACs. The GFLOPs values here are GFLOPs though.
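As a rough illustration of what that script does, a minimal fvcore-based count looks something like this (a sketch under assumptions, not the exact profiler.py code; the model name and tokenizer call are assumed, and as noted above fvcore will skip or warn about unsupported attention ops unless you patch things):

```python
# Minimal sketch: counting FLOPs for open_clip ViT-B/16 with fvcore.
import torch
import open_clip
from fvcore.nn import FlopCountAnalysis

model, _, _ = open_clip.create_model_and_transforms("ViT-B-16")
model.eval()
tokenizer = open_clip.get_tokenizer("ViT-B-16")

image = torch.randn(1, 3, 224, 224)
text = tokenizer(["a photo of a cat"])  # (1, 77) token ids

with torch.no_grad():
    flops = FlopCountAnalysis(model, (image, text))
    total = flops.total()

# fvcore counts one multiply-add as one "flop" for linear/conv/matmul,
# and it reports unsupported ops it skipped (e.g. inside
# nn.MultiheadAttention / F.scaled_dot_product_attention) -- hence the
# need to hack/patch things so the attention matmuls are included.
print(f"{total / 1e9:.2f} G (fvcore count)")
```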
I'm inclined to think the numbers here are good... a rule of thumb is FLOPs ≈ 2 * 12 * num_layers * dim^2 * num_tokens per tower (roughly 2 × params × tokens), and that works out to ~40 GFLOPs for the B/16.
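For a back-of-envelope check of that rule of thumb (standard CLIP ViT-B/16 shapes assumed: image tower 12 layers, width 768, 197 tokens; text tower 12 layers, width 512, 77 tokens):

```python
# Back-of-envelope: FLOPs ≈ 2 * 12 * num_layers * dim^2 * num_tokens per tower
def tower_gflops(num_layers, dim, num_tokens):
    return 2 * 12 * num_layers * dim**2 * num_tokens / 1e9

image = tower_gflops(12, 768, 197)  # ViT-B/16 image tower: ~33.5 GFLOPs
text = tower_gflops(12, 512, 77)    # CLIP text tower:       ~5.8 GFLOPs
print(round(image + text, 1))       # ~39.3 GFLOPs, close to the 41.09 in the csv
```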