
How do you profile the CLIP models #902

Closed

X-funbean opened this issue Jun 28, 2024 · 1 comment

Comments

@X-funbean

Hi, I want to know how you profiled the CLIP models in https://github.com/mlfoundations/open_clip/blob/main/docs/model_profile.csv, because I can't match those results with the tools I tried (e.g. torchsummaryX, thop, and torchinfo). In fact, I got very different results. Among them, the closest result to the FLOPs plotted in the CLIP paper Learning Transferable Visual Models From Natural Language Supervision (figure below) came from torchinfo, which reported 14.04 GFLOPs (mult-adds). I also tried the code provided by @jongwook (openai/CLIP#143 (comment)), but it gave a result of over 161 GFLOPs. According to the model profile log provided by this repo, the computational complexity of CLIP with ViT-B/16 should be 41.09 GFLOPs.

Which profiling tool or library did you use to obtain these results? Kindly help me solve this problem.

[image: FLOPs figure from the CLIP paper]
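For reference, here is roughly the kind of torchinfo check I ran (a minimal sketch; the exact model creation and input shapes here are assumptions, not my full script):

```python
import torch
import open_clip
from torchinfo import summary

# Build an untrained ViT-B/16 CLIP model (weights don't affect FLOP counts)
model, _, _ = open_clip.create_model_and_transforms('ViT-B-16')
model.eval()

image = torch.randn(1, 3, 224, 224)
text = torch.randint(0, 49408, (1, 77), dtype=torch.long)  # 49408 = CLIP BPE vocab size

# torchinfo reports "mult-adds" (MACs, not FLOPs); matmuls inside
# nn.MultiheadAttention / F.scaled_dot_product_attention may not be counted
summary(model, input_data=(image, text), depth=2)
```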

@rwightman
Collaborator

This was measured with https://github.com/mlfoundations/open_clip/blob/main/src/training/profiler.py ... BUT it's not plug and play: with the torch MultiheadAttention module and/or F.scaled_dot_product_attention being used, you have to hack/disable things or modify fvcore (which isn't really being maintained) so that the correct values are counted for the attention. Also, not all papers mean FLOPs when they say FLOPs; sometimes it's actually GMACs. The GFLOPS values here are GFLOPS though.
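Something along these lines (a minimal sketch on top of fvcore, which is what profiler.py builds on; not the exact invocation):

```python
import torch
import open_clip
from fvcore.nn import FlopCountAnalysis

model, _, _ = open_clip.create_model_and_transforms('ViT-B-16')
model.eval()

image = torch.randn(1, 3, 224, 224)
text = torch.randint(0, 49408, (1, 77), dtype=torch.long)

with torch.no_grad():
    counter = FlopCountAnalysis(model, (image, text))
    # fvcore's "flops" are really multiply-accumulates (MACs);
    # double them for the FLOPs convention used in model_profile.csv
    macs = counter.total()
    print(f"{macs / 1e9:.2f} GMACs  ~= {2 * macs / 1e9:.2f} GFLOPs")
```

As noted above, without patching, attention inside nn.MultiheadAttention / F.scaled_dot_product_attention is opaque to fvcore and will be undercounted.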

I'm inclined to think the numbers here are good... the rule of thumb is 2 * 12 * num_layers * dim^2 FLOPs per token (2 FLOPs per MAC over the ~12 * num_layers * dim^2 transformer params), and summed over the tokens of both towers that works out to ~40 GFLOPs for the B/16.
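The back-of-envelope math, for reference (tower shapes are the standard CLIP ViT-B/16 configs, assumed here):

```python
def tower_gflops(num_layers, dim, num_tokens):
    # ~12 * num_layers * dim^2 params in the transformer blocks,
    # 2 FLOPs per MAC, applied once per token
    return 2 * 12 * num_layers * dim**2 * num_tokens / 1e9

image_gflops = tower_gflops(12, 768, 14 * 14 + 1)  # 224px image, 16px patches -> 197 tokens, ~33.5
text_gflops = tower_gflops(12, 512, 77)            # 77-token context, ~5.8
print(f"{image_gflops + text_gflops:.1f} GFLOPs")  # ~39.3, close to the 41.09 in model_profile.csv
```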

@mlfoundations mlfoundations locked and limited conversation to collaborators Jun 28, 2024
@rwightman rwightman converted this issue into discussion #904 Jun 28, 2024

