SigLIP impl #634
Conversation
Can't comment on the distributed part of the code as I don't know that part of PyTorch, but the rest (loss details, bias/temp/inits) LGTM.
@lucasb-eyer thanks for taking a look. Yeah, the dist part is where a lot of the risk is, but it seems to be behaving on local cc12m runs comparing single-GPU to 4x GPU.
FYI: in our code, Basil implemented a small unit test checking the two formulations for "almost-equalness" of chunked vs non-chunked, which gave us good reassurance in the implementation (plus looking at the profiler for memory use).
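For illustration, a minimal NumPy sketch of what such an almost-equalness check could look like: a full-matrix sigmoid loss vs. a version that processes text features in chunks (a stand-in for the distributed/chunked formulation). Function names here are hypothetical, not the actual test from either codebase; the temperature/bias init values follow the SigLIP paper (t' = log 10, b = -10).

```python
import numpy as np

def log_sigmoid(x):
    # numerically stable log(sigmoid(x))
    return np.where(x >= 0, -np.log1p(np.exp(-x)), x - np.log1p(np.exp(x)))

def siglip_loss_full(img, txt, t, b):
    # full n x n pairwise logits; labels are +1 on the diagonal, -1 elsewhere;
    # loss is normalized by batch size n, as in the SigLIP paper
    n = img.shape[0]
    logits = t * img @ txt.T + b
    labels = 2.0 * np.eye(n) - 1.0
    return -np.sum(log_sigmoid(labels * logits)) / n

def siglip_loss_chunked(img, txt, t, b, n_chunks):
    # same loss, but accumulated over chunks of the text batch
    n = img.shape[0]
    total = 0.0
    for txt_chunk, cols in zip(np.array_split(txt, n_chunks),
                               np.array_split(np.arange(n), n_chunks)):
        logits = t * img @ txt_chunk.T + b          # (n, chunk_size)
        labels = -np.ones_like(logits)
        labels[cols, np.arange(len(cols))] = 1.0    # mark matching pairs
        total += -np.sum(log_sigmoid(labels * logits))
    return total / n

# random unit-norm features, paper-style inits (hypothetical test fixture)
rng = np.random.default_rng(0)
img = rng.normal(size=(16, 32)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(16, 32)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
t, b = np.exp(np.log(10.0)), -10.0

full = siglip_loss_full(img, txt, t, b)
chunked = siglip_loss_chunked(img, txt, t, b, n_chunks=4)
assert np.allclose(full, chunked)
```

The real distributed implementation additionally has to shuttle chunks between ranks, which a single-process test like this deliberately sidesteps.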
I've tested
Will merge shortly to prevent this getting stale.
Re #618