NLLB-CLIP with SigLIP + small tokenizer fix #741

visheratin · 2023-11-18T22:47:58Z

Hi! I trained NLLB-CLIP models with SigLIP (ViT and loss). They perform much better than the previous version across all benchmarks.

I'm also working on integrating the multilingual benchmarks from the paper into the CLIP benchmark. To make it work with the NLLB tokenizer, I had to change the tokenizer method to batch_encode_plus because the default __call__ doesn't take language-specific prefix tokens into account.

rom1504 · 2023-11-18T23:08:04Z

Could you share results? They would also fit well in the readme or its referenced files

…

On Sat, Nov 18, 2023, 23:48 Alexander Visheratin ***@***.***> wrote: Hi! I trained NLLB-CLIP models with SigLIP (ViT and loss). They perform much better than the previous version across all benchmarks. I'm also working on integrating the multilingual benchmarks from the paper <https://arxiv.org/abs/2309.01859> into the CLIP benchmark <https://github.com/LAION-AI/CLIP_benchmark>. To make it work with the NLLB tokenizer, I had to change the tokenizer method to batch_encode_plus because the default __call__ doesn't take language-specific prefix tokens into account. ------------------------------ You can view, comment on, or merge this pull request online at: #741 Commit Summary - faf7f75 <faf7f75> Added configs. - d657bc3 <d657bc3> Added links to pretrained models. - 69ddb46 <69ddb46> Merge branch 'main' into main - 94cbf14 <94cbf14> Add NLLB-CLIP base/large results - e0c8e63 <e0c8e63> Merge branch 'main' of https://github.com/visheratin/open_clip into main - b41e962 <b41e962> Merge branch 'main' of https://github.com/visheratin/open_clip into main - b14386c <b14386c> Added new version of NLLB-CLIP. File Changes (4 files <https://github.com/mlfoundations/open_clip/pull/741/files>) - *A* src/open_clip/model_configs/nllb-clip-base-siglip.json <https://github.com/mlfoundations/open_clip/pull/741/files#diff-80ed12b868285320e888ad515f39b6090eeb2c0a101364cb14ded2474f4a5e87> (18) - *A* src/open_clip/model_configs/nllb-clip-large-siglip.json <https://github.com/mlfoundations/open_clip/pull/741/files#diff-9c68def1ad4f68ad3959fa464bdfe3c154086142f4acd39eb36cf34c88ae1501> (18) - *M* src/open_clip/pretrained.py <https://github.com/mlfoundations/open_clip/pull/741/files#diff-321c5632a009a9b3bc66a731132fc10db2df82f78c6cdc84081e3cc9e10e013a> (17) - *M* src/open_clip/tokenizer.py <https://github.com/mlfoundations/open_clip/pull/741/files#diff-d9902132a1bebeb30786ee67a47565c086ad5c9b639e75b89cbdefec6e081821> (7) Patch Links: - https://github.com/mlfoundations/open_clip/pull/741.patch - https://github.com/mlfoundations/open_clip/pull/741.diff — Reply to this email directly, view it on GitHub <#741>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAR437UJPRFGCKKJZKMFBNLYFE3KVAVCNFSM6AAAAAA7RIUSY6VHI2DSMVQWIX3LMV43ASLTON2WKOZSGAYDANRQHAZDMMI> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

visheratin · 2023-11-19T00:37:21Z

Here are the results for Crossmodal-3600 and XTD10 datasets. I didn't evaluate the models on English-only datasets. I think it may make sense to add a separate benchmark CSV file for multilingual models to the docs.

rom1504 · 2023-11-19T00:55:01Z

I think it would be interesting to see how it compares to vit-h/14 multilingual in this repo yes it could make sense to have a different csv for multilingual models indeed

…

On Sun, Nov 19, 2023 at 1:37 AM Alexander Visheratin < ***@***.***> wrote: Here <https://github.com/mlfoundations/open_clip/files/13401786/nllb-clip.csv> are the results for Crossmodal-3600 and XTD10 datasets. I didn't evaluate the models on English-only datasets. I think it may make sense to add a separate benchmark CSV file for multilingual models to the docs. — Reply to this email directly, view it on GitHub <#741 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAR437UHJ66ZAKWG7NF42PLYFFIEXAVCNFSM6AAAAAA7RIUSY6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMJXG4YDCNBXGI> . You are receiving this because you commented.Message ID: ***@***.***>

visheratin · 2023-11-19T01:04:14Z

Can you tell me its model id and pretrained name? I have the testbed set up and running right now.

Regarding other benchmarks, NLLB-CLIP base and large outperform SigLIP ViT-G (page 16) on text-to-image.

visheratin · 2023-11-19T01:39:08Z

@gabrielilharco can you share the script you use to create benchmark CSV files for the repo (like this) from CLIP benchmark outputs?

rom1504 · 2023-11-19T03:09:21Z

xlm-roberta-large-ViT-H-14

visheratin · 2023-11-19T05:39:17Z

Here is the file. Very impressive results! NLLB-CLIP large is a bit better on text-to-image. I'm wondering why there is such a discrepancy between t2i and i2t results for my models. Maybe undertrained text encoder.

rom1504 · 2023-11-20T20:15:34Z

thanks, interesting indeed!

The PR LGTM

I think adding a mention on this new model in this section https://github.com/mlfoundations/open_clip/blob/main/docs/PRETRAINED.md#nllb would be good, so people have a change to discover the model (by looking at openclip doc)

rom1504 · 2023-11-20T20:22:21Z

this is how I trained this xlm-roberta-large-ViT-H-14 model https://github.com/mlfoundations/open_clip/blob/main/docs/PRETRAINED.md#vit-h14-xlm-roberta-large

looking at your paper https://arxiv.org/pdf/2309.01859.pdf the freezing method seems a bit similar; I froze the image encoder but not the text encoder
However I trained on very different data: noisy multilingual captions from Laion5B (VS the much cleaner but smaller dataset you used)

I see you evaluate only on retrieval. I had evaluated as well on imagenet with translated class names, and found the model to perform better than previous ones but in absolute numbers really poorly. (for example 56% in italian while the model gets 78% for the same in english). I am not sure what is the cause but that may be of interest to you

(just FYI)

visheratin · 2023-11-21T04:46:28Z

Thanks! I added a bit more info on NLLB-CLIP to the doc. I'll add more info about evals when I figure out how to make the eval CSV file readable - it has too many dimensions (language, i2t/t2i, recall@k).

visheratin · 2023-11-21T04:50:30Z

Regarding tasks, my original interest when starting the project was in multilingual retrieval. Because of that, I evaluated the model only on this task.

I'll work on compiling something like ImageNet-200 when I have time.

gabrielilharco · 2023-11-21T15:18:08Z

@visheratin looks from my end too. My scripts for running evals on the 38 datasets still need some cleaning up, I plan to do that in the future and push them so everyone can run it easily. Meanwhile, I'm running evals for the 2 new models, and will update here with the results once it's done

visheratin · 2023-11-21T16:19:20Z

@gabrielilharco thank you! In the meantime, I will compile a CSV for multilingual retrieval for NLLB-CLIP and XLM-RoBERTa.

gabrielilharco · 2023-11-21T22:44:18Z

@visheratin I added eval results and profiling numbers for the new models

visheratin · 2023-11-22T03:27:29Z

Thanks! The models are still far from the top of the dashboard but they are 10% better than the first version =)

@gabrielilharco I just added a CSV with the benchmark results for NLLB-CLIP and XLM RoBERTa. Can you please take a look?

gabrielilharco · 2023-11-22T13:53:20Z

Thanks @visheratin! Can you make the numbers in the numbers in the new csv have less significant digits? E.g. 0.8569999933242798 becomes 0.8570. I think it's a bit easier to read this way.

It would be nice to add the other nllb models to that table as well. Ideally all models actually, would this be too expensive for you to run? If so, I can try running on my end if you share the scripts.

visheratin · 2023-11-22T16:45:34Z

@gabrielilharco I updated the CSV file with the fixed numbers.

Regarding benchmarking all models, I've reached my quota on GCP, where the test bench is deployed, so I'll be able to run full tests only in December, when the quota resets. To run the tests, you'd need the CLIP benchmark version from that PR, which is dependent on this PR.

I propose to wait with the multilingual benchmark CSV until we have the results for all models. I'll remove the CSV from this PR and will create a separate PR when I have all the results. What do you think about it?

gabrielilharco · 2023-11-22T16:51:10Z

Sounds good to me. Thanks @visheratin!

BIGBALLON · 2023-11-23T09:41:15Z

Hi, @visheratin , thanks for your great work, Is there any plan to add NLLB-CLIP models(with SigLIP) to timm

visheratin · 2023-11-23T13:55:38Z

As far as I remember, timm is a pure CV library. The image encoder used in NLLB-CLIP with SigLIP exists in timm, if you want to use it. The best way to use NLLB-CLIP models is via OpenCLIP (when the next version will be released).

BIGBALLON · 2023-11-24T09:50:06Z

@visheratin yea, thanks, looking forward to the next verison of open_clip.

rom1504 · 2023-11-24T10:01:46Z

You can make a PR like this if you want a release #679 That creates a pipy release on merge

…

On Fri, Nov 24, 2023, 10:50 WILL LEE ***@***.***> wrote: @visheratin <https://github.com/visheratin> yea, thanks, looking forward to the next verison of open_clip. — Reply to this email directly, view it on GitHub <#741 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAR437SCR4K5TTJJTTS3IX3YGBUVRAVCNFSM6AAAAAA7RIUSY6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRVGQYDOMJWGY> . You are receiving this because you commented.Message ID: ***@***.***>

visheratin · 2023-11-24T12:57:32Z

@rom1504 good to know, thank you! I'll wait until I benchmark all models on multilingual retrieval datasets and then create a release PR.

* Added configs. * Added links to pretrained models. * Add NLLB-CLIP base/large results * Added new version of NLLB-CLIP. * Added more info on NLLB-CLIP. * add eval results and profiling * Added file with benchmarks. * Fixed CSV file. * Updated CSV file. --------- Co-authored-by: Gabriel Ilharco Magalhães <gabrielilharco@users.noreply.github.com> Co-authored-by: Gabriel Ilharco <gabriel.ilharco@gmail.com>

visheratin and others added 7 commits October 5, 2023 13:52

Added configs.

faf7f75

Added links to pretrained models.

d657bc3

Merge branch 'main' into main

69ddb46

Add NLLB-CLIP base/large results

94cbf14

Merge branch 'main' of https://github.com/visheratin/open_clip into main

e0c8e63

Merge branch 'main' of https://github.com/visheratin/open_clip into main

b41e962

Added new version of NLLB-CLIP.

b14386c

visheratin mentioned this pull request Nov 19, 2023

Multilingual benchmark datasets LAION-AI/CLIP_benchmark#113

Merged

Added more info on NLLB-CLIP.

23aa019

add eval results and profiling

60aaa3a

visheratin added 2 commits November 21, 2023 22:24

Added file with benchmarks.

e4816a6

Merge branch 'main' of https://github.com/visheratin/open_clip into main

e26ba9a

Fixed CSV file.

100bcbd

Updated CSV file.

3747990

gabrielilharco merged commit 29b90b8 into mlfoundations:main Nov 22, 2023
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NLLB-CLIP with SigLIP + small tokenizer fix #741

NLLB-CLIP with SigLIP + small tokenizer fix #741

visheratin commented Nov 18, 2023

rom1504 commented Nov 18, 2023 via email

visheratin commented Nov 19, 2023

rom1504 commented Nov 19, 2023 via email

visheratin commented Nov 19, 2023

visheratin commented Nov 19, 2023

rom1504 commented Nov 19, 2023

visheratin commented Nov 19, 2023

rom1504 commented Nov 20, 2023

rom1504 commented Nov 20, 2023

visheratin commented Nov 21, 2023

visheratin commented Nov 21, 2023

gabrielilharco commented Nov 21, 2023 •

edited

Loading

visheratin commented Nov 21, 2023

gabrielilharco commented Nov 21, 2023

visheratin commented Nov 22, 2023

gabrielilharco commented Nov 22, 2023

visheratin commented Nov 22, 2023

gabrielilharco commented Nov 22, 2023

BIGBALLON commented Nov 23, 2023

visheratin commented Nov 23, 2023

BIGBALLON commented Nov 24, 2023

rom1504 commented Nov 24, 2023 via email

visheratin commented Nov 24, 2023

NLLB-CLIP with SigLIP + small tokenizer fix #741

NLLB-CLIP with SigLIP + small tokenizer fix #741

Conversation

visheratin commented Nov 18, 2023

rom1504 commented Nov 18, 2023 via email

visheratin commented Nov 19, 2023

rom1504 commented Nov 19, 2023 via email

visheratin commented Nov 19, 2023

visheratin commented Nov 19, 2023

rom1504 commented Nov 19, 2023

visheratin commented Nov 19, 2023

rom1504 commented Nov 20, 2023

rom1504 commented Nov 20, 2023

visheratin commented Nov 21, 2023

visheratin commented Nov 21, 2023

gabrielilharco commented Nov 21, 2023 • edited Loading

visheratin commented Nov 21, 2023

gabrielilharco commented Nov 21, 2023

visheratin commented Nov 22, 2023

gabrielilharco commented Nov 22, 2023

visheratin commented Nov 22, 2023

gabrielilharco commented Nov 22, 2023

BIGBALLON commented Nov 23, 2023

visheratin commented Nov 23, 2023

BIGBALLON commented Nov 24, 2023

rom1504 commented Nov 24, 2023 via email

visheratin commented Nov 24, 2023

gabrielilharco commented Nov 21, 2023 •

edited

Loading