What are the differences between base_coco and large_coco model types for blip_caption? #22

Closed
gschurck opened this issue Oct 16, 2022 · 4 comments

Comments

@gschurck
Contributor

No description provided.

@dxli94
Contributor

dxli94 commented Oct 17, 2022

Hi @gschurck, thanks for your interest.

base_coco is the BLIP_base finetuned on COCO; large_coco is the BLIP_large finetuned on COCO.

BLIP_base uses ViT_base, BLIP_large uses ViT_large.
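
A minimal sketch of how the two model types might be selected, assuming the LAVIS `load_model_and_preprocess` loader as shown in the project README. The helper name `load_blip_captioner` and the `VISION_BACKBONES` table are hypothetical and only restate the mapping described above; they are not part of the library.

```python
# Hypothetical lookup table restating the answer above; not a LAVIS API.
VISION_BACKBONES = {
    "base_coco": "ViT_base",    # BLIP_base finetuned on COCO
    "large_coco": "ViT_large",  # BLIP_large finetuned on COCO
}


def load_blip_captioner(model_type="base_coco", device="cpu"):
    """Load a BLIP captioning model and its eval-time image preprocessor.

    Hypothetical convenience wrapper; assumes LAVIS is installed and that the
    checkpoint for the requested model type can be downloaded.
    """
    if model_type not in VISION_BACKBONES:
        raise ValueError(
            f"unknown model_type {model_type!r}; expected one of {sorted(VISION_BACKBONES)}"
        )

    # Imported lazily so the lookup table can be used without LAVIS installed.
    from lavis.models import load_model_and_preprocess

    model, vis_processors, _ = load_model_and_preprocess(
        name="blip_caption", model_type=model_type, is_eval=True, device=device
    )
    return model, vis_processors["eval"]
```

Switching between the two variants is then a matter of changing the `model_type` string, for example `load_blip_captioner("large_coco")`; the rest of the captioning code stays the same.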

Thanks.

@gschurck
Contributor Author

Okay, are they directly related to the beam search or nucleus sampling algorithms?

@dxli94
Contributor

dxli94 commented Oct 31, 2022

> Okay, are they directly related to the beam search or nucleus sampling algorithms?

Hi @gschurck,

No, base_coco and large_coco are related to model size. Both base_coco and large_coco support beam search and nucleus sampling.

In practice, we found that large_coco achieves better captioning metrics (higher-quality captions).
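
To illustrate that the decoding strategy is chosen at generation time rather than by the model type, here is a rough sketch. It assumes the `generate` method of the `blip_caption` model accepts the keyword arguments `use_nucleus_sampling`, `num_beams`, and `top_p`, as it appears to in LAVIS at the time of writing; the helper names `decoding_kwargs` and `caption` are hypothetical, and the values 3 and 0.9 are illustrative rather than recommended settings.

```python
def decoding_kwargs(strategy="beam"):
    """Map a strategy name to keyword arguments for model.generate().

    Hypothetical helper; the keyword names are an assumption about the
    LAVIS BLIP captioner and should be checked against the installed version.
    """
    if strategy == "beam":
        return {"use_nucleus_sampling": False, "num_beams": 3}
    if strategy == "nucleus":
        return {"use_nucleus_sampling": True, "top_p": 0.9}
    raise ValueError(f"unknown decoding strategy {strategy!r}")


def caption(model, image_tensor, strategy="beam"):
    """Caption a preprocessed image batch with the chosen decoding strategy.

    `model` can be the captioner loaded with either base_coco or large_coco;
    the call is the same in both cases.
    """
    return model.generate({"image": image_tensor}, **decoding_kwargs(strategy))
```

The same `caption(model, image_tensor, "nucleus")` call would apply to a model loaded as `base_coco` or as `large_coco`; only the weights behind `model` differ.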

Thanks.

@gschurck
Contributor Author

Ok thanks.
