What are the differences between `base_coco` and `large_coco` model types for `blip_caption`? #22

gschurck · 2022-10-16T13:05:57Z


dxli94 · 2022-10-17T00:04:04Z

Hi @gschurck, thanks for your interest.

`base_coco` is BLIP_base fine-tuned on COCO; `large_coco` is BLIP_large fine-tuned on COCO.

BLIP_base uses ViT_base as its image encoder, and BLIP_large uses ViT_large.
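To make the base/large size gap concrete, here is a rough parameter count for the transformer blocks of the two image encoders. It assumes the standard ViT-B and ViT-L shapes from the original ViT paper (12 vs 24 blocks, width 768 vs 1024, MLP width 3072 vs 4096). `ViTConfig`, `block_params`, and `encoder_params` are hypothetical helpers written for illustration, and the count deliberately ignores patch and position embeddings as well as BLIP's text decoder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    depth: int    # number of transformer blocks
    width: int    # hidden size d
    mlp_dim: int  # MLP hidden size m (attention head count does not change the total)


# Assumption: standard ViT-B / ViT-L encoder shapes, taken to match the encoders
# behind base_coco and large_coco.
VIT_BASE = ViTConfig(depth=12, width=768, mlp_dim=3072)
VIT_LARGE = ViTConfig(depth=24, width=1024, mlp_dim=4096)


def block_params(cfg: ViTConfig) -> int:
    """Parameters in one pre-norm transformer block (weights + biases)."""
    d, m = cfg.width, cfg.mlp_dim
    attn = 4 * (d * d + d)             # Q, K, V and output projections
    mlp = (d * m + m) + (m * d + d)    # two linear layers
    norms = 2 * (2 * d)                # two LayerNorms, scale + shift each
    return attn + mlp + norms


def encoder_params(cfg: ViTConfig) -> int:
    """Parameters in all transformer blocks (embeddings excluded)."""
    return cfg.depth * block_params(cfg)


if __name__ == "__main__":
    for name, cfg in [("ViT_base", VIT_BASE), ("ViT_large", VIT_LARGE)]:
        print(f"{name}: {encoder_params(cfg) / 1e6:.1f}M parameters in transformer blocks")
```

Under these assumptions the blocks come to about 85.1M parameters for ViT_base and 302.3M for ViT_large, roughly 3.6 times larger, which is the main reason `large_coco` is slower and heavier to run.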

Thanks.

gschurck · 2022-10-30T13:32:14Z

Okay. Are they directly related to the beam search or nucleus sampling algorithms?

dxli94 · 2022-10-31T03:52:01Z

> Okay, are they directly related to Beam search or Nucleus Sampling algorithms ?

No. `base_coco` and `large_coco` differ in model size, not in decoding method: both support beam search and nucleus sampling.

In practice, we found that `large_coco` achieves better captioning metrics (i.e., higher-quality captions).
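Because the decoding strategy is chosen at generation time rather than baked into the checkpoint, it can be illustrated without BLIP at all. The sketch below runs beam search and nucleus (top-p) sampling over a hypothetical `toy_model` that returns a fixed next-token distribution; `toy_model`, `beam_search`, `nucleus_sample`, and `EOS` are illustrative names, not LAVIS APIs. Both decoders accept any callable of the same shape, so swapping `base_coco` for `large_coco` would change the distribution being searched, not the search procedure.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

EOS = "<eos>"
Prefix = Tuple[str, ...]
NextProbs = Callable[[Prefix], Dict[str, float]]


def toy_model(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical stand-in for a caption decoder: a fixed next-token table."""
    table = {
        (): {"a": 0.6, "the": 0.4},
        ("a",): {"dog": 0.5, "cat": 0.5},
        ("the",): {"dog": 0.9, "cat": 0.1},
    }
    return table.get(prefix, {EOS: 1.0})  # any other prefix ends the caption


def beam_search(next_probs: NextProbs, num_beams: int = 3,
                max_len: int = 10) -> Tuple[List[str], float]:
    """Keep the num_beams highest log-probability prefixes at each step."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    finished: List[Tuple[Prefix, float]] = []
    for _ in range(max_len):
        candidates: List[Tuple[Prefix, float]] = []
        for tokens, score in beams:
            for tok, p in next_probs(tokens).items():
                if p <= 0.0:
                    continue
                hyp = (tokens + (tok,), score + math.log(p))
                # Hypotheses that emit EOS are set aside; the rest compete for beams.
                if tok == EOS:
                    finished.append(hyp)
                else:
                    candidates.append(hyp)
        if not candidates:
            beams = []
            break
        candidates.sort(key=lambda h: h[1], reverse=True)
        beams = candidates[:num_beams]
    finished.extend(beams)  # unfinished beams also count if max_len was reached
    best_tokens, best_score = max(finished, key=lambda h: h[1])
    return [t for t in best_tokens if t != EOS], math.exp(best_score)


def nucleus_sample(next_probs: NextProbs, top_p: float = 0.9, max_len: int = 10,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Sample each token from the smallest set whose probability mass reaches top_p."""
    rng = rng or random.Random()
    tokens: Prefix = ()
    for _ in range(max_len):
        ranked = sorted(next_probs(tokens).items(), key=lambda kv: kv[1], reverse=True)
        nucleus, mass = [], 0.0
        for tok, p in ranked:
            nucleus.append((tok, p))
            mass += p
            if mass >= top_p:
                break
        toks, weights = zip(*nucleus)
        # random.choices treats weights as relative, i.e. renormalised over the nucleus.
        tok = rng.choices(toks, weights=weights, k=1)[0]
        if tok == EOS:
            break
        tokens += (tok,)
    return list(tokens)


if __name__ == "__main__":
    print(beam_search(toy_model, num_beams=2))
    print(nucleus_sample(toy_model, top_p=0.9, rng=random.Random(0)))
```

With `num_beams=2`, beam search returns "the dog" (probability 0.36), while `num_beams=1` (greedy decoding) commits to "a" first and ends at 0.3. Nucleus sampling with `top_p=0.9` can return different captions on different runs but never emits "the cat", because "dog" alone already covers 90% of the probability mass after "the".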

Thanks.

gschurck · 2022-10-31T06:15:35Z

OK, thanks.

gschurck closed this as completed Oct 31, 2022
