Image captioning with a pretrained DeiT v3 encoder on a subset of the MSCOCO dataset
- CIDEr score: 0.9413
- CLIP score: 0.7310
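
For readers unfamiliar with the CLIP score, here is a minimal sketch of how it is commonly defined (CLIPScore: a weight `w = 2.5` times the cosine similarity between image and caption embeddings, clipped at zero). This is an assumption about the metric, not the evaluation code used here. The function name `clip_score` and the plain-list embeddings are hypothetical, and a real evaluation would take the embeddings from a CLIP model.

```python
import math

def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore for one image/caption pair from precomputed embeddings.

    Assumed definition: w * max(cosine_similarity, 0), with w = 2.5.
    The embeddings are plain sequences of floats of equal length.
    """
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_image = math.sqrt(sum(a * a for a in image_emb))
    norm_text = math.sqrt(sum(b * b for b in text_emb))
    if norm_image == 0.0 or norm_text == 0.0:
        # Cosine similarity is undefined for a zero vector; treat as no match.
        return 0.0
    cosine = dot / (norm_image * norm_text)
    # Negative similarities are clipped to zero before scaling.
    return w * max(cosine, 0.0)
```

A dataset-level score under this definition would be the mean of `clip_score` over all image/caption pairs.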
Attention map visualization for image captioning:
See problems 2 and 3 in Report.pdf and Spec.pdf for more details.