
Image Captioning and Attention Visualization

Image captioning on a subset of the MSCOCO dataset, using a pretrained DeiT v3 as the image encoder.

  • CIDEr score: 0.9413
  • CLIP score: 0.7310
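The CLIP score above is most likely a CLIPScore-style metric: the cosine similarity between CLIP embeddings of an image and its generated caption, clipped at zero and scaled by a weight w, averaged over the evaluation set. The sketch below assumes that definition with w = 2.5 (the weight used in the original CLIPScore formulation). The exact formula required by Spec.pdf may differ, the embeddings are assumed to be precomputed by a CLIP model elsewhere, and `clip_score` is a hypothetical helper rather than code from this repository.

```python
import numpy as np


def clip_score(image_emb: np.ndarray, text_emb: np.ndarray, w: float = 2.5) -> float:
    """Mean CLIPScore over N image/caption pairs (assumed definition).

    image_emb, text_emb: arrays of shape (N, D) holding CLIP image and text
    features computed elsewhere (hypothetical inputs; no model is loaded here).
    """
    # L2-normalise each row so that a row-wise dot product is the cosine similarity
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    cos = np.sum(img * txt, axis=1)
    # Negative similarities are clipped to 0, then every pair is rescaled by w
    return float(np.mean(w * np.clip(cos, 0.0, None)))


# Usage: one perfectly aligned pair (score 2.5) and one orthogonal pair (score 0)
img = np.array([[1.0, 0.0], [0.0, 1.0]])
txt = np.array([[1.0, 0.0], [1.0, 0.0]])
print(clip_score(img, txt))  # 1.25
```

Setting `w=1.0` reports the plain mean clipped cosine similarity instead, which is worth checking against the spec before comparing numbers.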

Attention map visualization for image captioning:

[Image: girl]
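One common way to produce attention maps like this is to take the decoder's cross-attention weights for a single generated word, drop the non-patch tokens, reshape the remaining weights into the encoder's patch grid, and upsample the grid to the image size. The minimal sketch below assumes a 224×224 input, a 14×14 patch grid (patch size 16), a single [CLS] token ahead of the patch tokens, and weights already averaged over attention heads. `attention_to_heatmap` is a hypothetical helper, and the repository's own method may differ (for example, bilinear instead of nearest-neighbour upsampling).

```python
import numpy as np


def attention_to_heatmap(attn: np.ndarray, image_size: int = 224,
                         num_prefix_tokens: int = 1) -> np.ndarray:
    """Turn one caption word's cross-attention over encoder tokens into an
    image-sized heatmap with values in [0, 1].

    attn: shape (num_prefix_tokens + g*g,), head-averaged weights (assumption).
    num_prefix_tokens=1 assumes a single [CLS] token before the patch tokens.
    """
    patch_attn = attn[num_prefix_tokens:]
    g = int(round(np.sqrt(patch_attn.size)))
    if g * g != patch_attn.size:
        raise ValueError("patch tokens do not form a square grid")
    if image_size % g != 0:
        raise ValueError("image_size must be a multiple of the grid size")
    # Patch tokens are assumed to be in row-major (raster) order
    grid = patch_attn.reshape(g, g)
    # Min-max normalise so the heatmap spans [0, 1]
    grid = (grid - grid.min()) / (grid.max() - grid.min() + 1e-8)
    scale = image_size // g
    # Nearest-neighbour upsampling: each patch becomes a scale x scale block
    return np.repeat(np.repeat(grid, scale, axis=0), scale, axis=1)


# Usage: all attention on the top-left patch lights up the top-left 16x16 block
attn = np.zeros(1 + 14 * 14)
attn[1] = 1.0  # index 0 is [CLS]; index 1 is the first patch
heat = attention_to_heatmap(attn)
print(heat.shape)  # (224, 224)
```

The resulting array can then be overlaid on the input image, for example with matplotlib's `imshow(heat, alpha=0.5)` drawn over the image, once per generated word.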

See problems 2 and 3 in Report.pdf and Spec.pdf for more details.