grazder/Image-Captioning-Inference

Image Captioning Inference

This project uses models and weights from self-critical.pytorch. Bottom-up attention embeddings are generated with py-bottom-up-attention, which is a PyTorch implementation of bottom-up-attention.

Requirements

  • requirements.txt

Install

Install detectron with:

python3 setup.py build develop

You also need to download the weights for the ResNet and bottom-up attention models.

Alternatively, you can install everything and download the models with the download.sh script.

The installation commands I use:

git clone https://github.com/grazder/Image-Captioning-Inference.git
cd Image-Captioning-Inference
pip install -r requirements.txt
bash download.sh

You don't need to download all the models, only the ones you will use (for example, bottom-up attention + transformer). You can comment out everything else in the download script.
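Since only the files for the model you plan to use need to be present, it can help to check for them before constructing a model. The following is a minimal sketch and not part of this repository: `missing_files` is a hypothetical helper, and the example paths are copied from the FC ResNet usage example in this README, so adjust them to the model you actually downloaded.

```python
from pathlib import Path

# Assumption: these are the files needed for the FC ResNet model,
# taken from the usage example in this README. Edit for other models.
REQUIRED_FILES = [
    "data/fc-resnet-weights/model.pth",
    "data/fc-resnet-weights/infos.pkl",
    "data/imagenet_weights/resnet101.pth",
]


def missing_files(paths):
    """Return the paths that are not existing regular files, in the given order."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    missing = missing_files(REQUIRED_FILES)
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All required files are present.")
```

Running the check first gives a readable list of absent files rather than a failure partway through model loading.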

Models

self-critical.pytorch provides many models, which you can find in its MODEL_ZOO.

Object initialization and usage example

import os

from Captions import Captions

# Load the FC model that uses ResNet features, running on the CPU.
# All paths point to the downloaded weights and config files.
model_fc_resnet = Captions(
    model_path='data/fc-resnet-weights/model.pth',
    infos_path='data/fc-resnet-weights/infos.pkl',
    model_type='resnet',
    resnet_model_path='data/imagenet_weights/resnet101.pth',
    bottom_up_model_path='data/bottom-up/faster_rcnn_from_caffe.pkl',
    bottom_up_config_path='data/bottom-up/faster_rcnn_R_101_C4_caffe.yaml',
    bottom_up_vocab='data/vocab/objects_vocab.txt',
    device='cpu'
)

# Build the full path of every file in the example folder.
images = os.listdir('example_images/')
paths = [os.path.join('example_images', x) for x in images]

# Generate captions for all images in one call.
preds = model_fc_resnet.get_prediction(paths)

# Print each file name next to its caption.
for i, pred in enumerate(preds):
    print(f'{images[i]}: {pred}')
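Note that `os.listdir` returns every directory entry in arbitrary order, so the folder should contain only images. If it may hold other files, a small filter like the sketch below can build the path list instead. `list_image_paths` is a hypothetical helper, and the extension set is an assumption; this README does not say which image formats `get_prediction` accepts.

```python
import os

# Assumption: the extensions seen in this README's examples, plus .png.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_image_paths(directory, extensions=IMAGE_EXTENSIONS):
    """Return sorted paths of regular files in `directory` with an image extension."""
    names = sorted(
        name
        for name in os.listdir(directory)
        if os.path.splitext(name)[1].lower() in extensions
        and os.path.isfile(os.path.join(directory, name))
    )
    return [os.path.join(directory, name) for name in names]
```

The returned list can be passed in place of `paths`, and `os.path.basename` recovers the file name for printing.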

Model Timings

Scores and models are taken from MODEL_ZOO. Times were measured in Google Colab.
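This README does not describe how the times were measured. As one possible approach, the sketch below averages wall-clock time per image around any prediction callable. `time_per_image` is a hypothetical helper, and it excludes model loading time, which may or may not match how the numbers here were obtained.

```python
import time


def time_per_image(predict, paths, repeats=1):
    """Average wall-clock seconds per image over `repeats` calls of predict(paths)."""
    if not paths:
        raise ValueError("paths must contain at least one image path")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        predict(paths)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(paths))
```

With the usage example above, a call would look like `time_per_image(model_fc_resnet.get_prediction, paths)`.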

Trained with ResNet101 features:

Collection: link

| Name | CIDEr | SPICE | Download | Time per image |
| --- | --- | --- | --- | --- |
| FC | 0.953 | 0.1787 | model&metrics | 4.1 s |
| FC + self_critical | 1.045 | 0.1838 | model&metrics | 4.2 s |
| FC + new_self_critical | 1.053 | 0.1857 | model&metrics | 4.7 s |

Trained with bottom-up features (10-100 features per image, not 36 features per image):

Collection: link

| Name | CIDEr | SPICE | Download | Time per image |
| --- | --- | --- | --- | --- |
| Att2in | 1.089 | 0.1982 | model&metrics | 19.5 s |
| Att2in + self_critical | 1.173 | 0.2046 | model&metrics | 19.7 s |
| Att2in + new_self_critical | 1.195 | 0.2066 | model&metrics | 19.7 s |
| UpDown | 1.099 | 0.1999 | model&metrics | 20.1 s |
| UpDown + self_critical | 1.227 | 0.2145 | model&metrics | 19.8 s |
| UpDown + new_self_critical | 1.239 | 0.2154 | model&metrics | 19.9 s |
| UpDown + Schedule long + new_self_critical | 1.280 | 0.2200 | model&metrics | 20 s |
| Transformer | 1.1259 | 0.2063 | model&metrics | 20.3 s |
| Transformer (warmup + step decay) | 1.1496 | 0.2093 | model&metrics | 20.2 s |
| Transformer + self_critical | 1.277 | 0.2249 | model&metrics | 20.4 s |
| Transformer + new_self_critical | 1.303 | 0.2289 | model&metrics | 20.2 s |

Captions Examples

| Name | street.jpg | man.jpeg | statue.jpeg | tv_man.jpeg |
| --- | --- | --- | --- | --- |
| FC | a group of people walking down a street | a man in a suit and tie holding a cell phone | a man in a hat and a hat holding a frisbee | a man is brushing his teeth with a tooth brush |
| FC + self-critical | a group of people riding a bike down a street | a man wearing a suit and a tie | a man standing next to a man with a baseball bat | a man taking a picture in a bathroom with a mirror |
| FC + new-self-critical | a group of people riding bikes down a city street | a man wearing a suit and tie talking on a cell phone | a man is holding a frisbee in a street | a man brushing his teeth in a bathroom with a mirror |
| Att2in | a group of people riding bikes down a city street | a man in a suit and tie is wearing a suit | a man and a woman are standing in a park | a man in a blue shirt playing a video game |
| Att2in + self-critical | a group of people riding a bike down a city street | a man wearing a suit and tie and a table | a man and a woman sitting on a bench with a book | a man playing a video game in a wii |
| Att2in + new self-critical | a group of people riding bikes down a city street | a man in a suit and tie standing in front of a table | a man and a woman sitting on a bench with a book | a man is playing a video game with a wii |
| UpDown | a group of people riding bikes down a street | a man in a suit and tie is holding a microphone | a man and a woman are standing in front of a tree | a man is playing a video game on a television |
| UpDown + self-critical | a group of people riding bikes down a city street | a man in a suit and tie sitting on a table | a man and a woman sitting on a bench with a book | a man is holding a video game on a television |
| UpDown + new self-critical | a group of people riding bikes down a city street | a man in a suit and tie in a UNK | a man and a woman holding a book | a man is playing a video game on a tv |
| UpDown + Schedule long + new_self_critical | a group of people riding on a city street | a man in a suit and tie sitting in a table | a man and a woman standing in front of a tree | a man playing a video game with a wii |
| Transformer | a group of people are riding bikes on the sidewalk | a man in a suit and tie sitting in a chair | a man and woman standing in front of a statue | a man in a room playing a video game |
| Transformer (warmup + step decay) | a group of people riding bikes down a city street | a man in a suit sitting in a chair | a man and woman standing next to each other | a man is playing a video game on a large screen |
| Transformer + self-critical | a group of people riding bikes down a city street | a man in a suit and tie sitting in a room | a man and a woman standing in front of a tree | a man playing a video game in a room |
| Transformer + new self-critical | a group of people riding bikes down a city street | a man in a suit and tie sitting in a room | a man and a woman standing next to a tree | a man sitting in front of a television |
