NIC-Model

Model based on Show and Tell: A Neural Image Caption Generator, building on Daniel Huang's implementation (a minimal sketch of the encoder-decoder appears after the list below).

  • CNN Layer Model: VGG16 (default) & ResNet152
  • RNN Layer Model: LSTM (default)
  • Datasets: MS-COCO (default), Flickr8k & Flickr30k
  • Scoring: BLEU_1, BLEU_2, BLEU_3, BLEU_4, METEOR, ROUGE_L, CIDEr (Tsung-Yi Lin's Microsoft COCO Caption Evaluation implementation)
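
The architecture is a pretrained CNN encoder whose image feature primes an LSTM decoder. Below is a minimal PyTorch sketch of that pattern; the class names and vocab size are illustrative rather than the repo's actual code (the real classes live in model/model.py), and the embed/hidden sizes match the defaults listed under Basic Usage:

 import torch
 import torch.nn as nn
 import torchvision.models as models

 class EncoderCNN(nn.Module):
     """Pretrained CNN whose final layer is replaced by a projection
     to the word-embedding size (VGG16 shown; ResNet152 is analogous)."""
     def __init__(self, embed_size=256):
         super().__init__()
         self.vgg = models.vgg16(pretrained=True)
         in_features = self.vgg.classifier[6].in_features
         self.vgg.classifier[6] = nn.Linear(in_features, embed_size)

     def forward(self, images):
         return self.vgg(images)                     # (batch, embed_size)

 class DecoderRNN(nn.Module):
     """LSTM that receives the image feature as its first input step."""
     def __init__(self, embed_size=256, hidden_size=512, vocab_size=10000):
         super().__init__()
         self.embed = nn.Embedding(vocab_size, embed_size)
         self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
         self.linear = nn.Linear(hidden_size, vocab_size)

     def forward(self, features, captions):
         embeddings = self.embed(captions)           # (batch, T, embed_size)
         inputs = torch.cat((features.unsqueeze(1), embeddings), dim=1)
         hiddens, _ = self.lstm(inputs)
         return self.linear(hiddens)                 # word scores per step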

Requirements

  • Python 3.7 & pip
  • NLTK
  • NumPy
  • PyTorch with torchvision
  • pycocotools
  • pickle5
  • Pillow
  • Java
  • CUDA 9.2 (optional)
  • Pipenv (optional)

Installation

Set up with Pip

 cd nic-model
 pip install torch==1.6.0+cpu torchvision==0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html #without CUDA
 pip install torch==1.6.0+cu92 torchvision==0.7.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html #with CUDA
 pip install nltk
 pip install numpy
 pip install pycocotools
 pip install pickle5
 pip install Pillow
 sudo apt install default-jdk
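
A quick way to confirm the right build was installed (the last value prints True only with the +cu92 wheels and a working CUDA setup):

 python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"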

Set up with Pipenv

 cd nic-model
 pip install pipenv
 pipenv install
 pipenv shell
  • NOTE: the Pipenv setup does not include CUDA; install the CUDA build separately:
 pip install torch==1.6.0+cu92 torchvision==0.7.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html #with CUDA
  • NOTE: the Pipenv setup does not include Java either; install it separately (the latest JDK is fine):
 sudo apt install default-jdk

Set Up

  • Go to https://developers.google.com/oauthplayground/
  • In the “Select the Scope” box, scroll down, expand “Drive API v3”, and select https://www.googleapis.com/auth/drive.readonly
  • Click “Authorize APIs” and then “Exchange authorization code for tokens”. Copy the “Access token”; you will need it below.
  • Run extract.py with the access token; it downloads a zip file (27.9 GB) containing all the required images and extracts them (see the sketch after this list).
  • Open the nic folder; it contains all the required data.
  • If you have problems downloading the data, the zip file is available here.
  • NOTE: there is no need to run download.sh & set_up.sh; their output is already included.
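
For context, the download step amounts to an authenticated Drive API v3 request with that token. A hypothetical sketch of the core of it; the file ID, token, and paths below are placeholders, not values from the repo:

 import requests, zipfile

 FILE_ID = "<drive-file-id>"   # placeholder, not the repo's real ID
 TOKEN = "<access-token>"      # the token copied from the OAuth playground

 url = f"https://www.googleapis.com/drive/v3/files/{FILE_ID}?alt=media"
 resp = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"}, stream=True)
 resp.raise_for_status()
 with open("nic.zip", "wb") as f:
     for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MB chunks
         f.write(chunk)

 zipfile.ZipFile("nic.zip").extractall(".")  # produces the nic/ folder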

Basic Usage

  • To start training the model, run main.py; it runs with the default settings, each of which can be changed with the arguments below
usage: main.py [-h] [-lr LEARNING_RATE] [-b BATCH_SIZE] [-e EPOCHS]
               [--resume RESUME] [--verbosity VERBOSITY] [--save-dir SAVE_DIR]
               [--save-freq SAVE_FREQ] [--dataset DATASET]
               [--embed_size EMBED_SIZE] [--hidden_size HIDDEN_SIZE]
               [--cnn_model CNN_MODEL]

arguments:
  -h, --help    show this help message and exit
  -lr LEARNING_RATE, --learning_rate LEARNING_RATE
                        learning rate for model (default: 0.001)
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        mini-batch size (default: 32)
  -e EPOCHS, --epochs EPOCHS
                        number of total epochs (default: 32)
  --resume RESUME
                        path to latest checkpoint (default: none)
  --verbosity VERBOSITY
                        verbosity, 0: quiet, 1: per epoch, 2: complete (default: 2)
  --save-dir SAVE_DIR
                        directory of saved model (default: model/saved)
  --save-freq SAVE_FREQ
                        training checkpoint frequency (default: 1)
  --dataset DATASET
                        dataset loaded into model (default: mscoco) options: [mscoco | flickr8k | flickr30k]
  --embed_size EMBED_SIZE
                        dimension for word embedding vector (default: 256)
  --hidden_size HIDDEN_SIZE
                        dimension for lstm hidden layer (default: 512)
  --cnn_model CNN_MODEL
                        pretrained cnn model used for encoder (default: vgg16)
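
For example, to train on Flickr8k with the ResNet152 encoder and a larger batch, using the flags listed above (the checkpoint path in the second command is illustrative):

 python main.py --dataset flickr8k --cnn_model resnet152 -b 64 -e 50
 python main.py --resume model/saved/checkpoint.pth  # continue from a saved checkpoint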

Structure

├── base/ - abstract base classes
│   ├── base_model.py - abstract base class for models.
│   └── base_trainer.py - abstract base class for trainers (loops over epochs and saves logs)
│
├── datasets/ - anything about datasets and data loading goes here
│   ├── dataloader.py - main class for returning the data loader
│   ├── build_vocab.py - vocab class used for caption sentences (also builds the vocab file from the training set)
│   └── mscoco.py - dataset class and data loader for MS-COCO (also splits off 4k random val images as a test set)
│
├── data/ - default folder for data
│
├── logger/ - for training process logging
│   └── logger.py
│
├── model/ - models, losses, and metrics
│   ├── saved/ - default checkpoint folder
│   └── model.py - default model
│
├── trainer.py - loops through the data loader
│
├── eval.py - predicts results
│
├── main.py - entry point for training
│
└── utils.py - data formatting helpers and result saving
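
The base/ split suggests the usual template pattern: base_trainer.py owns the epoch loop and periodic checkpointing, while trainer.py supplies the per-batch step. A hypothetical sketch of that contract; method names and the checkpoint filename are illustrative, not copied from the repo:

 import os
 import torch

 class BaseTrainer:
     """Owns the epoch loop and checkpoint saving; subclasses do the work."""
     def __init__(self, model, epochs, save_dir="model/saved", save_freq=1):
         self.model, self.epochs = model, epochs
         self.save_dir, self.save_freq = save_dir, save_freq

     def train(self):
         for epoch in range(1, self.epochs + 1):
             self._train_epoch(epoch)          # implemented by the subclass
             if epoch % self.save_freq == 0:   # honors --save-freq
                 self._save_checkpoint(epoch)

     def _train_epoch(self, epoch):
         raise NotImplementedError

     def _save_checkpoint(self, epoch):
         os.makedirs(self.save_dir, exist_ok=True)
         path = os.path.join(self.save_dir, f"checkpoint-epoch{epoch}.pth")
         torch.save(self.model.state_dict(), path)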

References

  • Vinyals, Oriol; Toshev, Alexander; Bengio, Samy; Erhan, Dumitru. Show and Tell: A Neural Image Caption Generator. here
  • Huang, Daniel. show-and-tell-image-captioning. repo
  • Lin, Tsung-Yi. Microsoft COCO Caption Evaluation. repo