## Comparison of pre-trained CNN models

In [14]:
import json
import pandas as pd

In [16]:
model_list = ['inc', 'vgg16', 'vgg19', 'alexnet', 'googlenet', 'resnet']

# Load the evaluation scores for each CNN encoder
results_path = '../models/cnn_results'
score_list = []            # baseline model scores (score_911_*.json)
score_list_attention = []  # attention model scores (score_111_*.json)
for name in model_list:
    with open(f'{results_path}/score_911_{name}.json', 'r') as f:
        score_list.append(json.load(f))
    with open(f'{results_path}/score_111_{name}.json', 'r') as f:
        score_list_attention.append(json.load(f))

### 1. Baseline model

- The results are based on the 9.1.1 baseline model.
- The same seed (123) is used for all models.

In [17]:
pd.DataFrame(score_list, index=model_list)

| Model | Bleu_1 | Bleu_2 | Bleu_3 | Bleu_4 | METEOR | ROUGE_L | CIDEr | SPICE | USC_similarity |
|---|---|---|---|---|---|---|---|---|---|
| inc | 0.611067 | 0.482076 | 0.400218 | 0.341686 | 0.27864 | 0.519659 | 1.915158 | 0.363261 | 0.582688 |
| vgg16 | 0.651438 | 0.532721 | 0.452217 | 0.392981 | 0.308063 | 0.560509 | 2.226512 | 0.401847 | 0.618032 |
| vgg19 | 0.603918 | 0.468438 | 0.384167 | 0.324929 | 0.265663 | 0.505412 | 1.696008 | 0.347224 | 0.56734 |
| alexnet | 0.651536 | 0.531815 | 0.451787 | 0.393246 | 0.305661 | 0.560777 | 2.264054 | 0.397289 | 0.615596 |
| googlenet | 0.509 | 0.356386 | 0.271551 | 0.216547 | 0.215475 | 0.410664 | 1.045342 | 0.257725 | 0.473735 |
| resnet | 0.642132 | 0.519708 | 0.439252 | 0.380828 | 0.295746 | 0.548326 | 2.139994 | 0.391708 | 0.60818 |


Notes:
> Both vgg16 and alexnet appear to perform better than the other models.
>
> vgg16 has slightly higher semantic scores (i.e. SPICE and USC similarity), while alexnet scores slightly higher on several of the n-gram-based metrics (e.g. Bleu_1, Bleu_4 and CIDEr).
>
> These results are based on a single test run without cross-validation.
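
To check which encoder leads on each metric without reading the table by eye, one option is `DataFrame.idxmax`. The sketch below hard-codes the vgg16 and alexnet rows from the baseline table for a subset of metrics; in the notebook the same call could be applied to the full DataFrame built from `score_list`. The variable names `baseline` and `best_per_metric` are illustrative only.

```python
import pandas as pd

# Rows copied from the baseline table above (vgg16 and alexnet only).
baseline = pd.DataFrame(
    {
        "Bleu_1": [0.651438, 0.651536],
        "Bleu_4": [0.392981, 0.393246],
        "METEOR": [0.308063, 0.305661],
        "CIDEr": [2.226512, 2.264054],
        "SPICE": [0.401847, 0.397289],
        "USC_similarity": [0.618032, 0.615596],
    },
    index=["vgg16", "alexnet"],
)

# For each metric (column), report the model (row label) with the highest score.
best_per_metric = baseline.idxmax()
print(best_per_metric)
```

The differences between the two encoders are small on every metric, so a ranking from a single run should be read with caution.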

#### Description of layers
1. **inc**:
    - the last layer (`fc`) is removed and the auxiliary output is turned off
2. **vgg16**:
    - the last 2 layers of the classifier are removed
3. **vgg19**:
    - the last 6 layers of the classifier are removed
4. **alexnet**:
    - the last layer of the classifier is removed
5. **googlenet**:
    - the last layer (`fc`) is removed and the auxiliary output is turned off
6. **resnet101**:
    - the last layer (`fc`) is removed
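
The layer names above (`fc`, classifier, auxiliary output) resemble those of the torchvision model zoo, so the truncation can be sketched in PyTorch under that assumption. The helpers `strip_classifier` and `drop_fc` and the small stand-in model `TinyVGG` are hypothetical and are not taken from the project's code.

```python
import torch
from torch import nn


def strip_classifier(model: nn.Module, n_layers: int) -> nn.Module:
    """Drop the last `n_layers` modules of `model.classifier` (an nn.Sequential)."""
    kept = list(model.classifier.children())[:-n_layers]
    model.classifier = nn.Sequential(*kept)
    return model


def drop_fc(model: nn.Module) -> nn.Module:
    """Replace a final `fc` layer with a pass-through so pooled features come out."""
    model.fc = nn.Identity()
    return model


class TinyVGG(nn.Module):
    """Small stand-in with a VGG-style 7-layer classifier head (sizes are made up)."""

    def __init__(self) -> None:
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(32, 16), nn.ReLU(), nn.Dropout(),
            nn.Linear(16, 16), nn.ReLU(), nn.Dropout(),
            nn.Linear(16, 10),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(x)


# Removing the last 2 layers leaves Linear, ReLU, Dropout, Linear, ReLU.
encoder = strip_classifier(TinyVGG(), n_layers=2).eval()
features = encoder(torch.randn(1, 32))
print(features.shape)  # torch.Size([1, 16])
```

If the encoders do come from torchvision, the same helper would apply to a model such as `torchvision.models.vgg16` with `n_layers=2`, and `drop_fc` to models that end in an `fc` layer. How the auxiliary output was disabled in this project is not shown here.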

### 2. Attention model

- The results are based on the 11.1 attention model.
- The same seed (123) is used for all models.

In [18]:
pd.DataFrame(score_list_attention, index=model_list)

| Model | Bleu_1 | Bleu_2 | Bleu_3 | Bleu_4 | METEOR | ROUGE_L | CIDEr | SPICE | USC_similarity |
|---|---|---|---|---|---|---|---|---|---|
| inc | 0.550454 | 0.417808 | 0.338821 | 0.284532 | 0.245059 | 0.463961 | 1.528595 | 0.308056 | 0.53643 |
| vgg16 | 0.556469 | 0.422401 | 0.338848 | 0.280919 | 0.247545 | 0.466912 | 1.445432 | 0.308881 | 0.543644 |
| vgg19 | 0.548131 | 0.413696 | 0.330758 | 0.273272 | 0.241871 | 0.456302 | 1.453876 | 0.30619 | 0.540009 |
| alexnet | 0.563925 | 0.429419 | 0.347665 | 0.291288 | 0.24481 | 0.463194 | 1.510697 | 0.311567 | 0.545551 |
| googlenet | 0.558462 | 0.427451 | 0.347876 | 0.292403 | 0.256027 | 0.478299 | 1.547776 | 0.319291 | 0.544734 |
| resnet | 0.526734 | 0.405558 | 0.323673 | 0.262086 | 0.239071 | 0.47683 | 1.469042 | 0.318537 | 0.544559 |


Notes:
> All the models appear to perform very similarly.
>
> vgg16, alexnet and googlenet are good choices here.
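
One way to put a number on "very similar" is the spread (best minus worst score) of a metric across the six encoders. The sketch below does this for CIDEr only, with values copied from the baseline and attention tables in this notebook; the names `cider` and `spread` are illustrative.

```python
import pandas as pd

models = ["inc", "vgg16", "vgg19", "alexnet", "googlenet", "resnet"]

# CIDEr values copied from the baseline and attention tables in this notebook.
cider = pd.DataFrame(
    {
        "baseline": [1.915158, 2.226512, 1.696008, 2.264054, 1.045342, 2.139994],
        "attention": [1.528595, 1.445432, 1.453876, 1.510697, 1.547776, 1.469042],
    },
    index=models,
)

# Spread = best score minus worst score across the six encoders.
spread = cider.max() - cider.min()
print(spread.round(3))
```

With these numbers the CIDEr spread is about 1.22 for the baseline model and about 0.10 for the attention model, which is consistent with the note that the choice of encoder matters much less in the attention setting.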

#### Description of layers
1. **inc**:
    - the last convolutional layer is used and the auxiliary output is turned off
2. **vgg16**:
    - conv 5_3 of size 512 × 14 × 14 is used
3. **vgg19**:
    - conv 5_4 of size 512 × 14 × 14 is used
4. **alexnet**:
    - the last convolutional layer is used
5. **googlenet**:
    - the last convolutional layer is used and the auxiliary output is turned off
6. **resnet101**:
    - the last convolutional layer is used
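
A 512 × 14 × 14 feature map can be viewed as 14 × 14 = 196 spatial locations, each described by a 512-dimensional vector, which is the shape attention mechanisms commonly operate on. The NumPy sketch below shows that reshaping followed by one soft-attention step. The random feature map and random scores are placeholders for a real conv output and a learned scoring network, and this is not necessarily how the attention model in this project is implemented.

```python
import numpy as np

rng = np.random.default_rng(123)

# Stand-in for a conv 5_3 feature map: (channels, height, width).
feature_map = rng.standard_normal((512, 14, 14))

# Flatten the spatial grid: one 512-d vector per image location.
regions = feature_map.reshape(512, -1).T  # shape (196, 512)

# Placeholder attention scores, one per location (a real model would learn these).
scores = rng.standard_normal(regions.shape[0])
weights = np.exp(scores - scores.max())
weights /= weights.sum()  # softmax over the 196 locations

# Context vector: attention-weighted average of the region vectors.
context = weights @ regions  # shape (512,)
print(regions.shape, context.shape)
```

Encoders whose last convolutional layer has a different channel count or grid size would produce a different number of regions and a different vector length, but the reshaping step stays the same.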