## This notebook demonstrates how to use the `eval_model` function to compute evaluation scores for generated captions

### 1. Import library and function

In [1]:
import json
import sys
sys.path.append('../scr/evaluation/')
from eval_model import eval_model

### 2. Read the input data from the JSON files using the code below

In [2]:
# reference captions
ref_path = '../data/processed/json/'
with open(ref_path + 'valid.json', 'r') as jsonFile:
    data = json.load(jsonFile)
    
# generated captions
results_path = '../models'
with open(results_path + '/' + 'test_results.json', 'r') as f:
    results = json.load(f)

### 3. Using the function

Note:
1. The first run of `eval_model` takes longer because the Stanford NLP library is downloaded into the `spice/lib` folder.
2. The function returns three outputs:
    - `model_score`: the overall average score for the model
    - `img_score`: the individual scores for each image
    - `score_by_metrics`: the scores grouped by metric type
3. Computing the Universal Sentence Encoder similarity takes around 5 minutes on my laptop. You can comment this scorer out in `eval_model.py` if you want results faster.

In [3]:
model_score, img_score, score_by_metrics = eval_model(data, results)

tokenization...
computing Bleu score...
computing METEOR score...
computing Rouge score...
computing CIDEr score...
computing SPICE score...
computing Universal_Sentence_Encoder_Similarity score...


In [4]:
# 1. get the overall score
model_score

{'Bleu_1': 0.6284612525552579,
 'Bleu_2': 0.5018337339939812,
 'Bleu_3': 0.42079461175367683,
 'Bleu_4': 0.3625435060751497,
 'METEOR': 0.28975868597774934,
 'ROUGE_L': 0.535356646079018,
 'CIDEr': 2.050210575685976,
 'SPICE': 0.37917554920476026,
 'USC_similarity': 0.5983279861550914}
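
The raw dictionary is hard to scan at full precision. A minimal sketch of a display helper follows; `format_scores` is a hypothetical name, not part of `eval_model.py`, and it only assumes that the scores are a flat dictionary of metric name to float, as in the output above.

```python
def format_scores(scores, digits=3):
    """Return one 'metric  value' line per entry, rounded for display."""
    # Pad every metric name to the length of the longest one so values line up.
    width = max(len(name) for name in scores)
    return "\n".join(
        f"{name:<{width}}  {value:.{digits}f}" for name, value in scores.items()
    )


# Two values copied from the model_score output above.
example = {"Bleu_1": 0.6284612525552579, "CIDEr": 2.050210575685976}
print(format_scores(example))
```

In the notebook, `print(format_scores(model_score))` would list all nine metrics in the same layout.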

In [5]:
# 2. get the individual scores for any image
img_score['rsicd_park_3.jpg']

{'Bleu_1': 0.7999999999200001,
 'Bleu_2': 0.7302967432631348,
 'Bleu_3': 0.5848035475770537,
 'Bleu_4': 0.4111336168512899,
 'METEOR': 0.2860106620879932,
 'ROUGE_L': 0.44309927360774815,
 'CIDEr': 0.6799081867672438,
 'SPICE': 0.4,
 'USC_similarity': 0.5753564357757568}
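
Because `img_score` maps each image name to a dictionary of metrics, it can also be used to find the images where the model does worst or best. The sketch below is an illustration only: `rank_images` is a hypothetical helper, and the toy dictionary stands in for the real `img_score` with made-up file names and values.

```python
def rank_images(img_score, metric, n=5, lowest=True):
    """Return the n (image, score) pairs with the lowest (or highest) value of one metric."""
    ranked = sorted(
        img_score.items(),
        key=lambda item: item[1][metric],
        reverse=not lowest,
    )
    return [(name, scores[metric]) for name, scores in ranked[:n]]


# Hypothetical per-image scores with the same shape as img_score.
toy_img_score = {
    "a.jpg": {"Bleu_4": 0.41, "SPICE": 0.40},
    "b.jpg": {"Bleu_4": 0.12, "SPICE": 0.55},
    "c.jpg": {"Bleu_4": 0.30, "SPICE": 0.10},
}

print(rank_images(toy_img_score, "Bleu_4", n=2))
```

Calling `rank_images(img_score, 'SPICE', n=10)` in the notebook would list the ten images with the lowest SPICE score, which is a convenient starting point for inspecting weak captions.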

### 4. Save the results
I did not include a saving step in `eval_model.py`.

You can use the following code to save the results:

In [6]:
# save the evaluation results as JSON files
save_path = '../models/'
with open(save_path + 'baseline_score.json', 'w') as fp:
    json.dump(model_score, fp)
    
with open(save_path + 'baseline_img_score.json', 'w') as fp1:
    json.dump(img_score, fp1)
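
To reuse saved scores later without rerunning the evaluation, the files can be read back with `json.load`. The following sketch shows the round trip in a temporary directory so it does not touch `../models/`; `save_scores` and `load_scores` are hypothetical helper names, and the sample values are copied from the `model_score` output above.

```python
import json
import os
import tempfile


def save_scores(scores, path):
    """Write a score dictionary to a JSON file."""
    with open(path, "w") as fp:
        json.dump(scores, fp, indent=2)


def load_scores(path):
    """Read a score dictionary back from a JSON file."""
    with open(path, "r") as fp:
        return json.load(fp)


original = {"Bleu_1": 0.6284612525552579, "SPICE": 0.37917554920476026}

# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "baseline_score.json")
    save_scores(original, path)
    restored = load_scores(path)

print(restored == original)
```

The `indent=2` argument only makes the file easier to read; it does not change the stored values.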