### Metric Evaluation
Surveying evaluation metrics for translation. BLEU looks like a poor fit because it scores exact n-gram overlap and has no notion of synonyms.

Instead, we should compare translations in embedding space, so that translating a word to a synonym is rewarded rather than penalized.
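To make the synonym problem concrete, here is a minimal sketch. It assumes made-up 3-d toy vectors (`TOY_VECTORS`) and two hypothetical helpers, `exact_match_score` and `embedding_score`; the exact-match score is only a crude stand-in for overlap-based metrics, not an implementation of BLEU.

```python
import math

# Hypothetical 3-d toy embeddings, invented for illustration only.
TOY_VECTORS = {
    "big":   [0.9, 0.1, 0.0],
    "large": [0.8, 0.2, 0.1],
    "house": [0.1, 0.9, 0.2],
    "fear":  [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def exact_match_score(reference, candidate):
    """Fraction of candidate tokens that appear verbatim in the reference."""
    ref = set(reference)
    return sum(tok in ref for tok in candidate) / len(candidate)

def embedding_score(reference, candidate, vectors):
    """Average, over candidate tokens, of the best cosine match in the reference."""
    best = [max(cosine(vectors[c], vectors[r]) for r in reference) for c in candidate]
    return sum(best) / len(best)

reference = ["big", "house"]
synonym_candidate = ["large", "house"]  # synonym substitution
wrong_candidate = ["fear", "house"]     # unrelated substitution

for name, cand in [("synonym", synonym_candidate), ("wrong", wrong_candidate)]:
    print(name,
          exact_match_score(reference, cand),
          round(embedding_score(reference, cand, TOY_VECTORS), 3))
```

Exact matching gives both candidates the same score, while the embedding-based score separates the synonym from the unrelated word.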


Two candidate metrics for measuring the similarity between two word embeddings (see https://medium.com/@Intellica.AI/comparison-of-different-word-embeddings-on-text-similarity-a-use-case-in-nlp-e83e08469c1c):
 - cosine similarity
 - word mover's distance

Might also look at LSA (latent semantic analysis) - https://moj-analytical-services.github.io/NLP-guidance/LSA.html


In [None]:
import os 
from google.colab import drive
drive.mount('/content/drive/', force_remount=True)

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import pickle

MAIN_DIR = '/content/drive/My Drive/Colab Notebooks/openpose/DLF2020/dlf2020/src/AmyExperiments/backtranslation'
DATA_DIR = os.path.join(MAIN_DIR, 'data')

Mounted at /content/drive/


In [None]:
# Load embedding dictionary
import gensim.downloader as api
model = api.load('glove-wiki-gigaword-200')

### Cosine Similarity
Ranges from -1 to 1, where -1 means the two vectors point in opposite directions, 0 means they are orthogonal, and 1 means they point in the same direction.
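As a quick check of that range, the sketch below computes cosine similarity straight from its definition (dot product divided by the product of the vector norms). The function name `cosine_from_definition` and the toy vectors are assumptions made for illustration.

```python
import math

def cosine_from_definition(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||) for two equal-length lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors chosen to hit the three reference points of the range.
same = cosine_from_definition([1.0, 2.0], [2.0, 4.0])        # same direction
opposite = cosine_from_definition([1.0, 2.0], [-1.0, -2.0])  # opposite direction
orthogonal = cosine_from_definition([1.0, 0.0], [0.0, 3.0])  # right angle

print(round(same, 6), round(opposite, 6), round(orthogonal, 6))
```

Note that only the direction matters: scaling either vector leaves the score unchanged.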

In [None]:
# Use this helper when scoring vectors from a custom embedding.
from sklearn.metrics.pairwise import cosine_similarity

def get_cosine_similarity(feature_vec_1, feature_vec_2):
    # sklearn expects 2-D arrays, so reshape each vector to shape (1, n_features)
    return cosine_similarity(feature_vec_1.reshape(1, -1), feature_vec_2.reshape(1, -1))[0][0]

# Or use the gensim model directly: model.similarity(word_1, word_2)

Quick test: most of the results make sense, but I expected food and fear to be further apart.

In [None]:
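# quick test
# Assumption: embedding_dict is not defined earlier in this notebook, so alias
# the gensim model loaded above, which supports the same word -> vector lookup.
embedding_dict = model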
sim_one = get_cosine_similarity(embedding_dict["food"], embedding_dict["lunch"])
sim_two = get_cosine_similarity(embedding_dict["food"], embedding_dict["vegetable"])

print(f"cosine similarity is {sim_one} for food and lunch")
print(f"cosine similarity is {sim_two} for food and vegetable")


sim_three = get_cosine_similarity(embedding_dict["food"], embedding_dict["vlog"])
print(f"cosine similarity is {sim_three} for food and vlog")

# Expected this pair to be further apart
sim = get_cosine_similarity(embedding_dict["food"], embedding_dict["fear"])
print(f"cosine similarity is {sim} for food and fear")


cosine similarity is 0.45461034774780273 for food and lunch
cosine similarity is 0.489216685295105 for food and vegetable
cosine similarity is -0.13879083096981049 for food and vlog
cosine similarity is 0.341911256313324 for food and fear


### Word Mover's Distance
This uses the word embeddings of the words in two texts to measure the minimum distance that the words in one text need to “travel” in semantic space to reach the words in the other text.
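To illustrate the idea of words "travelling", here is a minimal sketch of a relaxed variant in which every word simply moves to its nearest word in the other text. This is a lower bound on the full word mover's distance, not gensim's implementation; the toy 2-d vectors and the function names are assumptions for illustration. For single-word texts, as in the tests here, it reduces to the Euclidean distance between the two word vectors, which is also what the full distance gives in that case.

```python
import math

# Hypothetical 2-d toy embeddings, invented for illustration only.
TOY = {
    "food":  [1.0, 0.0],
    "lunch": [1.0, 1.0],
    "eat":   [0.0, 1.0],
    "vlog":  [-3.0, 4.0],
}

def euclidean(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def one_way_cost(doc_a, doc_b, vectors):
    """Average distance from each word in doc_a to its nearest word in doc_b."""
    nearest = [min(euclidean(vectors[a], vectors[b]) for b in doc_b) for a in doc_a]
    return sum(nearest) / len(nearest)

def relaxed_wmd(doc_a, doc_b, vectors):
    """Larger of the two one-way costs; a lower bound on the full distance."""
    return max(one_way_cost(doc_a, doc_b, vectors),
               one_way_cost(doc_b, doc_a, vectors))

single = relaxed_wmd(["food"], ["lunch"], TOY)
far = relaxed_wmd(["food"], ["vlog"], TOY)
multi = relaxed_wmd(["food", "eat"], ["lunch"], TOY)

print(single, round(far, 3), multi)
```

Unlike cosine similarity, this is a distance: smaller values mean the texts are closer.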

In [None]:
em_distance = model.wmdistance(['lunch'],['food'])
print(f"wmd is {em_distance} between food and lunch" )

em_distance = model.wmdistance(['vegetable'],['food'])
print(f"wmd is {em_distance} between food and vegetable" )

em_distance = model.wmdistance(['food'],['vlog'])
print(f"wmd is {em_distance} between food and vlog" )

em_distance = model.wmdistance(['food'],['fear'])
print(f"wmd is {em_distance} between food and fear" )


wmd is 7.103362083435058 between food and lunch
wmd is 6.914583206176758 between food and vegetable
wmd is 8.87607192993164 between food and vlog
wmd is 7.493886470794679 between food and fear


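Both metrics rank the four word pairs the same way: vegetable is closest to food, then lunch, then fear, then vlog (higher is closer for cosine similarity, lower is closer for WMD). I think cosine similarity is the better fit for our case.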
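The agreement can be checked directly from the scores recorded in the outputs above. The sketch below copies those values and compares the two orderings; the dictionary and variable names are assumptions for illustration.

```python
# Scores copied from the recorded outputs, each for the pair ("food", word).
cosine_scores = {
    "lunch": 0.45461034774780273,
    "vegetable": 0.489216685295105,
    "vlog": -0.13879083096981049,
    "fear": 0.341911256313324,
}
wmd_scores = {
    "lunch": 7.103362083435058,
    "vegetable": 6.914583206176758,
    "vlog": 8.87607192993164,
    "fear": 7.493886470794679,
}

# Closest first: highest cosine similarity, lowest word mover's distance.
by_cosine = sorted(cosine_scores, key=cosine_scores.get, reverse=True)
by_wmd = sorted(wmd_scores, key=wmd_scores.get)

print(by_cosine)
print(by_wmd)
print("same ranking:", by_cosine == by_wmd)
```

Four pairs is a very small sample, so this only shows that the two metrics agree on this quick test, not in general.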

In [None]:
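# wmdistance expects lists of tokens; a bare string is iterated character by
# character, so each word is wrapped in a list here.
# (The WMD value recorded below came from the bare-string call.)
print(f"wmd distance {model.wmdistance(['happy'], ['excited'])}")
print(f"cosine similarity {model.similarity('happy', 'excited')}")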

wmd distance 7.417353117194251
cosine similarity 0.6576606035232544
