## <center> COSINE SIMILARITY AND MODEL COMPATIBILITY

#### Cosine similarity between sentence embeddings is only meaningful if the model is compatible with the task. The model "all-MiniLM-L6-v2" is a Sentence Transformers model, trained to convert whole sentences into embeddings whose cosine similarity reflects their meaning. It therefore performs better than "xlm-roberta-base" (XLM-RoBERTa), which was pretrained as a masked language model and outputs one embedding per token rather than one per sentence. To compare the two, we mean-pool XLM-RoBERTa's token embeddings into sentence embeddings.
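For reference, cosine similarity divides the dot product of two vectors by the product of their lengths, so it ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). The sketch below is a plain NumPy re-implementation, assumed to mirror what `util.cos_sim` returns for two batches of embeddings: a matrix with one row per sentence in the first list and one column per sentence in the second. The helper name `pairwise_cosine` and the toy vectors are hypothetical.

```python
import numpy as np

def pairwise_cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the matrix of cosine similarities between rows of a and rows of b.

    Hypothetical helper: a NumPy sketch of the computation, not the library code.
    """
    # Scale every row to unit length so that a dot product equals the cosine.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    # Entry [i, j] is the cosine similarity between a[i] and b[j].
    return a_unit @ b_unit.T

# Toy 2-D "embeddings" (made up for illustration, not model outputs).
a = np.array([[1.0, 0.0], [1.0, 1.0]])
b = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]])
print(np.round(pairwise_cosine(a, b), 4))
```

Only the direction of each vector matters: `[2, 0]` and `[1, 0]` score 1 even though their lengths differ.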

In [1]:
# Import the Sentence Transformers library and its util module (used for cosine similarity)
from sentence_transformers import SentenceTransformer, util



In [2]:
# Two lists of sentences

sentences1 = ['He is playing a guitar',
              'A man is playing guitar',
              'The new movie is awesome']

sentences2 = ['She is playing violin',
              'He is playing violin',
              'Dog is playing in ground']

### Let us compare the results of two models.
### The first model is "sentence-transformers/all-MiniLM-L6-v2".

In [3]:
model1 = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Compute sentence embeddings for both lists. Each sentence is encoded as one fixed-length vector.

embeddings1 = model1.encode(sentences1, convert_to_tensor=True)
embeddings2 = model1.encode(sentences2, convert_to_tensor=True)

# Print the sentence embeddings as a sanity check
print(embeddings1, embeddings2)

tensor([[ 0.0244,  0.0523, -0.0265,  ...,  0.0012,  0.0812, -0.0107],
        [ 0.0227, -0.0014, -0.0056,  ..., -0.0225,  0.0846, -0.0283],
        [-0.1004, -0.0774, -0.0014,  ..., -0.0010,  0.0718,  0.0221]]) tensor([[-0.0283, -0.0727,  0.0324,  ...,  0.0361,  0.0749,  0.0033],
        [-0.0116, -0.0188, -0.0088,  ...,  0.0343,  0.0770,  0.0068],
        [ 0.0363, -0.0752,  0.0558,  ...,  0.0059,  0.0366,  0.0845]])


In [4]:
# Compute the cosine-similarity matrix (rows: sentences1, columns: sentences2)

cosine_scores = util.cos_sim(embeddings1, embeddings2)
cosine_scores

tensor([[ 0.3509,  0.5447,  0.2525],
        [ 0.2461,  0.4481,  0.2023],
        [-0.0207, -0.0191,  0.0609]])
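Each row of `cosine_scores` corresponds to a sentence in `sentences1` and each column to a sentence in `sentences2`. As a sketch of how the matrix might be used, the code below picks the best-matching sentence in `sentences2` for every sentence in `sentences1` by taking the `argmax` of each row. To keep the block self-contained it copies the scores printed above into a NumPy array; in the notebook itself, `cosine_scores.argmax(dim=1)` on the tensor should give the same indices.

```python
import numpy as np

sentences1 = ['He is playing a guitar',
              'A man is playing guitar',
              'The new movie is awesome']
sentences2 = ['She is playing violin',
              'He is playing violin',
              'Dog is playing in ground']

# Scores copied from the all-MiniLM-L6-v2 output above (rows: sentences1, columns: sentences2).
cosine_scores = np.array([[ 0.3509,  0.5447,  0.2525],
                          [ 0.2461,  0.4481,  0.2023],
                          [-0.0207, -0.0191,  0.0609]])

# Column index of the highest score in each row.
best = cosine_scores.argmax(axis=1)
for i, j in enumerate(best):
    print(f"{sentences1[i]!r} -> {sentences2[j]!r} (score {cosine_scores[i, j]:.4f})")
```

A row maximum is not automatically a good match: the best candidate for "The new movie is awesome" scores only 0.0609, so in practice a minimum-score threshold would also be needed.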

In [5]:
# Print each sentence pair with its cosine similarity score

for i in range(len(sentences1)):
    for j in range(len(sentences2)):
        print("{} \t\t\t {} \t\t\t Score: {:.4f}".format(sentences1[i], sentences2[j], cosine_scores[i][j]))

He is playing a guitar 			 She is playing violin 			 Score: 0.3509
He is playing a guitar 			 He is playing violin 			 Score: 0.5447
He is playing a guitar 			 Dog is playing in ground 			 Score: 0.2525
A man is playing guitar 			 She is playing violin 			 Score: 0.2461
A man is playing guitar 			 He is playing violin 			 Score: 0.4481
A man is playing guitar 			 Dog is playing in ground 			 Score: 0.2023
The new movie is awesome 			 She is playing violin 			 Score: -0.0207
The new movie is awesome 			 He is playing violin 			 Score: -0.0191
The new movie is awesome 			 Dog is playing in ground 			 Score: 0.0609
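The same nine pairs can also be ranked from most to least similar, which makes the ordering easier to read than the row-by-row listing above. The sketch below sorts `(score, i, j)` tuples in plain Python; the scores are copied from the printed output so the block runs without loading the model.

```python
sentences1 = ['He is playing a guitar',
              'A man is playing guitar',
              'The new movie is awesome']
sentences2 = ['She is playing violin',
              'He is playing violin',
              'Dog is playing in ground']

# Scores copied from the all-MiniLM-L6-v2 output above (rows: sentences1, columns: sentences2).
cosine_scores = [[0.3509, 0.5447, 0.2525],
                 [0.2461, 0.4481, 0.2023],
                 [-0.0207, -0.0191, 0.0609]]

# Collect every (score, i, j) combination, then sort by score, highest first.
pairs = [(cosine_scores[i][j], i, j)
         for i in range(len(sentences1))
         for j in range(len(sentences2))]
pairs.sort(reverse=True)

for score, i, j in pairs:
    print(f"{score:.4f}  {sentences1[i]}  <->  {sentences2[j]}")
```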


### The second model is "xlm-roberta-base" (XLM-RoBERTa).

In [6]:
model2 = SentenceTransformer('xlm-roberta-base')

# Compute sentence embeddings for both lists. Each sentence is encoded as one fixed-length vector.

embeddings1 = model2.encode(sentences1, convert_to_tensor=True)
embeddings2 = model2.encode(sentences2, convert_to_tensor=True)

# Print the sentence embeddings as a sanity check
print(embeddings1, embeddings2)

No sentence-transformers model found with name C:\Users\Deepak/.cache\torch\sentence_transformers\xlm-roberta-base. Creating a new one with MEAN pooling.
Some weights of the model checkpoint at C:\Users\Deepak/.cache\torch\sentence_transformers\xlm-roberta-base were not used when initializing XLMRobertaModel: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing XLMRobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


tensor([[-0.0048,  0.0164, -0.0264,  ...,  0.0148,  0.0108,  0.0944],
        [-0.0092,  0.0535, -0.0123,  ...,  0.0061,  0.0381,  0.0835],
        [-0.0108,  0.0304, -0.0053,  ..., -0.0031,  0.0459,  0.0180]]) tensor([[-0.0109,  0.0582,  0.0131,  ...,  0.0437,  0.0304,  0.0580],
        [-0.0004,  0.0464,  0.0066,  ...,  0.0499,  0.0255,  0.0405],
        [-0.0129,  0.0644,  0.0048,  ..., -0.0247,  0.0182, -0.0039]])
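The log above reports that no Sentence Transformers model exists for `xlm-roberta-base`, so the library created one with MEAN pooling. Mean pooling averages the token embeddings of each sentence while skipping padding positions, using the attention mask as weights. Below is a minimal NumPy sketch of that operation; the function name `mean_pool` and the toy arrays are hypothetical, and the library's own pooling layer works on PyTorch tensors rather than NumPy arrays.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # (batch, 1), avoids division by zero
    return summed / counts

# Two toy "sentences": the second has 2 real tokens followed by 1 padding token.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                   [[2.0, 0.0], [4.0, 2.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 1],
                 [1, 1, 0]])
print(mean_pool(tokens, mask))
```

Each sentence vector is an average over many contextual token vectors. Without training for sentence similarity, such averages can end up pointing in very similar directions, which is consistent with the near-identical scores in the next cell.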


In [7]:
# Compute the cosine-similarity matrix, then print each sentence pair with its score
cosine_scores = util.cos_sim(embeddings1, embeddings2)

for i in range(len(sentences1)):
    for j in range(len(sentences2)):
        print("{} \t\t\t {} \t\t\t Score: {:.4f}".format(sentences1[i], sentences2[j], cosine_scores[i][j]))

He is playing a guitar 			 She is playing violin 			 Score: 0.9984
He is playing a guitar 			 He is playing violin 			 Score: 0.9987
He is playing a guitar 			 Dog is playing in ground 			 Score: 0.9971
A man is playing guitar 			 She is playing violin 			 Score: 0.9985
A man is playing guitar 			 He is playing violin 			 Score: 0.9987
A man is playing guitar 			 Dog is playing in ground 			 Score: 0.9976
The new movie is awesome 			 She is playing violin 			 Score: 0.9977
The new movie is awesome 			 He is playing violin 			 Score: 0.9978
The new movie is awesome 			 Dog is playing in ground 			 Score: 0.9967
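One simple way to quantify the difference between the two models is the spread of their scores over the nine pairs. The sketch below computes the minimum, maximum, and range (maximum minus minimum) of the scores printed for each model; the values are copied from the two outputs above, and the range is just one illustrative indicator chosen here, not a standard benchmark metric.

```python
import numpy as np

# Scores copied from the two outputs above (rows: sentences1, columns: sentences2).
minilm = np.array([[ 0.3509,  0.5447,  0.2525],
                   [ 0.2461,  0.4481,  0.2023],
                   [-0.0207, -0.0191,  0.0609]])
xlmr = np.array([[0.9984, 0.9987, 0.9971],
                 [0.9985, 0.9987, 0.9976],
                 [0.9977, 0.9978, 0.9967]])

for name, scores in [("all-MiniLM-L6-v2", minilm), ("xlm-roberta-base", xlmr)]:
    spread = scores.max() - scores.min()
    print(f"{name}: min {scores.min():.4f}, max {scores.max():.4f}, range {spread:.4f}")
```

A range of about 0.002 means the XLM-RoBERTa scores barely distinguish the related guitar/violin pairs from the unrelated movie pairs, whereas all-MiniLM-L6-v2 spreads the same pairs over a range of about 0.57.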


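### Conclusion: No Sentence Transformers model exists for "xlm-roberta-base" (see the warning in cell [6]), so the library built one by mean-pooling the token embeddings. These pooled embeddings were never trained for semantic similarity: every pair scores between 0.9967 and 0.9987, so unrelated sentences look almost as similar as related ones. By contrast, all-MiniLM-L6-v2, which was trained to produce sentence embeddings, separates the pairs clearly (0.5447 for the closest pair, -0.0207 to 0.0609 for the unrelated movie pairs). The model must therefore be compatible with the task under consideration.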