# Text similarity with Sentence-BERT

> Ref: [An Intuitive Explanation of Sentence-BERT](https://towardsdatascience.com/an-intuitive-explanation-of-sentence-bert-1984d144a868)

## SBERT Model: paraphrase-MiniLM-L6-v2

In [4]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/paraphrase-MiniLM-L6-v2')




In [7]:
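sentences = ['The quick brown fox jumps over the lazy dog',
             'Dogs are a popular household pet around the world']

# Encode both sentences; the result holds one embedding per sentence
embeddings = model.encode(sentences)
# print(type(embeddings))
print(embeddings.shape)     # (number of sentences, embedding dimension)
print(embeddings[0].shape)  # shape of a single sentence embedding
# for embedding in embeddings:
#     print(embedding)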

## SBERT Model: all-MiniLM-L6-v2

In [14]:
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')




In [15]:
sentences = ['The cat sits outside',
             'A man is playing guitar',
             'I love pasta',
             'The new movie is awesome',
             'The cat plays in the garden',
             'A woman watches TV',
             'The new movie is so great',
             'Do you like pizza?']

# Encode the sentences as a single tensor of embeddings
embeddings = model.encode(sentences, convert_to_tensor=True)

# Compute the cosine similarity between every pair of sentences
cosine_scores = util.cos_sim(embeddings, embeddings)

# Collect each unordered pair (i < j) once, skipping self-comparisons
pairs = []
for i in range(len(cosine_scores) - 1):
    for j in range(i + 1, len(cosine_scores)):
        pairs.append({'index': [i, j], 'score': cosine_scores[i][j]})

# Sort the pairs by decreasing score and print the ten most similar
pairs = sorted(pairs, key=lambda x: x['score'], reverse=True)
for pair in pairs[0:10]:
    i, j = pair['index']
    print("[{} || {}] : Score: {:.4f}".format(sentences[i],
                                              sentences[j], pair['score']))

[The new movie is awesome || The new movie is so great] : Score: 0.8939
[The cat sits outside || The cat plays in the garden] : Score: 0.6788
[I love pasta || Do you like pizza?] : Score: 0.5096
[I love pasta || The new movie is so great] : Score: 0.2560
[I love pasta || The new movie is awesome] : Score: 0.2440
[A man is playing guitar || The cat plays in the garden] : Score: 0.2105
[The new movie is awesome || Do you like pizza?] : Score: 0.1969
[The new movie is so great || Do you like pizza?] : Score: 0.1692
[The cat sits outside || A woman watches TV] : Score: 0.1310
[The cat plays in the garden || Do you like pizza?] : Score: 0.0900
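
The scores above are cosine similarities: the dot product of two embedding vectors divided by the product of their lengths. As a rough illustration of that arithmetic and of the pair ranking, here is a minimal sketch in plain Python. The helper names `cosine_similarity` and `rank_pairs` and the toy vectors are made up for this example; they are not part of `sentence_transformers`, and the vectors are not real embeddings.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_pairs(vectors):
    """Score every unordered pair (i < j) and sort by decreasing similarity."""
    pairs = [
        {"index": (i, j), "score": cosine_similarity(vectors[i], vectors[j])}
        for i, j in combinations(range(len(vectors)), 2)
    ]
    return sorted(pairs, key=lambda p: p["score"], reverse=True)


# Hypothetical 3-dimensional vectors standing in for sentence embeddings.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],  # same direction as the first vector
    [0.0, 1.0, 0.0],  # orthogonal to the first two
]

for pair in rank_pairs(toy_vectors):
    i, j = pair["index"]
    print("[{} || {}] : Score: {:.4f}".format(i, j, pair["score"]))
```

Vectors pointing in the same direction score 1 regardless of length, and orthogonal vectors score 0, which is why near-paraphrases such as the two movie sentences rank at the top.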


In [19]:
print(embeddings[2].shape)

torch.Size([384])
