# Lesson 4: Sentence Embeddings

- In the classroom, the libraries are already installed for you.
- If you would like to run this code on your own machine, you can install the following:

In [1]:
!pip install sentence-transformers




- Here is some code that suppresses warning messages.

In [2]:
from transformers.utils import logging
logging.set_verbosity_error()

### Build the `sentence embedding` pipeline using the 🤗 Sentence Transformers library

In [3]:
from sentence_transformers import SentenceTransformer
import torch

In [4]:
model = SentenceTransformer("all-MiniLM-L6-v2")



More info on [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).

In [5]:
sentences1 = ['The cat sits outside',
              'A man is playing guitar',
              'The movies are awesome']

In [6]:
embeddings1 = model.encode(sentences1, convert_to_tensor=True)

In [7]:
embeddings1

tensor([[ 0.1392,  0.0030,  0.0470,  ...,  0.0641, -0.0163,  0.0636],
        [ 0.0227, -0.0014, -0.0056,  ..., -0.0225,  0.0846, -0.0283],
        [-0.1043, -0.0628,  0.0093,  ...,  0.0020,  0.0653, -0.0150]],
       device='cuda:0')

In [8]:
sentences2 = ['The dog plays in the garden',
              'A woman watches TV',
              'The new movie is so great']

In [9]:
embeddings2 = model.encode(sentences2, 
                           convert_to_tensor=True)

In [10]:
print(embeddings2)

tensor([[ 0.0163, -0.0700,  0.0384,  ...,  0.0447,  0.0254, -0.0023],
        [ 0.0054, -0.0920,  0.0140,  ...,  0.0167, -0.0086, -0.0424],
        [-0.0842, -0.0592, -0.0010,  ..., -0.0157,  0.0764,  0.0389]],
       device='cuda:0')


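* Calculate the cosine similarity between every sentence in the first list and every sentence in the second list. Entry `[i][j]` of the result compares `sentences1[i]` with `sentences2[j]`, and a higher score means the two sentences are more similar.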
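To make the score itself less of a black box, here is a minimal sketch of cosine similarity in plain Python. The `cosine_similarity` helper and the toy 3-dimensional vectors are made up for illustration and are not part of the lesson; real sentence embeddings have many more dimensions, and `util.cos_sim` works on whole batches of them at once.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (illustrative helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, invented for this example
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
opposite = cosine_similarity([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0])

print(same_direction, orthogonal, opposite)
```

Vectors pointing the same way score close to 1, unrelated (orthogonal) vectors score 0, and opposite vectors score close to -1. Only direction matters, not length.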

In [11]:
from sentence_transformers import util

In [12]:
cosine_scores = util.cos_sim(embeddings1, embeddings2)

In [13]:
print(cosine_scores)

tensor([[ 0.2838,  0.1310, -0.0029],
        [ 0.2277, -0.0327, -0.0136],
        [-0.0124, -0.0465,  0.6571]], device='cuda:0')


In [14]:
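# Compare each sentence with the one at the same index in the other list
# (the diagonal of the score matrix).
for i in range(len(sentences1)):
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences1[i],
                                                 sentences2[i],
                                                 cosine_scores[i][i]))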

The cat sits outside 		 The dog plays in the garden 		 Score: 0.2838
A man is playing guitar 		 A woman watches TV 		 Score: -0.0327
The movies are awesome 		 The new movie is so great 		 Score: 0.6571
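The loop above only reads the diagonal of the score matrix. As a rough sketch of how the full matrix can be used, the snippet below picks, for each sentence in `sentences1`, the highest-scoring sentence in `sentences2`. The scores are copied by hand from the printed `cosine_scores` output so the example runs without the model; `best_matches` is a name introduced here for illustration.

```python
sentences1 = ['The cat sits outside',
              'A man is playing guitar',
              'The movies are awesome']
sentences2 = ['The dog plays in the garden',
              'A woman watches TV',
              'The new movie is so great']

# Values copied from the printed cosine_scores tensor (row i = sentences1[i])
scores = [[0.2838, 0.1310, -0.0029],
          [0.2277, -0.0327, -0.0136],
          [-0.0124, -0.0465, 0.6571]]

best_matches = []
for i, row in enumerate(scores):
    # Index of the largest score in this row
    j = max(range(len(row)), key=row.__getitem__)
    best_matches.append((sentences1[i], sentences2[j], row[j]))

for s1, s2, score in best_matches:
    print(f"{s1} -> {s2} (score: {score:.4f})")
```

With these numbers, 'A man is playing guitar' is closest to 'The dog plays in the garden' (0.2277), not to the sentence at the same index (-0.0327). On the tensor itself, `cosine_scores.argmax(dim=1)` should give the same column indices.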


### Try it yourself! 
- Try this model with your own sentences!