# Embeddings With Sentence-Transformers

We've worked through creating our embeddings using the `transformers` library, and at times it can be quite involved. It's important to understand those steps, but we can make life easier by using the `sentence-transformers` library.

We'll work through the same process - but using `sentence-transformers` instead.

In [1]:
sentences = [
    "Three years later, the coffin was still full of Jello.",
    "The fish dreamed of escaping the fishbowl and into the toilet where he saw his friend go.",
    "The person box was packed with jelly many dozens of months later.",
    "Standing on one's head at job interviews forms a lasting impression.",
    "It took him a month to finish the meal.",
    "He found a leprechaun in his walnut shell."
]

# thanks to https://randomwordgenerator.com/sentence.php

Initialize our model:

In [2]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('bert-base-nli-mean-tokens')


Encode the sentences:

In [3]:
sentence_embeddings = model.encode(sentences)

In [4]:
sentence_embeddings.shape

(6, 768)
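For intuition about what `encode` handles on our behalf, here is a minimal sketch of mean pooling, the pooling step that the model name `bert-base-nli-mean-tokens` suggests. It is an illustration only: the token vectors and attention mask are made-up toy values, and `mean_pool` is a hypothetical helper, not a function from `sentence-transformers`.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of each sentence, ignoring padding slots.

    Hypothetical helper for illustration; not part of sentence-transformers.
    """
    # expand the mask to (batch, tokens, 1) so it broadcasts over the vector dim
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # (batch, 1), avoids divide-by-zero
    return summed / counts


# toy input: 2 sentences, 3 token slots each, 4-dimensional token vectors
token_embeddings = np.array([
    [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0], [9.0, 9.0, 9.0, 9.0]],
    [[2.0, 0.0, 2.0, 0.0], [4.0, 2.0, 0.0, 2.0], [0.0, 4.0, 4.0, 4.0]],
])
# the last slot of the first sentence is padding and must not affect the mean
attention_mask = np.array([
    [1, 1, 0],
    [1, 1, 1],
])

pooled = mean_pool(token_embeddings, attention_mask)
print(pooled.shape)
print(pooled)
```

Each sentence collapses to a single vector, which is why six sentences give a `(6, 768)` array above: one 768-dimensional vector per sentence.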

And now we have our sentence embeddings - a much quicker approach. We then compare them just as we did before, using cosine similarity:

In [5]:
from sklearn.metrics.pairwise import cosine_similarity
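As a reminder of what this function measures, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below spells that out with NumPy on toy vectors; `cosine_sim` is a hypothetical helper written for illustration, not the scikit-learn implementation.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine of the angle between two 1-D vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# toy vectors chosen to show the three extreme cases
same_direction = cosine_sim([1.0, 2.0], [2.0, 4.0])   # parallel vectors, close to 1
orthogonal = cosine_sim([1.0, 0.0], [0.0, 3.0])       # perpendicular vectors, 0
opposite = cosine_sim([1.0, 1.0], [-1.0, -1.0])       # opposite vectors, close to -1

print(same_direction, orthogonal, opposite)
```

Because the lengths are divided out, only the direction of the embeddings matters, not their magnitude.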

Let's calculate cosine similarity for sentence `0`:

In [6]:
cosine_similarity(
    [sentence_embeddings[0]],
    sentence_embeddings[1:]
)

array([[0.33088902, 0.72192585, 0.17475499, 0.44709653, 0.5548363 ]],
      dtype=float32)
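To make this output easier to read, the scores can be sorted and paired with their sentences. The sketch below copies the five scores from the output above rather than re-running the model, so it is a reading aid under that assumption; note that score `i` belongs to `sentences[i + 1]`, because sentence `0` was the query.

```python
import numpy as np

# the five sentences compared against sentence 0, in their original order
candidates = [
    "The fish dreamed of escaping the fishbowl and into the toilet where he saw his friend go.",
    "The person box was packed with jelly many dozens of months later.",
    "Standing on one's head at job interviews forms a lasting impression.",
    "It took him a month to finish the meal.",
    "He found a leprechaun in his walnut shell.",
]

# scores copied from the cosine_similarity output above
scores = np.array([0.33088902, 0.72192585, 0.17475499, 0.44709653, 0.5548363])

# argsort is ascending, so reverse it to list the most similar sentence first
order = np.argsort(scores)[::-1]
for i in order:
    print(f"{scores[i]:.4f}  {candidates[i]}")

best = candidates[order[0]]
```

The top-ranked sentence is the paraphrase of sentence `0` ("The person box was packed with jelly many dozens of months later."), which is the behavior we want from sentence embeddings.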

Note that this is nearly identical to what we got in the previous section:
```
# convert from PyTorch tensor to numpy array
mean_pooled = mean_pooled.detach().numpy()

# calculate
cosine_similarity(
    [mean_pooled[0]],
    mean_pooled[1:]
)
array([[0.33088917, 0.24826953, 0.2923194 , 0.20174849, 0.2950728 ,
        0.8143991 ]], dtype=float32)
```


These similarities agree with the values we calculated before to within about 0.0001 (differing digits are shown in bold):

| Index | Sentence | Similarity (before) | New similarity |
| --- | --- | --- | --- |
| 1 | "The fish dreamed of escaping the fishbowl and into the toilet where he saw his friend go." | 0.3309 | 0.3309 |
| 2 | "The person box was packed with jelly many dozens of months later." | 0.7219 | 0.7219 |
| 3 | "Standing on one's head at job interviews forms a lasting impression." | 0.1748 | 0.174**7** |
| 4 | "It took him a month to finish the meal." | 0.4471 | 0.447**2** |
| 5 | "He found a leprechaun in his walnut shell." | 0.5548 | 0.554**7** |
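One way to put a number on "almost the same" is to compare the two columns of the table directly. The sketch below uses the rounded values from the table; the tolerance `atol=2e-4` is an assumption chosen for illustration, not a threshold from either library.

```python
import numpy as np

# rounded values copied from the two columns of the table above
before = np.array([0.3309, 0.7219, 0.1748, 0.4471, 0.5548])
new = np.array([0.3309, 0.7219, 0.1747, 0.4472, 0.5547])

# largest absolute gap between the two columns
max_diff = float(np.abs(before - new).max())

# True if every pair agrees within the (assumed) absolute tolerance
close = bool(np.allclose(before, new, rtol=0.0, atol=2e-4))

print(f"largest difference: {max_diff:.4f}")
print(f"within tolerance: {close}")
```

Differences this small do not change the ranking of the sentences.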

So, using `sentence-transformers` can make life much easier, and both approaches produce nearly identical similarity scores.