### Self-Study Colab Activity 23.2: Running Through a Transformer Model Locally

**Expected Time = 60 minutes**


In this activity you will dive into sentence transformers and understand how they can be used to recognize similarity between sentences.

#### What Are Sentence Transformers?

Transformers are indirect descendants of recurrent neural network (RNN) models. In applications such as machine translation, RNNs were typically arranged as encoder-decoder networks: a first model encodes the source-language sentence into a context vector, and a second model decodes that vector into the target language.

An encoder-decoder architecture shares a single context vector between the two models, which creates an information bottleneck because all information must pass through this point. This limits encoder-decoder performance because much of the information produced by the encoder is lost before it reaches the decoder.

Sentence transformers are models designed to convert sentences or text into high-quality numerical vectors or embeddings. These embeddings capture the semantic meaning of the sentences, allowing for tasks like:

- Text similarity: By comparing the embeddings of two sentences, you can determine how similar they are in meaning.
- Text classification: Embeddings can be used as features for various machine learning models for tasks like sentiment analysis, topic classification, etc.
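To make the text-similarity idea concrete, here is a minimal sketch of cosine similarity, the measure used later in this activity. The three-dimensional vectors and their names are hypothetical placeholders chosen for illustration; real sentence embeddings have hundreds of dimensions and come from a model.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy "embeddings", invented for this example only.
dentist = [0.9, 0.1, 0.2]
dental_specialist = [0.8, 0.2, 0.1]
lawyer = [0.1, 0.9, 0.3]

# Vectors pointing in a similar direction score close to 1.
print(cosine_similarity(dentist, dental_specialist))
# Vectors pointing in different directions score much lower.
print(cosine_similarity(dentist, lawyer))
```

Because cosine similarity depends only on direction, not on vector length, it is a common choice for comparing embeddings.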

#### How Do Sentence Transformers Work?
Sentence transformers typically use variants of the Transformer architecture, such as BERT, RoBERTa, or DistilBERT, fine-tuned specifically to create sentence embeddings. They are often built on a Siamese or triplet network architecture, which enables the model to learn semantically meaningful embeddings by minimizing the distance between similar sentences and maximizing it for dissimilar sentences.
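The "pull similar sentences together, push dissimilar ones apart" idea can be sketched with a triplet loss. This is an illustrative NumPy version, not the library's training code: the Euclidean distance, the `margin` value, and the two-dimensional vectors are assumptions made for the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin), Euclidean d."""
    d_pos = np.linalg.norm(anchor - positive)  # should become small
    d_neg = np.linalg.norm(anchor - negative)  # should become large
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 2-dimensional embeddings for one (anchor, positive, negative) triplet.
anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])    # a sentence similar to the anchor
negative = np.array([-1.0, 0.0])   # a sentence dissimilar to the anchor

# Well-arranged embeddings: the positive is closer than the negative by more
# than the margin, so there is nothing left to penalize.
loss_good = triplet_loss(anchor, positive, negative)

# Swapping the roles simulates badly arranged embeddings and a large loss.
loss_bad = triplet_loss(anchor, negative, positive)

print(loss_good, loss_bad)
```

During training, gradients of a loss like this one move the embeddings until most triplets reach zero loss.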

One popular implementation of sentence transformers is Sentence-BERT (SBERT), which fine-tunes BERT or similar models using a contrastive learning objective to produce more effective sentence-level embeddings.

#### Getting Started with Sentence Transformers

The fastest and easiest way to begin working with sentence transformers is through the `sentence-transformers` library, developed by the authors of `SBERT`.

Run the code cell below to install the library with `pip`.

In [1]:
!pip install sentence-transformers

Collecting sentence-transformers
  Downloading sentence_transformers-3.1.0-py3-none-any.whl.metadata (23 kB)
Downloading sentence_transformers-3.1.0-py3-none-any.whl (249 kB)
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-3.1.0


To start, you will use the original `SBERT` model `bert-base-nli-mean-tokens`.

Replace the ellipsis in the code cell below with the name of the model. Then, run the code cell to download and initialize the model.

In [2]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('bert-base-nli-mean-tokens')

model


SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)

From the output above you can see that the `SentenceTransformer` object contains two modules, which together tell you three things:

- The `Transformer` module has a maximum sequence length of 128 tokens.
- The underlying model class is `BertModel`.
- The `Pooling` module uses mean pooling over tokens to produce a 768-dimensional sentence embedding.
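Mean pooling, the mode enabled in the output above (`pooling_mode_mean_tokens: True`), averages the token vectors of each sentence into one fixed-size vector. The following is a minimal NumPy sketch of that idea, not the library's implementation. The random token embeddings and the attention mask are placeholders, and the function name `mean_pool` is hypothetical.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padding (mask == 0)."""
    mask = attention_mask[:, :, None].astype(float)   # (batch, tokens, 1)
    summed = (token_embeddings * mask).sum(axis=1)    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)    # (batch, 1), avoids divide-by-zero
    return summed / counts

# Placeholder data: 2 sentences, 5 token positions, 768 dimensions per token.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 5, 768))
attention_mask = np.array([
    [1, 1, 1, 0, 0],   # first sentence has 3 real tokens and 2 padding tokens
    [1, 1, 1, 1, 1],   # second sentence uses all 5 positions
])

sentence_embeddings = mean_pool(token_embeddings, attention_mask)
print(sentence_embeddings.shape)  # (2, 768)
```

Whatever the sentence length, the result is one 768-dimensional vector per sentence, which is what makes sentences directly comparable.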

#### Data Definition and Encoding

The code cell below defines some sentences that will be used to test the embeddings produced by the BERT model defined above.

In [3]:
sentences = [
    "the fifty mannequin heads floating in the pool kind of freaked them out",
    "she swore she just saw her scary move",
    "he embraced his new life as an lawyer",
    "my dentist tells me that chewing bricks is very bad for your teeth",
    "the dental specialist recommended an immediate stop to flossing with construction materials"
]




Once the model and the data are defined, building sentence embeddings is quickly done using the `encode` method.

In the code cell below, call the `encode` method on `model` and pass `sentences` as the argument.

In [4]:
embeddings = model.encode(sentences)



#### Calculating the Similarity Between Sentences

Having calculated the sentence embeddings, you can now use them to compare sentences quickly. This task is known as semantic textual similarity (STS): it works by comparing pairs of sentences and is often used to identify patterns in datasets.

Complete the code inside the `for` loop to calculate the cosine similarity among pairs of sentences.

In [6]:
import numpy as np
from sentence_transformers.util import cos_sim

sim = np.zeros((len(sentences), len(sentences)))

for i in range(len(sentences)):
    sim[i:,i] = cos_sim(embeddings[i], embeddings[i:])

sim

array([[1.        , 0.        , 0.        , 0.        , 0.        ],
       [0.40970904, 1.00000024, 0.        , 0.        , 0.        ],
       [0.10871316, 0.30755171, 1.        , 0.        , 0.        ],
       [0.50074869, 0.41413638, 0.15639579, 1.00000012, 0.        ],
       [0.29936212, 0.35473359, 0.24160701, 0.63849497, 0.99999988]])

Before we interpret the output, it is useful to clarify the index corresponding to each sentence. Observe the table below:

Index | Sentence
------|---------
0 | the fifty mannequin heads floating in the pool kind of freaked them out
1 | she swore she just saw her scary move
2 | he embraced his new life as an lawyer
3 | my dentist tells me that chewing bricks is very bad for your teeth
4 | the dental specialist recommended an immediate stop to flossing with construction materials

Ignoring the diagonal terms, the highest similarity score is `0.64`, found in the bottom row at position (4, 3). This result is not surprising because sentences 3 and 4 both describe poor dental practices involving construction materials.
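Rather than reading the matrix by eye, you can locate the most similar pair programmatically. The sketch below copies the `SBERT` scores from the output above and searches only the entries strictly below the diagonal. The variable names are chosen for this example.

```python
import numpy as np

# Similarity scores copied from the SBERT output above.
sim = np.array([
    [1.0,        0.0,        0.0,        0.0,        0.0],
    [0.40970904, 1.00000024, 0.0,        0.0,        0.0],
    [0.10871316, 0.30755171, 1.0,        0.0,        0.0],
    [0.50074869, 0.41413638, 0.15639579, 1.00000012, 0.0],
    [0.29936212, 0.35473359, 0.24160701, 0.63849497, 0.99999988],
])

# Indices of the entries strictly below the diagonal (one per sentence pair).
rows, cols = np.tril_indices_from(sim, k=-1)
pair_scores = sim[rows, cols]

best = np.argmax(pair_scores)
best_pair = (int(rows[best]), int(cols[best]))
best_score = float(pair_scores[best])

print(best_pair, round(best_score, 2))  # (4, 3) 0.64
```

Excluding the diagonal matters because every sentence has a similarity of about 1 with itself, which would otherwise always win.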



#### Other sentence-transformers
Although the `SBERT` model returned good results, many more sentence transformer models have since been developed, many of which are available through the [sentence-transformers](https://www.sbert.net/docs/sentence_transformer/pretrained_models.html) library.

In this section of the activity you will load one of the highest-performing models (`all-mpnet-base-v2`) and run it through the same STS task for comparison.

In the code cell below, replace the ellipsis with the name of the `MPNet` model given above.

In [7]:
from sentence_transformers import SentenceTransformer

mpnet = SentenceTransformer('all-mpnet-base-v2')

mpnet


SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

The components of the `all-mpnet-base-v2` model are very similar to the `bert-base-nli-mean-tokens` model, with some small differences:
- `max_seq_length` has increased from 128 to 384, meaning that this model can process sequences three times longer than `SBERT` can.
- There is an additional normalization layer applied to sentence embeddings.
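One practical consequence of a normalization layer is that, for unit-length vectors, the plain dot product equals the cosine similarity. The sketch below illustrates this with random placeholder vectors. It assumes the layer rescales each embedding to unit L2 norm, and `l2_normalize` is a hypothetical helper written for this example.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Placeholder embeddings: 3 vectors with 768 dimensions each.
rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 768))
unit = l2_normalize(raw)

# Cosine similarity computed on the raw vectors ...
norms = np.linalg.norm(raw, axis=1)
cosine = (raw @ raw.T) / np.outer(norms, norms)

# ... matches a plain dot product on the normalized vectors.
dot = unit @ unit.T

print(np.allclose(cosine, dot))  # True
```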

#### Calculating the Similarity Between Sentences

Run the code cell below to calculate the sentence embeddings with `mpnet` and the STS similarity between sentences.

In [8]:
embeddings = mpnet.encode(sentences)

sim = np.zeros((len(sentences), len(sentences)))

for i in range(len(sentences)):
    sim[i:,i] = cos_sim(embeddings[i], embeddings[i:])

sim



array([[ 1.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.34552321,  1.        ,  0.        ,  0.        ,  0.        ],
       [ 0.03474139,  0.04706889,  0.99999994,  0.        ,  0.        ],
       [ 0.04334457,  0.03544566, -0.05432385,  1.00000024,  0.        ],
       [ 0.053985  ,  0.07222687,  0.02947495,  0.51847208,  1.00000012]])

By comparing the results of `SBERT` and `MPNet`, you can observe that although `SBERT` correctly identifies 4 and 3 as the most similar pair, it also assigns reasonably high similarity to other sentence pairs.

On the other hand, the `MPNet` model makes a much clearer distinction between similar and dissimilar pairs: most pairs score less than 0.1, while the 4-3 pair scores 0.52. The one exception is the 1-0 pair, which scores 0.35.

In other words, increasing the separation between dissimilar and similar pairs makes it easier to automatically identify relevant pairs and pushes predictions closer to the 0 and 1 target scores for dissimilar and similar pairs used during training.
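The difference in separation can be quantified directly from the two outputs above. The sketch below copies both matrices and reports, for each model, how many pairs score below 0.1 and how far the top pair sits above the average of the remaining pairs. The `separation` measure is an ad hoc illustration defined for this example, not a standard metric.

```python
import numpy as np

# Scores copied from the SBERT and MPNet outputs above.
sbert_sim = np.array([
    [1.0,        0.0,        0.0,        0.0,        0.0],
    [0.40970904, 1.00000024, 0.0,        0.0,        0.0],
    [0.10871316, 0.30755171, 1.0,        0.0,        0.0],
    [0.50074869, 0.41413638, 0.15639579, 1.00000012, 0.0],
    [0.29936212, 0.35473359, 0.24160701, 0.63849497, 0.99999988],
])
mpnet_sim = np.array([
    [1.0,        0.0,        0.0,         0.0,        0.0],
    [0.34552321, 1.0,        0.0,         0.0,        0.0],
    [0.03474139, 0.04706889, 0.99999994,  0.0,        0.0],
    [0.04334457, 0.03544566, -0.05432385, 1.00000024, 0.0],
    [0.053985,   0.07222687, 0.02947495,  0.51847208, 1.00000012],
])

def off_diagonal(sim):
    """Return the scores strictly below the diagonal (one per sentence pair)."""
    rows, cols = np.tril_indices_from(sim, k=-1)
    return sim[rows, cols]

def separation(sim):
    """Top pair score minus the mean of all remaining pair scores."""
    scores = np.sort(off_diagonal(sim))
    return float(scores[-1] - scores[:-1].mean())

for name, sim in [("SBERT", sbert_sim), ("MPNet", mpnet_sim)]:
    pairs = off_diagonal(sim)
    below = int((pairs < 0.1).sum())
    print(f"{name}: {below} of {len(pairs)} pairs below 0.1, "
          f"separation {separation(sim):.2f}")
```

With these five sentences, `MPNet` places 8 of the 10 pairs below 0.1 while `SBERT` places none there, and the `MPNet` separation is the larger of the two.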