#**Create local vector embeddings using the sentence-transformers Python library**

##**GOAL: to embed text sentences and perform semantic searches using your own Python code.**

Many pre-trained embedding models that you can use to create vector embeddings are available on Hugging Face.
Sentence Transformers (SBERT) is a Python library that makes it easy to load these models and compute embeddings with them.

Use pip to install the 'sentence-transformers' library, then import the 'SentenceTransformer' model loader from the 'sentence_transformers' package.

In [1]:
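# Install sentence-transformers into the Python environment that runs this kernel
import sys
!{sys.executable} -m pip install sentence-transformers

# SentenceTransformer loads a pre-trained embedding model by name
from sentence_transformers import SentenceTransformer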

Load the 'paraphrase-MiniLM-L6-v2' model from the Hugging Face Hub with SentenceTransformer(*model-name*) and store the returned model object in the 'model' variable.

In [2]:
# place your code here
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

Loading weights: 100%|██████████| 103/103 [00:00<00:00, 645.99it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: sentence-transformers/paraphrase-MiniLM-L6-v2
Key                     | Status     |
------------------------+------------+
embeddings.position_ids | UNEXPECTED |

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.


After loading the model, call its 'encode()' method to create a vector representation of a single sentence. Use your own text string as the argument.

In [3]:
# complete the code
sentence = "Today weather is very good and I want to go for a walk"  #your sentence
embedding = model.encode(sentence)
embedding

array([-6.74642473e-02, -1.92379951e-03,  3.83438975e-01,  5.94115317e-01,
        6.48502409e-01,  1.21843323e-01,  2.07977593e-01, -4.42252129e-01,
       -1.61655918e-01, -1.40344277e-01, -2.66712643e-02, -1.99682847e-01,
       -3.51386666e-01,  3.98662895e-01,  1.88164845e-01, -5.79540282e-02,
        3.31136405e-01, -1.55087411e-02,  4.18900669e-01, -2.52914995e-01,
       -3.00506175e-01,  5.32984659e-02, -5.71020357e-02,  8.13915804e-02,
       -5.24405301e-01,  2.84269392e-01,  9.07186940e-02, -1.97793916e-01,
       -4.82917763e-02,  2.52661347e-01,  2.07227558e-01, -4.92516421e-02,
       -9.39832702e-02,  4.48416807e-02, -8.99038613e-02,  3.14167798e-01,
        3.76701176e-01, -4.40758616e-01, -1.56945549e-02,  3.56313586e-01,
       -1.01652130e-01,  4.93050888e-02,  3.24261665e-01,  7.44922385e-02,
        3.51504125e-02,  1.47628278e-01, -1.40157953e-01,  4.14158434e-01,
        6.19564831e-01,  1.95871055e-01,  3.00091743e-01, -1.26676634e-01,
       -2.45459713e-02,  

Create vector representations for several sentences at once. Put 8-10 sentences of 20-25 words each into a list, then call the model's 'encode()' method with that list as the argument.

In [4]:
# complete the code
sentences_list = [    #your sentences
    "Machine learning allows computers to learn from data and make predictions or decisions without being explicitly programmed.",
    "The internet has revolutionized the way people communicate, work, and access information all around the world.",
    "Renewable energy sources like solar and wind power are essential for a sustainable future and reducing carbon emissions.",
    "Reading books regularly can improve vocabulary, increase knowledge, and stimulate the imagination in people of all ages.",
    "Healthy eating habits and regular exercise are important factors for maintaining both physical and mental well-being.",
    "Traveling to new countries helps people experience different cultures, languages, and perspectives on life.",
    "Online education platforms provide opportunities for students to learn new skills at their own pace from anywhere.",
    "Artificial intelligence is being used in healthcare to assist doctors with diagnosis and treatment recommendations.",
    "Climate change is a global challenge that requires cooperation and innovative solutions from all nations.",
    "Learning to code can open up many career opportunities in technology, science, and engineering fields."
]

embeddings = model.encode(sentences_list)
embeddings

array([[-0.21049026, -0.1079618 , -0.17535624, ...,  0.10324576,
        -0.20667991, -0.31546226],
       [-0.14660501,  0.01428384, -0.05007816, ...,  0.2805681 ,
         0.4612357 , -0.03103179],
       [-0.45055443,  0.73692846,  0.09285192, ..., -0.00779026,
        -0.06267232,  0.00946096],
       ...,
       [-0.411979  , -0.42818308, -0.02080841, ...,  0.1904903 ,
         0.5210717 ,  0.2023773 ],
       [-0.38164318,  0.56015384,  0.07243702, ..., -0.4784638 ,
        -0.21135372,  0.1602554 ],
       [-0.29515582,  0.15265998, -0.13647254, ..., -0.08716229,
         0.558769  , -0.0661356 ]], shape=(10, 384), dtype=float32)

#**Definition of semantic textual similarity**

Import the 'util' module from the sentence_transformers library.

In [5]:
# place your code here
from sentence_transformers import util

You can calculate the cosine similarity between the vector representations of two sentences with the 'cos_sim()' function from the 'util' module; for two single vectors it returns a 1×1 PyTorch tensor.
Example: sim = util.cos_sim(embedding_1, embedding_2). Calculate the cosine similarity for any two sentences from your list.
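For reference, this is the standard cosine similarity that `cos_sim()` computes for two vectors $\mathbf{a}$ and $\mathbf{b}$ with $n$ components ($n = 384$ for this model, as the output shape above shows). Values range from -1 to 1, and a higher value means the two embeddings point in more similar directions. The worked example uses small made-up 3-D vectors purely to show the arithmetic.

```latex
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}

% Worked example with a = (1, 2, 2) and b = (2, 0, 1):
\mathbf{a} \cdot \mathbf{b} = 1 \cdot 2 + 2 \cdot 0 + 2 \cdot 1 = 4, \quad
\lVert \mathbf{a} \rVert = \sqrt{1 + 4 + 4} = 3, \quad
\lVert \mathbf{b} \rVert = \sqrt{4 + 0 + 1} = \sqrt{5}, \quad
\cos(\mathbf{a}, \mathbf{b}) = \frac{4}{3\sqrt{5}} \approx 0.596
```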


In [6]:
# place your code here
similarity = util.cos_sim(embeddings[0], embeddings[1])
similarity

tensor([[0.1305]])
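'cos_sim()' also accepts whole batches: `util.cos_sim(embeddings, embeddings)` returns the full 10 × 10 matrix whose entry [i, j] is the similarity between sentence i and sentence j. The sketch below computes the same kind of matrix with plain NumPy. `pairwise_cosine_matrix` is a hypothetical helper name, and the 3-D toy vectors are made up for illustration (real embeddings from this model have 384 dimensions).

```python
import numpy as np


def pairwise_cosine_matrix(vectors):
    """Return the (n, n) matrix of cosine similarities between all rows of `vectors`.

    Hypothetical helper for illustration; assumes no row is all zeros.
    """
    vectors = np.asarray(vectors, dtype=np.float64)
    # Divide every row by its L2 norm so that a plain dot product equals cosine similarity
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    # Entry [i, j] is the cosine similarity between row i and row j
    return unit @ unit.T


# Toy vectors standing in for real sentence embeddings
toy = np.array([[1.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.0, 0.0, 2.0]])
matrix = pairwise_cosine_matrix(toy)
print(np.round(matrix, 4))  # matrix[0, 1] is about 0.7071 (a 45-degree angle)
```

The diagonal is always 1 (each vector compared with itself) and the matrix is symmetric, so for large collections you only need the upper triangle.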

Write and test a function named 'cos_similarity_calculation' that measures the semantic similarity between each sentence in your list and any new sentence, using their vector representations and cosine similarity as the measure.

In [7]:
# place your code here
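def cos_similarity_calculation(sentences_list, embeddings, new_sentence, model):
    # Embed the new sentence with the same model that produced `embeddings`
    new_embedding = model.encode(new_sentence)
    # Compare it against every stored embedding; the result is a 1 x len(sentences_list)
    # tensor whose i-th score belongs to sentences_list[i]
    similarities = util.cos_sim(new_embedding, embeddings)
    return similarities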

result = cos_similarity_calculation(sentences_list, embeddings, "Online learning is very popular nowadays.", model)
result

tensor([[0.2687, 0.4407, 0.0980, 0.4045, 0.1559, 0.1954, 0.6086, 0.1605, 0.1218,
         0.3270]])
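The goal stated at the top of this notebook is semantic search, which amounts to sorting these scores and keeping the best matches. Below is a minimal NumPy sketch; `top_k_matches` is a hypothetical helper, not part of sentence-transformers, and the 2-D toy vectors are invented for illustration. The library itself also provides `util.semantic_search(query_embeddings, corpus_embeddings, top_k=...)`, which returns, for each query, a list of dicts with `corpus_id` and `score` keys.

```python
import numpy as np


def top_k_matches(query_vector, corpus_vectors, corpus_sentences, k=3):
    """Rank corpus sentences by cosine similarity to the query and return the best k.

    Hypothetical helper for illustration; returns (sentence, score) pairs, highest first.
    """
    query = np.asarray(query_vector, dtype=np.float64)
    corpus = np.asarray(corpus_vectors, dtype=np.float64)
    # Cosine similarity of the query against every corpus row
    scores = corpus @ query / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(query))
    # Indices of the k largest scores, highest first
    best = np.argsort(scores)[::-1][:k]
    return [(corpus_sentences[i], float(scores[i])) for i in best]


# Toy "embeddings"; with the real model these would come from model.encode()
corpus_sentences = ["online courses", "solar power", "e-learning platforms"]
corpus_vectors = np.array([[0.9, 0.1], [0.0, 1.0], [0.8, 0.3]])
query_vector = np.array([1.0, 0.0])

for sentence, score in top_k_matches(query_vector, corpus_vectors, corpus_sentences, k=2):
    print(f"{score:.4f}  {sentence}")
```

With the real data, calling `top_k_matches(model.encode("Online learning is very popular nowadays."), embeddings, sentences_list)` should put the online-education sentence first, since it has the highest score (0.6086) in the output above. Unlike 'cos_similarity_calculation', this helper actually uses the sentence list to label each score.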

Create a function that computes the cosine similarity between one vector and a batch of vectors, using the cosine similarity formula and the NumPy library. Add code that demonstrates how to use this function.

In [8]:
# place your code here
import numpy as np

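def cosine_similarity_numpy(vector, batch_vectors):
    vector = np.array(vector)                  # shape (d,)
    batch_vectors = np.array(batch_vectors)    # shape (n, d)
    # Dot product of every row in the batch with the vector -> shape (n,)
    dot_products = np.dot(batch_vectors, vector)
    # L2 norm of the single vector and of each row in the batch
    vector_norm = np.linalg.norm(vector)
    batch_norms = np.linalg.norm(batch_vectors, axis=1)
    # cos(a, b) = a·b / (|a| |b|), computed for all rows at once
    similarities = dot_products / (batch_norms * vector_norm)
    return similarities

# Compare the first sentence with all ten: the first score is its self-similarity (1.0)
# and the second matches the util.cos_sim result computed earlier (0.1305)
example_vector = embeddings[0]
similarities = cosine_similarity_numpy(example_vector, embeddings)
similarities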

array([1.        , 0.13054857, 0.06404722, 0.1919014 , 0.09009057,
       0.03363077, 0.3615777 , 0.4680941 , 0.07806575, 0.35257208],
      dtype=float32)
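If all vectors are first scaled to unit length, cosine similarity reduces to a plain dot product, which is cheaper when the same embeddings are compared many times. sentence-transformers can do this at encoding time via `model.encode(..., normalize_embeddings=True)`, after which `util.dot_score` gives the same scores as `util.cos_sim`. The sketch below shows the idea in NumPy; `l2_normalize` and its `eps` guard are assumptions for illustration. The guard keeps an all-zero row at zero, whereas `cosine_similarity_numpy` above would divide 0 by 0 and return `nan` for such a row (unlikely with real sentence embeddings).

```python
import numpy as np


def l2_normalize(batch, eps=1e-12):
    """Scale each row to unit length; rows with (near-)zero norm stay zero.

    Hypothetical helper for illustration.
    """
    batch = np.asarray(batch, dtype=np.float64)
    norms = np.linalg.norm(batch, axis=1, keepdims=True)
    # np.maximum avoids dividing by zero for all-zero rows
    return batch / np.maximum(norms, eps)


vectors = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 0.0]])
unit = l2_normalize(vectors)

# For unit-length rows, cosine similarity with row 0 is just a dot product
dot_scores = unit @ unit[0]
print(dot_scores)
```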