#**Create local vector embeddings using the sentence-transformers Python library**

##**GOAL: to embed text sentences and perform semantic searches using your own Python code.**

There are many pre-trained embedding models available on Hugging Face that you can use to create vector embeddings.
Sentence Transformers (SBERT) is a library that makes it easy to use these models for vector embedding.

Use pip to install the 'sentence-transformers' package, then import the 'SentenceTransformer' model loader class from the 'sentence_transformers' module.

In [1]:
from sentence_transformers import SentenceTransformer

Load the 'paraphrase-MiniLM-L6-v2' model from the Hugging Face Hub using SentenceTransformer(*model-name*) and store a reference to the model object in the 'model' variable.

In [2]:
# place your code here
model = SentenceTransformer("paraphrase-MiniLM-L6-v2")

After loading the model, call the 'encode()' method on the model object to create a vector representation of a text sentence. Pass your own text string as the argument.

In [3]:
# complete the code
sentence = "The AI engineering course is very intense but interesting."  #your sentence
embedding = model.encode(sentence)
embedding

array([ 0.16242452, -0.32998833, -0.26105827, -0.27376795,  0.2055108 ,
       -0.3268377 , -0.2852706 ,  0.05449158, -0.45635918, -0.0854828 ,
       -0.3885019 , -0.04291125, -0.1490832 ,  0.1757705 , -0.2649453 ,
       -0.05682654,  0.17586823, -0.7168272 , -0.3233005 , -0.260863  ,
       -0.07775095, -0.4677556 ,  0.19411592, -0.263966  ,  0.2602248 ,
        0.24813132,  0.3115436 , -0.06603002,  0.28283894, -0.38867858,
        0.3456558 , -0.16287027,  0.29766524, -0.08733153, -0.2163121 ,
        0.376099  , -0.02729485, -0.4152918 ,  0.17746793,  0.07043826,
       -0.19648837, -0.34868944,  0.326831  , -0.1860769 ,  0.7209968 ,
        0.07007735,  0.06338777, -0.60558605,  0.14234878, -0.08384811,
       -0.11462647, -0.20250504,  0.2791814 , -0.11856436, -0.24774249,
        0.47772932,  0.5065195 , -0.5612685 , -0.43987763, -0.32156006,
       -0.07674744, -0.08610552,  0.32706884,  0.4351158 ,  0.52870744,
       -0.2865257 ,  0.4747733 ,  0.16286393,  0.02088573,  0.20

Create vector representations for several text sentences. Place 8-10 text strings of 20-25 words each in a list, then call the 'encode()' method on the model object with that list as the argument.

In [4]:
# complete the code
sentences_list = [    
    "Machine learning algorithms can identify complex patterns in large datasets that would be impossible for humans to detect manually.",
    "Natural language processing enables computers to understand, interpret, and generate human language in a meaningful and useful way.",
    "Deep neural networks are inspired by the structure of the human brain and consist of many interconnected layers of nodes.",
    "Transfer learning allows a model pre-trained on one task to be fine-tuned efficiently for a completely different target task.",
    "Vector embeddings represent words or sentences as dense numerical arrays that capture semantic meaning and contextual relationships.",
    "Reinforcement learning trains agents to make sequential decisions by rewarding desired behaviors and penalizing undesired ones over time.",
    "Data preprocessing is a critical step in any machine learning pipeline because garbage input always produces garbage output results.",
    "Transformer architectures revolutionized NLP by using self-attention mechanisms to model long-range dependencies across entire input sequences.",
    "Semantic search retrieves documents based on the meaning of a query rather than simple keyword overlap or exact string matching.",
    "Model evaluation metrics such as precision, recall, and F1-score help us understand how well a classifier is really performing.",
]  # your sentences

embeddings = model.encode(sentences_list)


#**Definition of semantic textual similarity**

Import the 'util' module from the sentence_transformers library.

In [5]:
# place your code here
from sentence_transformers import util

You can calculate the cosine similarity of the vector representations of our sentences using the 'cos_sim()' function from the util module.
Example: sim = util.cos_sim(embedding_1, embedding_2). Calculate the cosine similarity for any two sentences from your list.


In [6]:
# place your code here
embedding_1 = embeddings[0]
embedding_2 = embeddings[1]

sim = util.cos_sim(embedding_1, embedding_2)
sim

tensor([[0.2953]])
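
The value returned by 'cos_sim()' is the dot product of the two vectors divided by the product of their lengths. As a minimal sketch of that formula, the snippet below uses two toy 3-dimensional vectors in place of real sentence embeddings; the names 'a', 'b' and 'cosine' are illustrative assumptions and do not come from the notebook.

```python
import numpy as np

# Toy 3-dimensional vectors standing in for real sentence embeddings (assumption).
a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 1.0, 2.0])

# Cosine similarity: dot product divided by the product of the vector lengths.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(float(cosine), 4))
```

Here the dot product is 8 and both vectors have length 3, so the similarity is 8/9. Values close to 1 mean the vectors point in nearly the same direction, and values near 0 mean they are unrelated.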

Write and test a function named 'cos_similarity_calculation' that determines the semantic similarity between the sentences in your list and any text sentence, using their vector representations and cosine similarity as the measure.
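
This task amounts to scoring every stored vector against a query vector and sorting by score, which is the core of a semantic search. The sketch below shows one possible vectorized way to do that with numpy on toy 3-dimensional vectors rather than real model embeddings; 'rank_by_cosine', 'toy_embeddings' and 'toy_query' are hypothetical names introduced only for illustration.

```python
import numpy as np

def rank_by_cosine(query, matrix):
    """Return (row indices ordered best match first, cosine score per row)."""
    # Scale the query and every row to unit length so a dot product equals the cosine.
    query_unit = query / np.linalg.norm(query)
    rows_unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = rows_unit @ query_unit
    # argsort is ascending, so negate the scores to get a descending order.
    order = np.argsort(-scores)
    return order, scores

# Toy vectors standing in for sentence embeddings (assumption).
toy_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
])
toy_query = np.array([1.0, 0.2, 0.0])

order, scores = rank_by_cosine(toy_query, toy_embeddings)
for idx in order:
    print(idx, round(float(scores[idx]), 3))
```

The query points mostly along the first axis, so row 0 ranks first, the diagonal row 2 second, and row 1 last. With real embeddings, the indices in 'order' could be used to look up the matching sentences.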

In [7]:
# place your code here
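def cos_similarity_calculation(text, sentences, embeddings, model):
    """Rank sentences by cosine similarity to text, most similar first."""
    text_embed = model.encode(text)

    results = []
    for i, sentence in enumerate(sentences):
        # cos_sim returns a 1x1 tensor; .item() extracts the Python float
        score = util.cos_sim(text_embed, embeddings[i]).item()
        results.append((score, sentence))

    # Tuples compare by score first, so the most similar sentence ends up on top
    results.sort(reverse=True)

    return results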

text = "Machine learning uses lots of tools."
cos_similarity_calculation(text, sentences_list, embeddings, model)

[(0.5002042055130005,
  'Machine learning algorithms can identify complex patterns in large datasets that would be impossible for humans to detect manually.'),
 (0.49907559156417847,
  'Deep neural networks are inspired by the structure of the human brain and consist of many interconnected layers of nodes.'),
 (0.4459686279296875,
  'Natural language processing enables computers to understand, interpret, and generate human language in a meaningful and useful way.'),
 (0.4378020167350769,
  'Data preprocessing is a critical step in any machine learning pipeline because garbage input always produces garbage output results.'),
 (0.4335484504699707,
  'Transfer learning allows a model pre-trained on one task to be fine-tuned efficiently for a completely different target task.'),
 (0.39404425024986267,
  'Model evaluation metrics such as precision, recall, and F1-score help us understand how well a classifier is really performing.'),
 (0.3759359121322632,
  'Reinforcement learning trains ag

In [8]:
text_1 = "Cat is not a dog"
cos_similarity_calculation(text_1, sentences_list, embeddings, model)

[(0.16381791234016418,
  'Semantic search retrieves documents based on the meaning of a query rather than simple keyword overlap or exact string matching.'),
 (0.064333476126194,
  'Reinforcement learning trains agents to make sequential decisions by rewarding desired behaviors and penalizing undesired ones over time.'),
 (0.03935271501541138,
  'Data preprocessing is a critical step in any machine learning pipeline because garbage input always produces garbage output results.'),
 (0.03696758300065994,
  'Natural language processing enables computers to understand, interpret, and generate human language in a meaningful and useful way.'),
 (0.009597387164831161,
  'Machine learning algorithms can identify complex patterns in large datasets that would be impossible for humans to detect manually.'),
 (-0.011903373524546623,
  'Vector embeddings represent words or sentences as dense numerical arrays that capture semantic meaning and contextual relationships.'),
 (-0.03747481480240822,
  'M

Create a function that computes the cosine distance (1 minus the cosine similarity) between a vector and each vector in a batch, using the numpy library. Add code to demonstrate how to use this function.
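
A function of this kind can be sanity-checked on vectors whose cosine distances are known by hand: 0 for the same direction, 1 for orthogonal vectors, and 2 for opposite directions. The sketch below is one such check under those assumptions; 'cosine_distance_batch', 'reference' and 'toy_batch' are hypothetical names and not part of the notebook.

```python
import numpy as np

def cosine_distance_batch(vector, batch):
    """Cosine distance (1 - cosine similarity) between vector and each row of batch."""
    similarity = batch @ vector / (np.linalg.norm(batch, axis=1) * np.linalg.norm(vector))
    return 1.0 - similarity

reference = np.array([1.0, 0.0])
toy_batch = np.array([
    [2.0, 0.0],   # same direction as the reference
    [0.0, 3.0],   # orthogonal to the reference
    [-1.0, 0.0],  # opposite direction
])

toy_distances = cosine_distance_batch(reference, toy_batch)
print(toy_distances)
```

Because the distance is 1 minus the similarity, sorting by ascending distance gives the same ranking as sorting by descending similarity. Vector length does not affect the result, only direction does.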

In [10]:
# place your code here
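import numpy as np

def cos_distance_vectors(vector, batch):
    # Dot product of the vector with every row of the batch
    top = np.dot(batch, vector)

    # Product of the vector norm and each row norm
    bottom = np.linalg.norm(vector) * np.linalg.norm(batch, axis=1)

    # Cosine distance is 1 minus cosine similarity
    return 1 - top / bottom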

text_2 = "Machine learning algorithms"
embedding_text_2 = model.encode(text_2)
distances = cos_distance_vectors(embedding_text_2, embeddings)

for i, distance in enumerate(distances):
    print(f"Sentence {i+1}: Distance = {distance}")
    print(f"Text: {sentences_list[i]}\n")

Sentence 1: Distance = 0.44009023904800415
Text: Machine learning algorithms can identify complex patterns in large datasets that would be impossible for humans to detect manually.

Sentence 2: Distance = 0.735309362411499
Text: Natural language processing enables computers to understand, interpret, and generate human language in a meaningful and useful way.

Sentence 3: Distance = 0.7077240943908691
Text: Deep neural networks are inspired by the structure of the human brain and consist of many interconnected layers of nodes.

Sentence 4: Distance = 0.694071352481842
Text: Transfer learning allows a model pre-trained on one task to be fine-tuned efficiently for a completely different target task.

Sentence 5: Distance = 0.8114359378814697
Text: Vector embeddings represent words or sentences as dense numerical arrays that capture semantic meaning and contextual relationships.

Sentence 6: Distance = 0.5748833417892456
Text: Reinforcement learning trains agents to make sequential decisio