### Semantic Search

Semantic search aims to improve search accuracy by interpreting the contextual meaning of the query rather than its literal wording. Unlike keyword-based search engines, which find documents only through exact word matches, semantic search can also match synonyms and paraphrases.
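To illustrate why word matching alone falls short, here is a minimal sketch of a keyword-overlap score (Jaccard similarity over word sets). The helper names and the paraphrase sentence are hypothetical and not part of the notebook; the point is only that a paraphrase with different words scores lower than an unrelated sentence that happens to share words.

```python
import string

def tokenize(text):
    """Lowercase, strip punctuation, and return the set of distinct words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())

def jaccard_overlap(a, b):
    """Share of distinct words that two sentences have in common."""
    words_a, words_b = tokenize(a), tokenize(b)
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

query = "A man is eating pasta."
paraphrase = "Someone is having spaghetti."  # hypothetical: same meaning, different words
unrelated = "A man is riding a horse."       # different meaning, shared words

overlap_paraphrase = jaccard_overlap(query, paraphrase)
overlap_unrelated = jaccard_overlap(query, unrelated)
print(overlap_paraphrase, overlap_unrelated)
```

A purely lexical ranker would place the unrelated sentence above the paraphrase here, which is the gap that embedding-based search is meant to close.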

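Symmetric semantic search (query and corpus entries have similar length and content): https://www.sbert.net/docs/pretrained_models.html#sentence-embedding-models

Asymmetric semantic search (a short query is matched against longer passages): https://www.sbert.net/docs/pretrained-models/msmarco-v3.html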

In [15]:
from sentence_transformers import SentenceTransformer, util
import torch


In [2]:
model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")


### Build Corpus and Queries

Dataset source: https://github.com/laxmimerit

In [3]:

import requests

In [9]:
# Download the corpus (one sentence per line)
response = requests.get("https://raw.githubusercontent.com/laxmimerit/machine-learning-dataset/master/text-dataset-for-machine-learning/sbert-corpus.txt")
corpus = response.text.splitlines()  # handles both "\r\n" and "\n" line endings

# Download the queries (one sentence per line)
response = requests.get("https://raw.githubusercontent.com/laxmimerit/machine-learning-dataset/master/text-dataset-for-machine-learning/sbert-queries.txt")
queries = response.text.splitlines()
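Text files fetched over HTTP can differ in line endings and may contain blank lines. As a small sketch (the helper name `parse_lines` and the sample string are hypothetical, not part of the notebook), the downloaded text could be turned into a clean sentence list like this:

```python
def parse_lines(text):
    """Split raw text into stripped, non-empty lines, whatever the line ending."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# Sample text mixing "\r\n" and "\n" endings, with one blank line
sample = "A man is eating food.\r\nA man is eating pasta.\n\nA monkey is playing drums.\r\n"
sentences = parse_lines(sample)
print(sentences)
```

Dropping empty entries matters because an empty string would otherwise be encoded and ranked like any other corpus sentence.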

In [10]:
print(corpus)

['A man is eating food.', 'A man is eating a piece of bread.', 'A man is eating pasta.', 'The girl is carrying a baby.', 'The baby is carried by the woman', 'A man is riding a horse.', 'A man is riding a white horse on an enclosed ground.', 'A monkey is playing drums.', 'Someone in a gorilla costume is playing a set of drums.', 'A cheetah is running behind its prey.', 'A cheetah chases prey on across a field.']


In [11]:
print(queries)

['A man is eating pasta.', 'Someone in a gorilla costume is playing a set of drums.', 'A cheetah chases prey on across a field.']


### Project into Vector Space

In [12]:
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
queries_embeddings = model.encode(queries, convert_to_tensor=True)

In [None]:
# corpus_embeddings[0]  # embedding vector of the first corpus sentence

In [16]:
# Normalize vectors to unit length so the dot product equals cosine similarity
corpus_embeddings = util.normalize_embeddings(corpus_embeddings)
queries_embeddings = util.normalize_embeddings(queries_embeddings)
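The reason normalization speeds things up is that, for unit-length vectors, the plain dot product already equals cosine similarity, so no norms need to be computed at search time. A minimal pure-Python sketch with toy 2-dimensional vectors (the helper names are illustrative, and non-zero vectors are assumed):

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]

def cosine(u, v):
    """Cosine similarity computed from the raw vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

u = [3.0, 4.0]
v = [4.0, 3.0]

cos_uv = cosine(u, v)
dot_normalized = dot(normalize(u), normalize(v))
print(cos_uv, dot_normalized)  # the two values agree up to rounding
```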


In [18]:
len(corpus_embeddings[0])

384

### Run Semantic Search

Match each query embedding against the corpus embeddings and keep the three highest-scoring corpus sentences for each query.

In [31]:
hits = util.semantic_search(queries_embeddings, corpus_embeddings, score_function=util.dot_score, top_k=3)

In [32]:
hits  # top 3 hits for each query

[[{'corpus_id': 2, 'score': 0.9999998807907104},
  {'corpus_id': 0, 'score': 0.8384665250778198},
  {'corpus_id': 1, 'score': 0.7468275427818298}],
 [{'corpus_id': 8, 'score': 0.9999999403953552},
  {'corpus_id': 7, 'score': 0.7612733840942383},
  {'corpus_id': 3, 'score': 0.3815288245677948}],
 [{'corpus_id': 10, 'score': 1.0},
  {'corpus_id': 9, 'score': 0.8703994750976562},
  {'corpus_id': 6, 'score': 0.37411701679229736}]]
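Conceptually, the search scores every corpus vector against each query and keeps the `top_k` best matches. The following brute-force sketch in pure Python mimics the shape of the result above; it is an illustration with toy vectors and a hypothetical `top_k_search` helper, not the library's actual implementation.

```python
import heapq

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_k_search(query_vecs, corpus_vecs, top_k=3):
    """For each query, return the top_k corpus entries by dot-product score."""
    results = []
    for q in query_vecs:
        scored = [{"corpus_id": i, "score": dot(q, c)}
                  for i, c in enumerate(corpus_vecs)]
        # nlargest returns the entries sorted from highest to lowest score
        results.append(heapq.nlargest(top_k, scored, key=lambda h: h["score"]))
    return results

# Toy unit-length vectors standing in for sentence embeddings
toy_corpus = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
toy_queries = [[1.0, 0.0]]

toy_hits = top_k_search(toy_queries, toy_corpus, top_k=2)
print(toy_hits)
```

As in the notebook output, the result holds one list per query, and each list is ordered by descending score.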

In [33]:
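# Print every hit for each query, best match first
for query, query_hits in zip(queries, hits):
    for hit in query_hits:
        corpus_id = hit["corpus_id"]
        score = hit["score"]
        print(query, "<>", corpus[corpus_id], "--->", score)

    print()


A man is eating pasta. <> A man is eating pasta. ---> 0.9999998807907104
A man is eating pasta. <> A man is eating food. ---> 0.8384665250778198
A man is eating pasta. <> A man is eating a piece of bread. ---> 0.7468275427818298

Someone in a gorilla costume is playing a set of drums. <> Someone in a gorilla costume is playing a set of drums. ---> 0.9999999403953552
Someone in a gorilla costume is playing a set of drums. <> A monkey is playing drums. ---> 0.7612733840942383
Someone in a gorilla costume is playing a set of drums. <> The girl is carrying a baby. ---> 0.3815288245677948

A cheetah chases prey on across a field. <> A cheetah chases prey on across a field. ---> 1.0
A cheetah chases prey on across a field. <> A cheetah is running behind its prey. ---> 0.8703994750976562
A cheetah chases prey on across a field. <> A man is riding a white horse on an enclosed ground. ---> 0.37411701679229736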


