<a href="https://colab.research.google.com/github/nimishsoni/Large-Language-Model-Notebooks/blob/main/Vector_Search_with_LLMs_for_Recommendations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Implement Vector Similarity Search for Generating Recommendations
This notebook implements vector-based similarity search with MongoDB Atlas Vector Search on the sample_mflix movie database. Given a free-text user query, it returns the movies whose plots are most semantically similar to that query.

In [1]:
!python -m pip install "pymongo[srv]"



## Import necessary libraries

In [2]:
import pymongo
import os
import requests

## Set up and connect to the required collection in MongoDB Atlas using the pymongo client

In [3]:
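# Replace the placeholders with your own credentials; never commit real secrets to a notebook
%env mongo_uid=<your-atlas-username>
%env mongo_pwd=<your-atlas-password>
%env hf_token=<your-hugging-face-token>

env: mongo_uid=<your-atlas-username>
env: mongo_pwd=<your-atlas-password>
env: hf_token=<your-hugging-face-token>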


In [4]:
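from urllib.parse import quote_plus

# Percent-encode the credentials so special characters cannot break the connection URI
username = quote_plus(os.environ.get('mongo_uid'))
password = quote_plus(os.environ.get('mongo_pwd'))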
#cluster_url = os.environ.get('mongo_cluster_url')

# Create the MongoDB connection string
connection_string = f"mongodb+srv://{username}:{password}@cluster0.2ljdmxm.mongodb.net/test?retryWrites=true&w=majority&connectTimeoutMS=30000&serverSelectionTimeoutMS=3000"

# Connect to MongoDB Atlas
client = pymongo.MongoClient(connection_string)

In [5]:
db = client.sample_mflix
collection = db.movies

In [6]:
# Check if we have successfully connected to database
items = collection.find().limit(5)
for item in items:
  print(item)

{'_id': ObjectId('573a1390f29313caabcd42e8'), 'plot': 'A group of bandits stage a brazen train hold-up, only to find a determined posse hot on their heels.', 'genres': ['Short', 'Western'], 'runtime': 11, 'cast': ['A.C. Abadie', "Gilbert M. 'Broncho Billy' Anderson", 'George Barnes', 'Justus D. Barnes'], 'poster': 'https://m.media-amazon.com/images/M/MV5BMTU3NjE5NzYtYTYyNS00MDVmLWIwYjgtMmYwYWIxZDYyNzU2XkEyXkFqcGdeQXVyNzQzNzQxNzI@._V1_SY1000_SX677_AL_.jpg', 'title': 'The Great Train Robbery', 'fullplot': "Among the earliest existing films in American cinema - notable as the first film that presented a narrative story to tell - it depicts a group of cowboy outlaws who hold up a train and rob the passengers. They are then pursued by a Sheriff's posse. Several scenes have color included - all hand tinted.", 'languages': ['English'], 'released': datetime.datetime(1903, 12, 1, 0, 0), 'directors': ['Edwin S. Porter'], 'rated': 'TV-G', 'awards': {'wins': 1, 'nominations': 0, 'text': '1 win.'},

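## Use the all-MiniLM-L6-v2 sentence-transformer model
Send a POST request to the Hugging Face Inference API to generate an embedding vector for a piece of text. all-MiniLM-L6-v2 is an embedding model (not just a tokenizer) and returns a 384-dimensional vector per input.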

In [7]:
embedding_url = "https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2"

In [8]:
hface_token = os.environ.get('hf_token')

In [9]:
def generate_embedding(text: str) -> list[float]:

  response = requests.post(
    embedding_url,
    headers={"Authorization": f"Bearer {hface_token}"},
    json={"inputs": text})

  if response.status_code != 200:
    raise ValueError(f"Request failed with status code {response.status_code}: {response.text}")

  return response.json()
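
A hosted inference endpoint can return a non-200 response for transient reasons, for example while the model is still loading, in which case `generate_embedding` raises `ValueError`. The sketch below shows one way to retry such a call with exponential backoff. `call_with_retries` and the stub `flaky_embedding` are hypothetical names introduced for illustration; in the notebook, `generate_embedding` would take the place of the stub.

```python
import time


def call_with_retries(fn, *args, attempts=3, delay_seconds=1.0):
    """Call fn(*args), retrying on ValueError with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fn(*args)
        except ValueError as err:
            last_error = err
            # Sleep before the next attempt, doubling the wait each time
            if attempt < attempts - 1:
                time.sleep(delay_seconds * (2 ** attempt))
    raise last_error


# Hypothetical stand-in for generate_embedding: fails twice, then succeeds
calls = {"count": 0}


def flaky_embedding(text):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("Request failed with status code 503")
    return [0.0] * 384


vector = call_with_retries(flaky_embedding, "some plot", attempts=3, delay_seconds=0.0)
print(len(vector), calls["count"])
```

Retrying only on `ValueError` matches the exception the function above raises; network-level errors from `requests` would need their own handling.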

Generate embeddings for 100 movie plots with the Hugging Face model and store each one in a new field, plot_embedding_hf. Then create a vector search index named PlotSemanticSearch on that field in Atlas so that the collection can be searched by similarity.

In [10]:
for doc in collection.find({'plot': {'$exists': True}}).limit(100):
  # Add only the new field instead of rewriting the whole document
  collection.update_one(
    {'_id': doc['_id']},
    {'$set': {'plot_embedding_hf': generate_embedding(doc['plot'])}})
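
The notebook does not show how the PlotSemanticSearch index was created. As a rough sketch, an Atlas Vector Search index definition for this field could look like the dictionary built below, which can be pasted as JSON into the Atlas index editor. The helper name `vector_index_definition` and the choice of `cosine` similarity are assumptions made for illustration; 384 is the output dimension of all-MiniLM-L6-v2.

```python
import json


def vector_index_definition(path: str, num_dimensions: int,
                            similarity: str = "cosine") -> dict:
    """Build an Atlas Vector Search index definition for one vector field."""
    return {
        "fields": [
            {
                "type": "vector",                 # index this field as a vector
                "path": path,                     # document field holding the embedding
                "numDimensions": num_dimensions,  # must match the embedding length
                "similarity": similarity,         # assumed here; not stated in the notebook
            }
        ]
    }


definition = vector_index_definition("plot_embedding_hf", 384)
print(json.dumps(definition, indent=2))
```

The `numDimensions` value has to match the length of the stored vectors, so an index over the OpenAI embeddings used later in this notebook would need a different value.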

Write a query, embed it with the same function, and run MongoDB's $vectorSearch aggregation stage to retrieve the movies whose plot embeddings are closest to the query embedding.

In [11]:
query = "imaginary characters from outer space at war"

results = collection.aggregate([
  {"$vectorSearch": {  # MongoDB Atlas aggregation stage for vector similarity search
    "queryVector": generate_embedding(query),  # embed the query text with the same model used for the plots
    "path": "plot_embedding_hf",  # field in each movies document that stores the plot embedding
    "numCandidates": 100,  # maximum number of candidate documents to consider during the search
    "limit": 4,  # maximum number of results to return
    "index": "PlotSemanticSearch",  # name of the vector index to use for the search
  }}
])

for document in results:
    print(f'Movie Name: {document["title"]},\nMovie Plot: {document["plot"]}\n')

Movie Name: Four Sons,
Movie Plot: A family saga in which three of a Bavarian widow's sons go to war for Germany and the fourth goes to America, Germany's eventual opponent.

Movie Name: The Strong Man,
Movie Plot: A meek Belgian soldier (Harry Langdon) fighting in World War I receives penpal letters and a photo from "Mary Brown", an American girl he has never met. He becomes infatuated with her by ...

Movie Name: Westfront 1918,
Movie Plot: A group of German infantrymen of the First World War live out their lives in the trenches of France. They find brief entertainment and relief in a village behind the lines, but primarily ...

Movie Name: The Four Horsemen of the Apocalypse,
Movie Plot: An extended family split up in France and Germany find themselves on opposing sides of the battlefield during World War I.
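
To make the ranking step concrete, here is a minimal brute-force sketch of what a vector similarity search computes: score every stored vector against the query vector and keep the best few. It is an illustration only. Atlas uses an index and approximate search rather than this exhaustive loop, and the toy three-dimensional vectors, titles, and the helper names `cosine_similarity` and `top_k` are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, docs, k):
    """Return the titles of the k documents most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, d["embedding"]), d["title"]) for d in docs]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]


# Toy data: real embeddings from the model above have 384 dimensions
toy_docs = [
    {"title": "A", "embedding": [1.0, 0.0, 0.0]},
    {"title": "B", "embedding": [0.9, 0.1, 0.0]},
    {"title": "C", "embedding": [0.0, 1.0, 0.0]},
]

print(top_k([1.0, 0.05, 0.0], toy_docs, 2))
```

In this analogy, `limit` in `$vectorSearch` plays the role of `k`, while `numCandidates` bounds how many documents the approximate search examines before the final ranking.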



## Using OpenAI API for generating vector embeddings

In [13]:
import openai

In [14]:
# Read your OpenAI API key from the environment instead of hardcoding it
openai.api_key = os.environ.get('OPENAI_API_KEY')

In [15]:
collection_emb_movies = db.embedded_movies

In [16]:
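# Note: this redefines generate_embedding, replacing the Hugging Face version above.
# openai.Embedding is the legacy interface and requires openai<1.0.
def generate_embedding(text: str) -> list[float]:

    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response['data'][0]['embedding']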

In [17]:
query = "imaginary characters from outer space at war"

results = collection_emb_movies.aggregate([
  {"$vectorSearch": {
    "queryVector": generate_embedding(query),
    "path": "plot_embedding",
    "numCandidates": 100,
    "limit": 4,
    "index": "PlotSemanticSearch",
      }}
])

for document in results:
    print(f'Movie Name: {document["title"]},\nMovie Plot: {document["plot"]}\n')

Movie Name: V: The Final Battle,
Movie Plot: A small group of human resistance fighters fight a desperate guerilla war against the genocidal extra-terrestrials who dominate Earth.

Movie Name: Pixels,
Movie Plot: When aliens misinterpret video feeds of classic arcade games as a declaration of war, they attack the Earth in the form of the video games.

Movie Name: Futurama: Bender's Game,
Movie Plot: The Planet Express crew get trapped in a fantasy world.

Movie Name: Guardians of the Galaxy,
Movie Plot: A group of intergalactic criminals are forced to work together to stop a fanatical warrior from taking control of the universe.
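
Both searches in this notebook print only titles and plots. If the similarity score is also of interest, a `$project` stage with `{"$meta": "vectorSearchScore"}` can be appended after `$vectorSearch`. The sketch below only builds the pipeline as a plain list so it can be inspected without a database connection; `build_vector_search_pipeline` is a hypothetical helper, and the short query vector is a placeholder for a real embedding.

```python
def build_vector_search_pipeline(query_vector, path, index,
                                 limit=4, num_candidates=100):
    """Build an aggregation pipeline that returns title, plot, and similarity score."""
    return [
        {"$vectorSearch": {
            "index": index,                   # name of the vector index
            "path": path,                     # field holding the stored embeddings
            "queryVector": query_vector,      # embedding of the user query
            "numCandidates": num_candidates,  # candidates considered during the search
            "limit": limit,                   # number of results returned
        }},
        {"$project": {
            "_id": 0,
            "title": 1,
            "plot": 1,
            "score": {"$meta": "vectorSearchScore"},  # similarity score of each match
        }},
    ]


# Placeholder vector; in the notebook this would be generate_embedding(query)
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], "plot_embedding", "PlotSemanticSearch")
print(pipeline[1]["$project"]["score"])
```

The resulting list could then be passed to `collection_emb_movies.aggregate(pipeline)`, and each returned document would carry a `score` field alongside `title` and `plot`.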

