# Lab: Vector Search on Mongo Atlas Using OpenAI Embeddings

In this lab we will do a vector search on [MongoDB Atlas](https://www.mongodb.com/atlas). We will use the OpenAI Embeddings API to generate embeddings.

We need the following:
- Atlas cloud account
- OpenAI API key (optional, see below)

References

- https://cookbook.openai.com/examples/vector_databases/mongodb_atlas/semantic_search_using_mongodb_atlas_vector_search

## Step-1: Set Up the `.env` File

Create a `.env` file with the following content, replacing the `ATLAS_URI` and `OPENAI_API_KEY` values with your own:

```text
ATLAS_URI=mongodb+srv://<username>:<password>@sandbox.lqlql.mongodb.net/?retryWrites=true&w=majority
OPENAI_API_KEY=replace-me
```

## Step-2: Load Settings

```python
import os, sys

# Make modules in the parent directory importable
this_dir = os.path.abspath('')
parent_dir = os.path.dirname(this_dir)
sys.path.append(os.path.abspath(parent_dir))
```

```python
## Load settings from the .env file
from dotenv import find_dotenv, dotenv_values

# _ = load_dotenv(find_dotenv()) # read local .env file
config = dotenv_values(find_dotenv())

# debug
# print(config)

ATLAS_URI = config.get('ATLAS_URI')
OPENAI_API_KEY = config.get('OPENAI_API_KEY')

if not ATLAS_URI:
    raise Exception("'ATLAS_URI' is not set.  Please set it in the .env file to continue...")

# The OpenAI key is optional: the cached sample queries work without it
# if not OPENAI_API_KEY:
#     raise Exception("'OPENAI_API_KEY' is not set.  Please set it in the .env file to continue...")
```

## Step-3: Inspect these Python Classes

- [AtlasClient.py](AtlasClient.py) - a handy class to interact with Atlas
- [OpenAIClient.py](OpenAIClient.py) - a handy class to interact with OpenAI
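
The contents of `OpenAIClient.py` are not reproduced in this notebook. As a rough idea of what a `get_embedding` helper like it could look like, here is a minimal sketch assuming the `openai` Python SDK v1 interface (`client.embeddings.create(model=..., input=...)`) and the `text-embedding-ada-002` model, which we assume produced the `plot_embedding` vectors in `sample_mflix.embedded_movies`. The class name `EmbeddingHelper` is hypothetical, and the SDK client is passed in so the wrapper can be exercised without network access.

```python
from typing import Any, List


class EmbeddingHelper:
    """Hypothetical stand-in for the lab's OpenAIClient (not its actual code)."""

    # Assumption: queries must be embedded with the same model that produced
    # the stored plot embeddings, otherwise the vectors are not comparable.
    DEFAULT_MODEL = "text-embedding-ada-002"

    def __init__(self, client: Any, model: str = DEFAULT_MODEL) -> None:
        # `client` is expected to behave like openai.OpenAI(api_key=...)
        self.client = client
        self.model = model

    def get_embedding(self, text: str) -> List[float]:
        # Send a single input string and return its embedding vector
        response = self.client.embeddings.create(model=self.model, input=[text])
        return response.data[0].embedding


# Possible usage with the real SDK (requires network access and a valid key):
# from openai import OpenAI
# helper = EmbeddingHelper(OpenAI(api_key=OPENAI_API_KEY))
# vector = helper.get_embedding("humans fighting aliens")
```

Injecting the client keeps the wrapper small and makes it easy to swap in a fake for offline testing.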

```python
# Our variables

DB_NAME = 'sample_mflix'
COLLECTION_NAME = 'embedded_movies'
INDEX_NAME = 'idx_plot_embedding'
```

## Step-4: Initialize Mongo Atlas Client

```python
from AtlasClient import AtlasClient

atlas_client = AtlasClient(ATLAS_URI, DB_NAME)
print("Connected to the Mongo Atlas database!")
```

```text
Connected to the Mongo Atlas database!
```

## Step-5: Initialize OpenAI Client

```python
from OpenAIClient import OpenAIClient

# The OpenAI client is optional; without a key, only cached queries can be run
openAI_client = None

if OPENAI_API_KEY:
    openAI_client = OpenAIClient(api_key=OPENAI_API_KEY)
    print("OpenAI client initialized")
```

```text
OpenAI client initialized
```

## Step-6: Create an Atlas Index

Follow [this guide](setup-atlas-index.md) to create an index.

**Note: Do not skip this step; vector search requires an active index.**
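
For orientation, an Atlas Vector Search index is described by a small JSON definition. The sketch below builds one as a Python dict, assuming the `plot_embedding` field holds 1536-dimension vectors (the size of `text-embedding-ada-002` embeddings) compared with cosine similarity; check these values against [the guide](setup-atlas-index.md) and your own data before using them. The helper name `vector_index_definition` is hypothetical.

```python
import json


def vector_index_definition(path: str = "plot_embedding",
                            num_dimensions: int = 1536,
                            similarity: str = "cosine") -> dict:
    """Build an Atlas Vector Search index definition (hypothetical helper)."""
    if similarity not in ("cosine", "euclidean", "dotProduct"):
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",                 # marks the field as a vector field
                "path": path,                     # document field holding the embedding
                "numDimensions": num_dimensions,  # must match the embedding model
                "similarity": similarity,         # metric used to score matches
            }
        ]
    }


# Print the JSON so it can be pasted into the Atlas UI's JSON editor;
# name the index to match INDEX_NAME ('idx_plot_embedding').
print(json.dumps(vector_index_definition(), indent=2))
```

Recent PyMongo releases can also create such an index programmatically with `collection.create_search_index(...)` and a `SearchIndexModel` whose `type` is `"vectorSearch"`; it is worth confirming that your driver version supports this before relying on it.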

## Step-7: Do a Vector Search

Now that everything is set up, it's time for the fun part!

We are going to query movies not just by plot keywords, but by 'meaning'.

See the examples below.  And try your own!

The process is as follows:

- Convert the query into an embedding vector (using the OpenAI API)
- Send the embedding to Atlas and get the results
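
`AtlasClient.vector_search` lives in [AtlasClient.py](AtlasClient.py) and its exact code is not shown here. On the Atlas side, a vector query is typically an aggregation pipeline that starts with a `$vectorSearch` stage. Below is a minimal sketch of such a pipeline, assuming the `plot_embedding` field and the index name used in this lab; `build_vector_search_pipeline` is a hypothetical name, the `candidate_factor` multiplier is an illustrative choice, and the projected fields simply mirror what `do_vector_search` prints.

```python
from typing import List


def build_vector_search_pipeline(embedding: List[float],
                                 index_name: str = "idx_plot_embedding",
                                 path: str = "plot_embedding",
                                 limit: int = 10,
                                 candidate_factor: int = 15) -> list:
    """Sketch of an aggregation pipeline for Atlas Vector Search."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": embedding,
                # Approximate (ANN) search considers this many candidates
                # before keeping the top `limit`; more candidates trade
                # speed for recall.
                "numCandidates": limit * candidate_factor,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 1,
                "title": 1,
                "year": 1,
                "plot": 1,
                # Expose the score computed by $vectorSearch as a field
                "search_score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# With PyMongo, the pipeline could be run as, e.g.:
# results = list(db[COLLECTION_NAME].aggregate(build_vector_search_pipeline(embedding)))
```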

### Note the Score

In addition to the movie attributes (title, year, plot, etc.), we also display `search_score`. This is a meta attribute: it is not part of the movies collection, but is generated by the vector search.

This is a number between 0 and 1; values closer to 1 represent a better match. The results are sorted from the best match down (closest to 1 first).

[You can read more about search score here](https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/#atlas-vector-search-score)
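
Per the Atlas documentation linked above, the score is a normalized similarity: for `cosine` and `dotProduct` indexes it is `(1 + similarity) / 2`, and for `euclidean` it is `1 / (1 + distance)`. The sketch below reproduces these formulas locally in plain Python, which can help when sanity-checking a score; the function names are hypothetical, and which formula applies depends on the similarity your index was created with.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product divided by the product of the vector lengths, in [-1, 1]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_search_score(a: Sequence[float], b: Sequence[float]) -> float:
    # Map cosine similarity from [-1, 1] onto [0, 1]
    return (1 + cosine_similarity(a, b)) / 2


def euclidean_search_score(a: Sequence[float], b: Sequence[float]) -> float:
    # Identical vectors score 1; the score shrinks toward 0 as distance grows
    return 1 / (1 + math.dist(a, b))


print(cosine_search_score([1.0, 0.0], [1.0, 0.0]))   # 1.0 (same direction)
print(cosine_search_score([1.0, 0.0], [0.0, 1.0]))   # 0.5 (orthogonal)
print(cosine_search_score([1.0, 0.0], [-1.0, 0.0]))  # 0.0 (opposite)
```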

## Sample Queries / Cached Embeddings

To help you get up and running quickly, we have cached the embeddings for some sample queries. This way we can query Atlas without having to call the embedding API first.

If you use these sample queries, you won't need an OpenAI API key. If you want to try a different query, you will need one.
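
The cache file is a JSON object mapping each query string to its embedding (a list of floats), which is the shape the loading code in the next cell expects. If you have an OpenAI key and want to add your own queries to the cache, a helper along these lines could work; `save_cached_embedding` is a hypothetical name, and keys are normalized with `lower().strip()` to match how `do_vector_search` looks queries up.

```python
import json
import os
from typing import Dict, List


def save_cached_embedding(query: str, embedding: List[float],
                          path: str = "embeddings_openai.json") -> Dict[str, List[float]]:
    """Add or update one query -> embedding entry in the JSON cache file."""
    cache: Dict[str, List[float]] = {}
    if os.path.exists(path):
        with open(path, "r") as f:
            cache = json.load(f)
    # Normalize the key the same way do_vector_search does before lookup
    cache[query.lower().strip()] = list(embedding)
    with open(path, "w") as f:
        json.dump(cache, f)
    return cache


# Example (requires an initialized OpenAI client):
# q = "technology gone wrong"
# save_cached_embedding(q, openAI_client.get_embedding(q))
```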

```python
import os
import json

cached_embeddings = {}
cached_embedding_file = 'embeddings_openai.json'

# The cache maps each query string to its embedding (a list of floats)
if os.path.exists(cached_embedding_file):
    with open(cached_embedding_file, "r") as f:
        cached_embeddings = json.load(f)

print("Loaded the following cached embeddings...")
for query in cached_embeddings:
    print(f'- {query}')
```

```text
Loaded the following cached embeddings...
- fatalistic sci-fi movies
- humans fighting aliens
- futuristic christmas movies
- sci-fi story with a friendly alien
- relationship drama between two good friends
- college graduates working in a big city discover new relationships
- household pets get lost but go on a long journey to find home
```

```python
import time

# Handy function: embed the query (cached or via OpenAI), then search Atlas
def do_vector_search(query: str) -> None:
    query = query.lower().strip()
    print('query: ', query)
    if query in cached_embeddings:
        print("using cached embeddings")
        embedding = cached_embeddings[query]
    else:
        if openAI_client is None:
            raise Exception("This query is not cached; set OPENAI_API_KEY to compute its embedding.")
        t1a = time.perf_counter()
        embedding = openAI_client.get_embedding(query)
        t1b = time.perf_counter()
        print(f"Getting embeddings from OpenAI took {(t1b-t1a)*1000:,.0f} ms")

    t2a = time.perf_counter()
    movies = atlas_client.vector_search(collection_name=COLLECTION_NAME, index_name=INDEX_NAME,
                                        attr_name='plot_embedding', embedding_vector=embedding, limit=10)
    t2b = time.perf_counter()

    print(f"Atlas query returned {len(movies)} movies in {(t2b-t2a)*1000:,.0f} ms")
    print()

    for idx, movie in enumerate(movies):
        print(f'{idx+1}\nid: {movie["_id"]}\ntitle: {movie["title"]},\nyear: {movie["year"]}' +
              f'\nsearch_score(meta):{movie["search_score"]}\nplot: {movie["plot"]}\n')
```

```python
query = "humans fighting aliens"

do_vector_search(query=query)
```

```text
query:  humans fighting aliens
using cached embeddings
Atlas query returned 10 movies in 639 ms

1
id: 573a1398f29313caabce8f83
title: V: The Final Battle,
year: 1984
search_score(meta):0.8542792797088623
plot: A small group of human resistance fighters fight a desperate guerilla war against the genocidal extra-terrestrials who dominate Earth.

2
id: 573a13c7f29313caabd75324
title: Falling Skies,
year: 2011
search_score(meta):0.8476295471191406
plot: Survivors of an alien attack on earth gather together to fight for their lives and fight back.

3
id: 573a139af29313caabcf0cff
title: Starship Troopers,
year: 1997
search_score(meta):0.8398948907852173
plot: Humans in a fascistic, militaristic future do battle with giant alien bugs in a fight for survival.

4
id: 573a139ff29313caabd000f6
title: Battlefield Earth,
year: 2000
search_score(meta):0.8368538618087769
plot: After enslavement & near extermination by an alien race in the year 3000, humanity begins to fight back.

5
id: 573a139af29
```

```python
query = "fatalistic sci-fi movies"

do_vector_search(query=query)
```

```text
query:  fatalistic sci-fi movies
using cached embeddings
Atlas query returned 10 movies in 94 ms

1
id: 573a139af29313caabcf0cff
title: Starship Troopers,
year: 1997
search_score(meta):0.7599651217460632
plot: Humans in a fascistic, militaristic future do battle with giant alien bugs in a fight for survival.

2
id: 573a139ff29313caabcff478
title: Terminator 3: Rise of the Machines,
year: 2003
search_score(meta):0.7479422092437744
plot: A cybernetic warrior from a post-apocalyptic future travels back in time to protect a 19-year old drifter and his future wife from a most advanced robotic assassin and to ensure they both survive a nuclear attack.

3
id: 573a1397f29313caabce61a5
title: Logan's Run,
year: 1976
search_score(meta):0.7465192675590515
plot: An idyllic sci-fi future has one major drawback: life must end at 30.

4
id: 573a13adf29313caabd2ae08
title: Starship Troopers 2: Hero of the Federation,
year: 2004
search_score(meta):0.7455509305000305
plot: In the sequel to Paul Verhoeve
```

### Try your own searches!

Update the query string to whatever you like, and run it.

Remember: to try queries other than the cached ones, you will need your `OPENAI_API_KEY`.

```python
## TODO: enter your query here
# query = "technology gone wrong"

# do_vector_search(query=query)
```

```python
## Close connection

# atlas_client.close_connection()
```