# Query MongoDB Atlas Using Custom Embeddings

In the previous lab, [custom-embeddings-1-populate.ipynb](custom-embeddings-1-populate.ipynb), we populated the movie collection in Atlas with our own custom embeddings. Now let's query them!

If you haven't completed the previous lab yet, please complete it first.

## Load Settings

In [1]:
## Load settings from the .env file
from dotenv import find_dotenv, dotenv_values

# _ = load_dotenv(find_dotenv()) # read local .env file
config = dotenv_values(find_dotenv())

# debug
# print (config)

ATLAS_URI = config.get('ATLAS_URI')

if not ATLAS_URI:
    raise Exception("'ATLAS_URI' is not set. Please set it in your .env file to continue...")

In [2]:
# Our variables

DB_NAME = 'sample_mflix'
COLLECTION_NAME = 'embedded_movies'

## Initialize Mongo Atlas Client

In [3]:
from AtlasClient import AtlasClient

atlas_client = AtlasClient (ATLAS_URI, DB_NAME)
print("Connected to the Mongo Atlas database!")

Connected to the Mongo Atlas database!
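`AtlasClient` comes from a local helper module whose source isn't shown in this notebook. As a rough illustration of what its `vector_search` method might do, here is a sketch that builds a MongoDB `$vectorSearch` aggregation pipeline. The helper name `build_vector_search_pipeline` and the `num_candidates_factor` parameter are hypothetical. The `search_score` field name matches what `print_movies` reads later. This is an assumption about the helper's internals, not its actual code.

```python
from typing import Any, Dict, List


def build_vector_search_pipeline(index_name: str, attr_name: str,
                                 embedding_vector: List[float], limit: int = 5,
                                 num_candidates_factor: int = 20) -> List[Dict[str, Any]]:
    """Build a $vectorSearch aggregation pipeline (hypothetical helper).

    numCandidates = limit * num_candidates_factor; the factor is an assumed
    default and trades recall against latency.
    """
    return [
        {
            "$vectorSearch": {
                "index": index_name,             # name of the Atlas vector index
                "path": attr_name,               # document field holding the embeddings
                "queryVector": embedding_vector,
                "numCandidates": limit * num_candidates_factor,
                "limit": limit,
            }
        },
        {
            # Keep the fields print_movies() reads (_id is included by default)
            # and expose the similarity score under "search_score".
            "$project": {
                "title": 1,
                "year": 1,
                "plot": 1,
                "search_score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With `pymongo`, the pipeline would be run with something like `list(collection.aggregate(pipeline))`, where `collection` is the `embedded_movies` collection.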


In [4]:
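# Map each embedding model to the document field holding its vectors
# and the Atlas vector index built on that field
model_mappings = {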
    'BAAI/bge-small-en-v1.5' : {'embedding_attr' : 'plot_embedding_bge_small', 'index_name' : 'idx_plot_embedding_bge_small'},

    'sentence-transformers/all-mpnet-base-v2' : {'embedding_attr' : 'plot_embedding_mpnet_base_v2', 'index_name' : 'idx_plot_embedding_mpnet_base_v2'},

    # 'sentence-transformers/all-MiniLM-L12-v2' : {'embedding_attr' : 'plot_embedding_minilm_l12_v2', 'index_name' : 'idx_plot_embedding_minilm_l12_v2'},

    'sentence-transformers/all-MiniLM-L6-v2' : {'embedding_attr' : 'plot_embedding_minilm_l6_v2', 'index_name' : 'idx_plot_embedding_minilm_l6_v2'},

    ## bge-large takes too long and consumes too much memory!
    # 'BAAI/bge-large-en-v1.5' : {'embedding_attr' : 'plot_embedding_bge_large', 'index_name' : 'idx_plot_embedding_bge_large', 'embedding_length' : 1024},
}
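Each entry in `model_mappings` assumes a vector index already exists on its `embedding_attr` field. A sketch of how such index definitions could be generated from the mapping is shown below. The dimensions are the output sizes of these models (384 for bge-small and MiniLM-L6, 768 for all-mpnet-base-v2). The `cosine` similarity is an assumed choice; the indexes created in the previous lab may use a different one. `MODEL_DIMENSIONS`, `vector_index_definition`, and `index_definitions` are hypothetical names.

```python
from typing import Any, Dict

# Output dimensions of the models referenced in model_mappings
MODEL_DIMENSIONS = {
    'BAAI/bge-small-en-v1.5': 384,
    'sentence-transformers/all-mpnet-base-v2': 768,
    'sentence-transformers/all-MiniLM-L6-v2': 384,
}


def vector_index_definition(embedding_attr: str, num_dimensions: int,
                            similarity: str = "cosine") -> Dict[str, Any]:
    """Atlas Vector Search index definition for one embedding field.

    'cosine' is an assumed default; 'euclidean' and 'dotProduct' are the alternatives.
    """
    return {
        "fields": [
            {
                "type": "vector",
                "path": embedding_attr,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


def index_definitions(mappings: Dict[str, Dict[str, str]],
                      dims: Dict[str, int]) -> Dict[str, Dict[str, Any]]:
    """Return {index_name: definition} for every model in the mapping."""
    return {
        m['index_name']: vector_index_definition(m['embedding_attr'], dims[name])
        for name, m in mappings.items()
    }
```

In the notebook, `index_definitions(model_mappings, MODEL_DIMENSIONS)` would yield one definition per active model. The resulting JSON can be pasted into the Atlas UI when creating each index.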

## Time to Query!

Now that the collection has been populated with embeddings, let's try some queries.

In [8]:

import os
import time

## LlamaIndex downloads embedding models as needed.
## Point its cache to ./cache (the default is the system temp dir)
## so the downloaded artifacts are easy to inspect.
## This must be set before llama_index is imported.
os.environ['LLAMA_INDEX_CACHE_DIR'] = os.path.join(os.path.abspath(''), 'cache')

from llama_index.embeddings import HuggingFaceEmbedding


def run_vector_query(query: str, model_name: str):
    """Embed `query` with `model_name` and run a vector search on that model's Atlas index."""
    model_mapping = model_mappings.get(model_name)
    if model_mapping is None:
        raise ValueError("Unknown model : " + model_name)
    embedding_attr = model_mapping['embedding_attr']
    index_name = model_mapping['index_name']

    # Embed the query with the same model that produced the stored embeddings
    embed_model = HuggingFaceEmbedding(model_name=model_name)
    query_embeddings = embed_model.get_text_embedding(query)

    # Query Atlas, timing only the vector search itself
    t1a = time.perf_counter()
    movies = atlas_client.vector_search(collection_name=COLLECTION_NAME, index_name=index_name,
                                        attr_name=embedding_attr, embedding_vector=query_embeddings,
                                        limit=5)
    t1b = time.perf_counter()
    print (f'Atlas query returned in {(t1b-t1a)*1000} ms')
    return movies
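`run_vector_query` constructs a new `HuggingFaceEmbedding` on every call, which reloads the model each time. One way to avoid that is to memoize model construction per model name with `functools.lru_cache`. The sketch below is generic and takes the factory as a parameter, so it can be tested without downloading a model. `make_cached_loader` is a hypothetical helper, not part of LlamaIndex.

```python
from functools import lru_cache
from typing import Callable, TypeVar

T = TypeVar("T")


def make_cached_loader(factory: Callable[[str], T]) -> Callable[[str], T]:
    """Wrap a model factory so each model name is constructed at most once per session."""
    @lru_cache(maxsize=None)
    def load(model_name: str) -> T:
        return factory(model_name)
    return load
```

In the notebook this could be used as `get_embed_model = make_cached_loader(lambda name: HuggingFaceEmbedding(model_name=name))`, replacing the constructor call inside `run_vector_query` with `embed_model = get_embed_model(model_name)`. The trade-off is memory: every model used stays loaded until the kernel restarts.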


In [10]:
def print_movies(movies):
    print (f"Found {len (movies)} movies")
    for idx, movie in enumerate (movies):
        print(f'{idx+1}\nid: {movie["_id"]}\ntitle: {movie["title"]},\nyear: {movie["year"]}' +
            f'\nsearch_score(meta):{movie["search_score"]}\nplot: {movie["plot"]}\n')

In [11]:

query = 'fatalistic sci-fi movies'
model_name = 'BAAI/bge-small-en-v1.5'

movies = run_vector_query (query=query, model_name=model_name)

print (f'========== model = {model_name} ======')
print_movies (movies)


Atlas query returned in 89.97482794802636 ms
Found 5 movies
1
id: 573a1397f29313caabce61a5
title: Logan's Run,
year: 1976
search_score(meta):0.5782829523086548
plot: An idyllic sci-fi future has one major drawback: life must end at 30.

2
id: 573a13bff29313caabd5de30
title: Journey to Saturn,
year: 2008
search_score(meta):0.5679123997688293
plot: A danish crew of misfits travel to Saturn in search for natural resources. However, the planet is colonized by a ruthless army of Aliens that turn their eye on Earth and invade Denmark. ...

3
id: 573a13a6f29313caabd1898d
title: Forklift Driver Klaus: The First Day on the Job,
year: 2000
search_score(meta):0.5652276277542114
plot: Short film depicting a fictional educational film about fork lift truck operational safety. The dangers of unsafe operation are presented in gory details.

4
id: 573a13a8f29313caabd1ccea
title: Forklift Driver Klaus: The First Day on the Job,
year: 2000
search_score(meta):0.5652276277542114
plot: Short film depicting
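The bge-small results above contain the same film, *Forklift Driver Klaus: The First Day on the Job* (2000), twice under different `_id`s with identical scores. The collection appears to hold duplicate documents for it. A minimal sketch for collapsing such duplicates client-side by `(title, year)` while preserving rank order is shown below. `dedupe_movies` is a hypothetical helper. Treating `(title, year)` as the identity of a movie is an assumption.

```python
from typing import Any, Dict, Iterable, List, Tuple


def dedupe_movies(movies: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop results whose (title, year) already appeared at a higher rank."""
    seen: set = set()
    unique: List[Dict[str, Any]] = []
    for movie in movies:
        key: Tuple[Any, Any] = (movie.get("title"), movie.get("year"))
        if key in seen:
            continue  # a better-ranked copy is already kept
        seen.add(key)
        unique.append(movie)
    return unique
```

Because deduplication can leave fewer results than requested, one option is to query with a larger `limit` (for example 10) and keep the first 5 unique movies.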

In [12]:

query = 'fatalistic sci-fi movies'
model_name = 'sentence-transformers/all-mpnet-base-v2'

movies = run_vector_query (query=query, model_name=model_name)

print (f'========== model = {model_name} ======')
print_movies (movies)


Atlas query returned in 94.12906400393695 ms
Found 5 movies
1
id: 573a13b5f29313caabd4473e
title: Wristcutters: A Love Story,
year: 2006
search_score(meta):0.48351454734802246
plot: A film set in a strange afterlife way station that has been reserved for people who have committed suicide.

2
id: 573a139af29313caabcf0aff
title: Meet Joe Black,
year: 1998
search_score(meta):0.4695141315460205
plot: Death, who takes the form of a young man, asks a media mogul to act as a guide to teach him about life on Earth and in the process he falls in love with his guide's daughter.

3
id: 573a13aff29313caabd321a1
title: DOA: Dead or Alive,
year: 2006
search_score(meta):0.46454548835754395
plot: The movie adaptation of the best selling video game series Dead or Alive.

4
id: 573a13bff29313caabd5fdf0
title: Blood River,
year: 2009
search_score(meta):0.45820650458335876
plot: A psychological thriller, which explores the destruction of a young couple's seemingly perfect marriage.

5
id: 573a1397f29313ca
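The two models return completely different movies for the same query. Their top scores also sit in different ranges: about 0.58 for bge-small versus about 0.48 for all-mpnet-base-v2. Raw `search_score` values are therefore not a reliable way to compare models. The sketch below compares the returned titles instead, using Jaccard overlap. `compare_models` and `title_overlap` are hypothetical helpers. The query function is passed in so the sketch runs without Atlas.

```python
from typing import Any, Callable, Dict, List


def compare_models(query: str, model_names: List[str],
                   run_query: Callable[[str, str], List[Dict[str, Any]]]) -> Dict[str, List[str]]:
    """Run the same query against several models and collect the ranked titles per model."""
    return {name: [m["title"] for m in run_query(query, name)] for name in model_names}


def title_overlap(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of two title lists: 1.0 = same set of movies, 0.0 = disjoint."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

In the notebook, a call could look like `compare_models('fatalistic sci-fi movies', list(model_mappings), lambda q, m: run_vector_query(query=q, model_name=m))`, followed by `title_overlap` on any pair of models.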