<a href="https://colab.research.google.com/github/prakul/MongoDB-AI-Resources/blob/main/Auto_embedding_quick_start.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Atlas Vector Search Quick Start

This notebook is a companion to the [Quick Start](https://www.mongodb.com/docs/atlas/atlas-vector-search/tutorials/vector-search-quick-start/) page. Refer to the page for set-up instructions and detailed explanations.

<a target="_blank" href="https://colab.research.google.com/github/mongodb/docs-notebooks/blob/main/get-started/quick-start.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [32]:
%pip install pymongo python-dotenv



In [33]:
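from pymongo.mongo_client import MongoClient
from pymongo.operations import SearchIndexModel
import os
import time
import urllib.parse  # import the submodule explicitly; a bare "import urllib" does not guarantee urllib.parse is loaded
from dotenv import load_dotenv

# Load variables from a local .env file, overriding values already set in the environment
load_dotenv(override=True)

# Connect to your Atlas deployment.
# MONGO_URI must contain two positional "{}" placeholders: username first, then password.
MONGO_URI = os.environ["MONGO_URI"]
# Percent-encode both credentials so reserved characters such as "@" or "/" do not break the URI
uname = urllib.parse.quote_plus(os.environ["uname"])
pword = urllib.parse.quote_plus(os.environ["pword"])
uri = MONGO_URI.format(uname, pword)
client = MongoClient(uri)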

# Access your database and collection
database = client["sample_mflix"]
collection = database["embedded_movies"]
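
The connection cell fills a URI template with credentials read from the environment. The sketch below illustrates that step in isolation, assuming a template with two `{}` placeholders. The helper name `build_uri`, the host, and the credentials are all hypothetical and are not part of the original notebook.

```python
from urllib.parse import quote_plus


def build_uri(template: str, username: str, password: str) -> str:
    """Fill a URI template with percent-encoded credentials."""
    return template.format(quote_plus(username), quote_plus(password))


# Hypothetical template and credentials, for illustration only.
template = "mongodb+srv://{}:{}@cluster0.example.mongodb.net/"
uri = build_uri(template, "demo_user", "p@ss/word")
print(uri)  # mongodb+srv://demo_user:p%40ss%2Fword@cluster0.example.mongodb.net/
```

Without the encoding, the `@` and `/` in the password would be read as URI delimiters and the connection string would be parsed incorrectly.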

In [17]:

def create_autoembed_index(path, embedding_model, search_index_name):
    """Create a vector search index on `path` and block until it is queryable."""
    # Create your index model, then create the search index
    search_index_model = SearchIndexModel(
        definition={
            "fields": [
                {
                    "type": "text",
                    "path": path,
                    "model": embedding_model,
                }
            ]
        },
        name=search_index_name,
        type="vectorSearch",
    )

    result = collection.create_search_index(model=search_index_model)
    print("New search index named " + result + " is building.")

    # Wait for initial sync to complete
    print("Polling to check if the index is ready. This may take up a short while depending on the size of your collection and the embedding model chosen.")

    while True:
        # Passing the index name restricts the listing to that index
        indices = list(collection.list_search_indexes(result))
        if indices and indices[0].get("queryable") is True:
            break
        time.sleep(5)
    print(result + " is ready for querying.")
    return result
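
The polling loop in `create_autoembed_index` runs until the index reports `queryable`, with no upper bound on the wait. One possible variation is to add a deadline, sketched here as a generic helper. The names `wait_until` and `fake_index_ready` are hypothetical, and the stand-in check replaces the real index lookup so that the example runs without a database.

```python
import time


def wait_until(check, timeout_s=300.0, interval_s=5.0):
    """Call check() repeatedly until it returns True.

    Raises TimeoutError if the deadline passes before check() succeeds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} seconds")
        time.sleep(interval_s)


# Stand-in for the index status check: reports ready on the third call.
calls = {"count": 0}


def fake_index_ready():
    calls["count"] += 1
    return calls["count"] >= 3


wait_until(fake_index_ready, timeout_s=1.0, interval_s=0.01)
print(calls["count"])  # 3
```

In the notebook, the check would wrap the same `collection.list_search_indexes(...)` call and `queryable` test that the loop above already uses.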

In [34]:
path = "plot"
embedding_model = "voyage-3.5-lite"
search_index_name = "demo_test"

res = create_autoembed_index(path, embedding_model, search_index_name)

New search index named demo_test is building.
Polling to check if the index is ready. This may take up a short while depending on the size of your collection and the embedding model chosen.
demo_test is ready for querying.


In [28]:
def get_results(index_name, path, query):
    pipeline = [
        {
            '$vectorSearch': {
                'index': index_name,
                'path': path,
                'query': query,
                'numCandidates': 150,
                'limit': 10
            }
        }, {
            '$project': {
                '_id': 0,
                'title': 1,
                'plot': 1,
                'score': {
                    '$meta': 'vectorSearchScore'
                }
            }
        }
    ]

    res = collection.aggregate(pipeline)
    return res
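
`get_results` builds the pipeline and runs it in one step. A possible refactoring, sketched below, separates pipeline construction so that the stages can be inspected or tested without a database connection. The function name `build_vector_search_pipeline` and its keyword defaults are hypothetical, and the stage fields simply mirror those used in this notebook. The `ValueError` check is an added safeguard, on the assumption that Atlas rejects a `numCandidates` value smaller than `limit`.

```python
def build_vector_search_pipeline(index_name, path, query, num_candidates=150, limit=10):
    """Return the aggregation stages used by get_results, without running them."""
    if num_candidates < limit:
        # Assumption: the candidate pool must be at least as large as the result count.
        raise ValueError("num_candidates must be greater than or equal to limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "query": query,  # plain query text, as in this notebook
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "title": 1,
                "plot": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline(
    "demo_test", "plot", "funny movies with out of world characters"
)
print(pipeline[0]["$vectorSearch"]["index"])  # demo_test
```

The resulting list can be passed directly to `collection.aggregate(...)`.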

In [35]:
query = 'funny movies with out of world characters'
res = get_results(search_index_name, path, query)
for i in res:
    print(f"(Title: {i['title']}, Score: {i['score']}, Plot: {i['plot']}")

(Title: Jesus Christ Vampire Hunter, Score: 0.5138512849807739, Plot: Kung-Fu Action / Comedy / Horror / Musical about the second coming.
(Title: The Nine Lives of Tomas Katz, Score: 0.5127407908439636, Plot: The last day of creation. A stranger arrives in London. No one knows who he is or where he has come from. By the time he leaves, the entire universe will have been erased. A black comedy ...
(Title: King Size, Score: 0.5112178921699524, Plot: A comedy. The story follows a young scientist in the contemporary world, who actually came from the world of dwarves, thanks to a magic potion, held by the Big Eater, ruler of the dwarves. ...
(Title: Futurama: Bender's Game, Score: 0.5082553029060364, Plot: The Planet Express crew get trapped in a fantasy world.
(Title: Howard the Duck, Score: 0.504088282585144, Plot: A sarcastic humanoid duck is pulled from his homeworld to Earth where he must stop an alien invader.
(Title: Fantastic Four, Score: 0.5038604140281677, Plot: Four young outside

In [36]:
path = "plot"
embedding_model_1 = "voyage-3-large"
search_index_name_1 = "demo_test_large"

res = create_autoembed_index(path, embedding_model_1, search_index_name_1)


New search index named demo_test_large is building.
Polling to check if the index is ready. This may take up a short while depending on the size of your collection and the embedding model chosen.
demo_test_large is ready for querying.


In [37]:
query = 'funny movies with out of world characters'

res = get_results(search_index_name_1, path, query)
# print results
for i in res:
    print(f"(Title: {i['title']}, Score: {i['score']}, Plot: {i['plot']}")

(Title: King Size, Score: 0.5197460651397705, Plot: A comedy. The story follows a young scientist in the contemporary world, who actually came from the world of dwarves, thanks to a magic potion, held by the Big Eater, ruler of the dwarves. ...
(Title: Critters, Score: 0.5195062160491943, Plot: A race of small, furry aliens make lunch out of the locals in a farming town.
(Title: Message from Space, Score: 0.5171539187431335, Plot: In this Star Wars take-off, the peaceful planet of Jillucia has been nearly wiped out by the Gavanas, whose leader takes orders from his mother (played a comic actor in drag) rather than ...
(Title: Message from Space, Score: 0.5171539187431335, Plot: In this Star Wars take-off, the peaceful planet of Jillucia has been nearly wiped out by the Gavanas, whose leader takes orders from his mother (played a comic actor in drag) rather than ...
(Title: The Wrestler, Score: 0.5053803324699402, Plot: Ageing wrestler and circus strongman is put in an institution locat