Skip to content

mongodb-developer/Triggers-Sentence-Transformers

Repository files navigation

Huggingface Logo

Atlas Triggers And HuggingFace Sentence Transformers

The sample python code provided uses the all-MiniLM-L6-v2 sentence transformer model, from HuggingFace. It maps the sentences (docs to insert in collection as well as for query string) to a 384 dimensional dense vector space, and creates corresponding vector embeddings (list of numbers). This tutorial requires basic knowledge of Python, and assumes that you have an existing Atlas Cluster. Simple Vector demo below

Steps:

Install pymongo:

pip3 install pymongo

Install sentence-transformers:

pip3 install -U sentence-transformers

Run the following code to create the test collection with corresponding vector embeddings:

import pymongo
client = pymongo.MongoClient("SRV URL TO YOUR ATLAS CLUSTER")
db = client.vector_tests

from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

docs = [
    "The students studied for their exams.",
    "Studying hard, the students prepared for their exams.",
    "The chef cooked a delicious meal.",
    "The chef cooked the chicken with the vegetables.",
    "Known for its power and aggression, Mike Tyson's boxing style was feared by many."
]

print(docs)

result_doc = {}
for doc in docs:
    doc_vector = model.encode(doc).tolist()
    result_doc['sentence'] = doc
    result_doc['vectorEmbedding'] = doc_vector
    result = db.vectors_demo_1.insert_one(result_doc.copy())
    print(result)

Create the following Atlas Search index on the vectors_demo_1 collection:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "vectorEmbedding": {
        "type": "knnVector",
        "dimensions": 384,
        "similarity": "euclidean"
      }
    }
  }
}

Now you can run the following code to perform semantic search for various sentences (uncomment the query you want to run) - note that my index is named "default" and that's why the query does not specify the index name if your index is not named default then please include your index's name in the query:

import pymongo
client = pymongo.MongoClient("SRV URL TO YOUR ATLAS CLUSTER")
db = client.vector_tests

from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# sample searches

#query = "The cook prepared a meal of poultry and veggies."
#query = "The pupils worked hard for their test."
#query = "Studying hard, the students prepped for their exams."
#query = "A delicious meal was cooked by the chef."
#query = "Tyson's boxing style was feared for its power and aggression."

vector_query = model.encode(query).tolist()
pipeline = [
    {
        "$search": {
            "knnBeta": {
                "vector": vector_query,
                "path": "vectorEmbedding",
                "k": 3
            }
        }
    },
    {
        "$limit": 1
    },
    {
        "$project": {
            "vectorEmbedding": 0,
            "_id": 0,
            'score': {
                '$meta': 'searchScore'
            }
        }
    }
]

results = db.vectors_demo_1.aggregate(pipeline)
for result in results:
    print("\n")
    print("*************Vector Search Result*****************")
    print(result['sentence'])
    print("**************************************************")

🤗 HuggingFace Documentation

🍃 Atlas Triggers Documentation

About

This trigger demonstrates how you can automatically update document embeddings whenever a new document is inserted or modified in a specific collection.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published