Step 1: Create a Cluster and Get Connection String

- In your MongoDB Atlas dashboard, create a new cluster if you haven't already.
- Go to "Connect" for your cluster, then choose "Connect your application".
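- Copy the connection string URI provided; the connection code further down builds a string of the same form from stored credentials.
- Under Security, click "Network Access" and set the allowed IP address to 0.0.0.0/0. This allows connections from any address, so use it only for testing.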
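
Usernames and passwords can contain characters such as `@` or `/` that are not valid inside a URI. The following is a minimal sketch of percent-encoding the credentials with the standard library before building the string. The helper name `build_atlas_uri`, the credentials, and the host name are all placeholders for illustration, not values from this notebook.

```python
from urllib.parse import quote_plus


def build_atlas_uri(user: str, password: str, host: str) -> str:
    """Return a mongodb+srv URI with percent-encoded credentials.

    `build_atlas_uri` is a hypothetical helper; `host` stands for the
    cluster host name shown on the Atlas "Connect" screen.
    """
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"


# Placeholder values for illustration only.
uri = build_atlas_uri("demo_user", "p@ss/word", "cluster0.example.mongodb.net")
print(uri)
```

The `@` and `/` in the placeholder password are encoded as `%40` and `%2F`, so they can no longer be mistaken for URI delimiters.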

In [21]:
!pip install pymongo certifi



In [24]:
import certifi
from urllib.parse import quote_plus
from pymongo import MongoClient
from google.colab import userdata

# Read the credentials from Colab secrets
db_user = userdata.get('db_user')
db_password = userdata.get('db_password')
db_name = userdata.get('db_name')

# Construct the connection string using credentials from Colab secrets.
# Percent-encode the username and password so characters like '@' or '/' do not break the URI.
# Ensure your MongoDB Atlas cluster name and domain are correct.
connection_string = (
    f"mongodb+srv://{quote_plus(db_user)}:{quote_plus(db_password)}"
    f"@ai-master.rmxat2v.mongodb.net/?appName={db_name}"
)

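# Initialize the MongoDB client and get the database instance
client = MongoClient(connection_string, tls=True, tlsCAFile=certifi.where())
db = client[db_name]

# MongoClient connects lazily, so ping the server to confirm the connection actually works
client.admin.command("ping")
print("Connected to the database.")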

Connected to the database.


In [25]:
# Define sample documents
sample_docs = [
    {"title": "Apple launches new iPhone", "description": "The latest iPhone model offers improved battery and camera."},
    {"title": "Bananas are rich in potassium", "description": "Eating bananas can help regulate blood pressure."},
    {"title": "How to bake an apple pie", "description": "Step-by-step guide to baking a classic apple pie."},
    {"title": "Google introduces search update", "description": "Semantic search improvements make results more relevant."}
]

# Insert sample documents
collection = db["documents"]

for doc in sample_docs:
    collection.update_one(
        {"title": doc["title"]},  # Match an existing document by title
        {"$set": doc},            # Set its fields to the sample values
        upsert=True               # Insert the document if no match is found
    )
print("Sample documents inserted or updated.")

Sample documents inserted or updated.


In [28]:
keyword = "apple"
# Find documents where the keyword appears in the title (case-insensitive)
results = collection.find({"title": {"$regex": keyword, "$options": "i"}})

# Print out the matched documents
for doc in results:
    print(doc)


{'_id': ObjectId('6914a94ec44a69d0293452c9'), 'title': 'Apple launches new iPhone', 'description': 'The latest iPhone model offers improved battery and camera.'}
{'_id': ObjectId('6914a94ec44a69d0293452cb'), 'title': 'How to bake an apple pie', 'description': 'Step-by-step guide to baking a classic apple pie.'}


In [29]:
keyword = "apple"
# Search for documents where 'description' contains the keyword (case-insensitive)
results = collection.find({"description": {"$regex": keyword, "$options": "i"}})

# Print matched documents from 'description'
for doc in results:
    print(doc)


{'_id': ObjectId('6914a94ec44a69d0293452cb'), 'title': 'How to bake an apple pie', 'description': 'Step-by-step guide to baking a classic apple pie.'}
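
The two regex queries above pass the keyword straight into `$regex`, so a keyword containing characters such as `.` or `+` would be treated as a pattern rather than literal text. The following is a small sketch that escapes the keyword and checks several fields at once with `$or`. The helper name `contains_filter` is hypothetical. `re.escape` targets Python's regex syntax, which is assumed here to be close enough to MongoDB's for ordinary keywords.

```python
import re


def contains_filter(keyword, fields):
    """Build a filter matching `keyword` literally (case-insensitive) in any of `fields`."""
    pattern = re.escape(keyword)  # neutralise regex metacharacters in the keyword
    clauses = [{field: {"$regex": pattern, "$options": "i"}} for field in fields]
    return {"$or": clauses}


query = contains_filter("apple", ["title", "description"])
print(query)
# Pass the result to collection.find(query) to run it against MongoDB.
```

Building the filter as a plain dictionary keeps it easy to inspect before sending it to the server.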


In [30]:
# Create a compound text index on 'title' and 'description'.
# With a text index on multiple fields, MongoDB's full-text search lets you:
# - Search across all indexed fields with a single query.
# - Find documents containing the keyword(s) in any of the indexed fields.
# - Use more advanced features such as relevance scoring and phrase matching.
collection.create_index([("title", "text"), ("description", "text")])
print("Text index created on 'title' and 'description'.")

Text index created on 'title' and 'description'.
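
To make the idea of one index spanning several fields concrete, here is a toy inverted index in plain Python. It is a simplification for illustration only and not how MongoDB implements text indexes: it lowercases and splits on non-letters, with no stemming, stop words, or scoring. The name `build_index` is hypothetical.

```python
import re
from collections import defaultdict

docs = [
    {"title": "Apple launches new iPhone",
     "description": "The latest iPhone model offers improved battery and camera."},
    {"title": "How to bake an apple pie",
     "description": "Step-by-step guide to baking a classic apple pie."},
]


def build_index(documents, fields):
    """Map each lowercase token to the set of document positions that contain it."""
    index = defaultdict(set)
    for position, doc in enumerate(documents):
        for field in fields:
            for token in re.findall(r"[a-z]+", doc[field].lower()):
                index[token].add(position)
    return index


index = build_index(docs, ["title", "description"])
print(sorted(index["apple"]))   # positions of documents mentioning "apple" in any field
print(sorted(index["camera"]))  # "camera" only appears in a description
```

A single lookup covers every indexed field, which is the property the compound text index provides.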


In [31]:
search_query = "apple"
results = collection.find({"$text": {"$search": search_query}})

for doc in results:
    print(doc)

{'_id': ObjectId('6914a94ec44a69d0293452cb'), 'title': 'How to bake an apple pie', 'description': 'Step-by-step guide to baking a classic apple pie.'}
{'_id': ObjectId('6914a94ec44a69d0293452c9'), 'title': 'Apple launches new iPhone', 'description': 'The latest iPhone model offers improved battery and camera.'}
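
The `$search` string also accepts exact phrases and exclusions: a phrase is wrapped in double quotes, and a term prefixed with `-` is excluded. The following is a small sketch that assembles such a string. The helper name `text_query` is hypothetical.

```python
def text_query(terms=(), phrases=(), excluded=()):
    """Build a $text filter from plain terms, exact phrases, and excluded terms."""
    parts = list(terms)
    parts += [f'"{phrase}"' for phrase in phrases]  # exact phrase match
    parts += [f"-{term}" for term in excluded]      # negated term
    return {"$text": {"$search": " ".join(parts)}}


phrase_query = text_query(phrases=["apple pie"])
exclusion_query = text_query(terms=["apple"], excluded=["iphone"])
print(phrase_query)
print(exclusion_query)
# Either dictionary can be passed to collection.find(...).
```

Against the sample documents, both queries would be expected to return only the apple pie document.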


In [34]:
search_query = "apple"
# Perform a text search and project the relevance score
results = collection.find(
    {"$text": {"$search": search_query}},
    {"score": {"$meta": "textScore"}, "title": 1, "description": 1}
).sort([("score", {"$meta": "textScore"})])

# Print out the results sorted by relevance
for doc in results:
    print(doc)


{'_id': ObjectId('6914a94ec44a69d0293452cb'), 'title': 'How to bake an apple pie', 'description': 'Step-by-step guide to baking a classic apple pie.', 'score': 1.2380952380952381}
{'_id': ObjectId('6914a94ec44a69d0293452c9'), 'title': 'Apple launches new iPhone', 'description': 'The latest iPhone model offers improved battery and camera.', 'score': 0.625}


In [35]:
# Limitation 1: Keyword Search Is Literal (misses synonyms)
# Searching for "fruit" will return no results, even though you have documents about "apple" and "bananas",
# because none of the documents literally contain the word "fruit".
search_query = "fruit"
results = collection.find({"$text": {"$search": search_query}})
print(f"Results for keyword '{search_query}':")
for doc in results:
    print(doc)

Results for keyword 'fruit':


In [36]:
# Limitation 2: Keyword Search Misses Related Concepts
# Searching for "smartphone" will not match the iPhone document, because the literal word "smartphone" never appears in it.
search_query = "smartphone"
results = collection.find({"$text": {"$search": search_query}})
print(f"Results for keyword '{search_query}':")
for doc in results:
    print(doc)

Results for keyword 'smartphone':


In [37]:
# Limitation 3: Misspellings Are Not Handled
# Searching for a misspelled keyword like "banannas" won't find the correct document about "bananas".
search_query = "banannas"
results = collection.find({"$text": {"$search": search_query}})
print(f"Results for keyword '{search_query}':")
for doc in results:
    print(doc)

Results for keyword 'banannas':
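
One way to soften this limitation on the client side is to correct the term before querying. The following is a rough sketch using the standard library's `difflib`. The `suggest` helper, the 0.8 cutoff, and the hand-copied vocabulary are assumptions for illustration, and `$text` does not do this for you.

```python
import difflib
import re

# Text copied by hand from some of the sample documents.
indexed_text = [
    "Apple launches new iPhone",
    "Bananas are rich in potassium",
    "Eating bananas can help regulate blood pressure.",
    "How to bake an apple pie",
]

# Vocabulary of lowercase words seen in the indexed text.
vocabulary = sorted(
    {word for text in indexed_text for word in re.findall(r"[a-z]+", text.lower())}
)


def suggest(term, vocab, cutoff=0.8):
    """Return the closest known word to `term`, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(term.lower(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest("banannas", vocabulary))    # close enough to a known word
print(suggest("smartphone", vocabulary))  # no sufficiently similar word
```

The suggested word can then be used as the `$search` string. This only helps with spelling, not with synonyms or related concepts.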


In [38]:
# Limitation 4: Keyword Search May Not Find Contextually Similar Content
# For example, searching "camera" will only find documents that use this word, not those that mention related terms like "photography".
search_query = "camera"
results = collection.find({"$text": {"$search": search_query}})
print(f"Results for keyword '{search_query}':")
for doc in results:
    print(doc)

Results for keyword 'camera':
{'_id': ObjectId('6914a94ec44a69d0293452c9'), 'title': 'Apple launches new iPhone', 'description': 'The latest iPhone model offers improved battery and camera.'}
