[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/agents/smolagents_hf_with_mongodb.ipynb)

# Using Smolagents with MongoDB Atlas

This notebook demonstrates how to use [Smolagents](https://github.com/huggingface/smolagents) to interact with MongoDB Atlas for building AI-powered applications. We'll explore how to create tools that leverage MongoDB's aggregation capabilities to analyze and extract insights from data.

## Prerequisites

Before running this notebook, you'll need:

1. A MongoDB Atlas account and cluster
2. A Python environment with the required packages (`pymongo` and `smolagents`)
3. An OpenAI API key (the notebook uses the `gpt-4o` model)

## Setting Up MongoDB Atlas

1. Create a free MongoDB Atlas account at [https://www.mongodb.com/cloud/atlas/register](https://www.mongodb.com/cloud/atlas/register)
2. Create a new cluster (free tier is sufficient)
3. Configure network access by adding your IP address (for Google Colab, you can temporarily allow `0.0.0.0/0` for testing)
4. Create a database user with read/write permissions
5. Get your connection string from the Atlas UI (click "Connect" > "Connect your application")
6. Replace `<password>` in the connection string with your database user's password
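
If the database user's password contains characters such as `@`, `/`, or `:`, it has to be percent-encoded before it goes into the connection string. The following is a minimal sketch of that step using only the standard library. The helper name `build_atlas_uri`, the user name, the password, and the host are hypothetical placeholders, and the query options are only one common choice.

```python
from urllib.parse import quote_plus


def build_atlas_uri(user: str, password: str, host: str) -> str:
    """Return an SRV connection string with percent-encoded credentials."""
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/?retryWrites=true&w=majority"
    )


# Hypothetical values for illustration only; use your own cluster host and user.
uri = build_atlas_uri("app_user", "p@ss/word:1", "cluster0.example.mongodb.net")
print(uri)
```

The resulting string can be pasted at the URI prompt later in the notebook.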

## Observations

In this notebook, we:
- Define tools that interact with MongoDB Atlas using pymongo
- Use aggregation pipelines to analyze data
- Sample documents to understand schema structure
- Demonstrate how LLMs can generate and execute MongoDB queries

The tools showcase how to:
1. Execute aggregation pipelines generated by the LLM
2. Sample documents to understand collection structure
3. Handle errors and provide meaningful feedback

### Security Considerations

When working with MongoDB Atlas:
- Never commit connection strings with credentials to version control
- Use environment variables or secure secret management
- Restrict database user permissions to only what's needed
- Restrict the IP access list in the Atlas Network Access settings, and remove `0.0.0.0/0` once testing is done
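
One way to follow the first two points is sketched below: read the URI from an environment variable, fall back to an interactive prompt, and mask the password before anything is printed or logged. The helper names `load_mongodb_uri` and `redact_uri` and the sample URI are illustrative assumptions, not part of the notebook.

```python
import getpass
import os
import re


def load_mongodb_uri(env_var: str = "MONGODB_URI") -> str:
    """Prefer an environment variable; prompt only when it is not set."""
    uri = os.environ.get(env_var)
    if not uri:
        uri = getpass.getpass(f"Enter your MongoDB Atlas URI ({env_var} is not set): ")
    return uri


def redact_uri(uri: str) -> str:
    """Replace the password part of a connection string with asterisks."""
    return re.sub(r"(?<=://)([^:/@]+):([^@]+)@", r"\1:***@", uri)


# Hypothetical URI for illustration only.
print(redact_uri("mongodb+srv://app_user:secret@cluster0.example.mongodb.net/?x=1"))
```

Calling `load_mongodb_uri()` in place of a hard-coded string keeps the credentials out of the notebook file itself.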

## Install dependencies

In [1]:
pip install pymongo smolagents

Collecting pymongo
  Downloading pymongo-4.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting smolagents
  Downloading smolagents-1.0.0-py3-none-any.whl.metadata (7.3 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Collecting pandas>=2.2.3 (from smolagents)
  Downloading pandas-2.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m89.9/89.9 kB[0m [31m4.3 MB/s[0m eta [36m0:00:00[0m
Collecting markdownify>=0.14.1 (from smolagents)
  Downloading markdownify-0.14.1-py3-none-any.whl.metadata (8.5 kB)
Collecting gradio>=5.8.0 (from smolagents)
  Downloading gradio-5.9.1-py3-none-any.whl.metadata (16 kB)
Collecting duckduckgo-search>=6.3.7 (from smolagents)
  Downloading duckduckgo_search-7.2.0-py3-none-any.whl.metadata (17 kB)
Collecting python-dotenv>=1.0.1 (from smolagents)
  Downloa

### Provide your MongoDB Atlas connection string

In [2]:
import getpass
import os

MONGODB_URI = getpass.getpass("Enter your MongoDB Atlas URI: ")
os.environ["MONGODB_URI"] = MONGODB_URI

Enter your MongoDB Atlas URI: ··········


## Loading the dataset

This example uses the Airbnb dataset from https://huggingface.co/datasets/MongoDB/airbnb_embeddings. It is expected to be loaded into the following namespace:

- Database: `ai_airbnb`
- Collection: `rentals`
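
The notebook does not show the loading step itself. The sketch below is one possible way to do it, assuming the `datasets` package is installed, that the dataset exposes a `train` split, and that a subset of 1,000 records is enough for experimentation. The helper names `in_batches` and `load_airbnb_dataset`, the limit, and the batch size are illustrative choices.

```python
from itertools import islice
from typing import Iterable, Iterator


def in_batches(items: Iterable[dict], size: int) -> Iterator[list]:
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def load_airbnb_dataset(uri: str, limit: int = 1000, batch_size: int = 200) -> int:
    """Stream up to `limit` records from Hugging Face into ai_airbnb.rentals."""
    # Imported inside the function so in_batches stays usable without these packages.
    from datasets import load_dataset  # assumes `pip install datasets`
    from pymongo import MongoClient

    # The split name "train" is an assumption about the dataset layout.
    dataset = load_dataset("MongoDB/airbnb_embeddings", split="train", streaming=True)
    collection = MongoClient(uri)["ai_airbnb"]["rentals"]

    inserted = 0
    for batch in in_batches(islice(dataset, limit), batch_size):
        inserted += len(collection.insert_many(batch).inserted_ids)
    return inserted
```

A call such as `load_airbnb_dataset(MONGODB_URI)` would run once before the agent is used. Running it a second time against the same collection may fail with duplicate-key errors if the records carry their own `_id` values.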

## Defining the tools

We'll create two main tools for interacting with MongoDB:

1. **Aggregation Tool**: Executes aggregation pipelines generated by the LLM to analyze data
   - Takes a pipeline as input
   - Handles complex data transformations
   - Returns aggregated results

2. **Sampling Tool**: Helps understand collection structure
   - Randomly samples documents
   - Provides schema insights
   - Useful for data exploration

Both tools automatically exclude embedding fields to reduce response size and improve readability.
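
To make the embedding exclusion concrete, here is a small sketch of what happens to a pipeline before it reaches MongoDB: the JSON string from the LLM is parsed and a `$project` stage is placed in front of it. The helper name `prepare_pipeline` is hypothetical, and the `address.country` field in the sample pipeline is an assumption about the dataset schema.

```python
import json

EMBEDDING_FIELDS = ("text_embeddings", "image_embeddings")


def prepare_pipeline(pipeline_json: str) -> list:
    """Parse an LLM-generated pipeline and prepend the embedding projection."""
    stages = json.loads(pipeline_json)
    if not isinstance(stages, list):
        raise ValueError("pipeline must be a JSON array of stages")
    projection = {"$project": {field: 0 for field in EMBEDDING_FIELDS}}
    return [projection, *stages]


# A pipeline the LLM might produce; the field name is assumed, not verified.
llm_pipeline = (
    '[{"$group": {"_id": "$address.country", "count": {"$sum": 1}}},'
    ' {"$sort": {"count": -1}}]'
)
for stage in prepare_pipeline(llm_pipeline):
    print(stage)
```

Validating that the parsed value is a list gives the agent a clear error message when the LLM sends a single stage object instead of an array.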

In [3]:
import json
import os

from google.colab import userdata
from pymongo import MongoClient
from smolagents import LiteLLMModel, tool
from smolagents.agents import ToolCallingAgent

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

model = LiteLLMModel(model_id="gpt-4o")

client = MongoClient(MONGODB_URI, appname="devrel.showcase.smolagents")

@tool
def get_aggregated_docs(pipeline: str) -> list:
    """
    Runs an LLM-generated aggregation pipeline on the 'rentals' collection and returns the resulting documents

    Args:
        pipeline: A JSON-encoded array of aggregation stages generated by the LLM
    """
    db = client["ai_airbnb"]
    collection = db["rentals"]
    # The LLM passes the pipeline as a JSON string, so parse it into a list of stages
    pipeline = json.loads(pipeline)
    # Prepend a projection that drops the large embedding fields
    pipeline.insert(0, {"$project" : {"text_embeddings" : 0, "image_embeddings" : 0}})
    docs = list(collection.aggregate(pipeline))
    return docs

@tool
def sample_documents(collection_name: str) -> str:
    """
    Uses $sample to return a few random documents from a collection so its structure can be inspected

    Args:
      collection_name: The name of the collection to sample from
    """
    db = client["ai_airbnb"]
    try:
        collection = db[collection_name]
        # Sample 5 documents first, then drop the large embedding fields
        sample = list(collection.aggregate([
            {"$sample": {"size": 5}},
            {"$project": {"text_embeddings": 0, "image_embeddings": 0}},
        ]))
        # Serialize to match the declared return type; default=str covers ObjectId and dates
        return json.dumps(sample, default=str)
    except Exception as e:
        return f"Error: {e}"

agent = ToolCallingAgent(tools=[get_aggregated_docs, sample_documents], model=model)

# Example usage
user_query = "What are the supported countries in our 'rentals' collection? Sample for structure and then aggregate how many are in each country"
response = agent.run(user_query)

## Conclusions

This notebook demonstrates how Smolagents can be integrated with MongoDB Atlas so that an AI agent can analyze data. The two tools, `get_aggregated_docs` and `sample_documents`, query the Airbnb dataset stored in MongoDB Atlas. The agent, powered by GPT-4o through `LiteLLMModel`, turns a user query into a sampling call followed by an aggregation pipeline that runs against the database.

Key observations:

* **Robust Tool Design:** The `sample_documents` tool catches exceptions and returns the error message, so the agent receives feedback it can act on. Excluding embedding fields keeps results small and readable.
* **Enhanced Query Handling:** Each aggregation begins with a `$project` stage that removes `text_embeddings` and `image_embeddings` before the other stages run, which reduces response size. `json.loads()` parses the pipeline string received from the LLM into a list of stages.
* **Improved User Experience:** The tool docstrings describe each tool and its arguments to the LLM, and the example query shows how to prompt the agent.
* **Practical Application:** The demonstration showcases a practical application for analyzing data within a MongoDB Atlas database using an LLM-powered agent.

Future development could explore:

* **Expanded Toolset:** Implementing additional tools for data manipulation, filtering, and more complex analytics.
* **Advanced Query Generation:** Exploring methods to refine the LLM's ability to generate accurate and efficient MongoDB queries.
* **Visualization Capabilities:** Integrating data visualization libraries to present the analysis results more effectively.
* **Security Enhancements:** Further solidifying security practices, potentially incorporating environment variable management for sensitive credentials.
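
As one possible starting point for the security item, the sketch below rejects pipelines that contain the write stages `$out` or `$merge` before they are executed. The helper name `assert_read_only` is hypothetical, and the check only inspects top-level stages, so it is meant to complement a read-only database user rather than replace it.

```python
import json

# Aggregation stages that write to a collection.
WRITE_STAGES = {"$out", "$merge"}


def assert_read_only(pipeline_json: str) -> list:
    """Parse a pipeline and raise ValueError if it contains a write stage."""
    stages = json.loads(pipeline_json)
    for stage in stages:
        blocked = WRITE_STAGES.intersection(stage)
        if blocked:
            raise ValueError(f"write stages are not allowed: {sorted(blocked)}")
    return stages


# A read-only pipeline passes through unchanged.
print(assert_read_only('[{"$match": {"beds": 2}}, {"$limit": 5}]'))

# A pipeline that tries to write is rejected with a message the agent can read.
try:
    assert_read_only('[{"$match": {}}, {"$out": "copy_of_rentals"}]')
except ValueError as error:
    print(f"Rejected: {error}")
```

A tool such as `get_aggregated_docs` could call this check before running `collection.aggregate`.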