# Agent example for Fireworks + MongoDB + Nomic embedding model

## Introduction
We just went through the Fireworks tutorial for MongoDB and Nomic embeddings, and all the data is now indexed. Next, we will look at how to use an agent framework to drive the interaction, since embedding search alone is sometimes not enough; a query may, for example, also need a structured filter on a field such as the release year.

## Setting Up Your Environment
Before we dive into the code, make sure to set up your environment. This involves installing necessary packages like pymongo and openai. Run the following command in your notebook to install these packages:

In [5]:
!pip install pymongo openai tqdm langchain langchain_openai langchainhub numexpr

Collecting langchain
  Downloading langchain-0.1.8-py3-none-any.whl.metadata (13 kB)
Collecting langchain_openai
  Downloading langchain_openai-0.0.6-py3-none-any.whl.metadata (2.5 kB)
Collecting langchainhub
  Downloading langchainhub-0.1.14-py3-none-any.whl.metadata (478 bytes)
Collecting numexpr
  Downloading numexpr-2.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.9 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl.metadata (3.0 kB)
Collecting langchain-community<0.1,>=0.0.21 (from langchain)
  Downloading langchain_community-0.0.21-py3-none-any.whl.metadata (8.1 kB)
Collecting langchain-core<0.2,>=0.1.24 (from langchain)
  Downloading langchain_core-0.1.25-py3-none-any.whl.metadata (6.0 kB)
Collecting langsmith<0.2.0,>=0.1.0 (from langchain)
  Downloading langsmith-0.1.5-py3-none-any.whl.metadata (13 kB)
Collecting openai
  Downloading openai-1.12.0-py3-none-any.whl.metadata (18 kB)
Collecting types-r

## Initializing Fireworks and MongoDB Clients
To interact with Fireworks and MongoDB, we need to initialize their respective clients. The cells below prompt for your MongoDB connection URL and your Fireworks API key via `input()`, so paste your actual credentials when asked.

In [2]:
import openai
import pymongo

mongo_url = input()
client = pymongo.MongoClient(mongo_url)

In [3]:
fw_api_key = input()
fw_client = openai.OpenAI(
  api_key=fw_api_key,
  base_url="https://api.fireworks.ai/inference/v1"
)
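
If you would rather not paste secrets into a notebook cell each time, one option is to read them from environment variables and only prompt as a fallback. The snippet below is a minimal sketch, not part of the original tutorial: the helper `read_credential` and the variable names `MONGODB_URL` and `FIREWORKS_API_KEY` are hypothetical and chosen only for illustration.

```python
import os
from getpass import getpass


def read_credential(env_var: str, prompt: str) -> str:
    """Return the value of env_var if it is set, otherwise prompt without echoing."""
    value = os.environ.get(env_var)
    if value:
        return value
    # getpass hides the typed value, unlike input()
    return getpass(prompt)


# Hypothetical variable names; use whatever your own environment defines.
# mongo_url = read_credential("MONGODB_URL", "MongoDB URL: ")
# fw_api_key = read_credential("FIREWORKS_API_KEY", "Fireworks API key: ")
```

The two commented lines show how the helper could stand in for the `input()` calls above.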

## Picking an Agent Framework
There are many agent frameworks on the market to choose from. Here we use LangChain to drive the MongoDB integration. We need three tools:
- A tool that fetches the MongoDB schema, so our function calling model can know what field to filter on
- A tool that fetches embeddings given the user query
- A tool that executes MongoDB queries, with both the filter and the vector embeddings
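
Before wiring these tools into LangChain, the data flow between them can be sketched with plain functions. This is only an illustration under stated assumptions: `infer_schema` and `build_vector_search_stage` are hypothetical helper names, the three-element query vector is a placeholder for a real embedding, and the index and path names are the ones this notebook uses.

```python
from typing import Any, Dict, List


def infer_schema(sample_doc: Dict[str, Any]) -> Dict[str, str]:
    """Map each field of one sample document to the name of its Python type."""
    return {field: type(value).__name__ for field, value in sample_doc.items()}


def build_vector_search_stage(
    index: str,
    path: str,
    query_vector: List[float],
    mongo_filter: Dict[str, Any],
    limit: int = 10,
    num_candidates: int = 100,
) -> Dict[str, Any]:
    """Combine a query embedding and a filter into one $vectorSearch stage."""
    return {
        "$vectorSearch": {
            "index": index,
            "path": path,
            "queryVector": query_vector,
            "numCandidates": num_candidates,
            "limit": limit,
            "filter": mongo_filter,
        }
    }


# Step 1: the schema tells the model which fields it may filter on.
sample = {"title": "Spider-Man", "year": 2002, "genres": ["Action"]}
schema = infer_schema(sample)

# Steps 2 and 3: the query embedding (placeholder values here) and the
# model-generated filter go into a single aggregation stage.
stage = build_vector_search_stage(
    index="movie_index",
    path="embedding_2k_movies_fw_nomic_1_5",
    query_vector=[0.1, 0.2, 0.3],
    mongo_filter={"year": {"$gte": 2000}},
)
```

Keeping the stage construction in a pure function makes it easy to inspect what the agent would send to MongoDB without running a query.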

We will begin with the basic LangChain setup.

In [43]:
import os

from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain import hub
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import BaseTool, StructuredTool, tool
from typing import Optional, Type
from langchain.globals import set_debug
from langchain.globals import set_verbose


set_debug(True)
set_verbose(True)



llm = ChatOpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=fw_api_key,
    model="accounts/fireworks/models/firefunction-v1",
    temperature=0.0,
    max_tokens=256,
)


In [70]:
from typing import List, Dict, Any
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

class FetchSchemaInput(BaseModel):
    db: str = Field(description="name of the database")
    collection: str = Field(description="name of the collection")

class FetchSchemaTool(BaseTool):
    name: str = "FetchSchemaTool"
    description: str = "Get the schema for the corresponding db and collection"
    args_schema: Type[BaseModel] = FetchSchemaInput

    def _run(self, db: str, collection: str) -> Dict[str, Any]:
        """Infer a schema from one sample document in the specified db and collection

        Args:
            db: name of the mongodb database
            collection: name of the mongodb collection
        """
        # find_one() returns None for an empty collection, so fall back to an empty dict
        sample = client[db][collection].find_one() or {}
        return {field: type(value) for field, value in sample.items()}

class MongoDBAtlasSearchInput(BaseModel):
    user_query: str = Field(description="faithful representation of the user query to generate the embeddings from")
    db: str = Field(description="name of the database")
    collection: str = Field(description="name of the collection")
    filter: Dict[str, Any] = Field(description="mongodb filter for the given query")

class MongoDBAtlasSearchTool(BaseTool):
    name: str = "MongoDBAtlasSearchTool"
    description: str = "Look up documents in MongoDB Atlas based on filters and embeddings"
    args_schema: Type[BaseModel] = MongoDBAtlasSearchInput

    def _run(self, user_query: str, db: str, collection: str, filter: Dict[str, Any]) -> set[str]:
        """Embed the user query, run a filtered vector search, and return the matching movie titles
        """
        # Embed the query with the Nomic model served by Fireworks
        embedding = fw_client.embeddings.create(
            input=[user_query],
            model="nomic-ai/nomic-embed-text-v1.5"
        ).data[0].embedding
        return {doc['title'] for doc in client[db][collection].aggregate([{
            # vector search fields are hard coded right now to make things cleaner, but we can also inject into
            # the system prompt and let the model handle it
            "$vectorSearch": {
                "index": "movie_index",
                "path": "embedding_2k_movies_fw_nomic_1_5",
                "queryVector": embedding,
                "numCandidates": 100,
                "limit": 10,
                "filter": filter
            }
        }])}

tools = [
  FetchSchemaTool(),
  MongoDBAtlasSearchTool(),
]

system_prompt = """
You are an expert MongoDB user that turns user queries into model requests.
Whenever you are faced with a user query, you should first fetch the MongoDB schema for database "sample_mflix" and collection "movies".
Then, you respond to the original user query by looking into MongoDB Atlas
- with a filter that best matches the user query given the schema. The filter will be constructed from the schema you fetched earlier.
- as well as the embeddings.
After getting the result from MongoDB Atlas, only then respond with your movie recommendations given the fetched results. Do not prematurely respond with any movie recommendations.
"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history", optional=True),
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)

agent = create_openai_tools_agent(llm, tools, prompt)

agent = AgentExecutor(agent=agent, tools=tools, verbose=True)
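
Because the `filter` argument is generated by the model, it can be worth checking it before passing it to `$vectorSearch`. The following is a minimal sketch of such a check, not something the original notebook does: `validate_filter` is a hypothetical helper, and the operator allow-list is an assumption that should be compared against the operators your Atlas version actually supports.

```python
from typing import Any, Iterable

# Assumed allow-list of operators; adjust to what your deployment supports.
ALLOWED_OPERATORS = {
    "$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin", "$and", "$or",
}


def validate_filter(mongo_filter: Any, allowed_fields: Iterable[str]) -> None:
    """Raise ValueError if the filter uses an unknown field or operator."""
    fields = set(allowed_fields)

    def check(node: Any, inside_field: bool) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                if not isinstance(key, str):
                    raise ValueError(f"non-string key: {key!r}")
                if key.startswith("$"):
                    if key not in ALLOWED_OPERATORS:
                        raise ValueError(f"operator not allowed: {key}")
                    check(value, inside_field)
                elif inside_field:
                    # A plain key nested under a field condition is rejected
                    raise ValueError(f"unexpected nested key: {key}")
                elif key not in fields:
                    raise ValueError(f"unknown field: {key}")
                else:
                    check(value, True)
        elif isinstance(node, list):
            for item in node:
                check(item, inside_field)
        # Scalars (numbers, strings, dates) are accepted as-is

    check(mongo_filter, False)


# Passes silently: both the field and the operator are allowed.
validate_filter({"year": {"$gte": 2000}}, ["year", "genres"])
```

A check like this could be called at the top of `MongoDBAtlasSearchTool._run`, with the allowed fields taken from the keys that `FetchSchemaTool` returns.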

In [71]:
print(agent.invoke({"input": "I like spiderman, any similar movies you can recommend after the 2000s?"}))



> Entering new AgentExecutor chain...

Invoking: `FetchSchemaTool` with `{'db': 'sample_mflix', 'collection': 'movies'}`


{'_id': <class 'bson.objectid.ObjectId'>, 'plot': <class 'str'>, 'genres': <class 'list'>, 'runtime': <class 'int'>, 'cast': <class 'list'>, 'num_mflix_comments': <class 'int'>, 'poster': <class 'str'>, 'title': <class 'str'>, 'fullplot': <class 'str'>, 'languages': <class 'list'>, 'released': <class 'datetime.datetime'>, 'directors': <class 'list'>, 'writers': <class 'list'>, 'awards': <class 'dict'>, 'lastupdated': <class 'str'>, 'year': <class 'int'>, 'imdb': <class 'dict'>, 'countries': <class 'list'>, 'type': <class 'str'>, 'tomatoes': <class 'dict'>, 'embedding_2k_movies_fw_e5_mistral': <class 'list'>}

Invoking: `MongoDBAtlasSearchTool` with `{'user_query': 'spiderman', 'db': 'sample_mflix', 'collection': 'movies', 'filter': {'year': {'$gte': 2000}}}`


{'X-Men', 'Fantastic Four', 'Alon