# AI email assistant with Composio and LanceDB

This guide shows how to combine Composio's Gmail tool with LanceDB to build an agent that replies to emails automatically: it searches a LanceDB vector store for relevant information and uses the results to draft a reply. This setup suits cases where the answers live in a body of reference data, such as FAQs or documentation. Composio lets an AI agent or your app connect to your users' apps, such as Gmail, Slack, and Trello. LanceDB stores, manages, and queries the data. Together, they make it possible to build more personalized, context-aware agents.

## Setup and Dependencies

In [None]:
# Install dependencies
!pip install composio-langchain lancedb langchain-community langchain langchain-core langchain-openai

### Composio setup

To start using Composio's Gmail tool, you need to create an integration between your Gmail account and Composio.

This can be done with a single command:

In [None]:
!composio add gmail

Follow the link generated by the above command to complete authentication and finish setting up the integration.

To respond to emails automatically, we need to set up a trigger that listens for new emails. This can be done with the following command:

In [None]:
!composio triggers enable gmail_new_gmail_message

This enables the `gmail_new_gmail_message` trigger. When a new email arrives, the trigger delivers data about it, such as the sender and the message text, which the agent can then use to draft an appropriate reply.
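
As an illustration of the kind of data the trigger delivers, the sketch below pulls a few fields out of a payload dictionary. The key names (`threadId`, `messageText`, `sender`) are the ones this guide reads in its listener callback; the `IncomingEmail` class and the `parse_payload` helper are hypothetical names introduced here, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class IncomingEmail:
    """Hypothetical container for the payload fields this guide uses."""
    thread_id: Optional[str]
    text: str
    sender: Optional[str]


def parse_payload(payload: Mapping[str, Any]) -> IncomingEmail:
    """Extract the fields needed to draft a reply; missing keys become None or ''."""
    return IncomingEmail(
        thread_id=payload.get("threadId"),
        # Strip surrounding whitespace such as a trailing '\r\n' from the body
        text=(payload.get("messageText") or "").strip(),
        sender=payload.get("sender"),
    )


# Made-up sample payload, for illustration only
sample = {
    "threadId": "abc123",
    "messageText": "What are encoders?\r\n",
    "sender": "john@example.com",
}
email = parse_payload(sample)
print(email.text)
```

Using `.get()` rather than indexing means a payload with a missing field produces `None` or an empty string instead of raising a `KeyError`.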

We will use OpenAI's embedding model and LLM in this example, so you need to set your OpenAI API key as an environment variable:

In [None]:
import os

os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
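
Hardcoding a key in a notebook makes it easy to leak. One alternative, sketched here, is to set the variable outside the notebook and fail early with a clear message if it is missing. The helper name `require_env` is an assumption of this sketch, not part of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Calling `require_env("OPENAI_API_KEY")` before creating the embeddings or the LLM surfaces a missing key immediately, rather than at the first API request.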

### Get the data

Now, we'll get the data we want to store and query with our agent. This example uses the research paper "Attention Is All You Need", but you can replace it with any documents you like.

In [None]:
!wget https://arxiv.org/pdf/1706.03762 -O ai.pdf

This downloads the research paper from the URL above and saves it as `ai.pdf`.

### Generate embeddings and store data

Now, we generate embeddings for the data and store them in a LanceDB vector store:

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings

# Create a recursive text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=700, chunk_overlap=50)

embeddings = OpenAIEmbeddings()
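
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and sentences); it only shows how consecutive chunks share characters at their boundary. The function name `naive_chunks` is hypothetical.

```python
def naive_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows, where each window repeats the
    last `chunk_overlap` characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = naive_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary is less likely to be cut off from its context in both chunks.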

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import LanceDB

# Load and split documents
loader = PyPDFLoader("ai.pdf", extract_images=False)
docs = loader.load_and_split(text_splitter)

# Add documents to LanceDB vector store
vectorstore = LanceDB.from_documents(documents=docs, embedding=embeddings)
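
Conceptually, a vector store keeps one embedding per chunk and answers a query by returning the chunks whose embeddings are closest to the query's embedding. The toy sketch below shows that idea with hand-written 3-dimensional vectors and cosine similarity. It is an illustration only: real embeddings have far more dimensions, and LanceDB uses its own storage, indexes, and distance metrics rather than this loop. All names, texts, and vectors here are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k stored texts most similar to the query vector."""
    ranked = sorted(store, key=lambda text: cosine(query_vec, store[text]), reverse=True)
    return ranked[:k]


# Toy "embeddings"; a real embedding model would compute these from the text
store = {
    "The encoder maps an input sequence to representations.": [0.9, 0.1, 0.0],
    "The decoder generates the output sequence.": [0.6, 0.6, 0.1],
    "Notes on training hardware.": [0.0, 0.1, 0.9],
}
results = top_k([1.0, 0.0, 0.0], store, k=2)
print(results)
```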

### Initialize Composio's toolset

Now, we initialize Composio's toolset and get the `GMAIL_REPLY_TO_THREAD` action. An action is a function that the LLM can call. Composio provides these pre-built actions with optimized JSON schemas, which makes it easy to integrate external tools with LLMs and agentic workflows.

In [None]:
from composio_langchain import Action, ComposioToolSet

# Initialize Composio's toolset
toolset = ComposioToolSet()

replyTool = toolset.get_tools(
    actions=[
        Action.GMAIL_REPLY_TO_THREAD,  # Reply to a gmail thread
    ],
)
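
To illustrate what "a function that the LLM can call" means: a tool is typically described to the model as a name, a description, and a JSON schema for its arguments. The dictionary below is a hand-written illustration in the style of OpenAI function calling, not the schema Composio actually generates. The parameter names are taken from the sample agent output later in this guide; the descriptions and the `required` list are assumptions.

```python
import json

# Illustrative only: not the schema Composio generates
reply_tool_schema = {
    "type": "function",
    "function": {
        "name": "GMAIL_REPLY_TO_THREAD",
        "description": "Reply to an existing Gmail thread.",
        "parameters": {
            "type": "object",
            "properties": {
                "thread_id": {"type": "string", "description": "ID of the thread to reply to"},
                "message_body": {"type": "string", "description": "Text of the reply"},
                "recipient_email": {"type": "string", "description": "Address of the recipient"},
            },
            # Assumed: which arguments are mandatory is a guess for this sketch
            "required": ["thread_id", "message_body", "recipient_email"],
        },
    },
}

print(json.dumps(reply_tool_schema, indent=2))
```

The model never runs the function itself; it emits the function name and arguments that match the schema, and the agent framework performs the call.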

We can also define a custom tool that queries the vector store based on the incoming query from a user:

In [None]:
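from langchain_core.documents import Document


def search_db(query: str) -> list[Document]:
    """Searches the LanceDB vector store for the user query and returns the matching documents."""

    # similarity_search returns a list of Document objects, not a string
    response = vectorstore.similarity_search(query)
    return response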

Then, we create a `StructuredTool` from the above function:

In [None]:
from langchain_core.tools import StructuredTool

searchTool = StructuredTool.from_function(
    func=search_db,
    name="search_db",
    description="Searches the LanceDB vector store for the given query and returns the results.",
)

# Combine the reply tool and the search tool to pass to agent
tools: list[StructuredTool] = replyTool + [searchTool]

### Setting up the agent

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain.agents import AgentExecutor, create_tool_calling_agent

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o-mini")

# The prompt can be customized to fit your use case
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an AI email assistant that can write and reply to emails. You have to use the search_db tool to search the lance vector store for the user's query. When the user asks you a question, use the search_db tool to search the lance vector store and then answer the question using the search results. Send the answer back to the user in an email.",
        ),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

agent = create_tool_calling_agent(
    llm=llm,
    tools=tools,
    prompt=prompt,
)

# Create an instance of AgentExecutor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

Next, we create a listener for the trigger that we enabled at the beginning:

In [None]:
# Create the listener
listener = toolset.create_trigger_listener()

# Attach listener to trigger and provide callback function that is executed when the trigger receives new data
@listener.callback(filters={"trigger_name": "GMAIL_NEW_GMAIL_MESSAGE"})
def on_new_gmail_message(event) -> None:
    try:
        print("data received - ", event)

        # Extract the relevant information from the event
        payload = event.payload
        thread_id = payload.get("threadId")
        message = payload.get("messageText")
        sender_mail = payload.get("sender")

        res = agent_executor.invoke(
            {"input": f"This is the query you have to respond to: {message}. It's from {sender_mail} and the threadId is {thread_id}."})
        print(res)
    except Exception as e:
        print("Error:", e)

print("Listener started!")
listener.listen()


```
Listener started!
[2024-09-18 20:05:59,715][INFO] Received trigger event with trigger ID: ea36d63f-5cc9-4581-9a19-b647e7468697 and trigger name: GMAIL_NEW_GMAIL_MESSAGE
data received -  appName='gmail' payload={'threadId': '192058f5b7fe8...', 'messageId': '192058f5b7fe8...', 'messageTimestamp': '2024-09-18T14:35:20Z', 'messageText': 'What are encoders?\r\n'...}

> Entering new AgentExecutor chain...

Invoking: `search_db` with `{'query': 'What are encoders?'}`

[Document(metadata={'page': 0, 'source': 'ai.pdf'}, page_content='convolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing with recurrence and convolutions\nentirely. Experiments on two machine translation tasks show these models to\nbe superior in quality while being more parallelizable and requiring significantly\nless time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-\nto-German translation task, improving over the existing best results, including'), Document(metadata={'page': 1, 'source': 'ai.pdf'}, page_content='3 Model Architecture\nMost competitive neural sequence transduction models have an encoder-decoder structure [ 5,2,35].\nHere, the encoder maps an input sequence of symbol representations (x1, ..., x n)to a sequence\nof continuous representations z= (z1, ..., z n). Given z, the decoder then generates an output\nsequence (y1, ..., y m)of symbols one element at a time. At each step the model is auto-regressive\n[10], consuming the previously generated symbols as additional input when generating the next.\n2'), Document(metadata={'page': 4, 'source': 'ai.pdf'}, page_content='[38, 2, 9].\n•The encoder contains self-attention layers. In a self-attention layer all of the keys, values\nand queries come from the same place, in this case, the output of the previous layer in the\nencoder. Each position in the encoder can attend to all positions in the previous layer of the\nencoder.\n•Similarly, self-attention layers in the decoder allow each position in the decoder to attend to\nall positions in the decoder up to and including that position. We need to prevent leftward\ninformation flow in the decoder to preserve the auto-regressive property. 
We implement this\ninside of scaled dot-product attention by masking out (setting to −∞) all values in the input'), Document(metadata={'page': 1, 'source': 'ai.pdf'}, page_content='1 Introduction\nRecurrent neural networks, long short-term memory [ 13] and gated recurrent [ 7] neural networks\nin particular, have been firmly established as state of the art approaches in sequence modeling and\ntransduction problems such as language modeling and machine translation [ 35,2,5]. Numerous\nefforts have since continued to push the boundaries of recurrent language models and encoder-decoder\narchitectures [38, 24, 15].\nRecurrent models typically factor computation along the symbol positions of the input and output\nsequences. Aligning the positions to steps in computation time, they generate a sequence of hidden')]

Invoking: `GMAIL_REPLY_TO_THREAD` with `{'thread_id': '192058f5b7fe8...', 'message_body': 'Dear John,\n\nEncoders are a crucial component of many neural network architectures, particularly in sequence modeling tasks. In an encoder-decoder structure, the encoder maps an input sequence of symbols into a sequence of continuous representations. This allows the model to capture the underlying features of the input data.\n\nFor example, in the case of machine translation, the encoder processes the input sentence and generates a representation that the decoder then uses to produce the translated output sentence.\n\nThe encoder often includes self-attention layers, which allow it to consider all positions in the input when creating its representations. This mechanism is part of what makes modern models like Transformers so effective.\n\nIf you have any more questions or need further clarification, feel free to ask!\n\nBest regards,\n[Your Name]', 'recipient_email': 'example@gmail.com'}`


[2024-09-18 20:06:11,352][INFO] Executing `GMAIL_REPLY_TO_THREAD` with params={'thread_id': '192058f5b7fe8d17', 'message_body': 'Dear John,\n\nEncoders are a crucial component of many neural network architectures, particularly in sequence modeling tasks. In an encoder-decoder structur...
[2024-09-18 20:06:13,109][INFO] Got response={'successfull': True, 'data': {'response_data': {'id': '192058ff6bbec...', 'threadId': '192058f5b7fe8...', 'labelIds': ['SENT']}}, 'error': None} from action=<composio.client.enums._action.Action object at 0x7f753839ea70> with params={'thread_...
{'successfull': True, 'data': {'response_data': {'id': '192058ff6bbec...', 'threadId': '192058f5b7fe8...', 'labelIds': ['SENT']}}, 'error': None}I've replied to John's email regarding encoders. If you need anything else, feel free to ask!

> Finished chain.
```

### Connect with Composio and learn more

If you encounter any problems, please let us know at our [Discord](https://discord.com/invite/cNruWaAhQk).

Check out [Composio's documentation](https://docs.composio.dev/introduction/intro/overview) to learn more about how to use and integrate various tools for different use cases.