# Using the Box File Search Tool with your LangGraph agent

In this document, we will show a simple example of using the BoxFileSearchTool with OpenAI and LangGraph as an agent. This is a simple use case so we can focus on getting it up and running quickly, but the sky's the limit. With this tool you can easily use full text search to find and retrieve text from documents stored in [Box](https://box.com) to augment your AI workflows.

Now you will need a Box instance, but don't worry, we have you covered. You can sign up for a free Developer Account [here](https://account.box.com/signup/n/developer#ty9l3). Once you are signed up, you'll need to [create your Box app](https://cloud.app.box.com/developers/console) and [enable it in your admin console](https://developer.box.com/guides/authorization/custom-app-approval/) to follow along. Remember, when you follow the steps to enable the app in the admin console with the free developer account, you are the admin! And if you need help, you can find all our developer documentation [here](https://developer.box.com), or find us on the [community site](https://forum.box.com), where we are always happy to help.

With all that out of the way, let's get started.

## Set up Authentication

As you might imagine, both Box and OpenAI require authentication to access data through their APIs. As such, you will need both your `OPENAI_API_KEY` and your `BOX_DEVELOPER_TOKEN` to follow along. When you created your Box app, you had to choose an authentication type. Whether you chose OAuth 2.0, JWT, or CCG, you will have access to a developer token. These tokens are meant for development, not production, which is why they are only valid for one hour. In the Developer Console, click into the app you created and navigate to the `Configuration` tab. You should see a section dedicated to the Developer Token, with either a `Generate Developer Token` button or a field containing a token. If you see a token there, I'd recommend revoking it and creating a new one unless you generated it yourself in the last hour or are sharing that app with another developer.

Now that you have your OpenAI API key and your Box Developer Token, let's use `getpass` to add them to your environment.

In [1]:
import getpass
import os

# Get your OpenAI API key and save to your environment
os.environ["OPENAI_API_KEY"] = getpass.getpass()

# Get your Box API developer token and save to your environment
os.environ["BOX_DEVELOPER_TOKEN"] = getpass.getpass()

 ········
 ········
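Because the developer token expires after an hour, you may end up re-running the cell above fairly often. As a rough sketch (the helper name `ensure_env` and its `prompt_fn` parameter are my own inventions for illustration, not part of any library), you could prompt only when a variable is missing and reject empty input:

```python
import getpass
import os


def ensure_env(name, prompt_fn=getpass.getpass):
    """Set os.environ[name] from a hidden prompt unless it is already set.

    `ensure_env` and `prompt_fn` are hypothetical names used for this sketch.
    """
    if not os.environ.get(name):
        value = prompt_fn(f"{name}: ").strip()
        if not value:
            raise ValueError(f"{name} must not be empty")
        os.environ[name] = value
    return os.environ[name]
```

Calling `ensure_env("OPENAI_API_KEY")` and `ensure_env("BOX_DEVELOPER_TOKEN")` would then replace the two `getpass` lines. Keep in mind that an expired Box token still sits in the environment, so you would need to clear it before the helper prompts again.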


## Let's start the fun stuff

Now that we have that out of the way, let's dive in and test the BoxFileSearchTool to see what it does. When I wrote this, I was testing with a short film script I downloaded from the interwebs, and my `box_search_query` is one of the first lines in that script. You will need to upload some files of your own to test with.

The Box search functionality in general, and by proxy the search API implementation, is a full text search that runs against the fully indexed content you have permission to view in your Box instance. As such, if your search query is vague, you might end up with a lot of results. Searching for a line in a film script is a little pedantic, but I recommend running the cell below a few times to get a sense for the search query that will return the results you are looking for.

The next step is to instantiate the BoxFileSearchTool. We are using environment variables to store the Developer Token, but if you are using something like python-dotenv, you can also pass the token directly: `box_search = BoxFileSearchTool(box_developer_token="your_token")`
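If you keep the token in a `.env` file, python-dotenv is the usual way to load it. Purely as an illustration of the idea, here is a stdlib-only sketch that reads one key from a simple `KEY=VALUE` file. The function name `token_from_env_file` is hypothetical, and the parser is far less complete than python-dotenv's (no multi-line values, no variable expansion):

```python
from pathlib import Path


def token_from_env_file(path=".env", name="BOX_DEVELOPER_TOKEN"):
    """Return `name` from a KEY=VALUE file, or None if the file or key is missing.

    Hypothetical helper for this sketch; not part of any Box or LangChain package.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return None
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding whitespace and optional quotes
            return value.strip().strip("'\"")
    return None


token = token_from_env_file()
# With a token in hand, pass it to the tool as described above:
# box_search = BoxFileSearchTool(box_developer_token=token)
```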

Finally, we run the tool manually to illustrate what it returns (spoiler alert: it's a List of Document objects containing the text representation of all the files that match the search criteria) and to help you home in on a search query that limits the Box data to only what you want.

In [2]:
from langchain_community.tools.box.box_file_search import BoxFileSearchTool
from langchain_community.tools.box.box_text_rep import BoxTextRepTool

# This is the query the BoxFileSearchTool will use to find files relevant to your
# agent workflow. Search can return a lot of files, so making your query as
# exact as possible will greatly improve your results.
box_search_query="\"EXT. 8TH STREET BETWEEN AVENUES C AND D - DAY\""

# Let's instantiate the BoxFileSearchTool
box_search_tool = BoxFileSearchTool()

# And just for fun, let's run it manually and see what it returns. 
search_results = box_search_tool._run(box_search_query)
print(search_results)

GDBS: query "EXT. 8TH STREET BETWEEN AVENUES C AND D - DAY" token vGXjQ44vaMhk7LAbLv75oqgd4XMXLuip
GDBS: files ['1233039227512', '1169674971571']
[Document(metadata={'source': 'https://dl.boxcloud.com/api/2.0/internal_files/1233039227512/versions/1346280085912/representations/extracted_text/content/', 'title': 'FIVE_FEET_AND_RISING_by_Peter_Sollett_pdf'}, page_content='\n3/20/23, 5:31 PM FIVE FEET AND RISING by Peter Sollett\n\nwww.dailyscript.com/scripts/fivefeetandrising.html 1/25\n\nFIVE FEET AND RISING\n\nby\n\nPeter Sollett\n\nFADE IN:\n\nEXT. 8TH STREET BETWEEN AVENUES C AND D - DAY \n\nA group of dark-skinned girls wearing cheerleading outfits \nalign themselves in formation on the sidewalk. They begin to \ndance. No music can be heard. The sound of the girls\' bodies \nis our soundtrack. We hear their strained breathing, palms \nand sneaker bottoms pounding while they hum and count softly \nto themselves in an effort to keep the rhythm.\n\nSLO-MO: We explore the bodies of the d
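The raw printout is hard to scan once a query matches more than one file. A small helper that reports each result's title, source, and size can make it easier to tune the query. This is only a sketch: `summarize_results` and `StandInDocument` are names I made up, and the stand-in class simply mirrors the `metadata` and `page_content` attributes visible in the output above so the example runs without a Box account.

```python
from dataclasses import dataclass, field


@dataclass
class StandInDocument:
    """Hypothetical stand-in with the two attributes shown in the output above."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_results(docs, preview_chars=60):
    """Return one summary dict per document: title, source, size, short preview."""
    summaries = []
    for doc in docs:
        # Collapse newlines and repeated spaces so the preview fits on one line
        flattened = " ".join(doc.page_content.split())
        summaries.append(
            {
                "title": doc.metadata.get("title", "(untitled)"),
                "source": doc.metadata.get("source", ""),
                "chars": len(doc.page_content),
                "preview": flattened[:preview_chars],
            }
        )
    return summaries


sample = [
    StandInDocument(
        page_content="\nFIVE FEET AND RISING\n\nby\n\nPeter Sollett\n",
        metadata={"title": "FIVE_FEET_AND_RISING_by_Peter_Sollett_pdf"},
    )
]
for row in summarize_results(sample):
    print(row["title"], row["chars"], row["preview"])
```

In the notebook you would call `summarize_results(search_results)` instead of using the sample list.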

## Time to bring in OpenAI

Now that we have settled on a query and we've seen what we got back from the BoxFileSearchTool, let's start laying the groundwork for OpenAI so we can do something meaningful with it. The first step is to create a List of tools that we want to include. For this scenario, we'll just be using the BoxFileSearchTool to pull in our document from Box and send it along with our prompt to OpenAI.

We'll use the ChatOpenAI model with gpt-4.

For fun, we'll send our Box search query to OpenAI to see what we'd get without including our document for context, and also to make sure our connection is working.

In [3]:
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

# Add the instantiated BoxFileSearchTool to our List of tools
tools = [box_search_tool]

# Now let's set up our ChatOpenAI model so we can ask questions
model = ChatOpenAI(model="gpt-4")

# And for fun, let's ask ChatGPT our search query just to see what it returns
response = model.invoke([HumanMessage(content=box_search_query)])
response.content

"A bustling New York City street, alive with the energy of a typical weekday. People walk briskly down the sidewalk, dodging street vendors selling everything from hot dogs to handbags. The sounds of car horns, chatter, and the occasional siren fill the air.\n\nTall buildings tower on either side of the street, their exteriors a mishmash of brick, steel, and glass. The scent of freshly baked bread wafts out from a small bakery, mingling with the city's distinctive mix of exhaust fumes and hot asphalt.\n\nAt the corner, a group of street performers attract a crowd with their acrobatics. A mother pushes a stroller, a businessman hails a cab, a group of teenagers laugh and joke as they meander down the street.\n\nThis is a city in full swing, unapologetically loud and unceasingly vibrant."

## Add our tools to the model

Now let's get our model set up to include our tools. We'll start by setting the prompt that we want to send to OpenAI. This should instruct the model to find our file in Box and then describe what we want to do with it. We should be very specific in what we want to know and how we want to be answered. As mentioned above, the use case I was trying to solve when I wrote this notebook was to find a specific script in my Box account and create a new script for a voice over artist to record a commercial. I recommend leaving the first sentence intact. This will instruct the BoxFileSearchTool to find our document based on the query we wrote above. Everything after that should be specific to your use case. 

Once we have our prompt set, let's invoke the model manually to see what happens. 

In [4]:
# Now let's bind our tools to the actual model.
model_with_tools = model.bind_tools(tools)

# This is the prompt we'll send to OpenAI. In my test, I am using a script from a short
# film and trying to write a commercial to make people see it. I would suggest keeping
# `find a document containing the {box_search_query}`. Everything else should be specific
# to your use case.
gpt_prompt = f"find a document containing the {box_search_query}. If there are multiple documents returned, choose only the first one. Then create a script for a voiceover based on the document for a 30 second commercial. The content of this commercial should entice a viewer to want to see the movie. The voice should be that of a professional voice over actor, and the tone should be excited."

# We can manually `invoke` the model with our prompt to see what happens.
response = model_with_tools.invoke([HumanMessage(content=gpt_prompt)])

print(f"ContentString: {response.content}")
print(f"ToolCalls: {response.tool_calls}")

ContentString: 
ToolCalls: [{'name': 'box_file_search', 'args': {'query': 'EXT. 8TH STREET BETWEEN AVENUES C AND D - DAY'}, 'id': 'call_8WPOpml82AXkHw6hHudOHiTf'}]


When you ran the last script, you should have seen something similar to:

```
ContentString: 
ToolCalls: [{'name': 'box_file_search', 'args': {'query': 'EXT. 8TH STREET BETWEEN AVENUES C AND D - DAY'}, 'id': 'call_ONg4ZBlrWcafkd8l7PACbh3b'}]
```

Notice the ContentString is empty, but the model knows it needs to send our search query to the BoxFileSearchTool. Pretty cool!
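At this point the model has only asked for the tool; nothing has actually run it. To make that concrete, here is a rough sketch of what executing those tool calls by hand could look like. It assumes each entry in `tool_calls` is a dict with `name`, `args`, and `id` keys, as in the output above. `run_tool_calls` and `StandInSearchTool` are hypothetical names, and the stand-in tool just echoes the query so the example runs without Box or OpenAI.

```python
class StandInSearchTool:
    """Hypothetical stand-in exposing `name` and `invoke`, like a LangChain tool."""

    name = "box_file_search"

    def invoke(self, args):
        # A real tool would search Box here; this one only echoes the query
        return f"searched for: {args['query']}"


def run_tool_calls(tool_calls, tools):
    """Execute each requested tool call and return {call_id: result}."""
    tools_by_name = {tool.name: tool for tool in tools}
    results = {}
    for call in tool_calls:
        tool = tools_by_name.get(call["name"])
        if tool is None:
            results[call["id"]] = f"unknown tool: {call['name']}"
            continue
        results[call["id"]] = tool.invoke(call["args"])
    return results


tool_calls = [
    {
        "name": "box_file_search",
        "args": {"query": "EXT. 8TH STREET BETWEEN AVENUES C AND D - DAY"},
        "id": "call_example",  # hypothetical id for this sketch
    }
]
print(run_tool_calls(tool_calls, [StandInSearchTool()]))
```

In the notebook, the equivalent call would be `run_tool_calls(response.tool_calls, tools)`. You would still have to send the results back to the model yourself, which is exactly the bookkeeping an agent takes off your hands.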

## Now let's get started with agents

The beauty of these tools is that, with LangGraph, we let the library do all the heavy lifting. Rather than doing everything manually, an agent can figure out what needs to be done and do it autonomously.

To start, we'll be using the prebuilt ReAct agent. Let's go ahead and create the `agent_executor` that will work its magic for us. Above, we bound our tools directly to the model. With `create_react_agent`, we simply pass the model and tools to the function and let it do its thing.

In [5]:
from langgraph.prebuilt import create_react_agent

# Now let's look at how to turn this into an agent.
# We'll start by creating our agent executor with our model and tools.
agent_executor = create_react_agent(model, tools)

## Next step, make the agent go to work

Now that we have created our `agent_executor`, we are ready to see it work. First, let's kick off our workflow with its `invoke()` method. We pass it a dictionary whose `messages` key holds a list containing our GPT prompt as a HumanMessage.

In [6]:
# Again, we can invoke our agent manually, but it can take a while 
# and you won't get your answer until it's finished.
response = agent_executor.invoke(
    {"messages": [HumanMessage(content=gpt_prompt)]}
)
response["messages"]

GDBS: query EXT. 8TH STREET BETWEEN AVENUES C AND D - DAY token vGXjQ44vaMhk7LAbLv75oqgd4XMXLuip
GDBS: files ['1233039227512', '1169674971571']


[HumanMessage(content='find a document containing the "EXT. 8TH STREET BETWEEN AVENUES C AND D - DAY". If there are multiple documents returned, choose only the first one. Then create a script for a voiceover based on the document for a 30 second commercial. The content of this commercial should entice a viewer to want to see the movie. The voice should be that of a professional voice over actor, and the tone should be excited.', id='1ba274bd-4ad2-4d37-9d31-67c6bf4a3360'),
 AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_MZjcwQCdQEAwBHZMw52tGq9N', 'function': {'arguments': '{\n  "query": "EXT. 8TH STREET BETWEEN AVENUES C AND D - DAY"\n}', 'name': 'box_file_search'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 30, 'prompt_tokens': 171, 'total_tokens': 201}, 'model_name': 'gpt-4-0613', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-49a1bcf9-c493-4166-a66b-a1ee97a0ed5b-0', tool_calls=[{'name': 

In the last cell, you may have noticed that the response was pretty slow. With `invoke`, the agent does its thing and only gives you the answer when it's all finished. With this small use case, that wasn't terribly inconvenient, but with Box instances holding petabytes of data, it could take a long time.
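The list returned under `response["messages"]` holds the whole conversation: our prompt, the tool request, the tool output, and finally the model's answer. If you only want the answer, the last message is the one to read. A minimal sketch, using `SimpleNamespace` objects as stand-ins for real message objects (the helper name `final_answer` is hypothetical):

```python
from types import SimpleNamespace


def final_answer(state):
    """Return the content of the last message in an agent result, or '' if none."""
    messages = state.get("messages", [])
    if not messages:
        return ""
    return messages[-1].content


# Stand-in for the dict returned by agent_executor.invoke(...)
example_state = {
    "messages": [
        SimpleNamespace(content="find a document containing ..."),  # our prompt
        SimpleNamespace(content=""),  # the model's tool request has no text
        SimpleNamespace(content="[Document(...)]"),  # tool output
        SimpleNamespace(content="VOICEOVER: placeholder answer"),  # final reply
    ]
}
print(final_answer(example_state))
```

In the notebook this would be `final_answer(response)`.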

## Time to stream

To get a more real-time experience in getting our responses back, our agent provides a `stream` method. Using this with a for-loop, we can get an update as each step occurs and take action. The setup is the same, send our prompt as a HumanMessage. This time, we'll see the results of each step as they complete. 

In [7]:
# Now instead of invoking, let's stream the responses!
for chunk in agent_executor.stream(
    {"messages": [HumanMessage(content=gpt_prompt)]}
):
    print(chunk)
    print("----")

{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_ZKYqLjKz2YAp9qk2unqQ15Gp', 'function': {'arguments': '{\n  "query": "EXT. 8TH STREET BETWEEN AVENUES C AND D - DAY"\n}', 'name': 'box_file_search'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 30, 'prompt_tokens': 171, 'total_tokens': 201}, 'model_name': 'gpt-4-0613', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-8dd8c29b-8813-47e0-a9ed-ae5804dfbc69-0', tool_calls=[{'name': 'box_file_search', 'args': {'query': 'EXT. 8TH STREET BETWEEN AVENUES C AND D - DAY'}, 'id': 'call_ZKYqLjKz2YAp9qk2unqQ15Gp'}], usage_metadata={'input_tokens': 171, 'output_tokens': 30, 'total_tokens': 201})]}}
----
GDBS: query EXT. 8TH STREET BETWEEN AVENUES C AND D - DAY token vGXjQ44vaMhk7LAbLv75oqgd4XMXLuip
GDBS: files ['1233039227512', '1169674971571']
{'tools': {'messages': [ToolMessage(content='[Document(metadata={\'source\': \'https://dl.boxcl
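Printing whole chunks gets noisy fast. Judging by the output above, each chunk is a dict keyed by the step that produced it (`agent` or `tools`), with a `messages` list inside. Under that assumption, a small formatter can turn every chunk into a one-line progress update. `describe_chunk` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real message objects so the sketch runs on its own.

```python
from types import SimpleNamespace


def describe_chunk(chunk, max_chars=80):
    """Return one short status line per message in a streamed chunk."""
    lines = []
    for node, update in chunk.items():
        for message in update.get("messages", []):
            tool_calls = getattr(message, "tool_calls", None) or []
            if tool_calls:
                names = ", ".join(call["name"] for call in tool_calls)
                lines.append(f"[{node}] requested tool(s): {names}")
            else:
                # Flatten whitespace and truncate long tool output or answers
                text = " ".join(str(message.content).split())
                lines.append(f"[{node}] {text[:max_chars]}")
    return lines


# Stand-ins shaped like the chunks printed above
chunks = [
    {"agent": {"messages": [SimpleNamespace(content="", tool_calls=[{"name": "box_file_search"}])]}},
    {"tools": {"messages": [SimpleNamespace(content="[Document(metadata=...)]")]}},
    {"agent": {"messages": [SimpleNamespace(content="Placeholder answer", tool_calls=[])]}},
]
for chunk in chunks:
    for line in describe_chunk(chunk):
        print(line)
```

Inside the streaming loop, you would call `describe_chunk(chunk)` in place of `print(chunk)`.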

And that's it! With the magic of LangChain and the BoxFileSearchTool, we were able to create an agent and let it work. 

## Summary
To summarize what we did today, we:
* Instantiated our BoxFileSearchTool and narrowed down our search query
* Created our model
* Created an agent tied to our model and our tool
* Invoked our agent and got our answer, then streamed its steps as they completed

If you have questions, comments, or issues, join us in our [community forums](https://forums.box.com). We'd love to hear from you.

Happy coding!