Integrate E2B's data analysis/code interpreter (#12011)
This PR adds [E2B's](https://e2b.dev/) data analysis/code interpreter sandbox as a tool

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Jakub Novak <jakub@e2b.dev>
3 people committed Oct 24, 2023
1 parent d2cb95c commit 1f80949
Showing 6 changed files with 1,344 additions and 0 deletions.
367 changes: 367 additions & 0 deletions docs/docs/integrations/tools/e2b_data_analysis.ipynb
@@ -0,0 +1,367 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# E2B Data Analysis\n",
"\n",
"[E2B's cloud environments](https://e2b.dev) are great runtime sandboxes for LLMs.\n",
"\n",
"E2B's Data Analysis sandbox allows for safe code execution in a sandboxed environment. This is ideal for building tools such as code interpreters, or Advanced Data Analysis like in ChatGPT.\n",
"\n",
"E2B Data Analysis sandbox allows you to:\n",
"- Run Python code\n",
"- Generate charts via matplotlib\n",
"- Install Python packages dynamically durint runtime\n",
"- Install system packages dynamically during runtime\n",
"- Run shell commands\n",
"- Upload and download files\n",
"\n",
"We'll create a simple OpenAI agent that will use E2B's Data Analysis sandbox to perform analysis on a uploaded files using Python."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Get your OpenAI API key and [E2B API key here](https://e2b.dev/docs/getting-started/api-key) and set them as environment variables.\n",
"\n",
"You can find the full API documentation [here](https://e2b.dev/docs).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You'll need to install `e2b` to get started:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install langchain e2b"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.tools import E2BDataAnalysisTool\n",
"from langchain.agents import initialize_agent, AgentType\n",
"\n",
"os.environ[\"E2B_API_KEY\"] = \"<E2B_API_KEY>\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"<OPENAI_API_KEY>\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"When creating an instance of the `E2BDataAnalysisTool`, you can pass callbacks to listen to the output of the sandbox. This is useful, for example, when creating more responsive UI. Especially with the combination of streaming output from LLMs."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Artifacts are charts created by matplotlib when `plt.show()` is called\n",
"def save_artifact(artifact):\n",
" print(\"New matplotlib chart generated:\", artifact.name)\n",
" # Download the artifact as `bytes` and leave it up to the user to display them (on frontend, for example)\n",
" file = artifact.download()\n",
" basename = os.path.basename(artifact.name)\n",
"\n",
" # Save the chart to the `charts` directory\n",
" with open(f\"./charts/{basename}\", \"wb\") as f:\n",
" f.write(file)\n",
"\n",
"e2b_data_analysis_tool = E2BDataAnalysisTool(\n",
" # Pass environment variables to the sandbox\n",
" env_vars={\"MY_SECRET\": \"secret_value\"},\n",
" on_stdout=lambda stdout: print(\"stdout:\", stdout),\n",
" on_stderr=lambda stderr: print(\"stderr:\", stderr),\n",
" on_artifact=save_artifact,\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Upload an example CSV data file to the sandbox so we can analyze it with our agent. You can use for example [this file](https://storage.googleapis.com/e2b-examples/netflix.csv) about Netflix tv shows."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"name='netflix.csv' remote_path='/home/user/netflix.csv' description='Data about Netflix tv shows including their title, category, director, release date, casting, age rating, etc.'\n"
]
}
],
"source": [
"with open(\"./netflix.csv\") as f:\n",
" remote_path = e2b_data_analysis_tool.upload_file(\n",
" file=f,\n",
" description=\"Data about Netflix tv shows including their title, category, director, release date, casting, age rating, etc.\",\n",
" )\n",
" print(remote_path)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a `Tool` object and initialize the Langchain agent."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"tools = [e2b_data_analysis_tool.as_tool()]\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-4\", temperature=0)\n",
"agent = initialize_agent(\n",
" tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True, handle_parsing_errors=True\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can ask the agent questions about the CSV file we uploaded earlier."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Invoking: `e2b_data_analysis` with `{'python_code': \"import pandas as pd\\n\\n# Load the data\\nnetflix_data = pd.read_csv('/home/user/netflix.csv')\\n\\n# Convert the 'release_year' column to integer\\nnetflix_data['release_year'] = netflix_data['release_year'].astype(int)\\n\\n# Filter the data for movies released between 2000 and 2010\\nfiltered_data = netflix_data[(netflix_data['release_year'] >= 2000) & (netflix_data['release_year'] <= 2010) & (netflix_data['type'] == 'Movie')]\\n\\n# Remove rows where 'duration' is not available\\nfiltered_data = filtered_data[filtered_data['duration'].notna()]\\n\\n# Convert the 'duration' column to integer\\nfiltered_data['duration'] = filtered_data['duration'].str.replace(' min','').astype(int)\\n\\n# Get the top 5 longest movies\\nlongest_movies = filtered_data.nlargest(5, 'duration')\\n\\n# Create a bar chart\\nimport matplotlib.pyplot as plt\\n\\nplt.figure(figsize=(10,5))\\nplt.barh(longest_movies['title'], longest_movies['duration'], color='skyblue')\\nplt.xlabel('Duration (minutes)')\\nplt.title('Top 5 Longest Movies on Netflix (2000-2010)')\\nplt.gca().invert_yaxis()\\nplt.savefig('/home/user/longest_movies.png')\\n\\nlongest_movies[['title', 'duration']]\"}`\n",
"\n",
"\n",
"\u001b[0mstdout: title duration\n",
"stdout: 1019 Lagaan 224\n",
"stdout: 4573 Jodhaa Akbar 214\n",
"stdout: 2731 Kabhi Khushi Kabhie Gham 209\n",
"stdout: 2632 No Direction Home: Bob Dylan 208\n",
"stdout: 2126 What's Your Raashee? 203\n",
"\u001b[36;1m\u001b[1;3m{'stdout': \" title duration\\n1019 Lagaan 224\\n4573 Jodhaa Akbar 214\\n2731 Kabhi Khushi Kabhie Gham 209\\n2632 No Direction Home: Bob Dylan 208\\n2126 What's Your Raashee? 203\", 'stderr': ''}\u001b[0m\u001b[32;1m\u001b[1;3mThe 5 longest movies on Netflix released between 2000 and 2010 are:\n",
"\n",
"1. Lagaan - 224 minutes\n",
"2. Jodhaa Akbar - 214 minutes\n",
"3. Kabhi Khushi Kabhie Gham - 209 minutes\n",
"4. No Direction Home: Bob Dylan - 208 minutes\n",
"5. What's Your Raashee? - 203 minutes\n",
"\n",
"Here is the chart showing their lengths:\n",
"\n",
"![Longest Movies](sandbox:/home/user/longest_movies.png)\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"The 5 longest movies on Netflix released between 2000 and 2010 are:\\n\\n1. Lagaan - 224 minutes\\n2. Jodhaa Akbar - 214 minutes\\n3. Kabhi Khushi Kabhie Gham - 209 minutes\\n4. No Direction Home: Bob Dylan - 208 minutes\\n5. What's Your Raashee? - 203 minutes\\n\\nHere is the chart showing their lengths:\\n\\n![Longest Movies](sandbox:/home/user/longest_movies.png)\""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"What are the 5 longest movies on netflix released between 2000 and 2010? Create a chart with their lengths.\")"
]
},
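{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The agent saved the chart inside the sandbox at `/home/user/longest_movies.png`. As a quick illustration (a sketch, not part of the original example), you can fetch it to your machine with `download_file`, which is covered in more detail below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Download the chart the agent saved inside the sandbox\n",
"# (the local filename is an arbitrary example)\n",
"chart_bytes = e2b_data_analysis_tool.download_file(\"/home/user/longest_movies.png\")\n",
"with open(\"./longest_movies.png\", \"wb\") as f:\n",
"    f.write(chart_bytes)"
]
},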
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"E2B also allows you to install both Python and system (via `apt`) packages dynamically during runtime like this:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"stdout: Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (2.1.1)\n",
"stdout: Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas) (2.8.2)\n",
"stdout: Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas) (2023.3.post1)\n",
"stdout: Requirement already satisfied: numpy>=1.22.4 in /usr/local/lib/python3.10/dist-packages (from pandas) (1.26.1)\n",
"stdout: Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas) (2023.3)\n",
"stdout: Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)\n"
]
}
],
"source": [
"# Install Python package\n",
"e2b_data_analysis_tool.install_python_packages('pandas')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Additionally, you can download any file from the sandbox like this:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# The path is a remote path in the sandbox\n",
"files_in_bytes = e2b_data_analysis_tool.download_file('/home/user/netflix.csv')"
]
},
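{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"`download_file` returns the file's contents as `bytes`, so you can, for example, persist them locally (a minimal sketch; the local filename is just an illustrative choice):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Write the downloaded bytes to a local file\n",
"with open(\"./netflix-downloaded.csv\", \"wb\") as f:\n",
"    f.write(files_in_bytes)"
]
},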
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly, you can run any shell command inside the sandbox via `run_command`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"stderr: \n",
"stderr: WARNING: apt does not have a stable CLI interface. Use with caution in scripts.\n",
"stderr: \n",
"stdout: Hit:1 http://security.ubuntu.com/ubuntu jammy-security InRelease\n",
"stdout: Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease\n",
"stdout: Hit:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease\n",
"stdout: Hit:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease\n",
"stdout: Reading package lists...\n",
"stdout: Building dependency tree...\n",
"stdout: Reading state information...\n",
"stdout: All packages are up to date.\n",
"stdout: Reading package lists...\n",
"stdout: Building dependency tree...\n",
"stdout: Reading state information...\n",
"stdout: Suggested packages:\n",
"stdout: sqlite3-doc\n",
"stdout: The following NEW packages will be installed:\n",
"stdout: sqlite3\n",
"stdout: 0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.\n",
"stdout: Need to get 768 kB of archives.\n",
"stdout: After this operation, 1873 kB of additional disk space will be used.\n",
"stdout: Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 sqlite3 amd64 3.37.2-2ubuntu0.1 [768 kB]\n",
"stderr: debconf: delaying package configuration, since apt-utils is not installed\n",
"stdout: Fetched 768 kB in 0s (2258 kB/s)\n",
"stdout: Selecting previously unselected package sqlite3.\n",
"(Reading database ... 23999 files and directories currently installed.)\n",
"stdout: Preparing to unpack .../sqlite3_3.37.2-2ubuntu0.1_amd64.deb ...\n",
"stdout: Unpacking sqlite3 (3.37.2-2ubuntu0.1) ...\n",
"stdout: Setting up sqlite3 (3.37.2-2ubuntu0.1) ...\n",
"stdout: 3.37.2 2022-01-06 13:25:41 872ba256cbf61d9290b571c0e6d82a20c224ca3ad82971edc46b29818d5dalt1\n",
"version: 3.37.2 2022-01-06 13:25:41 872ba256cbf61d9290b571c0e6d82a20c224ca3ad82971edc46b29818d5dalt1\n",
"error: \n",
"exit code: 0\n"
]
}
],
"source": [
"# Install SQLite\n",
"e2b_data_analysis_tool.run_command(\"sudo apt update\")\n",
"e2b_data_analysis_tool.install_system_packages(\"sqlite3\")\n",
"\n",
"# Check the SQLite version\n",
"output = e2b_data_analysis_tool.run_command(\"sqlite3 --version\")\n",
"print(\"version: \", output[\"stdout\"])\n",
"print(\"error: \", output[\"stderr\"])\n",
"print(\"exit code: \", output[\"exit_code\"])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"When your agent is finished, don't forget to close the sandbox"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"e2b_data_analysis_tool.close()"
]
},
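{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As a defensive pattern (an addition to the original example, using only plain Python), you can wrap the agent run in `try`/`finally` so the sandbox is closed even if the run raises an exception:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Ensure the sandbox is released even if the agent run fails\n",
"# (the question below is just an illustrative prompt)\n",
"try:\n",
"    agent.run(\"How many rows does the dataset have?\")\n",
"finally:\n",
"    e2b_data_analysis_tool.close()"
]
}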
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
9 changes: 9 additions & 0 deletions libs/langchain/langchain/tools/__init__.py
@@ -639,6 +639,12 @@ def _import_bearly_tool() -> Any:
return BearlyInterpreterTool


def _import_e2b_data_analysis() -> Any:
from langchain.tools.e2b_data_analysis.tool import E2BDataAnalysisTool

return E2BDataAnalysisTool


def __getattr__(name: str) -> Any:
if name == "AINAppOps":
return _import_ainetwork_app()
@@ -846,6 +852,8 @@ def __getattr__(name: str) -> Any:
return _import_zapier_tool_ZapierNLARunAction()
elif name == "BearlyInterpreterTool":
return _import_bearly_tool()
elif name == "E2BDataAnalysisTool":
return _import_e2b_data_analysis()
else:
raise AttributeError(f"Could not find: {name}")

@@ -958,4 +966,5 @@ def __getattr__(name: str) -> Any:
"tool",
"format_tool_to_openai_function",
"BearlyInterpreterTool",
"E2BDataAnalysisTool",
]