<a href="https://colab.research.google.com/github/nkkchem/SCA24_Hackathon/blob/main/Agent_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Large Language Model Agents in LangChain





This is a brief introduction to building Large Language Model (LLM) agents with `LangChain`.

LLM agents are essentially LLMs that are given tools specific to their task. If you want an LLM to be able to configure a database or access the internet, an agent is how you give it that ability.

This tutorial steps through one potential application of LLM agents: an SQL database manager. We will see how to load a local LLM through `vllm`, as well as how to use an API-based model such as gpt-4, gemini-pro, or claude.

This tutorial is adapted from a tutorial built by Pinecone at https://github.com/pinecone-io/examples

---

To start, let's install the packages we will be using for this example.

In [9]:
!pip install -qU torch==2.3.0 vllm==0.4.2 kaleido python-multipart typing-extensions langchain langchain_community langchain_core openai google-search-results wikipedia sqlalchemy langchain_openai


## SQL Agent - Setup

Suppose we have an SQL database that we would like an LLM to interact with. This kind of integration gives users a natural-language way to query the database, which can help non-experts draw insights from your group's data.

We will first need to create an SQL database, then set up an LLM to interface with the database.

⚠️ In this tutorial we will try to use local Google Colab GPU resources, but the free tier of Colab cannot fit larger models, which may decrease the reliability of the LLM agents. For this situation, we provide an additional loader for the OpenAI API models, but you will need an API key to use this resource. ⚠️

### The Database

In [2]:
from sqlalchemy import MetaData

metadata_obj = MetaData()

`MetaData` is a container object that keeps together many different features of a database (or multiple databases) being described.

In [3]:
from sqlalchemy import Column, Integer, String, Table, Date, Float

observations = Table(
    "observations",
    metadata_obj,
    Column("obs_id", Integer, primary_key=True),
    Column("sensor_id", String(10), nullable=False),  # Identifier for the sensor
    Column("measurement", Float, nullable=False),   # Scientific measurement (e.g., temperature, pressure, etc.)
    Column("date", Date, nullable=False),
    Column("location", String(50)),                   # Where the measurement was taken
)

`Table` represents a table in a database.

The two cells above build the skeleton of our database. From the column names we can see that we are constructing a scientific measurement database: each row records a sensor's measurement together with the date and location at which it was taken.

In [4]:
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")
metadata_obj.create_all(engine)

In [5]:
from datetime import datetime

data = [
    [1, 'TEMP001', 25.5, datetime(2023, 6, 1), 'Lab A'],
    [2, 'TEMP001', 24.8, datetime(2023, 6, 2), 'Lab A'],
    [3, 'PRESS002', 1013.2, datetime(2023, 6, 1), 'Field Site X'],
    [4, 'PRESS002', 1012.8, datetime(2023, 6, 2), 'Field Site X'],
    [5, 'HUMID003', 55.0, datetime(2023, 6, 1), 'Greenhouse 1'],
    [6, 'HUMID003', 60.2, datetime(2023, 6, 2), 'Greenhouse 1'],
]

In [6]:
from sqlalchemy import insert

def insert_observation(obs):
    """Inserts a single scientific observation into the database."""

    stmt = insert(observations).values(
        obs_id=obs[0],
        sensor_id=obs[1],
        measurement=obs[2],
        date=obs[3],
        location=obs[4]  # Added to insert location data
    )

    with engine.begin() as conn:
        conn.execute(stmt)

# Example usage (assuming you have your 'engine' and 'data' from before):
for obs in data:
    insert_observation(obs)

The three cells above create our locally hosted, in-memory SQL database and fill it with sample rows that match the skeleton we defined earlier.
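
Before handing the database to an LLM, it helps to know the answers we expect it to find. The sketch below is not part of the original notebook: it rebuilds the same six rows in a separate in-memory database using Python's built-in `sqlite3` module (the SQLAlchemy `engine` is left untouched) and runs by hand the kind of query we would hope a model writes. The SQL column types, the ISO date strings, and the helper name `count_for_sensor` are assumptions made for illustration.

```python
import sqlite3

# Stand-alone copy of the tutorial's sample rows (dates stored as ISO strings).
ROWS = [
    (1, "TEMP001", 25.5, "2023-06-01", "Lab A"),
    (2, "TEMP001", 24.8, "2023-06-02", "Lab A"),
    (3, "PRESS002", 1013.2, "2023-06-01", "Field Site X"),
    (4, "PRESS002", 1012.8, "2023-06-02", "Field Site X"),
    (5, "HUMID003", 55.0, "2023-06-01", "Greenhouse 1"),
    (6, "HUMID003", 60.2, "2023-06-02", "Greenhouse 1"),
]

# A second, independent in-memory database used only for this check.
check_conn = sqlite3.connect(":memory:")
check_conn.execute(
    """
    CREATE TABLE observations (
        obs_id INTEGER PRIMARY KEY,
        sensor_id VARCHAR(10) NOT NULL,
        measurement FLOAT NOT NULL,
        date DATE NOT NULL,
        location VARCHAR(50)
    )
    """
)
check_conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?, ?)", ROWS)
check_conn.commit()


def count_for_sensor(sensor_id):
    """Hypothetical helper: number of rows recorded by one sensor."""
    cur = check_conn.execute(
        "SELECT COUNT(*) FROM observations WHERE sensor_id = ?", (sensor_id,)
    )
    return cur.fetchone()[0]


total_rows = check_conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
print(total_rows)                    # 6
print(count_for_sensor("TEMP001"))   # 2
```

Knowing these reference values makes it easier to tell whether a surprising answer later on comes from the model or from the data.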

----

## The Large Language Models

Now that we have a toy database constructed and running in our notebook, it is time to create our LLM agent. We will first provide instructions for spinning up a `vllm`-based LLM which pulls in models from `HuggingFace`. Then we will show the equivalent steps for OpenAI.

In [None]:
from langchain_community.utilities import SQLDatabase
from langchain.chains import create_sql_query_chain
from langchain_community.llms import VLLM

local_llm = VLLM(
    model="microsoft/Phi-3-mini-4k-instruct",  # Any Hugging Face model name; we prefer smaller models since they have to fit in 15 GB of VRAM
    trust_remote_code=True,  # mandatory for hf models
    temperature=0,
    dtype="float16",  # This is only required when using the T4 GPU on Colab
)
llm = local_llm  # The cells below refer to the active model as `llm`

db = SQLDatabase(engine)  # Wrap the SQLAlchemy engine so LangChain can inspect and query the database
sql_chain = create_sql_query_chain(llm, db)  # Combine the loaded LLM with the SQL database to form an LLM chain (not an agent)

# Test the chain. It has no thought process; it will just try to convert your question into an SQL query
sql_chain.invoke({"question": "How many observations are there?"})
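
To make "convert your question into an SQL query" more concrete: a query chain of this kind essentially shows the model the table definitions together with the question and asks for a single SQL statement back. The sketch below illustrates that idea with plain string formatting. It is not LangChain's actual prompt; the template wording, the `SCHEMA` string, and the function name `build_sql_prompt` are assumptions made for illustration.

```python
# Hypothetical schema text; a real chain would read this from the database.
SCHEMA = """CREATE TABLE observations (
    obs_id INTEGER PRIMARY KEY,
    sensor_id VARCHAR(10) NOT NULL,
    measurement FLOAT NOT NULL,
    date DATE NOT NULL,
    location VARCHAR(50)
)"""


def build_sql_prompt(question, schema=SCHEMA, dialect="sqlite"):
    """Assemble a text prompt that asks a model for one SQL query."""
    return (
        f"You are a {dialect} expert. Given the tables below, write one "
        "syntactically correct query that answers the question.\n\n"
        f"Tables:\n{schema}\n\n"
        f"Question: {question}\n"
        "SQLQuery:"
    )


prompt = build_sql_prompt("How many observations are there?")
print(prompt)
```

Because the model only ever sees this text, a question about a table that is not in the schema gives it nothing to ground the query on.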

In [7]:
from langchain_community.utilities import SQLDatabase
from langchain.chains import create_sql_query_chain
from langchain_openai import ChatOpenAI

openai_llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)  # You need to have your OpenAI API key set. If not, add OPENAI_API_KEY to the secrets tab on the left side of the screen (click the key)
llm = openai_llm  # The cells below refer to the active model as `llm`
db = SQLDatabase(engine)  # Wrap the SQLAlchemy engine so LangChain can inspect and query the database
sql_chain = create_sql_query_chain(llm, db)  # Combine the loaded LLM with the SQL database to form an LLM chain (not yet an agent)
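
`ChatOpenAI` reads the key from the `OPENAI_API_KEY` environment variable. Outside of Colab's secrets tab, one way to make sure the variable is set is a small helper like the one below, a sketch using only the standard library. The function name `ensure_api_key` and its `prompt_fn` parameter are hypothetical; the helper prompts for the key only when the variable is missing, so the key never has to be pasted into the notebook itself.

```python
import os
from getpass import getpass


def ensure_api_key(var_name="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the key stored in `var_name`, prompting once if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        # getpass hides the typed value so it does not end up in cell output.
        key = prompt_fn(f"Enter {var_name}: ").strip()
        if not key:
            raise ValueError(f"{var_name} was not provided")
        os.environ[var_name] = key
    return key
```

Calling `ensure_api_key()` once before constructing `ChatOpenAI` is enough; later calls return the stored value without prompting.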



----

## The Agent

Now that we have our `local_llm` or `openai_llm` formed into an `sql_query_chain`, we can go one step further and build an agent from the same LLM and `db` objects. `create_sql_agent` returns an `AgentExecutor`, which runs the loop in which the model reasons about the question, calls database tools, and reads their results.

There are a few different pre-made Agent types that LangChain provides:

1. Zero Shot ReAct
2. Conversational ReAct
3. ReAct Docstore

or variations of these, which can be found in the documentation [here](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent_types.AgentType.html)


---

We are going to use the Zero Shot ReAct agent type. This type of agent performs 'zero shot' tasks on the input: each question is handled in a single, self-contained interaction rather than as one of several interdependent interactions. In other words, this agent has no memory.
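
The loop that such an agent runs can be sketched in a few lines. The toy below is not LangChain code: `scripted_llm` is a stand-in that replays canned replies instead of calling a model, and `run_agent`, `TOOLS`, and the `Action:` / `Final Answer:` text format are simplified assumptions. It shows the two ideas that matter here: the model alternates between choosing a tool and reading that tool's output, and `max_iterations` caps how many such steps are allowed. Nothing is carried over between calls to `run_agent`, which is what "no memory" means.

```python
import re

COUNT_SQL = "SELECT COUNT(*) FROM observations WHERE sensor_id = 'TEMP001'"


def list_tables(_arg):
    """Toy tool: pretend to list the tables in the database."""
    return "observations"


def run_query(sql):
    """Toy tool: return a canned result instead of querying a database."""
    canned = {COUNT_SQL: "[(2,)]"}
    return canned.get(sql.strip(), "Error: unknown query")


TOOLS = {"list_tables": list_tables, "run_query": run_query}
ACTION_RE = re.compile(r"Action:\s*(\w+)\s*\nAction Input:\s*(.*)", re.DOTALL)


def scripted_llm(replies):
    """Stand-in for a model: returns the next canned reply on each call."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


def run_agent(llm, question, max_iterations=3):
    """Alternate between model replies and tool calls until an answer appears."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_iterations):
        reply = llm(scratchpad)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            observation = "Could not parse an action"
        else:
            tool = TOOLS.get(match.group(1))
            tool_input = match.group(2).strip()
            observation = tool(tool_input) if tool else "Unknown tool"
        # The tool output is appended so the next model call can read it.
        scratchpad += f"{reply}\nObservation: {observation}\n"
    return "Agent stopped: iteration limit reached"


REPLIES = [
    "Thought: I should see which tables exist.\nAction: list_tables\nAction Input: ",
    f"Thought: I can count the rows.\nAction: run_query\nAction Input: {COUNT_SQL}",
    "Thought: I know the answer.\nFinal Answer: 2 entries contain TEMP001.",
]

answer = run_agent(scripted_llm(REPLIES), "How many entries contain TEMP001?")
print(answer)
```

With this script the answer arrives on the third step, so lowering `max_iterations` to 2 makes the toy agent give up before it can answer.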

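In [3]:
from langchain_community.agent_toolkits import SQLDatabaseToolkit, create_sql_agent
from langchain.agents.agent_types import AgentType

# `llm` is the model loaded above (`local_llm` or `openai_llm`)
agent_executor = create_sql_agent(
    llm=llm,
    toolkit=SQLDatabaseToolkit(db=db, llm=llm),  # Give the agent tools for inspecting and querying the database
    verbose=True,  # Print each thought, action, and observation
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    max_iterations=3,  # Stop after at most three reasoning steps
    agent_executor_kwargs={"handle_parsing_errors": True},  # Forwarded to the AgentExecutor
)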


In [4]:
agent_executor.invoke({"input": "How many entries contain TEMP001?"})
