# Third-party library example: cudf.pandas 🚀🐼 + langchain 🦜️🔗!

In this notebook, we'll demonstrate how `cudf.pandas` works with third-party libraries that depend on pandas.

We'll load some data into a DataFrame and use [langchain's Pandas integration](https://python.langchain.com/docs/integrations/toolkits/pandas) to answer questions about that data. Even though langchain doesn't know anything about cuDF, the pandas operations it runs will automatically execute on the GPU wherever cuDF supports them (falling back to regular pandas otherwise), which can answer those questions much faster than regular pandas alone!

⚠️ Note that at the time of writing, `langchain` is undergoing considerable changes (for example, see [this discussion](https://github.com/langchain-ai/langchain/discussions/14243)). You may have to change some of the code in this notebook to make it work.

💰❗ Here we're using OpenAI's `gpt-4` model with langchain ([setup instructions](https://python.langchain.com/docs/integrations/llms/openai)). Note that at the time of writing, you need to buy credits from OpenAI to use this model via the API.

----

In [None]:
!nvidia-smi

In [None]:
%load_ext cudf.pandas

In [None]:
import pandas as pd

from langchain.agents.agent_types import AgentType
from langchain.cache import SQLiteCache
from langchain.chat_models import ChatOpenAI
from langchain.globals import set_llm_cache
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent

# Cache LLM responses on disk so that re-running a cell does not trigger a new API call.
set_llm_cache(SQLiteCache(database_path=".langchain.db"))

In [None]:
from utils import download_taxi_data

download_taxi_data(n=12)

In [None]:
%%time
# Load the 12 monthly files for 2021, one DataFrame per month.
dfs = [
    pd.read_parquet(f"yellow_tripdata_2021-{month:02d}.parquet")
    for month in range(1, 13)
]

In [None]:
%%time
df = pd.concat(dfs)

In [None]:
df.head()

We're not going to use all the data, so let's keep only the columns we need:

In [None]:
df = df[["VendorID",
         "tpep_pickup_datetime",
         "tpep_dropoff_datetime",
         "passenger_count",
         "tip_amount"]]

OK, now let's set up an agent that will let us ask questions about the data in our DataFrame:

In [None]:
agent = create_pandas_dataframe_agent(
    ChatOpenAI(temperature=0, model="gpt-4"),
    df,
    agent_type=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    handle_parsing_errors=True,
)

Now, let's ask some questions!

In [None]:
%%time
result = agent.run("How many rows are there?")
print(result)

In [None]:
%%time
result = agent.run("Which vendor received the most tips?")
print(result)

Now, let's ask a more complicated question. Note that this part is finicky: sometimes the agent returns a response that contains only the code that _should_ be run, without actually running it. You may have to try a few times before you get a useful response. Because we're caching responses in the file `.langchain.db`, you have to delete that file each time you want a new response.
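
If you'd rather not delete the cache file by hand, a small helper along these lines could do it from within the notebook. This is only a sketch: `clear_llm_cache` is a hypothetical name (it is not part of langchain), and it assumes the cache lives at the `.langchain.db` path used earlier in this notebook.

```python
from pathlib import Path


def clear_llm_cache(db_path: str = ".langchain.db") -> bool:
    """Delete the on-disk LLM cache file, if present.

    Hypothetical helper, not a langchain API. Returns True if a file
    was removed and False if there was nothing to delete.
    """
    path = Path(db_path)
    if path.exists():
        path.unlink()
        return True
    return False
```

Call `clear_llm_cache()` before re-asking a question. Depending on how the cache object holds the file open, you may also need to re-run the cell that calls `set_llm_cache(...)` afterwards so that a fresh cache file is created.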

In [None]:
%%cudf.pandas.profile
result = agent.run(
    """
    Which 30-minute, 1-hour, 2-hour, 5-hour and 24-hour windows have the most trips?
    Don't use any inplace operations please!
    """
)
print(result)
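
For reference, here is a rough sketch of the kind of pandas code that could answer this question by hand. It is not the code the agent generates, and it makes two assumptions: a "window" is a fixed, non-overlapping bucket (not a sliding window), and trips are bucketed by pickup time. `busiest_windows` and `WINDOW_MINUTES` are names made up for this illustration, and the toy timestamps stand in for the `tpep_pickup_datetime` column.

```python
import pandas as pd

# Window widths from the question, expressed in minutes:
# 30 minutes, 1 hour, 2 hours, 5 hours and 24 hours.
WINDOW_MINUTES = [30, 60, 120, 300, 1440]


def busiest_windows(pickups: pd.Series) -> dict:
    """Map each window width (minutes) to (window start, trip count).

    Assumes fixed, non-overlapping windows: every pickup time is
    rounded down to the start of its window, then windows are counted.
    """
    result = {}
    for minutes in WINDOW_MINUTES:
        counts = pickups.dt.floor(f"{minutes}min").value_counts()
        result[minutes] = (counts.idxmax(), int(counts.max()))
    return result


# Toy data standing in for df["tpep_pickup_datetime"].
toy_pickups = pd.Series(
    pd.to_datetime(
        [
            "2021-01-01 08:05",
            "2021-01-01 08:10",
            "2021-01-01 08:20",
            "2021-01-01 09:40",
            "2021-01-02 13:00",
        ]
    )
)
windows = busiest_windows(toy_pickups)
```

On the real data you would call `busiest_windows(df["tpep_pickup_datetime"])`. No inplace operations are used, matching the request in the prompt.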