In [1]:
%load_ext autoreload
%autoreload 2

# LLM for Time Series Analysis

# Introduction

This notebook demonstrates how to create an LLM agent that can analyze pandas DataFrames and search for relevant information on the web. For this example, we will use a [DataFrame containing weather data](https://www.kaggle.com/datasets/sumanthvrao/daily-climate-time-series-data). It contains a simple time series of daily weather data, including temperature, humidity, and wind speed for a specific location. 

The agent will be able to answer questions about a pandas DataFrame and perform operations on it, as well as search Reuters news articles for relevant information.

## Tech stack

- [Anthropic](https://www.anthropic.com/) - LLM provider
- [LangChain](https://python.langchain.com/) - LLM framework
- [LangChain Anthropic](https://pypi.org/project/langchain-anthropic/) - Anthropic integration for LangChain
- [Pandas](https://pandas.pydata.org/) - DataFrame manipulation library
- [LangChain Community](https://python.langchain.com/api_reference/community/index.html) - Community support for LangChain
- [LangChain Experimental](https://python.langchain.com/api_reference/experimental/index.html) - Experimental features for LangChain
- [DuckDuckGo Search](https://python.langchain.com/docs/integrations/tools/duckduckgo_search) - Web search tool for LangChain


```mermaid
graph TD
    %% Main flow
    LLM --> Agent
    Python --> Agent
    WebSearch --> Agent
    Agent --> User

    %% Tech stack for LLM
    Anthropic[Anthropic] --> LLM
    LangChainAnthropic[LangChain Anthropic] --> LLM

    %% Tech stack for Agent
    LangChain[LangChain] --> Agent

    %% Tech stack for Python tool
    Pandas[Pandas] --> Python
    LCExperimental[LangChain Experimental] --> Python

    %% Tech stack for WebSearch tool
    LCCommunity[LangChain Community] --> WebSearch
    DuckDuckGo[DuckDuckGo Search] --> WebSearch

    %% Styles
    style LLM fill:#cce5ff,stroke:#004085,stroke-width:2px
    style Agent fill:#fff3cd,stroke:#856404,stroke-width:2px
    style Python fill:#d4edda,stroke:#155724,stroke-width:2px
    style WebSearch fill:#d4edda,stroke:#155724,stroke-width:2px
    style Anthropic fill:#f0f8ff,stroke:#004085
    style LangChainAnthropic fill:#f0f8ff,stroke:#004085
    style LangChain fill:#f9f2ec,stroke:#856404
    style Pandas fill:#e6ffed,stroke:#155724
    style LCCommunity fill:#e6ffed,stroke:#155724
    style LCExperimental fill:#e6ffed,stroke:#155724
    style DuckDuckGo fill:#ffe6e6,stroke:#660000

```

In [2]:
# Disable warnings to keep the notebook output clean.
# Remove the filterwarnings call below if you want to see warnings; they can clutter the output of some cells.
import warnings

warnings.filterwarnings("ignore")

# Dependencies installation

You can install the required dependencies by running the following cell.

In [3]:
!pip install -q pandas langchain langchain-community langchain-experimental langchain-anthropic duckduckgo-search


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


# Imports

Below are the imports needed for this notebook.

>Note: You will be asked to provide an API key for Anthropic. To get the API key, follow [this link](https://kpaste.infomaniak.com/abnZfEJqcr58JDcbspdZ6Y1xCSzHQcZ4#6RtAB95NgXimATi8SmW2nvNLMWfe19eQ264dMKHPyxej) and use the password written on the whiteboard. Don't hesitate to ask for assistance.

In [4]:
import getpass
import os
from typing import Annotated

import pandas as pd
from langchain.agents import (
    initialize_agent,  # used to create an agent that can use a list of tools
)
from langchain_anthropic import (
    ChatAnthropic,  # used to create a connection to the Anthropic server
)
from langchain_community.tools import (
    DuckDuckGoSearchRun,  # used to search for relevant information on the web with DuckDuckGo (no api key required)
)
from langchain_core.tools import tool  # used to create new tools for the agent
from langchain_experimental.tools.python.tool import (
    PythonAstREPLTool,  # used to create a Python tool that an LLM can use to perform operations
)

if "ANTHROPIC_API_KEY" not in os.environ:
    os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Enter your Anthropic API key: ")

# Data

Here we simply display summary statistics for the DataFrame. We will not perform any further operations on it ourselves, but rather let the LLM agent do it for us.

In [5]:
df = pd.read_csv(
    "data/CSVs/DailyDelhiClimateTrain.csv",  # load the dataset from a CSV file. source: https://www.kaggle.com/datasets/sumanthvrao/daily-climate-time-series-data
    parse_dates=["date"],
)

df.describe()

| | date | meantemp | humidity | wind_speed | meanpressure |
| --- | --- | --- | --- | --- | --- |
| count | 1462 | 1462.0 | 1462.0 | 1462.0 | 1462.0 |
| mean | 2015-01-01 12:00:00 | 25.495521 | 60.771702 | 6.802209 | 1011.104548 |
| min | 2013-01-01 00:00:00 | 6.0 | 13.428571 | 0.0 | -3.041667 |
| 25% | 2014-01-01 06:00:00 | 18.857143 | 50.375 | 3.475 | 1001.580357 |
| 50% | 2015-01-01 12:00:00 | 27.714286 | 62.625 | 6.221667 | 1008.563492 |
| 75% | 2016-01-01 18:00:00 | 31.305804 | 72.21875 | 9.238235 | 1014.944901 |
| max | 2017-01-01 00:00:00 | 38.714286 | 100.0 | 42.22 | 7679.333333 |
| std | | 7.348103 | 16.769652 | 4.561602 | 180.231668 |


# Prepare the agent

Below is the code to create the LLM (an Anthropic Claude model accessed through `ChatAnthropic`), the tools we will use, and the agent that will use these tools to answer questions about the DataFrame and search the web for relevant information. The variable names keep an `_ollama` suffix, but the model is served by Anthropic.

We use LangChain to simplify [creating the agent and integrating it with the tools](https://python.langchain.com/docs/how_to/tool_calling/). The LLM, the tools, and the agent itself are all defined with LangChain packages.

In [10]:
## LLM ##

llm_ollama = ChatAnthropic(
    model_name="claude-3-7-sonnet-20250219",
    timeout=None,
    stop=None,
)  # create a connection to the Anthropic API with the specified model. the model must be compatible with tool calling

## Tools ##

python_tool = PythonAstREPLTool(
    name="Python REPL - DataFrame",
    locals={"df": df},
    description="""Useful for performing operations on the DataFrame df. usage: `df.head()`, `df.describe()`, ... any valid pandas operation.""",
    return_direct=False,
)


@tool
def reuters_search(
    query: Annotated[str, "The search query to find information on Reuters."],
) -> str:
    """Search Reuters for relevant information.

    Args:
        query (str): The search query to find relevant information on Reuters.

    Returns:
        str: The search results from Reuters.
    """
    transformed_query = f"site:reuters.com {query}"
    return DuckDuckGoSearchRun().invoke(transformed_query)


## Agent ##

custom_agent_df_ollama = initialize_agent(
    tools=[python_tool, reuters_search],  # list of tools the agent can use
    llm=llm_ollama,  # the LLM that the agent will use to reason and make decisions. This LLM must be compatible with tool calling
    verbose=True,  # set to True to see the agent's reasoning and actions
    allow_dangerous_code=True,  # set to True to allow the agent to execute code, this is required for the Python tool to work
    max_iterations=15,
    max_execution_time=60 * 2,
)
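The `reuters_search` tool above limits results to Reuters by prefixing the query with the `site:reuters.com` search operator before handing it to DuckDuckGo. That query rewriting can be sketched on its own, without any network call. The helper name `build_site_query` and the whitespace stripping are illustrative assumptions, not part of the notebook:

```python
def build_site_query(query: str, site: str = "reuters.com") -> str:
    """Prefix a search query with a `site:` operator (hypothetical helper)."""
    # Mirrors the f-string used in reuters_search, plus whitespace stripping
    # (the stripping is an addition made for this sketch).
    return f"site:{site} {query.strip()}"


example = build_site_query("heatwave India")
print(example)
```

Keeping the rewriting in a small pure function makes it easy to test separately from the search call itself.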

# Use the agent

The agent is now ready to be used. You can ask it questions about the DataFrame or request it to perform operations on the DataFrame. The agent will use the tools you provided to answer your questions and perform the requested operations.

Keep in mind that the agent may go through several reasoning steps and tool calls before it reaches a final answer. The agent above is created with `verbose=True`, so you can follow its reasoning and actions and check that it is doing what you expect.

>Notice: to use an agent, you simply call the `invoke` method on the agent object, passing the query as a string. The response is a dictionary containing the `input` and `output` keys.
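As a small illustration of handling that dictionary, the sketch below reads the `output` key and checks for the message the executor returns when it gives up. The helper `agent_finished` is hypothetical, and the stop message is copied from the runs recorded in this notebook, so treat it as an assumption that may differ across LangChain versions.

```python
# Message found in "output" when the agent stops early
# (copied from this notebook's output; an assumption, not a documented constant).
STOPPED_MESSAGE = "Agent stopped due to iteration limit or time limit."


def agent_finished(response: dict) -> bool:
    """Return True if the response holds a final answer rather than the stop message."""
    output = response.get("output")
    return bool(output) and output != STOPPED_MESSAGE


# A hand-written response with the same shape as the one returned by `invoke`.
response = {
    "input": "what is the highest temperature?",
    "output": STOPPED_MESSAGE,
}

if agent_finished(response):
    print(response["output"])
else:
    print("No final answer: consider simplifying the question or raising max_iterations.")
```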

In [11]:
custom_agent_df_ollama.invoke(
    "what is the average temperature per month? What is the highest temperature?",
)



> Entering new AgentExecutor chain...
I need to understand if there's a DataFrame already available or if I need to search for temperature data. Let me check the DataFrame first.

Action: Python REPL - DataFrame
Action Input: df.head()

Observation:         date   meantemp   humidity  wind_speed  meanpressure  month
0 2013-01-01  10.000000  84.500000    0.000000   1015.666667      1
1 2013-01-02   7.400000  92.000000    2.980000   1017.800000      1
2 2013-01-03   7.166667  87.000000    4.633333   1018.666667      1
3 2013-01-04   8.666667  71.333333    1.233333   1017.166667      1
4 2013-01-05   6.000000  86.833333    3.700000   1016.500000      1
Thought: I'll analyze the temperature data in the provided DataFrame to find the average temperature per month and the highest temperature.

Action: Python REPL - DataFrame
Action Input: df.head()

Observation:         date   meantemp   humidity  wind_speed  meanpressu

{'input': 'what is the average temperature per month? What is the highest temperature?',
 'output': 'Agent stopped due to iteration limit or time limit.'}
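
One way to check an agent's answer to a question like this is to run the computation directly with pandas. The sketch below uses a tiny made-up frame with the same `date` and `meantemp` column names rather than the real dataset, and it reads "per month" as calendar month across all years, which is an assumption about the question's intent.

```python
import pandas as pd

# Made-up sample with the same column names as the weather DataFrame.
sample = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2013-01-01", "2013-01-02", "2013-02-01", "2013-02-02"]
        ),
        "meantemp": [10.0, 8.0, 15.0, 17.0],
    }
)

# Average temperature per calendar month (1 = January, 2 = February, ...).
monthly_mean = sample.groupby(sample["date"].dt.month)["meantemp"].mean()

# Highest daily mean temperature and the date on which it occurred.
highest = sample["meantemp"].max()
hottest_day = sample.loc[sample["meantemp"].idxmax(), "date"]

print(monthly_mean)
print(highest, hottest_day)
```

Replacing `sample` with `df` runs the same check on the real data.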

You can access the Python REPL context of your agent. This can be useful if you need to perform some operations on the DataFrame after the agent has answered your question or performed the requested operation. 

As the example below shows, you can do this through the `locals` attribute of the Python tool, a dictionary that holds the local variables of the Python REPL context. You can retrieve a DataFrame by its name, in this case `df_mock`, and perform operations on it.
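
To see why new variables show up in `locals`, here is a minimal sketch of a REPL-style namespace using the built-in `exec`. It only approximates how a Python REPL tool keeps state between calls. The plain list standing in for `df` and the executed snippet are made up for illustration.

```python
# Namespace playing the role of the tool's `locals` dictionary.
namespace = {"df": [10.0, 7.4, 7.2]}

# Code the agent might send to the REPL; it reads `df` and defines `df_mock`.
exec("df_mock = df[:2]", {}, namespace)

# The new variable is now stored in the namespace, next to `df`.
df_mock = namespace.get("df_mock", [])
print(df_mock)

# A name that was never created falls back to the default value.
missing = namespace.get("df_other", [])
print(missing)
```

The default passed to `get` plays the same role as `pd.DataFrame()` in the cell below: if the agent never creates the variable, you get an empty object instead of a `KeyError`.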


In [8]:
custom_agent_df_ollama.invoke(
    "create a mock dataframe named `df_mock` with 100 rows and the same columns as df.",
)
df_mock: pd.DataFrame = python_tool.locals.get("df_mock", pd.DataFrame())
df_mock.head()



> Entering new AgentExecutor chain...
I need to create a mock dataframe with 100 rows and the same columns as df. First, I should explore the existing df to understand its structure.

Action: Python REPL - DataFrame
Action Input: df.head()

Observation:         date   meantemp   humidity  wind_speed  meanpressure  month
0 2013-01-01  10.000000  84.500000    0.000000   1015.666667      1
1 2013-01-02   7.400000  92.000000    2.980000   1017.800000      1
2 2013-01-03   7.166667  87.000000    4.633333   1018.666667      1
3 2013-01-04   8.666667  71.333333    1.233333   1017.166667      1
4 2013-01-05   6.000000  86.833333    3.700000   1016.500000      1
Thought: I need to create a mock dataframe with 100 rows and the same columns as df. First, I should explore the existing df to understand its structure.

Action: Python REPL - DataFrame
Action Input: df.head()

Observation:         date   meantemp   humidity  wind

| | date | meantemp | humidity | wind_speed | meanpressure | month |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 2023-01-01 | 2.941866 | 57.579882 | 0.390621 | 1017.1423 | 1 |
| 1 | 2023-01-02 | 29.838953 | 91.492465 | 2.027464 | 1020.771764 | 1 |
| 2 | 2023-01-03 | 11.692355 | 82.353489 | 4.904171 | 1023.11958 | 1 |
| 3 | 2023-01-04 | 29.877555 | 56.502514 | 1.44726 | 1023.485232 | 1 |
| 4 | 2023-01-05 | 12.383886 | 84.963573 | 7.812249 | 1023.217653 | 1 |


Finally, you can combine multiple tools in a single request. For example, the agent can use the Python tool to perform operations on the DataFrame and the `reuters_search` tool to look for related news on the web, choosing among the tools you provided as it works toward an answer.

In [9]:
custom_agent_df_ollama.invoke(
    "what is the date with the highest temperature in the dataframe ? can you find any news mentioning heatwave in India around that date?",
)



> Entering new AgentExecutor chain...
I need to find the date with the highest temperature in the dataframe and then search for news about heatwaves in India around that date.

Action: Python REPL - DataFrame
Action Input: df.head()
Observation:         date   meantemp   humidity  wind_speed  meanpressure  month
0 2013-01-01  10.000000  84.500000    0.000000   1015.666667      1
1 2013-01-02   7.400000  92.000000    2.980000   1017.800000      1
2 2013-01-03   7.166667  87.000000    4.633333   1018.666667      1
3 2013-01-04   8.666667  71.333333    1.233333   1017.166667      1
4 2013-01-05   6.000000  86.833333    3.700000   1016.500000      1
Thought: I'll help you find the date with the highest temperature in the dataframe and then search for related news about heatwaves in India.

Thought: First, I need to find the date with the highest temperature in the dataframe.

Action: Python REPL - DataFrame
Action Input: df.head()

{'input': 'what is the date with the highest temperature in the dataframe ? can you find any news mentioning heatwave in India around that date?',
 'output': 'Agent stopped due to iteration limit or time limit.'}