# Intro to AI Agents

> *This notebook should work well with the **`conda_python3`** kernel in SageMaker Studio on an ml.t3.medium instance*

---

In this notebook, we will introduce the concept of AI agents and how they can be used to solve problems. Unlike static LLM workflows, AI agents are dynamic and can adapt to a much wider range of problems. They are particularly well suited when we don't know the exact parameters of each user interaction. Some cases which may call for AI agents include:
- Goal-oriented tasks such as researching a topic, analyzing data across many sources, or helping find the best product for a user.
- Tasks with highly variable user inputs, such as document processing where document formats and the data to be extracted differ from one request to the next.
- Tasks that may involve various external tools or APIs, without a fixed sequential order of operations.

There are numerous frameworks available for implementing agents, including [Amazon Bedrock Agents](https://aws.amazon.com/bedrock/agents/), [CrewAI](https://www.crewai.com/), and [LangGraph](https://www.langchain.com/langgraph). The frameworks vary in their capabilities and complexity, but all provide a way to define and deploy agents that can interact with users and other systems. For this notebook, we will use [smolagents](https://github.com/huggingface/smolagents), a lightweight framework from Hugging Face. It offers a simple way to define agents, which makes it well suited to quick prototyping and experimentation.


---

In [2]:
import sys
import os
module_path = "../.."
sys.path.append(os.path.abspath(module_path))
from utils.environment_validation import validate_environment, validate_model_access
validate_environment()

Validating base environment
Base environment validated successfully


> 🚨 **Caution** You may get an exception running the cell below. If that's the case, please restart the kernel by clicking **Kernel** -> **Restart Kernel**. Alternatively, click the refresh icon on the notebook toolbar above.

In [3]:
required_models = [
    "amazon.titan-embed-text-v1",
    "us.anthropic.claude-3-5-haiku-20241022-v1:0",
    "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    "us.amazon.nova-pro-v1:0",
]
validate_model_access(required_models)

## Simple Search Agent

Let's build a simple search agent that can search for information on the web. The agent will take a query from the user, search the web, and use the results to answer it. We will use DuckDuckGo as the search engine for this agent.

smolagents offers two types of agents:
- [CodeAgent](https://huggingface.co/papers/2402.01030): Invokes tools via generated Python code snippets. This gives the agent flexibility in how it interacts with external tools, because it can extend what the tools do by wrapping them in custom code.
- [ToolCallingAgent](https://huggingface.co/learn/agents-course/en/unit2/smolagents/tool_calling_agents): Invokes tools by generating a JSON output that contains the tool name and the invocation parameters. This approach is more rigid, as the model is limited to the capabilities of the tools it calls. However, it may prove more efficient and secure in some cases because it does not involve running arbitrary code.

As an example, suppose we have a tool that can retrieve stock data for a given stock symbol. If the agent receives a task that involves retrieving stock data for, say, "AAPL", "MSFT", and "GOOGL", the CodeAgent will write a code snippet with a simple for loop that retrieves the stock data for each symbol, whereas the ToolCallingAgent will generate a separate JSON tool call for each stock symbol. Additionally, if the task requires invoking additional tools to analyze the stock data, the CodeAgent can potentially tackle this in a single code snippet, while the ToolCallingAgent will need to make multiple tool calls, which could be less efficient.
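
To make the contrast concrete, the sketch below imitates both styles with a stub tool. `get_stock_data` is a hypothetical stand-in that returns placeholder values, and the JSON shape is illustrative only, not the exact format smolagents emits.

```python
import json


def get_stock_data(symbol: str) -> dict:
    """Hypothetical stub tool: returns a placeholder record, not real market data."""
    return {"symbol": symbol, "close": float(len(symbol))}


symbols = ["AAPL", "MSFT", "GOOGL"]

# CodeAgent style: the model writes one Python snippet that loops over the
# symbols, so all three lookups happen inside a single generated snippet.
code_agent_results = {}
for symbol in symbols:
    code_agent_results[symbol] = get_stock_data(symbol)

# ToolCallingAgent style: the model emits one structured call per lookup and
# the framework dispatches each call separately (illustrative JSON shape).
tool_calls = [
    json.dumps({"name": "get_stock_data", "arguments": {"symbol": symbol}})
    for symbol in symbols
]
TOOLS = {"get_stock_data": get_stock_data}
tool_agent_results = {}
for raw_call in tool_calls:
    call = json.loads(raw_call)
    result = TOOLS[call["name"]](**call["arguments"])
    tool_agent_results[result["symbol"]] = result

print(len(tool_calls), "tool calls vs. 1 code snippet")
```

Both paths end with the same data; the difference is how many separate model outputs are needed to get there.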

In [4]:
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel, tool
from typing import List
import pandas as pd
import yfinance as yf
import pandas_datareader as pdr
import statsmodels.api as sm
import numpy as np


In [5]:
MODEL_ID = "bedrock/us.anthropic.claude-3-5-haiku-20241022-v1:0"
model = LiteLLMModel(model_id=MODEL_ID, temperature=0)

In [6]:
# Define a simple agent that has access to web search
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
agent.run("How many years would it take an average person to watch all of the content on Amazon prime video?")

'43.15 years'

## Stock Analysis Agent
Let's look at a more complex example. We will build an agent that can analyze stock data. In addition to web search, this agent will have access to the following tools:
- `get_ticker_data`: This tool will retrieve stock data for a given stock symbol.
- `get_fred_data`: This tool will retrieve economic data from the Federal Reserve Economic Data (FRED) API.
- `run_ols_regression`: This tool will run an ordinary least squares regression, which can be used to analyze the relationship between two variables.

Defining tools is easy as we merely need to decorate a function with the `@tool` decorator. The function should contain a docstring in [Google-style](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) format that describes the tool's inputs and outputs. Providing clear documentation is important as it informs the agent about the tool's capabilities and how to use it.
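
To see why the type hints and docstring matter, the sketch below uses only the standard library to collect the information a framework could present to the model. It is not how smolagents works internally; `get_exchange_rate` and `describe_tool` are hypothetical names introduced for illustration.

```python
import inspect
from typing import get_type_hints


def get_exchange_rate(base: str, quote: str) -> float:
    """Returns the exchange rate between two currencies.

    Args:
        base: The currency to convert from (e.g., "USD").
        quote: The currency to convert to (e.g., "EUR").

    Returns:
        float: Units of `quote` per one unit of `base`.
    """
    return 1.0  # placeholder value; a real tool would query a data source


def describe_tool(func) -> dict:
    """Collects the name, summary line, and typed inputs of a function."""
    hints = get_type_hints(func)
    doc = inspect.getdoc(func) or ""
    params = inspect.signature(func).parameters
    return {
        "name": func.__name__,
        "summary": doc.splitlines()[0] if doc else "",
        "inputs": {name: hints[name].__name__ for name in params},
        "output": hints["return"].__name__,
    }


spec = describe_tool(get_exchange_rate)
print(spec)
```

If the docstring or annotations were missing, the summary and input types would be empty or unavailable, leaving the model to guess how the tool should be called.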

In [7]:
@tool
def get_ticker_data(
    tickers: List[str],
    start_date: str,
    end_date: str,
    metric: str = "all",
    sampling: str = "monthly",
) -> dict:
    """Downloads historical stock data from Yahoo Finance and returns it as a dictionary.

    Examples:
        >>> get_ticker_data(["AAPL"], "2023-01-01", "2023-12-31", "Close", "weekly")
        {"AAPL": [{"Date": "2023-01-06", "Close": 129.619995}, {"Date": "2023-01-13", "Close": 134.759995}, ...]}

        >>> get_ticker_data(["AAPL", "MSFT"], "2023-01-01", "2023-12-31", "all", "monthly")
        {"AAPL": [{"Date": "2023-01-31", "Open": 144.479996, "High": 147.229996, "Low": 141.320007, "Close": 144.289993, "Adj Close": 143.839996, "Volume": 77663600}, ...],
          "MSFT": [{"Date": "2023-01-31", "Open": 250.089996, "High": 256.25, "Low": 242.529999, "Close": 252.509995, "Adj Close": 251.873795, "Volume": 47146900}, ...]}

    Args:
        tickers: A list of stock ticker symbols (e.g., ["AAPL", "MSFT"]).
        start_date: The start date for the data in YYYY-MM-DD format (e.g., "2023-01-01").
        end_date: The end date for the data in YYYY-MM-DD format (e.g., "2023-12-31").
        metric: If "all", returns all available data columns (Open, High, Low, Close, Volume).
            Otherwise, specifies a single metric to return (e.g., "Close"). Defaults to "all".
        sampling: The frequency of the data. Can be "daily", "weekly", "monthly", or "quarterly". Defaults to "monthly".

    Returns:
        dict: A dictionary where keys are ticker symbols and values are lists of historical data records.
             Each record is a dictionary containing 'Date' and the requested metrics.

    Raises:
        ValueError: If an invalid sampling frequency is provided.


    """

    df = yf.download(tickers, start=start_date, end=end_date)

    if metric != "all":
        df = df[metric]

    if sampling == "weekly":
        df = df.resample("W-SAT").last()
    elif sampling == "monthly":
        df = df.resample("ME").last()
    elif sampling == "quarterly":
        df = df.resample("QE").last()
    elif sampling == "daily":
        pass
    else:
        raise ValueError(
            "Invalid sampling frequency. Use 'daily', 'weekly', 'monthly', or 'quarterly'."
        )

    result = {}
    for ticker in tickers:
        if metric == "all":
            df_tick = df.loc[:, (slice(None), ticker)]
            df_tick.columns = df_tick.columns.droplevel("Ticker")
        else:
            df_tick = df.loc[:, ticker]
            df_tick = df_tick.to_frame(name=metric)
        df_tick = df_tick.reset_index()
        df_tick["Date"] = df_tick["Date"].dt.strftime("%Y-%m-%d")
        result[ticker] = df_tick.to_dict(orient="records")

    return result


@tool
def get_fred_data(
    series: str, start_date: str, end_date: str, sampling: str = "monthly"
) -> list[dict]:
    """Downloads data from the Federal Reserve Economic Data (FRED) database and returns it as a list of dictionaries.

    Examples:
        >>> get_fred_data("GDP", "2023-01-01", "2023-01-10")
        [{"Date": "2023-01-01", "GDP": 21.0}, {"Date": "2023-01-02", "GDP": 22.0}, ...]

    Args:
        series: The FRED series ID (e.g., "GDP").
        start_date: The start date for the data in YYYY-MM-DD format (e.g., "2023-01-01").
        end_date: The end date for the data in YYYY-MM-DD format (e.g., "2023-12-31").
        sampling: The frequency of the data. Can be "monthly", "quarterly", or "yearly". Defaults to "monthly".

    Returns:
        list: A list of dictionaries, where each dictionary contains 'Date' and the value of the FRED series.

    Raises:
        ValueError: If an invalid sampling frequency is provided.


    """
    df = pdr.data.DataReader(series, start=start_date, end=end_date, data_source="fred")

    if sampling == "monthly":
        df = df.resample("ME").last()
    elif sampling == "quarterly":
        df = df.resample("QE").last()
    elif sampling == "yearly":
        df = df.resample("YE").last()
    else:
        raise ValueError(
            "Invalid sampling frequency. Use 'monthly', 'quarterly', or 'yearly'."
        )

    df.reset_index(inplace=True)
    df["DATE"] = df["DATE"].dt.strftime("%Y-%m-%d")
    df.rename(columns={"DATE": "Date"}, inplace=True)

    result = df.to_dict(orient="records")

    return result


@tool
def run_ols_regression(y: List[float], X: List[float]) -> dict:
    """Runs a simple Ordinary Least Squares (OLS) regression.
    If you are using this to compute beta, make sure the dates of the two series are aligned.

    Examples:
        >>> y = [1, 2, 3, 4, 5]
        >>> X = [2, 4, 5, 4, 5]
        >>> run_ols_regression(y, X)
        {"const": -1.0, "coef": 1.0}

    Args:
        y: The dependent variable.
        X: The independent variable (a single regressor).

    Returns:
        dict: A dictionary containing the constant and coefficient of the OLS regression.


    """
    X = np.array(X)
    y = np.array(y)
    X = sm.add_constant(X)
    model = sm.OLS(y, X)
    results = model.fit()
    params = results.params
    const, coef = params
    return {"const": const, "coef": coef}
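
The docstring's reminder about aligned dates can be sketched as follows. This is a minimal pure-Python illustration, assuming records in the `{"Date": ..., "Close": ...}` shape that `get_ticker_data` returns. The prices are made-up illustrative values, and `align_by_date`, `pct_returns`, and `ols_fit` are hypothetical helpers, not part of the notebook.

```python
def align_by_date(stock_records, market_records, metric="Close"):
    """Keeps only dates present in both series, sorted chronologically."""
    market = {r["Date"]: r[metric] for r in market_records}
    pairs = [
        (r["Date"], r[metric], market[r["Date"]])
        for r in stock_records
        if r["Date"] in market
    ]
    return sorted(pairs, key=lambda p: p[0])


def pct_returns(prices):
    """Simple period-over-period returns."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def ols_fit(y, x):
    """Closed-form simple OLS for one regressor: y = const + coef * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    coef = sxy / sxx
    return {"const": mean_y - coef * mean_x, "coef": coef}


# Illustrative (made-up) month-end prices; the stock has one extra date.
stock = [
    {"Date": "2023-01-31", "Close": 100.0},
    {"Date": "2023-02-28", "Close": 110.0},
    {"Date": "2023-03-31", "Close": 99.0},
    {"Date": "2023-04-30", "Close": 108.9},
    {"Date": "2023-05-31", "Close": 120.0},  # no matching market record
]
market = [
    {"Date": "2023-01-31", "Close": 200.0},
    {"Date": "2023-02-28", "Close": 210.0},
    {"Date": "2023-03-31", "Close": 199.5},
    {"Date": "2023-04-30", "Close": 209.475},
]

aligned = align_by_date(stock, market)
stock_returns = pct_returns([p[1] for p in aligned])
market_returns = pct_returns([p[2] for p in aligned])
fit = ols_fit(stock_returns, market_returns)
print(fit)
```

Without the alignment step the two return lists would have different lengths, or worse, equal lengths with mismatched dates, and the fitted slope would be meaningless.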

In [8]:
# define the stock analysis agent
stock_analysis_agent = CodeAgent(
    tools=[get_ticker_data, get_fred_data, run_ols_regression, DuckDuckGoSearchTool()],
    model=model,
    name="stock_analyst_agent",
    description="A research agent that specializes in analyzing stock performance, computing technical indicators, and forecasting volatility.",
)

In [9]:
stock_analysis_agent.run("What immediate impact did Amazon's announcement of Alexa+ have on its stock price?")

YF.download() has changed argument auto_adjust default to True


[*********************100%***********************]  1 of 1 completed


"The Alexa+ announcement had an immediate positive impact on Amazon's stock price, with a 4.49% increase over two trading days, rising from $167.08 to $174.58."

In [10]:
stock_analysis_agent.run("How has Amazon's stock price changed since the Fed began lowering rates in 2024")

[*********************100%***********************]  1 of 1 completed


"\nAmazon Stock Price Analysis Following Fed Rate Cut (September 2024):\n- Initial Price: $176.25\n- Final Price: $187.97000122070312\n- Percentage Increase: 6.64964608266844%\n\nInterpretation: \nAmazon's stock price increased by 6.64964608266844% in the month following the Fed's first rate cut in 2024. \nThis positive movement suggests that the rate cut may have been perceived favorably by investors, potentially due to expectations of improved economic conditions and lower borrowing costs.\n"

In [11]:
stock_analysis_agent.run("What is the correlation between Amazon's stock price and the GDP of the United States?")

[*********************100%***********************]  1 of 1 completed


[*********************100%***********************]  1 of 1 completed


0.0009147731342036517

In [12]:
stock_analysis_agent.run("Compare and analyze the market beta for FAANG stocks since 2019")

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


'FAANG Stocks Beta Analysis (2019-2023):\n\n1. Beta Rankings (Most to Least Volatile):\n   - Apple (AAPL): 1.3144158931139294 - Highest market sensitivity\n   - Netflix (NFLX): 1.2155703090641943 - High market volatility\n   - Meta (META): 1.1521975484519142 - Significant market correlation\n   - Amazon (AMZN): 1.1423087286946347 - Moderate market volatility\n   - Google (GOOGL): 1.0584460910454017 - Closest to market movement\n\n2. Overall Insights:\n   - Average FAANG Stock Beta: 1.1765877140740149\n   - All stocks show above-market volatility\n   - Indicates higher risk and potential for larger price swings\n\n3. Interpretation:\n   - Beta > 1 suggests stocks move more dramatically than the S&P 500\n   - Investors can expect amplified market movements for these stocks\n   - Potential for higher returns, but also higher potential losses\n'
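
For reference, market beta is conventionally defined as shown below. This is the standard textbook definition, not something taken from the agent's output, and the output does not show which estimation method the agent actually used.

$$
\beta_i = \frac{\operatorname{Cov}(r_i, r_m)}{\operatorname{Var}(r_m)}
$$

Here $r_i$ is the return of stock $i$ and $r_m$ is the return of the market benchmark (the output above names the S&P 500). This ratio equals the slope of the regression $r_i = \alpha + \beta_i r_m + \varepsilon$, which is the `coef` value `run_ols_regression` returns when `y` holds the stock returns and `X` holds the market returns over the same dates.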