# Retrieval Augmented Generation

![](https://www.dailydoseofds.com/content/images/2024/10/rag.gif)
Source: [Daily Dose of DS](https://www.dailydoseofds.com/a-crash-course-on-building-rag-systems-part-1-with-implementations/)

Overview:
- Introduction to RAG with LlamaIndex
- Data ingestion
  - PDF
  - Web pages
  - Code
- Data splitting
  - Token splitting
  - Sentence splitting
  - Structured data splitting
  - Semantic chunking
- Vectorization
  - Embeddings
  - Vector storage
- Retrieval
  - Keyword search
  - Vector search
  - Hybrid search
- Advanced methods
  - Query rewriting
  - Multi-hop retrieval

In [230]:
%load_ext rich
%load_ext autoreload
%autoreload 2


# Introduction to RAG with LlamaIndex

LlamaIndex is a library for working with large language models.
One of its main strengths is its ability to ingest documents into a vector index and use them to answer questions.
This is known as Retrieval Augmented Generation (RAG).

To start, we will use a low-code, high-level abstraction to build a basic PDF question-answering system.
We will read in PDFs, split them into chunks, embed them, and store them in a vector database.
Then, we will use an abstraction known as a `QueryEngine` that implements RAG to answer questions about the documents.
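
Before using the real library, it may help to see the retrieve-then-prompt idea in miniature. The sketch below is not how LlamaIndex implements it: the word-count "embedding", the `retrieve` and `build_prompt` helpers, and the three example chunks are all assumptions made up for illustration, and no LLM is called.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Toy 'embedding': a sparse word-count vector (stand-in for a real model)."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved chunks in front of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


chunks = [
    "Llamas are domesticated animals from South America.",
    "Retrieval finds the chunks most similar to the question.",
    "The generator answers using only the retrieved chunks.",
]
question = "How does retrieval pick chunks?"
top_chunks = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a real system the prompt would be sent to an LLM, and the word counts would be replaced by a learned embedding model.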

In [231]:
# If we're in colab, use userdata to get the OPENAI_API_KEY
import os
from rich import print
from pathlib import Path

try:
    from google.colab import userdata
    print("Colab detected - setting up environment")
    os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
    %pip install llama-index \
        llama-index-readers-web \
        thefuzz \
        gradio \
        chromadb \
        llama-index-embeddings-huggingface \
        llama-index-vector-stores-chroma \
        llama-index-retrievers-bm25 \
        llama-index-llms-gemini \
        docling \
        llama-index-readers-docling \
        llama-index-node-parser-docling \
        PyStemmer
        
except ImportError:
    print("Not in colab - using local environment variables.")
    from dotenv import load_dotenv
    load_dotenv("../.env")


In [232]:
import os
import requests

# Create data directory if it doesn't exist
data_dir = "data"
if not os.path.exists(data_dir):
    os.makedirs(data_dir)

# Download the PDF file
pdf_url = "https://arxiv.org/pdf/2407.21783"
pdf_path = os.path.join(data_dir, "2407.21783.pdf")

if not os.path.exists(pdf_path):
    response = requests.get(pdf_url)
    with open(pdf_path, "wb") as f:
        f.write(response.content)
    print(f"Downloaded PDF to {pdf_path}")
else:
    print(f"PDF already exists at {pdf_path}")


The first thing we need to do is to load the data.
For general documents like PDFs, LlamaIndex provides a nice abstraction known as a `SimpleDirectoryReader` that can load data from a directory.
In the cell below, we use it to load the data from the `data` directory.

In [4]:
from llama_index.core.readers import SimpleDirectoryReader
documents = SimpleDirectoryReader(data_dir).load_data()

Let's take a look at the first document.
Most prominently, we get the text of the document.
By default, we also get a lot of useful metadata, like the page number, file name, file path, type, size, etc.
When we load lots of documents, this type of information becomes important to keep track of.

In [None]:
print(documents[0])

Now that we've loaded the data, we need to vectorize it.
We will use a combination of an embedding model and a vector database to store the vectors.
In the cell below, we use a HuggingFace embedding model to embed the documents.
We also use torch to determine the device to use for the embedding model (`mps` for Mac GPUs, `cuda` for Nvidia GPUs, and `cpu` otherwise).

In [6]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from torch.backends.mps import is_available as is_mps_available
from torch.cuda import is_available as is_cuda_available

if is_mps_available():
    device = "mps"
elif is_cuda_available():
    device = "cuda"
else:
    device = "cpu"

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5", device=device)
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

Finally, let's set up our RAG query engine.
If we want to perform simple question-answering, we can use the `as_query_engine` method.
If we want to perform chat with history, we can use the `as_chat_engine` method.
We can see both below 👇.


In [7]:
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
llm = OpenAI(model="gpt-4o-mini")
Settings.llm = llm # set gpt-4o-mini as the default llm

query_engine = index.as_query_engine(llm=llm)
chat_engine = index.as_chat_engine(llm=llm)

Let's see how the query engine works.
We start by passing it a question, then it uses the retriever to find the most relevant documents.
Finally, it uses the LLM to answer the question.

In [8]:
response = query_engine.query("How many new Llama models are mentioned in the paper?")

Below we see the response text.
But there's also some additional information that we can access, including the source nodes.
These are the nodes that the retriever used to answer the question.
We can see that the retriever found several nodes that are relevant to the question, then the LLM used at least one of them to answer the question.

In [None]:
print(response.response)

Taking a look at the first source node, we can see that it is a node that contains part of the document that is relevant to the question.
It also contains the `score`, which is the similarity between the question and the node.


In [None]:
print(response.source_nodes[0])

Now let's see if our chat engine comes up with the same answer.

In [None]:
chat_response = chat_engine.chat("How many new Llama models are mentioned in the paper?")
print(chat_response.response)
print(chat_response.source_nodes[0])

# Data ingestion

Data often comes in many different formats.
It may come in the form of a PDF, a web page, a code file, etc.
We may need some specific processing pipelines to extract the text from these documents, split them correctly, and vectorize them.

Luckily, LlamaIndex (and other libraries) provide lots of built-in and add-on tools to help you ingest almost any data type.
This time, let's load a web page instead of a PDF.
We will use one of the classes provided by [`llama-index-readers-web`](https://llamahub.ai/l/readers/llama-index-readers-web?from=readers) to load data from a web page.

In this section, we will:
- Load a web page as Markdown
- Split it into chunks following the structured format of the Markdown
- Embed the chunks
- Store the chunks in a vector database
- Create a query engine from the vector database and use it to answer a question


In [12]:
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core.node_parser import MarkdownNodeParser
from llama_index.core.ingestion import IngestionPipeline
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb


In [13]:
web_docs = SimpleWebPageReader(html_to_text=True).load_data(['https://en.wikipedia.org/wiki/Wikipedia'])

In [14]:
collection = chromadb.EphemeralClient().create_collection("wikipedia", get_or_create=True)
vector_store = ChromaVectorStore(collection)

In [15]:
pipeline = IngestionPipeline(
    transformations=[
        MarkdownNodeParser.from_defaults(),
        embed_model,
    ], 
    vector_store=vector_store
)

In [16]:
nodes = pipeline.run(documents=web_docs)

In [17]:
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)

In [None]:
llm = OpenAI(model="gpt-4o-mini")
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("How many languages are there exactly? Quote the exact text as well.")
print(response.response)

In [None]:
# Use thefuzz (the successor to fuzzywuzzy) to find the closest match in the source text
from thefuzz import fuzz, process
# Get the line of the top source node that best matches the response text
top_match, match_score = process.extractOne(response.response, response.source_nodes[0].text.splitlines(), scorer=fuzz.ratio)
assert top_match in response.source_nodes[0].text
print(f"Quote from source: '{top_match}'")
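
The same check can be sketched with only the standard library, which may be handy when `thefuzz` is not installed. This is an assumption-laden alternative, not the scorer used above: `closest_line` is a hypothetical helper, `difflib.SequenceMatcher` computes a different similarity ratio than `fuzz.ratio`, and the source text is invented.

```python
from difflib import SequenceMatcher


def closest_line(answer: str, source_text: str) -> tuple[str, float]:
    """Return the non-empty source line most similar to the answer, with its ratio in [0, 1]."""
    lines = [line for line in source_text.splitlines() if line.strip()]
    scored = [(line, SequenceMatcher(None, answer, line).ratio()) for line in lines]
    return max(scored, key=lambda pair: pair[1])


source = (
    "Wikipedia is a free encyclopedia.\n"
    "It is written by volunteers.\n"
    "It has many language editions."
)
line, score = closest_line("Wikipedia has many language editions.", source)
print(f"{score:.2f}: {line}")
```

A low best score is a hint that the model paraphrased or invented the quote rather than copying it from the retrieved text.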

## OCR

There are times when a PDF is not in a format that can be easily read into text.
In these cases, we will need to use optical character recognition (OCR) to convert the images to text.
There are many libraries and cloud services that can do this, but for this example, we will use the `docling` library since our example document is a PDF.
We will then ingest the text into LlamaIndex and use it to answer a question.

![](https://ds4sd.github.io/docling/assets/docling_processing.png)

In [20]:
from llama_index.readers.docling import DoclingReader
from llama_index.node_parser.docling import DoclingNodeParser
from llama_index.core.schema import Document
from llama_index.core import StorageContext

In [21]:
reader = DoclingReader(export_type=DoclingReader.ExportType.JSON)

In [None]:
docling_docs = reader.load_data(pdf_path)

In [None]:
len(docling_docs)

In [24]:
# We need to set the text template to "{content}" because the default is "{metadata}\n\n{content}",
# and LlamaIndex will try to embed the metadata as well. The metadata is not useful at search time.
docling_docs[0].text_template
docling_docs[0].text_template = "{content}"

In [None]:
docling_node_parser = DoclingNodeParser()
docling_nodes = docling_node_parser.get_nodes_from_documents(docling_docs)
len(docling_nodes)

In [None]:
docling_nodes[0].metadata

In [27]:
index = VectorStoreIndex(nodes=docling_nodes, embed_model=embed_model)

In [None]:
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("How many new Llama models are mentioned in the paper?")
print(response.response)

In [None]:
print(response.source_nodes[1])

# Data splitting

Many times, it's impractical to embed the entire document, and expensive to feed the entire document to the LLM.
Instead, we can split the document into chunks, embed the chunks, and use a retrieval method to find the most relevant chunks.
There are naïve methods that split texts into chunks of a specific length with some overlap;
there are methods that use the structure of the document to split it into sections (e.g. sections, figures, tables);
and there are more advanced methods that use semantic similarity to group the text into chunks.
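
The simplest of these strategies, fixed-length chunks with overlap, can be sketched in a few lines. This is an illustrative version that splits on whitespace; `split_with_overlap` is a hypothetical helper, and real splitters usually count model tokens rather than words.

```python
def split_with_overlap(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split tokens into chunks of at most chunk_size, each sharing `overlap` tokens with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the text
    return chunks


words = "the quick brown fox jumps over the lazy dog".split()
chunks = split_with_overlap(words, chunk_size=4, overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.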

Since the OCR'd text is just one very long Markdown string, we need to split it into chunks.
One nice way to do that is to use the inherent structure of Markdown to split it into sections.
We do that here with LlamaIndex's `MarkdownNodeParser`.

In [None]:
docling_md_docs = DoclingReader(export_type=DoclingReader.ExportType.MARKDOWN).load_data(pdf_path)

In [31]:
from llama_index.core.node_parser import MarkdownNodeParser
from IPython.display import Markdown, display

In [32]:
md_nodes = MarkdownNodeParser.from_defaults().get_nodes_from_documents(documents=docling_md_docs)

In [None]:
len(md_nodes)

In [None]:
display(Markdown(md_nodes[9].text))


In [None]:
md_index = VectorStoreIndex(nodes=md_nodes, embed_model=embed_model)
md_query_engine = md_index.as_query_engine(llm=llm)
response = md_query_engine.query("How many llama3 models are there?")
print(response.response)
print(response.source_nodes[0].text)

---

# Retrieval

## Starting simple: bm25

BM25 is a classic keyword-based ranking function that scores the relevance of each document to a query.
It is a probabilistic retrieval model that combines the term frequency and inverse document frequency of the query terms, with a saturation on term frequency and a normalization for document length.
You should be familiar with the basic idea of TF-IDF from your NLP class - you can think of BM25 as a refinement of TF-IDF that takes these extra factors into account.
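
To make the scoring concrete, here is a minimal sketch of one common BM25 variant in plain Python. It is not the implementation behind `BM25Retriever`: `bm25_scores` is a hypothetical helper, the defaults `k1=1.5` and `b=0.75` are typical textbook values, the tokenizer is a plain whitespace split, and the corpus is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized document against the tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents each term appears in
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Longer-than-average documents are penalized, shorter ones are boosted
        length_norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates: repeated terms help less and less
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "llama 3 is a herd of language models".split(),
    "pizza is best shared with friends".split(),
    "the llama models support tool use".split(),
]
scores = bm25_scores("llama models".split(), docs)
print(scores)
```

The first and third documents both contain each query term once, so the shorter third document scores higher purely because of length normalization.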

In [36]:
from llama_index.retrievers.bm25 import BM25Retriever

In [37]:
bm25 = BM25Retriever.from_defaults(nodes=md_nodes)

In [None]:
for node in bm25.retrieve("How many llama3 models are there?"):
    print(f"Score: {node.score:.4f}\nText:\n{node.node.text[:500]}...")

## Dense retrieval (vector search)

BM25 is a simple and fast method that depends on word matching.
But if we want to do more complex retrieval, we can use dense retrieval.
We represent both our query and documents as vectors and use a similarity metric to find the most relevant documents.
This is what's been going on under the hood in the previous examples using `VectorStoreIndex`.
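
As a rough sketch of the idea, the snippet below ranks a few hand-written 3-dimensional vectors by cosine similarity to a query vector. The vectors and the `top_k` helper are made up for illustration; real embeddings have hundreds or thousands of dimensions and come from a model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot product divided by the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k document vectors most similar to the query."""
    scored = sorted(
        ((cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)),
        reverse=True,
    )
    return [i for _, i in scored[:k]]


doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [0.9, 0.1, 0.0]
ranking = top_k(query_vec, doc_vecs, k=2)
print(ranking)
```

Because cosine similarity ignores vector length, only the direction of each embedding matters for the ranking.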

Since the index takes care of most of these mechanics for us, let's compute the similarities by hand to see what goes on under the hood.

In [39]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import matplotlib.pyplot as plt


In [40]:
sample_documents = [
    "My favorite type of dog is a golden retriever.",
    "I like to eat pizza with my friends.",
    "I like to go to the gym in the morning.",
    "I like to play basketball with my friends.",
]

embeddings = np.array(embed_model.get_text_embedding_batch(sample_documents))

query = "What do I like to do with friends?"
query_embedding = np.array(embed_model.get_text_embedding(query))

In [None]:
cosine_similarity(query_embedding.reshape(1, -1), embeddings).squeeze()

Once we have a `VectorStoreIndex`, we can use the `as_retriever` method to get a retriever object.
This uses dense retrieval under the hood, since it already has an embedding model as a part of the class.
Here, we use the `similarity_top_k` parameter to limit the number of results to 2 and show the cosine similarity score along with the beginning of the text.

In [None]:
query = "How many llama3 models are there?"
top_results = md_index.as_retriever(similarity_top_k=2).retrieve(query)
for result in top_results:
    print(f"Score: {result.score:.4f}\nText:\n{result.node.text[:500]}...")

## Hybrid search: query rewriting and reciprocal ranking

Sometimes, you may want several methods of searching over your data, then combining the results.
This is known as hybrid search.

Having multiple retrievers does not have to mean having separate retriever objects - we may just issue multiple queries.
In this example, we'll use an LLM to rewrite our query into multiple queries, then run each one through a dense retriever to find the most relevant documents.
Finally, we'll use reciprocal rank fusion to merge and re-rank the results.
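
Reciprocal rank fusion itself is simple enough to sketch by hand: each document earns `1 / (k + rank)` from every result list it appears in, and the totals decide the final order. The helper name, the document IDs, and the constant `k=60` (a commonly used value) are assumptions for illustration, and I have not checked that they match LlamaIndex's internals.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists of document IDs into one list sorted by fused score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute the most
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# One ranked list per rewritten query (hypothetical document IDs)
rankings = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
fused = reciprocal_rank_fusion(rankings)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Only ranks are used, never raw scores, which is why the method can combine retrievers whose scores are on different scales.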


In [43]:
from llama_index.core.retrievers import QueryFusionRetriever

In [44]:
dense_retriever = md_index.as_retriever(similarity_top_k=5)
hybrid_retriever = QueryFusionRetriever(
    [dense_retriever],
    num_queries=3,
    use_async=False,
    mode='reciprocal_rerank',
    verbose=True
)

In [None]:
results = hybrid_retriever.retrieve("How many llama3 models are there?")
for result in results:
    print(f"Score: {result.score:.4f}\nText:\n{result.node.text[:500]}...")

# Reranking

Often, retrieval only gives us a good first pass at finding relevant documents.
We can use re-ranking to improve the results.
There are several re-ranking methods, but in this case we'll use a cross-encoder to re-rank the results.
We'll also introduce a more low-level method for organizing LlamaIndex flows called workflows.

In [78]:
from llama_index.core.postprocessor import SentenceTransformerRerank
# Retrieve a coarse top 20 with the vector index, then keep the 5 best after re-ranking
md_query_engine = md_index.as_query_engine(llm=llm, node_postprocessors=[
    SentenceTransformerRerank(top_n=5)
], similarity_top_k=20)

In [None]:
response = md_query_engine.query("How many llama3 models are there?")
print(response.response)

In [46]:
from llama_index.core.schema import NodeWithScore
from llama_index.core.llms import ChatMessage
from llama_index.core.workflow import (
    Workflow,
    Context,
    step,
    StartEvent,
    StopEvent,
    Event
)
from typing import List


Let's walk through the workflow code step by step.
First, we define the events that will be passed between the steps.
This includes a retrieval result, a re-ranking result, and a prompt for the LLM.
The start and stop are already taken care of for us by `StartEvent` and `StopEvent`.

Next, we define our `RAGWithReRank` workflow.
We initialize it with an index and an LLM, which we use for retrieval and answering, and a cross-encoder, which we use for re-ranking.
We also define exactly what to do for each step of the workflow.
The workflow knows which step to run next because of the `@step` decorator and the type annotations of each method.

Finally, we can run the workflow with a query.
The workflow will automatically handle passing the events between the steps, so we don't have to worry about that.
The results match what we saw in the previous examples.
One advantage of this method is that it's much more flexible - you can include arbitrary code, loops, conditionals, etc.
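
To show how type annotations alone can drive the routing, here is a deliberately simplified, synchronous sketch. It is not LlamaIndex's implementation: `MiniWorkflow`, the `step` decorator, and the three event classes are hypothetical stand-ins, and real workflows are asynchronous and considerably more capable.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Start:
    query: str


@dataclass
class Retrieved:
    query: str
    chunks: list


@dataclass
class Stop:
    result: str


def step(fn):
    """Mark a method as a workflow step."""
    fn._is_step = True
    return fn


class MiniWorkflow:
    def __init__(self):
        # Map each accepted event type to the step whose `ev` parameter is annotated with it
        self.handlers = {}
        for _, method in inspect.getmembers(self, predicate=inspect.ismethod):
            if getattr(method, "_is_step", False):
                event_type = inspect.signature(method).parameters["ev"].annotation
                self.handlers[event_type] = method

    def run(self, **kwargs):
        event = Start(**kwargs)
        # Keep dispatching on the event's type until a step returns Stop
        while not isinstance(event, Stop):
            event = self.handlers[type(event)](event)
        return event.result


class ToyRAG(MiniWorkflow):
    @step
    def retrieve(self, ev: Start) -> Retrieved:
        corpus = ["Llama 3 is a herd of models.", "Pizza is tasty."]
        hits = [c for c in corpus if "llama" in c.lower()]
        return Retrieved(query=ev.query, chunks=hits)

    @step
    def answer(self, ev: Retrieved) -> Stop:
        return Stop(result=f"{ev.query} -> {len(ev.chunks)} chunk(s) used")


result = ToyRAG().run(query="What is Llama 3?")
print(result)
```

Because the steps are connected only through event types, adding a branch or a loop amounts to returning a different event from a step.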

For more detailed documentation on workflows, see [here](https://docs.llamaindex.ai/en/stable/understanding/workflows/).

In [60]:
# Define the events that will be passed between the steps
class RetrievalResult(Event):
    results: List[NodeWithScore]

class ReRankResult(Event):
    results: List[NodeWithScore]

class RagPrompt(Event):
    prompt: str

# Define the workflow
class RAGWithReRank(Workflow):
    # Initialize the workflow with an index and LLM
    def __init__(self, index: VectorStoreIndex = md_index, llm: OpenAI = OpenAI(model="gpt-4o-mini")):
        super().__init__(timeout=10, verbose=False)
        self.index = index
        self.llm = llm
        self.reranker = SentenceTransformerRerank(top_n=5)

    # The start method is called when the workflow is run.
    # This is the first step of the workflow that takes in a query and returns a retrieval result.
    # Since we want to use a coarse retriever, we use a top k of 20.
    @step
    async def start(self, ctx: Context, ev: StartEvent) -> RetrievalResult:
        query = ev.query
        await ctx.set('query', query)
        results = self.index.as_retriever(similarity_top_k=20).retrieve(query)
        return RetrievalResult(results=results)
    
    # This step takes the retrieval results and re-ranks them.
    # Notice that the re-ranker's `top_n` is set to 5, so we only keep the top 5 results.
    # This is smaller than the retrieval top k since this step refines the coarse results down to the best few.
    @step
    async def rerank(self, ctx: Context, ev: RetrievalResult) -> ReRankResult:
        results = ev.results
        query = await ctx.get('query')
        reranked = self.reranker.postprocess_nodes(nodes=results, query_str=query)
        return ReRankResult(results=reranked)
    
    # This step creates a prompt for the LLM.
    # It does this by joining the re-ranked results into a single string and formatting it with the query.
    @step
    async def create_prompt(self, ctx: Context, ev: ReRankResult) -> RagPrompt:
        reranked_results = ev.results
        await ctx.set('reranked_results', reranked_results)
        query = await ctx.get('query')
        reranked_str = '\n\n'.join(i.text for i in reranked_results)
        prompt = f"""\
Here is some relevant context:
--------------------------------
{reranked_str}
--------------------------------
Based on the above information and not prior knowledge, please answer the question.
Question:
{query}
"""
        return RagPrompt(prompt=prompt)

    # This step takes the prompt and uses the LLM to answer the question.
    # It also takes the re-ranked results and attaches them to the response in case we want to see the nodes.
    @step
    async def answer(self, ctx: Context, ev: RagPrompt) -> StopEvent:
        prompt = ev.prompt
        ranked_results = await ctx.get('reranked_results')
        messages = [
            ChatMessage.from_str(prompt)
        ]
        answer = await self.llm.achat(messages)
        result = {
            "response": answer,
            "source_nodes": ranked_results
        }
        return StopEvent(result=result)


In [61]:
rag = RAGWithReRank()

In [62]:
response = await rag.run(query="How many llama3 models are there?")

In [None]:
response.keys()

In [None]:
from llama_index.utils.workflow import draw_all_possible_flows
from IPython.display import HTML, display
draw_all_possible_flows(RAGWithReRank, filename="flow.html")

# Exercise: Chat with PDF (30 minutes)

Your goal in this exercise is to create a Gradio interface for a question-answering system.
Your application should:
- Use the query engine created above to answer questions about the uploaded PDF
- Display the question and answer in the UI. If using QueryEngine, use a question and answer format. If using ChatEngine, use a chat format.

If you need a challenge:
- Use the `gr.File` component to allow the user to upload ANY PDF and ask questions about it.


In [84]:
import gradio as gr
from tempfile import TemporaryDirectory

In [None]:
# Ingest an uploaded PDF and build a chat engine over it
def ingest_documents(file_obj):
    """
    Process an uploaded PDF file and create a chat engine for it.
    
    Args:
        file_obj: Path to the uploaded PDF, as provided by the `gr.File` component
        
    Returns:
        tuple: (chat_engine, cleared_file_upload, filename, cleared_chat_history)
    """
    if file_obj is None:
        # Return one value per output component, matching the normal return below
        return None, None, "", []
        
    with TemporaryDirectory() as temp_dir:
        file_path = os.path.join(temp_dir, 'tmp.pdf')
        # Copy the uploaded file's bytes into the temporary directory
        with open(file_obj, "rb") as src, open(file_path, "wb") as dst:
            dst.write(src.read())
            
        documents = SimpleDirectoryReader(temp_dir).load_data()
        llm = OpenAI(model="gpt-4o-mini")
        index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
        # Retrieve a coarse top 20, then keep the 5 best after re-ranking
        chat_engine = index.as_chat_engine(llm=llm, node_postprocessors=[
            SentenceTransformerRerank(top_n=5)
        ], similarity_top_k=20)
        
    # `file_obj` is opened as a path above, so derive the display name from that path
    return chat_engine, None, Path(file_obj).name, []

def chat(message, history, chat_engine):
    history.append(
        {
            "role": "user",
            "content": message
        }
    )
    response = chat_engine.chat(message)
    history.append({
        "role": "assistant",
        "content": response.response
    })
    return history, None

# Build the Gradio UI
with gr.Blocks() as demo:
    chat_engine_state = gr.State()  # Holds the chat engine for the current session
    gr.Markdown("## RAG Demo: Question answering with a PDF")
    with gr.Row():
        with gr.Column(scale=1):   
            file_upload = gr.File(label="Upload a PDF file", file_types=[".pdf"], file_count="single")
            submit_button = gr.Button("Submit")
            pdf_name = gr.Textbox(label="You are asking about...")
        with gr.Column(scale=5):
            chatbot = gr.Chatbot(label="Chat with the uploaded PDF", type='messages')
            input = gr.Textbox(label="Enter a question", interactive=True)

    # Ingest the uploaded PDF and store the resulting chat engine in the State
    submit_button.click(
        fn=ingest_documents, 
        inputs=file_upload, 
        outputs=[chat_engine_state, file_upload, pdf_name, chatbot]
    )
    input.submit(
        fn=chat, 
        inputs=[input, chatbot, chat_engine_state], 
        outputs=[chatbot, input]
    )

demo.launch(debug=False)

# Agents

## Function calling

Many LLMs are fine-tuned to call functions with parameters based on the prompt.
This means you can provide information about one or more functions you want to call and have the LLM decide which one to call.
This is a powerful tool when working with LLMs because it can provide a lot of external data and skills not available to the LLM itself.
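
To give a feel for what "information about a function" looks like, the sketch below builds a JSON-schema-like description from a Python signature and then dispatches a hand-written model reply. Everything here is an assumption for illustration: `describe_function`, the `add` tool, and the `model_reply` dictionary are made up, and the exact schema each provider expects differs in its details.

```python
import inspect
import json


def describe_function(fn) -> dict:
    """Build a JSON-schema-like description of a function from its signature and docstring."""
    type_names = {int: "integer", float: "number", str: "string", bool: "boolean"}
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": type_names.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults must be supplied
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


schema = describe_function(add)
print(json.dumps(schema, indent=2))

# The model replies with a function name and JSON-encoded arguments; our code runs the function.
model_reply = {"name": "add", "arguments": '{"a": 2, "b": 3}'}  # hand-written stand-in for an LLM reply
registry = {"add": add}
result = registry[model_reply["name"]](**json.loads(model_reply["arguments"]))
print(result)
```

The model never executes anything itself; it only produces the name and arguments, and the calling code decides whether and how to run them.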

Let's go through a simple function calling example.

In [205]:
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
import numpy as np
from rich import print
import json

In the cell below, we define a function that rolls a number of dice and returns the results.
LlamaIndex provides a `FunctionTool` class that makes it easy to convert a function into a tool.

In [206]:
def roll_dice(num_dice:int) -> str:
    """
    Rolls a number of dice and returns the results.

    Args:
        num_dice: The number of dice to roll

    Returns:
        str: The results of the dice roll (comma separated if multiple dice are rolled)
    """
    # np.random.randint excludes the upper bound, so use 7 to allow a roll of 6
    return ', '.join(str(np.random.randint(1, 7)) for _ in range(num_dice))

roll_dice_tool = FunctionTool.from_defaults(fn=roll_dice, description="Useful when you need to roll dice.")

Let's ask our LLM to roll 5 dice and see what happens.

In [213]:
llm = OpenAI(model="gpt-4o-mini")
messages = [
    ChatMessage.from_str("Roll 5 dice")
]
response = llm.chat_with_tools(tools=[roll_dice_tool], messages=messages)

It doesn't actually call the function yet - it just says which function to call and which arguments to pass to it.

In [214]:
print(response)

Let's parse the response to get the function call and arguments.

In [215]:
function_call = response.raw.choices[0].message.tool_calls[0].function
print(function_call)

Finally, let's call the function with the arguments.

In [216]:
tool_output = roll_dice_tool.call(**json.loads(function_call.arguments))
print(tool_output)

Great! Now we have a basic idea of how function calling works under the hood.
Agents use function calling to call tools, but generally use them in loops combined with reasoning steps to accomplish complex tasks.
Next, we'll see how to use LlamaIndex's implementation of agents.
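
The loop that agents run can be sketched without any LLM at all. In the toy version below, `fake_llm` is a scripted stand-in that first requests a tool and then answers once it sees the tool's output; the message format, `run_agent`, and `roll_die` are all assumptions for illustration rather than any library's API.

```python
import json
import random


def roll_die(sides: int) -> str:
    """Roll one die with the given number of sides."""
    return str(random.randint(1, sides))  # random.randint includes both endpoints


TOOLS = {"roll_die": roll_die}


def fake_llm(messages: list[dict]) -> dict:
    """Scripted stand-in for a model: request a tool once, then answer with its result."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool_call": {"name": "roll_die", "arguments": json.dumps({"sides": 6})}}
    return {"content": f"You rolled a {tool_results[-1]['content']}."}


def run_agent(user_message: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = fake_llm(messages)
        if "tool_call" not in reply:
            return reply["content"]  # the model produced a final answer
        call = reply["tool_call"]
        output = TOOLS[call["name"]](**json.loads(call["arguments"]))
        # Close the loop: feed the tool output back so the next step can use it
        messages.append({"role": "tool", "name": call["name"], "content": output})
    raise RuntimeError("Agent did not finish within max_steps")


answer = run_agent("Roll a six-sided die")
print(answer)
```

The `max_steps` cap is a design choice worth keeping in real agents too, since a model that keeps requesting tools would otherwise loop forever.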

## LlamaIndex agents

We've seen how to use function calling with an LLM, but it was kind of difficult to go from the LLM response to the function call.
We didn't even close the loop and pass the tool call result back to the LLM.
LlamaIndex's `FunctionCallingAgent` and `ReActAgent` classes make this easier.
Let's go through an example of using the `FunctionCallingAgent` to perform a more complex task.

In [217]:
from llama_index.core.agent import ReActAgent, FunctionCallingAgent
from llama_index.core.tools import FunctionTool
from llama_index.tools.yahoo_finance import YahooFinanceToolSpec
from llama_index.llms.openai import OpenAI
from yfinance import download as yf_download
from datetime import datetime

It's really useful to be able to define your own tools.
Luckily, LlamaIndex makes this easy with the `FunctionTool` class.
Here, we'll define a tool that gets stock data from Yahoo Finance and returns it as a markdown table.

In [79]:
def get_stock_data(ticker:str, start_date:str, end_date:str) -> str:
    """
    Gets stock data using yfinance. All dates should be in YYYY-MM-DD format.

    Args:
        ticker: The ticker symbol of the stock to get data for
        start_date: The start date of the data to get
        end_date: The end date of the data to get

    Returns:
        str: A markdown table of the stock data
    """
    df = yf_download(ticker, start=start_date, end=end_date)
    return df.to_markdown()

get_stock_data_tool = FunctionTool.from_defaults(fn=get_stock_data, description="Useful when you want to pull trended data about a stock.")

Now, we can pass this tool to an agent to have it answer questions about stocks.

In [224]:
agent = FunctionCallingAgent.from_tools(tools=[get_stock_data_tool], llm=OpenAI(model="gpt-4o-mini"), verbose=True)
response = agent.chat("What was TSLA's high and low over the last 7 days?")

> Running step f4e4fdd4-dbc7-4f32-a09a-72dcd1065789. Step input: What was TSLA's high and low over the last 7 days?
Added user message to memory: What was TSLA's high and low over the last 7 days?


[*********************100%***********************]  1 of 1 completed

=== Calling Function ===
Calling function: get_stock_data with args: {"ticker": "TSLA", "start_date": "2023-10-16", "end_date": "2023-10-23"}
=== Function Output ===
| Date                |   ('Adj Close', 'TSLA') |   ('Close', 'TSLA') |   ('High', 'TSLA') |   ('Low', 'TSLA') |   ('Open', 'TSLA') |   ('Volume', 'TSLA') |
|:--------------------|------------------------:|--------------------:|-------------------:|------------------:|-------------------:|---------------------:|
| 2023-10-16 00:00:00 |                  253.92 |              253.92 |             255.4  |            248.48 |             250.05 |          8.89172e+07 |
| 2023-10-17 00:00:00 |                  254.85 |              254.85 |             257.18 |            247.08 |             250.1  |          9.35629e+07 |
| 2023-10-18 00:00:00 |                  242.68 |              242.68 |             254.63 |            242.08 |             252.7  |          1.25148e+08 |
| 2023-10-19 00:00:00 |                  220.11 |




=== LLM Response ===
Over the last 7 days, TSLA's stock data shows the following high and low prices:

- **High**: $257.18 (on October 17, 2023)
- **Low**: $210.42 (on October 20, 2023)


### Check for understanding

Is there anything strange about the response to this question above?
What do you think we could do to fix it?
Let's define another tool that addresses this problem.
Does this help? Is there a different way to solve the problem?
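
One possible shape for such a tool is sketched below: a function that turns "the last N days" into explicit dates, so the model does not have to guess what today is. The name `get_date_range` and its `today` parameter (included only to make the example reproducible) are assumptions; it could be wrapped with `FunctionTool.from_defaults` in the same way as the stock data tool.

```python
from datetime import date, timedelta
from typing import Optional


def get_date_range(days_back: int, today: Optional[date] = None) -> dict:
    """
    Returns the start and end dates covering the last `days_back` days.

    Args:
        days_back: How many days before the end date the range should start
        today: The end date; defaults to the current date when omitted

    Returns:
        dict: `start_date` and `end_date` in YYYY-MM-DD format
    """
    end = today or date.today()
    start = end - timedelta(days=days_back)
    return {"start_date": start.isoformat(), "end_date": end.isoformat()}


window = get_date_range(7, today=date(2024, 12, 3))
print(window)
```

An alternative that needs no extra tool is to state the current date directly in the prompt.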

## Using pre-packaged tools

LlamaIndex has a number of tools that are pre-defined and can be installed.
In this case, we will use the `YahooFinanceToolSpec` to get additional information about a stock beyond just the trended data.
We will also add our `get_stock_data_tool` to the list of tools, since getting trended data is not a part of the `YahooFinanceToolSpec`.

In [225]:
yfinance_tools = YahooFinanceToolSpec().to_tool_list()
yfinance_tools.append(get_stock_data_tool)

In [226]:
agent = FunctionCallingAgent.from_tools(tools=yfinance_tools, llm=OpenAI(model="gpt-4o-mini"), verbose=True)
response = agent.chat(f"What was TSLA's high and low over the last 7 days? Today's date is {datetime.now().strftime('%Y-%m-%d')}.")

> Running step bbf82ada-f013-48d9-8476-99c32b16c17f. Step input: What was TSLA's high and low over the last 7 days? Today's date is 2024-12-03.
Added user message to memory: What was TSLA's high and low over the last 7 days? Today's date is 2024-12-03.


[*********************100%***********************]  1 of 1 completed

=== Calling Function ===
Calling function: get_stock_data with args: {"ticker": "TSLA", "start_date": "2024-11-26", "end_date": "2024-12-03"}
=== Function Output ===
| Date                |   ('Adj Close', 'TSLA') |   ('Close', 'TSLA') |   ('High', 'TSLA') |   ('Low', 'TSLA') |   ('Open', 'TSLA') |   ('Volume', 'TSLA') |
|:--------------------|------------------------:|--------------------:|-------------------:|------------------:|-------------------:|---------------------:|
| 2024-11-26 00:00:00 |                  338.23 |              338.23 |             346.96 |            335.66 |             341    |          6.22959e+07 |
| 2024-11-27 00:00:00 |                  332.89 |              332.89 |             342.55 |            326.59 |             341.8  |          5.78964e+07 |
| 2024-11-29 00:00:00 |                  345.16 |              345.16 |             345.45 |            334.65 |             336.08 |          3.71676e+07 |
| 2024-12-02 00:00:00 |                  357.09 |




=== LLM Response ===
Over the last 7 days, Tesla (TSLA) had the following high and low prices:

- **High:** $360.00 (on December 2, 2024)
- **Low:** $326.59 (on November 27, 2024)


In [227]:
response = agent.chat("What does TSLA make?")

> Running step d140eeff-7170-4b3c-b7b2-df47504a706a. Step input: What does TSLA make?
Added user message to memory: What does TSLA make?
=== Calling Function ===
Calling function: stock_basic_info with args: {"ticker": "TSLA"}
=== Function Output ===
Info: 
{'address1': '1 Tesla Road', 'city': 'Austin', 'state': 'TX', 'zip': '78725', 'country': 'United States', 'phone': '512 516 8177', 'website': 'https://www.tesla.com', 'industry': 'Auto Manufacturers', 'industryKey': 'auto-manufacturers', 'industryDisp': 'Auto Manufacturers', 'sector': 'Consumer Cyclical', 'sectorKey': 'consumer-cyclical', 'sectorDisp': 'Consumer Cyclical', 'longBusinessSummary': 'Tesla, Inc. designs, develops, manufactures, leases, and sells electric vehicles, and energy generation and storage systems in the United States, China, and internationally. The company operates in two segments, Automotive, and Energy Generation and Storage. The Automotive segment offers electric vehicles, as well as sells automotive regula

## Autogen

Autogen is a library for building agents that can interact with each other.
It is a little more low-level than LlamaIndex's agents, but it is very flexible and a good choice for more complex workflows.
In Autogen, each kind of interaction (for example, replying with an LLM or executing code) is handled by a dedicated agent.
Complex workflows are then orchestrated by letting these different types of agents exchange messages.
We'll walk through several examples of how to use Autogen to build agents; for more details, see the [documentation](https://microsoft.github.io/autogen).

First, let's create a simple agent that can converse with a user.

In [285]:
from autogen import ConversableAgent

In [286]:
llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"]
        }
    ]
}
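The same configuration shape is repeated for every agent later in this section, so it can help to build it in one place. The helper below is a minimal sketch and not part of Autogen: `make_llm_config` is a hypothetical name, and it only assembles the dictionary layout shown above, failing early with a readable message if the API key is missing.

```python
import os


def make_llm_config(model="gpt-4o-mini", temperature=None, env_var="OPENAI_API_KEY"):
    """Build an llm_config dict in the shape used above (hypothetical helper)."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of a bare KeyError.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    entry = {"model": model, "api_key": api_key}
    if temperature is not None:
        # Only include temperature when the caller asks for a specific value.
        entry["temperature"] = temperature
    return {"config_list": [entry]}
```

With the key set, `make_llm_config(temperature=0.9)` returns a dictionary with a single `config_list` entry containing `model`, `api_key`, and `temperature`.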

In [287]:
agent = ConversableAgent(name="My Agent", llm_config=llm_config)
print(agent.generate_reply(messages=[{"role": "user", "content": "Tell me a joke about Python"}]))

>>>>>>>> USING AUTO REPLY...


OK, this wasn't much more than a plain LLM call.
Let's take it a step further and create two agents that can interact with each other.
Here is an example straight from the Autogen documentation:

In [240]:
cathy = ConversableAgent(
    "cathy",
    system_message="Your name is Cathy and you are a part of a duo of comedians.",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "temperature": 0.9, "api_key": os.environ.get("OPENAI_API_KEY")}]},
    human_input_mode="NEVER",  # Never ask for human input.
)

joe = ConversableAgent(
    "joe",
    system_message="Your name is Joe and you are a part of a duo of comedians.",
    llm_config={"config_list": [{"model": "gpt-4o-mini", "temperature": 0.7, "api_key": os.environ.get("OPENAI_API_KEY")}]},
    human_input_mode="NEVER",  # Never ask for human input.
)

response = cathy.initiate_chat(joe, message="Tell me a joke about Python", max_turns=2)

cathy (to joe):

Tell me a joke about Python

--------------------------------------------------------------------------------
joe (to cathy):

Why do Python programmers prefer dark mode?

Because light attracts bugs!

--------------------------------------------------------------------------------
cathy (to joe):

That's a good one, Joe! How about this: 

Why did the Python programmer break up with their partner?

Because they had too many "issues" and couldn't handle their "exceptions"!

--------------------------------------------------------------------------------
joe (to cathy):

That's a great one, Cathy! Here's another for you:

Why do Python programmers love nature?

Because they enjoy working with lists and trees!

--------------------------------------------------------------------------------
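The transcript above contains four messages for `max_turns=2`: the opening message, then alternating replies that end with the recipient. The toy loop below reproduces that turn-taking pattern without any LLM. It is a sketch under our own assumptions, not Autogen's implementation: `ScriptedAgent` and `run_chat` are hypothetical names, and the replies come from plain functions.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class ScriptedAgent:
    """Toy stand-in for an agent: replies come from a function, not an LLM."""

    def __init__(self, name: str, reply_fn: Callable[[List[Message]], str]):
        self.name = name
        self.reply_fn = reply_fn

    def reply(self, history: List[Message]) -> str:
        return self.reply_fn(history)


def run_chat(initiator: ScriptedAgent, recipient: ScriptedAgent,
             message: str, max_turns: int) -> List[Message]:
    """Alternate between two agents; each turn starts with an initiator message."""
    history: List[Message] = [{"sender": initiator.name, "content": message}]
    for turn in range(max_turns):
        # The recipient answers the initiator's latest message.
        history.append({"sender": recipient.name, "content": recipient.reply(history)})
        if turn == max_turns - 1:
            break  # the chat ends on the recipient's reply in the final turn
        history.append({"sender": initiator.name, "content": initiator.reply(history)})
    return history


cathy = ScriptedAgent("cathy", lambda h: f"cathy's reply after {len(h)} messages")
joe = ScriptedAgent("joe", lambda h: f"joe's reply after {len(h)} messages")

for msg in run_chat(cathy, joe, "Tell me a joke about Python", max_turns=2):
    print(f"{msg['sender']}: {msg['content']}")
```

Each agent sees the full message history when it replies, which is why the comedians above can build on each other's jokes.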


Another useful feature of Autogen is its ability to execute code.
In the cell below, we define a code executor that runs code on the local machine.
This is again a multi-agent interaction, but this time one agent is a code writer and the other is a code executor.
The code writer agent writes code and the code executor agent runs it and reports the result back.
We give the code writer a rather long system message so that it knows when to suggest code and how to format it.
Let's see how this works in practice.

In [269]:
from autogen.coding import LocalCommandLineCodeExecutor
executor = LocalCommandLineCodeExecutor(
    timeout=10  # Stop any script that runs longer than 10 seconds.
)

code_writer_system_message = """
You have been given coding capability to solve tasks using Python code.
In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute.
    1. When you need to collect info, use the code to output the info you need, for example, browse or search the web, download/read a file, print the content of a webpage or a file, get the current date/time, check the operating system. After sufficient info is printed and the task is ready to be solved based on your language skill, you can solve the task by yourself.
    2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly.
Solve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.
When using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.
If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
"""

code_writer_agent = ConversableAgent(
    "code_writer",
    system_message=code_writer_system_message,
    llm_config={"config_list": [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}]},
    code_execution_config=False,  # Turn off code execution for this agent.
    max_consecutive_auto_reply=2,
    human_input_mode="NEVER",
)

code_executor_agent = ConversableAgent(
    name="code_executor_agent",
    llm_config=False,
    code_execution_config={
        "executor": executor,
    },
    human_input_mode="NEVER",
)  

response = code_executor_agent.initiate_chat(code_writer_agent, message="What are the files in the current directory?", max_turns=2)

code_executor_agent (to code_writer):

What are the files in the current directory?

--------------------------------------------------------------------------------
code_writer (to code_executor_agent):

# filename: list_files.py
```python
import os

# Get the list of files in the current directory
files = os.listdir('.')

# Print the list of files
print("Files in the current directory:")
for file in files:
    print(file)
```

--------------------------------------------------------------------------------
>>>>>>>> EXECUTING CODE BLOCK (inferred language is python)...
code_executor_agent (to code_writer):

exitcode: 0 (execution succeeded)
Code output: Files in the current directory:
bayesian_optimization.gif
.DS_Store
tmp_code_a9b2887cb4447ffc40d0267d5221988a.py
01_getting_started_with_llms_20241123.ipynb
02_rag_and_agents_solution.ipynb
output
flow.html
01_getting_started_with_llms.ipynb
01_getting_started_with_llms_solution.ipynb
create_fastapi
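To make the executor's role more concrete, the sketch below shows the general idea of a command-line executor: pull the fenced code out of a message, write it to a file, run it in a separate process with a timeout, and report the exit code and output. This is a simplified illustration under our own assumptions, not how `LocalCommandLineCodeExecutor` is implemented; `extract_python_block` and `run_python` are hypothetical names, and the exit code used for timeouts is an arbitrary choice.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path

FENCE = "`" * 3  # three backticks, built this way to keep this example readable
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_python_block(message: str):
    """Return the first fenced Python block in a message, or None."""
    match = CODE_BLOCK.search(message)
    return match.group(1) if match else None


def run_python(code: str, timeout: int = 10):
    """Write code to a temporary file and run it in a separate process."""
    with tempfile.TemporaryDirectory() as work_dir:
        script = Path(work_dir) / "snippet.py"
        script.write_text(code)
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True, text=True, timeout=timeout, cwd=work_dir,
            )
        except subprocess.TimeoutExpired:
            return 124, "Timeout"  # 124 is our own convention for this sketch
        return result.returncode, result.stdout + result.stderr


reply = f"Here is the script:\n{FENCE}python\nprint('hello from the executor')\n{FENCE}"
exit_code, output = run_python(extract_python_block(reply))
print(f"exitcode: {exit_code}")
print(f"Code output: {output}")
```

Because the code runs directly on the local machine, anything the code writer suggests has the same permissions as the notebook itself, so it is worth keeping the timeout short and the working directory disposable while experimenting.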

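Just like LlamaIndex agents, Autogen agents can also use function calling.
Here the assistant is registered as the caller, which proposes the tool call, and the user proxy is registered as the executor, which actually runs the function.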

In [281]:
import numpy as np
from autogen import AssistantAgent, UserProxyAgent, register_function

assistant = AssistantAgent(name="assistant", llm_config=llm_config, human_input_mode="NEVER")
user_proxy = UserProxyAgent(name="user_proxy", llm_config=False, human_input_mode="NEVER")

def roll_dice(num_dice: int) -> str:
    """
    Rolls a number of dice and returns the results.
    """
    # The upper bound of np.random.randint is exclusive, so 7 is needed to allow a roll of 6.
    return ', '.join(str(np.random.randint(1, 7)) for _ in range(num_dice))

register_function(roll_dice, caller=assistant, executor=user_proxy, description="Useful when you need to roll dice.")

In [284]:
response = user_proxy.initiate_chat(assistant, message="Roll 3 dice", max_turns=2)

user_proxy (to assistant):

Roll 3 dice

--------------------------------------------------------------------------------
assistant (to user_proxy):

***** Suggested tool call (call_FCLl3UffiTOv7FDbZHjCAd1V): roll_dice *****
Arguments: 
{"num_dice":3}
**************************************************************************

--------------------------------------------------------------------------------
>>>>>>>> EXECUTING FUNCTION roll_dice...
user_proxy (to assistant):

***** Response from calling tool (call_FCLl3UffiTOv7FDbZHjCAd1V) *****
3, 2, 1
**********************************************************************

--------------------------------------------------------------------------------
assistant (to user_proxy):

The result of rolling 3 dice is: 3, 2, 1.

TERMINATE

--------------------------------------------------------------------------------
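The exchange above has two halves: the assistant suggests a tool call as a function name plus JSON arguments, and the user proxy looks that function up and runs it. The sketch below imitates the executor half in plain Python. It is an illustration under our own assumptions rather than Autogen's implementation: `TOOLS`, `register`, and `execute_tool_call` are hypothetical names, and the dice use the standard library's `random.randint`, which includes both endpoints (unlike `np.random.randint`, whose upper bound is exclusive).

```python
import json
import random

TOOLS = {}  # maps a tool name to the Python function that implements it


def register(func):
    """Record a function under its name so a tool call can find it later."""
    TOOLS[func.__name__] = func
    return func


@register
def roll_dice(num_dice: int) -> str:
    """Rolls a number of six-sided dice and returns the results."""
    # random.randint(1, 6) can return any value from 1 to 6 inclusive.
    return ", ".join(str(random.randint(1, 6)) for _ in range(num_dice))


def execute_tool_call(name: str, arguments: str) -> str:
    """Decode the JSON arguments and call the registered function."""
    if name not in TOOLS:
        return f"Error: unknown tool {name!r}"
    return TOOLS[name](**json.loads(arguments))


# The name and arguments mirror the suggested tool call in the transcript above.
print(execute_tool_call("roll_dice", '{"num_dice":3}'))
```

Returning an error string for an unknown tool, instead of raising, lets the calling agent see the failure as an ordinary message and try again.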


# Exercise: Build your own agent