# LLM Compiler Agent Cookbook

<a href="https://colab.research.google.com/github/run-llama/llama-hub/blob/main/llama_hub/llama_packs/agents/llm_compiler/llm_compiler.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**NOTE**: Full credits to the [source repo for LLMCompiler](https://github.com/SqueezeAILab/LLMCompiler). A lot of our implementation was lifted from this repo (and adapted with LlamaIndex modules).

In this cookbook, we show how to use our LLMCompiler agent implementation in two settings: first with simple function tools that do math, then with query engine tools that answer multi-part questions in a more advanced RAG setup over multiple documents.

The LLMCompiler agent writes out a plan of tool calls and can execute independent calls in parallel, which returns results much more quickly than sequential execution through ReAct.
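To make the parallel-versus-sequential point concrete, here is a minimal sketch that uses only the standard library. It is not the LLMCompiler implementation: `fake_tool`, `run_sequential`, `run_parallel`, and `timed` are hypothetical names, and the `asyncio.sleep` call merely stands in for a slow tool such as a query engine.

```python
import asyncio
import time


async def fake_tool(name: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for a slow tool call (e.g. a query engine)."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_sequential(names):
    # One call at a time: each call starts only after the previous one returns.
    return [await fake_tool(n) for n in names]


async def run_parallel(names):
    # Independent calls are started together and awaited as a group.
    return await asyncio.gather(*(fake_tool(n) for n in names))


def timed(coro_fn, names):
    """Run a coroutine function to completion and report wall-clock time."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(names))
    return list(results), time.perf_counter() - start


names = ["tool_a", "tool_b", "tool_c"]
seq_results, seq_time = timed(run_sequential, names)
par_results, par_time = timed(run_parallel, names)
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Both versions return the same results in the same order; only the elapsed time differs. `asyncio.run` is intended for a standalone script. Inside a notebook, where an event loop is already running, `await run_parallel(names)` can be called directly instead.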


In [1]:
# Phoenix can display in real time the traces automatically
# collected from your LlamaIndex application.
import phoenix as px

# Look for a URL in the output to open the App in a browser.
px.launch_app()
# The App is initially empty, but as you proceed with the steps below,
# traces will appear automatically as your LlamaIndex application runs.

import llama_index

llama_index.set_global_handler("arize_phoenix")

# Run all of your LlamaIndex applications as usual and traces
# will be collected and displayed in Phoenix.

🌍 To view the Phoenix app in your browser, visit http://127.0.0.1:6006/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix


In [2]:
import nest_asyncio

nest_asyncio.apply()

### Download Llama Pack

Here we download the Llama Pack.

**NOTE**: This only works if `skip_load=True`, because we are loading an entire directory of files instead of just a single file.

Rather than using the pack directly, though, we import some of its underlying modules to show how you can build your own custom agents.

In [3]:
# Option: if developing with the llama_hub package
# from llama_hub.llama_packs.agents.llm_compiler.step import LLMCompilerAgentWorker

# Option: download_llama_pack
from llama_index.llama_pack import download_llama_pack

download_llama_pack(
    "LLMCompilerAgentPack",
    "./agent_pack",
    skip_load=True,
    # leave the below line commented out if using the notebook on main
    # llama_hub_url="https://raw.githubusercontent.com/run-llama/llama-hub/jerry/add_llm_compiler_pack/llama_hub"
)
from agent_pack.step import LLMCompilerAgentWorker

## Test LLMCompiler Agent with Simple Functions

Here we test the LLMCompilerAgent with simple math functions (add, multiply) to illustrate how it works.

In [4]:
from llama_index.llms import OpenAI
from llama_index.tools import FunctionTool

### Define Functions

In [5]:
def multiply(a: int, b: int) -> int:
    """Multiple two integers and returns the result integer"""
    return a * b


multiply_tool = FunctionTool.from_defaults(fn=multiply)


def add(a: int, b: int) -> int:
    """Add two integers and returns the result integer"""
    return a + b


add_tool = FunctionTool.from_defaults(fn=add)

tools = [multiply_tool, add_tool]
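`FunctionTool.from_defaults` infers each tool's argument schema from the function's signature and type hints. The sketch below illustrates that idea with the standard library only. It is an assumption-laden illustration, not how LlamaIndex derives the schema internally: `sketch_schema` and `JSON_TYPES` are hypothetical names, and only a few primitive types are handled.

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python types to JSON Schema type names.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b


def sketch_schema(fn):
    """Build a JSON-Schema-like dict from a function's type hints."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # the return type is not an argument
    params = inspect.signature(fn).parameters
    properties = {
        name: {"title": name.title(), "type": JSON_TYPES[hints[name]]}
        for name in params
    }
    # Parameters without a default value are required.
    required = [
        name for name, p in params.items() if p.default is inspect.Parameter.empty
    ]
    return {
        "title": fn.__name__,
        "type": "object",
        "properties": properties,
        "required": required,
    }


schema = sketch_schema(multiply)
print(schema)
```

The type hints matter because they are what end up in the schema the LLM sees when it decides how to call the tool.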

In [6]:
multiply_tool.metadata.fn_schema_str

"{'title': 'multiply', 'type': 'object', 'properties': {'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}, 'required': ['a', 'b']}"

### Setup LLMCompiler Agent

We import the `LLMCompilerAgentWorker` and combine it with the `AgentRunner`.

In [9]:
from llama_index.agent import AgentRunner

In [10]:
llm = OpenAI(model="gpt-4")

In [11]:
callback_manager = llm.callback_manager

In [12]:
agent_worker = LLMCompilerAgentWorker.from_tools(
    tools, llm=llm, verbose=True, callback_manager=callback_manager
)
agent = AgentRunner(agent_worker, callback_manager=callback_manager)

### Test out some Queries

In [13]:
response = agent.chat("What is (121 * 3) + 42?")

> Running step 7fbd5304-8f67-46f6-882d-9c15ace75d80 for task dbbb4991-5347-4302-a358-6fe5d0705bdc.
> Step count: 0
> Plan: 1. multiply(121, 3)
2. add($1, 42)
3. join()<END_OF_PLAN>
Ran task: multiply. Observation: 363
Ran task: add. Observation: 405
Ran task: join. Observation: None
> Thought: The result of the operation is 405.
> Answer: 405

In [14]:
response

AgentChatResponse(response='405', sources=[], source_nodes=[])

In [15]:
agent.memory.get_all()

[ChatMessage(role=<MessageRole.USER: 'user'>, content='What is (121 * 3) + 42?', additional_kwargs={}),
 ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='405', additional_kwargs={})]
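In the plan printed for the math query, the second step refers to the first step's output as `$1`. The sketch below shows one way such placeholders could be resolved. It is a simplified illustration under stated assumptions, not the pack's actual planner or executor: the tuple-based `PLAN` format and the `resolve` and `run_plan` helpers are hypothetical, the final `join()` step is omitted, and steps run strictly in order, whereas the agent can run steps that do not depend on each other concurrently.

```python
import re
from typing import Callable, Dict, List, Tuple


def multiply(a: int, b: int) -> int:
    return a * b


def add(a: int, b: int) -> int:
    return a + b


TOOLS: Dict[str, Callable[..., int]] = {"multiply": multiply, "add": add}

# Hypothetical, pre-parsed form of the plan: (step id, tool name, raw arguments).
PLAN: List[Tuple[int, str, List[str]]] = [
    (1, "multiply", ["121", "3"]),
    (2, "add", ["$1", "42"]),
]


def resolve(arg: str, results: Dict[int, int]) -> int:
    """Replace a `$N` placeholder with the result of step N, else parse an int."""
    match = re.fullmatch(r"\$(\d+)", arg)
    if match:
        return results[int(match.group(1))]
    return int(arg)


def run_plan(plan, tools) -> Dict[int, int]:
    """Execute steps in order, feeding earlier results into later steps."""
    results: Dict[int, int] = {}
    for step_id, tool_name, raw_args in plan:
        args = [resolve(a, results) for a in raw_args]
        results[step_id] = tools[tool_name](*args)
    return results


results = run_plan(PLAN, TOOLS)
print(results)  # {1: 363, 2: 405}
```

Here step 2 cannot start until step 1 has produced 363, which is why this particular query gains nothing from parallelism. Queries whose steps are independent are where parallel execution helps.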

## Try out LLMCompiler for RAG

Now let's try out the LLMCompiler for RAG use cases. Specifically, we load a dataset of Wikipedia articles about various cities and ask questions over them.

### Setup Data

We use our `WikipediaReader` to load data for various cities.

In [7]:
from llama_index.readers import WikipediaReader

In [8]:
wiki_titles = ["Toronto", "Seattle", "Chicago", "Boston", "Miami"]

In [9]:
city_docs = {}
reader = WikipediaReader()
for wiki_title in wiki_titles:
    docs = reader.load_data(pages=[wiki_title])
    city_docs[wiki_title] = docs

### Setup LLM + Service Context

In [12]:
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.callbacks import CallbackManager

llm = OpenAI(temperature=0, model="gpt-4")
service_context = ServiceContext.from_defaults(llm=llm)
callback_manager = CallbackManager([])

### Define Toolset

In [13]:
from llama_index import load_index_from_storage, StorageContext
from llama_index.node_parser import SentenceSplitter
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index import VectorStoreIndex
import os

node_parser = SentenceSplitter()

# Build one query engine tool per city
query_engine_tools = []

for wiki_title in wiki_titles:
    nodes = node_parser.get_nodes_from_documents(city_docs[wiki_title])

    if not os.path.exists(f"./data/{wiki_title}"):
        # build vector index
        vector_index = VectorStoreIndex(
            nodes, service_context=service_context, callback_manager=callback_manager
        )
        vector_index.storage_context.persist(persist_dir=f"./data/{wiki_title}")
    else:
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=f"./data/{wiki_title}"),
            service_context=service_context,
            callback_manager=callback_manager,
        )
    # define query engines
    vector_query_engine = vector_index.as_query_engine()

    # define tools
    query_engine_tools.append(
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name=f"vector_tool_{wiki_title}",
                description=(
                    "Useful for questions related to specific aspects of"
                    f" {wiki_title} (e.g. the history, arts and culture,"
                    " sports, demographics, or more)."
                ),
            ),
        )
    )

### Setup LLMCompilerAgent

In [15]:
from llama_index.agent import AgentRunner
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-4")
agent_worker = LLMCompilerAgentWorker.from_tools(
    query_engine_tools,
    llm=llm,
    verbose=True,
    callback_manager=callback_manager,
)
agent = AgentRunner(agent_worker, callback_manager=callback_manager)

### Test out Queries

In [16]:
response = agent.chat(
    "Tell me about the demographics of Miami, and compare that with the demographics of Chicago?"
)
print(str(response))

> Running step 3aef03b1-9f79-4581-8859-680e0c589daa for task f76c6082-2f96-4cb9-8f28-b3d377f40874.
> Step count: 0
> Plan: 1. vector_tool_Miami('demographics')
2. vector_tool_Chicago('demographics')
Thought: I can answer the question now.
3. join()<END_OF_PLAN>
Ran task: vector_tool_Miami. Observation: Miami is the largest city in South Florida and the second-largest in Florida, with a population of 442,241 as of the 2020 census. The city is part of the Miami metropolitan area, which has over 6 million residents. Miami's population is diverse, with a significant Hispanic majority. In 2010, 34.4% of city residents were of Cuban origin, and 15.8% had a Central American background. The Non-Hispanic White population of Miami has been growing, making up 14.0% of the population in 2020. The Non-Hispanic Black population has been declining, making up 11.7% of the population in 2020. Other ethnic groups include those of Asian ancestry, who accounted for 1.0% of Miam

In [17]:
response = agent.chat(
    "Is the climate of Chicago or Seattle better during the wintertime?"
)
print(str(response))

> Running step f8fdf4cb-9dde-4aba-996d-edbcee53c4c2 for task 20df27d7-cc27-4311-bb57-b1a6f4ad5799.
> Step count: 0
> Plan: 1. vector_tool_Chicago("climate during wintertime")
2. vector_tool_Seattle("climate during wintertime")
3. join()<END_OF_PLAN>
Ran task: vector_tool_Seattle. Observation: During wintertime, Seattle experiences cool, wet conditions. Extreme cold temperatures, below about 15 °F or -9 °C, are rare due to the moderating influence of the adjacent Puget Sound, the greater Pacific Ocean, and Lake Washington. The city is often cloudy due to frequent storms and lows moving in from the Pacific Ocean, and it has many "rain days". However, the rainfall is often a light drizzle.
Ran task: vector_tool_Chicago. Observation: During wintertime, the city experiences relatively cold and snowy conditions. Blizzards can occur, as they did in winter 2011. The normal winter high from December through March is about 36 °F (2 °C). January and Februa