# LLM Compiler Agent Cookbook

<a href="https://colab.research.google.com/github/run-llama/llama-hub/blob/main/llama_hub/llama_packs/agents/llm_compiler/llm_compiler.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**NOTE**: Full credits to the [source repo for LLMCompiler](https://github.com/SqueezeAILab/LLMCompiler). A lot of our implementation was lifted from this repo (and adapted with LlamaIndex modules).

In this cookbook, we show how to use our LLMCompiler agent implementation in two settings: simple function tools that do arithmetic, and multi-part queries for more advanced RAG use cases over multiple documents.

The LLMCompilerAgent is capable of parallel function calling, so when tool calls are independent it can return results more quickly than sequential execution through ReAct.
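
To make the latency difference concrete, here is a minimal sketch that uses only `asyncio`; it is not the LLMCompiler implementation. The `fake_tool` coroutine and its 0.1-second delay are hypothetical stand-ins for independent tool calls. The point is only that independent calls awaited together take roughly as long as the slowest one, while sequential calls take the sum of all delays.

```python
import asyncio
import time


async def fake_tool(name: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for a tool call that waits on I/O."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_sequential(names):
    # Await each call before starting the next one.
    return [await fake_tool(n) for n in names]


async def run_parallel(names):
    # Start all independent calls at once and wait for all of them.
    return list(await asyncio.gather(*(fake_tool(n) for n in names)))


def timed(coro_fn, names):
    """Run a coroutine function and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(names))
    return results, time.perf_counter() - start


names = ["tool_a", "tool_b", "tool_c"]
seq_results, seq_time = timed(run_sequential, names)
par_results, par_time = timed(run_parallel, names)
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Inside a notebook, `asyncio.run` needs `nest_asyncio.apply()` to have been called first, because the notebook already runs an event loop.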


In [8]:
!pip install phoenix llama_index


In [10]:
pip install --upgrade arize-phoenix



In [12]:
!pip install --upgrade llama-index



In [15]:
!pip install llama-index-callbacks-arize-phoenix


In [71]:
# Phoenix can display, in real time, the traces automatically
# collected from your LlamaIndex application.
import phoenix as px
from llama_index.core import set_global_handler

# Look for a URL in the output to open the App in a browser.
px.launch_app()
# The App is initially empty, but as you proceed with the steps below,
# traces will appear automatically as your LlamaIndex application runs.

set_global_handler("arize_phoenix")

# Run all of your LlamaIndex applications as usual and traces
# will be collected and displayed in Phoenix.



🌍 To view the Phoenix app in your browser, visit https://85g5bih45f5-496ff2e9c6d22116-6006-colab.googleusercontent.com/
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix


In [72]:
import nest_asyncio

nest_asyncio.apply()

### Download Llama Pack

Here we download the Llama Pack.

**NOTE**: This only works if `skip_load=True`, because we are loading an entire directory of files instead of just a single file.

Rather than using the pack directly, we import some of its underlying modules to show how you can build your own custom agents.

In [19]:
!pip install llama-hub


In [23]:
pip install --upgrade llama-index llama-hub




In [73]:
# # Option: if developing with the llama_hub package
# # from llama_hub.llama_packs.agents.llm_compiler.step import LLMCompilerAgentWorker

# # Option: download_llama_pack
# from llama_index.llama_pack import download_llama_pack

# download_llama_pack(
#     "LLMCompilerAgentPack",
#     "./agent_pack",
#     skip_load=True,
#     # leave the below line commented out if using the notebook on main
#     # llama_hub_url="https://raw.githubusercontent.com/run-llama/llama-hub/jerry/add_llm_compiler_pack/llama_hub"
# )
# from agent_pack.step import LLMCompilerAgentWorker

from llama_index.core.llama_pack import download_llama_pack

# download and install dependencies
download_llama_pack("LLMCompilerAgentPack", "./llm_compiler_agent_pack")

## Test LLMCompiler Agent with Simple Functions

Here we test the LLMCompilerAgent with simple math functions (add, multiply) to illustrate how it works.

In [74]:
import json
from typing import Sequence, List

# from llama_index.llms import OpenAI, ChatMessage
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage

# from llama_index.tools import BaseTool, FunctionTool
from llama_index.core.tools import BaseTool, FunctionTool


import nest_asyncio

nest_asyncio.apply()

### Define Functions

In [75]:
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the resulting integer."""
    return a * b


multiply_tool = FunctionTool.from_defaults(fn=multiply)


def add(a: int, b: int) -> int:
    """Add two integers and return the resulting integer."""
    return a + b


add_tool = FunctionTool.from_defaults(fn=add)

tools = [multiply_tool, add_tool]

In [76]:
multiply_tool.metadata.fn_schema_str

'{"properties": {"a": {"title": "A", "type": "integer"}, "b": {"title": "B", "type": "integer"}}, "required": ["a", "b"], "type": "object"}'
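
The schema is returned as a JSON string, so it can be inspected with the standard library. The snippet below is a small sketch that copies the string from the output above instead of calling LlamaIndex; `schema_str` and `param_types` are just local names chosen for illustration.

```python
import json

# Copied verbatim from the cell output above.
schema_str = (
    '{"properties": {"a": {"title": "A", "type": "integer"}, '
    '"b": {"title": "B", "type": "integer"}}, '
    '"required": ["a", "b"], "type": "object"}'
)

schema = json.loads(schema_str)

# Map each parameter name to its declared JSON type.
param_types = {name: spec["type"] for name, spec in schema["properties"].items()}
print(param_types)
print(schema["required"])
```

The integer types come from the `int` annotations on `multiply`, which is why type hints on tool functions matter.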

### Setup LLMCompiler Agent

We import the `LLMCompilerAgentWorker` and combine it with the `AgentRunner`.

In [77]:
# from llama_index.agent import AgentRunner
from llama_index.core.agent.runner.base import AgentRunner

In [133]:
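import os

# Read the OpenAI API key from the environment rather than hard-coding it.
myapi = os.environ.get("OPENAI_API_KEY", "")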

llm = OpenAI(model="gpt-4", api_key=myapi)



In [134]:
callback_manager = llm.callback_manager

In [135]:
from llama_index.packs.agents_llm_compiler.step import LLMCompilerAgentWorker


agent_worker = LLMCompilerAgentWorker.from_tools(
    tools, llm=llm, verbose=True, callback_manager=callback_manager
)
agent = AgentRunner(agent_worker, callback_manager=callback_manager)

### Test out some Queries

In [136]:
response = agent.chat("Calculate (121 * 3) + 42")



> Running step 9e5a4751-d925-4291-a778-49097ea3e751 for task b0c344ba-3663-4b79-a73f-0423eead84cc.
> Step count: 0
> Plan: 1. multiply(121, 3)
2. add($1, 42)
3. join()<END_OF_PLAN>
Ran task: multiply. Observation: 363
Ran task: add. Observation: 405
Ran task: join. Observation: None
> Thought: The result of the calculation is 405.
> Answer: 405

In [82]:
response

AgentChatResponse(response='finish(405)', sources=[], source_nodes=[], is_dummy_stream=False, metadata=None)

In [83]:
agent.memory.get_all()

[ChatMessage(role=<MessageRole.USER: 'user'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='Calculate (121 * 3) + 42')]),
 ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='finish(405)')])]
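
The plan printed above refers to an earlier result with the placeholder `$1`. As an illustration only, the sketch below resolves such placeholders for a hard-coded plan. It is a simplified, sequential stand-in and not the pack's planner or scheduler; the `PLAN` structure, `resolve_args`, and `run_plan` are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Tuple


def multiply(a: int, b: int) -> int:
    return a * b


def add(a: int, b: int) -> int:
    return a + b


TOOLS: Dict[str, Callable[..., Any]] = {"multiply": multiply, "add": add}

# Hypothetical in-memory form of the plan: (step id, tool name, arguments).
# A string argument like "$1" means "the observation from step 1".
PLAN: List[Tuple[int, str, List[Any]]] = [
    (1, "multiply", [121, 3]),
    (2, "add", ["$1", 42]),
]


def resolve_args(args: List[Any], observations: Dict[int, Any]) -> List[Any]:
    """Replace "$N" placeholders with the observation recorded for step N."""
    resolved = []
    for arg in args:
        if isinstance(arg, str) and arg.startswith("$"):
            resolved.append(observations[int(arg[1:])])
        else:
            resolved.append(arg)
    return resolved


def run_plan(plan: List[Tuple[int, str, List[Any]]]) -> Dict[int, Any]:
    """Run steps in order, storing each observation under its step id."""
    observations: Dict[int, Any] = {}
    for step_id, tool_name, args in plan:
        observations[step_id] = TOOLS[tool_name](*resolve_args(args, observations))
    return observations


observations = run_plan(PLAN)
print(observations)
```

Because step 2 depends on step 1, these two steps cannot run in parallel; steps with no placeholder dependencies on each other are the ones that can be dispatched concurrently.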

## Try out LLMCompiler for RAG

Now let's try out the LLMCompiler for RAG use cases. Specifically, we load a dataset of Wikipedia articles about various cities and ask questions over them.

### Setup Data

We use our `WikipediaReader` to load data for various cities.

In [87]:
!pip install llama-index-readers-wikipedia




In [95]:
!pip install wikipedia



In [92]:
from llama_index.readers.wikipedia import WikipediaReader


In [93]:
wiki_titles = ["Toronto", "Seattle", "Chicago", "Boston", "Miami"]

In [96]:
city_docs = {}
reader = WikipediaReader()
for wiki_title in wiki_titles:
    docs = reader.load_data(pages=[wiki_title])
    city_docs[wiki_title] = docs

### Setup LLM + Service Context
For the migration from `ServiceContext` to `Settings`, see the <a href="https://docs.llamaindex.ai/en/stable/module_guides/supporting_modules/service_context_migration/">LlamaIndex docs</a>.

In [137]:
# from llama_index import ServiceContext
# from llama_index.llms import OpenAI
# from llama_index.callbacks import CallbackManager

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.core.callbacks import CallbackManager
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceSplitter

llm = OpenAI(temperature=0, model="gpt-4", api_key=myapi)
# service_context = ServiceContext.from_defaults(llm=llm)
Settings.llm = llm
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small", api_key=myapi)
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)
Settings.num_output = 512
Settings.context_window = 3900


callback_manager = CallbackManager([])

### Define Toolset

In [138]:
# from llama_index.storage.storage_context import load_index_from_storage, StorageContext
from llama_index.core.storage import StorageContext
from llama_index.core.indices.loading import load_index_from_storage

# from llama_index.node_parser import SentenceSplitter
from llama_index.core.node_parser import SentenceSplitter
# from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core import VectorStoreIndex
import os

node_parser = SentenceSplitter()

# Build one query engine tool per city
query_engine_tools = []

for idx, wiki_title in enumerate(wiki_titles):
    nodes = node_parser.get_nodes_from_documents(city_docs[wiki_title])

    if not os.path.exists(f"./data/{wiki_title}"):
        # build vector index
        vector_index = VectorStoreIndex(
            nodes, callback_manager=callback_manager
        )
        vector_index.storage_context.persist(persist_dir=f"./data/{wiki_title}")
    else:
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=f"./data/{wiki_title}"),
            callback_manager=callback_manager,
        )
    # define query engines
    vector_query_engine = vector_index.as_query_engine()

    # define tools
    query_engine_tools.append(
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name=f"vector_tool_{wiki_title}",
                description=(
                    "Useful for questions related to specific aspects of"
                    f" {wiki_title} (e.g. the history, arts and culture,"
                    " sports, demographics, or more)."
                ),
            ),
        )
    )
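
The cell above builds each index only when no persisted copy exists under `./data/<title>`, and otherwise loads it from disk. The same build-or-load pattern, reduced to the standard library, looks roughly like the sketch below. `build_or_load`, `expensive_build`, and the `index.json` file are hypothetical stand-ins for the index construction and storage calls, not LlamaIndex APIs.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict

build_calls = 0


def expensive_build(title: str) -> Dict[str, str]:
    """Hypothetical stand-in for building a vector index."""
    global build_calls
    build_calls += 1
    return {"title": title, "summary": f"index for {title}"}


def build_or_load(
    title: str, root: Path, build: Callable[[str], Dict[str, str]]
) -> Dict[str, str]:
    """Build and persist on the first call; load from disk afterwards."""
    persist_dir = root / title
    artifact = persist_dir / "index.json"
    if not persist_dir.exists():
        persist_dir.mkdir(parents=True)
        result = build(title)
        artifact.write_text(json.dumps(result))
        return result
    return json.loads(artifact.read_text())


with tempfile.TemporaryDirectory() as tmp:
    first = build_or_load("Toronto", Path(tmp), expensive_build)
    second = build_or_load("Toronto", Path(tmp), expensive_build)

print(first == second, build_calls)
```

As in the cell above, the check is on the directory's existence, so an existing directory is reused rather than rebuilt; delete it to force a rebuild.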

### Setup LLMCompilerAgent

In [140]:
# from llama_index.agent import AgentRunner
from llama_index.core.agent.runner.base import AgentRunner
# from llama_index.llms import OpenAI
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4", api_key=myapi)
agent_worker = LLMCompilerAgentWorker.from_tools(
    query_engine_tools,
    llm=llm,
    verbose=True,
    callback_manager=callback_manager,
)
agent = AgentRunner(agent_worker, callback_manager=callback_manager)

### Test out Queries

In [141]:
response = agent.chat(
    "Tell me about the demographics of Miami, and compare that with the demographics of Chicago?"
)
print(str(response))

> Running step 1449a3b2-c56a-4808-b034-e86eae211d5d for task 280484fa-4f20-472a-a202-6f543a8562f2.
> Step count: 0
> Plan: 1. vector_tool_Miami("demographics")
2. vector_tool_Chicago("demographics")
3. join()<END_OF_PLAN>
Ran task: vector_tool_Miami. Observation: Miami is the largest city in South Florida and the second-largest in Florida, with a population of over 442,241 as of the 2020 census. It is part of the Miami metropolitan area, which has over 6 million residents. Miami's population has seen various shifts over the years. In the first half of the 20th century, the city experienced rapid growth, but this slowed down in the latter half of the century. However, in the 2000s and 2010s, the population began to grow quickly again, particularly in areas like Downtown Miami, Edgewater, and Brickell.

In terms of ethnic composition, Miami's population was reported as 45.3% Hispanic, 32.9% non-Hispanic White, and 22.7% Black in 1970. Over the years, the city 

In [142]:
response = agent.chat(
    "Is the climate of Chicago or Seattle better during the wintertime?"
)
print(str(response))

> Running step 06dbb0e0-b928-44b1-ae7c-315bbfa054e5 for task 7d615852-5b3d-4d8b-a63f-bc40c9a05323.
> Step count: 0
> Plan: 1. vector_tool_Chicago("climate during wintertime")
2. vector_tool_Seattle("climate during wintertime")
3. join()<END_OF_PLAN>
Ran task: vector_tool_Seattle. Observation: During wintertime, Seattle experiences cool, wet weather. Extreme cold temperatures, below about 15 °F or -9 °C, are rare due to the moderating influence of the adjacent Puget Sound, the greater Pacific Ocean, and Lake Washington. The city is often cloudy due to frequent storms and lows moving in from the Pacific Ocean.
Ran task: vector_tool_Chicago. Observation: During wintertime, the city experiences relatively cold and snowy conditions. Blizzards can occur, as they did in winter 2011. There are many sunny but cold days. The normal winter high from December through March is about 36 °F (2 °C). January and February are the coldest months. A polar vortex in