# Query Rewriting RAG using Llama Agents

In this notebook, we set up two agent services:

1. A query rewriting service
2. A RAG service

Both of these services will be chained together in a simple constrained flow using our Pipeline Orchestrator.

After testing our `llama-agents` system, we then detail how to deploy it as a local set of servers.

In [None]:
import nest_asyncio

nest_asyncio.apply()

import os

os.environ["OPENAI_API_KEY"] = "sk-proj-..."
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."

## Load Data

First, we load our data and parse it with LlamaParse.

If you don't have an API key, you can get one for free at [https://cloud.llamaindex.ai](https://cloud.llamaindex.ai).

In [None]:
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'

In [None]:
from llama_parse import LlamaParse

parser = LlamaParse(result_type="text")
docs = parser.load_data("data/10k/uber_2021.pdf")

Started parsing the file under job_id cac11eca-71fd-4456-93a1-1e35a71a8bcb


Next, we index our data and cache the index to disk.

In [None]:
import os
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

if not os.path.exists("storage"):
    index = VectorStoreIndex.from_documents(docs)
    # save index to disk
    index.set_index_id("vector_index")
    index.storage_context.persist("./storage")
else:
    # rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir="storage")
    # load index
    index = load_index_from_storage(storage_context, index_id="vector_index")

## Setup Agents

We define two custom agents:
- a query rewrite agent that rewrites the incoming query using a HyDE prompt
- a RAG agent that retrieves nodes for a query string and synthesizes a response from them

The agents are defined using the `FnAgentWorker`. The requirement here is to pass in a function that takes a state dict, performs some operation, and returns the modified state along with a boolean (`is_done`) indicating whether the agent has finished or needs another reasoning loop.

The state has two special keys:
- `__task__` -- this contains the original input to the agent
- `__output__` -- once `is_done` is `True`, this key should hold the final result
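
To make the contract concrete, here is a minimal plain-Python sketch of a driver loop that calls a step function until it reports `is_done`. It is only an illustration of the state-in, `(state, is_done)`-out shape described above, not how `FnAgentWorker` is implemented. The names `run_until_done` and `count_up_fn` are hypothetical, and `__task__` is a plain string here rather than the task object the real worker provides.

```python
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]
StepFn = Callable[[State], Tuple[State, bool]]


def run_until_done(fn: StepFn, state: State, max_steps: int = 10) -> Any:
    """Call fn repeatedly until it reports is_done, then return state['__output__']."""
    for _ in range(max_steps):
        state, is_done = fn(state)
        if is_done:
            return state["__output__"]
    raise RuntimeError("step function did not finish within max_steps")


def count_up_fn(state: State) -> Tuple[State, bool]:
    """Toy step function: increment a counter until it reaches state['target']."""
    state["count"] = state.get("count", 0) + 1
    is_done = state["count"] >= state["target"]
    if is_done:
        # The final result is written to the special output key.
        state["__output__"] = (
            f"{state['__task__']} finished after {state['count']} steps"
        )
    return state, is_done


# Hypothetical initial state; "__task__" is a plain string in this sketch.
result = run_until_done(count_up_fn, {"__task__": "demo", "target": 3})
print(result)
```

The agents in this notebook always return `is_done = True` on the first call, so they correspond to the single-step case of this loop.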

In [None]:
# define the HyDE query rewrite agent

from llama_index.core.agent import FnAgentWorker
from llama_index.core import PromptTemplate
from llama_index.core.query_pipeline import QueryPipeline
from llama_index.llms.openai import OpenAI
from typing import Any, Dict, Tuple

OPENAI_LLM = OpenAI(model="gpt-4o")

# use HyDE to hallucinate answer.
HYDE_PROMPT_STR = (
    "Please write a passage to answer the question\n"
    "Try to include as many key details as possible.\n"
    "\n"
    "\n"
    "{query_str}\n"
    "\n"
    "\n"
    'Passage:"""\n'
)
HYDE_PROMPT_TMPL = PromptTemplate(HYDE_PROMPT_STR)


def run_hyde_fn(state: Dict[str, Any]) -> Tuple[Dict[str, Any], bool]:
    """Run HyDE."""
    prompt_tmpl, llm, input_str = (
        state["prompt_tmpl"],
        state["llm"],
        state["__task__"].input,
    )
    qp = QueryPipeline(chain=[prompt_tmpl, llm])
    output = qp.run(query_str=input_str)

    state["__output__"] = str(output)

    # return state dictionary and also if agent is finished
    # for now, we don't loop, and just return True for is_done
    is_done = True
    return state, is_done

In [None]:
hyde_agent = FnAgentWorker(
    fn=run_hyde_fn, initial_state={"prompt_tmpl": HYDE_PROMPT_TMPL, "llm": OPENAI_LLM}
).as_agent()
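
To see what the LLM actually receives, the prompt can be rendered by hand. The sketch below copies the HyDE prompt string from above and fills it with plain `str.format`, assuming that behaves the same as the template substitution for this single `{query_str}` variable. The helper `render_hyde_prompt` is a hypothetical name used only for illustration.

```python
# Same prompt text as the notebook's HyDE prompt string.
HYDE_PROMPT_STR = (
    "Please write a passage to answer the question\n"
    "Try to include as many key details as possible.\n"
    "\n"
    "\n"
    "{query_str}\n"
    "\n"
    "\n"
    'Passage:"""\n'
)


def render_hyde_prompt(query_str: str) -> str:
    """Fill the single {query_str} placeholder with the user's question."""
    return HYDE_PROMPT_STR.format(query_str=query_str)


rendered = render_hyde_prompt("What are the risk factors for Uber?")
print(rendered)
```

The model's completion of this prompt, a hypothetical answer passage, is what the query rewrite agent stores in `__output__`.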

Next, we define a similar agent to perform RAG:

In [None]:
# define RAG agent
from llama_index.core.query_engine import RetrieverQueryEngine


def run_rag_fn(state: Dict[str, Any]) -> Tuple[Dict[str, Any], bool]:
    """Run RAG."""
    retriever, llm, input_str = (
        state["retriever"],
        state["llm"],
        state["__task__"].input,
    )
    query_engine = RetrieverQueryEngine.from_args(retriever, llm=llm)
    response = query_engine.query(input_str)
    state["__output__"] = str(response)

    is_done = True
    return state, is_done


rag_agent = FnAgentWorker(
    fn=run_rag_fn, initial_state={"retriever": index.as_retriever(), "llm": OPENAI_LLM}
).as_agent()

## Setup Agent Services

Now, we are ready to build our `llama-agents` system. This includes:
- An `AgentService` for each agent
- A `PipelineOrchestrator` defining the overall flow of tasks through the system
- A `SimpleMessageQueue` to facilitate message passing and communication
- A `ControlPlaneServer` to act as the main control plane for the system

In [None]:
from llama_agents import (
    AgentService,
    ControlPlaneServer,
    SimpleMessageQueue,
    PipelineOrchestrator,
    ServiceComponent,
)

from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
message_queue = SimpleMessageQueue()

## Define Agent Services
query_rewrite_server = AgentService(
    agent=hyde_agent,
    message_queue=message_queue,
    description="Used to rewrite queries",
    service_name="query_rewrite_agent",
)
query_rewrite_server_c = ServiceComponent.from_service_definition(query_rewrite_server)

rag_agent_server = AgentService(
    agent=rag_agent, message_queue=message_queue, description="rag_agent"
)
rag_agent_server_c = ServiceComponent.from_service_definition(rag_agent_server)

# create our multi-agent framework components and orchestrator
pipeline = QueryPipeline(chain=[query_rewrite_server_c, rag_agent_server_c])
pipeline_orchestrator = PipelineOrchestrator(pipeline)

control_plane = ControlPlaneServer(
    message_queue=message_queue,
    orchestrator=pipeline_orchestrator,
)
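
Conceptually, the chained pipeline passes each stage's output to the next stage as its input. The following plain-Python sketch shows only that data flow; it does not use or reproduce `QueryPipeline` or `PipelineOrchestrator`. The functions `run_chain`, `fake_query_rewrite`, and `fake_rag` are hypothetical stand-ins for the real services.

```python
from typing import Callable, List

Stage = Callable[[str], str]


def run_chain(stages: List[Stage], task_input: str) -> str:
    """Feed the input through each stage in order, passing outputs forward."""
    value = task_input
    for stage in stages:
        value = stage(value)
    return value


def fake_query_rewrite(query: str) -> str:
    """Stand-in for the query rewrite service (no LLM call)."""
    return f"[hypothetical passage about: {query}]"


def fake_rag(query: str) -> str:
    """Stand-in for the RAG service (no retrieval or LLM call)."""
    return f"answer grounded in documents retrieved for {query}"


final = run_chain(
    [fake_query_rewrite, fake_rag], "What are the risk factors for Uber?"
)
print(final)
```

The point of the sketch is that the RAG stage never sees the original question directly; it sees whatever the rewrite stage produced.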

## Launch agent 

Using a `LocalLauncher`, we can simulate single passes of tasks through our `llama-agents` system.

In [None]:
from llama_agents.launchers import LocalLauncher

## Define Launcher
launcher = LocalLauncher(
    [query_rewrite_server, rag_agent_server],
    control_plane,
    message_queue,
)

In [None]:
query_str = "What are the risk factors for Uber?"
result = launcher.launch_single(query_str)

In [None]:
print(result)

Uber, as a leading ride-sharing company, faces a multitude of risk factors that could impact its operations, financial performance, and overall market position. These risks can be broadly categorized into regulatory, operational, financial, competitive, and reputational risks.

**Regulatory Risks:** Uber operates in a highly regulated industry, and changes in laws and regulations can significantly affect its business model. Different countries and cities have varying regulations regarding ride-sharing services, including licensing requirements, fare controls, and driver background checks. Compliance with these regulations can be costly and complex. Additionally, there is always the risk of new regulations being introduced that could limit Uber's ability to operate or increase its operational costs.

**Operational Risks:** Uber relies heavily on its technology platform to connect drivers with passengers. Any technical failures, cybersecurity breaches, or data privacy issues could disrup

In [None]:
query_str = "What was Uber's revenue growth in 2021?"
result = launcher.launch_single(query_str)

In [None]:
print(result)

In 2021, Uber Technologies Inc. experienced significant revenue growth, reflecting a strong recovery from the pandemic-induced downturn of the previous year. The company's total revenue for the year amounted to $17.5 billion, marking a substantial increase from the $11.1 billion reported in 2020. This impressive growth was driven by a resurgence in demand for ride-hailing services as vaccination rates increased and economies reopened, coupled with the continued expansion of Uber Eats, the company's food delivery segment. Uber's mobility segment, which includes ride-hailing, saw a notable rebound, while the delivery segment maintained its momentum, contributing to the overall revenue surge. Additionally, Uber Freight, the company's logistics arm, also played a role in bolstering revenue. The combination of these factors resulted in a year-over-year revenue growth of approximately 57%, underscoring Uber's resilience and adaptability in a challenging market environment.


## Launch as a Service

With our `llama-agents` system tested and working, we can launch it as a service and interact with it using the `llama-agents monitor`.

**NOTE:** This code is best launched from a separate Python script, outside of a notebook.

Also note that when launching as a server, we explicitly add a consumer for "human" messages, since this is where final results are published by default.
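
The role of that extra consumer can be pictured with a toy queue that routes messages by type: a result published to a type with no registered consumer has nowhere to go. This is a minimal sketch under that assumption only. `ToyMessageQueue` and its methods are hypothetical and do not describe how `SimpleMessageQueue` handles messages internally.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple

Handler = Callable[[Any], None]


class ToyMessageQueue:
    """Toy in-memory queue that routes messages to handlers by message type."""

    def __init__(self) -> None:
        self._consumers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.undelivered: List[Tuple[str, Any]] = []

    def register_consumer(self, message_type: str, handler: Handler) -> None:
        self._consumers[message_type].append(handler)

    def publish(self, message_type: str, data: Any) -> None:
        handlers = self._consumers.get(message_type, [])
        if not handlers:
            # In this toy, unrouted messages are simply recorded.
            self.undelivered.append((message_type, data))
            return
        for handler in handlers:
            handler(data)


queue = ToyMessageQueue()
queue.publish("human", "result nobody hears")  # no "human" consumer yet

received: List[Any] = []
queue.register_consumer("human", received.append)
queue.publish("human", "final answer")

print(received)
print(queue.undelivered)
```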

Python Code in `app.py`:
```python

######  <setup custom FnAgentWorkers, pipelines>  ######

from llama_agents import (
    AgentService,
    CallableMessageConsumer,
    ControlPlaneServer,
    QueueMessage,
    SimpleMessageQueue,
    PipelineOrchestrator,
    ServiceComponent,
)

from llama_index.core.query_pipeline import QueryPipeline
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
message_queue = SimpleMessageQueue(
    host="127.0.0.1",
    port=8010,
)

## Define Agent Services
query_rewrite_server = AgentService(
    agent=hyde_agent,
    message_queue=message_queue,
    description="Used to rewrite queries",
    service_name="query_rewrite_agent",
    host="127.0.0.1",
    port=8011,
)
query_rewrite_server_c = ServiceComponent.from_service_definition(query_rewrite_server)

rag_agent_server = AgentService(
    agent=rag_agent,
    message_queue=message_queue,
    description="rag_agent",
    host="127.0.0.1",
    port=8012,
)
rag_agent_server_c = ServiceComponent.from_service_definition(rag_agent_server)

# create our multi-agent framework components
pipeline = QueryPipeline(chain=[query_rewrite_server_c, rag_agent_server_c])
pipeline_orchestrator = PipelineOrchestrator(pipeline)
control_plane = ControlPlaneServer(
    message_queue=message_queue,
    orchestrator=pipeline_orchestrator,
    host="127.0.0.1",
    port=8013,
)

# Additional human consumer
def handle_result(message: QueueMessage) -> None:
    print("Got result:", message.data)

human_consumer = CallableMessageConsumer(
    handler=handle_result, message_type="human"
)

from llama_agents.launchers import ServerLauncher

## Define Launcher
launcher = ServerLauncher(
    [query_rewrite_server, rag_agent_server],
    control_plane,
    message_queue,
    additional_consumers=[human_consumer],
)

launcher.launch_servers()
```

Launch the app:
```bash
python ./app.py
```

In another terminal, launch the Monitor:
```bash
llama-agents monitor --control-plane-url http://127.0.0.1:8013
```

Now, we can see our monitor, send in new tasks/queries, and observe the states of services.