# Query Rewriting RAG using Llama Agents

In this notebook, we set up two services from query components:

1. A query rewriting service
2. A RAG service

Both services are chained together in a simple, constrained flow using the `PipelineOrchestrator`.

After testing our `llama-agents` system, we then detail how to deploy it as a local set of servers.

In [None]:
import nest_asyncio

nest_asyncio.apply()

In [None]:
import os

os.environ["OPENAI_API_KEY"] = "sk-proj-..."
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."

## Load Data

First, we load our data and parse it with LlamaParse.

If you don't have an API key, you can get one for free at [https://cloud.llamaindex.ai](https://cloud.llamaindex.ai).

In [None]:
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'

In [None]:
from llama_parse import LlamaParse

parser = LlamaParse(result_type="text")
docs = parser.load_data("data/10k/uber_2021.pdf")

Started parsing the file under job_id cac11eca-f530-4251-a839-f528bb42b029


Next, we index our data and cache the index to disk.

In [None]:
import os
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

if not os.path.exists("storage"):
    index = VectorStoreIndex.from_documents(docs)
    # save index to disk
    index.set_index_id("vector_index")
    index.storage_context.persist("./storage")
else:
    # rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir="storage")
    # load index
    index = load_index_from_storage(storage_context, index_id="vector_index")
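
The build-once, load-afterwards pattern used in the cell above can be sketched without any LlamaIndex dependency. This is only an illustration: `build_index`, `load_or_build`, and the `index.json` file name are hypothetical stand-ins, not part of `llama_index`.

```python
import json
import os
import tempfile


def build_index(docs):
    # Stand-in for an expensive indexing step (hypothetical; not VectorStoreIndex).
    return {str(i): text.lower().split() for i, text in enumerate(docs)}


def load_or_build(persist_dir, docs):
    """Load a cached index if one exists, otherwise build it and persist it."""
    path = os.path.join(persist_dir, "index.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f), "loaded"
    os.makedirs(persist_dir, exist_ok=True)
    index = build_index(docs)
    with open(path, "w") as f:
        json.dump(index, f)
    return index, "built"


with tempfile.TemporaryDirectory() as tmp:
    storage = os.path.join(tmp, "storage")
    first, first_status = load_or_build(storage, ["Some document text"])
    second, second_status = load_or_build(storage, ["Some document text"])
    print(first_status, second_status)
```

One consequence of this pattern is that a stale cache is reused silently: if the source documents change, the `storage` directory has to be deleted for the index to be rebuilt.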

## Setup Agents

We define two custom components:
- a query rewrite component that rewrites the query using a HyDE prompt
- a RAG component that retrieves nodes based on the rewritten query string and synthesizes an answer

In this notebook, each component is a plain Python function wrapped in an `FnComponent`.

Agents can alternatively be defined using the `FnAgentWorker` -- the requirement there is to pass in a function that takes a state dict, performs some operation, and returns the modified state and a boolean indicating whether the agent is done.

The state has two special keys:
- `__task__` -- this contains the original input to the agent
- `__output__` -- once `is_done=True`, the output should hold the final result
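
The state-dict contract described above can be sketched in plain Python. This is a simplified illustration, not the `FnAgentWorker` implementation: the driver `run_until_done` is hypothetical, and `__task__` is assumed to be a plain string here.

```python
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]


def count_to_three(state: State) -> Tuple[State, bool]:
    """Toy step function: takes the state, updates it, returns (state, is_done)."""
    state["count"] = state.get("count", 0) + 1
    is_done = state["count"] >= 3
    if is_done:
        # The final result goes under the special `__output__` key.
        state["__output__"] = (
            f"{state['__task__']} finished after {state['count']} steps"
        )
    return state, is_done


def run_until_done(fn: Callable[[State], Tuple[State, bool]], task_input: str) -> str:
    # Hypothetical driver loop: keep calling the function until it reports done.
    state: State = {"__task__": task_input}
    is_done = False
    while not is_done:
        state, is_done = fn(state)
    return state["__output__"]


loop_output = run_until_done(count_to_three, "demo task")
print(loop_output)
```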

In [None]:
# define query rewrite (HyDE) component

from llama_index.core import PromptTemplate
from llama_index.core.query_pipeline import QueryPipeline, FnComponent, Link
from llama_index.llms.openai import OpenAI
from typing import Any, Dict, Tuple

OPENAI_LLM = OpenAI(model="gpt-4o")

# use HyDE to hallucinate answer.
HYDE_PROMPT_STR = (
    "Please write a passage to answer the question\n"
    "Try to include as many key details as possible.\n"
    "\n"
    "\n"
    "{query_str}\n"
    "\n"
    "\n"
    'Passage:"""\n'
)
HYDE_PROMPT_TMPL = PromptTemplate(HYDE_PROMPT_STR)


def run_hyde(input_str: str) -> str:
    """Run HyDE prompt."""
    qp = QueryPipeline(chain=[HYDE_PROMPT_TMPL, OPENAI_LLM])
    output = qp.run(query_str=input_str)
    return str(output)

In [None]:
hyde_component = FnComponent(fn=run_hyde)
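
To make the two steps of the HyDE chain concrete (fill the template, then ask the model for a passage), here is a dependency-free sketch. `fake_llm` is a hypothetical stand-in that makes no model call, and `HYDE_PROMPT_SKETCH` simply repeats the prompt string from above.

```python
HYDE_PROMPT_SKETCH = (
    "Please write a passage to answer the question\n"
    "Try to include as many key details as possible.\n"
    "\n"
    "\n"
    "{query_str}\n"
    "\n"
    "\n"
    'Passage:"""\n'
)


def build_hyde_prompt(query_str: str) -> str:
    # Step 1: fill the template with the user's question.
    return HYDE_PROMPT_SKETCH.format(query_str=query_str)


def fake_llm(prompt: str) -> str:
    # Step 2 stand-in (hypothetical): a real LLM would write a plausible passage.
    question = prompt.split("\n\n\n")[1].strip()
    return f"[hypothetical passage answering: {question}]"


hyde_prompt = build_hyde_prompt("What was Uber's revenue growth in 2021?")
hypothetical = fake_llm(hyde_prompt)
print(hyde_prompt)
print(hypothetical)
```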

Next, we define a similar agent to perform RAG:

In [None]:
# define RAG agent
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import TreeSummarize

retriever = index.as_retriever()
llm = OPENAI_LLM
summarizer = TreeSummarize(llm=llm)


def run_rag_fn(hyde_answer_str: str, input_str: str) -> str:
    """Run RAG."""
    retrieved_nodes = retriever.retrieve(hyde_answer_str)
    response = summarizer.synthesize(input_str, retrieved_nodes)
    return str(response)


rag_component = FnComponent(fn=run_rag_fn)
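
Note that `run_rag_fn` retrieves with the hypothetical passage but synthesizes against the original question. The toy sketch below suggests why that split can help. It uses word overlap instead of embeddings, and the corpus, `overlap_score`, `retrieve`, and `toy_rag` are all made up for illustration.

```python
import re
from typing import List, Set

# Toy corpus (made-up sentences, not taken from the 10-K).
CHUNKS = [
    "The office lease expires at the end of the decade.",
    "Total revenue grew compared with the prior year.",
]


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query_text: str, chunk: str) -> int:
    # Number of distinct words shared by the query text and the chunk.
    return len(tokenize(query_text) & tokenize(chunk))


def retrieve(query_text: str, chunks: List[str], top_k: int = 1) -> List[str]:
    ranked = sorted(chunks, key=lambda c: overlap_score(query_text, c), reverse=True)
    return ranked[:top_k]


def toy_rag(hyde_answer_str: str, input_str: str) -> str:
    # Retrieve with the rewritten text, but answer the original question.
    nodes = retrieve(hyde_answer_str, CHUNKS)
    return f"Question: {input_str}\nContext: {nodes[0]}"


question = "How fast did sales increase?"
hypothetical_passage = "Sales increased because revenue grew strongly over the prior year."
toy_answer = toy_rag(hypothetical_passage, question)
print(toy_answer)
```

In this toy setup the short question shares no words with the relevant chunk, while the longer hypothetical passage does, so retrieval only succeeds with the rewrite.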

## Setup Agent Services

Now, we are ready to build our `llama-agents` system. This includes:
- A `ComponentService` for each component
- A `PipelineOrchestrator` that defines the overall flow of tasks through the system
- A `SimpleMessageQueue` to facilitate message passing and communication
- A `ControlPlaneServer` to act as the main control plane for the system

In [None]:
from llama_agents import (
    AgentService,
    ControlPlaneServer,
    SimpleMessageQueue,
    PipelineOrchestrator,
    ServiceComponent,
    ComponentService,
)

from llama_index.llms.openai import OpenAI
from llama_index.core.query_pipeline import Link, InputComponent

llm = OpenAI(model="gpt-3.5-turbo")
message_queue = SimpleMessageQueue()

## Define Agent Services
query_rewrite_server = ComponentService(
    component=hyde_component,
    message_queue=message_queue,
    description="Used to rewrite queries",
    service_name="query_rewrite_component",
)

query_rewrite_server_c = ServiceComponent.from_component_service(query_rewrite_server)

rag_server = ComponentService(
    component=rag_component, message_queue=message_queue, description="rag_agent"
)
rag_server_c = ServiceComponent.from_component_service(rag_server)

# TODO: make more seamless from local

pipeline = QueryPipeline(
    module_dict={
        "input": InputComponent(),
        "query_rewrite_server_c": query_rewrite_server_c,
        "rag_server_c": rag_server_c,
    }
)
pipeline.add_links(
    [
        Link("input", "query_rewrite_server_c"),
        Link("input", "rag_server_c", dest_key="input_str"),
        Link("query_rewrite_server_c", "rag_server_c", dest_key="hyde_answer_str"),
    ]
)

pipeline_orchestrator = PipelineOrchestrator(pipeline)

control_plane = ControlPlaneServer(
    message_queue=message_queue,
    orchestrator=pipeline_orchestrator,
)
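
The three links above form a small DAG: the input feeds both services, and the rewrite output feeds the RAG service under a named key. The following toy scheduler sketches that wiring in plain Python. It is not how `QueryPipeline` is implemented, and `LINKS`, `MODULES`, `ORDER`, and `run_pipeline` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (source, destination, dest_key); dest_key=None means "the destination's only input".
LINKS: List[Tuple[str, str, Optional[str]]] = [
    ("input", "rewrite", None),
    ("input", "rag", "input_str"),
    ("rewrite", "rag", "hyde_answer_str"),
]


def rewrite(input_str: str) -> str:
    return f"passage about: {input_str}"


def rag(hyde_answer_str: str, input_str: str) -> str:
    return f"Q={input_str} | retrieved_with={hyde_answer_str}"


MODULES: Dict[str, Callable[..., str]] = {"rewrite": rewrite, "rag": rag}
ORDER = ["rewrite", "rag"]  # a valid topological order for the links above


def run_pipeline(user_input: str) -> str:
    outputs = {"input": user_input}
    for name in ORDER:
        positional, keyword = [], {}
        for src, dst, dest_key in LINKS:
            if dst != name:
                continue
            if dest_key is None:
                positional.append(outputs[src])
            else:
                keyword[dest_key] = outputs[src]
        outputs[name] = MODULES[name](*positional, **keyword)
    return outputs[ORDER[-1]]


pipeline_output = run_pipeline("hi")
print(pipeline_output)
```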

In [None]:
query_rewrite_server_c.input_keys

InputKeys(required_keys={'input_str'}, optional_keys=set())
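
The required key `input_str` matches the single parameter of the wrapped function. One plausible way to derive such keys is to inspect the function signature, as sketched below. This is an assumption about the general idea, not a description of the library's code, and `required_and_optional` is a hypothetical helper.

```python
import inspect
from typing import Callable, Set, Tuple


def required_and_optional(fn: Callable) -> Tuple[Set[str], Set[str]]:
    """Split a function's parameters into those without and with defaults."""
    required, optional = set(), set()
    for name, param in inspect.signature(fn).parameters.items():
        if param.default is inspect.Parameter.empty:
            required.add(name)
        else:
            optional.add(name)
    return required, optional


def rewrite_like(input_str: str) -> str:
    return input_str


def rag_like(hyde_answer_str: str, input_str: str, top_k: int = 2) -> str:
    return input_str


print(required_and_optional(rewrite_like))
print(required_and_optional(rag_like))
```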

## Launch agent 

Using a `LocalLauncher`, we can simulate single passes of tasks through our `llama-agents` system.

In [None]:
from llama_agents.launchers import LocalLauncher

## Define Launcher
launcher = LocalLauncher(
    [query_rewrite_server, rag_server],
    control_plane,
    message_queue,
)

In [None]:
query_str = "What are the risk factors for Uber?"
result = launcher.launch_single(query_str)

In [None]:
print(result)

The risk factors for Uber include:

1. Intense competition in the delivery and freight sectors.
2. Complex and evolving legal and regulatory environment.
3. Differing and sometimes conflicting laws and regulations across jurisdictions.
4. Regulatory scrutiny and licensing requirements in various regions.
5. Challenges in retaining and attracting high-quality personnel.
6. Potential security or data privacy breaches.
7. Cyberattacks that could harm reputation and operations.
8. Climate change risks and commitments.
9. Dependence on third parties for platform distribution and software.
10. Need for additional capital to support business growth.
11. Risks associated with identifying, acquiring, and integrating suitable businesses.
12. Potential limitations or modifications required in certain jurisdictions.
13. Extensive government regulation and oversight related to payment and financial services.
14. Risks related to data processing and privacy practices.
15. Intellectual property prote

In [None]:
query_str = "What was Uber's revenue growth in 2021?"
result = launcher.launch_single(query_str)

In [None]:
print(result)

Uber's revenue growth in 2021 was 57% year-over-year.


## Launch as a Service

With our `llama-agents` system tested and working, we can launch it as a service and interact with it using the `llama-agents monitor`.

**NOTE:** This code is best launched from a separate Python script, outside of a notebook.

Also note that when launching as a server, we explicitly add a consumer for "human" messages, since this is where final results are published by default.
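
The role of that extra consumer can be illustrated with a toy publish/subscribe queue: a message published under a type with no registered consumer is simply never delivered. `ToyQueue` below is a hypothetical stand-in and does not mirror the `SimpleMessageQueue` API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyQueue:
    """Minimal in-process queue that routes messages by message type."""

    def __init__(self) -> None:
        self.consumers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def register(self, message_type: str, handler: Callable[[str], None]) -> None:
        self.consumers[message_type].append(handler)

    def publish(self, message_type: str, data: str) -> int:
        # Deliver to every consumer of this type; return how many received it.
        handlers = self.consumers[message_type]
        for handler in handlers:
            handler(data)
        return len(handlers)


queue = ToyQueue()
undelivered = queue.publish("human", "final answer")  # no consumer registered yet

received: List[str] = []
queue.register("human", received.append)
delivered = queue.publish("human", "final answer")
print(undelivered, delivered, received)
```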

Python Code in `app.py`:
```python

######  <set up hyde_component and rag_component as shown above>  ######

from llama_agents import (
    AgentService,
    ControlPlaneServer,
    SimpleMessageQueue,
    PipelineOrchestrator,
    ServiceComponent,
    ComponentService
)

from llama_index.llms.openai import OpenAI
from llama_index.core.query_pipeline import QueryPipeline, Link, InputComponent

llm = OpenAI(model="gpt-3.5-turbo")
message_queue = SimpleMessageQueue()

## Define Agent Services
query_rewrite_server = ComponentService(
    component=hyde_component,
    message_queue=message_queue,
    description="Used to rewrite queries",
    service_name="query_rewrite_component",
)

query_rewrite_server_c = ServiceComponent.from_component_service(query_rewrite_server)

rag_server = ComponentService(
    component=rag_component, message_queue=message_queue, description="rag_agent"
)
rag_server_c = ServiceComponent.from_component_service(rag_server)

# TODO: make more seamless from local

pipeline = QueryPipeline(
    module_dict={
        "input": InputComponent(),
        "query_rewrite_server_c": query_rewrite_server_c,
        "rag_server_c": rag_server_c
    }
)
pipeline.add_links([
    Link("input", "query_rewrite_server_c"),
    Link("input", "rag_server_c", dest_key="input_str"),
    Link("query_rewrite_server_c", "rag_server_c", dest_key="hyde_answer_str"),
])

pipeline_orchestrator = PipelineOrchestrator(pipeline)

control_plane = ControlPlaneServer(
    message_queue=message_queue,
    orchestrator=pipeline_orchestrator,
)

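from llama_agents import CallableMessageConsumer
from llama_agents.launchers import ServerLauncher


# Consumer for "human" messages, where final results are published by default.
def handle_result(message) -> None:
    print("Got result:", message.data)


human_consumer = CallableMessageConsumer(
    handler=handle_result, message_type="human"
)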

## Define Launcher
launcher = ServerLauncher(
    [query_rewrite_server, rag_server],
    control_plane,
    message_queue,
    additional_consumers=[human_consumer],
)

launcher.launch_servers()
```

Launch the app:
```bash
python ./app.py
```

In another terminal, launch the Monitor:
```bash
llama-agents monitor --control-plane-url http://127.0.0.1:8013
```

Or, you can skip the monitor and use our client:

```python
from llama_agents import LlamaAgentsClient, AsyncLlamaAgentsClient

client = LlamaAgentsClient("http://127.0.0.1:8013")
task_id = client.create_task("What are the risk factors for Uber?")
# <Wait a few seconds>
# returns TaskResult or None if not finished
result = client.get_task_result(task_id)
```
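
Because `get_task_result` returns `None` until the task has finished, a small polling helper can replace the manual wait. The sketch below is one way to do it. `wait_for_result` and `FakeClient` are hypothetical, and the fake client only exists so that the example runs without a server.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for_result(
    fetch: Callable[[], Optional[T]],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
) -> Optional[T]:
    """Call `fetch` until it returns a non-None value or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)


class FakeClient:
    """Stand-in client whose result becomes available on the Nth call."""

    def __init__(self, ready_after: int) -> None:
        self.ready_after = ready_after
        self.calls = 0

    def get_task_result(self, task_id: str) -> Optional[str]:
        self.calls += 1
        return "done" if self.calls >= self.ready_after else None


fake = FakeClient(ready_after=3)
polled = wait_for_result(
    lambda: fake.get_task_result("task-1"), timeout_s=2.0, interval_s=0.01
)
timed_out = wait_for_result(lambda: None, timeout_s=0.05, interval_s=0.01)
print(polled, timed_out)
```

With the real client, the same helper could be called as `wait_for_result(lambda: client.get_task_result(task_id))`.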