# Query Pipelines in LlamaIndex

LlamaIndex provides a declarative query API that allows you to chain together different modules in order to orchestrate simple-to-advanced workflows over your data.

This is centered around the QueryPipeline abstraction. Load in a variety of modules (LLMs, prompts, retrievers, or other pipelines), connect them into a sequential chain or a DAG, and run the result end to end.

So what are the advantages of QueryPipeline?

- Express common workflows with fewer lines of code/boilerplate
- Greater readability
- Greater parity / better integration points with common low-code / no-code solutions (e.g. LangFlow)
- [In the future] A declarative interface allows easy serializability of pipeline components, providing portability of pipelines/easier deployment to different systems.

### Importing the libraries

In [4]:
from llama_index.query_pipeline.query import QueryPipeline
from llama_index.llms import OpenAI
from llama_index.prompts import PromptTemplate
from llama_index.storage import StorageContext
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
    SimpleDirectoryReader,
    load_index_from_storage,
)

### Load the API keys

In [6]:
from dotenv import load_dotenv

# Load the environment variables
load_dotenv()

True

### Load the data

In [2]:
reader = SimpleDirectoryReader(
    input_files=["./data/paul_graham_essay.txt"]
)
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")

Loaded 1 docs


### Create or rebuild storage and the index

In [5]:
import os

if not os.path.exists("storage"):
    index = VectorStoreIndex.from_documents(docs)
    # save index to disk
    index.set_index_id("vector_index")
    index.storage_context.persist("./storage")
else:
    # rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir="storage")
    # load index
    index = load_index_from_storage(storage_context, index_id="vector_index")

## Chain Together Prompt and LLM

In this section we show a very simple workflow that chains a prompt together with an LLM.

We simply pass the chain argument on initialization. This is a special case of a query pipeline where the components are purely sequential, and each output is automatically converted into the right format for the next input.

#### Defining a Sequential Chain

Some simple pipelines are purely linear in nature - the output of the previous module directly goes into the input of the next module.

Some examples:

- prompt -> LLM -> output parsing
- prompt -> LLM -> prompt -> LLM
- retriever -> response synthesizer
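To make the idea of a purely linear chain concrete, here is a minimal plain-Python sketch. It does not use LlamaIndex at all: run_chain, format_prompt, and fake_llm are hypothetical stand-ins, and the real QueryPipeline also handles input/output format conversion that this sketch ignores.

```python
from typing import Any, Callable, List


def run_chain(steps: List[Callable[[Any], Any]], first_input: Any) -> Any:
    """Feed each step's output into the next step, in order."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


# Hypothetical stand-in for a prompt template.
def format_prompt(movie_name: str) -> str:
    return f"Please generate related movies to {movie_name}"


# Hypothetical stand-in for an LLM call (no network access).
def fake_llm(prompt: str) -> str:
    return f"assistant: (response to: {prompt})"


result = run_chain([format_prompt, fake_llm], "The Departed")
print(result)
```

Each step only needs to accept what the previous step returns, which is why no explicit links are required for this kind of workflow.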

In [7]:
# try chaining basic prompts
prompt_str = "Please generate related movies to {movie_name}"
prompt_tmpl = PromptTemplate(prompt_str)
llm = OpenAI(model="gpt-3.5-turbo")
# Define the query pipeline
p = QueryPipeline(chain=[prompt_tmpl, llm], verbose=True)

In [8]:
output = p.run(movie_name="The Departed")
print(str(output))

> Running module a3aea295-140f-4182-99e1-d68a8beee6f0 with input:
movie_name: The Departed

> Running module f785a2ed-cbda-4bc6-85a9-51f04986f0b8 with input:
messages: Please generate related movies to The Departed

assistant: 1. Infernal Affairs (2002) - The Departed is actually a remake of this Hong Kong crime thriller, which follows a similar storyline of undercover cops infiltrating a criminal organization.

2. The Town (2010) - Directed by Ben Affleck, this crime drama revolves around a group of bank robbers in Boston and the FBI agent determined to bring them down.

3. Heat (1995) - Directed by Michael Mann, this classic crime film features an intense cat-and-mouse game between a skilled detective and a professional thief in Los Angeles.

4. American Gangster (2007) - Based on a true story, this crime drama stars Denzel Washington as a Harlem drug lord and Russell Crowe as the detective determined to bring him to justice.

5

## Chain multiple Prompts with Streaming

Query pipelines support LLM streaming: call as_query_component(streaming=True) on the LLM. Intermediate outputs are converted automatically, and the final output can be a streaming output.
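As a rough mental model of this behavior, the following plain-Python sketch uses generators. The names fake_stream and collect are hypothetical, and nothing here calls LlamaIndex or an LLM. The assumption being illustrated is that an intermediate stream has to be fully consumed before the next prompt can be filled, while the final stream is handed to the caller chunk by chunk.

```python
from typing import Iterator, List


def fake_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM: yields the reply word by word."""
    for word in text.split(" "):
        yield word + " "


def collect(stream: Iterator[str]) -> str:
    """Drain an intermediate stream into one string so it can fill a prompt."""
    return "".join(stream).strip()


# The intermediate stream is consumed in full before the next prompt is built.
first_reply = collect(fake_stream("Batman Begins, The Dark Knight Rises"))
next_prompt = (
    "Here's some text:\n\n"
    f"{first_reply}\n\n"
    "Can you rewrite this with a summary of each movie?"
)

# The final stream is returned to the caller, who prints deltas as they arrive.
final_chunks: List[str] = []
for delta in fake_stream("Batman Begins: origin story."):
    final_chunks.append(delta)
    print(delta, end="")
print()
```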

In [9]:
prompt_str = "Please generate related movies to {movie_name}"
prompt_tmpl = PromptTemplate(prompt_str)
# let's add some subsequent prompts for fun
prompt_str2 = """\
Here's some text:

{text}

Can you rewrite this with a summary of each movie?
"""
prompt_tmpl2 = PromptTemplate(prompt_str2)
llm = OpenAI(model="gpt-3.5-turbo")
llm_c = llm.as_query_component(streaming=True)

p = QueryPipeline(
    chain=[prompt_tmpl, llm_c, prompt_tmpl2, llm_c], verbose=True
)
# p = QueryPipeline(chain=[prompt_tmpl, llm_c], verbose=True)


In [10]:
output = p.run(movie_name="The Dark Knight")
for o in output:
    print(o.delta, end="")

> Running module 75059c2a-0656-48ab-a6f1-9c6b21705828 with input:
movie_name: The Dark Knight

> Running module 68f4a780-4fe7-485c-8650-48878708a24d with input:
messages: Please generate related movies to The Dark Knight

> Running module 4df40a5f-b8f2-4c5c-b2bc-05e21242cfec with input:
text: <generator object llm_chat_callback.<locals>.wrap.<locals>.wrapped_llm_chat.<locals>.wrapped_gen at 0x0000025C53B82020>

> Running module 30e10c95-c374-4d9e-b893-f7c1ce3d003d with input:
messages: Here's some text:

1. Batman Begins (2005)
2. The Dark Knight Rises (2012)
3. Batman v Superman: Dawn of Justice (2016)
4. Man of Steel (2013)
5. The Avengers (2012)
6. Iron Man (2008)
7. Captain Amer...

1. Batman Begins (2005): A young Bruce Wayne becomes Batman to protect Gotham City from the League of Shadows and their leader, Ra's al Ghul.
2. The Dark Knight Rises (2012): Batman returns to

## Chain Together Query Rewriting Workflow (prompts + LLM) with Retrieval

Here we try a slightly more complex workflow where we send the input through two prompts before initiating retrieval.

- Generate question about given topic.

- Hallucinate answer given question, for better retrieval.

Since each prompt only takes in one input, note that the QueryPipeline will automatically chain LLM outputs into the prompt and then into the LLM.
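One way to picture why a single-input prompt needs no explicit key is sketched below in plain Python. This is only an illustration under stated assumptions, not how LlamaIndex implements it: template_vars and fill_single_input are hypothetical helpers, and the template is a shortened version of the HyDE prompt used in this section.

```python
from string import Formatter
from typing import List


def template_vars(template: str) -> List[str]:
    """Return the named placeholders in a str.format-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def fill_single_input(template: str, upstream_output: str) -> str:
    """Bind an unnamed upstream output to the template's only placeholder."""
    names = template_vars(template)
    if len(names) != 1:
        # With several placeholders the target would be ambiguous.
        raise ValueError(f"expected exactly one placeholder, found {names}")
    return template.format(**{names[0]: upstream_output})


hyde_template = "Please write a passage to answer the question\n\n{query_str}\n\nPassage:"
question = "How did Paul Graham's college experience shape his career?"
filled = fill_single_input(hyde_template, question)
print(filled)
```

When a module has more than one input, the target is ambiguous, which is the situation the DAG syntax with explicit keys addresses later in this notebook.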

In [11]:
from llama_index.postprocessor import CohereRerank

# First prompt: generate question regarding topic
prompt_str1 = "Please generate a concise question about Paul Graham's life regarding the following topic {topic}"
prompt_tmpl1 = PromptTemplate(prompt_str1)
# Second prompt: use HyDE to hallucinate answer.
prompt_str2 = (
    "Please write a passage to answer the question\n"
    "Try to include as many key details as possible.\n"
    "\n"
    "\n"
    "{query_str}\n"
    "\n"
    "\n"
    'Passage:"""\n'
)
prompt_tmpl2 = PromptTemplate(prompt_str2)

llm = OpenAI(model="gpt-3.5-turbo")
retriever = index.as_retriever(similarity_top_k=5)
p = QueryPipeline(
    chain=[prompt_tmpl1, llm, prompt_tmpl2, llm, retriever], verbose=True
)

In [12]:
nodes = p.run(topic="college")
len(nodes)

> Running module 6adb7b56-54f3-4564-a6a7-8d512ee8ae34 with input:
topic: college

> Running module 57bd8005-ea8e-40c6-9ea5-e714cc166401 with input:
messages: Please generate a concise question about Paul Graham's life regarding the following topic college

> Running module a901183d-16ab-4786-aace-15bb59d49844 with input:
query_str: assistant: How did Paul Graham's college experience shape his career and entrepreneurial mindset?

> Running module af775f22-0342-4930-89fc-6a14dc37dbc0 with input:
messages: Please write a passage to answer the question
Try to include as many key details as possible.


How did Paul Graham's college experience shape his career and entrepreneurial mindset?


Passage:"""


> Running module e68186fe-29c2-4fc4-929b-731d4125fc49 with input:
input: assistant: Paul Graham's college experience played a pivotal role in shaping his ca

5

## Create a Full RAG Pipeline as a DAG

Here we chain together a full RAG pipeline consisting of query rewriting, retrieval, reranking, and response synthesis.

Here we can’t use chain syntax because certain modules depend on multiple inputs (for instance, response synthesis expects both the retrieved nodes and the original question). Instead we’ll construct a DAG explicitly, through add_modules and then add_link.

### RAG Pipeline with Query Rewriting

We use an LLM to rewrite the query first before passing it to our downstream modules - retrieval/reranking/synthesis.

In [13]:
from llama_index.postprocessor import CohereRerank
from llama_index.response_synthesizers import TreeSummarize
from llama_index import ServiceContext

In [15]:
# define modules
prompt_str = "Please generate a question about Paul Graham's life regarding the following topic {topic}"
prompt_tmpl = PromptTemplate(prompt_str)
llm = OpenAI(model="gpt-3.5-turbo")
retriever = index.as_retriever(similarity_top_k=3)
reranker = CohereRerank()
summarizer = TreeSummarize(
    service_context=ServiceContext.from_defaults(llm=llm)
)

Define the query pipeline

In [16]:
# define query pipeline
p = QueryPipeline(verbose=True)
p.add_modules(
    {
        "llm": llm,
        "prompt_tmpl": prompt_tmpl,
        "retriever": retriever,
        "summarizer": summarizer,
        "reranker": reranker,
    }
)

Next we draw links between modules with add_link. add_link takes in the source/destination module ids, and optionally the source_key and dest_key. Specify the source_key or dest_key if there are multiple outputs/inputs respectively.

You can view the set of input/output keys for each module through module.as_query_component().input_keys and module.as_query_component().output_keys.

Here we explicitly specify dest_key for the reranker and summarizer modules because they take in two inputs (query_str and nodes).

In [17]:
p.add_link("prompt_tmpl", "llm")
p.add_link("llm", "retriever")
p.add_link("retriever", "reranker", dest_key="nodes")
p.add_link("llm", "reranker", dest_key="query_str")
p.add_link("reranker", "summarizer", dest_key="nodes")
p.add_link("llm", "summarizer", dest_key="query_str")

# look at summarizer input keys
print(summarizer.as_query_component().input_keys)

required_keys={'query_str', 'nodes'} optional_keys=set()
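The linking scheme can be sketched in plain Python as a toy DAG runner. Everything in this block is hypothetical (TinyPipeline and the three lambda modules are illustrative stand-ins, not LlamaIndex classes); it only shows how a dest_key routes one module's output to a named input of another, and how a topological order guarantees that inputs are ready before a module runs.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple


class TinyPipeline:
    """Toy DAG runner; an illustration only, not LlamaIndex's implementation."""

    def __init__(self) -> None:
        self.modules: Dict[str, Callable[..., Any]] = {}
        # Each link is (source module, destination module, destination keyword).
        self.links: List[Tuple[str, str, str]] = []

    def add_module(self, name: str, fn: Callable[..., Any]) -> None:
        self.modules[name] = fn

    def add_link(self, src: str, dest: str, dest_key: str = "input") -> None:
        self.links.append((src, dest, dest_key))

    def run(self, **root_inputs: Any) -> Any:
        # Map each module to the set of modules it depends on.
        deps: Dict[str, set] = {name: set() for name in self.modules}
        for src, dest, _ in self.links:
            deps[dest].add(src)
        order = list(TopologicalSorter(deps).static_order())
        outputs: Dict[str, Any] = {}
        for name in order:
            # Gather this module's keyword arguments from upstream outputs.
            kwargs = {key: outputs[src] for src, dest, key in self.links if dest == name}
            if not kwargs:  # a root module receives the caller's inputs
                kwargs = root_inputs
            outputs[name] = self.modules[name](**kwargs)
        # Assumes a single leaf module, which ends up last in the order.
        return outputs[order[-1]]


p_toy = TinyPipeline()
p_toy.add_module("rewrite", lambda topic: f"question about {topic}")
p_toy.add_module("retrieve", lambda input: [f"node for '{input}'", "other node"])
p_toy.add_module("summarize", lambda query_str, nodes: f"{query_str} -> {len(nodes)} nodes")
p_toy.add_link("rewrite", "retrieve")
p_toy.add_link("retrieve", "summarize", dest_key="nodes")
p_toy.add_link("rewrite", "summarize", dest_key="query_str")
toy_result = p_toy.run(topic="YC")
print(toy_result)
```

The summarize module has two inputs, so both of its incoming links name a dest_key, mirroring the summarizer links in the real pipeline.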


In [18]:
response = p.run(topic="YC")

> Running module prompt_tmpl with input:
topic: YC

> Running module llm with input:
messages: Please generate a question about Paul Graham's life regarding the following topic YC

> Running module retriever with input:
input: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)?

> Running module reranker with input:
query_str: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)?
nodes: [NodeWithScore(node=TextNode(id_='15e9d741-37ee-47c6-8585-7f972e71fb9d', embedding=None, metadata={'file_path': 'data\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/...

> Running module summarizer with input:
query_str: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)?
nodes: [NodeWithScore(node=TextNode(

In [19]:
print(str(response))

Paul Graham played a significant role in the founding and development of Y Combinator (YC). He was one of the co-founders of YC and was actively involved in shaping its initial vision and structure. He and his co-founders recognized the need for a new type of investment firm that would provide more comprehensive support to early-stage startups. They wanted to help founders with not just funding but also with other crucial aspects of starting a company, such as legal incorporation and mentorship. Paul Graham's experience as a startup founder himself and his understanding of the challenges faced by founders played a crucial role in shaping YC's approach. He also played a role in selecting and funding the first batch of startups and continued to be actively involved in the growth and development of YC over the years.


You can also run the pipeline asynchronously with arun:

In [20]:
response = await p.arun(topic="YC")
print(str(response))

> Running modules and inputs in parallel:
Module key: prompt_tmpl. Input:
topic: YC


> Running modules and inputs in parallel:
Module key: llm. Input:
messages: Please generate a question about Paul Graham's life regarding the following topic YC


> Running modules and inputs in parallel:
Module key: retriever. Input:
input: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)?


> Running modules and inputs in parallel:
Module key: reranker. Input:
query_str: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)?
nodes: [NodeWithScore(node=TextNode(id_='15e9d741-37ee-47c6-8585-7f972e71fb9d', embedding=None, metadata={'file_path': 'data\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/...


> Running modules and inputs in parallel:
M
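The verbose log above mentions running modules "in parallel". As a generic illustration of that idea, and not a description of what arun does internally, the sketch below uses asyncio.gather to await two independent coroutines together. The fake_module coroutine and its delays are hypothetical. Note that a script needs asyncio.run, whereas a notebook cell can use a top-level await as in the cell above.

```python
import asyncio
from typing import List, Tuple


async def fake_module(name: str, delay: float, finish_order: List[str]) -> str:
    """Hypothetical stand-in for a pipeline module doing I/O-bound work."""
    await asyncio.sleep(delay)
    finish_order.append(name)
    return f"{name} done"


async def main() -> Tuple[List[str], List[str]]:
    finish_order: List[str] = []
    # Two modules with no dependency on each other can be awaited together.
    results = await asyncio.gather(
        fake_module("slow", 0.05, finish_order),
        fake_module("fast", 0.01, finish_order),
    )
    return list(results), finish_order


results, finish_order = asyncio.run(main())
print(results)       # results come back in argument order
print(finish_order)  # completion order reflects the delays
```

In the RAG pipeline of this section every stage depends on the previous one, so each parallel batch in the log contains a single module.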

## RAG Pipeline without Query Rewriting

Here we set up a RAG pipeline without the query rewriting step.

We need a way to pass the input query to both the retriever and the summarizer. We can do this with a special InputComponent, which lets us link the pipeline input to multiple downstream modules. (A reranker is also defined below, but the pipeline in this section does not wire it in.)

In [21]:
from llama_index.postprocessor import CohereRerank
from llama_index.response_synthesizers import TreeSummarize
from llama_index import ServiceContext
from llama_index.query_pipeline import InputComponent

retriever = index.as_retriever(similarity_top_k=5)
summarizer = TreeSummarize(
    service_context=ServiceContext.from_defaults(
        llm=OpenAI(model="gpt-3.5-turbo")
    )
)
reranker = CohereRerank()

In [22]:
p = QueryPipeline(verbose=True)
p.add_modules(
    {
        "input": InputComponent(),
        "retriever": retriever,
        "summarizer": summarizer,
    }
)
p.add_link("input", "retriever")
p.add_link("input", "summarizer", dest_key="query_str")
p.add_link("retriever", "summarizer", dest_key="nodes")

In [23]:
output = p.run(input="what did the author do in YC")
print(str(output))

> Running module input with input:
input: what did the author do in YC

> Running module retriever with input:
input: what did the author do in YC

> Running module summarizer with input:
query_str: what did the author do in YC
nodes: [NodeWithScore(node=TextNode(id_='3c623089-a168-40b9-9807-edea40010eb4', embedding=None, metadata={'file_path': 'data\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/...

The author worked in various aspects of Y Combinator (YC). They funded startups as an angel investor, provided support and guidance to founders, and helped them with setting up their companies. Additionally, they worked on internal software for YC, wrote essays about startups, and worked on a news aggregator called Hacker News.


## Defining a Custom Component in a Query Pipeline

You can easily define a custom component. Simply subclass CustomQueryComponent, implement the validation and run functions plus a few helpers (the input and output keys), and plug it in.

In [24]:
from llama_index.query_pipeline import (
    CustomQueryComponent,
    InputKeys,
    OutputKeys,
)
from typing import Dict, Any
from llama_index.llms.llm import BaseLLM
from pydantic import Field


class RelatedMovieComponent(CustomQueryComponent):
    """Related movie component."""

    llm: BaseLLM = Field(..., description="OpenAI LLM")

    def _validate_component_inputs(
        self, input: Dict[str, Any]
    ) -> Dict[str, Any]:
        """Validate component inputs during run_component."""
        # NOTE: this is OPTIONAL but we show you here how to do validation as an example
        return input

    @property
    def _input_keys(self) -> set:
        """Input keys dict."""
        # NOTE: These are required inputs. If you have optional inputs please override
        # `optional_input_keys_dict`
        return {"movie"}

    @property
    def _output_keys(self) -> set:
        return {"output"}

    def _run_component(self, **kwargs) -> Dict[str, Any]:
        """Run the component."""
        # use QueryPipeline itself here for convenience
        prompt_str = "Please generate related movies to {movie_name}"
        prompt_tmpl = PromptTemplate(prompt_str)
        p = QueryPipeline(chain=[prompt_tmpl, self.llm])
        return {"output": p.run(movie_name=kwargs["movie"])}
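The general shape of such a component (declare the required input keys, validate, then run) can be sketched without LlamaIndex. ToyComponent and Shout below are hypothetical classes written for illustration; the real CustomQueryComponent base class does more than this, and its internals are not shown here.

```python
from typing import Any, Dict, Set


class ToyComponent:
    """Hypothetical base class: checks required keys, then delegates to _run."""

    input_keys: Set[str] = set()

    def run(self, **kwargs: Any) -> Dict[str, Any]:
        missing = self.input_keys - set(kwargs)
        if missing:
            raise ValueError(f"missing required inputs: {sorted(missing)}")
        return self._run(**kwargs)

    def _run(self, **kwargs: Any) -> Dict[str, Any]:
        raise NotImplementedError


class Shout(ToyComponent):
    """Toy component that upper-cases its single required input."""

    input_keys = {"movie"}

    def _run(self, **kwargs: Any) -> Dict[str, Any]:
        return {"output": kwargs["movie"].upper()}


shout_result = Shout().run(movie="Love Actually")
print(shout_result)
```

Declaring the keys up front lets a missing input fail with a clear message before any expensive call is made.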

Let’s try the custom component out! We’ll also add a step to convert the output to Shakespeare.

In [25]:
llm = OpenAI(model="gpt-3.5-turbo")
component = RelatedMovieComponent(llm=llm)

# let's add some subsequent prompts for fun
prompt_str = """\
Here's some text:

{text}

Can you rewrite this in the voice of Shakespeare?
"""
prompt_tmpl = PromptTemplate(prompt_str)

p = QueryPipeline(chain=[component, prompt_tmpl, llm], verbose=True)

In [26]:
output = p.run(movie="Love Actually")
print(str(output))

> Running module 1ebfa41f-5ac9-441c-b009-b18cf45edec0 with input:
movie: Love Actually

> Running module e9ce341a-1ae8-4379-9d04-790a0ed024f9 with input:
text: assistant: 1. "Valentine's Day" (2010) - This romantic comedy follows the lives of several interconnected couples and singles in Los Angeles as they navigate love and relationships on Valentine's Day....

> Running module dd728f26-fcaa-4cad-bd57-fa7ee052936f with input:
messages: Here's some text:

1. "Valentine's Day" (2010) - This romantic comedy follows the lives of several interconnected couples and singles in Los Angeles as they navigate love and relationships on Valentin...

assistant: 1. "Valentine's Daye" (2010) - Thise romantic comedy doth follow the lives of several interconnected couples and singles in Los Angeles as they doth navigate love and relationships on Valentine's Daye.

2. "New Year's Eve" (2011) - Similar to "Love Actually,"