The newsletter lists the following steps for building a RAG pipeline:

Step 1: Take an inbound query and deconstruct it into relevant concepts  
Step 2: Collect similar concepts from your data store  
Step 3: Recombine these concepts with your original query to build a more relevant, authoritative answer.  
Let's see how this project implements them.

# Step 1: Take an inbound query and deconstruct it into relevant concepts

Take a look at the `configure_query_pipeline` function in the `query_pipeline.py` file:

```python
def configure_query_pipeline(*, index: VectorStoreIndex, llm: OpenAI) -> QueryPipeline:
    """Configure and set up the query pipeline"""
    text_qa_chat_template = ChatPromptTemplate.from_messages(_chat_template_messages)
    query_pipeline = QueryPipeline()

    retriever = index.as_retriever(similarity_top_k=_TOP_K_RETRIEVAL)
    summarizer = TreeSummarize(
        llm=llm, streaming=True, summary_template=text_qa_chat_template
    )

    query_pipeline.add_modules(
        {
            "input": InputComponent(),
            "retriever": retriever,
            "summarizer": summarizer,
        }
    )
    query_pipeline.add_link("input", "retriever")
    query_pipeline.add_link("input", "summarizer", dest_key="query_str")
    query_pipeline.add_link("retriever", "summarizer", dest_key="nodes")

    return query_pipeline
```

The function receives the inbound query through the `InputComponent` class and routes it to the retriever and the summarizer through the links defined on the `QueryPipeline` class.
Both classes are provided by `llama_index.core.query_pipeline` (see the `llama_index.core.query_pipeline` note for more detail).

`InputComponent` is responsible for taking the raw query input and preparing it for further processing.
`QueryPipeline` chains different components together to create workflows for processing queries.
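
To make the wiring concrete, here is a minimal plain-Python sketch of the same three links (`input` to `retriever`, `input` to `summarizer` as `query_str`, `retriever` to `summarizer` as `nodes`). `ToyPipeline`, `passthrough`, `toy_retriever`, and `toy_summarizer` are hypothetical names invented for illustration; this is not how LlamaIndex implements `QueryPipeline`, only a sketch of the idea of routing outputs to named inputs.

```python
from typing import Any, Callable, Dict, List, Tuple


class ToyPipeline:
    """Hypothetical stand-in for a query pipeline: named modules wired by links."""

    def __init__(self) -> None:
        self.modules: Dict[str, Callable[..., Any]] = {}
        # Each link is (source module, destination module, destination keyword).
        self.links: List[Tuple[str, str, str]] = []

    def add_module(self, name: str, fn: Callable[..., Any]) -> None:
        self.modules[name] = fn

    def add_link(self, src: str, dest: str, dest_key: str = "input") -> None:
        self.links.append((src, dest, dest_key))

    def run(self, query: str) -> Any:
        outputs: Dict[str, Any] = {}
        result: Any = None
        # Modules run in insertion order, so they must be added in dependency order.
        for name, fn in self.modules.items():
            kwargs = {key: outputs[src] for src, dest, key in self.links if dest == name}
            if not kwargs:
                kwargs = {"input": query}  # a module with no inbound link gets the raw query
            result = outputs[name] = fn(**kwargs)
        return result


def passthrough(input: str) -> str:
    return input


def toy_retriever(input: str) -> List[str]:
    # Keyword matching stands in for the vector search a real retriever performs.
    corpus = ["machine learning basics", "cooking pasta at home"]
    words = input.lower().split()
    return [doc for doc in corpus if any(word in doc for word in words)]


def toy_summarizer(query_str: str, nodes: List[str]) -> str:
    # A real summarizer would call an LLM; this only reports what it received.
    return f"{query_str}: answered from {len(nodes)} node(s)"


pipeline = ToyPipeline()
pipeline.add_module("input", passthrough)
pipeline.add_module("retriever", toy_retriever)
pipeline.add_module("summarizer", toy_summarizer)
pipeline.add_link("input", "retriever")
pipeline.add_link("input", "summarizer", dest_key="query_str")
pipeline.add_link("retriever", "summarizer", dest_key="nodes")

answer = pipeline.run("machine learning")
print(answer)
```

The point of the sketch is that the summarizer never sees the retriever directly: it only receives whatever arrives under the `query_str` and `nodes` keys, which is why the `dest_key` arguments matter in `configure_query_pipeline`.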

### Retriever

A **retriever** in LlamaIndex is a component designed to fetch relevant information from a dataset based on a given query. It acts as a bridge between the raw data and the language model, ensuring that the most pertinent pieces of information are selected for further processing.

#### Example Usage of a Retriever

Imagine you have a collection of documents about various topics, and you want to find information related to "machine learning." A retriever will search through the documents and return the most relevant ones.

Here's a simple example:

```python
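from llama_index.core import Document, VectorStoreIndex

# Wrap the raw strings in Document objects and build a vector index.
# By default the embeddings come from OpenAI, so OPENAI_API_KEY must be set.
documents = [
    Document(text="This is a document about machine learning."),
    Document(text="Another document discussing deep learning."),
]
index = VectorStoreIndex.from_documents(documents)

# Create a retriever
retriever = index.as_retriever()

# Use the retriever to find relevant documents
query = "machine learning"
retrieved_docs = retriever.retrieve(query)

# Each result is a NodeWithScore: the matched node plus its similarity score
for doc in retrieved_docs:
    print(doc.score, doc.get_content())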
```

In this example, the retriever searches the index for documents related to "machine learning" and returns them.
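
To show what "most relevant" means in practice, the following sketch ranks documents by cosine similarity and keeps the top `k`, which is the role `similarity_top_k` plays in `configure_query_pipeline`. It assumes simple word-count vectors instead of learned embeddings, and `bag_of_words`, `cosine_similarity`, and `retrieve_top_k` are hypothetical helper names, not LlamaIndex functions.

```python
import math
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    # Naive whitespace tokenization; a real retriever uses an embedding model.
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: str, documents: List[str], top_k: int = 2) -> List[Tuple[float, str]]:
    """Score every document against the query and return the top_k best matches."""
    query_vector = bag_of_words(query)
    scored = [(cosine_similarity(query_vector, bag_of_words(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


documents = [
    "a document about machine learning",
    "another document discussing deep learning",
    "a recipe for tomato soup",
]

results = retrieve_top_k("machine learning", documents, top_k=2)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

The document sharing both query words ranks first, the one sharing only "learning" ranks second, and the unrelated recipe is dropped by the `top_k` cutoff.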

### Summarizer

A **summarizer** in LlamaIndex is a component that takes the retrieved information and generates a concise summary. It helps in distilling the essential points from the retrieved documents, making it easier to understand the key information.

#### Example Usage of a Summarizer

Continuing from the previous example, let's say you want to summarize the retrieved documents to get a brief overview of the content.

Here's how you can use a summarizer:

```python
from llama_index.core.response_synthesizers import TreeSummarize
# Requires the llama-index-llms-openai package
from llama_index.llms.openai import OpenAI

# Create a summarizer
summarizer = TreeSummarize(llm=OpenAI(model="gpt-3.5-turbo"))

# Summarize the retrieved documents
query_str = "Summarize the information about machine learning"
text_chunks = [doc.text for doc in retrieved_docs]
summary = summarizer.get_response(query_str, text_chunks)

print(summary)
```

In this example, the summarizer takes the text chunks from the retrieved documents and generates a summary based on the query "Summarize the information about machine learning."
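
The name `TreeSummarize` suggests a bottom-up strategy: summarize groups of chunks, then summarize the summaries until a single answer remains. The sketch below illustrates that idea under simplifying assumptions: a fixed group size (`fan_in`) rather than packing chunks to fit a context window, and a stub `fake_summarize` function in place of an LLM call. Both names, and `tree_summarize` itself, are hypothetical and are not part of LlamaIndex.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    fan_in: int = 2,
) -> str:
    """Repeatedly summarize groups of `fan_in` texts until one summary is left."""
    if not chunks:
        raise ValueError("chunks must not be empty")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")

    level = list(chunks)
    while True:
        # Summarize each group of neighbours to build the next level of the tree.
        level = [summarize(level[i : i + fan_in]) for i in range(0, len(level), fan_in)]
        if len(level) == 1:
            return level[0]


def fake_summarize(group: List[str]) -> str:
    # Stub: a real implementation would prompt an LLM with the query and the group.
    return "(" + " + ".join(group) + ")"


result = tree_summarize(["A", "B", "C"], fake_summarize, fan_in=2)
print(result)
```

The nesting in the printed string mirrors the tree: `A` and `B` are combined first, `C` is summarized on its own, and the two intermediate summaries are merged in a final call.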
