# Combine retrieve, augment, and generate APIs in a single pipeline

We have [deployed microservices on the Hub](https://huggingface.co/collections/ai-blueprint/retrieval-augemented-generation-rag-6790c9f597b02c043cfbf7af) for [retrieving](retrieve.ipynb), [augmenting](augment.ipynb) and [generating](generate.ipynb). Now we will show how to tie them together into a complete RAG pipeline, which we will deploy as a microservice of its own at the end of this notebook.

## Dependencies and imports

Let's install the necessary dependencies.

In [None]:
!pip install gradio gradio-client pandas

Next, let's import the necessary libraries.

In [3]:
from gradio_client import Client
import pandas as pd

## Retrieve documents

Let's start by retrieving documents that are relevant to answering the query. We use the Hugging Face Hub as a vector search backend with the [ai-blueprint/fineweb-bbc-news-embeddings](https://huggingface.co/datasets/ai-blueprint/fineweb-bbc-news-embeddings) dataset and call it through a REST API with the Gradio Python Client. Our API is available at https://ai-blueprint-rag-retrieve.hf.space/?view=api. See the [retrieve notebook](./retrieve.ipynb) for more details.

In [4]:
gradio_client_retrieve = Client("https://ai-blueprint-rag-retrieve.hf.space/")

def retrieve(query: str, k: int = 5):
    results = gradio_client_retrieve.predict(api_name="/similarity_search", query=query, k=k)
    return pd.DataFrame(data=results["data"], columns=results["headers"])

retrieve("What is the future of AI?", k=5)

Loaded as API: https://ai-blueprint-rag-retrieve.hf.space/ ✔


| | url | text | distance |
|---|---|---|---|
| 0 | https://www.bbc.com/news/technology-51064369 | The last decade was a big one for artificial i... | 0.2812 |
| 1 | http://www.bbc.co.uk/news/technology-25000756 | Singularity: The robots are coming to steal ou... | 0.365842 |
| 2 | http://www.bbc.com/news/technology-25000756 | Singularity: The robots are coming to steal ou... | 0.365842 |
| 3 | https://www.bbc.co.uk/news/technology-37494863 | Google, Facebook, Amazon join forces on future... | 0.38082 |
| 4 | https://www.bbc.co.uk/news/technology-37494863 | Google, Facebook, Amazon join forces on future... | 0.38082 |
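
The `retrieve` function above reads the response through its `"headers"` and `"data"` keys. As a minimal sketch of that shape, the snippet below converts such a payload into a list of row dictionaries without calling the API. The helper name `payload_to_records` and the stand-in payload are hypothetical; the values are copied from the output above.

```python
from typing import Any


def payload_to_records(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a {"headers": [...], "data": [[...], ...]} payload into row dicts."""
    headers = payload["headers"]
    # Pair every row with the column names, one dict per row.
    return [dict(zip(headers, row)) for row in payload["data"]]


# Hand-written stand-in for an API response (not fetched from the Space).
example_payload = {
    "headers": ["url", "text", "distance"],
    "data": [
        [
            "https://www.bbc.com/news/technology-51064369",
            "The last decade was a big one for artificial i...",
            0.2812,
        ],
        [
            "http://www.bbc.co.uk/news/technology-25000756",
            "Singularity: The robots are coming to steal ou...",
            0.365842,
        ],
    ],
}

records = payload_to_records(example_payload)
print(records[0]["url"], records[0]["distance"])
```

Row dictionaries like these are convenient for logging or for passing individual documents around, while the `pd.DataFrame` used in the notebook is the better fit for sorting and deduplication.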


## Reranking retrieved documents

Whenever we retrieve documents from the vector search backend, we can use a reranker to improve the quality of the retrieved documents before passing them to the LLM.
We first retrieve documents, pass them to a reranker, and return the reranked documents sorted by relevance. The reranker API is available at https://ai-blueprint-rag-augment.hf.space/?view=api. See the [rerank notebook](./rerank.ipynb) for more details. Note that we reuse the `retrieve` function from the previous section.

In [12]:
rerank_client = Client("https://ai-blueprint-rag-augment.hf.space/")

def retrieve_and_rerank(query: str, k_retrieved: int):
    # Retrieve candidates and drop rows with identical text
    # (the same article can appear under several URLs).
    documents = retrieve(query, k_retrieved)
    documents = documents.drop_duplicates("text")
    # Send the dataframe to the rerank endpoint as a headers/data payload.
    documents_dict = {
        "headers": documents.columns.tolist(),
        "data": documents.values.tolist()
    }
    results = rerank_client.predict(api_name="/rerank", query=query, documents=documents_dict)
    reranked_documents = pd.DataFrame(data=results["data"], columns=results["headers"])
    # Sort by the reranker score in descending order.
    reranked_documents = reranked_documents.sort_values(by="rank", ascending=False)
    return reranked_documents

retrieve_and_rerank("What is the future of AI?", k_retrieved=10)

Loaded as API: https://ai-blueprint-rag-augment.hf.space/ ✔


| | url | text | distance | rank |
|---|---|---|---|---|
| 0 | https://www.bbc.com/news/technology-51064369 | The last decade was a big one for artificial i... | 0.2812 | 0.505991 |
| 1 | https://www.bbc.com/news/technology-52415775 | UK spies will need to use artificial intellige... | 0.414651 | 0.505261 |
| 2 | https://www.bbc.co.uk/news/technology-37494863 | Google, Facebook, Amazon join forces on future... | 0.38082 | 0.502983 |
| 3 | http://www.bbc.com/news/world-us-canada-39425862 | Vector Institute is just the latest in Canada'... | 0.424994 | 0.50262 |
| 4 | http://www.bbc.com/news/technology-39657505 | Ted 2017: The robot that wants to go to univer... | 0.424357 | 0.502362 |
| 5 | http://www.bbc.co.uk/news/technology-25000756 | Singularity: The robots are coming to steal ou... | 0.365842 | 0.500686 |
| 6 | https://www.bbc.co.uk/news/business-48139212 | Artificial intelligence (AI) is one of the mos... | 0.407243 | 0.500419 |


The order of the returned documents has shifted slightly compared with the raw retrieval results: the documents are now sorted by the `rank` score rather than by `distance`, which shows that the reranker is being applied.
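
The Gradio interface later in this notebook exposes a "number of documents to use after reranking" setting. A minimal sketch of that selection step, assuming a higher `rank` score means a more relevant document, could look like the following. The helper `top_k_by_rank` is hypothetical and works on plain row dictionaries (for example from `DataFrame.to_dict("records")`); the scores are copied from the output above.

```python
def top_k_by_rank(records, k):
    """Return the k records with the highest "rank" score, best first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort a copy in descending score order, then keep the first k rows.
    return sorted(records, key=lambda r: r["rank"], reverse=True)[:k]


# Stand-in reranker output, deliberately out of order.
reranked = [
    {"url": "https://www.bbc.co.uk/news/technology-37494863", "rank": 0.502983},
    {"url": "https://www.bbc.com/news/technology-51064369", "rank": 0.505991},
    {"url": "https://www.bbc.com/news/technology-52415775", "rank": 0.505261},
]

top = top_k_by_rank(reranked, k=2)
print([r["url"] for r in top])
```

With the DataFrame returned by `retrieve_and_rerank`, which is already sorted, `reranked_documents.head(k)` would give the same selection.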

## Generating responses with reranked documents

We will now use the retrieved and reranked documents as context to generate a response. We will use the language model that we deployed as a microservice on the Hub. See the [generate notebook](./generate.ipynb) for more details.
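
The interface code below calls a `rag_pipeline` function whose definition is not shown in this section. The following is only a rough sketch of how such a function might chain the three steps, not the notebook's actual implementation. All names here (`rag_pipeline_sketch`, `build_prompt`, the prompt wording, and the two stub functions) are assumptions, and the retrieval and generation steps are injected as arguments so the example runs offline.

```python
from typing import Any, Callable

Record = dict[str, Any]


def build_prompt(query: str, texts: list[str]) -> str:
    """Assemble a simple context-grounded prompt (illustrative wording only)."""
    context = "\n\n".join(f"[{i + 1}] {t}" for i, t in enumerate(texts))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


def rag_pipeline_sketch(
    query: str,
    k_retrieved: int,
    k_reranked: int,
    retrieve_and_rerank_fn: Callable[[str, int], list[Record]],
    generate_fn: Callable[[str], str],
) -> tuple[str, list[Record]]:
    # 1. Retrieve and rerank; records are assumed to be sorted best first.
    documents = retrieve_and_rerank_fn(query, k_retrieved)
    # 2. Keep only the top k_reranked documents as context.
    context_docs = documents[:k_reranked]
    # 3. Build the prompt and ask the injected generator for an answer.
    prompt = build_prompt(query, [d["text"] for d in context_docs])
    return generate_fn(prompt), context_docs


# Offline stand-ins (hypothetical) for the deployed microservices.
def fake_retrieve_and_rerank(query: str, k: int) -> list[Record]:
    return [{"text": f"document {i}", "rank": 1.0 - i / 100} for i in range(k)]


def fake_generate(prompt: str) -> str:
    return f"stub answer for a prompt of {len(prompt)} characters"


answer, docs = rag_pipeline_sketch(
    "What is the future of AI?",
    k_retrieved=10,
    k_reranked=3,
    retrieve_and_rerank_fn=fake_retrieve_and_rerank,
    generate_fn=fake_generate,
)
print(answer)
print([d["text"] for d in docs])
```

Note that `rag_interface` below reads `response.content`, so the notebook's real pipeline appears to return a response object rather than the plain string used in this sketch.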

## Gradio as RAG interface

We will use [Gradio](https://github.com/gradio-app/gradio) as a web application tool to create a demo interface for our RAG pipeline. We can develop it locally and then easily deploy it to Hugging Face Spaces. Finally, we can use the Gradio client as an SDK to interact directly with our RAG pipeline.

### Gradio as sharable app


In [38]:
import gradio as gr


def rag_interface(query: str, k_retrieved: int, k_reranked: int):
    response, documents = rag_pipeline(query, k_retrieved=k_retrieved, k_reranked=k_reranked)
    return response.content, documents


with gr.Blocks() as demo:
    gr.Markdown("""# RAG Hub Datasets 
                
                Part of [smol blueprint](https://github.com/davidberenstein1957/smol-blueprint) - a smol blueprint for AI development, focusing on practical examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs.""")

    with gr.Row():
        query_input = gr.Textbox(
            label="Query", placeholder="Enter your question here...", lines=3
        )

    with gr.Row():
        with gr.Column():
            retrieve_slider = gr.Slider(
                minimum=1,
                maximum=20,
                value=10,
                label="Number of documents to retrieve",
            )
        with gr.Column():
            rerank_slider = gr.Slider(
                minimum=1,
                maximum=10,
                value=5,
                label="Number of documents to use after reranking",
            )

    submit_btn = gr.Button("Submit")
    response_output = gr.Textbox(label="Response", lines=10)
    documents_output = gr.Dataframe(
        label="Documents", headers=["chunk", "url", "distance", "rank"], wrap=True
    )

    submit_btn.click(
        fn=rag_interface,
        inputs=[query_input, retrieve_slider, rerank_slider],
        outputs=[response_output, documents_output],
    )

demo.launch()

* Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.




<iframe
	src="https://smol-blueprint-rag-hub-datasets.hf.space"
	frameborder="0"
	width="850"
	height="450"
></iframe>

### Deploying Gradio to Hugging Face Spaces

We can now [deploy our Gradio application to Hugging Face Spaces](https://huggingface.co/new-space?sdk=gradio&name=rag-hub-datasets).

-  Click on the "Create Space" button.
-  Copy the code from the Gradio interface and paste it into an `app.py` file. Don't forget to copy the `generate_response_*` function, along with the code to execute the RAG pipeline.
-  Create a `requirements.txt` file with `gradio-client` and `sentence-transformers`.
-  Set a Hugging Face API token as the `HF_TOKEN` secret in the Space settings if you are using the Inference API.
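
Secrets configured in the Space settings are exposed to the running app as environment variables. A small sketch of reading the token in `app.py` and failing early when it is missing might look like this. The helper name `get_hf_token` and its optional `env` argument are hypothetical and exist only to make the example runnable without a real secret.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Read HF_TOKEN from the environment and fail early if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a secret in the Space settings."
        )
    return token


# Usage with an explicit mapping, so no real secret is needed here.
print(get_hf_token({"HF_TOKEN": "hf_example_token"}))
```

In the deployed app, calling `get_hf_token()` without arguments would read from `os.environ`.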

We wait a couple of minutes for the application to deploy, et voilà, we have [a public RAG interface](https://huggingface.co/spaces/smol-blueprint/rag-hub-datasets)!

### Use the web app as microservice

We can now use the [Gradio client as an SDK](https://www.gradio.app/guides/getting-started-with-the-python-client) to interact directly with our RAG pipeline. Each Gradio app has API documentation that describes the available endpoints and their parameters, which you can access from the button at the bottom of the Gradio app's Space page.

In [40]:
from gradio_client import Client

client = Client("https://smol-blueprint-rag-hub-datasets.hf.space/")
result = client.predict(
    query="What is the future of AI?",
    k_retrieved=10,
    k_reranked=5,
    api_name="/rag_pipeline",
)
result

Loaded as API: https://smol-blueprint-rag-hub-datasets.hf.space/ ✔
('In the future, artificial intelligence (AI) is expected to play an increasingly significant role in various aspects of society, including object recognition, computer vision, natural language processing, robotics, and more. AI is already being used to develop products and services that enhance personal convenience, make life more efficient, and improve the quality of lives.\n\nAI work will also be more methodical and less reliant on high-bandwidth parallelism, setting it free from the "wonder years" of the 21st century. This will allow AI systems to generalize and adapt more effectively, borrowing from examples and experiences to solve problems more efficiently with fewer examples.\n\nArtificial intelligence is expected to see an industry growth of $23 billion by 2023, a $3.75 billion growth compared to the previous year, and to continue growing at $6.9 billion each year.\n\nResearchers are actively exploring various 

## Next steps

We have seen how to build a RAG pipeline with a SmolLM and some rerankers. Next steps would be to monitor the performance of the RAG pipeline and improve it.