# 🍫Tune your RAG data pipeline and evaluate its performance

> ⚠️ This notebook can be run on your local machine or on a virtual machine and requires [Docker Compose](https://docs.docker.com/desktop/).
> Please note that it is not compatible with Google Colab as the latter does not support Docker.

In this notebook we demonstrate how to iteratively evaluate and tune a Retrieval-Augmented Generation (RAG) system using [Fondant](https://fondant.ai).

We will:

1. Set up a [Weaviate](https://weaviate.io/platform) vector store
2. Define a parameter set to test
3. Run a Fondant pipeline with those parameters to index our documents into the vector store
4. Run a Fondant pipeline with those parameters to evaluate the performance
5. Inspect the evaluation results and data between each processing step
6. Repeat steps 2-5 until we're happy with the results

<div align="center">
<img src="../art/iteration.png" width="1000"/>
</div>
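
To make "a parameter set to test" concrete, here is a minimal sketch of how a small grid of candidate settings could be enumerated before each iteration. The parameter names and values in `search_space` are illustrative assumptions, not the arguments the pipelines in this notebook require.

```python
from itertools import product

# Hypothetical search space: names and values are for illustration only.
search_space = {
    "chunk_size": [256, 512],
    "chunk_overlap": [32, 64],
    "retrieval_top_k": [2, 5],
}


def parameter_sets(space):
    """Yield one dict per combination of parameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


all_sets = list(parameter_sets(search_space))
print(len(all_sets))  # 8 combinations (2 * 2 * 2)
print(all_sets[0])    # {'chunk_size': 256, 'chunk_overlap': 32, 'retrieval_top_k': 2}
```

Each dict can then be used for one indexing run followed by one evaluation run.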

## Set up environment

> ⚠️ This section checks the prerequisites of your environment. Read any errors or warnings carefully.

Ensure that a **Python version between 3.8 and 3.10** is available

In [None]:
import sys
if sys.version_info < (3, 8, 0) or sys.version_info >= (3, 11, 0):
    raise Exception(f"A Python version between 3.8 and 3.10 is required. You are running {sys.version}")

Check if **docker compose** is installed and the **docker daemon** is running

In [None]:
!docker compose version

Install the Fondant framework

In [None]:
!pip install -q -r ../requirements.txt --disable-pip-version-check && echo "Success"

## Spin up the Weaviate vector store

> ⚠️ For **Apple M1/M2** chip users:
> 
> - In Docker Desktop Dashboard `Settings -> Features in development`, make sure to **un**check `Use containerd` for pulling and storing images. More info [here](https://docs.docker.com/desktop/settings/mac/#beta-features)
> - Make sure that Docker uses the `linux/amd64` platform and not `arm64` (the cell below should take care of that)

Run **Weaviate** with Docker compose

In [None]:
!docker compose -f weaviate_service/docker-compose.yaml up --detach

Make sure you have **Weaviate client v3** installed, then check that the vector database is running and accessible

In [None]:
import logging
import weaviate

try:
    local_weaviate_client = weaviate.Client("http://localhost:8081")
    logging.info("Connected to Weaviate instance")
except weaviate.WeaviateStartUpError:
    logging.error("Cannot connect to weaviate instance, is it running?")
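
Right after `docker compose up`, the container may need a few seconds before it accepts connections. The sketch below polls Weaviate's readiness endpoint (`/v1/.well-known/ready`) with a small retry loop. The helper names `weaviate_is_ready` and `wait_until_ready`, as well as the retry counts, are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def weaviate_is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(
            f"{base_url}/v1/.well-known/ready", timeout=timeout
        ) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet
        return False


def wait_until_ready(probe, retries: int = 10, delay: float = 1.0) -> bool:
    """Call `probe` until it returns True or the retries are used up."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            time.sleep(delay)
    return False
```

For the instance started above, this could be called as `wait_until_ready(lambda: weaviate_is_ready("http://localhost:8081"))`.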

## Indexing pipeline

`pipeline_index.py` processes text data and loads it into the vector database

<div align="center">
<img src="../art/indexing_ltr.png" width="800"/>
</div>

- [**Load data**](https://github.com/ml6team/fondant/tree/main/components/load_from_parquet): loads data from the Hugging Face Hub
- [**Chunk data**](https://github.com/ml6team/fondant/tree/main/components/chunk_text): divides the text into sections of a certain size and with a certain overlap
- [**Embed chunks**](https://github.com/ml6team/fondant/tree/main/components/embed_text): embeds each chunk as a vector, e.g. using [Cohere](https://cohere.com/embeddings)
- [**Index vector store**](https://github.com/ml6team/fondant/tree/main/components/index_weaviate): writes data and embeddings to the vector store
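
To illustrate what "a certain size and a certain overlap" means in the chunking step, here is a minimal character-based sliding window. This is a simplification for intuition only: we assume nothing about how the `chunk_text` component splits text internally, and `sliding_window_chunks` is a hypothetical helper.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split `text` into chunks of `chunk_size` characters that share
    `chunk_overlap` characters with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Larger chunks give each retrieved passage more context, while more overlap reduces the chance that an answer is cut in half at a chunk boundary.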

> 💡 This notebook defaults to the first 1000 rows of the [wikitext](https://huggingface.co/datasets/wikitext) dataset for demonstration purposes, but you can load your own dataset using one of the other load components available on the [**Fondant Hub**](https://fondant.ai/en/latest/components/hub/#component-hub) or by creating your own [**custom load component**](https://fondant.ai/en/latest/guides/implement_custom_components/). Keep in mind that changing the dataset implies that you also need to change the evaluation dataset used in the evaluation pipeline.

### Create the indexing pipeline

We reuse the indexing pipeline from the [indexing notebook](./indexing.ipynb). To make that possible, its code has been extracted into a separate file and wrapped in a function that parameterizes the entire pipeline.

In [None]:
import utils

BASE_PATH = "./data"
utils.create_directory_if_not_exists(BASE_PATH)
weaviate_url = f"http://{utils.get_host_ip()}:8081"
weaviate_class = "Pipeline1"

## Evaluation pipeline

`pipeline_eval.py` evaluates retrieval performance using the questions provided in your test dataset

<div align="center">
<img src="../art/evaluation_ltr.png" width="800"/>
</div>

- [**Load eval data**](https://github.com/ml6team/fondant/tree/main/components/load_from_csv): loads the evaluation dataset (questions) from a csv file
- [**Embed questions**](https://github.com/ml6team/fondant/tree/main/components/embed_text): embeds each question as a vector, e.g. using [Cohere](https://cohere.com/embeddings)
- [**Query vector store**](https://github.com/ml6team/fondant/tree/main/components/retrieve_from_weaviate): retrieves the most relevant chunks for each question from the vector store
- [**Evaluate**](https://github.com/ml6team/fondant/tree/0.8.0/components/evaluate_ragas): evaluates the retrieved chunks for each question, e.g. using [RAGAS](https://docs.ragas.io/en/latest/index.html)
- [**Aggregate**](https://github.com/ml6team/fondant-usecase-RAG/tree/main/src/components/aggregate_eval_results): calculates aggregated results
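
The "query vector store" step boils down to ranking stored chunks by how close their embedding is to the question embedding and keeping the best `k`. The sketch below shows that idea in plain Python with cosine similarity on toy two-dimensional vectors. The vectors and texts are made up, and using cosine similarity here is an assumption about the distance metric; in the pipeline the vector store performs this search.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          chunks: Sequence[Tuple[str, Sequence[float]]],
          k: int) -> List[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy data: (chunk text, embedding) pairs, invented for illustration
toy_chunks = [
    ("about cats", [1.0, 0.0]),
    ("about dogs", [0.0, 1.0]),
    ("about pets", [0.7, 0.7]),
]
print(top_k([1.0, 0.1], toy_chunks, k=2))  # ['about cats', 'about pets']
```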

### Create the evaluation pipeline

⚠️ If you want to use an **OpenAI** model for evaluation you will need an [API key](https://platform.openai.com/docs/quickstart) (see TODO below)

Change the arguments below if you want to run the pipeline with different parameters.

In [None]:
evaluation_args = {
    "retrieval_top_k": 2,
    "llm_module_name": "langchain.chat_models",
    "llm_class_name": "ChatOpenAI",
    "llm_kwargs": {
      "openai_api_key": "" ,   # TODO: Update with your key or use a different model
      "model_name" : "gpt-3.5-turbo"
    },
    "evaluation_metrics": ["context_precision", "context_relevancy"]
}
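
Rather than pasting the key into the notebook, one option is to read it from an environment variable. The sketch below assumes the variable is called `OPENAI_API_KEY`; `build_llm_kwargs` is a hypothetical helper and not part of Fondant.

```python
import os


def build_llm_kwargs(model_name: str = "gpt-3.5-turbo") -> dict:
    """Build the `llm_kwargs` dict, reading the key from the environment."""
    api_key = os.environ.get("OPENAI_API_KEY", "")
    if not api_key:
        # Fail early instead of discovering the missing key mid-run
        raise RuntimeError("Set the OPENAI_API_KEY environment variable first")
    return {"openai_api_key": api_key, "model_name": model_name}
```

With the variable set, `evaluation_args["llm_kwargs"] = build_llm_kwargs()` would replace the hard-coded entry above.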

We begin by initializing our pipeline.

In [None]:
import pyarrow as pa
from fondant.pipeline import Pipeline

evaluation_pipeline = Pipeline(
    name="evaluation-pipeline",
    description="Pipeline to evaluate a RAG system",
    base_path=BASE_PATH,
)


We have created a set of evaluation questions to measure the retrieval performance of the RAG system. The first step is to load the CSV file that contains these questions, for which we use the reusable `load_from_csv` component.

In [None]:
evaluation_set_filename = "wikitext_1000_q.csv"

load_from_csv = evaluation_pipeline.read(
    "load_from_csv",
    arguments={
        "dataset_uri": "/evaldata/" + evaluation_set_filename,
        # mounted dir from within docker as extra_volumes
        "column_separator": ";",
    },
    produces={
        "question": pa.string(),
    },
)
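
If you bring your own evaluation set, the file needs to match what the cell above expects. The sketch below writes and reads back a semicolon-separated file in memory, assuming a header row with a single `question` column. The questions are invented for illustration.

```python
import csv
import io

# Hypothetical questions, used only to show the file layout
questions = [
    "Who designed the Eiffel Tower?",
    "Which river flows through Paris; and where does it end?",
]

# Write a semicolon-separated file with a "question" header
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";")
writer.writerow(["question"])
for question in questions:
    writer.writerow([question])

# Read it back the same way to check that it round-trips
buffer.seek(0)
rows = list(csv.DictReader(buffer, delimiter=";"))
print([row["question"] for row in rows])
```

Note that the second question contains a `;`, so the writer wraps it in quotes. Writing the file with the `csv` module rather than by hand avoids such questions being split into two columns.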

Next, we embed the questions so that they can be matched against the chunks in the vector store. Here we once again use the reusable `embed_text` component.

In [None]:
embed_text_op = load_from_csv.apply(
    "embed_text",
    arguments={
        "model_provider": "huggingface",
        "model": "all-MiniLM-L6-v2"
    },
    consumes={
        "text": "question",
    }
)

Next, we retrieve chunks from the vector database and evaluate them using RAGAS. Finally, we aggregate the metrics to get an estimate of the overall performance.

Take a look at the `components` folder to learn more about the custom component implementation.

In [None]:
import pandas as pd
import pyarrow as pa
from fondant.component import PandasTransformComponent
from fondant.pipeline import lightweight_component


@lightweight_component(
    produces={"retrieved_chunks": pa.list_(pa.string())},
    extra_requires=["weaviate-client==3.24.1"],
)
class RetrieveFromWeaviateComponent(PandasTransformComponent):
    def __init__(self, *, weaviate_url: str, class_name: str, top_k: int) -> None:
        import weaviate

        self.client = weaviate.Client(
            url=weaviate_url,
            additional_config=None,
            additional_headers=None,
        )
        self.class_name = class_name
        self.k = top_k

    def teardown(self) -> None:
        del self.client

    def retrieve_chunks_from_embeddings(self, vector_query: list):
        """Get results from weaviate database."""
        query = (
            self.client.query.get(self.class_name, ["passage"])
            .with_near_vector({"vector": vector_query})
            .with_limit(self.k)
            .with_additional(["distance"])
        )

        result = query.do()
        result_dict = result["data"]["Get"][self.class_name]
        return [retrieved_chunk["passage"] for retrieved_chunk in result_dict]

    def transform(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        dataframe["retrieved_chunks"] = dataframe["embedding"].apply(self.retrieve_chunks_from_embeddings)
        return dataframe

# Add component to pipeline
retrieve_chunks = embed_text_op.apply(
    RetrieveFromWeaviateComponent,
    arguments={
        "weaviate_url": weaviate_url,
        "class_name": weaviate_class,
        # Reuse the value from evaluation_args instead of hard-coding it
        "top_k": evaluation_args["retrieval_top_k"],
    },
)
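
To see what `retrieve_chunks_from_embeddings` unpacks, it can help to look at the shape of the result outside of the pipeline. The sketch below uses a hand-written dictionary that mimics the nesting accessed above (`result["data"]["Get"][class_name]`). The passages and distances are made up, and the handling of a top-level `errors` key is an assumption about how failed queries are reported.

```python
from typing import Dict, List


def extract_passages(result: Dict, class_name: str) -> List[str]:
    """Pull the `passage` field out of every hit in a query result."""
    if "errors" in result:
        # Assumption: failed queries carry a top-level "errors" entry
        raise RuntimeError(f"Query failed: {result['errors']}")
    hits = result["data"]["Get"][class_name]
    return [hit["passage"] for hit in hits]


# Hand-written stand-in for a query result (not real data)
fake_result = {
    "data": {
        "Get": {
            "Pipeline1": [
                {"passage": "first chunk", "_additional": {"distance": 0.12}},
                {"passage": "second chunk", "_additional": {"distance": 0.27}},
            ]
        }
    }
}
print(extract_passages(fake_result, "Pipeline1"))  # ['first chunk', 'second chunk']
```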


In [None]:
@lightweight_component(
    consumes={
        "question": pa.string(),
        "retrieved_chunks": pa.list_(pa.string()),
    },
    produces={
        "context_precision": pa.float32(),
        "context_relevancy": pa.float32(),
    },
    extra_requires=["ragas==0.0.21"],
)
class RagasEvaluator(PandasTransformComponent):

    def transform(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        from datasets import Dataset
        from ragas import evaluate
        from ragas.metrics import context_precision, context_relevancy
        from langchain_openai.chat_models import ChatOpenAI

        gpt_evaluator = ChatOpenAI(model_name="gpt-3.5-turbo")

        dataframe = dataframe.rename(
            columns={"retrieved_chunks": "contexts"},
        )
        
        dataset = Dataset.from_pandas(dataframe)

        # Optionally drop the "id" column before evaluation:
        # if "id" in dataset.column_names:
        #     dataset = dataset.remove_columns("id")

        result = evaluate(
            dataset,
            metrics=[context_precision, context_relevancy],
            llm=gpt_evaluator,
        )

        results_df = result.to_pandas()
        results_df = results_df.set_index(dataframe.index)

        return results_df
    
# Add component to pipeline
retriever_eval = retrieve_chunks.apply(
    RagasEvaluator,
)

In [None]:
from fondant.component import DaskTransformComponent
import dask.dataframe as dd


@lightweight_component(
    consumes={
        "context_precision": pa.float32(),
        "context_relevancy": pa.float32(),
    },
    produces={
        "metric": pa.string(),
        "score": pa.float32(),
    },
)
class AggregateResults(DaskTransformComponent):
    def transform(self, dataframe: dd.DataFrame) -> dd.DataFrame:
        metrics = list(self.consumes.keys())
        agg = dataframe[metrics].mean()
        agg_df = agg.to_frame(name="score")
        agg_df["metric"] = agg.index
        agg_df.index = agg_df.index.astype(str)

        return agg_df

# Add component to pipeline
retriever_eval.apply(
    AggregateResults, 
    consumes={
        "context_precision": "context_precision",
        "context_relevancy": "context_relevancy"
    }
)
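
The aggregation step turns one score per question into one score per metric. A plain-Python sketch of the same idea is shown below, with made-up scores, to make the shape of the output (`metric`, `score`) explicit. It is not the component's implementation, which operates on a Dask dataframe.

```python
from statistics import mean
from typing import Dict, List

# Made-up per-question scores, for illustration only
per_question_scores = [
    {"context_precision": 1.0, "context_relevancy": 0.5},
    {"context_precision": 0.5, "context_relevancy": 0.25},
]


def aggregate(rows: List[Dict[str, float]], metrics: List[str]) -> List[Dict]:
    """Average every metric over all rows, one output row per metric."""
    return [
        {"metric": metric, "score": mean(row[metric] for row in rows)}
        for metric in metrics
    ]


print(aggregate(per_question_scores, ["context_precision", "context_relevancy"]))
# [{'metric': 'context_precision', 'score': 0.75}, {'metric': 'context_relevancy', 'score': 0.375}]
```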

#### Run the evaluation pipeline

In [None]:
import os
from fondant.pipeline.runner import DockerRunner

runner = DockerRunner()
# Mount the local evaluation datasets into the containers at /evaldata
extra_volumes = [os.path.join(os.path.abspath("."), "evaluation_datasets") + ":/evaldata"]
runner.run(evaluation_pipeline, extra_volumes=extra_volumes)

#### Show evaluation results

In [None]:
utils.get_metrics_latest_run(base_path=BASE_PATH)

## Explore data

You can also check your data and results at each step in the pipelines using the **Fondant data explorer**. The first time you run the data explorer, you need to download the docker image which may take a minute. Then you can access the data explorer at: **http://localhost:8501/**

Enjoy the exploration! 🍫 

Press the ◼️ in the notebook toolbar to **stop the explorer**.

In [None]:
from fondant.explore import run_explorer_app

run_explorer_app(base_path=BASE_PATH)

To stop the explorer, run the cell below.

In [None]:
from fondant.explore import stop_explorer_app

stop_explorer_app()

## Clean up your environment

After your pipeline has run successfully, you can **clean up** your environment and stop the Weaviate database.

In [None]:
!docker compose -f weaviate_service/docker-compose.yaml down

## Feedback

Please share your experience or **let us know how we can improve** through our 
* [**Discord**](https://discord.gg/HnTdWhydGp) 
* [**GitHub**](https://github.com/ml6team/fondant)

And of course feel free to give us a [**star** ⭐](https://github.com/ml6team/fondant) if you like what we are doing!