## Set up environment

**Check that Docker Compose is installed and the Docker daemon is running**

In [None]:
!docker compose version
!docker info
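
The same check can be done from Python, which is handy if you want the notebook to stop early with a clear message. This is only a minimal sketch: the helper name `docker_available` is hypothetical and not part of Fondant, and it assumes that a zero exit code from `docker info` means the daemon is reachable.

```python
import shutil
import subprocess


def docker_available(timeout: float = 20.0) -> bool:
    """Return True if the docker CLI is on PATH and `docker info` exits successfully."""
    if shutil.which("docker") is None:
        return False  # CLI not installed or not on PATH
    try:
        result = subprocess.run(
            ["docker", "info"],
            capture_output=True,
            timeout=timeout,
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # no answer in time, or the binary could not be executed
    return result.returncode == 0


print("Docker ready:", docker_available())
```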

### Set up Fondant

In [None]:
!pip install -r ../requirements.txt

## Initiate the Weaviate vector store

If you are using a MacBook with an M1 processor, set the Docker default platform to `linux/amd64`.

In [None]:
import os
os.environ["DOCKER_DEFAULT_PLATFORM"]="linux/amd64"
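
If the notebook is shared between machines, the override could be applied only where it is needed. The sketch below assumes that a natively running Python on Apple silicon reports `Darwin` and `arm64` through the standard `platform` module (a Python running under Rosetta reports `x86_64` instead). The helper `needs_amd64_override` is a hypothetical name used for illustration.

```python
import os
import platform


def needs_amd64_override(system: str, machine: str) -> bool:
    """Return True for the Darwin + arm64 combination (Apple silicon Macs)."""
    return system == "Darwin" and machine == "arm64"


# Only set the variable when the current machine matches
if needs_amd64_override(platform.system(), platform.machine()):
    os.environ["DOCKER_DEFAULT_PLATFORM"] = "linux/amd64"
```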

Run Weaviate with Docker Compose

In [None]:
!docker compose -f weaviate/docker-compose.yaml up --detach

Make sure you have version 3 of the Weaviate Python client installed

In [None]:
!pip install "weaviate-client==3.*"

Make sure the vector database is running and accessible

In [None]:
import weaviate

local_weaviate_client = weaviate.Client("http://localhost:8080")
local_weaviate_client.schema.get()
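
The container can take a few seconds to accept connections after `docker compose up --detach` returns, so the check above may fail if it runs too early. One way to handle this is to poll until the database reports ready. The helper below is a generic sketch (`wait_until_ready` is a hypothetical name); the commented usage line assumes the `is_ready()` method of the v3 client.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Call `check` repeatedly until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # treat errors (e.g. connection refused) as "not ready yet"
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage with the client created above (requires a running Weaviate instance):
# wait_until_ready(local_weaviate_client.is_ready)
```

The function returns `False` instead of raising on timeout, so the caller decides whether to abort or retry.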

## Parameter search

**Import the pipelines creator and the pipeline runner**

In [None]:
from fondant.pipeline.runner import DockerRunner
import pipeline_index, pipeline_eval
from utils import get_host_ip, create_directory_if_not_exists, run_parameters_search

**Run the grid search**

In [None]:
import os

# Mount the local folder holding the evaluation dataset (a csv file with a "question" column) at /data
extra_volumes = [os.path.join(os.path.abspath("."), "local_file") + ":/data"]

# Define the values for grid search
chunk_sizes = [256]
chunk_overlaps = [10]
embed_models = [("huggingface","all-MiniLM-L6-v2")]
top_ks = [2]

# Configurable parameters shared by the indexing and evaluation pipelines (further below)
host_ip = get_host_ip()  # get the host IP address to enable Docker access to Weaviate

BASE_PATH = "./data-dir"
BASE_PATH = create_directory_if_not_exists(BASE_PATH)  # create a folder to store the pipeline data if it doesn't exist

fixed_args = {
    "pipeline_dir":BASE_PATH,
    "weaviate_url":f"http://{host_ip}:8080", # Weaviate URL, built from the host IP so the containers can reach it
}
fixed_index_args = {
    "hf_dataset_name":"wikitext@~parquet",
    "data_column_name":"text",
    "n_rows_to_load":100,
}
fixed_eval_args = {
    "csv_dataset_uri":"/data/wikitext_1000_q.csv", # must match the file in the folder mounted at /data
    "csv_column_separator":";",
    "question_column_name":"question",
    "module": "langchain.llms",
    "llm_name":"OpenAI",
    "llm_kwargs":{"openai_api_key": ""}, # TODO: specify your key if you're using OpenAI
    "metrics":["context_precision", "context_relevancy"]
}

parameters_search_results = run_parameters_search(
    extra_volumes=extra_volumes,
    fixed_args=fixed_args,
    fixed_index_args=fixed_index_args,
    fixed_eval_args=fixed_eval_args,
    chunk_sizes=chunk_sizes,
    chunk_overlaps=chunk_overlaps,
    embed_models=embed_models,
    top_ks=top_ks,
)
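
`run_parameters_search` lives in `utils` and its internals are not shown here. Assuming it evaluates every combination of the four lists, which is the usual meaning of a grid search, the number of pipeline runs is the product of the list lengths. The sketch below enumerates such combinations to make the cost visible before adding more values. `expand_grid` is a hypothetical helper, and the extra values (512, 50, 5) are illustrative only.

```python
from itertools import product


def expand_grid(chunk_sizes, chunk_overlaps, embed_models, top_ks):
    """Return one dict per combination of the given parameter values."""
    return [
        {
            "chunk_size": size,
            "chunk_overlap": overlap,
            "embed_model": model,
            "top_k": k,
        }
        for size, overlap, model, k in product(chunk_sizes, chunk_overlaps, embed_models, top_ks)
    ]


grid = expand_grid(
    chunk_sizes=[256, 512],
    chunk_overlaps=[10, 50],
    embed_models=[("huggingface", "all-MiniLM-L6-v2")],
    top_ks=[2, 5],
)
print(len(grid))  # 2 * 2 * 1 * 2 = 8 combinations
```

With the single-value lists used in the cell above, the same expansion yields exactly one configuration.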

## Evaluation Results

**Read Latest Evaluated Pipeline Score**

You can read the results for each RAG configuration that was run.

In [None]:
from utils import read_evaluated_pipelines

read_evaluated_pipelines(parameters_search_results=parameters_search_results)

In [None]:
from utils import output_results

output_results(results=parameters_search_results)

## Exploring the dataset

You can explore your results using the Fondant explorer, which lets you visualize your output dataset at each component step. It might take a while to start the first time, as it needs to download the explorer Docker image first.

Enjoy the exploration! 🍫 

In [None]:
from fondant.explore import run_explorer_app

run_explorer_app(base_path=fixed_args["pipeline_dir"])

## Clean up your environment

After your pipeline has run successfully, clean up your environment and stop the Weaviate database.

In [None]:
!docker compose -f weaviate/docker-compose.yaml down