# Retrieval-Augmented Generation

### Use case: building docsrag, a query engine that helps developers quickly find information in open-source documentation
- More specifically, we will build **raybot** with our docsrag library: a retrieval-augmented question-answering system over Ray's documentation

### Tech stack:
- `llama_index`
   - `llama_hub` for document loading
   - `openai` and `huggingface` for LLMs
   - `langchain` for composing LLM calls
   - `nltk` for text processing
- `ray`

### Building a retrieval-augmented question-answering system using the Ray documentation
Retrieval-augmented generation (RAG) is a paradigm for augmenting an LLM with custom data. It generally consists of two stages:
1. Indexing stage: preparing a knowledge base
2. Querying stage: retrieving relevant context from the knowledge base to assist the LLM in responding to a question

[<img src="rag.jpeg" height="500"/>](rag.jpeg)

# Indexing Stage
Given a dataset of documents, we first need to index them. Indexing consists of four steps:
- Load the documents
- Parse the documents into passages which are called nodes
- Use an embedding model to encode the nodes into embedding vectors
- Index the embeddings using a vector similarity search database
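
The four steps can be sketched end to end with toy stand-ins. This is a minimal illustration and not the docsrag implementation: `Node`, `split_into_nodes`, `toy_embed`, and `build_index` are hypothetical names, the chunker simply splits on blank lines, and the "embedding" is a hashed bag-of-words vector rather than a learned model.

```python
import hashlib
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A chunk of text plus metadata (hypothetical stand-in for a real node class)."""

    text: str
    metadata: dict = field(default_factory=dict)


def split_into_nodes(doc_text: str, file_path: str) -> list[Node]:
    """Step 2: split a document into passages on blank lines."""
    passages = [p.strip() for p in doc_text.split("\n\n") if p.strip()]
    return [Node(text=p, metadata={"file_path": file_path}) for p in passages]


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Step 3: hashed bag-of-words vector, L2-normalised (a toy model, by assumption)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(docs: dict[str, str]) -> list[tuple[list[float], Node]]:
    """Steps 1-4: take loaded documents, parse nodes, embed them, store the vectors."""
    index = []
    for file_path, text in docs.items():  # step 1: documents are already loaded
        for node in split_into_nodes(text, file_path):
            index.append((toy_embed(node.text), node))  # step 4: in-memory store
    return index


docs = {"serve.md": "Ray Serve deploys models.\n\nIt scales across a cluster."}
index = build_index(docs)
print(len(index), index[0][1].metadata)
```

A real pipeline would swap each stand-in for a proper component (a repository reader, a token-aware chunker, an embedding model, a vector database) while keeping the same data flow.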
<!-- ![index](index_build.jpeg) -->
[<img src="index_build.jpeg" height="500"/>](index_build.jpeg)


### Document Loader

We will go over how a sample Markdown document is loaded into a document object.

Note that the llama-index Markdown reader does not support introducing document relationships.

### DocumentLoader implementation in docsrag

We showcase the docsrag `GithubDocumentLoader`, which is simply an adapter for `llama_hub.github_repo.GithubRepositoryReader`.

For the sake of simplicity, the `GithubDocumentLoader`:
- considers only Markdown (`.md`) and reStructuredText (`.rst`) files inside the `doc/source` folder of the Ray repo
- reads the documents as raw text, because the default `llama_index` readers have their flaws

In [1]:
from docsrag.docs_loader import GithubDocumentLoader

document_loader = GithubDocumentLoader(
    owner="ray-project",
    repo="ray",
    version_tag="releases/2.6.3",
    paths_to_include=["doc/source/"],
    file_extensions_to_include=[".md", ".rst"],
    paths_to_exclude=[
        "doc/source/_ext/",
        "doc/source/_includes/",
        "doc/source/_static/",
        "doc/source/_templates/",
    ],
    filenames_to_exclude=[],
)

In [2]:
# uncomment and run this command to fetch the documents
# docsrag fetch-documents --config-path ./data/config.yaml --data-path ./data --overwrite

In [3]:
import pickle

with open(f"./data/docs/{hash(document_loader)}.pkl", "rb") as f:
    docs = pickle.load(f)

In [4]:
print(f"Number of documents: {len(docs)}")

Number of documents: 426


In [5]:
sample_mkdown_doc = next(
    doc for doc in docs if doc.metadata["file_path"].endswith(".md")
)

print(sample_mkdown_doc.text[:500])

(observability-configure-manage-dashboard)=
# Configuring and Managing Ray Dashboard
{ref}`Ray Dashboard<observability-getting-started>` is one of the most important tools to monitor and debug Ray applications and Clusters. This page describes how to configure Ray Dashboard on your Clusters.

Dashboard configurations may differ depending on how you launch Ray Clusters (e.g., local Ray Cluster v.s. KubeRay). Integrations with Prometheus and Grafana are optional for enhanced Dashboard experience.



## Node Parser
A node parser chunks a document into nodes.

The parser will:
- run a text chunker
- inject additional node metadata
- construct node relationships

A node is the chunk text plus metadata (e.g. the node's text hash and its relationships to other nodes).

We showcase the docsrag `NodeParser`, which combines a configurable text chunker with a metadata pipeline.
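
To make the chunking and relationship steps concrete, here is a minimal sketch of a chunker with overlap that also links each chunk to its neighbours. It is an illustration under simplifying assumptions, not the docsrag `NodeParser`: `Chunk` and `chunk_words` are hypothetical names, and sizes are counted in whitespace-separated words rather than tokenizer tokens.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    """Hypothetical node: chunk text, metadata, and links to neighbouring chunks."""

    text: str
    metadata: dict = field(default_factory=dict)
    prev_index: Optional[int] = None
    next_index: Optional[int] = None


def chunk_words(text: str, chunk_size: int, chunk_overlap: int) -> list[Chunk]:
    """Split text into windows of `chunk_size` words sharing `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start : start + chunk_size]
        # Inject metadata: here, the offset of the window in the document.
        chunks.append(Chunk(text=" ".join(window), metadata={"start_word": start}))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    # Construct prev/next relationships between neighbouring chunks.
    for i, chunk in enumerate(chunks):
        chunk.prev_index = i - 1 if i > 0 else None
        chunk.next_index = i + 1 if i < len(chunks) - 1 else None
    return chunks


chunks = chunk_words("a b c d e f g h i j", chunk_size=4, chunk_overlap=1)
for c in chunks:
    print(c.text, c.prev_index, c.next_index)
```

The overlap means the last word of one chunk reappears as the first word of the next, so a sentence cut at a chunk boundary is still partly visible from both sides.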

In [6]:
from docsrag.node_parser import NodeParser

In [7]:
node_parser = NodeParser.parse_obj(
    {
        "inherit_metadata_from_doc": True,
        "construct_prev_next_relations": True,
        "text_chunker": {
            "chunk_size": 1024,
            "chunk_overlap": 20,
            "paragraph_separator": "\n\n\n",
            "sentence_tokenizer": {"type": "tokenizers/punkt"},
            "secondary_chunking_regex": "[^,.;。]+[,.;。]?",
            "tokenizer": {"encoding": "gpt2"},
            "word_seperator": " ",
        },
        "metadata_pipeline": {
            "extractors": [
                "file_path_extractor",
                "text_hash_extractor",
            ]
        },
    }
)

In [8]:
# uncomment and run this command to parse the nodes
# docsrag parse-nodes --config-path ./data/config.yaml --data-path ./data --overwrite

In [9]:
%psource node_parser.run

    def run(
        self, documents: list["Document"], use_ray: bool = True, batch_size: int = 100
    ) -> list["BaseNode"]:
        """Parse the documents into nodes."""
        if use_ray:
            import ray

            ray.init(ignore_reinit_error=True)

            return [
                node

In [10]:
import pickle

with open(f"./data/nodes/{hash(node_parser)}.pkl", "rb") as f:
    nodes = pickle.load(f)

In [11]:
print(f"Number of nodes: {len(nodes)}")

Number of nodes: 1212


In [12]:
import copy
import pickle

import gradio as gr
import yaml
from docsrag.node_parser import NodeParser

with open("tutorial_docs.pkl", "rb") as f:
    docs = pickle.load(f)

config = {
    "inherit_metadata_from_doc": True,
    "construct_prev_next_relations": True,
    "text_chunker": {
        "chunk_size": 1024,
        "chunk_overlap": 20,
        "paragraph_separator": "\n\n\n",
        "sentence_tokenizer": {"type": "tokenizers/punkt"},
        "secondary_chunking_regex": "[^,.;。]+[,.;。]?",
        "tokenizer": {"encoding": "gpt2"},
        "word_seperator": " ",
    },
    "metadata_pipeline": {
        "extractors": [
            "file_path_extractor",
            "text_hash_extractor",
        ]
    },
}


def parse_nodes(
    text,
    chunk_size=1024,
    chunk_overlap=20,
    paragraph_separator="\n\n\n",
    sentence_tokenizer="tokenizers/punkt",
    secondary_chunking_regex="[^,.;。]+[,.;。]?",
    tokenizer="gpt2",
    word_seperator=" ",
    extractors=("file_path_extractor", "text_hash_extractor"),
):
    # Copy the defaults so repeated calls do not mutate the module-level config.
    config_dict = copy.deepcopy(config)
    # Gradio textboxes pass their values as strings, so cast the numeric options.
    config_dict["text_chunker"]["chunk_size"] = int(chunk_size)
    config_dict["text_chunker"]["chunk_overlap"] = int(chunk_overlap)
    config_dict["text_chunker"]["paragraph_separator"] = paragraph_separator
    config_dict["text_chunker"]["sentence_tokenizer"]["type"] = sentence_tokenizer
    config_dict["text_chunker"]["secondary_chunking_regex"] = secondary_chunking_regex
    config_dict["text_chunker"]["tokenizer"]["encoding"] = tokenizer
    config_dict["text_chunker"]["word_seperator"] = word_seperator
    config_dict["metadata_pipeline"]["extractors"] = list(extractors)

    node_parser = NodeParser.parse_obj(config_dict)
    # Use the first tutorial document as a template and replace only its text.
    doc = copy.deepcopy(docs[0])
    doc.text = text
    nodes = node_parser.run([doc], use_ray=False)

    def describe(node):
        return (
            node.text,
            yaml.dump(node.metadata),
            yaml.dump([str(rel) for rel in node.relationships]),
        )

    # Short inputs may yield a single node; show empty fields for the second one.
    first = describe(nodes[0])
    second = describe(nodes[1]) if len(nodes) > 1 else ("", "", "")
    return (*first, *second)


with gr.Blocks() as demo:
    with gr.Row():
        with gr.Column():
            title = gr.Markdown(
                """
                # Node Parser Demo
                Shows how configuration options affect the output of the node parser.
                """
            )
    with gr.Row():
        with gr.Column(scale=3, min_width=100):
            text1 = gr.Textbox(label="Document", value=docs[0].text)
        with gr.Column(scale=1, min_width=100):
            text2 = gr.Textbox(label="NodeParser chunksize", value=1024)
            text3 = gr.Textbox(label="NodeParser chunk_overlap", value=20)
            text4 = gr.Textbox(label="NodeParser paragraph_separator", value='"\n\n\n"')
            text5 = gr.Textbox(
                label="NodeParser sentence_tokenizer", value="tokenizers/punkt"
            )
            text6 = gr.Textbox(
                label="NodeParser secondary_chunking_regex", value='"[^,.;。]+[,.;。]?"'
            )

    with gr.Row():
        inbtw = gr.Button("Submit", variant="primary")

    with gr.Row():
        with gr.Column(scale=3, min_width=100):
            out1 = gr.Textbox(label="First Node text")
        with gr.Column(scale=1, min_width=100):
            out2 = gr.Textbox(label="First Node metadata")
        with gr.Column(scale=1, min_width=100):
            out3 = gr.Textbox(label="First Node relationships")

    with gr.Row():
        with gr.Column(scale=3, min_width=100):
            out4 = gr.Textbox(label="Second Node text")
        with gr.Column(scale=1, min_width=100):
            out5 = gr.Textbox(label="Second Node metadata")
        with gr.Column(scale=1, min_width=100):
            out6 = gr.Textbox(label="Second Node relationships")
    inbtw.click(
        parse_nodes,
        inputs=[text1, text2, text3, text4, text5, text6],
        outputs=[out1, out2, out3, out4, out5, out6],
    )

demo.launch(quiet=True)


Running on local URL:  http://127.0.0.1:7861




## Embedding model and vector store

We showcase the docsrag `VectorStoreIndexRay` (a very simple in-memory vector store) and how to use it to find similar nodes.

In [13]:
from docsrag.embedding.index import VectorStoreSpec, VectorStoreIndexRay

We start by building our `VectorStoreIndexRay` from the nodes we parsed earlier. This computes an embedding for each node and stores it in a vector store.
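
Conceptually, finding similar nodes in an in-memory vector store amounts to scoring every stored vector against the query vector and keeping the best `k`. The sketch below illustrates that idea with cosine similarity in pure Python. It is an assumption-laden illustration: `cosine_similarity`, `retrieve_top_k`, and the three-dimensional vectors are hypothetical, and the similarity function that docsrag actually uses is not shown in this notebook.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vec: list[float], store: dict[str, list[float]], k: int
) -> list[tuple[float, str]]:
    """Return the k highest-scoring (score, node_id) pairs, best first."""
    scored = (
        (cosine_similarity(query_vec, vec), node_id) for node_id, vec in store.items()
    )
    return heapq.nlargest(k, scored)


# Hypothetical pre-computed node embeddings.
store = {
    "node-a": [1.0, 0.0, 0.0],
    "node-b": [0.7, 0.7, 0.0],
    "node-c": [0.0, 0.0, 1.0],
}
results = retrieve_top_k([1.0, 0.1, 0.0], store, k=2)
print(results)
```

A brute-force scan like this is linear in the number of nodes, which is fine for a few thousand nodes; larger corpora typically call for an approximate nearest-neighbour index.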

In [14]:
%psource VectorStoreIndexRay.build_from_spec

    [0;34m@[0m[0mclassmethod[0m[0;34m[0m
[0;34m[0m    [0;32mdef[0m [0mbuild_from_spec[0m[0;34m([0m[0mcls[0m[0;34m,[0m [0mnodes[0m[0;34m,[0m [0mspec[0m[0;34m:[0m [0mVectorStoreSpec[0m[0;34m,[0m [0;34m**[0m[0mray_kwargs[0m[0;34m)[0m[0;34m:[0m[0;34m[0m
[0;34m[0m        [0mdocstore[0m [0;34m=[0m [0mSimpleDocumentStore[0m[0;34m([0m[0;34m)[0m[0;34m[0m
[0;34m[0m        [0mvector_store[0m [0;34m=[0m [0mSimpleVectorStore[0m[0;34m([0m[0;34m)[0m[0;34m[0m
[0;34m[0m        [0mindex_struct[0m [0;34m=[0m [0mIndexDict[0m[0;34m([0m[0;34m)[0m[0;34m[0m
[0;34m[0m        [0membed_model[0m [0;34m=[0m [0mload_embed_model[0m[0;34m([0m[0mspec[0m[0;34m.[0m[0membedding_model_name[0m[0;34m)[0m[0;34m[0m
[0;34m[0m        [0mdocstore[0m[0;34m,[0m [0mvector_store[0m[0;34m,[0m [0mindex_struct[0m [0;34m=[0m [0mcls[0m[0;34m.[0m[0mupdate_from_nodes[0m[0;34m([0m[0;34m[0m
[0;34m[0m            [0m

In [15]:
%psource VectorStoreIndexRay._get_node_embeddings

    [0;34m@[0m[0mclassmethod[0m[0;34m[0m
[0;34m[0m    [0;32mdef[0m [0m_get_node_embeddings[0m[0;34m([0m[0mcls[0m[0;34m,[0m [0mnodes[0m[0;34m:[0m [0mSequence[0m[0;34m[[0m[0mBaseNode[0m[0;34m][0m[0;34m,[0m [0membed_model[0m[0;34m,[0m [0;34m**[0m[0mray_kwargs[0m[0;34m)[0m[0;34m:[0m[0;34m[0m
[0;34m[0m        [0;34m"""Get node embeddings."""[0m[0;34m[0m
[0;34m[0m        [0mray[0m[0;34m.[0m[0minit[0m[0;34m([0m[0mignore_reinit_error[0m[0;34m=[0m[0;32mTrue[0m[0;34m)[0m[0;34m[0m
[0;34m[0m[0;34m[0m
[0;34m[0m        [0;32mreturn[0m [0;34m[[0m[0;34m[0m
[0;34m[0m            [0mnode_with_embedding[0m[0;34m[0m
[0;34m[0m            [0;32mfor[0m [0mnode_with_embedding[0m [0;32min[0m [0mray[0m[0;34m.[0m[0mdata[0m[0;34m.[0m[0mfrom_items[0m[0;34m([0m[0mnodes[0m[0;34m)[0m[0;34m[0m
[0;34m[0m            [0;34m.[0m[0mmap_batches[0m[0;34m([0m[0mpartial[0m[0;34m([0m[0mget_embedding[0

In [3]:
node_limit = None

embedding_vector_store_spec = VectorStoreSpec.parse_obj(
    {"embedding_model_name": "BAAI/bge-small-en"}
)

vector_store_index = VectorStoreIndexRay.build_from_spec(
    nodes=nodes[:node_limit] if node_limit else nodes,
    spec=embedding_vector_store_spec,
    num_gpus=0,
    batch_size=100,
)

In [22]:
from pathlib import Path

vector_store_index.save(Path(f"./data/vector_index/{hash(vector_store_index)}"))

In [18]:
# to build the full vector store uncomment and run below command
# docsrag build-embedding-vector-store-index --config-path ./data/config.yaml --data-path ./data --overwrite

We load the vector store that was built in the previous step.

In [4]:
hash_vector_store = 525061202 # hash(vector_store_index)
loaded_index = VectorStoreIndexRay.load(Path(f"data/vector_index/{hash_vector_store}/"))

In [24]:
nodes_with_scores = loaded_index.retrieve_most_similiar_nodes(
    query="How can I migrate from a single-application config to a multi-application config in Ray Serve?",
    similarity_top_k=3,
)

In [25]:
print(f"Number of nodes fetched: {len(nodes_with_scores)}")

Number of nodes fetched: 3


In [26]:
most_similar_node = nodes_with_scores[0]
print(most_similar_node.node.text[-1170:], end="\n\n")
print(f"{most_similar_node.node.metadata=}")
print(f"{most_similar_node.score=}")


Migrating the single-application config `ServeApplicationSchema` to the multi-application config format `ServeDeploySchema` is straightforward. Each entry under the  `applications` field matches the old, single-application config format. To convert a single-application config to the multi-application config format:
* Copy the entire old config to an entry under the `applications` field.
* Remove `host` and `port` from the entry and move them under the `http_options` field.
* Name the application.
* If you haven't already, set the application-level `route_prefix` to the route prefix of the ingress deployment in the application. In a multi-application config, you should set route prefixes at the application level instead of for the ingress deployment in each application.
* When needed, add more applications.

For more details on the multi-application config format, see the documentation for [`ServeDeploySchema`](serve-rest-api-config-schema).

:::{note} 
You must remove `host` and `port

## Evaluating our Embedding Index using standard ranking and classification metrics

- Step 1: Build a question-and-answer evaluation dataset from the Ray documentation corpus
- Step 2: Assess the quality of our embedding index on that dataset

### Building an Evaluation Dataset
[<img src="eval_build.jpeg" height="500"/>](eval_build.jpeg)


In [27]:
from textwrap import dedent
from docsrag.evaluation_dataset_generator import EvaluationDatasetBuilder

eval_dataset_builder = EvaluationDatasetBuilder.parse_obj(
    {
        "qa_generator_open_ai": {
            "model": "gpt-3.5-turbo",
            "system_prompt": dedent(
                """
            You are a helpful assistant that generates questions and answers from a provided context.
            The context will be selected documents from the ray's project documentation.
            The questions you generate should be obvious on their own and should mimic what a developer might ask trying to work with ray, especially if they can't directly find the answer in the documentation.
            The answers should be factually correct, can be of a variable length and can contain code.
            If the provided context does not contain enough information to create a question and answer, you should respond with 'I can't generate a question and answer from this context'. 
            The following is an example of how the output should look:
            Q1: How can I view ray dashboard from outside the Kubernetes cluster?
            A1: You can use port-forwarding. Run the command 'kubectl port-forward --address 0.0.0.0 ${RAYCLUSTER_HEAD_POD} 8265:8265'

            Q2: {question}
            A2: {answer}
            """
            ).lstrip(),
            "user_prompt_template": dedent(
                """
        Provide questions and answers from the following context:

        {context}
        """
            ).lstrip(),
            "max_tokens": 1024,
            "temperature": 1.0,
            "top_p": 0.85,
            "frequency_penalty": 0,
            "presence_penalty": 0,
        },
        "noise_injector_from_parquet": {"dataset_name": "trivia_questions.parquet"},
    }
)

In [28]:
# Note this is the prompt used by llama-index in its finetuning module
# """\
# Context information is below.

# ---------------------
# {context_str}
# ---------------------

# Given the context information and not prior knowledge.
# generate only questions based on the below query.

# You are a Teacher/ Professor. Your task is to setup \
# {num_questions_per_chunk} questions for an upcoming \
# quiz/examination. The questions should be diverse in nature \
# across the document. Restrict the questions to the \
# context information provided."
# """

In [29]:
qa_generator_openai = eval_dataset_builder.qa_generator_open_ai

In [30]:
questions = qa_generator_openai.run(context=most_similar_node.node.text)

In [31]:
print(questions)

Q1: How do you add a new application to Ray Serve?
A1: To add a new application, you need to add a new entry under the `applications` field in the config. Each application must have a unique name and route prefix.

Q2: How do you delete an application from Ray Serve?
A2: To delete an application, you need to remove the corresponding entry under the `applications` field in the config.

Q3: How do you update an application in Ray Serve?
A3: To update an application, you need to modify the config options in the corresponding entry under the `applications` field in the config.

Q4: How do you migrate from a single-application config to a multi-application config in Ray Serve?
A4: To migrate from a single-application config to a multi-application config, you need to:
- Copy the entire old config to an entry under the `applications` field.
- Remove `host` and `port` from the entry and move them under the `http_options` field.
- Name the application.
- Set the application-level `route_prefix`
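
To turn generated text like the output above into dataset rows, the `Q<n>:`/`A<n>:` pairs have to be parsed out. The following is a minimal sketch of one way to do that with a regular expression. `parse_qa_pairs` and `QA_PATTERN` are hypothetical names and not part of docsrag; the refusal string is the one requested in the system prompt.

```python
import re

# Match "Q<n>: question" followed by "A<n>: answer" with the same number <n>.
# The answer runs until the next "Q<m>:" line or the end of the text.
QA_PATTERN = re.compile(
    r"Q(\d+):\s*(.*?)\s*\nA\1:\s*(.*?)(?=\nQ\d+:|\Z)",
    re.DOTALL,
)

REFUSAL = "I can't generate a question and answer from this context"


def parse_qa_pairs(generated: str) -> list[tuple[str, str]]:
    """Extract (question, answer) pairs; return [] if the model declined."""
    if REFUSAL in generated:
        return []
    return [(q.strip(), a.strip()) for _, q, a in QA_PATTERN.findall(generated)]


sample = (
    "Q1: How do you add a new application to Ray Serve?\n"
    "A1: Add a new entry under the `applications` field.\n"
    "\n"
    "Q2: How do you delete an application?\n"
    "A2: Remove its entry:\n"
    "- step one\n"
    "- step two"
)
pairs = parse_qa_pairs(sample)
for question, answer in pairs:
    print(repr(question), "->", repr(answer))
```

Each parsed question can then be stored together with the ID of the node it was generated from, which gives the ground truth needed to score retrieval.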

### Evaluate our Embedding Vector Index Store

[<img src="run_eval.jpeg" height="600"/>](run_eval.jpeg)


In [32]:
from docsrag.embedding.evaluation import VectorStoreEvaluator

In [33]:
evaluator = VectorStoreEvaluator(
    vector_store_index=loaded_index,
    evaluation_dataset_name=hash(eval_dataset_builder),
    top_ks=[1, 3, 5, 7, 10]
)

In [34]:
scores = evaluator.run()

2023-08-31 08:47:14,535	INFO evaluation.py:227 -- Recall@1: 42.57%
2023-08-31 08:47:14,536	INFO evaluation.py:227 -- Recall@3: 60.10%
2023-08-31 08:47:14,536	INFO evaluation.py:227 -- Recall@5: 67.26%
2023-08-31 08:47:14,537	INFO evaluation.py:227 -- Recall@7: 71.20%
2023-08-31 08:47:14,538	INFO evaluation.py:227 -- Recall@10: 75.00%
2023-08-31 08:47:14,539	INFO evaluation.py:230 -- MRR@1: 0.4257
2023-08-31 08:47:14,539	INFO evaluation.py:230 -- MRR@3: 0.5036
2023-08-31 08:47:14,540	INFO evaluation.py:230 -- MRR@5: 0.5200
2023-08-31 08:47:14,541	INFO evaluation.py:230 -- MRR@7: 0.5262
2023-08-31 08:47:14,542	INFO evaluation.py:230 -- MRR@10: 0.5305
2023-08-31 08:47:14,542	INFO evaluation.py:233 -- NDCG@1: 0.4257
2023-08-31 08:47:14,543	INFO evaluation.py:233 -- NDCG@3: 0.5286
2023-08-31 08:47:14,544	INFO evaluation.py:233 -- NDCG@5: 0.5582
2023-08-31 08:47:14,545	INFO evaluation.py:233 -- NDCG@7: 0.5718
2023-08-31 08:47:14,545	INFO evaluation.py:233 -- NDCG@10: 0.5834
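
For reference, the three metrics can be sketched in a few lines. This is an illustrative implementation, not the code in `docsrag.embedding.evaluation`, and it assumes that each question has exactly one relevant node (the node the question was generated from). Under that assumption the three metrics coincide at k=1, which is consistent with the identical values logged above for Recall@1, MRR@1, and NDCG@1. All function names and the sample data are hypothetical.

```python
import math


def recall_at_k(ranked_ids: list[str], relevant_id: str, k: int) -> float:
    """1.0 if the relevant node appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def mrr_at_k(ranked_ids: list[str], relevant_id: str, k: int) -> float:
    """Reciprocal of the rank of the relevant node, or 0.0 if not in the top k."""
    for rank, node_id in enumerate(ranked_ids[:k], start=1):
        if node_id == relevant_id:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids: list[str], relevant_id: str, k: int) -> float:
    """With a single relevant node the ideal DCG is 1, so NDCG is the discounted gain."""
    for rank, node_id in enumerate(ranked_ids[:k], start=1):
        if node_id == relevant_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0


def mean_metric(metric, queries, k: int) -> float:
    """Average a per-query metric over all (ranked_ids, relevant_id) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in queries) / len(queries)


# Hypothetical retrieval results: (ranked node ids, id of the source node).
queries = [
    (["n1", "n7", "n3"], "n1"),  # hit at rank 1
    (["n4", "n2", "n9"], "n2"),  # hit at rank 2
    (["n5", "n6", "n8"], "n0"),  # miss
]

for k in (1, 3):
    print(
        f"k={k}",
        f"recall={mean_metric(recall_at_k, queries, k):.4f}",
        f"mrr={mean_metric(mrr_at_k, queries, k):.4f}",
        f"ndcg={mean_metric(ndcg_at_k, queries, k):.4f}",
    )
```

Recall@k only asks whether the right node was retrieved, while MRR@k and NDCG@k also reward ranking it higher, which is why they grow more slowly with k than recall does.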


In [12]:
scores["recall@k"]

NameError: name 'scores' is not defined

### Augmenting our LLM with the embedding vector store

In [1]:
from docsrag.llm.model import LLM, LLMPlusRag


In [2]:
predictor_without_rag = LLM(
    model="gpt-3.5-turbo",
    temperature=0.1,
    max_tokens=1000,
    max_retries=10,
)

In [8]:
query = "How can I set a metric and mode in ray Tune?"

In [39]:
answer_without_rag = predictor_without_rag.query(query)

In [40]:
print(answer_without_rag)

To set a metric and mode in Ray Tune, you can use the `metric` and `mode` parameters when defining your `tune.run()` function. Here's an example:

```python
import ray
from ray import tune

# Define your training function
def train_fn(config):
    # Your training logic here
    ...

# Set up the configuration space
config = {
    "learning_rate": tune.loguniform(0.001, 0.1),
    "batch_size": tune.choice([16, 32, 64]),
    ...
}

# Set the metric and mode
metric = "mean_accuracy"
mode = "max"

# Run the hyperparameter search
analysis = tune.run(
    train_fn,
    config=config,
    metric=metric,
    mode=mode,
    ...
)
```

In this example, the `metric` is set to `"mean_accuracy"` and the `mode` is set to `"max"`. This means that Ray Tune will search for hyperparameters that maximize the mean accuracy. You can change the metric and mode to suit your specific use case.


In [5]:
predictor_with_rag = LLMPlusRag(
    model="gpt-3.5-turbo",
    temperature=0.1,
    max_tokens=1000,
    max_retries=10,
    vector_store_path=f"./data/vector_index/{hash_vector_store}"
)

In [9]:
answer_with_rag = predictor_with_rag.query(query=query, similarity_top_k=2)

In [10]:
print(answer_with_rag)

To set a metric and mode in Ray Tune, you can use the `metric` and `mode` parameters when creating a `tune.Trainable` or when configuring a `tune.Tuner`.

For example, in the Function training API:

```python
def trainable(config):
    # ...
    session.report({"accuracy": accuracy})

tune.run(
    trainable,
    config=config,
    metric="accuracy",
    mode="max"
)
```

And in the Class training API:

```python
class MyTrainable(tune.Trainable):
    def step(self):
        # ...
        return {"accuracy": accuracy}

tune.run(
    MyTrainable,
    config=config,
    metric="accuracy",
    mode="max"
)
```

In both cases, the `metric` parameter specifies the name of the metric you want to optimize, and the `mode` parameter specifies whether to maximize or minimize the metric. The `mode` can be set to `"max"` or `"min"`.
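
The second predictor differs from the first in that retrieved nodes are placed in the prompt before the question. The sketch below shows one plausible way to assemble such a prompt. The wording and the `build_rag_prompt` helper are hypothetical; the prompt that `LLMPlusRag` actually uses is not shown in this notebook.

```python
def build_rag_prompt(query: str, contexts: list[str]) -> str:
    """Assemble a prompt that places numbered context passages before the question."""
    context_block = "\n\n".join(
        f"[{i}] {text.strip()}" for i, text in enumerate(contexts, start=1)
    )
    return (
        "Answer the question using the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical retrieved passages standing in for the top-2 nodes.
retrieved = [
    "Pass metric and mode when configuring the tuner.  ",
    "mode must be either 'max' or 'min'.",
]
prompt = build_rag_prompt("How can I set a metric and mode in ray Tune?", retrieved)
print(prompt)
```

Numbering the passages makes it possible to ask the model to cite which passage supports its answer, and `similarity_top_k` controls how many passages compete for the context window.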


## Tuning the embedding configuration with Ray Tune

In [19]:
import pickle
with open("data/nodes/130956594988870197.pkl", "rb") as f:
    nodes = pickle.load(f)

In [29]:
from ray import tune
from pathlib import Path

def objective(config):  # ①
    from docsrag.embedding.index import VectorStoreIndexRay, VectorStoreSpec
    from docsrag.embedding.evaluation import VectorStoreEvaluator

    import pickle
    print("path", Path(".").resolve(), Path(".").iterdir())
    with open("data/nodes/130956594988870197.pkl", "rb") as f:
        nodes = pickle.load(f)
    node_limit = 100

    embedding_vector_store_spec = VectorStoreSpec.parse_obj(
        {"embedding_model_name": config["embedding_model_name"]}
    )

    vector_store_index = VectorStoreIndexRay.build_from_spec(
        nodes=nodes[:node_limit] if node_limit else nodes,
        spec=embedding_vector_store_spec,
        num_gpus=0,
        batch_size=100,
    )

    evaluator = VectorStoreEvaluator(
        vector_store_index=vector_store_index,
        evaluation_dataset_name="1618109849114044135",
        top_ks=[3],
    )
    scores = evaluator.run()

    return {"score": scores.scores["recall@k"].squeeze()}


search_space = {  # ②
    "embedding_model_name": tune.choice(
        [
            "BAAI/bge-small-en",
            # "BAAI/bge-base-en",
        ],
    ),
    "top_k": tune.choice([1, 2, 3, 4]),
}

tuner = tune.Tuner(objective, param_space=search_space)  # ③

results = tuner.fit()
print(results.get_best_result(metric="score", mode="max").config)

0,1
Current time:,2023-08-31 09:13:55
Running for:,00:00:07.52
Memory:,31.4/64.0 GiB

Trial name,# failures,error file
objective_38782_00000,1,"/Users/marwansarieddine/ray_results/objective_2023-08-31_09-13-47/objective_38782_00000_0_embedding_model_name=BAAI_bge-small-en,top_k=3_2023-08-31_09-13-47/error.txt"

Trial name,status,loc,embedding_model_name,top_k
objective_38782_00000,ERROR,127.0.0.1:17250,BAAI/bge-small-en,3


2023-08-31 09:13:55,482	ERROR trial_runner.py:1450 -- Trial objective_38782_00000: Error happened when processing _ExecutorEventType.TRAINING_RESULT.
ray.exceptions.RayTaskError(FileNotFoundError): ray::ImplicitFunc.train() (pid=17250, ip=127.0.0.1, repr=objective)
  File "/Users/marwansarieddine/.pyenv/versions/3.9.12/envs/raybot-evaluator-py39/lib/python3.9/site-packages/ray/tune/trainable/trainable.py", line 384, in train
    raise skipped from exception_cause(skipped)
  File "/Users/marwansarieddine/.pyenv/versions/3.9.12/envs/raybot-evaluator-py39/lib/python3.9/site-packages/ray/tune/trainable/function_trainable.py", line 336, in entrypoint
    return self._trainable_func(
  File "/Users/marwansarieddine/.pyenv/versions/3.9.12/envs/raybot-evaluator-py39/lib/python3.9/site-packages/ray/tune/trainable/function_trainable.py", line 653, in _trainable_func
    output = fn()
  File "/var/folders/b0/1qxcgh35671cbwgrdyyhzf5c0000gn/T/ipykernel_16043/135745827.py", line 10, in obj

[2m[36m(objective pid=17250)[0m path /Users/marwansarieddine/ray_results/objective_2023-08-31_09-13-47/objective_38782_00000_0_embedding_model_name=BAAI_bge-small-en,top_k=3_2023-08-31_09-13-47 <generator object Path.iterdir at 0x1bbab4270>


Trial name,date,hostname,node_ip,pid,timestamp,trial_id
objective_38782_00000,2023-08-31_09-13-50,marwans-mbp.lan,127.0.0.1,17250,1693487630,38782_00000


2023-08-31 09:13:55,496	ERROR tune.py:941 -- Trials did not complete: [objective_38782_00000]
2023-08-31 09:13:55,497	INFO tune.py:945 -- Total run time: 7.53 seconds (7.52 seconds for the tuning loop).


{'embedding_model_name': 'BAAI/bge-small-en', 'top_k': 3}
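
The trial above ends in a `FileNotFoundError`, and the path printed by the objective shows it running inside a `ray_results` trial directory rather than the notebook's directory. A likely explanation is that the relative path `data/nodes/...` no longer resolves from there. One possible workaround is to resolve the path to an absolute one before the working directory changes. The sketch below demonstrates the effect with the standard library only, using `os.chdir` as a stand-in for the change of working directory; the directory and file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project_dir, tempfile.TemporaryDirectory() as trial_dir:
    original_cwd = Path.cwd()
    os.chdir(project_dir)
    try:
        # Create a placeholder data file relative to the "project" directory.
        Path("data").mkdir()
        Path("data/nodes.pkl").write_bytes(b"placeholder")

        relative_path = Path("data/nodes.pkl")
        absolute_path = relative_path.resolve()  # resolved while still in the project dir

        os.chdir(trial_dir)  # stand-in for a trial running in its own directory
        relative_found = relative_path.exists()
        absolute_found = absolute_path.exists()
    finally:
        os.chdir(original_cwd)  # restore the cwd before the temp dirs are removed

print(relative_found, absolute_found)  # prints: False True
```

Applied to the objective above, this would mean computing the absolute path of the nodes file in the notebook and handing it to the objective, for example as an extra entry in the search space dictionary.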


In [25]:
from docsrag.embedding.index import VectorStoreIndexRay, VectorStoreSpec
from docsrag.embedding.evaluation import VectorStoreEvaluator

embedding_vector_store_spec = VectorStoreSpec.parse_obj(
    {"embedding_model_name": "BAAI/bge-base-en"}
)

vector_store_index = VectorStoreIndexRay.build_from_spec(
    nodes=nodes[:node_limit] if node_limit else nodes,
    spec=embedding_vector_store_spec,
    num_gpus=0,
    batch_size=100,
)

Downloading model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

KeyboardInterrupt: 