## Create a `QueryEngine` for retrieval augmented generation

### Set up the environment first

In [1]:
!pip install -q -r requirements.txt

In [2]:
from my_config import MyConfig
my_config = MyConfig()

### Setting up the persona database
We will be using personas from the `dvilasuero/finepersonas-v0.1-tiny` dataset. This dataset contains 5K personas that will be attending the party!
Let's load the dataset and store each persona as a text file in the `data` directory.

In [None]:
# from datasets import load_dataset
# from pathlib import Path

# dataset = load_dataset(path="dvilasuero/finepersonas-v0.1-tiny", split="train")

# data_dir = Path("data")
# data_dir.mkdir(parents=True, exist_ok=True)
# # One text file per persona, e.g. data/persona_0.txt
# for i, persona in enumerate(dataset):
#     with open(data_dir / f"persona_{i}.txt", "w", encoding="utf-8") as f:
#         f.write(persona["persona"])

### Loading and embedding persona documents
We will use the `SimpleDirectoryReader` to load the persona descriptions from the `data` directory. This will return a list of `Document` objects.

In [3]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="data")
documents = reader.load_data()
len(documents)



5000

Now that we have a list of `Document` objects, we can use the `IngestionPipeline` to create nodes from the documents and prepare them for the `QueryEngine`.

We will use the `SentenceSplitter` to split the documents into smaller chunks and, instead of the stock `HuggingFaceEmbedding`, a custom embedding class that calls an embedding model served by `vLLM` on RunPod to embed the chunks.
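`SentenceSplitter` tries to keep sentences intact while filling chunks up to a size limit. The sketch below illustrates that idea in plain Python under simplifying assumptions: it counts words rather than tokens, uses a naive regex for sentence boundaries, and the names `split_into_chunks`, `max_words` and `overlap_sentences` are hypothetical, not part of llama_index. Persona descriptions are short, so each one most likely fits in a single chunk, which would match the 5000 nodes produced from 5000 documents further down.

```python
import re


def _word_count(sentences: list[str]) -> int:
    return sum(len(s.split()) for s in sentences)


def split_into_chunks(text: str, max_words: int = 50, overlap_sentences: int = 1) -> list[str]:
    """Greedily pack whole sentences into chunks of at most ``max_words`` words.

    Simplified illustration (hypothetical helper): counts words instead of tokens,
    and a single sentence longer than ``max_words`` still becomes its own chunk.
    """
    # Naive sentence boundary: whitespace that follows ".", "!" or "?".
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = current + [sentence]
        if current and _word_count(candidate) > max_words:
            chunks.append(" ".join(current))
            # Carry the last sentence(s) over so neighbouring chunks share context...
            overlap = current[-overlap_sentences:] if overlap_sentences > 0 else []
            # ...but drop them if they would push the new chunk past the limit.
            while overlap and _word_count(overlap + [sentence]) > max_words:
                overlap = overlap[1:]
            current = overlap + [sentence]
        else:
            current = candidate
    if current:
        chunks.append(" ".join(current))
    return chunks


persona = "A retired chef from Lyon who collects vinyl records. Loves jazz."
print(split_into_chunks(persona))  # one chunk containing the whole persona
```

The real splitter also handles paragraph separators and token-based overlap; the greedy "fill until the next sentence no longer fits" loop is the part this sketch is meant to show.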

#### Create a Custom Embedding Class
We subclass `BaseEmbedding` from `llama_index` and override the `aget_text_embedding` method so that it calls my RunPod endpoint. Super cool stuff.
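The actual `RunPodEmbedding` implementation lives in `my_utils.py` and is not shown here. Assuming the endpoint is vLLM's OpenAI-compatible `/v1/embeddings` route (the URL built further down ends in `/v1/embeddings`), the HTTP round trip such a class performs could look roughly like the stdlib-only sketch below. The helper names and the model name are hypothetical, and no authentication header is sent; add one if your server requires an API key.

```python
import json
import urllib.request


def build_embedding_request(texts: list[str], model: str) -> bytes:
    """JSON body for an OpenAI-compatible POST /v1/embeddings call."""
    return json.dumps({"model": model, "input": texts}).encode("utf-8")


def parse_embedding_response(body: bytes) -> list[list[float]]:
    """Extract the vectors from the response JSON, ordered by their "index" field."""
    payload = json.loads(body)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed_texts(endpoint_url: str, texts: list[str], model: str, timeout: float = 60.0) -> list[list[float]]:
    """Blocking call to the embedding server (hypothetical helper, no auth header)."""
    request = urllib.request.Request(
        endpoint_url,
        data=build_embedding_request(texts, model),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embedding_response(response.read())


# Offline demo with a hand-written response whose items arrive out of order.
fake_body = json.dumps(
    {"data": [{"index": 1, "embedding": [0.3, 0.4]}, {"index": 0, "embedding": [0.1, 0.2]}]}
).encode("utf-8")
print(parse_embedding_response(fake_body))  # [[0.1, 0.2], [0.3, 0.4]]

# Against a live server (hypothetical model name):
# vectors = embed_texts(runpod_url, ["A chef from Lyon."], model="my-embedding-model")
```

Sorting by `index` guards against vectors coming back in a different order than the inputs. In a `BaseEmbedding` subclass, a call like `embed_texts` would sit behind the text and query embedding methods.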

In [9]:
## The implementation of this class is in ./my_utils.py
## It lives there because the query notebook needs it too
from my_utils import RunPodEmbedding

### Storing and indexing documents
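Since we are using an ingestion pipeline, we can attach a vector store directly to it, and the pipeline will populate the store as it runs. Here we use `Chroma` to store our documents. Let's run the pipeline with the vector store attached. To avoid overwhelming the embedding endpoint, the documents are sent in batches of 200 with a short pause between batches. The `IngestionPipeline` caches the output of each transformation, so re-running it on the same documents should be fast!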
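The caching mentioned above can be pictured as memoising each transformation on a hash of its inputs. The class below is a simplified, hypothetical illustration of that idea (`TransformCache` is not a llama_index class): the key combines the step's name with the input texts, so feeding the same documents through the same step again skips the work.

```python
import hashlib
import json
from typing import Callable

# A pipeline step maps a list of texts to a new list of texts (hypothetical simplification).
Transform = Callable[[list[str]], list[str]]


class TransformCache:
    """Memoise each step's output, keyed by a hash of (step name, input texts).

    Simplified illustration only; not llama_index's IngestionCache implementation.
    """

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = {}
        self.hits = 0

    @staticmethod
    def _key(step_name: str, texts: list[str]) -> str:
        # json.dumps gives an unambiguous serialisation of the name and the inputs.
        raw = json.dumps([step_name, texts]).encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def run(self, texts: list[str], steps: list[tuple[str, Transform]]) -> list[str]:
        for name, step in steps:
            key = self._key(name, texts)
            if key in self._store:
                self.hits += 1  # same step, same inputs: reuse the stored output
                texts = self._store[key]
            else:
                texts = step(texts)
                self._store[key] = texts
        return texts


calls = []


def upper(texts: list[str]) -> list[str]:
    calls.append(len(texts))  # record how often the real work runs
    return [t.upper() for t in texts]


cache = TransformCache()
steps = [("upper", upper)]
cache.run(["a", "b"], steps)
cache.run(["a", "b"], steps)  # served from the cache
print(cache.hits, len(calls))  # 1 1
```

By default the llama_index cache is held in memory on the pipeline object, so a freshly created pipeline starts with an empty cache unless one is persisted and reloaded.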

In [None]:
import asyncio

import chromadb
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.vector_stores.chroma import ChromaVectorStore

# Persistent Chroma collection on disk; get_or_create makes re-running the cell safe
db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection(name="alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Instantiate the custom embedding class with the RunPod proxy URL of the embedding server
runpod_url = f"https://{my_config.VLLM_EMBEDDING_MODEL_INFERENCE_NODE_IP}-8000.proxy.runpod.net/v1/embeddings"
embedding_model = RunPodEmbedding(endpoint_url=runpod_url)

# Split documents into chunks, embed each chunk, and write the resulting nodes to Chroma
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        embedding_model,
    ],
    vector_store=vector_store,
)

BATCH_SIZE = 200  # documents sent through the pipeline per run
PAUSE_SECONDS = 2  # defensive pause between batches to avoid too many parallel requests

nodes = []
for start in range(0, len(documents), BATCH_SIZE):
    end = min(start + BATCH_SIZE, len(documents))
    nodes.extend(await pipeline.arun(documents=documents[start:end]))
    print(f"Processed batch {start // BATCH_SIZE + 1} => {start}:{end}")
    await asyncio.sleep(PAUSE_SECONDS)  # non-blocking, unlike time.sleep inside async code
len(nodes)

Processed batch 1 => 0:200
Processed batch 2 => 200:400
Processed batch 3 => 400:600
Processed batch 4 => 600:800
Processed batch 5 => 800:1000
Processed batch 6 => 1000:1200
Processed batch 7 => 1200:1400
Processed batch 8 => 1400:1600
Processed batch 9 => 1600:1800
Processed batch 10 => 1800:2000
Processed batch 11 => 2000:2200
Processed batch 12 => 2200:2400
Processed batch 13 => 2400:2600
Processed batch 14 => 2600:2800
Processed batch 15 => 2800:3000
Processed batch 16 => 3000:3200
Processed batch 17 => 3200:3400
Processed batch 18 => 3400:3600
Processed batch 19 => 3600:3800
Processed batch 20 => 3800:4000
Processed batch 21 => 4000:4200
Processed batch 22 => 4200:4400
Processed batch 23 => 4400:4600
Processed batch 24 => 4600:4800
Processed batch 25 => 4800:5000


5000
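The loop above sends the documents to the endpoint 200 at a time and pauses between batches. If the same throttling is needed elsewhere (for example in the query notebook), it could be factored into a small helper like the hypothetical `run_in_batches` below; the demo uses a fake `process` coroutine so it runs without the RunPod endpoint.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_in_batches(
    items: Sequence[T],
    process: Callable[[Sequence[T]], Awaitable[Sequence[R]]],
    batch_size: int = 200,
    pause_seconds: float = 2.0,
) -> list[R]:
    """Await ``process`` on consecutive slices of ``items``, pausing between slices."""
    results: list[R] = []
    for start in range(0, len(items), batch_size):
        end = min(start + batch_size, len(items))
        results.extend(await process(items[start:end]))
        print(f"Processed batch {start // batch_size + 1} => {start}:{end}")
        if end < len(items):
            # asyncio.sleep yields to the event loop instead of blocking it like time.sleep.
            await asyncio.sleep(pause_seconds)
    return results


async def fake_process(batch: Sequence[int]) -> list[int]:
    # Stand-in for pipeline.arun: just doubles each item.
    return [x * 2 for x in batch]


doubled = asyncio.run(run_in_batches(list(range(5)), fake_process, batch_size=2, pause_seconds=0))
print(doubled)  # [0, 2, 4, 6, 8]
```

Inside Jupyter, where an event loop is already running, call it with `await` instead of `asyncio.run`, e.g. `nodes = await run_in_batches(documents, lambda batch: pipeline.arun(documents=batch))`.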

We can now create a `VectorStoreIndex` on top of the populated vector store by passing the vector store and the same embedding model to the `VectorStoreIndex.from_vector_store()` method, and then use it to query the documents.
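At query time, the index embeds the question with the same embedding model used during ingestion (which is why the model is passed to `from_vector_store()`) and asks the vector store for the closest stored vectors. The brute-force cosine-similarity ranking below is only an illustration with made-up two-dimensional vectors and hypothetical helper names; Chroma uses an approximate nearest-neighbour (HNSW) index, and the distance metric it applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query: list[float], stored: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Rank stored node embeddings by cosine similarity to the query embedding."""
    scored = [(node_id, cosine_similarity(query, vec)) for node_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings standing in for persona nodes.
stored = {
    "persona_0": [1.0, 0.0],
    "persona_1": [0.0, 1.0],
    "persona_2": [0.7, 0.7],
}
print([node_id for node_id, _ in top_k([1.0, 0.1], stored)])  # ['persona_0', 'persona_2']
```

If the query were embedded with a different model than the stored nodes, the vectors would live in unrelated spaces and these similarity scores would be meaningless.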