## Exploring a local RAG system with LlamaIndex

In [None]:
from rag_pipeline.config.parameter_sets import FAST_ANSWERS, PRECISE_ANSWERS

rag_params = FAST_ANSWERS
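The cell above changes the pipeline's behaviour by swapping one preset object for another. This notebook does not show what `FAST_ANSWERS` and `PRECISE_ANSWERS` contain. The sketch below assumes presets are simple immutable records. The class name, field names, and values are all hypothetical.

```python
from dataclasses import dataclass, replace


# Hypothetical stand-in for the presets in rag_pipeline.config.parameter_sets;
# field names and values are illustrative assumptions, not the project's API.
@dataclass(frozen=True)
class RAGParamsSketch:
    chunk_size: int        # size of each indexed chunk (unit assumed)
    chunk_overlap: int     # overlap between consecutive chunks
    similarity_top_k: int  # number of chunks retrieved per query


FAST_SKETCH = RAGParamsSketch(chunk_size=512, chunk_overlap=50, similarity_top_k=3)
PRECISE_SKETCH = RAGParamsSketch(chunk_size=256, chunk_overlap=64, similarity_top_k=8)

# Derive a one-off variant without mutating the shared preset.
custom = replace(FAST_SKETCH, similarity_top_k=5)
print(custom)
```

Because `frozen=True` makes the presets immutable, one experiment cannot silently change the shared defaults. Variants are created with `dataclasses.replace` instead.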

In [None]:
# Utility Imports
import json

# LlamaIndex imports

# Alternative: Use Hugging Face Transformers for local Mistral

In [None]:
# Project imports
# Note that this duplicates the CONFIGURATION CONSTANTS section
from rag_pipeline.core.rag_system import *

## Data Storage

Below are a few data directories for my application.

The `input_data_dir` is where you store your documents.

The other data directories are for internal use only.

Note that these directories must be listed in `.gitignore` (except, optionally, a test input data directory).

In [None]:
input_data_dir = "data/input_data"
index_storage_dir = "data/index_storage"
embeddings_cache_dir = "data/embeddings_cache"
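These paths are relative to the notebook's working directory. It is not shown here whether `LocalRAGSystem` creates missing directories itself. The helper below creates them up front with `pathlib`. `ensure_data_dirs` is a hypothetical function and is not part of `rag_pipeline`.

```python
from pathlib import Path


def ensure_data_dirs(*dirs: str) -> list[Path]:
    """Create each directory (and any missing parents) if it does not exist yet."""
    paths = []
    for d in dirs:
        path = Path(d)
        # exist_ok=True makes this safe to re-run every time the notebook starts.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

In the notebook you would call `ensure_data_dirs(input_data_dir, index_storage_dir, embeddings_cache_dir)`. The directories still need their own `.gitignore` entries, for example `data/index_storage/` and `data/embeddings_cache/`.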

In [None]:
rag = LocalRAGSystem(
    params=rag_params,
    index_storage_dir=index_storage_dir,
    embeddings_cache_dir=embeddings_cache_dir,
)

In [None]:
# TODO: Make this idempotent, i.e. ensure each document is added to the index only once.

documents = rag.load_mixed_documents()
index = rag.create_index(documents)
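One common approach to the idempotency TODO is to fingerprint each document and skip any fingerprint that has already been indexed. The sketch below uses a SHA-256 hash of the file name plus the text. The `Doc` class stands in for a loaded document, and both `Doc` and `new_documents` are hypothetical. LlamaIndex's own document-management features may be a better fit in practice.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document (text plus metadata)."""
    text: str
    metadata: dict = field(default_factory=dict)


def content_hash(doc: Doc) -> str:
    # Hash the file name together with the text, so identical text in two
    # different files is still treated as two separate documents.
    key = doc.metadata.get("file_name", "") + "\x00" + doc.text
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def new_documents(docs: list[Doc], seen_hashes: set[str]) -> list[Doc]:
    """Return only documents whose hash is not in seen_hashes, updating the set."""
    fresh = []
    for doc in docs:
        h = content_hash(doc)
        if h not in seen_hashes:
            seen_hashes.add(h)
            fresh.append(doc)
    return fresh


seen: set[str] = set()
batch = [Doc("Hello", {"file_name": "cv.pdf"}), Doc("Hello", {"file_name": "cv.pdf"})]
print(len(new_documents(batch, seen)))  # 1: the duplicate in the batch is skipped
print(len(new_documents(batch, seen)))  # 0: nothing new on a second run
```

To make this idempotent across notebook restarts, `seen_hashes` would also have to be persisted, for example next to the index in `index_storage_dir`.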

In [None]:
print(len(documents))

In [None]:
# Get the number of nodes in the docstore; these are usually chunks
# of the loaded files rather than whole documents
num_nodes = len(index.docstore.docs)
print(f"Number of nodes in the docstore: {num_nodes}")

# Collect the unique source file names
doc_names = set()
for doc in index.docstore.docs.values():
    if hasattr(doc, "metadata") and "file_name" in doc.metadata:
        doc_names.add(doc.metadata["file_name"])

print("\nUnique document names:")
for name in sorted(doc_names):
    print(f"- {name}")
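In LlamaIndex the docstore typically holds the chunked nodes rather than whole files, so several entries can share one `file_name`. Counting entries per file shows how each document was split. The sketch below does this with `collections.Counter`. The `SimpleNamespace` objects stand in for real nodes, and the file names are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def chunks_per_file(nodes) -> Counter:
    """Count stored nodes (chunks) per source file name."""
    return Counter(
        node.metadata["file_name"]
        for node in nodes
        if "file_name" in getattr(node, "metadata", {})
    )


# Stand-in nodes for illustration; in the notebook you would pass
# index.docstore.docs.values() instead.
nodes = [
    SimpleNamespace(metadata={"file_name": "cv.pdf"}),
    SimpleNamespace(metadata={"file_name": "cv.pdf"}),
    SimpleNamespace(metadata={"file_name": "vacature.docx"}),
    SimpleNamespace(metadata={}),  # no file name: ignored
]
for name, count in sorted(chunks_per_file(nodes).items()):
    print(f"- {name}: {count} chunk(s)")
```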

## Example

For this use case, I added my CV, my motivation letter, and the job description.

After loading the documents and creating the index, I can ask questions at several levels: questions addressed directly to the candidate, questions for the internal evaluation of the candidate, and suggestions for the candidate's strategy for this opportunity.

My use case is in Dutch (Nederlands), so for some of the questions I ask for the answer in Dutch.

In [None]:
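# Dutch: "Can you give me an overview of the input data? Which documents have been loaded, and what do they contain?"
sources = rag.query(
    index,
    "Kun je mij een overzicht geven van de input data? Welke documenten zijn er ingeladen en wat is de inhoud?",
    top_k=len(documents),
)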
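`top_k` limits how many retrieved chunks are passed on to the LLM. `len(documents)` counts loaded documents, which need not equal the number of chunks in the index, so this value does not guarantee that every chunk is retrieved. The sketch below illustrates what a top-k parameter controls in general: plain cosine similarity over toy vectors. It is not the retriever that `rag.query` actually uses, and the chunk ids and vectors are invented.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k chunks most similar to the query embedding."""
    ranked = sorted(
        chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True
    )
    return ranked[:k]


# Toy 2-D "embeddings" for illustration only.
chunks = {"cv-1": [1.0, 0.0], "cv-2": [0.7, 0.7], "job-1": [0.0, 1.0]}
print(top_k([1.0, 0.1], chunks, k=2))  # ['cv-1', 'cv-2']
```

A larger `k` gives the LLM more context, at the cost of a longer prompt and possibly less relevant material.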

In [None]:
# ensure_ascii=False keeps accented (e.g. Dutch) characters readable
print(json.dumps(sources, indent=2, ensure_ascii=False))

In [None]:
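# The generated answer
print(sources["answer"])

# The retrieved source chunks; inspect the first one
source_entries = sources["sources"]
print(source_entries[0])

In [None]:
# One reference line per retrieved source chunk
print("\n".join(e["metadata"]["full_reference"] for e in source_entries))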
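Several retrieved chunks can share the same `full_reference`. The helper below prints the answer followed by each reference exactly once. `format_answer` is hypothetical and relies only on the `answer`, `sources`, and `metadata["full_reference"]` keys used in the cells above. The example response is invented.

```python
def format_answer(response: dict) -> str:
    """Render the answer followed by a numbered list of unique source references.

    Assumes the shape used above:
    {"answer": str, "sources": [{"metadata": {"full_reference": str}}, ...]}
    """
    refs: list[str] = []
    for entry in response.get("sources", []):
        ref = entry.get("metadata", {}).get("full_reference")
        # Several chunks can come from the same file or page; list each reference once.
        if ref and ref not in refs:
            refs.append(ref)
    lines = [response.get("answer", ""), "", "Sources:"]
    lines += [f"[{i}] {ref}" for i, ref in enumerate(refs, start=1)]
    return "\n".join(lines)


example = {
    "answer": "Two documents were loaded.",
    "sources": [
        {"metadata": {"full_reference": "cv.pdf, p. 1"}},
        {"metadata": {"full_reference": "cv.pdf, p. 1"}},
        {"metadata": {"full_reference": "vacature.docx"}},
    ],
}
print(format_answer(example))
```

In the notebook this would be `print(format_answer(sources))`.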

In [None]:
response = rag.query(index, "What are the names of the candidate and the organisation?")
print(response["answer"])

In [None]:
# Dutch: "How well does this candidate, with this motivation letter, match this vacancy?"
rag.query(index, "Hoe goed matcht deze kandidaat met deze motivatiebrief op deze vacature?")

In [None]:
# Dutch: "Do you have a few critical questions, in Dutch, for this candidate?"
rag.query(index, "Heb je enkele kritische vragen in het Nederlands voor deze kandidaat?")

In [None]:
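# Dutch: "As the candidate, how can I best address the competencies requested in the assignment, in Dutch?"
response = rag.query(
    index,
    "Hoe kan ik als kandidaat de in de opdracht gevraagde competenties het beste beantwoorden in het Nederlands?",
)
print(response["answer"])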
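Each of the last few cells sends a single question. To rerun the same set of questions after switching `rag_params` (for example from `FAST_ANSWERS` to `PRECISE_ANSWERS`) and compare the results, the questions can be collected in a list. `ask_all` is a hypothetical helper, and a fake query function stands in for `rag.query` so the block runs on its own.

```python
from typing import Callable


def ask_all(query_fn: Callable[[str], dict], questions: list[str]) -> dict[str, str]:
    """Run each question through query_fn and collect the "answer" field."""
    answers = {}
    for q in questions:
        answers[q] = query_fn(q)["answer"]
    return answers


def fake_query(q: str) -> dict:
    # Stand-in for: rag.query(index, q)
    return {"answer": f"(answer to: {q})", "sources": []}


results = ask_all(fake_query, ["Vraag 1?", "Vraag 2?"])
for question, answer in results.items():
    print(question, "->", answer)
```

In the notebook you would pass `lambda q: rag.query(index, q)` as `query_fn`.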