# Catbot - Miao

This notebook 
* Creates a database using ChromaDB
* Creates a QA RAG to query the database
* Generates a synthetic dataset using Ragas and a custom prompt
* Evaluates the responses obtained from the RAG using Ragas metrics

___
Created: Sept 2025

In [1]:
# Database
from catbot.database.downloader import download_and_chunk_wikipedia_articles
from catbot.database.embedding import create_data_base

# RAG
from chromadb import PersistentClient
from catbot.database.embedding import embedding_function
from catbot.rag.basic_rag import BasicRAG

# Synthetic data and evaluation
from ragas.testset import TestsetGenerator
from catbot.utils import get_evaluation_models
from ragas.dataset_schema import SingleTurnSample 
from ragas.metrics import (
    AnswerCorrectness,
    AnswerSimilarity,
    Faithfulness,
    LLMContextRecall,
    ResponseRelevancy,
    SemanticSimilarity,
)
from ragas.testset.synthesizers import default_query_distribution

generator_llm, generator_embeddings = get_evaluation_models()




## Create or load the database
Download the articles under 'Cat' and chunk them according to their sections. Then embed the chunks and save them as a Chroma collection.

In [2]:
articles = download_and_chunk_wikipedia_articles(['Cat'])

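try:
    create_data_base(articles)
    print("Database created successfully.")
except Exception:
    # Assumption carried over from the original cell: a failure here means the collection already exists.
    print("Database already exists, loaded existing database.")

# Load the collection in both cases so that `collection` is always defined for the next cell.
client = PersistentClient(path='catbot/database/chroma')
collection = client.get_collection(
    "its_all_about_cats",
    embedding_function=embedding_function(model_name="text-embedding-3-small"),
)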

Database already exists, loaded existing database.


## Set up the QA RAG
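Pass the Chroma collection loaded from the locally stored database to `BasicRAG`. You can then query the RAG with `rag.respond`, which returns a dict holding the answer under `'response'` and the retrieved chunks under `'sources'`.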

In [3]:
rag = BasicRAG(collection)

question = "What is the history of cats?"
response = rag.respond(question)

print("Question: ", question)
print("Response: ", response['response'])

context_titles = [context['metadata']['title'] for context in response['sources']]
print("Contexts:")
for title in context_titles:
    print("->", title)

Question:  What is the history of cats?
Response:  Cats have been domesticated for nearly 10,000 years. The oldest evidence of cats kept as pets is from the Mediterranean island of Cyprus, around 7500 BC. In the past, mostly in Egypt, people kept cats because they hunted and ate mice and rats. Ancient Egyptians worshipped cats as gods and often mummified them so they could be with their owners "for all of eternity". Cats started becoming pets during the time of the ancient Egyptians. Today, people often keep cats as pets, and some domestic cats live without care from people as feral or stray cats.
Contexts:
-> Cat - History
-> Cat - Introduction


## Generate synthetic data

### Using Ragas

In [7]:
generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
query_distribution = default_query_distribution(llm=generator_llm)

synthetic_data = generator.generate_with_langchain_docs(articles, testset_size=5, query_distribution=[query_distribution[1]],)

Applying CustomNodeFilter:   0%|          | 0/12 [00:00<?, ?it/s]
Node 982d0455-a307-4b03-a8e2-918c9292fe2a does not have a summary. Skipping filtering.
Node 2bce5b94-08d0-4889-99d5-92fe3d825b10 does not have a summary. Skipping filtering.
Node 3ab156dc-9028-4c8a-8974-7be0542b7be2 does not have a summary. Skipping filtering.
Generating personas: 100%|██████████| 3/3 [00:01<00:00,  2.25it/s]                                           
Generating Scenarios: 100%|██████████| 1/1 [00:04<00:00,  4.93s/it]
Generating Samples: 100%|██████████| 2/2 [00:03<00:00,  1.89s/it]


In [None]:
synthetic_data.samples

[TestsetSample(eval_sample=SingleTurnSample(user_input='How do the effects of spaying and neutering influence cat behaviour, particularly in terms of activity levels and lifespan, and how does this relate to their natural hunting instincts?', retrieved_contexts=None, reference_contexts=['<1-hop>\n\nCat - Health\n\nIn 2023, pet cats lived for, on average, 13 years. This has increased from seven years in the 1980s. Creme Puff, the oldest cat that ever lived, died at the age of 38.\nCats that roam outside will get fleas at some time. Cat fleas will not live on people, but fleas will not hesitate to bite anyone nearby. Vets have a good product which they put just behind the cat\'s upper neck, where it can\'t be licked off.\nCats kept indoors most of the time are kept from danger from the outside. That is obvious, but its own nature is be free to wander. \nHouse cats can become overweight through lack of exercise and over-feeding. When they get spayed or neutered ("fixed"), they tend to exe

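`default_query_distribution` returns a list that blends different types of synthetic query. Each entry is a tuple that pairs a synthesizer with its sampling weight: `[0]` holds `SingleHopSpecificQuerySynthesizer`, `[1]` holds `MultiHopAbstractQuerySynthesizer`, and `[2]` holds `MultiHopSpecificQuerySynthesizer`. The generation cell above passes only `[1]`, so every generated question is a multi-hop abstract query.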

In [11]:
query_distribution[1]

(MultiHopAbstractQuerySynthesizer(name='multi_hop_abstract_query_synthesizer', llm=LangchainLLMWrapper(langchain_llm=ChatOpenAI(...)), generate_query_reference_prompt=QueryAnswerGenerationPrompt(instruction=Generate a multi-hop query and answer based on the specified conditions (persona, themes, style, length) and the provided context. The themes represent a set of phrases either extracted or generated from the context, which highlight the suitability of the selected context for multi-hop query creation. Ensure the query explicitly incorporates these themes.### Instructions:
 1. **Generate a Multi-Hop Query**: Use the provided context segments and themes to form a query that requires combining information from multiple segments (e.g., `<1-hop>` and `<2-hop>`). Ensure the query explicitly incorporates one or more themes and reflects their relevance to the context.
 2. **Generate an Answer**: Use only the content from the provided context to create a detailed and faithful answer to the q

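### Using a custom prompt
The prompt below can be used for custom synthetic data generation. It asks the model for one focused question per page, with the page text substituted for `{context}`.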


In [None]:
prompt = """
Given the following documentation page, generate one clear, specific question that can 
be answered using information found only on this page.

Guidelines:
- Keep the question focused and concise (no more than 25 words).
- Make it practical—ask about concepts, actions, or procedures as a real user would.
- Avoid asking for lists of steps or comprehensive summaries.
- Ask as if you needed a precise answer for a specific task, not a generic explanation.
- Avoid overly broad, complex, or multi-part questions.
- Do not ask for details not explicitly stated in the document.
- Don't use the question you find in the title, come up with something new
Example of a good question:
- How do I reconcile Mollie transactions with payments and invoices?

Documentation Page:
{context}

Generated question: 
"""

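A minimal sketch of how such a prompt could be applied to one chunk. The names `PROMPT_TEMPLATE`, `generate_question`, and `fake_llm` are hypothetical and not part of `catbot` or Ragas; the template is a shortened stand-in for the full prompt, and `fake_llm` replaces a real chat-model call so the example runs offline.

```python
from typing import Callable

# Shortened stand-in for the notebook's prompt; only the {context} slot matters here.
PROMPT_TEMPLATE = """Given the following documentation page, generate one clear, specific question.

Documentation Page:
{context}

Generated question:
"""


def generate_question(chunk_text: str, ask_llm: Callable[[str], str]) -> str:
    """Fill the template with one chunk and return the model's question."""
    filled_prompt = PROMPT_TEMPLATE.format(context=chunk_text)
    return ask_llm(filled_prompt).strip()


def fake_llm(filled_prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns a canned question."""
    return " How long do pet cats live on average? \n"


question_from_chunk = generate_question(
    "In 2023, pet cats lived for, on average, 13 years.", fake_llm
)
print(question_from_chunk)
```

In practice `ask_llm` would wrap whichever chat model is in use, and the function would be called once per chunk in `articles`.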
## Evaluation with Ragas metrics
To showcase the evaluation metrics, we use a single question from the synthetic dataset generated above.

In [27]:
question = synthetic_data.samples[0].eval_sample.user_input
reference = synthetic_data.samples[0].eval_sample.reference

response_object = rag.respond(question)
response = response_object['response']
retrieved_contexts = [context['document'] for context in response_object['sources']]

print("Question: ", question)
print("Reference: ", reference)
print()
print("Response: ", response)
print("Contexts: ", retrieved_contexts)


Question:  How does the unique anatomy of cats, such as their extra lumbar and thoracic vertebrae and free-floating clavicle bones, contribute to their flexibility and hunting abilities, and how does keeping cats indoors impact their safety and health?
Reference:  Cats have an anatomy similar to other members of the genus Felis, including extra lumbar and thoracic vertebrae, which contribute to their exceptional flexibility. Additionally, their front paws are attached to the shoulder by free-floating clavicle bones, allowing them to pass their bodies through any space their heads can fit into. This flexibility and precise walking gait, including a unique pacing gait, enhance their mobility and hunting skills, especially for catching small rodents, which is reflected in their narrowly spaced canine teeth adapted for this prey. However, cats kept indoors most of the time are protected from dangers such as fleas, which outdoor cats commonly encounter. Indoor cats are safer from external t

### Retrieval: Context Recall 

In [22]:
sample = SingleTurnSample(
    user_input=question,
    reference = reference,
    retrieved_contexts=retrieved_contexts,
)

context_recall = LLMContextRecall(llm=generator_llm)
await context_recall.single_turn_ascore(sample)

0.5

## Generation: Faithfulness and Correctness

### Faithfulness
Evaluates whether the response is supported by the retrieved context. It requires the question, the response, and the retrieved contexts.

In [23]:
sample = SingleTurnSample(
    user_input=question,
    response=response,
    retrieved_contexts=retrieved_contexts,
)
scorer = Faithfulness(llm=generator_llm)

print("Question: ", sample.user_input)
print("Response: ", sample.response)
print("Contexts: ", len(sample.retrieved_contexts), " contexts")
print("Faithfulness score: ", await scorer.single_turn_ascore(sample))


Question:  How does the unique anatomy of cats, such as their extra lumbar and thoracic vertebrae and free-floating clavicle bones, contribute to their flexibility and hunting abilities, and how does keeping cats indoors impact their safety and health?
Response:  Cats' unique anatomy, including extra lumbar and thoracic vertebrae, contributes to their high flexibility. This flexibility allows them to move easily and fit their bodies through any space into which they can fit their heads. Additionally, their front paws are attached to the shoulder by free-floating clavicle bones, which further aid in their ability to pass through tight spaces. These anatomical features support cats' hunting abilities by enabling precise and agile movements necessary for catching small prey like rodents.

The context does not provide information about how keeping cats indoors impacts their safety and health.

Meow.
Contexts:  1  contexts
Faithfulness score:  0.7
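
As a rough illustration of how a faithfulness score is formed: the response is broken into claims, each claim is judged against the retrieved context, and the score is the supported fraction. The sketch below covers only that final ratio. The function name and the verdicts are hypothetical; in Ragas the claims and verdicts come from the judge LLM.

```python
def faithfulness_ratio(claim_supported: list[bool]) -> float:
    """Fraction of response claims judged as supported by the retrieved context."""
    if not claim_supported:
        raise ValueError("need at least one claim")
    return sum(claim_supported) / len(claim_supported)


# Hypothetical verdicts for ten claims; an LLM judge would produce these in practice.
verdicts = [True] * 7 + [False] * 3
print(faithfulness_ratio(verdicts))  # 0.7
```

The verdicts here are made up to show the arithmetic; they are not the claims Ragas extracted for the response above.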


### Answer Correctness
Evaluates the response against the ground truth. It requires the question, the response, and the ground truth (`reference`); the retrieved contexts are not used.

In [28]:
sample = SingleTurnSample(
    user_input=question,
    response=response,
    reference=reference,
)

answer_similarity = AnswerSimilarity(embeddings=generator_embeddings)
# weights=[0, 1] puts all weight on semantic similarity and none on the LLM-judged factuality score
scorer = AnswerCorrectness(llm=generator_llm, answer_similarity=answer_similarity, weights=[0, 1])

print("AnswerCorrectness score: ", await scorer.single_turn_ascore(sample))

AnswerCorrectness score:  0.8452824326211854
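
The effect of the `weights` argument can be sketched as a weighted average of two numbers. This assumes, based on the Ragas documentation, that the two weights apply to factuality and semantic similarity in that order and default to `[0.75, 0.25]`. The function name and input scores are illustrative only.

```python
def combine_scores(factuality_f1: float, similarity: float, weights: list[float]) -> float:
    """Weighted average of a factuality score and a semantic-similarity score."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    return (weights[0] * factuality_f1 + weights[1] * similarity) / total_weight


# Illustrative numbers, not taken from the notebook run.
similarity_only = combine_scores(0.5, 0.9, weights=[0, 1])
blended = combine_scores(0.5, 0.9, weights=[0.75, 0.25])
print(similarity_only)  # 0.9
print(blended)
```

With `[0, 1]` the result equals the similarity score, which is why the cell above can be read as a pure semantic-similarity measurement.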


Answer Correctness relies on the semantic similarity metric, which can also be called explicitly, as below.

In [26]:
sample = SingleTurnSample(
    response='what is the latin name for cat?',
    reference  = 'Translate "Felis catus"'
)

scorer = SemanticSimilarity(embeddings=generator_embeddings)
print("Semantic Similarity: ", await scorer.single_turn_ascore(sample))

Semantic Similarity:  0.5873730040920158


## End-to-End: Response Relevancy
Generates a set of questions from the response, embeds them, calculates the cosine similarity of each to the user question, and averages the results.

In [None]:
sample = SingleTurnSample(
    user_input=question,
    response=response,  # `response` is already the answer string extracted earlier
)

scorer = ResponseRelevancy(llm=generator_llm, embeddings=generator_embeddings)

print("Question: ", sample.user_input)
print("Response: ", sample.response)
print("Response Relevancy: ", await scorer.single_turn_ascore(sample))

np.float64(0.7658875076572474)
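
The averaging step behind Response Relevancy can be sketched with toy vectors. This is a simplified illustration with hypothetical 2-D embeddings and function names; the real metric first generates the questions with the LLM and embeds them with the embedding model, and this sketch covers only the cosine-and-average part.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_question_similarity(
    user_question_embedding: list[float],
    generated_question_embeddings: list[list[float]],
) -> float:
    """Average cosine similarity between the user question and each generated question."""
    similarities = [
        cosine_similarity(user_question_embedding, generated)
        for generated in generated_question_embeddings
    ]
    return sum(similarities) / len(similarities)


# Toy embeddings: one identical, one orthogonal, one at 45 degrees to the user question.
user_vec = [1.0, 0.0]
generated_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
relevancy = mean_question_similarity(user_vec, generated_vecs)
print(round(relevancy, 3))
```

A response that answers a different question yields generated questions whose embeddings point away from the user question, which pulls the average down.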