# Evaluation

- LLMs are a black box
- It's hard to tell whether the application works correctly for all inputs by eyeballing the output for a few
- Regression
- Metrics-based development
- Equivalent to testing
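
The points above amount to treating evaluation like a regression test suite: score the application on a fixed set of questions and fail when the score drops. Below is a minimal sketch of that idea. It assumes a hypothetical `fake_app` standing in for the real application and uses a simple token-overlap F1 as a stand-in metric. This is not a RAGAS metric, and `token_f1`, `evaluate`, `CASES`, and `THRESHOLD` are illustrative names only.

```python
from collections import Counter
from typing import Callable


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a prediction and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection: a shared token counts as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(answer_fn: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Mean token F1 of answer_fn over (question, reference) pairs."""
    scores = [token_f1(answer_fn(question), reference) for question, reference in cases]
    return sum(scores) / len(scores)


# Hypothetical evaluation set, for illustration only.
CASES = [
    ("What is 2 + 2?", "the answer is 4"),
    ("What colour is the sky?", "the sky is blue"),
]


def fake_app(question: str) -> str:
    # Stand-in for a call such as query_engine.query(...): returns canned answers.
    canned = {
        "What is 2 + 2?": "the answer is 4",
        "What colour is the sky?": "blue",
    }
    return canned[question]


score = evaluate(fake_app, CASES)

# Fail loudly if the score drops below an assumed, previously agreed baseline.
THRESHOLD = 0.6
assert score >= THRESHOLD, f"regression: score {score:.2f} < {THRESHOLD}"
```

Running the same check after every change to the prompt, model, or index turns a vague impression of quality into a number that can be compared between runs.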

### RAGAS Framework

- https://docs.ragas.io/en/latest/index.html

### Evaluating our application

In [2]:
from datasets import load_dataset
from llama_index.core import Document, VectorStoreIndex, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

In [3]:
ds = load_dataset("Amod/mental_health_counseling_conversations", split='train')

In [8]:
# Use the bge-base model to embed documents and queries
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

# Use a llama3 model served by Ollama as the LLM
Settings.llm = Ollama(model="llama3", request_timeout=360.0)

In [9]:
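# Randomly split the dataset into train and test subsets (no seed is set, so the split changes between runs)
split = ds.train_test_split()
# Wrap each conversation as a Document holding the query and its response
train = [Document(text=f"Query: {doc['Context']}\n\n Response: {doc['Response']}") for doc in split['train']]
test = [Document(text=f"Query: {doc['Context']}\n\n Response: {doc['Response']}") for doc in split['test']]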

In [10]:
index = VectorStoreIndex.from_documents(train[0:10])
index

<llama_index.core.indices.vector_store.base.VectorStoreIndex at 0x3660bcc50>

In [11]:
query_engine = index.as_query_engine()
response = query_engine.query("What is it like having a panic attack?")
response

Response(response="It's like when we overgeneralize and focus on one negative emotion, and suddenly, our emotional landscape becomes consumed by anxiety. We might notice ourselves getting stuck in a cycle of thinking about worst-case scenarios or replaying conversations over and over, feeling like we're constantly on edge, even for brief moments. It can be overwhelming, making it hard to breathe or relax, leaving us feeling drained and exhausted.", source_nodes=[NodeWithScore(node=TextNode(id_='2633747e-a79c-42e3-b95e-413df16201d0', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='26789b67-4095-4adb-a11b-1ab6c69067f8', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='f6e13642e3299cf557a16565791406b558dfe58d549077fd22ad8d2a45b179ca')}, text='Query: I can\'t seem to feel any emotion except anxiety, not even for myself.\n\n Response: Thank you for posting. \xa0I\'m i

In [12]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

In [13]:
generator_llm = Ollama(model='llama3', request_timeout=300.0)
critic_llm = Ollama(model='llama3', request_timeout=300.0)
embeddings = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

generator = TestsetGenerator.from_llama_index(
    generator_llm,
    critic_llm,
    embeddings
)

In [19]:
# Generate only "simple" questions; multi_context and reasoning are switched off
distributions = {
    simple: 1,
    multi_context: 0,
    reasoning: 0,
}
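
The weights in `distributions` describe what share of the generated questions should be of each type. Below is a minimal sketch of that bookkeeping. It assumes the weights are meant to sum to 1 and uses plain strings in place of the ragas evolution objects. The helper `expected_counts` is hypothetical and not part of ragas.

```python
import math


def expected_counts(distributions: dict[str, float], test_size: int) -> dict[str, int]:
    """Split test_size across question types in proportion to their weights."""
    total = sum(distributions.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"weights sum to {total}, expected 1.0")
    # Round each share; this simple version may be off by one for awkward weights.
    return {name: round(weight * test_size) for name, weight in distributions.items()}


# String keys stand in for the simple / multi_context / reasoning objects.
counts = expected_counts({"simple": 1, "multi_context": 0, "reasoning": 0}, test_size=10)
print(counts)
```

With the weights used in this notebook, all ten requested questions fall under the simple type.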

In [None]:
testset = generator.generate_with_llamaindex_docs(train[0:50], test_size=10, distributions=distributions)


In [21]:
print(testset.test_data[0])

question='Based on the given context, a question that can be fully answered from the provided text is:\n\n"What are some common concerns people may have when considering opening up to their parents about depression?' contexts=['Query: I am not sure if I am depressed. I don\'t know how to bring it up to my parents, and that makes me miserable.\n\n Response: You are not alone, many people fear opening up to family members about the topic of depression or mental illness. There are many different reason why some may fear telling their parents. The most common thoughts I hear in my office are: " My parents won\'t understand me", I may cause more problems to the family", "I am worried that something bad may happen if I tell them".\xa0If possible express your current concerns and worries to your parents. You can start the conversation with your parents by saying "I have not been feeling like myself lately, and I may want to see a counselor".\xa0I think you are doing the right thing by going o