# Evaluation

- LLMs are a black box
- It's hard to tell whether the application works correctly for all inputs by eyeballing the output for a few
- Regression
- Metrics-based development
- Equivalent to testing
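
To make the testing analogy concrete, here is a minimal sketch of a metric-based regression check. The helper `find_regressions`, the `tolerance` value, and the scores are all hypothetical and for illustration only; they are not results from this notebook.

```python
# Hypothetical helper: compare a new evaluation run against a stored baseline.
def find_regressions(baseline, current, tolerance=0.05):
    """Return the metrics whose score dropped by more than `tolerance`."""
    regressions = {}
    for name, old_score in baseline.items():
        new_score = current.get(name)
        if new_score is None:
            continue  # metric was not computed in this run
        if old_score - new_score > tolerance:
            regressions[name] = (old_score, new_score)
    return regressions


# Made-up scores, for illustration only.
baseline = {"faithfulness": 0.90, "answer_relevancy": 0.85}
current = {"faithfulness": 0.70, "answer_relevancy": 0.86}

print(find_regressions(baseline, current))
```

A check like this could run after every change to the prompt, model, or index, in the same way a unit test suite runs after every code change.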

### RAGAS Framework

- https://docs.ragas.io/en/latest/index.html

### Evaluating our application

In [69]:
from datasets import load_dataset
from llama_index.core import Document, VectorStoreIndex, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

In [73]:
ds = load_dataset("Amod/mental_health_counseling_conversations", split='train')

In [71]:
# bge-base embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

# Llama 3 served through Ollama, with a long timeout for slow inference
Settings.llm = Ollama(model="llama3", request_timeout=360.0)

In [74]:
# Hold out part of the data; each row becomes one "Query / Response" document
split = ds.train_test_split()
train = [Document(text=f"Query: {doc['Context']}\n\n Response: {doc['Response']}") for doc in split['train']]
test = [Document(text=f"Query: {doc['Context']}\n\n Response: {doc['Response']}") for doc in split['test']]

In [75]:
# Index only the first 10 training documents
index = VectorStoreIndex.from_documents(train[0:10])
index

<llama_index.core.indices.vector_store.base.VectorStoreIndex at 0x36611cc50>

In [76]:
query_engine = index.as_query_engine()
response = query_engine.query("What is it like having a panic attack?")
response

Response(response="It's possible that you're experiencing anxiety as an overwhelming sensation, almost taking over your emotions and making it hard to feel anything else.", source_nodes=[NodeWithScore(node=TextNode(id_='8eb5de42-ad96-42e0-8dbf-d3e7af92ef8d', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='6a813b43-0aa0-438c-b0e6-ff39969a4207', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='f0ddf91d4897db22364a62f5353d9edb308cc7091cd8344c310b5084bd325958')}, text='Query: I can\'t seem to feel any emotion except anxiety, not even for myself.\n\n Response: Empathy usually falls on a spectrum, meaning that some people show more than others. Empathy is the ability to look at the world through someone else\'s eyes or "walk a mile in their shoes." There could be some people in your life for whom empathy is easier to feel and those for whom you have no idea what they a

In [12]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

In [13]:
generator_llm = Ollama(model='llama3', request_timeout=300.0)
critic_llm = Ollama(model='llama3', request_timeout=300.0)
embeddings = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

generator = TestsetGenerator.from_llama_index(
    generator_llm,
    critic_llm,
    embeddings
)

In [19]:
# Share of each question type; only `simple` questions are generated here
distributions = {
    simple: 1,
    multi_context: 0,
    reasoning: 0,
}

In [61]:
testset = generator.generate_with_llamaindex_docs(train[0:10], test_size=2, distributions=distributions)


Filename and doc_id are the same for all nodes.



In [62]:
print(testset.test_data[0])

question='Here is a question that can be fully answered from the given context:\n\n"What are some strategies that I should consider learning to manage my anxiety attacks while on Xanax medication?"\n\nThis question is formed using the topic "Psychologist consultation" and is answerable based on the provided context, which suggests considering alternative coping mechanisms in addition to medication.' contexts=["Query: I’ve been on 0.5 mg of Xanax twice a day for the past month. It hasn't been helping me at all, but when I take 1 mg during a big anxiety attack, it calms me down. I was wondering how I can ask my psychologist to up the dose to 1 mg twice a day without her thinking I'm abusing them. I just have very big anxiety attacks. Should I stay on the 0.5mg and deal with the attacks or should I ask to up the dose? I'm afraid she will take me off them and put me on something else.\n\n Response: Staying on the lower dose may give you more room to learn strategies for coping with your an

In [63]:
print("Question:")
print(testset.test_data[0].question)
print("\nResponse:")
print(query_engine.query(testset.test_data[0].question))

Question:
Here is a question that can be fully answered from the given context:

"What are some strategies that I should consider learning to manage my anxiety attacks while on Xanax medication?"

This question is formed using the topic "Psychologist consultation" and is answerable based on the provided context, which suggests considering alternative coping mechanisms in addition to medication.

Response:
Some strategies you may want to consider learning to manage your anxiety attacks include replacing self-defeating thoughts with ones that work better for you, practicing mindfulness, or relaxation techniques. These tools can help keep your anxiety in a manageable range and complement the benefits of your Xanax medication.
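
The generated question above wraps the actual question in explanatory text produced by the generator LLM. A minimal sketch of one way to strip that wrapper before querying, assuming the question is the first double-quoted sentence ending in a question mark (`extract_question` is a hypothetical helper, not part of RAGAS):

```python
import re


def extract_question(raw):
    """Return the first double-quoted sentence ending in '?', else the stripped text."""
    match = re.search(r'"([^"]+\?)"', raw)
    return match.group(1) if match else raw.strip()


# Abridged copy of the generated question shown above.
raw = (
    "Here is a question that can be fully answered from the given context:\n\n"
    '"What are some strategies that I should consider learning to manage my '
    'anxiety attacks while on Xanax medication?"\n\n'
    'This question is formed using the topic "Psychologist consultation" and '
    "is answerable based on the provided context."
)

print(extract_question(raw))
```

The cleaned string could then be passed to `query_engine.query` in place of the raw `question` field. This heuristic depends on the wrapper format seen in this one sample and may not hold for other generated questions.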


In [64]:
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)
from ragas.metrics.critique import harmfulness

metrics = [
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
    harmfulness,
]

In [65]:
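# Use a new name so the original `ds` (the counseling dataset) is not overwritten
eval_ds = testset.to_dataset()

ds_dict = eval_ds.to_dict()
ds_dict["question"]  # not displayed: only the last expression in a cell is shown
ds_dict["ground_truth"]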

["Staying on the lower dose may give you more room to learn strategies for coping with your anxiety. Medications are so helpful, and needed at times, but it's also important to have a variety of tools you use to manage your responses to stress. If you are not already seeing a therapist, consider finding one who can help you learn some effective strategies, like replacing self-defeating thoughts with ones that work better for you, or mindfulness, relaxation, or other tools to keep your anxiety in the manageable range!",
 "Some strategies for coping with my mom's bossy behavior and negotiating a more respectful dynamic in our relationship include seeing her responses as habits rather than a reflection of how she feels about me, having a calm tone, and negotiating a new dynamic through a contract. This can be achieved by setting clear expectations, boundaries, and communication channels with Mom. Ultimately, it is crucial to approach the situation with an open mind, empathy, and understan

In [67]:
from ragas.integrations.llama_index import evaluate

evaluator_llm = Ollama(model='llama3', request_timeout=300.0)

result = evaluate(
    query_engine=query_engine,
    metrics=metrics,
    dataset=ds_dict,
    llm=evaluator_llm,
    embeddings=embeddings,
)



n values greater than 1 not support for LlamaIndex LLMs
n values greater than 1 not support for LlamaIndex LLMs


TimeoutError: 