# Evaluation demo
A big part of getting a RAG application into production is evaluating its responses. LlamaIndex provides several modules for this.
To learn more about this concept, read the following documentation: https://docs.llamaindex.ai/en/stable/module_guides/evaluating/usage_pattern/

In [None]:
from llama_index.indices.managed.colbert import ColbertIndex
from llama_index.llms.groq import Groq

## Load env variables

In [1]:
%load_ext dotenv
%dotenv

cannot find .env file
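
The `cannot find .env file` output above means `%dotenv` did not load anything, so the `GROQ_API_KEY` read later in this notebook has to come from the shell environment. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper introduced here for illustration, not part of LlamaIndex or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        # Hypothetical error message; adjust it to match your own setup.
        raise RuntimeError(
            f"{name} is not set; add it to a .env file or export it in your shell"
        )
    return value
```

With a helper like this, `api_key=require_env("GROQ_API_KEY")` could stand in for the `getenv` call, so a missing key surfaces at setup time rather than on the first request.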


## Setup LLM
The first step is setting up the LLMs we'll be using. We generate responses with Llama 3 8B (`llama3-8b-8192`) and evaluate them with Mixtral 8x7B (`mixtral-8x7b-32768`).

In [None]:
from os import getenv


response_llm = Groq(
    model="llama3-8b-8192",
    api_key=getenv("GROQ_API_KEY")
)

eval_llm = Groq(
    model="mixtral-8x7b-32768",
    api_key=getenv("GROQ_API_KEY")
)

## Loading the index
We'll load the index that was built with ColBERT in the `create_index` notebook.

In [None]:
index = ColbertIndex.load_from_disk("./index", "factoids")

In [None]:
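from typing import Any

from llama_index.core.evaluation import FaithfulnessEvaluator

# The evaluator uses eval_llm to judge whether a response is supported by its retrieved context.
evaluator = FaithfulnessEvaluator(llm=eval_llm)
query_engine = index.as_query_engine(llm=response_llm)


async def execute_and_eval_query(query: str) -> tuple[Any, bool]:
    """Run the query and return the response with its faithfulness verdict."""
    response = query_engine.query(query)
    evaluation = await evaluator.aevaluate_response(response=response)

    # Coerce `passing` to a plain bool in case the evaluator left it unset.
    return response, bool(evaluation.passing)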

In [None]:
query = "What file do I need to edit to configure Prometheus?"

response, is_faithful = await execute_and_eval_query(query)
print(response)
print(is_faithful)

query = "Who is the maintainer of the Prometheus project?"
response, is_faithful = await execute_and_eval_query(query)
print(response)
print(is_faithful)
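
The two queries above repeat the same run-and-print steps. As a sketch of how this could scale to a list of queries, the hypothetical helper `evaluate_queries` below (not part of LlamaIndex) awaits any coroutine with the same shape as `execute_and_eval_query` for each query and reports the fraction judged faithful. It runs the queries one at a time, on the assumption that sequential calls are gentler on API rate limits.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def evaluate_queries(
    queries: list[str],
    run_one: Callable[[str], Awaitable[tuple[Any, bool]]],
) -> tuple[list[tuple[str, Any, bool]], float]:
    """Run each query through `run_one` and compute the faithfulness pass rate."""
    results: list[tuple[str, Any, bool]] = []
    for query in queries:
        # `run_one` is expected to return (response, is_faithful).
        response, is_faithful = await run_one(query)
        results.append((query, response, is_faithful))

    passed = sum(1 for _, _, ok in results if ok)
    # Avoid dividing by zero when no queries were given.
    rate = passed / len(results) if results else 0.0
    return results, rate
```

In this notebook it could be called as `results, rate = await evaluate_queries([...], execute_and_eval_query)`, after which `rate` gives a single number to track across index or prompt changes.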