<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/evaluation/guideline_eval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Guideline Evaluator

This notebook shows how to use `GuidelineEvaluator` to evaluate a question-answering system against user-specified guidelines.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [1]:
# %pip install llama-index-llms-openai

In [2]:
# !pip install llama-index

In [3]:
from llama_index.core.evaluation import GuidelineEvaluator
# from llama_index.llms.openai import OpenAI
# Ollama wrapper used here in place of the OpenAI LLM commented out above
from jet.llm.ollama.base import Ollama, initialize_ollama_settings

initialize_ollama_settings()

# Needed for running async functions in Jupyter Notebook
import nest_asyncio

nest_asyncio.apply()
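
The comment above is terse about why the patch is needed. The following is a minimal sketch, using only the standard library and a stand-in coroutine rather than a real evaluator, of what happens when a synchronous call tries to drive a coroutine while an event loop is already running, as is the case inside a Jupyter kernel. The names `fake_evaluate`, `sync_wrapper`, and `notebook_cell` are hypothetical, and treating `evaluate` as behaving like `sync_wrapper` is an illustrative assumption, not a statement about the library's internals.

```python
import asyncio


async def fake_evaluate():
    # Hypothetical stand-in for an async evaluation call.
    return "ok"


def sync_wrapper():
    # Mimics a synchronous API that drives a coroutine internally.
    coro = fake_evaluate()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        return f"RuntimeError: {exc}"


async def notebook_cell():
    # Jupyter runs cell code while an event loop is already active,
    # so the nested asyncio.run() call inside sync_wrapper() is rejected.
    return sync_wrapper()


outside = sync_wrapper()               # no loop running yet
inside = asyncio.run(notebook_cell())  # called from within a running loop

print(outside)
print(inside)
```

Outside a running loop the wrapper returns normally; inside one it hits a `RuntimeError`. `nest_asyncio.apply()` patches asyncio so that this kind of nested call is allowed.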

In [4]:
GUIDELINES = [
    "The response should fully answer the query.",
    "The response should avoid being vague or ambiguous.",
    (
        "The response should be specific and use statistics or numbers when"
        " possible."
    ),
]

In [5]:
llm = Ollama(temperature=0, model="llama3.1")

# Build one evaluator per guideline so each guideline gets its own verdict
evaluators = [
    GuidelineEvaluator(llm=llm, guidelines=guideline)
    for guideline in GUIDELINES
]
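
The cell above builds one evaluator per guideline, which yields a separate verdict and feedback for each. A possible variant, sketched here under the assumption that a multi-line guideline string suits your use case, is to join the guidelines into one string and use a single evaluator. The evaluator construction is left commented out because it needs the `llm` defined above; `combined_guidelines` is a name introduced for illustration.

```python
GUIDELINES = [
    "The response should fully answer the query.",
    "The response should avoid being vague or ambiguous.",
    (
        "The response should be specific and use statistics or numbers when"
        " possible."
    ),
]

# Join the individual guidelines into one newline-separated string.
combined_guidelines = "\n".join(GUIDELINES)
print(combined_guidelines)

# Assumes `llm` and `GuidelineEvaluator` from the cells above:
# combined_evaluator = GuidelineEvaluator(
#     llm=llm, guidelines=combined_guidelines
# )
```

The trade-off is a single evaluation call per response instead of three, at the cost of one combined pass/fail verdict rather than a verdict per guideline.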

In [6]:
sample_data = {
    "query": "Tell me about global warming.",
    "contexts": [
        (
            "Global warming refers to the long-term increase in Earth's"
            " average surface temperature due to human activities such as the"
            " burning of fossil fuels and deforestation."
        ),
        (
            "It is a major environmental issue with consequences such as"
            " rising sea levels, extreme weather events, and disruptions to"
            " ecosystems."
        ),
        (
            "Efforts to combat global warming include reducing carbon"
            " emissions, transitioning to renewable energy sources, and"
            " promoting sustainable practices."
        ),
    ],
    "response": (
        "Global warming is a critical environmental issue caused by human"
        " activities that lead to a rise in Earth's temperature. It has"
        " various adverse effects on the planet."
    ),
}

In [7]:
for guideline, evaluator in zip(GUIDELINES, evaluators):
    eval_result = evaluator.evaluate(
        query=sample_data["query"],
        contexts=sample_data["contexts"],
        response=sample_data["response"],
    )
    print("=====")
    print(f"Guideline: {guideline}")
    print(f"Pass: {eval_result.passing}")
    print(f"Feedback: {eval_result.feedback}")

=====
Guideline: The response should fully answer the query.
Pass: False
Feedback: The response provides a brief overview of global warming, but it lacks specific details and examples to fully answer the query. It would be helpful to provide more information about the causes, effects, and potential solutions to global warming.
=====
Guideline: The response should avoid being vague or ambiguous.
Pass: False
Feedback: The response is too vague and does not provide specific information about global warming. It would be helpful to mention some of the key factors contributing to global warming, such as greenhouse gas emissions from fossil fuels, deforestation, and industrial agriculture, as well as the potential consequences, like rising sea levels, more frequent natural disasters, and altered ecosystems.
=====
Guideline: The response should be specific and use statistics or numbers when possible.
Pass: False
Feedback: The response is too general and lacks specific details. While it mention