# RAG Evaluation
_Authored by: [Aymeric Roucher](https://huggingface.co/m-ric)_

This notebook demonstrates different prompting techniques to get the most out of your LLM.

In [20]:
!pip install python-dotenv -q


In [21]:
from dotenv import load_dotenv

load_dotenv(override=True)

True

### LLM-as-a-judge

Small integer scales, + additive versions.
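A minimal sketch of what these two ideas could look like in a judge prompt. The prompt wording, the 1 to 4 and 0 to 3 ranges, and the names `JUDGE_PROMPT_SCALE`, `JUDGE_PROMPT_ADDITIVE`, and `extract_score` are assumptions made for illustration; they do not come from a specific library.

```python
import re

# Hypothetical judge prompt using a small integer scale (1 to 4).
JUDGE_PROMPT_SCALE = """You will be given a question and an answer.
Rate how well the answer addresses the question on a scale of 1 to 4:
1: the answer is irrelevant or wrong
2: the answer is mostly unhelpful
3: the answer is mostly helpful
4: the answer is excellent and complete

Question: {question}
Answer: {answer}

Provide your feedback as follows:
Evaluation: (your rationale)
Total rating: (your rating, as an integer between 1 and 4)"""

# Hypothetical additive variant: one point is awarded per satisfied criterion.
JUDGE_PROMPT_ADDITIVE = """You will be given a question and an answer.
Start from 0 and add 1 point for each criterion the answer satisfies:
- it is relevant to the question
- it is factually correct
- it is complete

Question: {question}
Answer: {answer}

Provide your feedback as follows:
Evaluation: (your rationale)
Total rating: (the sum of points, as an integer between 0 and 3)"""


def extract_score(judge_output: str, low: int = 1, high: int = 4):
    """Return the integer after 'Total rating:', or None if absent or out of range."""
    match = re.search(r"Total rating:\s*(\d+)", judge_output)
    if match is None:
        return None
    score = int(match.group(1))
    return score if low <= score <= high else None
```

For the additive prompt, the same helper can be reused with `low=0, high=3`. Returning `None` for out-of-range or missing ratings makes malformed judge outputs easy to count and discard.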

Good simple test = introduce a certain number of random errors in sentences?
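One possible way to build such a test, sketched under assumptions: "errors" are taken here to mean typos created by swapping adjacent letters, and `inject_errors` is a hypothetical helper name. A judge that works well should give lower scores as `n_errors` grows.

```python
import random


def inject_errors(sentence: str, n_errors: int, seed: int = 0) -> str:
    """Corrupt a sentence by swapping up to `n_errors` non-overlapping pairs of adjacent letters."""
    rng = random.Random(seed)  # seeded so that the corruption is reproducible
    chars = list(sentence)
    # Candidate positions: adjacent pairs of distinct letters, so each swap is visible
    candidates = [
        i
        for i in range(len(chars) - 1)
        if chars[i].isalpha() and chars[i + 1].isalpha() and chars[i] != chars[i + 1]
    ]
    rng.shuffle(candidates)
    chosen = []
    for i in candidates:
        if len(chosen) == n_errors:
            break
        # Skip positions touching an already chosen pair, so swaps never interact
        if all(abs(i - j) > 1 for j in chosen):
            chosen.append(i)
    for i in chosen:
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)
```

Because the swaps never overlap, each one changes exactly two character positions, which gives a known "ground truth" error count to compare against the judge's scores.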

In [41]:
# Or just use this dataset: BAAI/JudgeLM-100K, filter by the same reviewer_id, and check whether multiple reviewers agree on the same question ids.
# JudgeLM has LLM-generated scores. Here we want human preferences directly, so let's use pairwise comparisons.
from datasets import load_dataset

ratings = load_dataset("lmsys/mt_bench_human_judgments")

In [None]:
ratings
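With pairwise comparisons, a natural metric is how often a judge picks the same winner as the humans. The sketch below runs on toy records; the field names (`question_id`, `model_a`, `model_b`, `winner`) are assumptions about the schema and should be checked against the loaded dataset, and `majority_votes` and `agreement_rate` are hypothetical helper names.

```python
from collections import Counter, defaultdict


def majority_votes(votes):
    """Map each (question_id, model_a, model_b) comparison to its most common winner."""
    grouped = defaultdict(list)
    for vote in votes:
        key = (vote["question_id"], vote["model_a"], vote["model_b"])
        grouped[key].append(vote["winner"])
    return {
        key: Counter(winners).most_common(1)[0][0] for key, winners in grouped.items()
    }


def agreement_rate(human_votes, judge_votes):
    """Fraction of shared comparisons where the judge matches the human majority."""
    human = majority_votes(human_votes)
    judge = majority_votes(judge_votes)
    shared = human.keys() & judge.keys()
    if not shared:
        return None  # nothing to compare
    return sum(human[key] == judge[key] for key in shared) / len(shared)


# Toy data standing in for real rows (assumed field names, made-up values)
toy_human_votes = [
    {"question_id": 1, "model_a": "m1", "model_b": "m2", "winner": "model_a"},
    {"question_id": 1, "model_a": "m1", "model_b": "m2", "winner": "model_a"},
    {"question_id": 1, "model_a": "m1", "model_b": "m2", "winner": "model_b"},
    {"question_id": 2, "model_a": "m1", "model_b": "m2", "winner": "model_b"},
]
toy_judge_votes = [
    {"question_id": 1, "model_a": "m1", "model_b": "m2", "winner": "model_a"},
    {"question_id": 2, "model_a": "m1", "model_b": "m2", "winner": "model_a"},
    {"question_id": 3, "model_a": "m1", "model_b": "m2", "winner": "tie"},
]

print(agreement_rate(toy_human_votes, toy_judge_votes))
```

Comparisons that only one side voted on (question 3 in the toy data) are ignored rather than counted as disagreements.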

### Constrained JSON outputs

In [22]:
RELEVANT_CONTEXT = """
Document:
In `transformers`, we simply set the parameter `num_return_sequences` to
the number of highest scoring beams that should be returned. Make sure
though that `num_return_sequences <= num_beams`!



``` python
# set return_num_sequences > 1
beam_outputs = model.generate(
    **model_inputs,
    max_new_tokens=40,
    num_beams=5,
    no_repeat_ngram_size=2,
    num_return_sequences=5,
    early_stopping=True
)

# now we have 3 output sequences
print("Output:\n" + 100 * '-')
for i, beam_output in enumerate(beam_outputs):
  print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))
```

Document:

The weather is really nice in Paris today.
To define a stop sequence in Transformers, you should pass the stop_sequence argument in your pipeline or model.

"""

In [23]:
RAG_PROMPT_TEMPLATE_BASE = """
Answer the user query based on the source documents.

Here are the source documents: {context}

Here is the user question: {user_query}."""

RAG_PROMPT_TEMPLATE_JSON = (
    RAG_PROMPT_TEMPLATE_BASE
    + """
You should provide your answer as a JSON file, and also provide all relevant short snippets from the documents on which you directly based your answer.
These snippets should be very short, a few words at most, not whole sentences!

Preface your answer with 'Answer:\n', as follows:

Answer:
{{
  'answer': your_answer,
  'source_snippets': ['snippet_1', 'snippet_2', ...]
}}

Now begin!
"""
)

In [24]:
USER_QUERY = "How can I define a stop sequence in Transformers?"

In [25]:
from huggingface_hub import InferenceClient
import json

client = InferenceClient(model="HuggingFaceH4/zephyr-7b-beta")


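def call_llm(query: str, stop=None) -> str:
    # `client.post` returns raw bytes: decode them and extract the generated text
    response = client.post(json={"inputs": query, "parameters": {"stop": stop}})
    return json.loads(response.decode())[0]["generated_text"]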

In [27]:
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-4-0125-preview")


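# Redefines `call_llm` to use an OpenAI chat model instead of the Inference API.
# `stop` is kept for interface compatibility but is not used here.
def call_llm(query: str, stop=None) -> str:
    return llm.invoke(query).content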

In [30]:
answer = call_llm(
    RAG_PROMPT_TEMPLATE_JSON.format(context=RELEVANT_CONTEXT, user_query=USER_QUERY)
)

In [31]:
from ast import literal_eval


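def parse_answer(answer):
    """Parse the model output into a dict with 'answer' and 'source_snippets' keys."""
    answer = answer.replace("Answer:\n", "").strip()  # remove prefix and stray whitespace
    # The expected output uses single quotes, so it is a Python literal, not strict JSON
    answer = literal_eval(answer)
    # If the snippets came back as a single string like "['a', 'b']", split it into a list
    if not isinstance(answer["source_snippets"], list):
        answer["source_snippets"] = (
            answer["source_snippets"].strip()[1:-1].replace("'", "").split(", ")
        )
    return answer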


parsed_answer = parse_answer(answer)
print(parsed_answer)

{'answer': 'To define a stop sequence in Transformers, you should pass the stop_sequence argument in your pipeline or model.', 'source_snippets': ['pass the stop_sequence argument']}
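The `parse_answer` function above assumes the model follows the format exactly. A more tolerant variant is sketched below, assuming the model may emit either strict JSON (double quotes) or a Python-style dict (single quotes), possibly surrounded by extra text. The name `parse_structured_answer` is hypothetical.

```python
import json
from ast import literal_eval


def parse_structured_answer(raw: str) -> dict:
    """Extract a dict with 'answer' and 'source_snippets' from a model output."""
    # Keep only the text between the first "{" and the last "}"
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no brace-delimited object found in the model output")
    payload = raw[start : end + 1]
    try:
        parsed = json.loads(payload)  # strict JSON first
    except json.JSONDecodeError:
        parsed = literal_eval(payload)  # fall back to a Python literal
    if not isinstance(parsed, dict):
        raise ValueError("the model output is not a key/value object")
    missing = {"answer", "source_snippets"} - parsed.keys()
    if missing:
        raise ValueError(f"missing keys in the model output: {sorted(missing)}")
    return parsed
```

Failing loudly with `ValueError` makes it straightforward to retry the generation or to count format failures when comparing prompts or models.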


In [34]:
def turn_red(s):
    return "\x1b[1;31m" + s + "\x1b[0m"


def print_results(answer, source_text, highlight_snippets):
    print(answer)
    print("\n\n", "=" * 10 + " Source documents " + "=" * 10)
    for snippet in highlight_snippets:
        source_text = source_text.replace(snippet, turn_red(snippet))
    print(source_text)


print_results(
    parsed_answer["answer"], RELEVANT_CONTEXT, parsed_answer["source_snippets"]
)

To define a stop sequence in Transformers, you should pass the stop_sequence argument in your pipeline or model.



Document:
In `transformers`, we simply set the parameter `num_return_sequences` to
the number of highest scoring beams that should be returned. Make sure
though that `num_return_sequences <= num_beams`!



``` python
# set return_num_sequences > 1
beam_outputs = model.generate(
    **model_inputs,
    max_new_tokens=40,
    num_beams=5,
    no_repeat_ngram_size=2,
    num_return_sequences=5,
    early_stopping=True
)

# now we have 3 output sequences
print("Output:
" + 100 * '-')
for i, beam_output in enumerate(beam_outputs):
  print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))
```

Document:

The weather is really nice in Paris today.
To define a stop sequence in Transformers, you should [1;31mpass the stop_sequence argument[0m in your pipeline or model.
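Since `print_results` relies on `str.replace`, a snippet that does not appear verbatim in the source is silently left unhighlighted. A small check like the following could flag such snippets, which are either paraphrased or hallucinated. This is a sketch; `check_snippets` is a hypothetical helper name.

```python
def check_snippets(snippets, source_text):
    """Split snippets into those found verbatim in the source and those that are not."""
    found = [snippet for snippet in snippets if snippet in source_text]
    missing = [snippet for snippet in snippets if snippet not in source_text]
    return found, missing
```

For instance, `check_snippets(parsed_answer["source_snippets"], RELEVANT_CONTEXT)` would return the grounded snippets first and the unsupported ones second; the share of unsupported snippets can then serve as a rough faithfulness signal.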




### Classification

Demo of "do not ask too much at once"