# Example of Evaluating Relevance Between Queries and Retrieved Contexts

**Authors**: 
- Komang Elang Surya Prawira (komang.e.s.prawira@gdplabs.id)

**Reviewers**: 
- Novan Parmonangan Simanjuntak (novan.p.simanjuntak@gdplabs.id)
- Surya Mahadi (made.r.s.mahadi@gdplabs.id)


## Description

In this notebook, we explore how to evaluate retrieval quality by measuring how relevant the retrieved contexts are to the query that produced them. We use an LLM to judge relevance, so this example does not require ground-truth answers. The evaluation needs the following data:
1. Question: The query used to retrieve the contexts.
2. Retrieved Contexts: The contexts retrieved for each question.

We use the following metric to calculate the score:
1. **Context Precision** measures the extent to which the retrieved contexts are relevant to the given question.
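To build intuition for what a context-precision score summarizes, the sketch below assumes the LLM has already returned a binary relevance verdict (1 = relevant, 0 = not relevant) for each retrieved context, in retrieval order. It shows two common ways to turn verdicts into a per-question score: plain precision, and the rank-weighted average precision described in the Ragas documentation. Which formula the SDK's `ContextPrecision` actually uses is an assumption here, and the verdict lists are hypothetical.

```python
from statistics import mean


def simple_precision(verdicts: list[int]) -> float:
    """Fraction of retrieved contexts judged relevant (retrieval order is ignored)."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def average_precision(verdicts: list[int]) -> float:
    """Rank-weighted precision: mean of precision@k over the positions k that hold a relevant context."""
    total_relevant = sum(verdicts)
    if total_relevant == 0:
        return 0.0
    hits = 0
    weighted_sum = 0.0
    for k, verdict in enumerate(verdicts, start=1):
        hits += verdict
        # precision@k only contributes when the k-th context is itself relevant.
        weighted_sum += (hits / k) * verdict
    return weighted_sum / total_relevant


# Hypothetical verdicts for two questions with two retrieved contexts each.
verdicts_per_question = [[0, 1], [0, 0]]
# Dataset-level score: the mean of the per-question scores.
dataset_score = mean(average_precision(v) for v in verdicts_per_question)
```

With these made-up verdicts, `dataset_score` is 0.25. This only illustrates the arithmetic; it is not a reconstruction of the run at the end of this notebook. Note that average precision rewards retrievers that rank relevant contexts first: `[1, 0]` scores 1.0, while `[0, 1]` scores 0.5.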

# Prepare Environment

Before we start, ensure you have a GitHub account with access to the GDP Labs GenAI SDK GitHub repository. Then, follow these steps to create a personal access token:
1. Log in to your [GitHub](https://github.com/) account.
2. Navigate to the [Personal Access Tokens](https://github.com/settings/tokens) page.
3. Select the `Generate new token` option. The classic token type is sufficient; you do not need the beta version.
4. Fill in the required information, and make sure the `repo` scope is checked so the token can access private repositories.
5. Save the newly generated token.
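Optionally, you can confirm that a classic token really carries the `repo` scope before installing. For classic tokens, GitHub's REST API reports the granted scopes in the `X-OAuth-Scopes` response header of an authenticated request such as `GET https://api.github.com/user`; fine-grained tokens use permissions instead and are not covered here. The helper names below are hypothetical, and this is a sketch rather than part of the SDK.

```python
import urllib.request


def has_repo_scope(scopes_header: str) -> bool:
    """Returns True if a comma-separated X-OAuth-Scopes header includes the full `repo` scope."""
    scopes = {scope.strip() for scope in scopes_header.split(",") if scope.strip()}
    # `public_repo` alone does not grant access to private repositories.
    return "repo" in scopes


def fetch_token_scopes(token: str) -> str:
    """Calls the GitHub REST API with a classic token and returns its X-OAuth-Scopes header."""
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"token {token}", "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.headers.get("X-OAuth-Scopes", "")
```

For example, `has_repo_scope(fetch_token_scopes(getpass.getpass("Token: ")))` should return `True` for a correctly configured classic token.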

In [None]:
import getpass
import subprocess
import sys

def install_sdk_library() -> None:
    """Installs the `gdplabs_gen_ai` library from a private GitHub repository using a Personal Access Token.

    This function prompts the user to input their Personal Access Token for GitHub authentication. It then constructs
    the repository URL with the provided token and runs pip in a subprocess to install the library from the
    specified repository, streaming pip's output as it arrives.

    Raises:
        subprocess.CalledProcessError: If the installation process returns a non-zero exit code. The token is
            redacted from the reported command.

    Note:
        The function uses `getpass.getpass()` to securely receive the Personal Access Token without echoing it.
    """
    token = getpass.getpass("Input Your Personal Access Token: ")
    repo_url_with_token = f"https://{token}@github.com/GDP-ADMIN/gen-ai-internal.git@f/retrieval_evaluator"
    # Run pip through the current interpreter so the package is installed into the notebook's environment.
    cmd = [sys.executable, "-m", "pip", "install", "-e", f"git+{repo_url_with_token}#egg=gdplabs_gen_ai[eval]"]

    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, bufsize=1) as process:
        # Echo pip's combined stdout/stderr line by line.
        for line in process.stdout:
            sys.stdout.write(line)

        process.wait()  # Wait for the process to complete.
        if process.returncode != 0:
            # Redact the token so it does not leak into the error message.
            safe_cmd = [part.replace(token, "***") for part in cmd] if token else cmd
            raise subprocess.CalledProcessError(returncode=process.returncode, cmd=safe_cmd)

install_sdk_library()

<b>Warning:</b>
After running the cell above, restart the runtime in Google Colab so the changes take effect. Otherwise, the newly installed libraries may not be recognized.

To restart the runtime in Google Colab:
- Click on the `Runtime` menu.
- Select `Restart runtime`.
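After the restart, a quick way to confirm the installation is to check that the top-level modules imported later in this notebook (`gdplabs_gen_ai`, `ragas`, `langchain`) can be found. This minimal sketch uses the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(packages: list[str]) -> list[str]:
    """Returns the top-level module names that the current interpreter cannot find."""
    return [name for name in packages if importlib.util.find_spec(name) is None]
```

For example, `missing_packages(["gdplabs_gen_ai", "ragas", "langchain"])` should return an empty list once the installation and restart have succeeded.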

Once you have completed the previous step, you are ready to start the evaluation.

# Context Precision Evaluation

In [2]:
from langchain.chat_models import ChatOpenAI
from ragas.llms import LangchainLLM

from gdplabs_gen_ai.evaluation import evaluate, ContextPrecision
from gdplabs_gen_ai.evaluation.utility import convert_to_hf_dataset

## Prepare Data
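Prepare your data as two parallel lists: `retrieved_contexts` holds one list of context strings per question, and `questions` holds the queries in the same order. `convert_to_hf_dataset` then converts them into a Hugging Face `Dataset`.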

In [3]:
# Define your data here before converting it into a Hugging Face `Dataset` object.
retrieved_contexts = [
    ["Today AI is used everywhere.", "AI was first developed on 1970, AI stands for Artificial Intelligence."],
    ["Toyota is a car factory that success in Japan.", "Today lot of people use car as their main transportation."]
]
questions = ["What is AI?", "What is a car?"]

dataset = convert_to_hf_dataset(retrieved_contexts, questions=questions)
print(dataset)

Dataset({
    features: ['retrieved_contexts', 'questions'],
    num_rows: 2
})
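Because each row pairs one question with its own list of contexts, both lists must have the same length, and every entry of `retrieved_contexts` must itself be a list of strings. The hypothetical helper below checks this before conversion and returns column-oriented data. We assume, without having checked the SDK source, that `convert_to_hf_dataset` builds something equivalent; if the `datasets` library is installed, `datasets.Dataset.from_dict(columns)` yields a dataset with the same two features shown in the output above.

```python
def build_columns(retrieved_contexts: list[list[str]], questions: list[str]) -> dict[str, list]:
    """Validates the parallel lists and returns them as column-oriented data."""
    if len(retrieved_contexts) != len(questions):
        raise ValueError(
            f"Got {len(retrieved_contexts)} context lists but {len(questions)} questions; they must match."
        )
    for i, contexts in enumerate(retrieved_contexts):
        # Each question needs a list of context strings, not a single string.
        if not isinstance(contexts, list) or not all(isinstance(c, str) for c in contexts):
            raise TypeError(f"retrieved_contexts[{i}] must be a list of strings.")
    return {"retrieved_contexts": retrieved_contexts, "questions": questions}
```

Catching a length mismatch early is cheaper than discovering it after LLM calls have already been made.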


## Set Up LLM and Evaluator
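Next, define the LLM. This example uses `GPT-4` through LangChain's `ChatOpenAI`, wrapped with Ragas's `LangchainLLM` so the metric can call it. Make sure your OpenAI API key is available in the `OPENAI_API_KEY` environment variable, for example by assigning it to `os.environ["OPENAI_API_KEY"]`.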
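One way to set the key without hard-coding it in the notebook is to prompt for it with `getpass`, as the installation cell does for the GitHub token. `ChatOpenAI` reads `OPENAI_API_KEY` from the environment when no key is passed explicitly. The helper `ensure_openai_api_key` is hypothetical; its `prompt` parameter exists only so the function can be exercised without an interactive prompt.

```python
import getpass
import os
from typing import Callable


def ensure_openai_api_key(prompt: Callable[[str], str] = getpass.getpass) -> str:
    """Returns OPENAI_API_KEY, prompting for it (without echo) only if it is not already set."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        key = prompt("Input Your OpenAI API Key: ")
        os.environ["OPENAI_API_KEY"] = key
    return key
```

Call `ensure_openai_api_key()` once before creating the `ChatOpenAI` instance.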

In [4]:
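# LangChain chat model for GPT-4; it reads OPENAI_API_KEY from the environment.
gpt4 = ChatOpenAI(model_name="gpt-4")
# Wrap the LangChain model so the Ragas-based metric can use it as the relevance judge.
gpt4_wrapper = LangchainLLM(llm=gpt4)

# Number of dataset rows processed per batch.
context_precision = ContextPrecision(
    batch_size=10
)
# Attach the LLM that judges whether each context is relevant to its question.
context_precision.llm = gpt4_wrapper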
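The `batch_size` parameter presumably controls how many rows are scored per batch, as it does in Ragas's metric classes; we have not verified this against the SDK. Under that assumption, the number of batches is the row count divided by `batch_size`, rounded up, which is consistent with the `1/1` progress bar in the evaluation output for 2 rows and `batch_size=10`. The helper `make_batches` is hypothetical and only illustrates the arithmetic.

```python
def make_batches(num_rows: int, batch_size: int) -> list[range]:
    """Splits row indices 0..num_rows-1 into consecutive batches of at most `batch_size` rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive.")
    return [range(start, min(start + batch_size, num_rows)) for start in range(0, num_rows, batch_size)]


# Two rows with batch_size=10 fit into a single batch covering rows 0 and 1.
batches = make_batches(num_rows=2, batch_size=10)
```

Larger batches mean fewer progress steps, but each step waits on more LLM calls.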

## Calculate the Score
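Finally, calculate the `ContextPrecision` score with `evaluate`. The `column_map` argument tells the metric which dataset columns to use: its keys are the column names the metric expects (`contexts`, `question`), and its values are the column names in your dataset (`retrieved_contexts`, `questions`).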

In [5]:
score_gpt4 = evaluate(
    dataset,
    metrics=[context_precision],
    column_map={"contexts": "retrieved_contexts", "question": "questions"},
)

print(score_gpt4)

evaluating with [context_precision]


100%|██████████| 1/1 [00:16<00:00, 16.86s/it]


{'context_precision': 0.2500}
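To make the direction of `column_map` concrete, the sketch below renames columns from your names to the names the metric expects by inverting the mapping. As we understand it, Ragas's `evaluate` does the equivalent on a Hugging Face `Dataset` via `Dataset.rename_columns`; whether the SDK's `evaluate` behaves exactly the same is an assumption. Plain dictionaries stand in for the dataset, and `remap_columns` is a hypothetical helper.

```python
def remap_columns(columns: dict[str, list], column_map: dict[str, str]) -> dict[str, list]:
    """Renames columns to metric names; `column_map` maps metric name -> dataset column name."""
    inverse = {dataset_name: metric_name for metric_name, dataset_name in column_map.items()}
    # Columns not mentioned in column_map keep their original names.
    return {inverse.get(name, name): values for name, values in columns.items()}


columns = {
    "retrieved_contexts": [["Today AI is used everywhere."]],
    "questions": ["What is AI?"],
}
remapped = remap_columns(columns, {"contexts": "retrieved_contexts", "question": "questions"})
```

Here `remapped` has the keys `contexts` and `question`, which are the names the metric reads.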
