# Example of Ragas's Context Precision and Context Recall Metrics
### Using Ragas's Metrics to Evaluate Your Retrieval Performance

**Authors**: 
- Komang Elang Surya Prawira (komang.e.s.prawira@gdplabs.id)

**Reviewers**: 
- Novan Parmonangan Simanjuntak (novan.p.simanjuntak@gdplabs.id)
- Surya Mahadi (made.r.s.mahadi@gdplabs.id)

## Evaluate Using LLM

In this notebook, we explore how to evaluate retrieval performance when no ground truth contexts are available as references. In that case, we can use an LLM to judge the retrieved contexts instead. The LLM-based evaluation needs the following data:
1. Questions (List[str]): A list of questions.
2. Retrieved Contexts (List[List[str]]): Contexts retrieved for each question.
3. Ground Truth Responses (List[List[str]]): One or more ground truth responses for each question.

We recommend two metrics: Context Precision and Context Recall.
1. **Context Precision** measures how relevant the retrieved contexts are to the given question.
2. **Context Recall** measures how much of the ground truth response is reflected (mentioned) in the retrieved contexts.

As these definitions imply, **Context Precision** requires `Questions` and `Retrieved Contexts`, whereas **Context Recall** requires `Ground Truth Responses` and `Retrieved Contexts`.
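
To make the two definitions concrete, here is a minimal sketch that computes both ratios from hand-labelled verdicts. It is a simplified illustration, not Ragas's implementation: Ragas asks an LLM to produce the verdicts and may aggregate them differently. The function names and the verdict values are hypothetical.

```python
from typing import List


def toy_context_precision(relevance_verdicts: List[bool]) -> float:
    """Fraction of retrieved contexts judged relevant to the question."""
    if not relevance_verdicts:
        return 0.0
    return sum(relevance_verdicts) / len(relevance_verdicts)


def toy_context_recall(attribution_verdicts: List[bool]) -> float:
    """Fraction of ground truth statements judged to be supported by the retrieved contexts."""
    if not attribution_verdicts:
        return 0.0
    return sum(attribution_verdicts) / len(attribution_verdicts)


# Hypothetical verdicts: 2 of 4 retrieved contexts are relevant,
# and 2 of 3 ground truth statements can be found in the contexts.
print(toy_context_precision([True, False, False, True]))  # 0.5
print(round(toy_context_recall([True, True, False]), 4))  # 0.6667
```

Both values fall between 0 and 1, and higher is better in each case.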

## Prepare Environment

Before we start, ensure you have a GitHub account with access to the GLAIR GenAI Internal SDK GitHub repository. Then, follow these steps to create a personal access token:
1. Log in to your [GitHub](https://github.com/) account.
2. Navigate to the [Personal Access Tokens](https://github.com/settings/tokens) page.
3. Select the `Generate new token` option. You can use the classic version instead of the beta version.
4. Fill in the required information, ensuring that you've checked the `repo` option to grant access to private repositories.
5. Save the newly generated token.

In [None]:
import getpass
import subprocess
import sys

def install_sdk_library() -> None:
    """Installs the `glair_genai_sdk` library from a private GitHub repository using a Personal Access Token.

    This function prompts the user to input their Personal Access Token for GitHub authentication. It then constructs
    the repository URL with the provided token and executes a subprocess to install the library via pip from the
    specified repository.

    Raises:
        subprocess.CalledProcessError: If the installation process returns a non-zero exit code.

    Note:
        The function utilizes `getpass.getpass()` to securely receive the Personal Access Token without echoing it.
    """
    token = getpass.getpass("Input Your Personal Access Token: ")
    repo_url = "github.com/GDP-ADMIN/gen-ai-internal-sdk.git"
    # Use the running interpreter's pip so the package is installed into this kernel's environment.
    pip_cmd = [sys.executable, "-m", "pip", "install", "-e"]
    cmd = pip_cmd + [f"git+https://{token}@{repo_url}#egg=glair_genai_sdk"]
    # Same command with the token masked, so the token never appears in error messages.
    redacted_cmd = pip_cmd + [f"git+https://***@{repo_url}#egg=glair_genai_sdk"]

    # Stream pip's output line by line while the installation runs.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, bufsize=1) as process:
        for line in process.stdout:
            sys.stdout.write(line)

        process.wait()  # Wait for the process to complete
        if process.returncode != 0:
            raise subprocess.CalledProcessError(returncode=process.returncode, cmd=redacted_cmd)

install_sdk_library()

**Warning:**
After running the cell above, restart the runtime in Google Colab so that the changes take effect. Otherwise, the newly installed library may not be recognized.

To restart the runtime in Google Colab:
- Click on the `Runtime` menu.
- Select `Restart runtime`.
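
After the restart, one quick way to confirm that the installation is visible to the kernel is to ask the import system whether it can locate the package. The sketch below assumes that the importable module name matches the `#egg=` name used in the install command (`glair_genai_sdk`); adjust it if the package exposes a different name. `is_importable` is a hypothetical helper, not part of the SDK.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate `module_name`."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for some dotted names or for modules with a broken spec.
        return False


# Assumption: the module name equals the egg name from the install command.
print(is_importable("glair_genai_sdk"))
```

If this prints `False` right after a successful installation, the runtime most likely still needs to be restarted.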

In [None]:
import os
os.environ["OPENAI_API_KEY"] = "YOUR-API-KEY"

In [None]:
from ragas import evaluate
from ragas.metrics import (
    ContextPrecision,
    ContextRecall,
)

In [None]:
from datasets import Dataset
from typing import List, Optional

def convert_to_hf_dataset(questions: List[str], retrieved_contexts: List[List[str]],
                          ground_truth_responses: Optional[List[List[str]]] = None) -> Dataset:
    """Converts provided data into a Hugging Face Dataset format.

    Args:
        questions (List[str]): A list of questions.
        retrieved_contexts (List[List[str]]): Contexts retrieved for each question.
        ground_truth_responses (Optional[List[List[str]]]): One or more ground truth responses for each question.
            Defaults to None.

    Returns:
        Dataset: A Hugging Face `Dataset` object containing the organized data.

    Raises:
        ValueError: If the lengths of the provided lists are inconsistent.
    """

    # Check for consistent lengths.
    lengths = [len(questions), len(retrieved_contexts)]
    if ground_truth_responses is not None:
        lengths.append(len(ground_truth_responses))

    if len(set(lengths)) > 1:
        raise ValueError("All input lists must be of the same length.")

    data = {
        'questions': questions,
        'retrieved_contexts': retrieved_contexts
    }

    if ground_truth_responses is not None:
        data['ground_truth_responses'] = ground_truth_responses

    # Convert to Hugging Face Dataset.
    dataset = Dataset.from_dict(data)

    return dataset

## Prepared Dataset
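Before converting the data, it can help to check its shape, because a flat string passed where a list of contexts is expected is easy to miss. The following is a minimal sketch of such a check; `find_input_problems` is a hypothetical helper that is not part of Ragas or the SDK, and it only inspects list lengths and the nesting of the contexts.

```python
from typing import List, Optional, Sequence


def find_input_problems(
    questions: Sequence[str],
    retrieved_contexts: Sequence[Sequence[str]],
    ground_truth_responses: Optional[Sequence] = None,
) -> List[str]:
    """Return human-readable descriptions of shape problems (empty if none)."""
    problems: List[str] = []
    if len(questions) != len(retrieved_contexts):
        problems.append(
            f"{len(questions)} questions but {len(retrieved_contexts)} context lists"
        )
    if ground_truth_responses is not None and len(ground_truth_responses) != len(questions):
        problems.append(
            f"{len(questions)} questions but {len(ground_truth_responses)} ground truth entries"
        )
    for i, contexts in enumerate(retrieved_contexts):
        if isinstance(contexts, str):
            problems.append(f"row {i}: contexts should be a list of strings, not a single string")
        elif len(contexts) == 0:
            problems.append(f"row {i}: no retrieved contexts")
    return problems


questions = ["What is AI?", "Who is Elon Musk?"]
retrieved_contexts = [
    ["AI is dangerous", "Artificial Intelligence is hard to master"],
    ["Elon Musk is rich", "CEO of SpaceX is Elon Musk"],
]
print(find_input_problems(questions, retrieved_contexts))  # []
```

An empty list means the inputs passed these basic checks; it says nothing about the quality of the contexts themselves.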

In [None]:
# from gdplabs_gen_ai.evaluation.utility import convert_to_hf_dataset

# Define your data here before converting it into a Hugging Face `Dataset` object.
questions = ["What is AI?", "Who is Elon Musk?"]
retrieved_contexts = [["AI is dangerous", "Artificial Intelligence is hard to master"], ["Elon Musk is rich", "CEO of SpaceX is Elon Musk"]]
ground_truth_responses = [["A field of computer science"], ["An entrepreneur and business magnate"]]

dataset = convert_to_hf_dataset(questions, retrieved_contexts, ground_truth_responses)
print(dataset)
# Dataset({
#     features: ['questions', 'retrieved_contexts', 'ground_truth_responses'],
#     num_rows: 2
# })

## Context Precision & Context Recall Evaluation

### Using Default LLM

In [None]:
context_precision = ContextPrecision(
    batch_size=10
)

context_recall = ContextRecall(
    batch_size=10
)

score = evaluate(
    dataset,
    metrics=[
        context_precision,
        context_recall,
    ],
    column_map={"question": "questions", "contexts": "retrieved_contexts", "ground_truths": "ground_truth_responses"},
)

print(score)
# evaluating with [context_precision]
# 100%|██████████| 1/1 [00:01<00:00,  1.51s/it]
# evaluating with [context_recall]
# 100%|██████████| 1/1 [00:02<00:00,  2.20s/it]
# {'context_precision': 0.2500, 'context_recall': 0.0000}
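
The printed result reads like a mapping from metric name to score, so a natural next step is to flag the metrics that fall below a target. The sketch below works on a plain dictionary holding the values from the output above; it assumes the scores can be treated as a mapping, and both the helper `metrics_below` and the threshold of 0.5 are hypothetical choices rather than Ragas recommendations.

```python
from typing import Dict, List


def metrics_below(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the names of metrics whose score is lower than `threshold`, sorted by name."""
    return sorted(name for name, value in scores.items() if value < threshold)


# Values copied from the printed output of the evaluation above.
scores = {"context_precision": 0.25, "context_recall": 0.0}

# 0.5 is an arbitrary illustrative threshold.
print(metrics_below(scores, threshold=0.5))  # ['context_precision', 'context_recall']
```

Here both metrics are flagged, which fits the sample data: the retrieved contexts say little about the questions and do not mention the ground truth responses.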

### Bring Your Own LLMs

#### OpenAI via LangChain

In [None]:
from langchain.chat_models import ChatOpenAI
from ragas.llms import LangchainLLM

gpt4 = ChatOpenAI(model_name="gpt-4")
gpt4_wrapper = LangchainLLM(llm=gpt4)

print(f"LLM used by Context Precision before customization: {context_precision.llm}")
# LLM used by Context Precision before customization: OpenAI(model='gpt-3.5-turbo-16k', _api_key_env_var='OPENAI_API_KEY')

context_precision.llm = gpt4_wrapper
context_recall.llm = gpt4_wrapper

print(f"LLM used by Context Precision after customization: {context_precision.llm}")
# LLM used by Context Precision after customization: <ragas.llms.langchain.LangchainLLM object at 0x7f8acf4dca00>

score_gpt4 = evaluate(
    dataset,
    metrics=[
        context_precision,
        context_recall,
    ],
    column_map={"question": "questions", "contexts": "retrieved_contexts", "ground_truths": "ground_truth_responses"},
)

print(score_gpt4)
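
Once the same dataset has been scored by two different judge LLMs, the per-metric difference shows how much the choice of judge moves the result. The sketch below assumes both results can be read as plain dictionaries. `score_deltas` is a hypothetical helper, and the candidate numbers are placeholders for illustration only, not measured GPT-4 results.

```python
from typing import Dict


def score_deltas(baseline: Dict[str, float], candidate: Dict[str, float]) -> Dict[str, float]:
    """Return candidate minus baseline for every metric present in both results."""
    shared = baseline.keys() & candidate.keys()
    return {name: round(candidate[name] - baseline[name], 4) for name in sorted(shared)}


# Baseline values are taken from the default-LLM output shown earlier.
baseline = {"context_precision": 0.25, "context_recall": 0.0}
# Placeholder values, NOT real results: replace them with the contents of `score_gpt4`.
candidate = {"context_precision": 0.5, "context_recall": 0.5}

print(score_deltas(baseline, candidate))  # {'context_precision': 0.25, 'context_recall': 0.5}
```

A large gap between judges on the same data suggests that the scores depend heavily on the evaluator model and should be interpreted with care.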

#### Text Generation Inference (TGI)

In [None]:
from langchain.llms import HuggingFaceTextGenInference
from ragas.llms import LangchainLLM

tgi = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:8010/",
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)
tgi_wrapper = LangchainLLM(llm=tgi)

print(f"LLM used by Context Precision before customization: {context_precision.llm}")
# LLM used by Context Precision before customization: OpenAI(model='gpt-3.5-turbo-16k', _api_key_env_var='OPENAI_API_KEY')

context_precision.llm = tgi_wrapper
context_recall.llm = tgi_wrapper

print(f"LLM used by Context Precision after customization: {context_precision.llm}")
# LLM used by Context Precision after customization: <ragas.llms.langchain.LangchainLLM object at 0x7f8acf4dca00>

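score_tgi = evaluate(
    dataset,
    metrics=[
        context_precision,
        context_recall,
    ],
    # The dataset uses custom column names, so they must be mapped here as in the previous cells.
    column_map={"question": "questions", "contexts": "retrieved_contexts", "ground_truths": "ground_truth_responses"},
)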

print(score_tgi)