# Example of LlamaIndex Faithfulness Evaluation

**Authors**:
- Surya Mahadi (made.r.s.mahadi@gdplabs.id)

**Reviewers**:
- Novan Parmonangan Simanjuntak (novan.p.simanjuntak@gdplabs.id)
- Komang Elang Surya Prawira (komang.e.s.prawira@gdplabs.id)

## References
[1] [GLAIR GenAI Internal SDK - LlamaIndex Faithfulness Evaluation](#) \
[2] [LlamaIndex - Faithfulness Evaluation](https://docs.llamaindex.ai/en/stable/examples/evaluation/faithfulness_eval.html)

# Prepare Environment

Before we start, please make sure to install the SDK library and download the Truthful QA dataset to your local file system.

To install the SDK library, you need a personal access token on GitHub. Follow these steps:
1. Log in to your [GitHub Account](https://github.com/).
2. Go to the [Personal Access Tokens](https://github.com/settings/tokens) page.
3. If you haven't created a personal access token yet, generate one.
4. When generating a new token, check the `repo` option to grant access to private repositories.
5. Copy the generated token and paste it when the script below prompts for it.

In [None]:
import getpass
import subprocess
import sys

def install_sdk_library():
    token = getpass.getpass("Input Your Personal Access Token: ")

    cmd = f"pip install -e git+https://{token}@github.com/GDP-ADMIN/glair-genai-experiments-and-explorations.git#egg=glair_genai_sdk"

    with subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True) as process:
        stdout, stderr = process.communicate()

        if process.returncode != 0:
            sys.stdout.write(stderr)
            # Redact the token so it does not appear in the traceback.
            raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd.replace(f"{token}@", "***@"))
        else:
            sys.stdout.write(stdout)

install_sdk_library()

**Warning:**
After running the command above, you need to restart the runtime in Google Colab for the changes to take effect. Not doing so might lead to the newly installed libraries not being recognized.

To restart the runtime in Google Colab:
- Click on the `Runtime` menu.
- Select `Restart runtime`.
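
After the restart, a quick check can confirm that the packages are visible to the new kernel. The sketch below uses only the standard library; the helper name `is_importable` is hypothetical, and the module names are the ones imported later in this notebook.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the import system can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Module names come from the imports used later in this notebook.
for name in ("llama_index", "gdplabs_gen_ai"):
    status = "found" if is_importable(name) else "missing; restart the runtime or reinstall"
    print(f"{name}: {status}")
```

If a module is reported as missing after a restart, rerunning the install cell is the likely next step.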

# Faithfulness Evaluation
Once you have completed the previous step, you are ready to import the library.

In [2]:
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from gdplabs_gen_ai.evaluation import FaithfulnessEvaluator

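After that, prepare your data in the following format: a list of responses and a list of retrieved contexts, where the contexts at a given index belong to the response at the same index.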

In [7]:
ground_truth_responses = [
    "AI is artificial intelligence",
    "Car is and transportation",
]

retrieved_contexts = [
    ["Today AI is used everywhere", "AI was first developed on 1970, AI stands for Artificial Intelligence"],
    ["Toyota is a car factory that success in Japan", "Today lot of people use car as their main transportation"]
]
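
Because the two lists are paired by position, a mismatch in length or shape would silently drop or misalign pairs. The following is a minimal sketch of a sanity check; `validate_pairs` is a hypothetical helper, not part of the SDK, and the sample data is made up for illustration.

```python
def validate_pairs(responses, contexts):
    """Check that responses and contexts line up; return the number of pairs."""
    if len(responses) != len(contexts):
        raise ValueError(
            f"Got {len(responses)} responses but {len(contexts)} context lists"
        )
    for index, (response, context) in enumerate(zip(responses, contexts)):
        if not isinstance(response, str):
            raise TypeError(f"Response {index} must be a string")
        if not isinstance(context, list) or not context:
            raise ValueError(f"Context {index} must be a non-empty list")
        if not all(isinstance(passage, str) for passage in context):
            raise TypeError(f"Context {index} must contain only strings")
    return len(responses)


# Hypothetical sample data in the same shape as the cell above.
sample_responses = ["AI is artificial intelligence"]
sample_contexts = [["AI stands for Artificial Intelligence"]]
print(validate_pairs(sample_responses, sample_contexts))
```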

Next, we need to create our LLM. In this example we use GPT-4. Remember to put your `OPENAI_API_KEY` into an environment variable; from Python, you can set it through the `os.environ` mapping.
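
The environment variable can be set from Python before the LLM is created. Here is a minimal sketch, assuming you prefer a hidden prompt over hard-coding the key; the helper name `ensure_api_key` and its `prompt` parameter are hypothetical.

```python
import getpass
import os


def ensure_api_key(name="OPENAI_API_KEY", prompt=getpass.getpass):
    """Set the environment variable from a hidden prompt if it is not set yet."""
    if not os.environ.get(name):
        os.environ[name] = prompt(f"Enter {name}: ")
    return name


# In the notebook, call this before creating the LLM:
# ensure_api_key()
```

Prompting keeps the key out of the notebook source, which matters if the notebook is shared or committed.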

In [5]:
# create service context
gpt4 = OpenAI(temperature=0, model="gpt-4")
service_context_gpt4 = ServiceContext.from_defaults(llm=gpt4)

# create evaluator
evaluator_gpt4 = FaithfulnessEvaluator(service_context=service_context_gpt4)

Finally, we can calculate the faithfulness score using the following code:

In [12]:
scores = []

for ground_truth_response, retrieved_context in zip(ground_truth_responses, retrieved_contexts):
    # The evaluator offers two APIs: sync (`evaluate`) and async (`aevaluate`).
    # This example uses the async version.
    result = await evaluator_gpt4.aevaluate(response=ground_truth_response, contexts=retrieved_context)
    # To use the sync version instead, call:
    # result = evaluator_gpt4.evaluate(response=ground_truth_response, contexts=retrieved_context)
    # Store the pass/fail flag as 1 or 0.
    scores.append(int(result.passing))

print(f"Score: {sum(scores)/len(scores)}")

Score: 1.0


The example above evaluates each response-context pair as either `PASS` or `NOT PASS`, and then calculates the mean of those results as the final score.
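
The averaging step can be illustrated without calling the LLM. This is a sketch with made-up pass/fail flags; `mean_pass_rate` is a hypothetical helper that mirrors the `sum(scores)/len(scores)` line above.

```python
def mean_pass_rate(passing_flags):
    """Average a list of pass/fail booleans into a score between 0 and 1."""
    if not passing_flags:
        raise ValueError("At least one evaluation result is required")
    scores = [int(flag) for flag in passing_flags]
    return sum(scores) / len(scores)


# Hypothetical results: three pairs pass, one does not.
print(mean_pass_rate([True, False, True, True]))  # 3 / 4 = 0.75
# Two pairs that both pass give the score printed above.
print(mean_pass_rate([True, True]))  # 2 / 2 = 1.0
```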

**Note**: This example runs in a Jupyter notebook, and [Jupyter already runs an event loop internally](https://blog.jupyter.org/ipython-7-0-async-repl-a35ce050f7f7), so only the async version can be used here. If you want to use the sync version, run it outside a Jupyter notebook.
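
The conflict with a running event loop can be reproduced with the standard library alone. The sketch below assumes the sync method wraps the async one with `asyncio.run`, which this notebook does not confirm for the SDK; `fake_aevaluate` and `fake_evaluate` are hypothetical stand-ins, and the script is meant to be run as a plain Python file rather than in a notebook cell.

```python
import asyncio


async def fake_aevaluate():
    """Hypothetical stand-in for an async evaluation call."""
    await asyncio.sleep(0)
    return True


def fake_evaluate():
    """Hypothetical sync wrapper that starts its own event loop."""
    coroutine = fake_aevaluate()
    try:
        return asyncio.run(coroutine)
    except RuntimeError:
        coroutine.close()  # avoid a "never awaited" warning
        raise


async def call_sync_inside_loop():
    """Call the sync wrapper while an event loop is already running."""
    try:
        fake_evaluate()
    except RuntimeError as error:
        return f"failed: {error}"
    return "succeeded"


if __name__ == "__main__":
    # No loop is running here, so the sync wrapper works.
    print(fake_evaluate())
    # Inside a running loop (as in Jupyter), the sync wrapper raises.
    print(asyncio.run(call_sync_inside_loop()))
```

Inside a notebook cell, awaiting the async function directly avoids starting a second event loop.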