## Installation of Required Libraries

| Platform |
|----------|
| [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/cookbook/indoxJudge/evaluate_indox_rag.ipynb)|
| [![GitHub](https://img.shields.io/badge/GitHub-Repository-blue?logo=github)](https://github.com/osllmai/inDox/blob/master/cookbook/indoxJudge/evaluate_indox_rag.ipynb) |


This notebook installs the packages needed to build a Retrieval-Augmented Generation (RAG) application with **Indox** and to evaluate it with **IndoxJudge**. The packages are:

- **Indox**: A library that supports large language models (LLMs) with retrieval-augmented generation functionality.
- **IndoxJudge**: A library for evaluating LLMs using multiple metrics.
- **OpenAI**: For interacting with OpenAI's API to use various GPT models.
- **ChromaDB**: A vector database to manage and store embeddings.
- **Semantic Text Splitter**: A tool for splitting text in a meaningful way, based on semantics, to ensure better chunking for retrieval-based applications.



In [2]:
#!pip install indox indoxJudge openai chromadb semantic_text_splitter
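Before moving on, it can help to confirm that the packages are importable. The sketch below uses only the standard library; it assumes that the import names match the pip package names in the install command above, which may not hold for every package.

```python
from importlib.util import find_spec

# Assumption: the import names are the same as the pip names used above.
REQUIRED = ["indox", "indoxJudge", "openai", "chromadb", "semantic_text_splitter"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", missing or "none")
```

If the list is not empty, uncomment and run the install command above, then restart the kernel.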


## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**

```bash
python -m venv indox
```
2. **Activate the virtual environment:**

```bash
indox\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**

```bash
python3 -m venv indox
```

2. **Activate the virtual environment:**

```bash
source indox/bin/activate
```
### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```
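To check that the notebook kernel is really using the virtual environment, a short standard-library test is enough. This is a minimal sketch for environments created with `python -m venv`; other tools, such as conda, may need a different check.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
print("Interpreter:", sys.executable)
```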


## Setting Up API Keys for OpenAI

This section loads the **OpenAI** API key from the environment instead of hardcoding it. The **dotenv** library (installed as `python-dotenv`) reads variables from a `.env` file, which keeps the key out of the source code and makes the project easier to run across different environments.

### Steps:

1. Ensure you have a `.env` file in the root of your project directory.
2. In the `.env` file, add the following:
    ```
    OPENAI_API_KEY=your_openai_api_key
    ```
3. `load_dotenv()` reads the `.env` file, and the code then retrieves the key from `os.environ`.


In [4]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
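The lookup `os.environ['OPENAI_API_KEY']` raises a bare `KeyError` when the variable is missing. A small helper can give a clearer message. The name `require_env` is hypothetical and is not part of Indox or dotenv; the demonstration uses an explicit dictionary so it does not depend on your real environment.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {"OPENAI_API_KEY": "your_openai_api_key"}
print(require_env("OPENAI_API_KEY", demo_env))
```

In the notebook, calling `require_env("OPENAI_API_KEY")` after `load_dotenv()` would read from the real environment.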

In [5]:
from indox import IndoxRetrievalAugmentation

indox = IndoxRetrievalAugmentation()

INFO: IndoxRetrievalAugmentation initialized

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            


## Setting Up and Using OpenAI and Azure Embeddings with Indox

This section demonstrates how to integrate **OpenAI** models and **Azure** embeddings within the **Indox** framework. We will use **Chroma** as the vector store for storing embeddings, which can later be used for retrieval in various applications, including question answering or information retrieval.

### Steps:

1. **OpenAI LLM Setup**:
   We will initialize a language model from OpenAI, specifying the API key and the model (`gpt-4o-mini` in this case).
   
2. **Azure Embeddings Setup**:
   We will use **AzureOpenAIEmbeddings** to generate embeddings, specifying an API key and the embedding model (`text-embedding-3-small`).

3. **Chroma Vector Store**:
   The embeddings generated by **Azure** will be stored in **Chroma** under a collection named `sample`, which can be used for querying and retrieving relevant vectors.


In [6]:
from indox.llms import OpenAi  # Import OpenAI model class from Indox
from indox.embeddings import AzureOpenAIEmbeddings  # Import Azure embeddings class
from indox.vector_stores import Chroma  # Import Chroma vector store to handle embedding storage

# Initialize the OpenAI language model using the API key and specifying the model "gpt-4o-mini"
openai_llm = OpenAi(api_key=OPENAI_API_KEY, model="gpt-4o-mini")

# Initialize Azure embeddings using the same API key and specifying the model "text-embedding-3-small"
azure_embed = AzureOpenAIEmbeddings(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

# Create a Chroma vector store with a collection named 'sample' and set the embedding function to azure_embed
db = Chroma(collection_name="sample", embedding_function=azure_embed)

# The vector store (db) is now ready to store embeddings generated by Azure and can be used for retrieval purposes


2025-03-23 16:33:53,724 - numexpr.utils - INFO - Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
2025-03-23 16:33:53,724 - numexpr.utils - INFO - NumExpr defaulting to 16 threads.

INFO: Initializing OpenAi with model: gpt-4o-mini
INFO: OpenAi initialized successfully
INFO: Initialized OpenAiEmbedding with model: text-embedding-3-small

2025-03-23 16:33:55,913 - chromadb.telemetry.product.posthog - INFO - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.


## Loading Text Content from Project Gutenberg

In this section, we demonstrate how to load the text of a book from **Project Gutenberg** using the **GutenbergReader** class provided by **Indox**. This reader allows us to fetch and read books by specifying their unique Project Gutenberg ID.

### Steps:

1. **Initialize GutenbergReader**:
   We will create an instance of `GutenbergReader` to interface with Project Gutenberg.
   
2. **Specify the Book ID**:
   Each book on Project Gutenberg has a unique ID. For example, the ID for *Alice's Adventures in Wonderland* is `"11"`.
   
3. **Fetch the Book Content**:
   Using the `get_content()` method, we will fetch the book’s text content by passing the book ID.


In [7]:
from indox.data_connectors import GutenbergReader  # Import GutenbergReader to load books from Project Gutenberg

# Initialize the GutenbergReader to access and fetch book content from Project Gutenberg
reader = GutenbergReader()

# Specify the Project Gutenberg book ID for "Alice's Adventures in Wonderland" (ID: 11)
book_id = "11"

# Fetch the content of the book using the get_content method, passing the book_id
content = reader.get_content(book_id)

# Now, 'content' contains the entire text of "Alice's Adventures in Wonderland" as fetched from Project Gutenberg


## Splitting Text with SemanticTextSplitter

In this section, we show how to split long text into smaller, semantically meaningful chunks using the **SemanticTextSplitter** from **Indox**. This is particularly useful for processing large texts (such as books or articles) into smaller units for tasks like retrieval, summarization, or further analysis.

### Steps:

1. **Initialize the SemanticTextSplitter**:
   We will instantiate the `SemanticTextSplitter` with a specified chunk size (e.g., 400 tokens or characters).
   
2. **Split the Text**:
   After fetching the text (from sources like Project Gutenberg), we will pass the content to `split_text()` to break it into smaller, semantically coherent chunks.
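To make the idea of size-bounded chunking concrete, here is a deliberately naive sketch that packs paragraphs into chunks of at most `max_chars` characters. It is not the algorithm used by `SemanticTextSplitter`; the function name and the character-based limit are assumptions made for illustration.

```python
def chunk_paragraphs(text, max_chars=400):
    """Greedily pack blank-line-separated paragraphs into chunks of at most max_chars."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A paragraph longer than the limit is cut into fixed-size pieces.
        while len(paragraph) > max_chars:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks


demo = "aaa\n\nbbb\n\nccccccccccc"
print(chunk_paragraphs(demo, max_chars=8))
```

A semantic splitter improves on this by choosing break points at sentence and section boundaries rather than cutting at a fixed offset.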


In [8]:
from indox.splitter import SemanticTextSplitter  # Import SemanticTextSplitter to split long text into chunks

# Initialize the SemanticTextSplitter with a chunk size of 400 (tokens/characters)
splitter = SemanticTextSplitter(400)

# Split the book content into smaller, semantically meaningful chunks using the splitter
content_chunks = splitter.split_text(content)

# 'content_chunks' now contains the text broken into smaller segments, useful for retrieval or further processing


In [9]:
content_chunks[2:5]

['Alice was beginning to get very tired of sitting by her sister on the\r\nbank, and of having nothing to do: once or twice she had peeped into\r\nthe book her sister was reading, but it had no pictures or\r\nconversations in it, “and what is the use of a book,” thought Alice\r\n“without pictures or conversations?”\r\n\r\nSo she was considering in her own mind (as well as she could, for the\r\nhot day made her feel very sleepy and stupid), whether the pleasure of\r\nmaking a daisy-chain would be worth the trouble of getting up and\r\npicking the daisies, when suddenly a White Rabbit with pink eyes ran\r\nclose by her.\r\n\r\nThere was nothing so _very_ remarkable in that; nor did Alice think it\r\nso _very_ much out of the way to hear the Rabbit say to itself, “Oh\r\ndear! Oh dear! I shall be late!” (when she thought it over afterwards,\r\nit occurred to her that she ought to have wondered at this, but at the\r\ntime it all seemed quite natural); but when the Rabbit actually _took a\r\
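The chunks above contain Windows-style `\r\n` line endings from the source text. If you prefer uniform newlines, one option is to normalize the text before splitting. This is an optional preprocessing sketch; the helper name is hypothetical, and whether normalization changes the splitter's output was not tested here.

```python
def normalize_newlines(text):
    """Convert Windows (CRLF) and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "Alice was beginning to get very tired of sitting by her sister on the\r\nbank, and of having nothing to do"
print(repr(normalize_newlines(sample)))
```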

In [10]:
db.add(docs=content_chunks)  # Add chunks to vector database

INFO: Storing documents in the vector store
INFO: Embedding documents
INFO: Starting to fetch embeddings for texts using engine: text-embedding-3-small


2025-03-23 16:34:16,373 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-23 16:34:17,062 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-23 16:34:17,710 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-23 16:34:18,383 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-23 16:34:19,032 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-23 16:34:19,485 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-23 16:34:20,053 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-23 16:34:20,631 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-23 16:34:20,822 - httpx - INFO - HTTP Request: POST https://api.openai.c

INFO: Document added successfully to the vector store.
INFO: Documents stored successfully


## Advanced Setup for the QuestionAnswer Retriever

This section explains the additional hyperparameters available when configuring the **QuestionAnswer** retriever in **Indox**. The retriever uses a vector database and language model to fetch relevant content and provide answers to queries. We will cover the following hyperparameters:

### Key Hyperparameters:

- **llm**: The language model used to generate answers (e.g., OpenAI GPT models).
- **vector_database**: The vector database used for embedding retrieval (e.g., Chroma).
- **top_k**: Determines how many of the most relevant documents to retrieve (default is 5).
- **document_relevancy_filter**: If set to `True`, only the most relevant documents are retrieved based on relevancy filtering.
- **generate_clustered_prompts**: If set to `True`, this enables the retriever to cluster the retrieved documents and generate summaries for each cluster. The summary is added to the retrieval context, helping improve the overall response quality by providing a more structured context.
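To illustrate what `top_k` controls, the toy example below ranks a few hand-made vectors by cosine similarity to a query vector and keeps the best `k`. It is a conceptual sketch only: the vectors are invented, and the similarity measure that Chroma and Indox actually use is not shown in this notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_documents(query_vec, doc_vecs, k=5):
    """Return (doc_id, score) pairs for the k most similar documents."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented three-dimensional "embeddings" for illustration.
docs = {
    "caterpillar": [0.9, 0.1, 0.0],
    "lobster": [0.1, 0.9, 0.0],
    "sister": [0.0, 0.2, 0.9],
}
print(top_k_documents([1.0, 0.0, 0.0], docs, k=2))
```

A larger `top_k` gives the language model more context to work with, at the cost of longer prompts and a higher chance of including irrelevant passages.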


In [11]:
# Create the QuestionAnswer retriever from the Indox client
# All hyperparameters are passed explicitly, with an explanation of each

retriever = indox.QuestionAnswer(
    llm=openai_llm,  # Language model that generates answers (gpt-4o-mini here)
    vector_database=db,  # Chroma vector store for embedding retrieval
    top_k=3,  # Number of most relevant documents to retrieve (3 here)
    document_relevancy_filter=False,  # Set to True to filter results based on document relevancy
    generate_clustered_prompts=False  # Set to True to cluster the retrieved documents and summarize each cluster
)

# If 'generate_clustered_prompts' is True:
# - The retriever will group the retrieved documents into clusters.
# - It will summarize each cluster and add the summary to the initial retrieval context,
#   which can help the language model produce a more comprehensive and coherent answer.


In [12]:
retriever.invoke("Who is the speaker talking to in the text?")

INFO: Retrieving context and scores from the vector database
INFO: Embedding documents
INFO: Starting to fetch embeddings for texts using engine: text-embedding-3-small

2025-03-23 16:35:09,203 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

INFO: Generating answer without document relevancy filter
INFO: Answering question
INFO: Generating response

2025-03-23 16:35:10,438 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Response generated successfully
INFO: Query answered successfully


'The speaker is talking to the Caterpillar. In the context provided, Alice is engaged in a conversation with the Caterpillar, responding to its questions and remarks about her identity and experiences of change.'

In [13]:
retriever.invoke("tell me about alice's sister book?")

INFO: Retrieving context and scores from the vector database
INFO: Embedding documents
INFO: Starting to fetch embeddings for texts using engine: text-embedding-3-small

2025-03-23 16:35:14,548 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

INFO: Generating answer without document relevancy filter
INFO: Answering question
INFO: Generating response

2025-03-23 16:35:15,943 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Response generated successfully
INFO: Query answered successfully


"Alice's sister's book did not have any pictures or conversations in it, which made Alice feel bored and disinterested. She thought to herself, “and what is the use of a book, without pictures or conversations?” This lack of engaging content contributed to Alice's feelings of tiredness and her desire for something more stimulating to do while sitting by her sister on the bank."

In [14]:
retriever.invoke("how alice story ends?")

INFO: Retrieving context and scores from the vector database
INFO: Embedding documents
INFO: Starting to fetch embeddings for texts using engine: text-embedding-3-small

2025-03-23 16:35:20,312 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

INFO: Generating answer without document relevancy filter
INFO: Answering question
INFO: Generating response

2025-03-23 16:35:22,567 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Response generated successfully
INFO: Query answered successfully


"The story of Alice in Wonderland does not have a definitive ending in the traditional sense, as it is more of a series of whimsical adventures rather than a linear narrative. However, the story concludes with Alice waking up from her dream. After her many encounters and experiences in Wonderland, she finds herself back on the bank where she initially fell asleep. She recounts her adventures to her sister, who reflects on the curious nature of Alice's dream. The story ends with Alice's sister pondering the imaginative world Alice has experienced, suggesting that the adventures may continue in Alice's mind as she grows older."

In [15]:
chat_history = retriever.chat_history

In [16]:
chat_history

{0: {'query': 'Who is the speaker talking to in the text?',
  'llm_response': 'The speaker is talking to the Caterpillar. In the context provided, Alice is engaged in a conversation with the Caterpillar, responding to its questions and remarks about her identity and experiences of change.',
  'retrieval_context': ['“’Tis the voice of the Lobster; I heard him declare,\r\n“You have baked me too brown, I must sugar my hair.”\r\nAs a duck with its eyelids, so he with his nose\r\nTrims his belt and his buttons, and turns out his toes.”\r\n\r\n[later editions continued as follows\r\nWhen the sands are all dry, he is gay as a lark,\r\nAnd will talk in contemptuous tones of the Shark,\r\nBut, when the tide rises and sharks are around,\r\nHis voice has a timid and tremulous sound.]',
   'The Caterpillar and Alice looked at each other for some time in\r\nsilence: at last the Caterpillar took the hookah out of its mouth, and\r\naddressed her in a languid, sleepy voice.\r\n\r\n“Who are _you?_” sai

## Setting Up the RagEvaluator with OpenAI LLM as Judge

In this section, we demonstrate how to set up the **RagEvaluator** from **IndoxJudge**, which evaluates the quality of responses generated by a language model in a Retrieval-Augmented Generation (RAG) system. We use **OpenAI** as the language model (LLM) that acts as a judge for the evaluation process.

### Steps:

1. **Initialize the OpenAI Model**:
   We first create an instance of `OpenAi` with an API key and a specified model (e.g., `"gpt-4o-mini"`).
   
2. **Set Up the RagEvaluator**:
   The `RagEvaluator` will use the LLM to evaluate the quality of the conversation entries (represented by `chat_history`), judging their accuracy and relevancy.
   
### Key Parameters:
- **llm_as_judge**: The language model acting as the judge, responsible for evaluating the retrieved content.
- **entries**: The conversation or content history to be evaluated.
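The `chat_history` shown earlier is a dictionary keyed by entry index, where each entry holds `query`, `llm_response`, and `retrieval_context`. Before handing such a structure to an evaluator, a quick shape check can catch incomplete entries. The helper below is a hypothetical sketch, not part of IndoxJudge, and the sample data is abbreviated for illustration.

```python
REQUIRED_KEYS = {"query", "llm_response", "retrieval_context"}


def validate_entries(entries):
    """Return a list of problems found in a chat-history-style dict (empty if none)."""
    problems = []
    for index, entry in entries.items():
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {index}: missing keys {sorted(missing)}")
        elif not entry["retrieval_context"]:
            problems.append(f"entry {index}: empty retrieval_context")
    return problems


# Abbreviated sample; entry 1 deliberately lacks its retrieval context.
sample_history = {
    0: {
        "query": "Who is the speaker talking to in the text?",
        "llm_response": "The speaker is talking to the Caterpillar.",
        "retrieval_context": ["The Caterpillar and Alice looked at each other for some time in silence"],
    },
    1: {"query": "how alice story ends?", "llm_response": "Alice wakes up from her dream."},
}
print(validate_entries(sample_history))
```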


In [17]:
from indoxJudge.pipelines import RagEvaluator  # Import RagEvaluator for evaluating the quality of RAG systems
from indoxJudge.models import OpenAi  # Import OpenAi model from IndoxJudge to use it as a judge

# Initialize the OpenAI model with API key and model name (gpt-4o-mini)
model = OpenAi(api_key=OPENAI_API_KEY, model="gpt-4o-mini")

# Create a RagEvaluator to evaluate the conversation history
# - llm_as_judge: The OpenAI model used to evaluate the responses
# - entries: The conversation history or entries to be judged
evaluator = RagEvaluator(llm_as_judge=model, entries=chat_history)

# Now, the RagEvaluator is set to evaluate the provided chat history based on the responses retrieved from the RAG system.


2025-03-23 16:36:15,824 - matplotlib.font_manager - INFO - generated new fontManager

INFO: Initializing OpenAi with model: gpt-4o-mini and max_tokens: 2048


## Reviewing Evaluation Results with RagEvaluator

After running the `judge()` method, the **RagEvaluator** provides detailed feedback on the quality of the responses. In this section, we explore how to access and visualize the evaluation results and metric scores.

### Key Methods and Attributes:

1. **evaluator.results**:
   This attribute returns the detailed results and verdicts for each evaluation metric applied to the conversation or content entries. You can inspect the specific feedback for each metric (e.g., relevance, accuracy).

2. **evaluator.metrics_score**:
   This attribute provides the numeric scores for each metric used during the evaluation process, giving you a clear understanding of the performance of the retrieved responses.

3. **evaluator.plot()**:
   This method generates a visualization (using Matplotlib) of the evaluation scores for each metric. This chart helps you visually interpret the strengths and weaknesses of the model’s responses.
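While `judge()` runs, it also logs one line per metric and entry in the form `Completed evaluation for metric: <name>, score: <value>`. If you only have the log text, the scores can be averaged per metric with a regular expression. This is a minimal sketch: the sample lines reuse values from this notebook's log, the function name is hypothetical, and the exact structure of `evaluator.metrics_score` is not assumed.

```python
import re
from collections import defaultdict

# Matches single-score lines; the multi-line BertScore report is not covered.
SCORE_LINE = re.compile(r"Completed evaluation for metric: (\w+), score: (\d+(?:\.\d+)?)")


def mean_scores_from_log(log_text):
    """Average the per-entry scores that the log reports for each metric."""
    scores = defaultdict(list)
    for metric, value in SCORE_LINE.findall(log_text):
        scores[metric].append(float(value))
    return {metric: sum(values) / len(values) for metric, values in scores.items()}


log_text = """
INFO: Completed evaluation for metric: Faithfulness, score: 1.44
INFO: Completed evaluation for metric: Hallucination, score: 0.0
INFO: Completed evaluation for metric: Faithfulness, score: 1.48
INFO: Completed evaluation for metric: Hallucination, score: 1.0
"""
print(mean_scores_from_log(log_text))
```

When the evaluator object is available, prefer `evaluator.metrics_score` and `evaluator.results` over parsing logs.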


In [26]:
import nltk
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')

[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\llmserver\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     C:\Users\llmserver\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger_eng.zip.


True

In [27]:
evaluator.judge()

INFO: Model set for all metrics.
INFO: RagEvaluator initialized with model and metrics.
INFO: Evaluating entry: 0

INFO: Evaluating metric: Faithfulness

2025-03-23 16:43:11,374 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 223 | Output: 67

2025-03-23 16:43:13,731 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 188 | Output: 70

2025-03-23 16:43:15,591 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 1534 | Output: 60

2025-03-23 16:43:16,393 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 235 | Output: 34
INFO: Token Usage Summary:
 Total Input: 2180 | Total Output: 231 | Total: 2411
INFO: Completed evaluation for metric: Faithfulness, score: 1.44

INFO: Evaluating metric: AnswerRelevancy

2025-03-23 16:43:17,589 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 164 | Output: 54

2025-03-23 16:43:18,741 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 445 | Output: 60

2025-03-23 16:43:19,736 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 199 | Output: 31
INFO: Token Usage Summary:
 Total Input: 808 | Total Output: 145 | Total: 953
INFO: Completed evaluation for metric: AnswerRelevancy, score: 1.44

INFO: Evaluating metric: ContextualRelevancy

2025-03-23 16:43:20,839 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 406 | Output: 59

2025-03-23 16:43:22,039 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 668 | Output: 57

2025-03-23 16:43:23,474 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 691 | Output: 48

2025-03-23 16:43:24,517 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 406 | Output: 55

2025-03-23 16:43:25,701 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 668 | Output: 61

2025-03-23 16:43:26,675 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 691 | Output: 52

2025-03-23 16:43:28,964 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 257 | Output: 61
INFO: Token Usage Summary:
 Total Input: 3787 | Total Output: 393 | Total: 4180
INFO: Completed evaluation for metric: ContextualRelevancy, score: 0.97

INFO: Evaluating metric: GEval

2025-03-23 16:43:31,919 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 270 | Output: 100

2025-03-23 16:43:33,849 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 1267 | Output: 58
INFO: Token Usage Summary:
 Total Input: 1537 | Total Output: 158 | Total: 1695
INFO: Completed evaluation for metric: GEval, score: 1.3

INFO: Evaluating metric: Hallucination

2025-03-23 16:43:35,846 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 1352 | Output: 129

2025-03-23 16:43:36,737 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 210 | Output: 34
INFO: Token Usage Summary:
 Total Input: 1562 | Total Output: 163 | Total: 1725
INFO: Completed evaluation for metric: Hallucination, score: 0.0

INFO: Evaluating metric: KnowledgeRetention

2025-03-23 16:43:37,246 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 313 | Output: 17

2025-03-23 16:43:38,238 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 527 | Output: 48

2025-03-23 16:43:39,193 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 177 | Output: 45
INFO: Token Usage Summary:
 Total Input: 1017 | Total Output: 110 | Total: 1127
INFO: Completed evaluation for metric: KnowledgeRetention, score: 0.0

INFO: Evaluating metric: BertScore
INFO: Completed evaluation for metric: BertScore, scores:
precision: 0.84,
recall: 0.64,
f1_score: 0.72,

INFO: Evaluating metric: METEOR
INFO: Completed evaluation for metric: METEOR, score: 0.79
INFO: Completed evaluation for entry: 0
INFO: Evaluation Completed, Check out the results
INFO: Evaluating entry: 1

INFO: Evaluating metric: Faithfulness

2025-03-23 16:43:47,362 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 259 | Output: 99

2025-03-23 16:43:49,528 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 224 | Output: 100

2025-03-23 16:43:50,956 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 1509 | Output: 72

2025-03-23 16:43:51,719 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 235 | Output: 34
INFO: Token Usage Summary:
 Total Input: 2227 | Total Output: 305 | Total: 2532
INFO: Completed evaluation for metric: Faithfulness, score: 1.48

INFO: Evaluating metric: AnswerRelevancy

2025-03-23 16:43:53,844 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 200 | Output: 81

2025-03-23 16:43:54,999 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 470 | Output: 73

2025-03-23 16:43:56,485 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 197 | Output: 31
INFO: Token Usage Summary:
 Total Input: 867 | Total Output: 185 | Total: 1052
INFO: Completed evaluation for metric: AnswerRelevancy, score: 1.48

INFO: Evaluating metric: ContextualRelevancy

2025-03-23 16:43:57,489 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 594 | Output: 52

2025-03-23 16:43:58,799 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 685 | Output: 83

2025-03-23 16:43:59,788 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 424 | Output: 59

2025-03-23 16:44:00,900 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 594 | Output: 52

2025-03-23 16:44:01,952 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 685 | Output: 66

2025-03-23 16:44:02,912 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 424 | Output: 59

2025-03-23 16:44:03,841 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 205 | Output: 34
INFO: Token Usage Summary:
 Total Input: 3611 | Total Output: 405 | Total: 4016
INFO: Completed evaluation for metric: ContextualRelevancy, score: 1.32

INFO: Evaluating metric: GEval

2025-03-23 16:44:06,331 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 270 | Output: 95

2025-03-23 16:44:07,857 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 1229 | Output: 61
INFO: Token Usage Summary:
 Total Input: 1499 | Total Output: 156 | Total: 1655
INFO: Completed evaluation for metric: GEval, score: 0.8300000000000001

INFO: Evaluating metric: Hallucination

2025-03-23 16:44:10,714 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 1322 | Output: 185

2025-03-23 16:44:12,109 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 266 | Output: 67
INFO: Token Usage Summary:
 Total Input: 1588 | Total Output: 252 | Total: 1840
INFO: Completed evaluation for metric: Hallucination, score: 1.0

INFO: Evaluating metric: KnowledgeRetention

2025-03-23 16:44:12,613 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 311 | Output: 11

2025-03-23 16:44:14,078 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 558 | Output: 66

2025-03-23 16:44:15,585 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 195 | Output: 54
INFO: Token Usage Summary:
 Total Input: 1064 | Total Output: 131 | Total: 1195
INFO: Completed evaluation for metric: KnowledgeRetention, score: 0.0

INFO: Evaluating metric: BertScore
INFO: Completed evaluation for metric: BertScore, scores:
precision: 0.86,
recall: 0.71,
f1_score: 0.77,

INFO: Evaluating metric: METEOR
INFO: Completed evaluation for metric: METEOR, score: 1.05
INFO: Completed evaluation for entry: 1
INFO: Evaluation Completed, Check out the results
INFO: Evaluating entry: 2

INFO: Evaluating metric: Faithfulness

2025-03-23 16:44:18,960 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 303 | Output: 133

2025-03-23 16:44:21,468 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 268 | Output: 125

2025-03-23 16:44:25,247 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 1777 | Output: 170

2025-03-23 16:44:26,598 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 287 | Output: 52
INFO: Token Usage Summary:
 Total Input: 2635 | Total Output: 480 | Total: 3115
INFO: Completed evaluation for metric: Faithfulness, score: 0.61

INFO: Evaluating metric: AnswerRelevancy


2025-03-23 16:44:28,680 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 244 | Output: 136

2025-03-23 16:44:30,383 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 511 | Output: 109

2025-03-23 16:44:31,286 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 194 | Output: 32
INFO: Token Usage Summary:
 Total Input: 949 | Total Output: 277 | Total: 1226
INFO: Completed evaluation for metric: AnswerRelevancy, score: 1.49

INFO: Evaluating metric: ContextualRelevancy


2025-03-23 16:44:32,215 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 624 | Output: 45

2025-03-23 16:44:33,321 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 669 | Output: 57

2025-03-23 16:44:34,595 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 635 | Output: 57

2025-03-23 16:44:35,554 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 624 | Output: 50

2025-03-23 16:44:36,576 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 669 | Output: 62

2025-03-23 16:44:37,646 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 635 | Output: 55

2025-03-23 16:44:38,737 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 202 | Output: 33
INFO: Token Usage Summary:
 Total Input: 4058 | Total Output: 359 | Total: 4417
INFO: Completed evaluation for metric: ContextualRelevancy, score: 1.44

INFO: Evaluating metric: GEval


2025-03-23 16:44:41,091 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 270 | Output: 93

2025-03-23 16:44:42,574 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 1500 | Output: 56
INFO: Token Usage Summary:
 Total Input: 1770 | Total Output: 149 | Total: 1919
INFO: Completed evaluation for metric: GEval, score: 1.08

INFO: Evaluating metric: Hallucination


2025-03-23 16:44:45,330 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 1597 | Output: 173

2025-03-23 16:44:46,294 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 254 | Output: 50
INFO: Token Usage Summary:
 Total Input: 1851 | Total Output: 223 | Total: 2074
INFO: Completed evaluation for metric: Hallucination, score: 1.33

INFO: Evaluating metric: KnowledgeRetention


2025-03-23 16:44:46,812 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 308 | Output: 14

2025-03-23 16:44:48,368 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 606 | Output: 55

2025-03-23 16:44:49,327 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO: Token Counts - Input: 184 | Output: 53
INFO: Token Usage Summary:
 Total Input: 1098 | Total Output: 122 | Total: 1220
INFO: Completed evaluation for metric: KnowledgeRetention, score: 0.0

INFO: Evaluating metric: BertScore
INFO: Completed evaluation for metric: BertScore, scores:
precision: 0.8400000000000001,
recall: 0.72,
f1_score: 0.77,

INFO: Evaluating metric: METEOR
INFO: Completed evaluation for metric: METEOR, score: 1.1099999999999999
INFO: Completed evaluation for entry: 2
INFO: Evaluation Completed, Check out the results


In [28]:
evaluator.metrics_score

{'Faithfulness': 0.2,
 'AnswerRelevancy': 0.5,
 'ContextualRelevancy': 0.48,
 'GEval': 0.36,
 'Hallucination': 0.44,
 'KnowledgeRetention': 0.0,
 'precision': 0.28,
 'recall': 0.24,
 'f1_score': 0.26,
 'METEOR': 0.37,
 'evaluation_score': 0.36}
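
The aggregate scores can be easier to scan when sorted. The following is a minimal sketch that works on a plain dictionary copied from the output above, so it makes no assumption about the evaluator beyond the `metrics_score` attribute already shown. The variable names `per_metric`, `overall`, and `ranked` are illustrative only. Whether a higher value is better depends on the metric, so the sort is purely numeric and not a quality ranking.

```python
# Aggregate scores copied from the `evaluator.metrics_score` output above.
metrics_score = {
    "Faithfulness": 0.2,
    "AnswerRelevancy": 0.5,
    "ContextualRelevancy": 0.48,
    "GEval": 0.36,
    "Hallucination": 0.44,
    "KnowledgeRetention": 0.0,
    "precision": 0.28,
    "recall": 0.24,
    "f1_score": 0.26,
    "METEOR": 0.37,
    "evaluation_score": 0.36,
}

# Separate the overall score from the individual metric values.
overall = metrics_score["evaluation_score"]
per_metric = {
    name: score
    for name, score in metrics_score.items()
    if name != "evaluation_score"
}

# Sort numerically from lowest to highest value (not a quality ranking).
ranked = sorted(per_metric.items(), key=lambda item: item[1])

for name, score in ranked:
    print(f"{name:<20} {score:.2f}")
print(f"{'evaluation_score':<20} {overall:.2f}")
```

In the notebook itself, the literal dictionary could be replaced with `evaluator.metrics_score` directly.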

In [29]:
evaluator.plot(mode="inline")


## Join Us

Explore how Indox can streamline your document processing workflow and bring clarity and organization to your data retrieval. Connect with us and join our growing community on the platforms below:

## Community

- [Discord](https://discord.com/invite/xGz5tQYaeq)
- [X (Twitter)](https://x.com/osllmai)
- [LinkedIn](https://www.linkedin.com/company/osllmai/)
- [YouTube](https://www.youtube.com/@osllm-rb9pr)
- [Telegram](https://t.me/osllmai)


*Reviewed by: Ali Nemati - March 23, 2025*