[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/examples/evaluate_indox_rag.ipynb)

## Installation of Required Libraries

In this notebook, we install the packages needed to build and evaluate a Retrieval-Augmented Generation (RAG) application with **Indox**, **IndoxJudge**, and several supporting libraries. These packages include:

- **Indox**: A library for building retrieval-augmented generation (RAG) pipelines with large language models (LLMs).
- **IndoxJudge**: A library for evaluating LLMs using multiple metrics.
- **OpenAI**: For interacting with OpenAI's API to use various GPT models.
- **ChromaDB**: A vector database to manage and store embeddings.
- **Semantic Text Splitter**: A tool that splits text along semantic boundaries, producing more coherent chunks for retrieval-based applications.
- **python-dotenv**: Loads environment variables, such as API keys, from a `.env` file.


In [None]:
!pip install indox
!pip install indoxJudge
!pip install openai
!pip install chromadb
!pip install semantic_text_splitter
!pip install python-dotenv

## Setting Up the Python Environment

If you are running this project in your local IDE, create a virtual environment so that the project's dependencies are isolated from your system Python installation. Follow the steps below to set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indox
```
2. **Activate the virtual environment:**
```bash
indox\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indox
```

2. **Activate the virtual environment:**
```bash
source indox/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies. If you keep the packages listed above in a `requirements.txt` file, run:

```bash
pip install -r requirements.txt
```


## Setting Up API Keys for OpenAI

In this section, we set up the environment to load the **OpenAI** API key securely. We use the **python-dotenv** library to read environment variables from a `.env` file, so that sensitive values such as API keys are never hardcoded in the notebook. This also makes the project easier to run across different environments.

### Steps:

1. Ensure you have a `.env` file in the root of your project directory.
2. In the `.env` file, add the following:
    ```
    OPENAI_API_KEY=your_openai_api_key
    ```
3. The `load_dotenv()` call in the next cell reads this file, after which the key is available through `os.environ`.
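
For illustration, the `KEY=value` format above can be parsed with a few lines of standard-library Python. The helpers below (`parse_dotenv` and `apply_env`) are hypothetical, simplified stand-ins for `load_dotenv()`: they skip blank lines and `#` comments and strip one pair of matching quotes, but ignore features such as `export` prefixes, multi-line values, and variable expansion. Like `load_dotenv()` with its default settings, `apply_env` leaves variables that are already set untouched.

```python
import os


def parse_dotenv(text: str) -> dict[str, str]:
    """Simplified .env parser (hypothetical helper, not python-dotenv's API)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an '=' separator
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of matching quotes
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def apply_env(values: dict[str, str], override: bool = False) -> None:
    """Copy parsed values into os.environ, keeping existing variables unless override=True."""
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value


print(parse_dotenv('OPENAI_API_KEY="your_openai_api_key"'))
# {'OPENAI_API_KEY': 'your_openai_api_key'}
```

Because existing variables win by default, a key already exported in your shell takes precedence over the one in `.env`.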


In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
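
`os.environ['OPENAI_API_KEY']` raises a bare `KeyError` if the `.env` file is missing or was not found. A small helper like the one below (hypothetical, not part of Indox or python-dotenv) fails with a clearer message instead:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with an actionable error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add '{name}=...' to your .env file "
            "and re-run load_dotenv()."
        )
    return value


# Usage (after load_dotenv()):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```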

In [2]:
from indox import IndoxRetrievalAugmentation

indox = IndoxRetrievalAugmentation()

[32mINFO[0m: [1mIndoxRetrievalAugmentation initialized[0m

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            


## Setting Up and Using OpenAI and Azure Embeddings with Indox

This section demonstrates how to integrate **OpenAI** models and **Azure** embeddings within the **Indox** framework. We will use **Chroma** as the vector store for the embeddings, so that they can later be searched for tasks such as question answering.

### Steps:

1. **OpenAI LLM Setup**:
   We will initialize a language model from OpenAI, specifying the API key and the model (`gpt-4o-mini` in this case).
   
2. **Azure Embeddings Setup**:
   We will use **AzureOpenAIEmbeddings** to generate embeddings, specifying an API key and the embedding model (`text-embedding-3-small`).

3. **Chroma Vector Store**:
   The embeddings generated by **Azure** will be stored in **Chroma** under a collection named `sample`, which can be used for querying and retrieving relevant vectors.
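
Conceptually, a vector store keeps one embedding per document and answers a query by returning the documents whose embeddings are most similar to the query's embedding. The sketch below illustrates that idea with a toy bag-of-words "embedding" and cosine similarity. It is not how Chroma or `text-embedding-3-small` work internally, and `ToyVectorStore` is a hypothetical class, not part of Indox.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory vector store holding (text, vector) pairs."""

    def __init__(self) -> None:
        self._docs: list[tuple[str, Counter]] = []

    def add(self, docs: list[str]) -> None:
        for doc in docs:
            self._docs.append((doc, toy_embed(doc)))

    def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        """Return the top_k documents ranked by cosine similarity to the query."""
        q = toy_embed(query)
        scored = [(doc, cosine_similarity(q, vec)) for doc, vec in self._docs]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = ToyVectorStore()
store.add(["the white rabbit ran by", "alice met the caterpillar", "the queen of hearts"])
print(store.search("who did alice meet", top_k=1))  # [('alice met the caterpillar', 0.25)]
```

A real embedding model maps texts with similar meaning (not just shared words) to nearby vectors, which is what makes the same nearest-neighbour search useful for retrieval.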


In [3]:
from indox.llms import OpenAi  # Import OpenAI model class from Indox
from indox.embeddings import AzureOpenAIEmbeddings  # Import Azure embeddings class
from indox.vector_stores import Chroma  # Import Chroma vector store to handle embedding storage

# Initialize the OpenAI language model using the API key and specifying the model "gpt-4o-mini"
openai_llm = OpenAi(api_key=OPENAI_API_KEY, model="gpt-4o-mini")

# Initialize Azure embeddings using the same API key and specifying the model "text-embedding-3-small"
azure_embed = AzureOpenAIEmbeddings(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

# Create a Chroma vector store with a collection named 'sample' and set the embedding function to azure_embed
db = Chroma(collection_name="sample", embedding_function=azure_embed)

# The vector store (db) is now ready to store embeddings generated by Azure and can be used for retrieval purposes


[32mINFO[0m: [1mInitializing OpenAi with model: gpt-4o-mini[0m
[32mINFO[0m: [1mOpenAi initialized successfully[0m
[32mINFO[0m: [1mInitialized OpenAiEmbedding with model: text-embedding-3-small[0m


## Loading Text Content from Project Gutenberg

In this section, we demonstrate how to load the text of a book from **Project Gutenberg** using the **GutenbergReader** class provided by **Indox**. This reader allows us to fetch and read books by specifying their unique Project Gutenberg ID.

### Steps:

1. **Initialize GutenbergReader**:
   We will create an instance of `GutenbergReader` to interface with Project Gutenberg.
   
2. **Specify the Book ID**:
   Each book on Project Gutenberg has a unique ID. For example, the ID for *Alice's Adventures in Wonderland* is `"11"`.
   
3. **Fetch the Book Content**:
   Using the `get_content()` method, we will fetch the book’s text content by passing the book ID.
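
To see roughly what such a reader has to do, the sketch below builds a plain-text download URL for a book ID and strips Project Gutenberg's license header and footer using the `*** START OF` / `*** END OF` marker lines. The URL pattern and the stripping logic are assumptions for illustration; `GutenbergReader` may fetch and clean the text differently. Decoding the downloaded bytes explicitly as UTF-8 is a choice made here to avoid garbled punctuation.

```python
import urllib.request


def gutenberg_txt_url(book_id: str) -> str:
    """Plain-text URL for a Project Gutenberg book (assumed URL pattern)."""
    return f"https://www.gutenberg.org/cache/epub/{book_id}/pg{book_id}.txt"


def strip_gutenberg_boilerplate(text: str) -> str:
    """Keep only the text between the '*** START OF' and '*** END OF' marker lines."""
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if line.startswith("*** START OF"):
            start = i + 1
        elif line.startswith("*** END OF"):
            end = i
            break
    return "\n".join(lines[start:end]).strip()


def fetch_gutenberg_text(book_id: str) -> str:
    """Download a book and decode it explicitly as UTF-8 (requires network access)."""
    with urllib.request.urlopen(gutenberg_txt_url(book_id)) as response:
        raw = response.read().decode("utf-8-sig")  # 'utf-8-sig' also drops a leading BOM
    return strip_gutenberg_boilerplate(raw)


# Usage (network call):
# text = fetch_gutenberg_text("11")
```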


In [4]:
from indox.data_connectors import GutenbergReader  # Import GutenbergReader to load books from Project Gutenberg

# Initialize the GutenbergReader to access and fetch book content from Project Gutenberg
reader = GutenbergReader()

# Specify the Project Gutenberg book ID for "Alice's Adventures in Wonderland" (ID: 11)
book_id = "11"

# Fetch the content of the book using the get_content method, passing the book_id
content = reader.get_content(book_id)

# Now, 'content' contains the entire text of "Alice's Adventures in Wonderland" as fetched from Project Gutenberg


## Splitting Text with SemanticTextSplitter

In this section, we show how to split long text into smaller, semantically meaningful chunks using the **SemanticTextSplitter** from **Indox**. This is particularly useful for processing large texts (such as books or articles) into smaller units for tasks like retrieval, summarization, or further analysis.

### Steps:

1. **Initialize the SemanticTextSplitter**:
   We will instantiate the `SemanticTextSplitter` with a specified chunk size (e.g., 400 tokens or characters).
   
2. **Split the Text**:
   After fetching the text (from sources like Project Gutenberg), we will pass the content to `split_text()` to break it into smaller, semantically coherent chunks.
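
The splitter's semantic logic is not shown in this notebook. As a rough, hypothetical approximation, the sketch below greedily packs whole paragraphs into chunks of at most `max_chars` characters and only breaks a paragraph (at a space) when it is longer than the limit on its own. It measures size in characters rather than tokens and performs no semantic analysis; it only illustrates why chunk boundaries that follow paragraph breaks tend to keep related sentences together.

```python
def split_into_chunks(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters."""
    # Normalize Windows line endings (the Gutenberg text uses \r\n) and split on blank lines
    paragraphs = [p.strip() for p in text.replace("\r\n", "\n").split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # A paragraph longer than the limit is cut at the last space before the limit
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            cut = para.rfind(" ", 0, max_chars)
            if cut <= 0:
                cut = max_chars  # no space found: hard cut
            chunks.append(para[:cut].strip())
            para = para[cut:].strip()
        # Append the paragraph to the current chunk if it still fits, else start a new one
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


sample = "Alice was beginning to get very tired.\r\n\r\nSo she was considering in her own mind."
for chunk in split_into_chunks(sample, max_chars=50):
    print(repr(chunk))
```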


In [5]:
from indox.splitter import SemanticTextSplitter  # Import SemanticTextSplitter to split long text into chunks

# Initialize the SemanticTextSplitter with a chunk size of 400 (tokens/characters)
splitter = SemanticTextSplitter(400)

# Split the book content into smaller, semantically meaningful chunks using the splitter
content_chunks = splitter.split_text(content)

# 'content_chunks' now contains the text broken into smaller segments, useful for retrieval or further processing


In [6]:
content_chunks[2:5]

['Alice was beginning to get very tired of sitting by her sister on the\r\nbank, and of having nothing to do: once or twice she had peeped into\r\nthe book her sister was reading, but it had no pictures or\r\nconversations in it, â\x80\x9cand what is the use of a book,â\x80\x9d thought Alice\r\nâ\x80\x9cwithout pictures or conversations?â\x80\x9d\r\n\r\nSo she was considering in her own mind (as well as she could, for the\r\nhot day made her feel very sleepy and stupid), whether the pleasure of\r\nmaking a daisy-chain would be worth the trouble of getting up and\r\npicking the daisies, when suddenly a White Rabbit with pink eyes ran\r\nclose by her.\r\n\r\nThere was nothing so _very_ remarkable in that; nor did Alice think it\r\nso _very_ much out of the way to hear the Rabbit say to itself, â\x80\x9cOh\r\ndear! Oh dear! I shall be late!â\x80\x9d (when she thought it over afterwards,\r\nit occurred to her that she ought to have wondered at this, but at the\r\ntime it all seemed quite n
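
The chunks above contain sequences such as `â\x80\x9c` and `â\x80\x9d` where the book has curly quotation marks. This pattern typically appears when UTF-8 bytes are decoded as Latin-1. Assuming that is what happened here, a minimal repair is to re-encode the text as Latin-1 and decode it as UTF-8 before splitting. The helper name `fix_mojibake` is hypothetical; if the text cannot be round-tripped, it is returned unchanged.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mis-decoded as Latin-1 (e.g. 'â\x80\x9c' -> '\u201c')."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not (entirely) Latin-1 mojibake: leave the text as it is
        return text


sample = "â\x80\x9cand what is the use of a book,â\x80\x9d thought Alice"
print(fix_mojibake(sample))  # “and what is the use of a book,” thought Alice

# Usage before splitting:
# content_chunks = splitter.split_text(fix_mojibake(content))
```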

In [7]:
db.add(docs=content_chunks) # Add chunks to vector database

[32mINFO[0m: [1mStoring documents in the vector store[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using engine: text-embedding-3-small[0m
[32mINFO[0m: [1mDocument added successfully to the vector store.[0m
[32mINFO[0m: [1mDocuments stored successfully[0m


## Advanced Setup for the QuestionAnswer Retriever

This section explains the additional hyperparameters available when configuring the **QuestionAnswer** retriever in **Indox**. The retriever uses a vector database and language model to fetch relevant content and provide answers to queries. We will cover the following hyperparameters:

### Key Hyperparameters:

- **llm**: The language model used to generate answers (e.g., OpenAI GPT models).
- **vector_database**: The vector database used for embedding retrieval (e.g., Chroma).
- **top_k**: Determines how many of the most relevant documents to retrieve (default is 5).
- **document_relevancy_filter**: If set to `True`, the retrieved documents are additionally checked for relevance to the query, and irrelevant ones are discarded before the answer is generated.
- **generate_clustered_prompts**: If set to `True`, the retriever clusters the retrieved documents and generates a summary for each cluster. These summaries are added to the retrieval context, giving the language model more structured context and helping improve the overall response quality.
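
To make the roles of `top_k` and `document_relevancy_filter` concrete, the sketch below shows a generic retrieve-then-generate step: rank the retrieved chunks by score, keep the top `top_k`, optionally drop chunks that a relevance check rejects, and assemble the rest into a prompt. It is a hypothetical illustration, not Indox's implementation. The keyword-based `is_relevant` lambda stands in for whatever relevance check the library actually applies, and higher scores are assumed to mean more relevant.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RetrievedChunk:
    text: str
    score: float  # assumption: higher means more similar to the query


def select_context(
    chunks: list[RetrievedChunk],
    top_k: int = 5,
    is_relevant: Optional[Callable[[str], bool]] = None,
) -> list[str]:
    """Keep the top_k highest-scoring chunks, then optionally filter them."""
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)[:top_k]
    texts = [c.text for c in ranked]
    if is_relevant is not None:  # plays the role of document_relevancy_filter=True
        texts = [t for t in texts if is_relevant(t)]
    return texts


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a simple question-answering prompt from the selected context."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(context, start=1))
    return f"Answer using only the context below.\n\nContext:\n{numbered}\n\nQuestion: {question}"


chunks = [
    RetrievedChunk("Alice talks with the Caterpillar.", 0.82),
    RetrievedChunk("The Queen plays croquet.", 0.41),
    RetrievedChunk("The Caterpillar asks Alice who she is.", 0.77),
]
context = select_context(chunks, top_k=2, is_relevant=lambda t: "Caterpillar" in t)
print(build_prompt("Who is Alice talking to?", context))
```

A smaller `top_k` keeps the prompt short but risks missing useful passages; a relevance filter lets you retrieve more broadly and then discard noise.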


In [9]:
# Create a QuestionAnswer retriever from the Indox instance
# This example sets every available hyperparameter explicitly, with explanations

retriever = indox.QuestionAnswer(
    llm=openai_llm,  # Language model that generates the answers (gpt-4o-mini here)
    vector_database=db,  # Chroma vector store used for embedding retrieval
    top_k=3,  # Number of most relevant documents to retrieve (3 here; the default is 5)
    document_relevancy_filter=False,  # Set to True to filter results based on document relevancy
    generate_clustered_prompts=False  # Set to True to enable clustering of retrieval results and generate summaries for each cluster
)

# If 'generate_clustered_prompts' is True:
# - The retriever will group the retrieved documents into clusters.
# - It will summarize each cluster and add the summary to the initial retrieval context,
#   which can help the language model produce a more comprehensive and coherent answer.


In [10]:
retriever.invoke("Who is the speaker talking to in the text?")

[32mINFO[0m: [1mRetrieving context and scores from the vector database[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using engine: text-embedding-3-small[0m
[32mINFO[0m: [1mGenerating answer without document relevancy filter[0m
[32mINFO[0m: [1mAnswering question[0m
[32mINFO[0m: [1mGenerating response[0m
[32mINFO[0m: [1mResponse generated successfully[0m
[32mINFO[0m: [1mQuery answered successfully[0m


'The speaker is talking to the Caterpillar in the text. Alice engages in a conversation with the Caterpillar, responding to its questions and expressing her feelings about her changing identity and size.'

In [11]:
retriever.invoke("tell me about alice's sister book?")

[32mINFO[0m: [1mRetrieving context and scores from the vector database[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using engine: text-embedding-3-small[0m
[32mINFO[0m: [1mGenerating answer without document relevancy filter[0m
[32mINFO[0m: [1mAnswering question[0m
[32mINFO[0m: [1mGenerating response[0m
[32mINFO[0m: [1mResponse generated successfully[0m
[32mINFO[0m: [1mQuery answered successfully[0m


"Alice's sister's book is described as having no pictures or conversations in it. Alice finds it uninteresting and questions the usefulness of a book without these elements. She feels tired and sleepy while sitting by her sister on the bank, and her lack of engagement with the book contributes to her restlessness. This leads her to consider making a daisy-chain, but her attention is soon captured by the appearance of the White Rabbit, which ultimately leads her into her own adventures in Wonderland."

In [13]:
retriever.invoke("how alice story ends?")

[32mINFO[0m: [1mRetrieving context and scores from the vector database[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using engine: text-embedding-3-small[0m
[32mINFO[0m: [1mGenerating answer without document relevancy filter[0m
[32mINFO[0m: [1mAnswering question[0m
[32mINFO[0m: [1mGenerating response[0m
[32mINFO[0m: [1mResponse generated successfully[0m
[32mINFO[0m: [1mQuery answered successfully[0m


'The story of Alice\'s Adventures in Wonderland ends with Alice sitting with her eyes closed, half believing she is in Wonderland, while knowing that opening her eyes would return her to dull reality. She imagines her little sister growing up and keeping the loving heart of her childhood, sharing stories and memories of Wonderland with her own children. The narrative concludes with the phrase "THE END," indicating the end of Alice\'s adventures.'

In [14]:
chat_history = retriever.chat_history  # History of the queries answered by the retriever, passed to the evaluator below

## Setting Up the RagEvaluator with OpenAI LLM as Judge

In this section, we demonstrate how to set up the **RagEvaluator** from **IndoxJudge**, which evaluates the quality of responses generated by a language model in a Retrieval-Augmented Generation (RAG) system. We use **OpenAI** as the language model (LLM) that acts as a judge for the evaluation process.

### Steps:

1. **Initialize the OpenAI Model**:
   We first create an instance of `OpenAi` with an API key and a specified model (e.g., `"gpt-4o-mini"`).
   
2. **Set Up the RagEvaluator**:
   The `RagEvaluator` will use the LLM to evaluate the quality of the conversation entries (represented by `chat_history`), judging their accuracy and relevancy.
   
### Key Parameters:
- **llm_as_judge**: The language model acting as the judge, which scores the generated answers against the question and the retrieved context.
- **entries**: The conversation history to be evaluated (here, the retriever's `chat_history`).
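
LLM-as-judge metrics commonly work by asking the judge model for structured verdicts (for example, one yes/no verdict per claim in an answer) and turning those verdicts into a score. The sketch below assumes a hypothetical JSON verdict format and computes the share of positive verdicts. IndoxJudge's actual prompts, output format, and scoring rules may differ.

```python
import json


def score_from_verdicts(judge_output: str) -> float:
    """Parse {"verdicts": [{"verdict": "yes" | "no"}, ...]} and return the fraction of "yes".

    The JSON shape is a hypothetical example of a judge model's structured output.
    An empty verdict list is scored as 0.0 (a design choice for this sketch).
    """
    verdicts = json.loads(judge_output).get("verdicts", [])
    if not verdicts:
        return 0.0
    positive = sum(1 for v in verdicts if str(v.get("verdict", "")).strip().lower() == "yes")
    return positive / len(verdicts)


example = '{"verdicts": [{"verdict": "yes"}, {"verdict": "no"}, {"verdict": "yes"}, {"verdict": "yes"}]}'
print(score_from_verdicts(example))  # 0.75
```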


In [15]:
from indoxJudge.pipelines import RagEvaluator  # Import RagEvaluator for evaluating the quality of RAG systems
from indoxJudge.models import OpenAi  # IndoxJudge's OpenAi wrapper; this rebinds the name OpenAi imported earlier from indox.llms

# Initialize the OpenAI model with API key and model name (gpt-4o-mini)
model = OpenAi(api_key=OPENAI_API_KEY, model="gpt-4o-mini")

# Create a RagEvaluator to evaluate the conversation history
# - llm_as_judge: The OpenAI model used to evaluate the responses
# - entries: The conversation history or entries to be judged
evaluator = RagEvaluator(llm_as_judge=model, entries=chat_history)

# Now, the RagEvaluator is set to evaluate the provided chat history based on the responses retrieved from the RAG system.


[32mINFO[0m: [1mInitializing OpenAi with model: gpt-4o-mini[0m


## Reviewing Evaluation Results with RagEvaluator

After running the `judge()` method, the **RagEvaluator** provides detailed feedback on the quality of the responses. In this section, we explore how to access and visualize the evaluation results and metric scores.

### Key Methods and Attributes:

1. **evaluator.results**:
   This attribute returns the detailed results and verdicts for each evaluation metric applied to the conversation entries, so you can inspect the feedback behind each score (e.g., Faithfulness, AnswerRelevancy, Hallucination).

2. **evaluator.metrics_score**:
   This attribute provides the numeric score for each metric used during the evaluation, making it easy to compare how the responses performed on each one.

3. **evaluator.plot()**:
   This method renders a chart of the evaluation score for each metric, which helps you interpret the strengths and weaknesses of the model's responses. With `mode="inline"`, as used below, the chart is displayed inside the notebook.


In [16]:
evaluator.judge()

[32mINFO[0m: [1mModel set for all metrics.[0m
[32mINFO[0m: [1mRagEvaluator initialized with model and metrics.[0m
[32mINFO[0m: [1mEvaluation Began...[0m
[32mINFO[0m: [1mEvaluating entry: 0[0m
[32mINFO[0m: [1mCompleted evaluation for entry: 0[0m
[32mINFO[0m: [1mEvaluating entry: 1[0m
[32mINFO[0m: [1mCompleted evaluation for entry: 1[0m
[32mINFO[0m: [1mEvaluating entry: 2[0m
[32mINFO[0m: [1mCompleted evaluation for entry: 2[0m
[32mINFO[0m: [1mEvaluating entry: 3[0m
[32mINFO[0m: [1mCompleted evaluation for entry: 3[0m
[32mINFO[0m: [1mEvaluation Completed, Check out the results[0m


In [17]:
evaluator.metrics_score

{'Faithfulness': 0.91,
 'AnswerRelevancy': 1.0,
 'ContextualRelevancy': 0.67,
 'GEval': 0.83,
 'Hallucination': 0.5,
 'KnowledgeRetention': 0.0,
 'precision': 0.58,
 'recall': 0.47,
 'f1_score': 0.52,
 'METEOR': 0.81,
 'evaluation_score': 0.71}
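
To read the scores above at a glance, the snippet below copies the values from this run, lists the lowest-scoring metrics (excluding the aggregate `evaluation_score`), and checks that the reported `f1_score` is consistent with the harmonic mean of `precision` and `recall`: 2 × 0.58 × 0.47 / (0.58 + 0.47) ≈ 0.52. Whether a low score is actually a weakness depends on each metric's definition (for example, whether a lower `Hallucination` score is better), so check the IndoxJudge documentation before drawing conclusions. The helper names are hypothetical.

```python
# Scores copied from the evaluator.metrics_score output above
metrics_score = {
    "Faithfulness": 0.91,
    "AnswerRelevancy": 1.0,
    "ContextualRelevancy": 0.67,
    "GEval": 0.83,
    "Hallucination": 0.5,
    "KnowledgeRetention": 0.0,
    "precision": 0.58,
    "recall": 0.47,
    "f1_score": 0.52,
    "METEOR": 0.81,
    "evaluation_score": 0.71,
}


def lowest_metrics(scores: dict[str, float], n: int = 3,
                   exclude: tuple[str, ...] = ("evaluation_score",)) -> list[str]:
    """Names of the n lowest-scoring metrics, ignoring aggregate entries."""
    ranked = sorted((name for name in scores if name not in exclude), key=lambda name: scores[name])
    return ranked[:n]


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


print(lowest_metrics(metrics_score))  # ['KnowledgeRetention', 'recall', 'Hallucination']
print(round(f1_from(metrics_score["precision"], metrics_score["recall"]), 2))  # 0.52
```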

In [19]:
evaluator.plot(mode="inline")
