## Indox Retrieval Augmentation
Here, we explore how to work with Indox Retrieval Augmentation. This example uses OpenAI models through the Indox API, so set `NERD_TOKEN_API` as an environment variable; the code below reads it with `os.getenv`.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/cookbook/indoxRag/indox_api_openai.ipynb)

In [2]:
!pip install indoxRag chromadb duckduckgo_search

ERROR: Could not find a version that satisfies the requirement indoxRag (from versions: none)
ERROR: No matching distribution found for indoxRag


## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**
```bash
  python -m venv indox
```

2. **Activate the virtual environment:**
```bash
  indox\Scripts\activate
```


### macOS/Linux

1. **Create the virtual environment:**
```bash
  python3 -m venv indox
```

2. **Activate the virtual environment:**
```bash
  source indox/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
  pip install -r requirements.txt
```


In [3]:
!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt

SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
--2024-12-08 18:46:41--  https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt
Resolving raw.githubusercontent.com... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com|185.199.110.133|:443... connected.
OpenSSL: error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
Unable to establish SSL connection.
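
The `wget` call above failed during the SSL handshake. As a possible alternative, the same file can be fetched with Python's standard library. This is a minimal sketch: `download_file` is a hypothetical helper name, and whether it succeeds depends on your local network and certificate setup.

```python
import shutil
import urllib.request
from pathlib import Path

# URL taken from the wget command above.
SAMPLE_URL = "https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt"


def download_file(url: str, destination: str) -> Path:
    """Stream the resource at `url` into `destination` and return its path."""
    target = Path(destination)
    with urllib.request.urlopen(url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling `download_file(SAMPLE_URL, "sample.txt")` in a notebook cell would save the file next to the notebook, which is where the later cells expect it.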




In [1]:
import sys
import os

module_path = os.path.abspath('E:/Codes/inDox/libs/indoxRag')
if module_path not in sys.path:
    sys.path.append(module_path)




In [2]:
import os
from dotenv import load_dotenv
load_dotenv()
NERD_TOKEN_API = os.getenv("NERD_TOKEN_API")
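
`os.getenv` returns `None` when the variable is missing, and the failure then only surfaces later, when the first API call is made. A small guard can make the problem visible earlier. The sketch below assumes nothing about Indox itself; `require_env` is a hypothetical helper name.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "add it to your .env file or export it before running the notebook."
        )
    return value
```

In the cell above, `NERD_TOKEN_API = require_env("NERD_TOKEN_API")` could then replace the plain `os.getenv` call.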

### Creating an instance of IndoxRetrievalAugmentation

To effectively utilize the Indox Retrieval Augmentation capabilities, you must first create an instance of the IndoxRetrievalAugmentation class. This instance will allow you to access the methods and properties defined within the class, enabling the augmentation and retrieval functionalities.

In [3]:
from indoxRag import IndoxRetrievalAugmentation
indox = IndoxRetrievalAugmentation()

[32mINFO[0m: [1mIndoxRetrievalAugmentation initialized[0m

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            


### Generating response using Indox
The `NerdToken` class handles the question-answering task through the Indox API, and the `NerdTokenEmbedding` class specifies the embedding model. The `ClusteredSplit` class loads PDF and text files and splits them into chunks.

In [4]:
# Import necessary classes from Indox library
from indoxRag.llms import NerdToken
from indoxRag.embeddings import NerdTokenEmbedding
from indoxRag.data_loader_splitter import ClusteredSplit

# Create instances for API access and text embedding
openai_qa_indox = NerdToken(api_key=NERD_TOKEN_API)
embed_openai_indox = NerdTokenEmbedding(api_key=NERD_TOKEN_API, model="text-embedding-3-small")

# Specify the path to your text file
file_path = "sample.txt"

# Create a ClusteredSplit instance for handling file loading and chunking
loader_splitter = ClusteredSplit(file_path=file_path, embeddings=embed_openai_indox, summary_model=openai_qa_indox)

# Load and split the document into chunks using ClusteredSplit
docs = loader_splitter.load_and_chunk()

[32mINFO[0m: [1mInitialized IndoxOpenAIEmbedding with model: text-embedding-3-small[0m
[32mINFO[0m: [1mClusteredSplit initialized successfully[0m
[32mINFO[0m: [1mStarting processing for documents[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings texts using engine: text-embedding-3-small[0m
[32mINFO[0m: [1m--Generated 7 clusters--[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings texts using engine: text-embedding-3-small[0m
[32mINFO[0m: [1m--Generated 1 clusters--[0m
[32mINFO[0m: [1mCompleted chunking & clustering process[0m
[32mINFO[0m: [1mSuccessfully obtained all documents[0m


In [5]:
docs[2]

'  They took her pretty clothes away from her, put an old grey bedgown on her, and gave her wooden shoes   Just look at the proud princess, how decked out she is, they cried, and laughed, and led her into the kitchen There she had to do hard work from morning till night, get up before daybreak, carry water, light fires, cook and wash   Besides this, the sisters did her every imaginable injury - they mocked her'
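
To illustrate the general idea of splitting a document into chunks, here is a deliberately simple sketch that groups sentences under a word budget. It is not the algorithm `ClusteredSplit` uses (the logs above show that it also embeds, clusters, and summarizes); `split_into_chunks` and `max_words` are hypothetical names chosen for this example.

```python
import re


def split_into_chunks(text: str, max_words: int = 50) -> list[str]:
    """Group sentences into chunks of roughly `max_words` words.

    A single sentence longer than the budget still becomes its own chunk.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "Cinderella wept at the grave. A bird appeared. It granted her wish."
print(split_into_chunks(sample, max_words=8))
```

With a budget of eight words, the first two sentences share a chunk and the third starts a new one.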

Here, the `Chroma` vector store handles the storage and retrieval of vector embeddings. Specifying a collection name sets up a store where text embeddings can be saved and queried.

In [6]:
from indoxRag.vector_stores import Chroma

# Define the collection name within the vector store
collection_name = "sample"

# Create a Chroma vector store instance
db = Chroma(collection_name=collection_name, embedding_function=embed_openai_indox)

2024-12-08 18:51:32,661 - chromadb.telemetry.product.posthog - INFO - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.


### Load and preprocess data
This step stores the chunks produced by `ClusteredSplit` in the vector store that was set up previously; the store embeds each chunk as it is added.

In [7]:
db.add(docs=docs)

[32mINFO[0m: [1mStoring documents in the vector store[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings texts using engine: text-embedding-3-small[0m
[32mINFO[0m: [1mDocument added successfully to the vector store.[0m
[32mINFO[0m: [1mDocuments stored successfully[0m


### Retrieve relevant information and generate an answer
This cell sets up a query against the vector store: the retriever fetches the five most relevant chunks (`top_k=5`) and passes them to the language model to generate an answer.

In [8]:
query = "How cinderella reach her happy ending?"
retriever = indox.QuestionAnswer(vector_database=db, llm=openai_qa_indox, top_k=5)

The `invoke(query)` method sends the query to the retriever, which searches the vector store for relevant text chunks and uses the language model to generate a response based on them.
The `context` property returns the text chunks that the retriever used to generate the answer, which shows how the query was answered.

In [9]:
retriever.invoke(query)

[32mINFO[0m: [1mRetrieving context and scores from the vector database[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings texts using engine: text-embedding-3-small[0m
[32mINFO[0m: [1mGenerating answer without document relevancy filter[0m
[32mINFO[0m: [1mQuery answered successfully[0m


"Cinderella reaches her happy ending through a series of transformative events facilitated by her inherent goodness, magical assistance, and the eventual recognition of her true worth. Here’s a summary of the key steps leading to her happy ending:\n\n1. **Magical Assistance**: After enduring mistreatment from her stepmother and stepsisters, Cinderella seeks solace at her mother’s grave, where she prays to a hazel tree. A little bird appears to grant her wishes, providing her with beautiful dresses and shoes that allow her to attend the royal festival.\n\n2. **The Royal Festival**: Cinderella attends the king's festival, where she captivates the prince with her beauty and grace. Each night, she must leave before he discovers her true identity, but she leaves behind a slipper, which becomes a crucial symbol of her identity.\n\n3. **The Prince's Search**: After the festival, the prince searches for the owner of the golden slipper. Cinderella’s stepsisters attempt to fit into the slipper, 

In [10]:
retriever.context

['The documentation provided appears to be a retelling of the classic fairy tale "Cinderella." Here is a detailed summary of the key elements and events described:\n\n1. **Cinderella\'s Wishes**: The story begins with Cinderella, who visits a hazel tree three times a day to weep and pray. A little white bird comes to her aid, granting her wishes by dropping down what she desires.\n\n2. **The King\'s Festival**: The king announces a grand festival lasting three days, inviting all the beautiful young girls in the kingdom so that his son can choose a bride. Cinderella\'s step-sisters are excited about the event and ask Cinderella to help them prepare by combing their hair, brushing their shoes, and fastening their buckles.\n\n3. **Cinderella\'s Desire to Attend**: Despite her step-sisters\' excitement, Cinderella wishes to attend the festival as well. She pleads with her step-mother for permission, but she is denied and left behind.\n\n4. **Magical Transformation**: In response to Cindere
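
Conceptually, a `top_k` retrieval ranks the stored chunks by how similar their embeddings are to the query embedding and keeps the best `k`. The sketch below shows that idea with toy two-dimensional vectors. Cosine similarity is an assumption here (the metric actually used depends on how the vector store is configured), and `cosine_similarity` and `top_k_chunks` are hypothetical helper names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, store, k=5):
    """Return the k (chunk, score) pairs most similar to `query_vec`."""
    scored = [(chunk, cosine_similarity(query_vec, vec)) for chunk, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 2-D "embeddings"; real ones come from the embedding model.
store = [
    ("the prince finds the slipper", [0.9, 0.1]),
    ("the sisters mock her", [0.2, 0.8]),
    ("a bird grants her wishes", [0.7, 0.3]),
]
best = top_k_chunks([1.0, 0.0], store, k=2)
print([chunk for chunk, _ in best])
```

The two chunks whose vectors point most nearly in the query's direction are returned first; the third is dropped because `k=2`.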

### With AgenticRag

AgenticRag stands for Agentic Retrieval-Augmented Generation. It combines retrieval-based and generation-based methods in natural language processing (NLP): relevant chunks are retrieved from a vector store based on the query, and a language model generates a response that incorporates them.
As the log output below shows, the agent also grades each retrieved chunk for relevance, falls back to a web search when no chunk is relevant, and checks the generated answer for hallucination, regenerating it when one is detected.

In [11]:
agent = indox.AgenticRag(llm=openai_qa_indox, vector_database=db, top_k=5)
agent.run(query)

[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings texts using engine: text-embedding-3-small[0m
[32mINFO[0m: [1mRelevant doc[0m
[32mINFO[0m: [1mRelevant doc[0m
[32mINFO[0m: [1mRelevant doc[0m
[32mINFO[0m: [1mNot Relevant doc[0m
[32mINFO[0m: [1mRelevant doc[0m
[32mINFO[0m: [1mHallucination detected, Regenerate the answer...[0m


"Cinderella reaches her happy ending through a series of transformative events and magical assistance that ultimately lead to her recognition and marriage to the prince. Here’s a summary of the key steps in her journey to happiness:\n\n1. **Mourning and Virtue**: After the death of her mother, Cinderella embodies the virtues of goodness and piety that her mother instilled in her. This moral foundation attracts divine favor and assistance.\n\n2. **Cruelty of the Stepmother**: Despite her hardships and the cruel treatment from her stepmother and stepsisters, Cinderella remains resilient and hopeful. Her daily visits to her mother’s grave symbolize her connection to her past and her desire for a better future.\n\n3. **Magical Assistance**: When Cinderella expresses her wish to attend the royal festival, a magical bird, aided by the hazel tree she planted at her mother’s grave, grants her beautiful dresses and shoes, allowing her to attend the festival undetected.\n\n4. **Captivating the P

In [12]:
query_2 = "where does messi plays right now?"

In [13]:
agent.run(query_2)

[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings texts using engine: text-embedding-3-small[0m
[32mINFO[0m: [1mNot Relevant doc[0m
[32mINFO[0m: [1mNot Relevant doc[0m
[32mINFO[0m: [1mNot Relevant doc[0m
[32mINFO[0m: [1mNot Relevant doc[0m
[32mINFO[0m: [1mNot Relevant doc[0m
[32mINFO[0m: [1mNo Relevant document found, Start web search[0m
[32mINFO[0m: [1mNo Relevant Context Found, Start Searching On Web...[0m


2024-12-08 19:08:13,159 - primp - INFO - response: https://duckduckgo.com/?q=where+does+messi+plays+right+now%3F 200 19080
2024-12-08 19:08:14,516 - primp - INFO - response: https://links.duckduckgo.com/d.js?q=where+does+messi+plays+right+now%3F&kl=wt-wt&l=wt-wt&p=&s=0&df=&vqd=4-133820980658443042021681883320626058733&bing_market=wt-WT&ex=-2 200 23789


[32mINFO[0m: [1mAnswer Base On Web Search[0m
[32mINFO[0m: [1mCheck For Hallucination In Generated Answer Base On Web Search[0m
[32mINFO[0m: [1mHallucination detected, Regenerate the answer...[0m


'Lionel Messi currently plays for Inter Miami in Major League Soccer (MLS).'
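
The two runs above suggest a control flow of grading, fallback, generation, and verification. The sketch below reconstructs that flow from the log messages only; it is not taken from the library's source, and every name in it (`agentic_answer`, `is_relevant`, `is_grounded`, and so on) is a hypothetical stand-in.

```python
from typing import Callable, List


def agentic_answer(
    query: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[str, List[str]], bool],
    max_retries: int = 1,
) -> str:
    """Grade retrieved chunks, fall back to web search, then generate and verify."""
    # 1. Keep only the retrieved chunks the grader marks as relevant.
    context = [doc for doc in retrieve(query) if is_relevant(query, doc)]
    # 2. Fall back to a web search when nothing relevant was retrieved.
    if not context:
        context = web_search(query)
    # 3. Generate, then regenerate while the answer is not grounded in the context.
    answer = generate(query, context)
    for _ in range(max_retries):
        if is_grounded(answer, context):
            break
        answer = generate(query, context)
    return answer


# Stub components standing in for the vector store, graders, and LLM.
answer = agentic_answer(
    "where does messi play?",
    retrieve=lambda q: ["Cinderella went to the festival."],
    is_relevant=lambda q, doc: "messi" in doc.lower(),
    web_search=lambda q: ["Messi plays for Inter Miami."],
    generate=lambda q, ctx: ctx[0],
    is_grounded=lambda ans, ctx: ans in ctx,
)
print(answer)
```

Passing the components in as callables keeps the loop itself independent of any particular LLM, vector store, or search backend, which makes the flow easy to test with stubs like the ones shown.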