## Indox Retrieval Augmentation
This notebook shows how to work with Indox Retrieval Augmentation. It accesses OpenAI models through the Indox API, so set `INDOX_API_KEY` as an environment variable before running it.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/indox_api_openai.ipynb)

In [None]:
!pip install indox
!pip install chromadb

## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**
```bash
  python -m venv indox
```

2. **Activate the virtual environment:**
```bash
  indox\Scripts\activate
```


### macOS/Linux

1. **Create the virtual environment:**
```bash
  python3 -m venv indox
```

2. **Activate the virtual environment:**
```bash
  source indox/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
  pip install -r requirements.txt
```


In [None]:
!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt

In [1]:
import os
from dotenv import load_dotenv

# Read variables from a local .env file into the process environment
load_dotenv()
INDOX_API_KEY = os.getenv("INDOX_API_KEY")
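
If `INDOX_API_KEY` is missing from both the environment and the `.env` file, `os.getenv` returns `None` without complaint, and the problem only surfaces later when the first API call is made. A minimal sketch of an early check, using only the standard library, might look like this. The helper name `require_env` is hypothetical and not part of Indox.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable and an empty string.
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file."
        )
    return value
```

In this notebook it could replace the plain lookup, for example `INDOX_API_KEY = require_env("INDOX_API_KEY")`, called after `load_dotenv()`.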

### Creating an instance of IndoxRetrievalAugmentation

To use Indox Retrieval Augmentation, first create an instance of the `IndoxRetrievalAugmentation` class. This instance exposes the retrieval and question-answering methods used in the rest of this notebook.

In [2]:
from indox import IndoxRetrievalAugmentation
indox = IndoxRetrievalAugmentation()

INFO: IndoxRetrievalAugmentation initialized

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            


### Generating response using Indox
The `IndoxApi` class handles the question-answering task using the Indox model, and the `IndoxApiEmbedding` class specifies the embedding model. `ClusteredSplit` loads a PDF or text file and splits it into chunks; it takes the embedding model and a summary model as arguments.

In [3]:
# Import necessary classes from Indox library
from indox.llms import IndoxApi
from indox.embeddings import IndoxApiEmbedding
from indox.data_loader_splitter import ClusteredSplit

# Create instances for API access and text embedding
openai_qa_indox = IndoxApi(api_key=INDOX_API_KEY)
embed_openai_indox = IndoxApiEmbedding(api_key=INDOX_API_KEY, model="text-embedding-3-small")

# Specify the path to your text file
file_path = "sample.txt"

# Create a ClusteredSplit instance for handling file loading and chunking
loader_splitter = ClusteredSplit(file_path=file_path, embeddings=embed_openai_indox, summary_model=openai_qa_indox)

# Load and split the document into chunks using ClusteredSplit
docs = loader_splitter.load_and_chunk()

INFO: Initialized IndoxOpenAIEmbedding with model: text-embedding-3-small
INFO: ClusteredSplit initialized successfully
INFO: Starting processing for documents
INFO: Embedding documents
INFO: Starting to fetch embeddings texts using engine: text-embedding-3-small
INFO: --Generated 7 clusters--
INFO: Embedding documents
INFO: Starting to fetch embeddings texts using engine: text-embedding-3-small
INFO: --Generated 1 clusters--
INFO: Completed chunking & clustering process
INFO: Successfully obtained all documents


In [5]:
docs[2]

'  They took her pretty clothes away from her, put an old grey bedgown on her, and gave her wooden shoes   Just look at the proud princess, how decked out she is, they cried, and laughed, and led her into the kitchen There she had to do hard work from morning till night, get up before daybreak, carry water, light fires, cook and wash   Besides this, the sisters did her every imaginable injury - they mocked her'
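
Before storing the chunks, it can help to check how many there are and how large they are. The sketch below assumes that `docs` is a plain list of strings, which the `docs[2]` output above suggests but which is not confirmed here. The helper `summarize_chunks` is hypothetical and not part of Indox.

```python
from statistics import mean


def summarize_chunks(chunks: list[str]) -> dict:
    """Return basic size statistics (in characters) for a list of text chunks."""
    lengths = [len(chunk) for chunk in chunks]
    if not lengths:
        return {"count": 0, "min_chars": 0, "max_chars": 0, "mean_chars": 0.0}
    return {
        "count": len(lengths),
        "min_chars": min(lengths),
        "max_chars": max(lengths),
        "mean_chars": mean(lengths),
    }


# Stand-in data; in the notebook you would pass `docs` instead.
sample_chunks = ["first chunk", "a second, longer chunk", "third"]
print(summarize_chunks(sample_chunks))
```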

Here the `Chroma` vector store handles the storage and retrieval of vector embeddings. Specifying a collection name sets up a store where text embeddings can be saved and queried.

In [6]:
from indox.vector_stores import Chroma

# Define the collection name within the vector store
collection_name = "sample"

# Create a Chroma vector store instance
db = Chroma(collection_name=collection_name, embedding_function=embed_openai_indox)

INFO: Connection to the vector store database established successfully


<indox.vector_stores.chroma.Chroma at 0x1f34da28b90>
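
To make "stored and queried" concrete, here is a minimal in-memory sketch of what a vector store does conceptually: it embeds each text, keeps the vectors, and ranks them against an embedded query. Everything in it is an assumption for illustration. `ToyVectorStore`, `toy_embedding`, and the choice of cosine similarity are hypothetical stand-ins, and the index and distance metric that Chroma actually uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store; not the Indox or Chroma implementation."""

    def __init__(self, embedding_function):
        self.embedding_function = embedding_function
        self.texts = []
        self.vectors = []

    def add(self, docs):
        # Embed each chunk once and keep the vector next to its text.
        for text in docs:
            self.texts.append(text)
            self.vectors.append(self.embedding_function(text))

    def search(self, query, top_k=5):
        # Embed the query, score every stored chunk, return the best top_k.
        query_vector = self.embedding_function(query)
        scored = [
            (cosine_similarity(query_vector, vector), text)
            for vector, text in zip(self.vectors, self.texts)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


VOCABULARY = ["prince", "slipper", "kitchen", "ball"]


def toy_embedding(text):
    """Stand-in embedding: word counts over a tiny fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCABULARY]


store = ToyVectorStore(embedding_function=toy_embedding)
store.add(["the prince danced at the ball", "she worked in the kitchen"])
print(store.search("prince ball", top_k=1))
```

A real embedding model replaces `toy_embedding` with dense vectors that capture meaning rather than word counts, which is why the notebook passes `embed_openai_indox` as the `embedding_function`.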

### Load and preprocess data
The text was already loaded and split into chunks above; this step stores those chunks in the vector store that was set up previously.

In [7]:
db.add(docs=docs)

INFO: Storing documents in the vector store
INFO: Embedding documents
INFO: Starting to fetch embeddings texts using engine: text-embedding-3-small
INFO: Document added successfully to the vector store.
INFO: Documents stored successfully


<indox.vector_stores.chroma.Chroma at 0x1f34da28b90>

### Retrieve relevant information and generate an answer
The following cells query the vector store for the five most relevant chunks (`top_k=5`) and use the language model to generate an answer from them.

In [8]:
query = "How cinderella reach her happy ending?"
retriever = indox.QuestionAnswer(vector_database=db, llm=openai_qa_indox, top_k=5)

The `invoke(query)` method sends the query to the retriever, which searches the vector store for relevant text chunks and uses the language model to generate a response based on the retrieved information.
The `context` property returns the text chunks that the retriever used to generate the answer, which shows what the answer was based on.

In [9]:
retriever.invoke(query)

INFO: Retrieving context and scores from the vector database
INFO: Embedding documents
INFO: Starting to fetch embeddings texts using engine: text-embedding-3-small
INFO: Generating answer without document relevancy filter
INFO: Query answered successfully


'Cinderella reaches her happy ending in the classic fairy tale by fitting perfectly into the golden slipper that only the true bride can wear. Despite the attempts of her stepsisters to deceive the prince by cutting off parts of their toes and heels to fit into the slipper, it is Cinderella who ultimately proves her identity by fitting into the shoe perfectly. The prince recognizes her as the one he danced with at the ball, and they ride off together, leaving the false brides behind. This moment of recognition and acceptance by the prince leads Cinderella to her happily ever after, where true love prevails in the end.'

In [10]:
retriever.context

['The provided documentation is a retelling of the classic fairy tale "Cinderella." It describes the story of a young maiden who is mistreated by her stepmother and stepsisters but ultimately finds her happily ever after with the prince. The story revolves around a golden slipper that only fits the true bride, leading to the stepsisters trying to force their feet into the shoe by cutting off parts of their toes and heels. However, it is only Cinderella who fits perfectly into the slipper, revealing her true identity to the prince. The prince recognizes her as the one he danced with at the ball and they ride off together, leaving the false brides behind. The story highlights themes of kindness, perseverance, and true love prevailing in the end.',
 "The documentation provided consists of a retelling of the classic fairy tale of Cinderella. It begins with the wife of a rich man passing away and advising her daughter to be good and pious. After her death, the daughter visits her mother's g

### With AgenticRag

AgenticRag stands for Agentic Retrieval-Augmented Generation. It combines retrieval-based and generation-based methods from natural language processing (NLP): relevant chunks are retrieved from a vector store based on a query, and a language model then generates a response that incorporates them.
The goal is to give more contextually rich and accurate responses by drawing on external knowledge sources. In the runs below, AgenticRag also labels each retrieved chunk as relevant or not and, when none is relevant, falls back to a web search and checks the generated answer for hallucination.

In [10]:
!pip install duckduckgo_search

In [None]:
agent = indox.AgenticRag(llm=openai_qa_indox, vector_database=db, top_k=5)
agent.run(query)

Relevant doc
Relevant doc
Relevant doc
Relevant doc
Relevant doc


"Cinderella reaches her happy ending by attending the royal festival with the help of magical elements such as a hazel tree, birds, and a golden slipper. Despite being mistreated by her stepmother and stepsisters, Cinderella's true identity is revealed with the assistance of the magical bird and two white doves. The false stepsisters' deception is exposed, and Cinderella fits perfectly into the golden slipper, proving she is the true bride sought by the prince. As a result, Cinderella marries the king's son and is able to live happily ever after."

In [None]:
query_2 = "where does messi plays right now?"

In [None]:
agent.run(query_2)

Not Relevant doc
Not Relevant doc
Not Relevant doc
Not Relevant doc
Not Relevant doc
No Relevant Context Found, Start Searching On Web...
Answer Base On Web Search
Check For Hallucination In Generated Answer Base On Web Search


'Lionel Messi currently plays for Inter Miami CF in Major League Soccer.'
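
The log lines printed by the two runs suggest a control flow of roughly this shape: grade each retrieved chunk, fall back to a web search when nothing is relevant, then check the answer against its context. The sketch below illustrates that pattern only and is not the Indox implementation. The function `agentic_answer` and all of its callable parameters are hypothetical, and running the grounding check on both paths is an assumption, since the logs only show it after a web search.

```python
def agentic_answer(query, retrieve, grade, generate, web_search, is_grounded):
    """Sketch of a grade-then-fallback loop; all callables are supplied by the caller."""
    # Keep only the retrieved chunks that the grader judges relevant.
    relevant = [doc for doc in retrieve(query) if grade(query, doc)]
    source = "vector_store"
    if not relevant:
        # Nothing relevant in the vector store, so fall back to a web search.
        relevant = web_search(query)
        source = "web"
    answer = generate(query, relevant)
    # Final check: is the answer supported by the context it was generated from?
    grounded = is_grounded(answer, relevant)
    return {"answer": answer, "source": source, "grounded": grounded}


# Stub components standing in for the vector store, the LLM, and the search tool.
stored_chunks = ["Cinderella fits the golden slipper."]
result = agentic_answer(
    "messi",
    retrieve=lambda q: stored_chunks,
    grade=lambda q, doc: q.lower() in doc.lower(),
    generate=lambda q, context: context[0],
    web_search=lambda q: ["Messi plays for Inter Miami."],
    is_grounded=lambda answer, context: answer in context,
)
print(result)
```

Passing the components in as callables keeps the control flow separate from any particular model or search backend, which makes the fallback logic easy to test with stubs like the ones shown.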