
## Indox Retrieval Augmentation


This notebook walks through Indox Retrieval Augmentation with OpenAI models. Before you start, set your OpenAI API key as an environment variable; the code below reads it from `OPENAI_API_KEY`.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/openai_unstructured.ipynb)

In [None]:
!pip install openai
!pip install indox
!pip install chromadb
!pip install duckduckgo-search

## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indox
```
2. **Activate the virtual environment:**
```bash
indox\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indox
```
2. **Activate the virtual environment:**
```bash
source indox/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```


In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
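
Because `os.getenv` returns `None` when a variable is missing, a misconfigured key tends to surface only later as an authentication error. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper introduced here for illustration and is not part of Indox.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable and an empty string.
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value
```

With this helper, the assignment above could be written as `OPENAI_API_KEY = require_env("OPENAI_API_KEY")`, placed after `load_dotenv()`.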

### Creating an instance of IndoxRetrievalAugmentation

To effectively utilize the Indox Retrieval Augmentation capabilities, you must first create an instance of the IndoxRetrievalAugmentation class. This instance will allow you to access the methods and properties defined within the class, enabling the augmentation and retrieval functionalities.

In [2]:
from indox import IndoxRetrievalAugmentation

indox = IndoxRetrievalAugmentation()

[32mINFO[0m: [1mIndoxRetrievalAugmentation initialized[0m

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            


### Generating responses using OpenAI's language models
The `OpenAi` class handles question answering with OpenAI's language models, and `OpenAiEmbedding` specifies the embedding model. `Chroma` sets up a vector store under a given collection name, where text embeddings are stored and queried.

In [3]:
from indox.llms import OpenAi
from indox.embeddings import OpenAiEmbedding

openai_qa = OpenAi(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo-0125")
embed_openai = OpenAiEmbedding(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

from indox.vector_stores import Chroma

db = Chroma(collection_name="sample", embedding_function=embed_openai)

[32mINFO[0m: [1mInitializing OpenAi with model: gpt-3.5-turbo-0125[0m
[32mINFO[0m: [1mOpenAi initialized successfully[0m
[32mINFO[0m: [1mInitialized OpenAI embeddings with model: text-embedding-3-small[0m
[32mINFO[0m: [1mConnection to the vector store database established successfully[0m


<indox.vector_stores.Chroma.ChromaVectorStore at 0x249ce6666c0>

### Load and preprocess data
This part of the code loads text from a file, splits it into chunks, and stores the chunks in the vector store that was set up previously.

In [None]:
!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt

In [4]:
file_path = "sample.txt"

In [5]:
from indox.data_loader_splitter import UnstructuredLoadAndSplit

loader_splitter = UnstructuredLoadAndSplit(file_path=file_path, max_chunk_size=400, remove_sword=False)
docs = loader_splitter.load_and_chunk()

[32mINFO[0m: [1mUnstructuredLoadAndSplit initialized successfully[0m
[32mINFO[0m: [1mGetting all documents[0m
[32mINFO[0m: [1mStarting processing[0m
[32mINFO[0m: [1mUsing title-based chunking[0m
[32mINFO[0m: [1mCompleted chunking process[0m
[32mINFO[0m: [1mSuccessfully obtained all documents[0m


In [6]:
docs[0].page_content

"The wife of a rich man fell sick, and as she felt that her endwas drawing near, she called her only daughter to her bedside andsaid, dear child, be good and pious, and then thegood God will always protect you, and I will look down on youfrom heaven and be near you. Thereupon she closed her eyes anddeparted. Every day the maiden went out to her mother's grave,"
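
To make the effect of `max_chunk_size` concrete, here is a simplified sketch of size-bounded chunking in plain Python. It is not the title-based strategy that the log reports, and it assumes the limit is measured in characters; `chunk_text` is a hypothetical helper used only for illustration.

```python
def chunk_text(text: str, max_chunk_size: int = 400) -> list[str]:
    """Greedily pack whitespace-separated words into chunks.

    Each chunk has at most max_chunk_size characters, except when a single
    word is longer than the limit, in which case it forms its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chunk_size or not current:
            # Keep growing the current chunk (or start it with this word).
            current = candidate
        else:
            # The next word would overflow the limit: close the chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


# Tiny limit so the splitting is easy to follow.
print(chunk_text("aa bb cc dd", max_chunk_size=5))  # ['aa bb', 'cc dd']
```

Splitting on word boundaries avoids cutting words in half, at the cost of chunks that are usually a little shorter than the limit.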

In [7]:
db.add(docs=docs)

[32mINFO[0m: [1mStoring documents in the vector store[0m
[32mINFO[0m: [1mDocument added successfully to the vector store.[0m
[32mINFO[0m: [1mDocuments stored successfully[0m


<indox.vector_stores.Chroma.ChromaVectorStore at 0x249ce6666c0>

### Retrieve relevant information and generate an answer
The following lines query the vector store for the five most relevant chunks (`top_k=5`) and use the language model to generate an answer from them.

In [8]:
query = "How Cinderella reach her happy ending?"
retriever = indox.QuestionAnswer(vector_database=db, llm=openai_qa, top_k=5)

The `invoke(query)` method sends the query to the retriever, which searches the vector store for relevant text chunks and uses the language model to generate a response based on them.
The `context` property returns the text chunks that the retriever used to generate the answer, which shows what the answer was based on.

In [9]:
retriever.invoke(query)

[32mINFO[0m: [1mRetrieving context and scores from the vector database[0m
[32mINFO[0m: [1mGenerating answer without document relevancy filter[0m
[32mINFO[0m: [1mAnswering question[0m
[32mINFO[0m: [1mGenerating response[0m
[32mINFO[0m: [1mResponse generated successfully[0m
[32mINFO[0m: [1mQuery answered successfully[0m


'Cinderella reached her happy ending by being kind, patient, and having a pure heart. Despite facing mistreatment from her step-family, she remained humble and continued to do good deeds. With the help of a little white bird and magical assistance, Cinderella was able to attend the royal festival where the prince fell in love with her. Ultimately, her kindness, inner beauty, and resilience led her to marry the prince and live happily ever after.'

In [10]:
retriever.context

["to appear among the number, they were delighted, called cinderellaand said, comb our hair for us, brush our shoes and fasten ourbuckles, for we are going to the wedding at the king's palace.Cinderella obeyed, but wept, because she too would have liked togo with them to the dance, and begged her step-mother to allowher to do so. You go, cinderella, said she, covered in dust and",
 'cinderella expressed a wish, the bird threw down to her what shehad wished for.It happened, however, that the king gave orders for a festivalwhich was to last three days, and to which all the beautiful younggirls in the country were invited, in order that his son might choosehimself a bride. When the two step-sisters heard that they too were',
 "which they had wished for, and to cinderella he gave the branchfrom the hazel-bush. Cinderella thanked him, went to her mother'sgrave and planted the branch on it, and wept so much that the tearsfell down on it and watered it. And it grew and became a handsometree. 
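
The retrieval step can be pictured as ranking chunk embeddings by similarity to the query embedding and keeping the best `top_k`. The sketch below uses cosine similarity on toy two-dimensional vectors; both are assumptions for illustration. The real embeddings come from `text-embedding-3-small`, and the metric actually used depends on how the Chroma collection is configured. `cosine_similarity` and `top_k_chunks` are hypothetical helpers, not Indox functions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: list[float],
    chunk_vecs: list[list[float]],
    chunks: list[str],
    top_k: int = 5,
) -> list[str]:
    """Return the top_k chunks whose vectors are most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk)
        for vec, chunk in zip(chunk_vecs, chunks)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


# Toy data: made-up 2-D vectors standing in for real embeddings.
chunks = ["the prince", "the bird", "the weather"]
vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
best = top_k_chunks([1.0, 0.1], vectors, chunks, top_k=2)
print(best)  # ['the prince', 'the bird']
```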

### With AgenticRag

AgenticRag stands for Agentic Retrieval-Augmented Generation. It combines retrieval and generation: relevant chunks are retrieved from a vector store for a query, and a language model generates a response grounded in them, which gives more contextually rich and accurate answers than generation alone.
As the logs in this section show, the agent also grades each retrieved chunk for relevance, falls back to a web search when no chunk is relevant, and checks the generated answer for hallucination, regenerating it when one is detected.

In [11]:
agent = indox.AgenticRag(llm=openai_qa, vector_database=db, top_k=5)
agent.run(query)

2024-06-18 17:35:30,845 INFO:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-06-18 17:35:33,908 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Relevant doc


2024-06-18 17:35:34,935 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Relevant doc


2024-06-18 17:35:36,219 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Not Relevant doc


2024-06-18 17:35:36,984 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Relevant doc


2024-06-18 17:35:38,100 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Not Relevant doc


2024-06-18 17:35:38,103 INFO:Answering question: How Cinderella reach her happy ending?
2024-06-18 17:35:38,103 INFO:Attempting to generate an answer for the question: How Cinderella reach her happy ending?
2024-06-18 17:35:40,757 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-06-18 17:35:40,758 INFO:Answer generated successfully
2024-06-18 17:35:42,148 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-06-18 17:35:42,149 INFO:Agent answer generated successfully
2024-06-18 17:35:42,149 INFO:Hallucination detected, Regenerate the answer...
2024-06-18 17:35:42,150 INFO:Answering question: How Cinderella reach her happy ending?
2024-06-18 17:35:42,150 INFO:Attempting to generate an answer for the question: How Cinderella reach her happy ending?
2024-06-18 17:35:45,066 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-06-18 17:35:45,068 INFO:Answer generated successfully

"Cinderella reached her happy ending by receiving help from the hazel tree, which grew from the branch given to her by the prince. She planted the branch on her mother's grave, and the tree grew into a handsome tree. Cinderella would sit beneath the tree, weep, and pray, and a little white bird would always come to her. This bird helped Cinderella by providing her with the beautiful dresses and shoes she needed to attend the wedding at the king's palace. Ultimately, with the help of the magical tree and the little white bird, Cinderella was able to overcome the obstacles set by her stepmother and stepsisters and attend the royal wedding, leading to her happy ending."

In [12]:
query_2 = "Where does Messi play right now?"

In [13]:
agent.run(query_2)

2024-06-18 17:35:58,307 INFO:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-06-18 17:36:00,728 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Not Relevant doc


2024-06-18 17:36:01,635 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Not Relevant doc


2024-06-18 17:36:03,147 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Not Relevant doc


2024-06-18 17:36:04,179 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Not Relevant doc


2024-06-18 17:36:05,349 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Not Relevant doc


2024-06-18 17:36:05,351 INFO:No Relevant document found, Start web search


No Relevant Context Found, Start Searching On Web...
Answer Base On Web Search


2024-06-18 17:36:13,280 INFO:Answering question: Where does Messi play right now?
2024-06-18 17:36:13,281 INFO:Attempting to generate an answer for the question: Where does Messi play right now?
2024-06-18 17:36:15,879 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-06-18 17:36:15,880 INFO:Answer generated successfully


Check For Hallucination In Generated Answer Base On Web Search


2024-06-18 17:36:17,221 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-06-18 17:36:17,222 INFO:Agent answer generated successfully
2024-06-18 17:36:17,223 INFO:Hallucination detected, Regenerate the answer...
2024-06-18 17:36:17,223 INFO:Answering question: Where does Messi play right now?
2024-06-18 17:36:17,224 INFO:Attempting to generate an answer for the question: Where does Messi play right now?
2024-06-18 17:36:18,432 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-06-18 17:36:18,434 INFO:Answer generated successfully


"Messi currently plays for Major League Soccer's Inter Miami CF."
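
The two runs above suggest a control flow of roughly this shape: grade each retrieved chunk, fall back to web search when nothing is relevant, generate an answer, and regenerate once if the hallucination check fails. The sketch below is inferred from the log messages only and is not Indox's actual implementation; `agentic_answer` and every callable it takes are hypothetical stand-ins for the retriever, the LLM-based graders, and the web search.

```python
from typing import Callable


def agentic_answer(
    query: str,
    retrieve: Callable[[str], list[str]],
    is_relevant: Callable[[str, str], bool],
    web_search: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    is_grounded: Callable[[str, list[str]], bool],
    max_retries: int = 1,
) -> str:
    """Answer a query with relevance grading, web fallback, and a grounding check."""
    # Step 1: keep only the retrieved chunks graded as relevant to the query.
    context = [doc for doc in retrieve(query) if is_relevant(query, doc)]

    # Step 2: if nothing relevant was retrieved, fall back to web search.
    if not context:
        context = web_search(query)

    # Step 3: generate, then regenerate up to max_retries times
    # while the answer is judged not grounded in the context.
    answer = generate(query, context)
    for _ in range(max_retries):
        if is_grounded(answer, context):
            break
        answer = generate(query, context)
    return answer
```

In this sketch, a question about Cinderella would keep the relevant chunks from the vector store, while a question about Messi would have every chunk rejected and would reach the `web_search` branch.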