## Indox Retrieval Augmentation
Here, we explore how to work with Indox Retrieval Augmentation. We use Mistral as the LLM; the code below embeds with `mistral-embed` and keeps a HuggingFace embedding as a commented-out alternative. Set `MISTRAL_API_KEY` and `HUGGINGFACE_API_KEY` as environment variables before running it.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/mistral_clusteredSplit.ipynb)

In [None]:
!pip install indox
!pip install mistralai
!pip install chromadb

## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**
```bash
  python -m venv indox
```

2. **Activate the virtual environment:**
```bash
  indox\Scripts\activate
```


### macOS/Linux

1. **Create the virtual environment:**
```bash
  python3 -m venv indox
```

2. **Activate the virtual environment:**
```bash
  source indox/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
  pip install -r requirements.txt
```


In [None]:
!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
MISTRAL_API_KEY = os.getenv('MISTRAL_API_KEY')
HUGGINGFACE_API_KEY = os.getenv('HUGGINGFACE_API_KEY')
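
Because `os.getenv` returns `None` when a variable is not set, a missing key only surfaces later as an API error. The following is a minimal sketch of an early check; the `require_env` helper is hypothetical and not part of Indox.

```python
import os


def require_env(names, env=None):
    """Return a dict of the requested variables, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Usage sketch (assumes the keys were exported in the shell or loaded from a .env file):
# keys = require_env(["MISTRAL_API_KEY", "HUGGINGFACE_API_KEY"])
```

Passing `env` explicitly makes the helper easy to try with a plain dictionary instead of the real environment.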

### Creating an instance of IndoxRetrievalAugmentation

To effectively utilize the Indox Retrieval Augmentation capabilities, you must first create an instance of the IndoxRetrievalAugmentation class. This instance will allow you to access the methods and properties defined within the class, enabling the augmentation and retrieval functionalities.

In [2]:
from indox import IndoxRetrievalAugmentation
indox = IndoxRetrievalAugmentation()

[32mINFO[0m: [1mIndoxRetrievalAugmentation initialized[0m

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            


In [3]:
indox.__version__

'0.1.15'

### Generating response using Mistral's language models
The `Mistral` class handles question answering with Mistral's language models, and `MistralEmbedding` specifies the embedding model (`HuggingFaceEmbedding` is left commented out as an alternative). `ClusteredSplit` loads the file, splits it into chunks, clusters them, and summarizes each cluster. `UnstructuredLoadAndSplit`, used later in this notebook, can import various file types and split them into chunks.

In [7]:
from indox.llms import Mistral
from indox.embeddings import HuggingFaceEmbedding
from indox.data_loader_splitter import ClusteredSplit
from indox.embeddings import MistralEmbedding
mistral_qa = Mistral(api_key=MISTRAL_API_KEY)
# embed_hf = HuggingFaceEmbedding(model="multi-qa-mpnet-base-cos-v1")
embed_mistral = MistralEmbedding(MISTRAL_API_KEY,model="mistral-embed")
file_path = "sample.txt"



[32mINFO[0m: [1mInitializing MistralAI with model: mistral-medium-latest[0m
[32mINFO[0m: [1mMistralAI initialized successfully[0m
[32mINFO[0m: [1mInitialized MistralEmbedding with model: mistral-embed[0m


In [8]:
loader_splitter = ClusteredSplit(file_path=file_path,summary_model=mistral_qa,embeddings=embed_mistral)
docs = loader_splitter.load_and_chunk()

[32mINFO[0m: [1mClusteredSplit initialized successfully[0m
[32mINFO[0m: [1mStarting processing for documents[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using engine: mistral-embed[0m
[32mINFO[0m: [1m--Generated 6 clusters--[0m
[32mINFO[0m: [1mGenerating summary for documentation[0m
[32mINFO[0m: [1mGenerating summary for documentation[0m
[32mINFO[0m: [1mGenerating summary for documentation[0m
[32mINFO[0m: [1mGenerating summary for documentation[0m
[32mINFO[0m: [1mGenerating summary for documentation[0m
[32mINFO[0m: [1mGenerating summary for documentation[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using engine: mistral-embed[0m
[32mINFO[0m: [1m--Generated 1 clusters--[0m
[32mINFO[0m: [1mGenerating summary for documentation[0m
[32mINFO[0m: [1mCompleted chunking & clustering process[0m
[32mINFO[0m: [1mSuccessfully obt

Here, the `Chroma` vector store handles the storage and retrieval of vector embeddings. Specifying a collection name and an embedding function sets up a store where text embeddings can be saved and queried.

In [9]:
from indox.vector_stores import Chroma
db = Chroma(collection_name="sample",embedding_function=embed_mistral)

[32mINFO[0m: [1mConnection to the vector store database established successfully[0m


<indox.vector_stores.chroma.Chroma at 0x27bb81d07a0>
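
To make the idea of storing and querying embeddings concrete, here is a toy in-memory sketch. It is not Indox's `Chroma` class: `ToyVectorStore`, `toy_embed`, and the keyword list are hypothetical, and a real embedding function would call a model such as `mistral-embed`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector store."""

    def __init__(self, embedding_function):
        self.embedding_function = embedding_function
        self.texts = []
        self.vectors = []

    def add(self, docs):
        # Embed each document once and keep text and vector side by side.
        for doc in docs:
            self.texts.append(doc)
            self.vectors.append(self.embedding_function(doc))

    def search(self, query, top_k=5):
        # Score every stored vector against the query and return the best top_k.
        query_vector = self.embedding_function(query)
        scored = [
            (cosine_similarity(query_vector, vector), text)
            for vector, text in zip(self.vectors, self.texts)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


def toy_embed(text):
    """Toy embedding: counts of a few fixed keywords (an assumption for illustration)."""
    words = text.lower().split()
    return [float(words.count(keyword)) for keyword in ("slipper", "prince", "pumpkin")]


store = ToyVectorStore(toy_embed)
store.add(["the prince found the slipper", "a pumpkin became a carriage"])
results = store.search("who found the slipper", top_k=1)
```

A production store replaces the linear scan with an index, but the add-then-search shape is the same.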

### Load and preprocess data
This part of the code loads text data from a file, splits it into chunks of at most 400 characters (`max_chunk_size=400`), and stores the chunks in the vector store set up previously.

In [10]:
from indox.data_loader_splitter import UnstructuredLoadAndSplit
loader_splitter = UnstructuredLoadAndSplit(file_path=file_path,max_chunk_size=400)
docs = loader_splitter.load_and_chunk()

[32mINFO[0m: [1mUnstructuredLoadAndSplit initialized successfully[0m
[32mINFO[0m: [1mGetting all documents[0m
[32mINFO[0m: [1mStarting processing[0m
[32mINFO[0m: [1mUsing title-based chunking[0m
[32mINFO[0m: [1mCompleted chunking process[0m
[32mINFO[0m: [1mSuccessfully obtained all documents[0m


In [11]:
len(docs)

40
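
The effect of a maximum chunk size can be illustrated with a much simpler splitter. The log above reports title-based chunking, so the sketch below is a different technique: a greedy word packer, with the hypothetical name `chunk_by_words`, shown only to make the size limit tangible.

```python
def chunk_by_words(text, max_chunk_size=400):
    """Greedily pack whitespace-separated words into chunks of at most max_chunk_size characters.

    A single word longer than max_chunk_size is kept whole in its own chunk.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)  # the current chunk is full, start a new one
            current = word
    if current:
        chunks.append(current)
    return chunks


chunks = chunk_by_words("aa bb cc dd", max_chunk_size=5)
```

Smaller limits produce more, shorter chunks, which changes how many documents end up in the vector store.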

In [12]:
db.add(docs=docs)

[32mINFO[0m: [1mStoring documents in the vector store[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using engine: mistral-embed[0m
[32mINFO[0m: [1mDocument added successfully to the vector store.[0m
[32mINFO[0m: [1mDocuments stored successfully[0m


<indox.vector_stores.chroma.Chroma at 0x27bb81d07a0>

### Retrieve relevant information and generate an answer
These lines query the vector store for the five most relevant chunks (`top_k=5`) and generate an answer with the language model.

In [13]:
query = "How cinderella reach her happy ending?"
retriever = indox.QuestionAnswer(vector_database=db,llm=mistral_qa,top_k=5)

The `invoke(query)` method sends the query to the retriever, which searches the vector store for relevant text chunks and uses the language model to generate a response based on them.
The `context` property returns the text chunks the retriever used to generate the answer, which shows what the answer was grounded in.

In [14]:
retriever.invoke(query)

[32mINFO[0m: [1mRetrieving context and scores from the vector database[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using engine: mistral-embed[0m
[32mINFO[0m: [1mGenerating answer without document relevancy filter[0m
[32mINFO[0m: [1mAnswering question[0m
[32mINFO[0m: [1mAttempting to generate an answer for the question[0m
[32mINFO[0m: [1mQuery answered successfully[0m


"Cinderella's happy ending began when her fairy godmother appeared and granted her wishes. With the help of the fairy godmother, Cinderella was able to attend the king's festival and dance with the prince. Despite the stepsisters' attempts to keep her from attending, Cinderella was able to go to the ball with the help of her magical dress, glass slippers, and carriage.\n\nAt the ball, the prince fell in love with Cinderella and danced with her all night. However, when the clock struck midnight, Cinderella had to leave in a hurry and accidentally left behind one of her glass slippers. The prince, determined to find the woman he had fallen in love with, searched the entire kingdom for the owner of the glass slipper. When he finally found Cinderella and the slipper fit her foot, he knew he had found his true love.\n\nIn the end, Cinderella married the prince and lived happily ever after. Despite the hardships she faced with her evil stepmother and stepsisters, Cinderella was able to overc

In [18]:
retriever.context

['by the hearth in the cinders. And as on that account she alwayslooked dusty and dirty, they called her cinderella.It happened that the father was once going to the fair, and heasked his two step-daughters what he should bring back for them.Beautiful dresses, said one, pearls and jewels, said the second.And you, cinderella, said he, what will you have. Father',
 "to appear among the number, they were delighted, called cinderellaand said, comb our hair for us, brush our shoes and fasten ourbuckles, for we are going to the wedding at the king's palace.Cinderella obeyed, but wept, because she too would have liked togo with them to the dance, and begged her step-mother to allowher to do so. You go, cinderella, said she, covered in dust and",
 'cinderella expressed a wish, the bird threw down to her what shehad wished for.It happened, however, that the king gave orders for a festivalwhich was to last three days, and to which all the beautiful younggirls in the country were invited, in orde
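
The chunks returned by `retriever.context` are what the language model sees alongside the question. Indox's actual prompt template is not shown in this notebook, so the following is only a rough sketch of how retrieved chunks are commonly combined with a question; `build_prompt` and its wording are assumptions.

```python
def build_prompt(question, context_chunks):
    """Join retrieved chunks into a numbered context block followed by the question."""
    numbered = [f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)]
    context_block = "\n".join(numbered)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "How cinderella reach her happy ending?",
    ["first retrieved chunk", "second retrieved chunk"],
)
```

Numbering the chunks makes it easier to trace which passage an answer drew on.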

### With AgenticRag

AgenticRag stands for Agentic Retrieval-Augmented Generation. It combines retrieval-based and generation-based methods in natural language processing (NLP): the generative capabilities of a language model are enhanced with relevant information retrieved from a database or a vector store.
AgenticRag is designed to provide more contextually rich and accurate responses by drawing on external knowledge sources. It retrieves relevant chunks from a vector store based on a query and then uses a language model to generate a response that incorporates them. In the run below, no relevant document is found in the vector store, so the agent falls back to a web search and then checks the generated answer for hallucination.

In [20]:
agent = indox.AgenticRag(llm=mistral_qa,vector_database=db,top_k=5)
agent.run("where does messi plays right now?")

2024-07-09 19:30:22,358 INFO:HTTP Request: POST https://api.mistral.ai/v1/embeddings "HTTP/1.1 200 OK"
2024-07-09 19:30:23,247 INFO:HTTP Request: POST https://api.mistral.ai/v1/chat/completions "HTTP/1.1 200 OK"
2024-07-09 19:30:29,288 INFO:HTTP Request: POST https://api.mistral.ai/v1/chat/completions "HTTP/1.1 200 OK"
2024-07-09 19:30:30,881 INFO:HTTP Request: POST https://api.mistral.ai/v1/chat/completions "HTTP/1.1 200 OK"
2024-07-09 19:30:32,840 INFO:HTTP Request: POST https://api.mistral.ai/v1/chat/completions "HTTP/1.1 200 OK"
2024-07-09 19:30:33,450 INFO:HTTP Request: POST https://api.mistral.ai/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mNo Relevant document found, Start web search[0m
[32mINFO[0m: [1mNo Relevant Context Found, Start Searching On Web...[0m
[32mINFO[0m: [1mAnswer Base On Web Search[0m
[32mINFO[0m: [1mAnswering question[0m
[32mINFO[0m: [1mAttempting to generate an answer for the question[0m


2024-07-09 19:30:39,215 INFO:HTTP Request: POST https://api.mistral.ai/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mCheck For Hallucination In Generated Answer Base On Web Search[0m


2024-07-09 19:30:39,726 INFO:HTTP Request: POST https://api.mistral.ai/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mHallucination detected, Regenerate the answer...[0m
[32mINFO[0m: [1mAnswering question[0m
[32mINFO[0m: [1mAttempting to generate an answer for the question[0m


2024-07-09 19:30:40,687 INFO:HTTP Request: POST https://api.mistral.ai/v1/chat/completions "HTTP/1.1 200 OK"


'Based on the provided context information, Lionel Messi currently plays for Inter Miami CF in Major League Soccer (MLS).'
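
The log above shows the steps the agent took: it found no relevant document, searched the web, generated an answer, detected a hallucination, and regenerated. The sketch below mirrors that control flow with pluggable callables. It is not Indox's `AgenticRag` implementation; `agentic_answer` and all of its parameters are hypothetical.

```python
def agentic_answer(question, retrieve, grade, web_search, generate, is_grounded, max_retries=2):
    """Toy control flow: grade retrieved chunks, fall back to web search, then check the answer.

    retrieve(question)            -> list of candidate chunks from the vector store
    grade(question, chunk)        -> True if the chunk is relevant to the question
    web_search(question)          -> list of chunks from an external search
    generate(question, chunks)    -> answer string
    is_grounded(answer, chunks)   -> True if the answer is supported by the chunks
    """
    relevant = [chunk for chunk in retrieve(question) if grade(question, chunk)]
    if not relevant:
        relevant = web_search(question)  # nothing relevant in the store
    answer = generate(question, relevant)
    retries = 0
    while not is_grounded(answer, relevant) and retries < max_retries:
        answer = generate(question, relevant)  # regenerate after a failed check
        retries += 1
    return answer


# Stubbed run: the store holds only Cinderella text, so the web fallback is used,
# and the first generated answer fails the grounding check.
answers = iter(["unsupported claim", "Inter Miami"])
result = agentic_answer(
    "where does messi plays right now?",
    retrieve=lambda q: ["cinderella expressed a wish"],
    grade=lambda q, chunk: False,
    web_search=lambda q: ["Messi plays for Inter Miami"],
    generate=lambda q, chunks: next(answers),
    is_grounded=lambda answer, chunks: any(answer in chunk for chunk in chunks),
)
```

Capping the retries keeps the loop from running indefinitely when the model keeps producing unsupported answers.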