# Mistral as a Question Answering Model

In this notebook, we demonstrate how to use `inDox` as a question answering system with openly available models such as `Mistral`. First, set up your environment variables and API keys in Python using the `dotenv` library. Environment variables are a crucial part of configuring your applications, especially when dealing with sensitive information like API keys.

::: {.callout-note}
Because we are using **HuggingFace** models, you need to define your `HUGGINGFACE_API_KEY` in a `.env` file. The Mistral model used below additionally reads `MISTRAL_API_KEY` from the same file. Keeping API keys and other sensitive information out of the codebase improves security and maintainability.
:::

Let's start by installing the required packages; we will then load our environment variables.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/mistral_eval.ipynb)

In [None]:
!pip install indox
!pip install chromadb
!pip install mistralai

## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indox
```
2. **Activate the virtual environment:**
```bash
indox\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indox
```
2. **Activate the virtual environment:**
```bash
source indox/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```


In [14]:
import os
from dotenv import load_dotenv

load_dotenv()
HUGGINGFACE_API_KEY = os.getenv('HUGGINGFACE_API_KEY')
MISTRAL_API_KEY = os.getenv('MISTRAL_API_KEY')
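
`os.getenv` silently returns `None` when a variable is missing, which tends to surface later as a confusing authentication error. The sketch below shows one way to fail early instead. The helper name `require_env` is hypothetical and is not part of `dotenv` or Indox.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of dotenv or Indox.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value


# Example usage, once load_dotenv() has run and your .env file is in place:
# MISTRAL_API_KEY = require_env("MISTRAL_API_KEY")
```

Calling the helper right after `load_dotenv()` makes a missing key visible before any model is initialized.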


### Import Essential Libraries
Then, we import the essential libraries for our `Indox` question answering system:
- `IndoxRetrievalAugmentation`: Enhances the retrieval process for better QA performance.
- `Mistral`: Indox's interface to the Mistral language model, used here to generate answers.
- `HuggingFaceEmbedding`: Utilizes Hugging Face embeddings for improved semantic understanding.
- `UnstructuredLoadAndSplit`: A utility for loading and splitting unstructured data.

In [2]:
from indox import IndoxRetrievalAugmentation
from indox.llms import Mistral
from indox.embeddings import HuggingFaceEmbedding
from indox.data_loader_splitter import UnstructuredLoadAndSplit

### Building the Indox System and Initializing Models

Next, we will build our `inDox` system and initialize the Mistral question answering model along with the embedding model. This setup will allow us to leverage the advanced capabilities of Indox for our question answering tasks.


In [18]:
indox = IndoxRetrievalAugmentation()
mistral_qa = Mistral(api_key=MISTRAL_API_KEY)
embed = HuggingFaceEmbedding(model="multi-qa-mpnet-base-cos-v1")

INFO: IndoxRetrievalAugmentation initialized

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            
INFO: Initializing MistralAI with model: mistral-medium-latest
INFO: MistralAI initialized successfully


2024-07-09 19:37:28,314 INFO:Load pretrained SentenceTransformer: multi-qa-mpnet-base-cos-v1
2024-07-09 19:37:29,358 INFO:Use pytorch device: cpu


INFO: Initialized HuggingFace embeddings with model: multi-qa-mpnet-base-cos-v1


### Setting Up Reference Directory and File Path

To demonstrate the capabilities of our Indox question answering system, we will use a sample file as reference data for testing and evaluation.

The file is named `sample.txt`. The next cell downloads it into the working directory, after which we define its path.

In [None]:
!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt

In [4]:
file_path = "sample.txt"

### Chunking Reference Data with UnstructuredLoadAndSplit

To use our reference data effectively, we need to split it into manageable chunks so that the question answering system can retrieve only the relevant passages.

We use the `UnstructuredLoadAndSplit` utility for this task. It loads the unstructured data from the specified file and splits it into smaller chunks, here with a maximum chunk size of 400 characters.
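
To make the idea of a size limit concrete, here is a minimal, standalone sketch of size-bounded chunking. It is not how `UnstructuredLoadAndSplit` works internally (the log output indicates that it uses title-based chunking). The function `chunk_text` is a hypothetical illustration that greedily packs words into chunks.

```python
def chunk_text(text: str, max_chunk_size: int = 400) -> list[str]:
    """Greedily pack whitespace-separated words into chunks.

    Each chunk holds at most max_chunk_size characters, except that a single
    word longer than the limit is kept whole as its own chunk.
    Illustrative only; not the Indox implementation.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chunk_size or not current:
            # Keep growing the current chunk (or start it with this word).
            current = candidate
        else:
            # The next word would exceed the limit: close the chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = "Cinderella sat by the hearth in the cinders. " * 30
pieces = chunk_text(sample, max_chunk_size=400)
print(len(pieces), max(len(p) for p in pieces))
```

Smaller chunks give more precise retrieval but carry less surrounding context, so the limit is a trade-off rather than a fixed rule.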

Let's proceed with chunking our reference data.


In [5]:
load_splitter = UnstructuredLoadAndSplit(file_path=file_path, max_chunk_size=400)
docs = load_splitter.load_and_chunk()

INFO: UnstructuredLoadAndSplit initialized successfully
INFO: Getting all documents
INFO: Starting processing
INFO: Using title-based chunking
INFO: Completed chunking process
INFO: Successfully obtained all documents


### Connecting Embedding Model to Indox

With our reference data chunked and ready, the next step is to connect our embedding model to the Indox system. We do this by creating a `Chroma` vector store and passing the embedding model as its `embedding_function`, so that stored chunks and incoming queries are embedded with the same model.
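
As a rough illustration of what embedding-based retrieval does, the sketch below ranks toy vectors by cosine similarity to a query vector and returns the best `k`. This is an assumption-laden simplification: the vectors are made up, the function names are hypothetical, and the real vector store may use a different distance metric and an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return (index, score) pairs for the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy 2-dimensional "embeddings"; real embeddings have hundreds of dimensions.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [1.0, 0.1]
print(top_k(query_vec, doc_vecs, k=2))
```

The query and the documents must be embedded with the same model for these scores to be meaningful, which is why the embedding function is attached to the vector store itself.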

Let's connect the embedding model to Indox.


In [6]:
from indox.vector_stores import Chroma
db = Chroma(collection_name="sample", embedding_function=embed)

2024-07-09 19:35:34,772 INFO:Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.


### Storing Data in the Vector Store

After connecting our embedding model to the Indox system, the next step is to store our chunked reference data in the vector store. This process ensures that our data is indexed and readily available for retrieval during the question-answering process.

Let's proceed with storing the data in the vector store.


In [9]:
db.add(docs=docs)

INFO: Storing documents in the vector store
INFO: Document added successfully to the vector store.
INFO: Documents stored successfully


<indox.vector_stores.Chroma.ChromaVectorStore at 0x1a933e0ac30>

## Testing the RAG System with Indox
With our Retrieval-Augmented Generation (RAG) system built using Indox, we are now ready to test it with a sample question. This test will demonstrate how effectively our system can retrieve and generate accurate answers based on the reference data stored in the vector store.

We'll use a sample query to test our system:
- **Query**: "How did Cinderella reach her happy ending?"

This question will be processed by our Indox system to retrieve relevant information and generate an appropriate response.
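
Conceptually, the generation step combines the retrieved chunks with the question into a single prompt for the language model. The sketch below shows one plausible way to assemble such a prompt. The function `build_prompt` and its wording are hypothetical, and Indox's actual prompt template may differ.

```python
def build_prompt(query: str, contexts: list[str]) -> str:
    """Combine retrieved context chunks and a question into one prompt string.

    Hypothetical template for illustration; not the prompt Indox uses.
    """
    # Number each chunk so the model (and the reader) can tell them apart.
    numbered = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(contexts, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {query}\nAnswer:"
    )


example = build_prompt(
    "How did Cinderella reach her happy ending?",
    ["by the hearth in the cinders.", "the king gave orders for a festival"],
)
print(example)
```

Because the model only sees the retrieved chunks, the quality of the answer depends directly on which chunks the retriever returns.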

Let's test our RAG system with the sample question.

In [10]:
query = "How did Cinderella reach her happy ending?"

We'll use the `QuestionAnswer` retriever to get a response from the system.

::: {.callout-note}

The `invoke` method processes the query using the connected QA model and retrieves relevant information from the vector store:
- It returns the generated answer.
- The retrieved contexts are then available through the retriever's `context` attribute.

:::

We'll create the retriever, pass the query to its `invoke` method, and inspect the response.


In [19]:
retriever = indox.QuestionAnswer(vector_database=db, llm=mistral_qa, top_k=5)

**Answer:**

In [20]:
answer = retriever.invoke(query=query)

INFO: Retrieving context and scores from the vector database
INFO: Generating answer without document relevancy filter
INFO: Answering question
INFO: Attempting to generate an answer for the question


2024-07-09 19:37:41,404 INFO:HTTP Request: POST https://api.mistral.ai/v1/chat/completions "HTTP/1.1 200 OK"


INFO: Query answered successfully


In [21]:
answer

"Based on the context provided, it is not explicitly stated how Cinderella reaches her happy ending. However, it can be inferred that she is invited to a festival by the king, where his son is looking for a bride. It is mentioned that Cinderella's stepsisters are also invited to the festival, but there is no mention of Cinderella attending it in the provided text. Additionally, there is a mention of a bird granting Cinderella's wish, and a scene where her father cuts down a pear tree looking for an unknown maiden who is believed to be Cinderella. However, without further context, it is impossible to determine how Cinderella ultimately reaches her happy ending."

**Contexts**

In [22]:
context = retriever.context
context

['by the hearth in the cinders. And as on that account she alwayslooked dusty and dirty, they called her cinderella.It happened that the father was once going to the fair, and heasked his two step-daughters what he should bring back for them.Beautiful dresses, said one, pearls and jewels, said the second.And you, cinderella, said he, what will you have. Father',
 'by the hearth in the cinders. And as on that account she alwayslooked dusty and dirty, they called her cinderella.It happened that the father was once going to the fair, and heasked his two step-daughters what he should bring back for them.Beautiful dresses, said one, pearls and jewels, said the second.And you, cinderella, said he, what will you have. Father',
 'cinderella expressed a wish, the bird threw down to her what shehad wished for.It happened, however, that the king gave orders for a festivalwhich was to last three days, and to which all the beautiful younggirls in the country were invited, in order that his son migh