[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/feature/milvus/Demo/Milvus.ipynb)

In [11]:
!pip install indox
!pip install openai
!pip install pymilvus
!pip install python-dotenv



## Setting Up the Python Environment

If you are running this project in a local IDE, create a Python virtual environment so that its dependencies stay isolated from your other projects. The steps below create and activate an environment named `indox`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indox
```
2. **Activate the virtual environment:**
```bash
indox\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indox
```
2. **Activate the virtual environment:**
```bash
source indox/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```
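
To confirm that the notebook kernel or script is actually running inside the environment created above, a quick check like the following can help. This is a minimal sketch that relies only on the standard library; the helper name `in_virtualenv` is made up for illustration, and the check applies to environments created with `venv` (it does not detect conda environments).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the first line prints `False`, the IDE or Jupyter kernel is probably using a different interpreter than the one in the `indox` environment.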


In [12]:
import os
from dotenv import load_dotenv

load_dotenv()
INDOX_API_KEY = os.getenv("INDOX_API_KEY")
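
`os.getenv` returns `None` when the variable is missing, so a misconfigured `.env` file would only surface later as an authentication error. A small guard can fail earlier with a clearer message. The sketch below is one possible way to do that; `require_env` is a hypothetical helper, not part of Indox or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell"
        )
    return value


# Intended usage, after load_dotenv() has run:
# INDOX_API_KEY = require_env("INDOX_API_KEY")
```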

## Initial Setup

The imports below bring in the pieces used in the rest of this notebook: the main Indox retrieval augmentation class, the question-answering model, the embedding model, and the data loader and splitter.


In [13]:
from indox import IndoxRetrievalAugmentation
from indox.llms import IndoxApi
from indox.embeddings import IndoxApiEmbedding
from indox.data_loader_splitter import ClusteredSplit
from pymilvus import MilvusClient
import json


In [14]:
indox = IndoxRetrievalAugmentation()
openai_qa_indox = IndoxApi(api_key=INDOX_API_KEY)
embed_openai_indox = IndoxApiEmbedding(api_key=INDOX_API_KEY, model="text-embedding-3-small")


INFO: IndoxRetrievalAugmentation initialized

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            
INFO: Initialized IndoxOpenAIEmbedding with model: text-embedding-3-small


In [15]:
#!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt

In [16]:
file_path = "sample.txt"
loader_splitter = ClusteredSplit(file_path=file_path, embeddings=embed_openai_indox, summary_model=openai_qa_indox)


INFO: ClusteredSplit initialized successfully


In [17]:
from indox.vector_stores.milvus import Document, Milvus
raw_docs = loader_splitter.load_and_chunk()
docs = [Document(page_content=doc) for doc in raw_docs]


INFO: Starting processing for documents
INFO: Embedding documents
INFO: Starting to fetch embeddings texts using engine: text-embedding-3-small
INFO: --Generated 1 clusters--
INFO: Completed chunking & clustering process
INFO: Successfully obtained all documents


## Vector Store Connection and Document Storage

In this step, we create a Milvus vector store backed by the `sample` collection, connect Indox to it, and store the processed documents in it.


In [18]:
db = Milvus(collection_name="sample", embedding_model=embed_openai_indox, qa_model=openai_qa_indox)

DEBUG:pymilvus.milvus_client.milvus_client:Created new connection using: 920b74cd66fa487ca50486f4da7d1106


INFO: IndoxRetrievalAugmentation initialized

INFO: Connection to the vector store database established successfully


In [19]:
indox = IndoxRetrievalAugmentation()

indox.connect_to_vectorstore(db)

INFO: IndoxRetrievalAugmentation initialized

INFO: Connection to the vector store database established successfully


<indox.vector_stores.milvus.Milvus at 0x225cca1e0b0>

In [21]:
db.store_in_vectorstore(docs)



INFO: Embedding documents
INFO: Starting to fetch embeddings texts using engine: text-embedding-3-small
INFO: Embedding documents
INFO: Starting to fetch embeddings texts using engine: text-embedding-3-small
Creating embeddings: 100%|██████████| 2/2 [00:04<00:00,  2.34s/it]


## Querying and Interpreting the Response

In this step, we ask Indox a question. The retriever fetches the most relevant chunks from the vector store (up to `top_k=5`), and the QA model generates the answer from that context.



In [23]:
query = "How does Cinderella reach her happy ending?"
retriever = indox.QuestionAnswer(vector_database=db, llm=openai_qa_indox, top_k=5)
retriever.invoke(query)

INFO: Retrieving context and scores from the vector database
INFO: Embedding documents
INFO: Starting to fetch embeddings texts using engine: text-embedding-3-small
INFO: Generating answer without document relevancy filter
INFO: Query answered successfully


"In the story of Cinderella, she reaches her happy ending through her kindness, resilience, and unwavering belief in herself. Despite facing adversity and mistreatment from her stepfamily, Cinderella remains good-hearted and hopeful. With the help of her fairy godmother, she attends the royal ball and captures the heart of the prince. Ultimately, Cinderella's pure heart and inner strength lead her to her happy ending, where she finds love, acceptance, and a new life full of joy and prosperity."
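
To build intuition for what the retrieval step with `top_k` does conceptually, the sketch below ranks a few toy chunks by cosine similarity to a query vector and keeps the best `k`. It is an illustration only, not how Indox or Milvus implement search: the vectors, the chunk texts, and the helper names `cosine_similarity` and `top_k` are all made up, and a real vector database typically uses an index rather than the full scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k):
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "vector store": (chunk text, embedding) pairs with hand-written 3-d vectors.
toy_store = [
    ("chunk about the royal ball", [0.9, 0.1, 0.0]),
    ("chunk about the stepsisters", [0.1, 0.9, 0.0]),
    ("chunk about the glass slipper", [0.7, 0.2, 0.1]),
]

results = top_k([1.0, 0.0, 0.0], toy_store, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

In the notebook itself, the embedding model produces the vectors and the retrieved chunks are passed to the QA model as context for the final answer.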