[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/openai_clusterSplit.ipynb)

In [None]:
!pip install indox
!pip install openai
!pip install chromadb
!pip install python-dotenv

## Setting Up the Python Environment

If you are running this project in a local IDE, create a Python virtual environment first so that its dependencies stay isolated from other projects. The steps below set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indox
```
2. **Activate the virtual environment:**
```bash
indox\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indox
```
2. **Activate the virtual environment:**
```bash
source indox/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```


In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
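
`os.getenv` returns `None` when the variable is missing, and that may only surface later as a confusing authentication error. Below is a minimal sketch of a fail-fast check, assuming the key lives in a `.env` file or the shell environment. `require_env` is a hypothetical helper, not part of Indox or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable and an empty string.
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value
```

It could replace the plain lookup above, for example `OPENAI_API_KEY = require_env("OPENAI_API_KEY")`.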

## Initial Setup

The imports below bring in the pieces used throughout this demo: the main `IndoxRetrievalAugmentation` class, the OpenAI question-answering model (`OpenAi`), the embedding model (`OpenAiEmbedding`), and the `ClusteredSplit` data loader and splitter.


In [2]:
from indox import IndoxRetrievalAugmentation
from indox.llms import OpenAi
from indox.embeddings import OpenAiEmbedding
from indox.data_loader_splitter import ClusteredSplit

In this step, we initialize `IndoxRetrievalAugmentation`, the QA model, and the embedding model. The models chosen here for QA and embedding are examples and can be swapped to suit your requirements.


In [3]:
Indox = IndoxRetrievalAugmentation()
qa_model = OpenAi(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo-0125")
embed = OpenAiEmbedding(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

[32mINFO[0m: [1mIndoxRetrievalAugmentation initialized[0m

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            
[32mINFO[0m: [1mInitializing OpenAi with model: gpt-3.5-turbo-0125[0m
[32mINFO[0m: [1mOpenAi initialized successfully[0m
[32mINFO[0m: [1mInitialized OpenAiEmbedding with model: text-embedding-3-small[0m


## Data Loader Setup

We set up the data loader using the `ClusteredSplit` class. This step involves loading documents, configuring embeddings, and setting options for processing the text.
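
One of the processing options is a chunk size. To make that concrete, here is a rough sketch of fixed-size chunking. It assumes the chunk size caps the length of each chunk and measures length in whitespace-separated words; the unit `ClusteredSplit` actually uses is not stated here, and `chunk_words` is a hypothetical helper, not an Indox function.

```python
def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


sample = "one two three four five six seven"
chunks = chunk_words(sample, chunk_size=3)
# The last chunk holds whatever words remain.
```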


In [None]:
!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt

In [4]:
loader_splitter = ClusteredSplit(file_path="sample.txt", embeddings=embed, remove_sword=False, re_chunk=False, chunk_size=300, summary_model=qa_model)

[32mINFO[0m: [1mClusteredSplit initialized successfully[0m


In [5]:
docs = loader_splitter.load_and_chunk()

[32mINFO[0m: [1mStarting processing for documents[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using engine: text-embedding-3-small[0m
[32mINFO[0m: [1m--Generated 1 clusters--[0m
[32mINFO[0m: [1mGenerating summary for documentation[0m
[32mINFO[0m: [1mGenerating response[0m
[32mINFO[0m: [1mResponse generated successfully[0m
[32mINFO[0m: [1mCompleted chunking & clustering process[0m
[32mINFO[0m: [1mSuccessfully obtained all documents[0m


## Vector Store Connection and Document Storage

In this step, we connect the Indox application to the vector store and store the processed documents.
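
Conceptually, a vector store keeps each document next to its embedding and answers queries by similarity. The toy class below sketches that idea with a linear scan and a word-count "embedding". It is not how Chroma works internally; `ToyVectorStore` and `toy_embed` are hypothetical names used only for illustration.

```python
import math


def toy_embed(text):
    """Toy embedding: counts of a few fixed vocabulary words (demo assumption)."""
    vocab = ["prince", "slipper", "bird"]
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, embedding_function):
        self.embedding_function = embedding_function
        self.items = []  # list of (document, vector) pairs

    def add(self, docs):
        """Embed each document and keep it alongside its vector."""
        for doc in docs:
            self.items.append((doc, self.embedding_function(doc)))

    def search(self, query, top_k=5):
        """Return the top_k documents most similar to the query."""
        q = self.embedding_function(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc for doc, _ in ranked[:top_k]]


store = ToyVectorStore(embedding_function=toy_embed)
store.add(docs=["the prince danced", "a golden slipper", "the white bird"])
best = store.search("prince", top_k=1)
```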


In [6]:
from indox.vector_stores import Chroma
db = Chroma(collection_name="sample", embedding_function=embed)

In [8]:
db.add(docs=docs)

[32mINFO[0m: [1mStoring documents in the vector store[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using engine: text-embedding-3-small[0m
[32mINFO[0m: [1mDocument added successfully to the vector store.[0m
[32mINFO[0m: [1mDocuments stored successfully[0m


<indox.vector_stores.chroma.Chroma at 0x213e3f459a0>

## Querying and Interpreting the Response

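In this step, we build a question-answering retriever on top of the vector store and ask it a specific question. The retriever fetches the most relevant chunks (here `top_k=5`) and the QA model generates the response from them.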
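
How the retrieved chunks reach the QA model is not shown in this notebook. A common pattern, sketched here as an assumption, is to join the retrieved chunks into a single prompt ahead of the question. `build_prompt` and the template wording are hypothetical and are not Indox's actual prompt.

```python
def build_prompt(question, contexts):
    """Join retrieved chunks and the question into one prompt string."""
    numbered = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(contexts, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "Who found the golden slipper?",
    ["The prince found her golden slipper.", "The bird gave her a dress."],
)
```

Numbering the chunks makes it easier to trace which passage an answer drew on when the context is inspected afterwards.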



In [9]:
retriever = Indox.QuestionAnswer(vector_database=db, llm=qa_model, top_k=5)

In [10]:
retriever.invoke(query="How cinderella reach happy ending?")

[32mINFO[0m: [1mRetrieving context and scores from the vector database[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using engine: text-embedding-3-small[0m
[32mINFO[0m: [1mGenerating answer without document relevancy filter[0m
[32mINFO[0m: [1mAnswering question[0m
[32mINFO[0m: [1mGenerating response[0m
[32mINFO[0m: [1mResponse generated successfully[0m
[32mINFO[0m: [1mQuery answered successfully[0m


"Cinderella reached her happy ending by attending the royal festival with the help of a magical hazel tree and a little white bird. Despite her stepmother and stepsisters' attempts to keep her from going, Cinderella was able to attend the festival in a splendid dress and golden slippers provided by the bird. At the festival, the prince danced only with Cinderella and was captivated by her beauty. When Cinderella tried to leave, the prince tried to follow her, but she escaped. However, the prince found her golden slipper that she left behind on the staircase. The prince then searched for the owner of the slipper and eventually found Cinderella, fitting the slipper perfectly. This led to Cinderella marrying the prince and living happily ever after."

In [11]:
retriever.context

["They never once thought of cinderella, and believed that she was sitting at home in the dirt, picking lentils out of the ashes   The prince approached her, took her by the hand and danced with her He would dance with no other maiden, and never let loose of her hand, and if any one else came to invite her, he said, this is my partner She danced till it was evening, and then she wanted to go home But the king's son said, I will go with you and bear you company, for he wished to see to whom the beautiful maiden belonged She escaped from him, however, and sprang into the pigeon-house   The king's son waited until her father came, and then he told him that the unknown maiden had leapt into the pigeon-house   The old man thought, can it be cinderella   And they had to bring him an axe and a pickaxe that he might hew the pigeon-house to pieces, but no one was inside it   And when they got home cinderella lay in her dirty clothes among the ashes, and a dim little oil-lamp was burning on the 