## Deeplake Vector Store
Here, we will explore how to work with Deeplake as a vector store. We use OpenAI models through the Indox API, so set your Indox API key as the `INDOX_API_KEY` environment variable (for example, in a `.env` file).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/cookbook/indoxArcg/Deeplake_VectorStore.ipynb)

```python
!pip install indoxArcg openai deeplake python-dotenv
```

## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indoxArcg`:

### Windows

1. **Create the virtual environment:**
   ```bash
   python -m venv indoxArcg
   ```
2. **Activate the virtual environment:**
   ```bash
   indoxArcg\Scripts\activate
   ```

### macOS/Linux

1. **Create the virtual environment:**
   ```bash
   python3 -m venv indoxArcg
   ```
2. **Activate the virtual environment:**
   ```bash
   source indoxArcg/bin/activate
   ```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```


```python
import os
from dotenv import load_dotenv

# Load variables from a local .env file into the process environment
load_dotenv()
INDOX_API_KEY = os.getenv("INDOX_API_KEY")
```
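If the key is missing, `os.getenv` silently returns `None` and the error only surfaces at the first API call. A small guard, sketched below with a hypothetical `require_env` helper (not part of indoxArcg), makes the problem visible right after the `.env` file is loaded.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail early with a clear message.

    `require_env` is a hypothetical helper for this notebook, not an indoxArcg API.
    """
    value = os.getenv(name)
    if not value:  # treats both "unset" and "set to an empty string" as missing
        raise RuntimeError(f"Environment variable {name!r} is not set; add it to your .env file")
    return value


# Usage, after load_dotenv():
# INDOX_API_KEY = require_env("INDOX_API_KEY")
```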

### Creating a retrieval augmentation instance

To use Indox's retrieval augmentation capabilities, you first create a retrieval-augmentation object that ties a language model to a vector store. In this notebook that object is an instance of the `RAG` pipeline class from `indoxArcg.pipelines.rag`, created once the vector store has been populated; its methods provide the retrieval and answer-generation functionality.

### Generating responses using Indox
The `IndoxApi` class handles question answering through the Indox API, and `IndoxApiEmbedding` specifies the embedding model (here, `text-embedding-3-small`). The `ClusteredSplit` class loads a PDF or text file and splits it into chunks, using the embedding model and the LLM (passed as `summary_model`) along the way.
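To make "splitting into chunks" concrete, the sketch below shows a much simpler technique: fixed-size word windows with overlap. It is a stand-in for illustration only; `ClusteredSplit` additionally embeds, clusters and summarizes the chunks, and the `chunk_words` helper and its parameters are hypothetical.

```python
from typing import List


def chunk_words(text: str, max_words: int = 50, overlap: int = 10) -> List[str]:
    """Split text into overlapping windows of at most max_words words.

    Simplified illustration; this is NOT the ClusteredSplit algorithm.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # this window already reached the end
            break
    return chunks


text = " ".join(f"w{i}" for i in range(25))
for chunk in chunk_words(text, max_words=10, overlap=2):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks, which a plain non-overlapping split would cut in half.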

```python
# Import necessary classes from Indox library
from indoxArcg.llms import IndoxApi
from indoxArcg.embeddings import IndoxApiEmbedding
from indoxArcg.data_loader_splitter import ClusteredSplit

# Create instances for API access and text embedding
openai_qa_indox = IndoxApi(api_key=INDOX_API_KEY)
embed_openai_indox = IndoxApiEmbedding(api_key=INDOX_API_KEY, model="text-embedding-3-small")

# Specify the path to your text file
file_path = "sample.txt"

# Create a ClusteredSplit instance for handling file loading and chunking
loader_splitter = ClusteredSplit(file_path=file_path, embeddings=embed_openai_indox, summary_model=openai_qa_indox)

# Load and split the document into chunks using ClusteredSplit
docs = loader_splitter.load_and_chunk()
```

```
INFO: Initialized IndoxOpenAIEmbedding with model: text-embedding-3-small
INFO: ClusteredSplit initialized successfully
INFO: Starting processing for documents
INFO: Embedding documents
INFO: Starting to fetch embeddings texts using engine: text-embedding-3-small
INFO: --Generated 1 clusters--
INFO: Completed chunking & clustering process
INFO: Successfully obtained all documents
```


Next, the `Deeplake` vector store handles the storage and retrieval of the embeddings. Given a collection name and an embedding function, it sets up a dataset where text embeddings can be stored and queried.
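Conceptually, a vector store keeps each chunk's embedding next to its text and answers a query by ranking the stored embeddings by similarity to the query's embedding. The hypothetical `ToyVectorStore` below illustrates that idea in memory with cosine similarity over hand-written 2-D vectors; it is not how Deep Lake is implemented, and real `text-embedding-3-small` embeddings have 1536 dimensions.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory illustration of store-then-query; not Deep Lake's implementation."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[List[float], str]] = {}

    def add(self, item_id: str, embedding: List[float], text: str) -> None:
        # Store the embedding alongside the original text, keyed by id
        self._items[item_id] = (embedding, text)

    def search(self, query_embedding: List[float], top_k: int = 5) -> List[Tuple[str, float]]:
        """Return (text, score) pairs for the top_k most similar stored items."""
        scored = [(text, cosine(query_embedding, emb)) for emb, text in self._items.values()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = ToyVectorStore()
store.add("a", [1.0, 0.0], "about cats")
store.add("b", [0.0, 1.0], "about ships")
store.add("c", [0.9, 0.1], "about kittens")
print(store.search([1.0, 0.0], top_k=2))
```

A brute-force scan like this is fine for a handful of vectors; dedicated vector stores exist largely to make this search and its storage scale.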

```python
from indoxArcg.vector_stores import Deeplake
collection_name = "sample"

db = Deeplake(collection_name=collection_name, embedding_function=embed_openai_indox)
```

```
Deep Lake Dataset in /content/vector_store/sample already exists, loading from the storage
```


### Adding the chunks to the vector store
This step stores the chunks produced by `ClusteredSplit` in the vector store set up above. `db.add(docs=docs)` embeds each chunk with `embed_openai_indox` and writes the results to the Deep Lake dataset.
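The progress output of `db.add` reports embeddings being created in batches. The batching idea itself is simple: the chunks are grouped into fixed-size lists, and each list is sent to the embedding model in one request. The hypothetical `batched` helper below sketches only that grouping; it is not the code Deep Lake or indoxArcg actually run.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items (the last may be shorter)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Five chunks with batch_size=2 produce batches of 2, 2 and 1 chunks
chunks = [f"chunk {i}" for i in range(5)]
for batch in batched(chunks, batch_size=2):
    # In a real pipeline, each batch would be sent to the embedding model here
    print(len(batch), batch)
```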

```python
db.add(docs=docs)
```

```
INFO: Embedding documents
INFO: Starting to fetch embeddings texts using engine: text-embedding-3-small
Creating 2 embeddings in 1 batches of size 2:: 100%|██████████| 1/1 [00:02<00:00,  2.60s/it]

Dataset(path='/content/vector_store/sample', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype       shape      dtype   compression
 ---------  ---------  ----------  -------  -----------
 embedding  embedding  (48, 1536)  float32     None
    id        text      (48, 1)      str       None
 metadata     json      (48, 1)      str       None
   text       text      (48, 1)      str       None
```





```python
from indoxArcg.pipelines.rag import RAG

# Combine the LLM and the Deeplake vector store, retrieving the top 5 chunks per query
retriever = RAG(llm=openai_qa_indox, vector_store=db, enable_web_fallback=False, top_k=5)
```

The `infer(query)` method sends the query to the retriever, which searches the vector store for the `top_k` most relevant text chunks and passes them to the language model, which then generates an answer based on the retrieved information.
The context is the detailed information the retriever used to answer the query. Inspecting it shows the relevant text chunks and any additional information used, giving insight into how the answer was produced.
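The retrieve-then-generate flow can be sketched in plain Python: keep the `top_k` highest-scoring chunks and place them in the prompt sent to the language model. The `build_rag_prompt` helper, its prompt wording, and the made-up chunks and scores below are hypothetical; indoxArcg's actual prompt template is not shown in this notebook.

```python
from typing import List, Tuple


def build_rag_prompt(query: str, scored_chunks: List[Tuple[str, float]], top_k: int = 5) -> str:
    """Assemble a grounded prompt from the top_k highest-scoring chunks.

    Hypothetical illustration; not indoxArcg's internal prompt format.
    """
    best = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)[:top_k]
    context = "\n\n".join(f"[{i + 1}] {text}" for i, (text, _score) in enumerate(best))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Made-up retrieval results: (chunk text, similarity score)
prompt = build_rag_prompt(
    "How does Cinderella reach her happy ending?",
    [
        ("The fairy godmother helps Cinderella.", 0.91),
        ("The prince searches the kingdom.", 0.85),
        ("An unrelated chunk.", 0.10),
    ],
    top_k=2,
)
print(prompt)
```

Restricting the prompt to the highest-scoring chunks keeps it short and focused, which is the role `top_k` plays when the `RAG` pipeline is created.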

```python
query = "How cinderella reach her happy ending?"

retriever.infer(query)
```

```
INFO: Retrieving context and scores from the vector database
INFO: Embedding documents
INFO: Starting to fetch embeddings texts using engine: text-embedding-3-small
INFO: Generating answer without document relevancy filter
INFO: Query answered successfully
```


'Cinderella reaches her happy ending through a series of transformative events that lead to her escape from a life of hardship and her eventual union with the prince. Here’s a summary of the key steps in her journey:\n\n1. **Kindness and Resilience**: Despite being mistreated by her stepmother and stepsisters, Cinderella remains kind and hopeful. Her resilience in the face of adversity sets the foundation for her eventual happiness.\n\n2. **The Invitation to the Ball**: When the royal family announces a ball to which all young women are invited, Cinderella dreams of attending. Although her stepfamily forbids her from going, her desire to participate in the event highlights her longing for a better life.\n\n3. **The Fairy Godmother**: In'