# DuckDB 
In this notebook, we demonstrate how to use DuckDB as a vector store for accessing and querying data efficiently. DuckDB is an in-process analytical database, which makes it a practical tool for data analysis, research, and question-answering systems.

To begin, make sure DuckDB is installed in your Python environment; you can install it with `pip install duckdb`. DuckDB runs inside your Python process and does not require a server, so you can start querying data in your local environment without any additional setup.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/duckdb.ipynb)

In [None]:
!pip install indox
!pip install duckdb
!pip install chromadb
!pip install semantic_text_splitter
!pip install sentence-transformers
!pip install python-dotenv

## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indox
```
2. **Activate the virtual environment:**
```bash
indox\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indox
```
2. **Activate the virtual environment:**
```bash
source indox/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```


### Load Hugging Face API Key

In [1]:
import os
from dotenv import load_dotenv

load_dotenv('api.env')

HUGGINGFACE_API_KEY = os.environ['HUGGINGFACE_API_KEY']
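
Note that `os.environ['HUGGINGFACE_API_KEY']` raises a bare `KeyError` when the variable is missing, for example when `api.env` is not found. A minimal sketch of a more explicit check follows; the helper name `require_env` is hypothetical and is not part of indox or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value or fail with a clear message.

    Hypothetical helper, for illustration only.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to api.env or export it before running."
        )
    return value


# Usage, after load_dotenv('api.env') has run:
# HUGGINGFACE_API_KEY = require_env("HUGGINGFACE_API_KEY")
```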

In [2]:
from indox import IndoxRetrievalAugmentation
indox = IndoxRetrievalAugmentation()

[32mINFO[0m: [1mIndoxRetrievalAugmentation initialized[0m

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            


Initialize a language model and an embedding model using the indox library with Hugging Face. The `HuggingFaceModel` class creates an instance of the Mistral-7B-Instruct model for tasks such as question answering, while `HuggingFaceEmbedding` handles embedding with the `multi-qa-mpnet-base-cos-v1` model.

In [3]:
from indox.llms import HuggingFaceModel
from indox.embeddings import HuggingFaceEmbedding
mistral_qa = HuggingFaceModel(api_key=HUGGINGFACE_API_KEY, model="mistralai/Mistral-7B-Instruct-v0.2")
embed = HuggingFaceEmbedding(api_key=HUGGINGFACE_API_KEY, model="multi-qa-mpnet-base-cos-v1")

[32mINFO[0m: [1mInitializing HuggingFaceModel with model: mistralai/Mistral-7B-Instruct-v0.2[0m
[32mINFO[0m: [1mHuggingFaceModel initialized successfully[0m
[32mINFO[0m: [1mInitialized HuggingFaceEmbedding with model: multi-qa-mpnet-base-cos-v1[0m


### Load Sample Text

In [4]:
file_path = "sample.txt"
with open(file_path, "r") as file:
    text = file.read()

Use the `RecursiveCharacterTextSplitter` class from the indox library to divide the text into smaller, manageable chunks.

In [5]:
from indox.splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(400, 20)
content_chunks = splitter.split_text(text)
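
To make the chunking step concrete, here is a simplified sketch of splitting text into overlapping windows. It assumes the two positional arguments above are a chunk size of 400 characters and an overlap of 20, which the notebook does not state explicitly. The function `split_with_overlap` is hypothetical; it is a plain sliding window, not the recursive, separator-aware splitting that `RecursiveCharacterTextSplitter` performs.

```python
def split_with_overlap(text: str, chunk_size: int = 400, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters.

    Hypothetical helper, for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij" * 100  # 1000 characters
chunks = split_with_overlap(sample)
print(len(chunks), [len(c) for c in chunks])  # 3 [400, 400, 240]
```

The overlap means the last 20 characters of one chunk reappear at the start of the next, so a sentence cut at a chunk boundary is still partly visible in both chunks.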

In [6]:
content_chunks

["The wife of a rich man fell sick, and as she felt that her end\n\nwas drawing near, she called her only daughter to her bedside and\n\nsaid, dear child, be good and pious, and then the\n\ngood God will always protect you, and I will look down on you\n\nfrom heaven and be near you.  Thereupon she closed her eyes and\n\ndeparted.  Every day the maiden went out to her mother's grave,",
 'and wept, and she remained pious and good.  When winter came\n\nthe snow spread a white sheet over the grave, and by the time the\n\nspring sun had drawn it off again, the man had taken another wife.\n\nThe woman had brought with her into the house two daughters,\n\nwho were beautiful and fair of face, but vile and black of heart.\n\nNow began a bad time for the poor step-child.  Is the stupid goose',
 'to sit in the parlor with us, they said.  He who wants to eat bread\n\nmust earn it.  Out with the kitchen-wench.  They took her pretty\n\nclothes away from her, put an old grey bedgown on her, and gave\

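### Set Up the Vector Store
Set up a vector store using the `DuckDB` class from the indox library. The `vector_key`, `id_key`, and `text_key` arguments name the fields that hold the embedding, the identifier, and the chunk text, and `table_name` names the table they are stored in.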

In [7]:
from indox.vector_stores import DuckDB
vector_store = DuckDB(
    embedding_function=embed,
    vector_key="embedding",   
    id_key="id",              
    text_key="text",          
    table_name="embeddings"
)
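
Conceptually, each stored record pairs a chunk of text with its embedding and an identifier, under the three key names configured above. The sketch below illustrates that record shape in plain Python. `build_rows` and `toy_embed` are hypothetical stand-ins, the use of random UUID strings as identifiers is an assumption, and how indox actually lays out the DuckDB table is not shown in this notebook.

```python
import uuid


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model: three crude features."""
    return [float(len(text)), float(text.count(" ")), float(sum(map(ord, text)) % 97)]


def build_rows(texts, embed_fn, id_key="id", text_key="text", vector_key="embedding"):
    """Build one record per text, keyed by the configured field names."""
    rows = []
    for text in texts:
        rows.append(
            {
                id_key: str(uuid.uuid4()),   # assumed: random UUID per chunk
                text_key: text,              # the original chunk text
                vector_key: embed_fn(text),  # the chunk's embedding vector
            }
        )
    return rows


rows = build_rows(["first chunk", "second chunk"], toy_embed)
for row in rows:
    print(row["id"], row["text"], row["embedding"])
```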

### Storing Data in the Vector Store

In [8]:
vector_store.add(texts=content_chunks)

[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using model: SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)[0m



['899c965b-d352-4df5-902f-7b3778354a00',
 'ceb6e2f6-5d79-4105-933b-96cfe81b3bc1',
 '02a34963-ee70-4895-8dea-e846c022233d',
 'b3228b7f-fdcb-4b31-8377-3295b4f23246',
 '1978ca17-8f99-4ffe-8ed3-bb26eea44117',
 '8d79f2ef-90d1-465d-ad74-19f6648fe589',
 '9969dd11-7d27-4a1c-a63f-a5b09a9e347f',
 '49d10bc5-b0ca-4182-b652-524dfd207dae',
 'f298f90a-0d9e-428d-bc0a-4cd7a1ea83af',
 '5e39cd79-cfbf-4489-835c-c35c503f04e6',
 'fdb61563-be85-4c99-a4d7-d982f5a86dba',
 '632dd926-4052-4932-a189-10eb3dd6891d',
 '6f1e76a3-4f01-430e-ac51-a34184af2bfd',
 '4da1d23c-dd2f-4303-acfd-778b9ef906e6',
 '93e80298-47c5-4869-a2b2-1e6184162f52',
 'bb3c4763-2127-4399-994c-b319a8bcda59',
 '5b5bc067-c743-4e7c-b9ed-fd6bde84b9c3',
 '4a244958-7c75-4930-a1f4-ccadaa72a25f',
 '639674c4-c18a-4ed0-b1e9-6992daf1e2a3',
 '1ccbd68c-23bd-4196-b021-44a06b9db4f6',
 '4273aa39-ce16-4b4b-8789-fa45e9639c41',
 '0c81ee55-70ce-4a8e-825b-256129eead66',
 '41a14c4d-c6db-4fa1-ad6e-229426be6fac',
 '2e1a9a3a-5cfe-49f1-85c2-b46c2d27b38b',
 'eb64854f-88ba-

In [9]:
query = "How cinderella reach her happy ending?"


In [10]:
retriever = indox.QuestionAnswer(vector_database=vector_store, llm=mistral_qa, top_k=5)
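
The `top_k=5` argument presumably limits retrieval to the five chunks most similar to the query. A minimal sketch of top-k retrieval by cosine similarity is given here in plain Python. The function names are hypothetical, and cosine similarity is an assumed metric, since the notebook does not state which one the DuckDB store uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=5):
    """Return the k (score, text) pairs with the highest similarity to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones here have 768 dimensions.
rows = [
    ("chunk about the ball", [0.9, 0.1, 0.0]),
    ("chunk about the hazel-tree", [0.1, 0.9, 0.0]),
    ("chunk about the slipper", [0.7, 0.3, 0.1]),
]
for score, text in top_k([1.0, 0.0, 0.0], rows, k=2):
    print(f"{score:.3f}  {text}")
```

The retrieved chunks are then passed to the language model as context for answering the query.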


In [11]:
answer = retriever.invoke(query=query)


[32mINFO[0m: [1mRetrieving context and scores from the vector database[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using model: SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)[0m



[32mINFO[0m: [1mGenerating answer without document relevancy filter[0m
[32mINFO[0m: [1mAnswering question[0m
[32mINFO[0m: [1mSending request to Hugging Face API[0m
[32mINFO[0m: [1mReceived successful response from Hugging Face API[0m
[32mINFO[0m: [1mQuery answered successfully[0m


In [12]:
answer

"Cinderella reached her happy ending when the prince recognized her as the beautiful maiden he had met at the ball, after she had wished for the items the bird dropped for her from the tree. Despite her step-mother and step-sisters' attempts to prevent her from attending the royal festival, Cinderella went to the hazel-tree and repeated the magic words, causing silver and gold to be showered upon her. The prince was once again drawn to her, and they"

In [13]:
context = retriever.context
context

['by the hearth in the cinders.  And as on that account she always\n\nlooked dusty and dirty, they called her cinderella.\n\nIt happened that the father was once going to the fair, and he\n\nasked his two step-daughters what he should bring back for them.\n\nBeautiful dresses, said one, pearls and jewels, said the second.\n\nAnd you, cinderella, said he, what will you have.  Father',
 'cinderella expressed a wish, the bird threw down to her what she\n\nhad wished for.\n\nIt happened, however, that the king gave orders for a festival\n\nwhich was to last three days, and to which all the beautiful young\n\ngirls in the country were invited, in order that his son might choose\n\nhimself a bride.  When the two step-sisters heard that they too were',
 "glove.  And when she rose up and the king's son looked at her\n\nface he recognized the beautiful maiden who had danced with\n\nhim and cried, that is the true bride.  The step-mother and\n\nthe two sisters were horrified and became pale with