# DuckDB 
In this notebook, we demonstrate how to use DuckDB to access and query data efficiently. DuckDB is designed for analytical workloads, which makes it a useful tool for data analysis, research, and question-answering systems.

To begin, ensure you have DuckDB installed in your Python environment. You can easily install it using `pip install duckdb`. DuckDB does not require a server, so you can start querying data directly in your local environment without any additional setup.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/cookbook/indoxArcg/duckdb.ipynb)

In [None]:
!pip install duckdb semantic_text_splitter sentence-transformers indoxArcg

## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indoxArcg`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indoxArcg
```
2. **Activate the virtual environment:**
```bash
indoxArcg\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indoxArcg
```
2. **Activate the virtual environment:**
```bash
source indoxArcg/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```


### Load the Hugging Face and OpenAI API Keys

In [2]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
HUGGINGFACE_API_KEY = os.environ['HUGGINGFACE_API_KEY']
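
Indexing `os.environ` directly raises a bare `KeyError` when a variable is missing. One possible alternative is a small helper that reports every missing key at once. The helper name `require_env` and the `DEMO_API_KEY` variable are hypothetical and used only for this sketch.

```python
import os


def require_env(*names):
    """Return the values of the given environment variables.

    Raises a single RuntimeError that lists every missing or empty variable.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return [os.environ[name] for name in names]


# Demonstration with a placeholder variable set in-process.
os.environ["DEMO_API_KEY"] = "placeholder-value"
(demo_key,) = require_env("DEMO_API_KEY")
print(demo_key)
```

In this notebook the same helper could be called as `require_env("OPENAI_API_KEY", "HUGGINGFACE_API_KEY")` after `load_dotenv()`.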

Initialize a language model and an embedding model using the indoxArcg library. The `HuggingFaceAPIModel` class creates an instance of the Mistral-7B-Instruct model for tasks such as question answering, while `AzureOpenAIEmbeddings` generates the text embeddings with the `text-embedding-3-small` model.

In [4]:
from indoxArcg.llms import HuggingFaceAPIModel
from indoxArcg.embeddings import AzureOpenAIEmbeddings

# Question-answering LLM accessed through the Hugging Face API
mistral_qa = HuggingFaceAPIModel(api_key=HUGGINGFACE_API_KEY, model="mistralai/Mistral-7B-Instruct-v0.2")
# Embedding model used to vectorize the text chunks and the query
azure_embed = AzureOpenAIEmbeddings(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

[32mINFO[0m: [1mInitializing HuggingFaceAPIModel with model: mistralai/Mistral-7B-Instruct-v0.2[0m
[32mINFO[0m: [1mHuggingFaceAPIModel initialized successfully[0m
[32mINFO[0m: [1mInitialized OpenAiEmbedding with model: text-embedding-3-small[0m


### Load Sample Text

In [None]:
!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt

In [5]:
file_path = "sample.txt"
with open(file_path, "r") as file:
    text = file.read()

Use the `RecursiveCharacterTextSplitter` class from the indoxArcg library to divide the text into smaller, manageable chunks.

In [6]:
from indoxArcg.splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(400,20)
content_chunks = splitter.split_text(text)
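
To illustrate the general idea of chunking with overlap, here is a simplified fixed-window splitter. It is not the recursive algorithm used by `RecursiveCharacterTextSplitter`, and it assumes that the two positional arguments above (400 and 20) stand for chunk size and overlap, which the notebook does not state explicitly. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=400, overlap=20):
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


sample = "abcdefghij" * 100  # 1000 characters of toy input
demo_chunks = split_with_overlap(sample)
print(len(demo_chunks), [len(c) for c in demo_chunks])
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.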

In [7]:
content_chunks[:3]

["The wife of a rich man fell sick, and as she felt that her end\n\nwas drawing near, she called her only daughter to her bedside and\n\nsaid, dear child, be good and pious, and then the\n\ngood God will always protect you, and I will look down on you\n\nfrom heaven and be near you.  Thereupon she closed her eyes and\n\ndeparted.  Every day the maiden went out to her mother's grave,",
 'and wept, and she remained pious and good.  When winter came\n\nthe snow spread a white sheet over the grave, and by the time the\n\nspring sun had drawn it off again, the man had taken another wife.\n\nThe woman had brought with her into the house two daughters,\n\nwho were beautiful and fair of face, but vile and black of heart.\n\nNow began a bad time for the poor step-child.  Is the stupid goose',
 'to sit in the parlor with us, they said.  He who wants to eat bread\n\nmust earn it.  Out with the kitchen-wench.  They took her pretty\n\nclothes away from her, put an old grey bedgown on her, and gave\

### Set up vector store
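Set up a vector store using the `DuckDB` class from the indoxArcg library. The `vector_key`, `id_key`, and `text_key` arguments name the fields that hold the embedding, the identifier, and the chunk text, and `table_name` names the table that stores them.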

In [8]:
from indoxArcg.vector_stores import DuckDB
vector_store = DuckDB(
    embedding_function=azure_embed,
    vector_key="embedding",   
    id_key="id",              
    text_key="text",          
    table_name="embeddings"
)

2024-12-08 17:03:31,777 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
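
Conceptually, a vector store keeps one record per chunk and ranks records by similarity to a query vector. The sketch below shows that idea in plain Python with cosine similarity over toy two-dimensional vectors. It is an illustration only and makes no claim about how the indoxArcg `DuckDB` class implements search; the helper names `cosine_similarity` and `top_k` are hypothetical, and the record fields simply mirror the `id`, `text`, and `embedding` keys configured above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, records, k=5):
    """Return the k records most similar to query_vec as (score, id, text) tuples."""
    scored = [
        (cosine_similarity(query_vec, r["embedding"]), r["id"], r["text"])
        for r in records
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Toy records with made-up embeddings.
records = [
    {"id": "a", "text": "chunk about the ball", "embedding": [1.0, 0.0]},
    {"id": "b", "text": "chunk about the kitchen", "embedding": [0.0, 1.0]},
    {"id": "c", "text": "chunk about the prince", "embedding": [0.8, 0.6]},
]
best = top_k([1.0, 0.1], records, k=2)
print([record_id for _, record_id, _ in best])
```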


### Storing Data in the Vector Store

In [9]:
vector_store.add(texts=content_chunks)

[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using engine: text-embedding-3-small[0m


2024-12-08 17:03:34,939 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-12-08 17:03:35,679 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-12-08 17:03:36,414 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-12-08 17:03:37,061 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-12-08 17:03:37,732 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-12-08 17:03:38,498 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-12-08 17:03:39,164 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-12-08 17:03:39,601 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-12-08 17:03:40,028 - httpx - INFO - HTTP Request: POST https://api.openai.c

['9bdf5adf-c389-4702-a12a-b81c92463f5c',
 'a3bac611-fc45-4b5e-ac30-8976eb69f971',
 '3f9bb310-2933-45e1-82d7-fb68fa0b6e67',
 'ac0a5b85-f013-4348-a844-fef414fa24b3',
 '4058a03a-2a12-44f6-a4e1-a7fa2846243b',
 '1902e169-9113-4e2b-bb0d-987073931315',
 '173de74f-9268-4c65-8023-73d1e4a94723',
 '744d3820-fcd0-4e2a-b386-15fe4e0cd309',
 '5040d7ee-63f9-4edf-ac52-46f09c90f2ba',
 '666e6dfb-1c95-432a-b28e-aaa83e16e96a',
 '15da04a3-fa30-4d15-b564-0620d0519405',
 '0e2a2afc-435d-4600-add5-b3337f9fd779',
 '943bac28-5cd9-4df4-b637-a785ec6161e0',
 '5bcebbc8-b437-4cd8-8f85-f93535edb0e5',
 '9a47b82a-569c-4972-bdca-86425a4e2ee3',
 '7257d6ed-e325-4404-9e92-f756a306d935',
 'd4bca687-ffc8-4d65-b0a4-234e4e8fd893',
 '5464cca8-cd5b-4609-ba2b-96cd0b783a0d',
 '1aa002e0-b3ff-4ebe-9022-79c88f8a3879',
 'af27cf5a-b575-4fb4-acb7-79386dda7065',
 '2229c46c-145a-483e-98dc-e48dfecd6088',
 'b5095a93-501f-4d89-ac46-740a07159bb8',
 'ebb3ecad-7619-42c5-a2a3-3f4b983035a3',
 '2cf222bb-a3ed-45c1-a321-fbe74a9be719',
 'f9267c35-02d8-

In [10]:
from indoxArcg.pipelines.rag import RAG

query = "How does Cinderella reach her happy ending?"
# Retrieve the 5 most similar chunks from the vector store; web fallback is disabled
retriever = RAG(llm=mistral_qa, vector_store=vector_store, enable_web_fallback=False, top_k=5)
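
In a typical retrieval-augmented generation setup, the retrieved chunks are combined with the question into a single prompt for the language model. The following sketch shows one generic way to do that. The function `build_prompt`, its prompt wording, and the `max_chars` budget are hypothetical and are not taken from the indoxArcg `RAG` class, whose internal prompt format is not shown in this notebook.

```python
def build_prompt(question, contexts, max_chars=2000):
    """Assemble a prompt from numbered context chunks, stopping at a character budget."""
    selected = []
    used = 0
    for index, chunk in enumerate(contexts, start=1):
        entry = f"[{index}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break  # adding this chunk would exceed the budget
        selected.append(entry)
        used += len(entry)

    context_block = "\n".join(selected)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


demo_prompt = build_prompt(
    "Who helped the maiden?",
    ["The doves came to the hazel tree. ", "The prince searched for the maiden."],
)
print(demo_prompt)
```

A larger `top_k` gives the model more context to draw on, at the cost of a longer prompt.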

In [12]:
answer = retriever.infer(query=query)


[32mINFO[0m: [1mRetrieving context and scores from the vector database[0m
[32mINFO[0m: [1mEmbedding documents[0m
[32mINFO[0m: [1mStarting to fetch embeddings for texts using engine: text-embedding-3-small[0m


2024-12-08 17:03:55,452 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


[32mINFO[0m: [1mGenerating answer without document relevancy filter[0m
[32mINFO[0m: [1mAnswering question[0m
[32mINFO[0m: [1mSending request to Hugging Face API[0m
[32mINFO[0m: [1mReceived successful response from Hugging Face API[0m
[32mINFO[0m: [1mQuery answered successfully[0m


In [13]:
answer

"Cinderella reaches her happy ending when she attends the king's palace for the wedding wearing a golden dress and glass slippers that were magically given to her by her fairy godmother. Her step-sisters and mother do not recognize her, and she dances with the prince, who falls in love with her and identifies her as the mysterious maiden he had previously met at the ball. As they ride away together, two white doves from the hazel tree"