# SingleStoreDB
This notebook demonstrates how to use SingleStoreDB as a vector store for retrieval and question answering with the indox library. SingleStoreDB is designed for modern analytical workloads, which makes it a practical choice for data analysis, research, and question-answering systems.

To begin, ensure you have singlestoredb installed in your Python environment. You can easily install it using `pip install singlestoredb`. 

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/singlestoredb.ipynb)

In [None]:
!pip install indox
!pip install singlestoredb
!pip install semantic_text_splitter
!pip install sentence-transformers
!pip install python-dotenv

## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indox
```
2. **Activate the virtual environment:**
```bash
indox\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indox
```

2. **Activate the virtual environment:**
```bash
source indox/bin/activate
```
### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```


### Load the Hugging Face API Key

In [16]:
import os
from dotenv import load_dotenv

load_dotenv('api.env')

HUGGINGFACE_API_KEY = os.environ['HUGGINGFACE_API_KEY']
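Indexing `os.environ` directly raises a bare `KeyError` when the variable is missing, which can be confusing if `api.env` was not found. The following is a minimal sketch of a friendlier check; `require_env` and `EXAMPLE_API_KEY` are hypothetical names used only for illustration and are not part of indox or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to api.env or export it in your shell."
        )
    return value


# Example with a throwaway variable so the sketch runs anywhere.
os.environ["EXAMPLE_API_KEY"] = "hf_dummy_value"
print(require_env("EXAMPLE_API_KEY"))
```

In the notebook, the same helper could be called with `"HUGGINGFACE_API_KEY"` after `load_dotenv('api.env')`.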

In [17]:
from indox import IndoxRetrievalAugmentation
indox = IndoxRetrievalAugmentation()

INFO: IndoxRetrievalAugmentation initialized

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            


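The next cell initializes a language model and an embedding model through the indox library's Hugging Face integrations. `HuggingFaceModel` creates a client for the Mistral-7B-Instruct-v0.2 model, used here for question answering, and `HuggingFaceEmbedding` loads the `multi-qa-mpnet-base-cos-v1` sentence-transformers model to embed text.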

In [18]:
from indox.llms import HuggingFaceModel
from indox.embeddings import HuggingFaceEmbedding


mistral_qa = HuggingFaceModel(api_key=HUGGINGFACE_API_KEY, model="mistralai/Mistral-7B-Instruct-v0.2")
embed = HuggingFaceEmbedding(api_key=HUGGINGFACE_API_KEY, model="multi-qa-mpnet-base-cos-v1")

INFO: Initializing HuggingFaceModel with model: mistralai/Mistral-7B-Instruct-v0.2
INFO: HuggingFaceModel initialized successfully

2024-09-08 14:17:40,160 - sentence_transformers.SentenceTransformer - INFO - Use pytorch device_name: cuda
2024-09-08 14:17:40,161 - sentence_transformers.SentenceTransformer - INFO - Load pretrained SentenceTransformer: multi-qa-mpnet-base-cos-v1

INFO: Initialized HuggingFaceEmbedding with model: multi-qa-mpnet-base-cos-v1


### Load Sample Text

In [19]:
!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt

--2024-09-08 14:17:47--  https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14025 (14K) [text/plain]
Saving to: ‘sample.txt.7’

2024-09-08 14:17:49 (121 MB/s) - ‘sample.txt.7’ saved [14025/14025]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
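The log above shows the file being saved as `sample.txt.7`: when `sample.txt` already exists, `wget` keeps it and writes a numbered copy, so the next cell reads the earlier download. If you would rather overwrite a fixed filename on every run, one option is to download with the standard library instead. This is only a sketch; `download_to` is a hypothetical helper, not something indox provides.

```python
import urllib.request
from pathlib import Path


def download_to(url: str, destination: str) -> Path:
    """Download url to destination, overwriting any existing file."""
    path = Path(destination)
    urllib.request.urlretrieve(url, str(path))
    return path


# Usage in the notebook (requires network access):
# download_to(
#     "https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt",
#     "sample.txt",
# )
```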



In [20]:
# Read the downloaded sample text into memory.
file_path = "sample.txt"
with open(file_path, "r", encoding="utf-8") as file:
    text = file.read()

Use the `RecursiveCharacterTextSplitter` class from the indox library to divide the text into smaller, manageable chunks. The two positional arguments, 400 and 20, most likely set the maximum chunk size and the overlap between consecutive chunks.

In [21]:
from indox.splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(400, 20)
content_chunks = splitter.split_text(text)
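To make the idea of a chunk size and an overlap concrete, here is a deliberately simplified sketch that slices text into fixed-size character windows. It is not how indox's recursive splitter works internally (a recursive splitter normally prefers to break on separators such as paragraph breaks), and `chunk_with_overlap` is a hypothetical function written only for this illustration.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters, where each
    window starts (chunk_size - overlap) characters after the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


demo = chunk_with_overlap("abcdefghij", chunk_size=4, overlap=1)
print(demo)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the role the overlap plays: text that straddles a boundary still appears intact in at least one chunk.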

In [22]:
content_chunks[:3]

["The wife of a rich man fell sick, and as she felt that her end\n\nwas drawing near, she called her only daughter to her bedside and\n\nsaid, dear child, be good and pious, and then the\n\ngood God will always protect you, and I will look down on you\n\nfrom heaven and be near you.  Thereupon she closed her eyes and\n\ndeparted.  Every day the maiden went out to her mother's grave,",
 'and wept, and she remained pious and good.  When winter came\n\nthe snow spread a white sheet over the grave, and by the time the\n\nspring sun had drawn it off again, the man had taken another wife.\n\nThe woman had brought with her into the house two daughters,\n\nwho were beautiful and fair of face, but vile and black of heart.\n\nNow began a bad time for the poor step-child.  Is the stupid goose',
 'to sit in the parlor with us, they said.  He who wants to eat bread\n\nmust earn it.  Out with the kitchen-wench.  They took her pretty\n\nclothes away from her, put an old grey bedgown on her, and gave\

### Set up vector store
Set up a vector store using the `SingleStoreVectorDB` class from the indox library.

In [23]:
from indox.vector_stores import SingleStoreVectorDB

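# Replace the placeholder values with your own SingleStoreDB credentials.
connection_params = {
    "host": "host",
    "port": 3306,  # SingleStoreDB's default port; change it if your deployment differs
    "user": "user",
    "password": "password",
    "database": "databasename"
}

db = SingleStoreVectorDB(connection_params=connection_params, embedding_function=embed)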

Vector index 'idx_embeddings_vector' already exists.
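Hard-coding credentials in a notebook makes them easy to leak. As a sketch, the same dictionary can be built from environment variables, in the same spirit as the Hugging Face key above. The `SINGLESTORE_*` variable names and the `connection_params_from_env` helper are assumptions made for this example; neither indox nor singlestoredb requires them.

```python
import os


def connection_params_from_env(env=os.environ) -> dict:
    """Build the connection dictionary from environment variables.

    The SINGLESTORE_* names are hypothetical; pick any names you like.
    """
    return {
        "host": env.get("SINGLESTORE_HOST", "localhost"),
        "port": int(env.get("SINGLESTORE_PORT", "3306")),
        "user": env["SINGLESTORE_USER"],
        "password": env["SINGLESTORE_PASSWORD"],
        "database": env["SINGLESTORE_DATABASE"],
    }


# Demo with an in-memory mapping instead of the real environment.
example_env = {
    "SINGLESTORE_HOST": "db.example.com",
    "SINGLESTORE_USER": "user",
    "SINGLESTORE_PASSWORD": "password",
    "SINGLESTORE_DATABASE": "databasename",
}
params = connection_params_from_env(example_env)
print({k: ("***" if k == "password" else v) for k, v in params.items()})
```

The resulting `params` dictionary has the same keys as `connection_params` in the cell above, so it could be passed to `SingleStoreVectorDB` in its place.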


### Storing Data in the Vector Store

In [24]:
db.add_texts(content_chunks)

INFO: Embedding documents
INFO: Starting to fetch embeddings for texts using model: SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)


Batches: 100%|██████████| 2/2 [00:00<00:00, 11.63it/s]
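The model summary in the log ends with a `Normalize()` layer, so each 768-dimensional embedding is scaled to unit length before it is stored. For unit-length vectors, the dot product equals the cosine similarity, which the small standard-library sketch below checks on toy 2-dimensional vectors. The helper names are illustrative only, and which distance metric the vector index actually uses is not shown in this notebook.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine(a, b))
print(dot(normalize(a), normalize(b)))
```

Both printed values agree up to floating-point rounding.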


### Answering a Query

In [25]:
query = "How did Cinderella reach her happy ending?"


In [26]:
retriever = indox.QuestionAnswer(vector_database=db, llm=mistral_qa, top_k=5)
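Conceptually, a retrieval-augmented pipeline with `top_k=5` ranks the stored chunks by similarity to the query, keeps the five best, and passes them to the language model as context. The sketch below illustrates that flow with made-up chunks and scores. It is not indox's implementation, and the prompt wording, `top_k_chunks`, and `build_prompt` are all assumptions made for this example.

```python
def top_k_chunks(scored_chunks, k):
    """Return the k chunk texts with the highest similarity scores."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [text for text, _score in ranked[:k]]


def build_prompt(question, context_chunks):
    """Join the retrieved chunks into a single prompt for the language model."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy (text, score) pairs; the sentences and scores are invented for illustration.
scored = [
    ("The prince searched for the owner of the slipper.", 0.71),
    ("Snow covered the grave in winter.", 0.12),
    ("The slipper fitted her foot perfectly.", 0.64),
]
prompt = build_prompt("Who did the slipper fit?", top_k_chunks(scored, k=2))
print(prompt)
```

A larger `top_k` gives the model more context at the cost of a longer prompt; a smaller one keeps the prompt short but may drop a relevant chunk.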


In [27]:
answer = retriever.invoke(query=query)

INFO: Retrieving context and scores from the vector database
INFO: Embedding documents
INFO: Starting to fetch embeddings for texts using model: SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

Batches: 100%|██████████| 1/1 [00:00<00:00, 84.55it/s]

INFO: Generating answer without document relevancy filter
INFO: Answering question
INFO: Sending request to Hugging Face API
INFO: Received successful response from Hugging Face API
INFO: Query answered successfully


In [28]:
answer

'Cinderella reached her happy ending by escaping from her wicked stepmother and stepsisters and attending the royal ball in disguise. When the prince saw her there, he was instantly attracted to her and identified her as the mysterious maiden he had met earlier. After recognizing each other, they rode away together and live happily ever after. However, due to the wickedness of the stepmother and stepsisters, they tried to prevent Cinderella from attending the ball by'