# Quick Start

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/cookbook/indoxArcg/quick_start.ipynb)

In [None]:
!pip install indoxArcg
!pip install openai
!pip install chromadb
!pip install semantic_text_splitter

## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indoxArcg`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indoxArcg
```
2. **Activate the virtual environment:**
```bash
indoxArcg\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indoxArcg
```
2. **Activate the virtual environment:**
```bash
source indoxArcg/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```
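
To confirm that the interpreter running your notebook or script is the one inside the virtual environment, a quick check along these lines can help. This is a minimal sketch: the helper name `in_virtualenv` is our own and is not part of indoxArcg. It relies on the standard-library behaviour that `sys.prefix` differs from `sys.base_prefix` inside an environment created with `venv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the first line prints `False`, the environment is probably not activated in the shell or kernel you are using.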


In [None]:
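import sys
import os

# Optional: only needed when importing indoxArcg from a local checkout of the
# repository instead of the pip-installed package. Replace the path with your own.
module_path = os.path.abspath('E:/Codes/inDox/libs/indoxArcg')
if module_path not in sys.path:
    sys.path.append(module_path)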



In [2]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
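
`os.environ['OPENAI_API_KEY']` raises a bare `KeyError` when the variable is missing, for example when no `.env` file is present. A slightly friendlier variant might look like the sketch below. `require_env` is a hypothetical helper introduced here for illustration; it is not part of indoxArcg or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it "
            "in your shell before running this notebook."
        )
    return value
```

After `load_dotenv()`, the key could then be read with `OPENAI_API_KEY = require_env("OPENAI_API_KEY")`.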

## Initial Setup

The following imports set up the Indox application: the retrieval-augmented generation pipeline, a question-answering language model, an embedding model, and a data loader and text splitter.

### Generating responses using OpenAI's language models
The `OpenAi` class handles question answering with OpenAI's language models, and the `OpenAiEmbedding` class specifies the embedding model. `Chroma` handles the storage and retrieval of vector embeddings: given a collection name and an embedding function, it sets up a vector store where text embeddings can be stored and queried.

In [None]:
!pip install --upgrade numpy transformers

In [None]:
# Initialize OpenAI language model and embedding model with API key
# Set up Chroma vector store for storing and retrieving embeddings
from indoxArcg.embeddings import OpenAiEmbedding
from indoxArcg.llms import OpenAi # if you have an issue, please run "!pip install --upgrade numpy transformers"

openai_qa = OpenAi(api_key=OPENAI_API_KEY, model="gpt-4o-mini")
embed_openai = OpenAiEmbedding(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

from indoxArcg.vector_stores import Chroma
db = Chroma(collection_name="sample", embedding_function=embed_openai)

[32mINFO[0m: [1mInitializing OpenAi with model: gpt-4o-mini[0m
[32mINFO[0m: [1mOpenAi initialized successfully[0m
[32mINFO[0m: [1mInitialized OpenAiEmbedding with model: text-embedding-3-small[0m


2025-01-20 19:53:52,178 - chromadb.telemetry.product.posthog - INFO - Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.


### Load and preprocess data
This part of the code loads text from a file, splits it into chunks, and stores the chunks in the vector store that was set up previously.

In [None]:
# !wget https://raw.githubusercontent.com/osllmai/inDox/doc-v3/cookbook/indoxArcg/sample.txt

# # Check if the file is downloaded and display its content
# with open('sample.txt', 'r') as file:
#     content = file.read()
#     print(content)

The wife of a rich man fell sick, and as she felt that her end
was drawing near, she called her only daughter to her bedside and
said, dear child, be good and pious, and then the
good God will always protect you, and I will look down on you
from heaven and be near you.  Thereupon she closed her eyes and
departed.  Every day the maiden went out to her mother's grave,
and wept, and she remained pious and good.  When winter came
the snow spread a white sheet over the grave, and by the time the
spring sun had drawn it off again, the man had taken another wife.
The woman had brought with her into the house two daughters,
who were beautiful and fair of face, but vile and black of heart.
Now began a bad time for the poor step-child.  Is the stupid goose
to sit in the parlor with us, they said.  He who wants to eat bread
must earn it.  Out with the kitchen-wench.  They took her pretty
clothes away from her, put an old grey bedgown on her, and gave
her wooden shoes.  Just look at the proud prin



In [2]:
file_path = "sample.txt"
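
The loader expects `sample.txt` to exist locally, and `wget` is not available on every platform. A portable alternative using only the standard library could look like this. It is a sketch under assumptions: the URL is copied from the commented-out `wget` command above, and `download_if_missing` is a hypothetical helper name, not an indoxArcg function.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL taken from the commented-out wget command in the previous cell.
SAMPLE_URL = (
    "https://raw.githubusercontent.com/osllmai/inDox/"
    "doc-v3/cookbook/indoxArcg/sample.txt"
)


def download_if_missing(url: str, dest: str) -> Path:
    """Download url to dest unless the file already exists; return the path."""
    path = Path(dest)
    if not path.exists():
        # urlretrieve streams the response body into the given file name.
        urlretrieve(url, str(path))
    return path
```

Calling `download_if_missing(SAMPLE_URL, file_path)` before the loader cell would fetch the file once and reuse it on later runs.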

In [None]:
from indoxArcg.data_loaders import Txt

loader = Txt(txt_path=file_path)
doc = loader.load()
doc

In [None]:
# Split the loaded document into smaller chunks of text using SemanticTextSplitter
from indoxArcg.splitter import SemanticTextSplitter
splitter = SemanticTextSplitter(chunk_size=400)
docs = splitter.split_text(doc)
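
To make the idea of chunking concrete, here is a toy splitter that greedily packs whole sentences into chunks up to a character budget. This is not the algorithm used by `SemanticTextSplitter`, and its `chunk_size` may be measured in different units; `split_by_sentences` and `max_chars` are names invented for this illustration.

```python
import re


def split_by_sentences(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own oversized chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


sample = (
    "The wife of a rich man fell sick. She called her only daughter to her "
    "bedside. Every day the maiden went out to her mother's grave."
)
for chunk in split_by_sentences(sample, max_chars=80):
    print(len(chunk), repr(chunk))
```

Keeping sentences intact means each chunk stays readable on its own, which tends to help both embedding quality and the answers generated from the retrieved text.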

In [8]:
docs

["The wife of a rich man fell sick, and as she felt that her end\nwas drawing near, she called her only daughter to her bedside and\nsaid, dear child, be good and pious, and then the\ngood God will always protect you, and I will look down on you\nfrom heaven and be near you.  Thereupon she closed her eyes and\ndeparted.  Every day the maiden went out to her mother's grave,\nand wept, and she remained pious and good.  When winter came\nthe snow spread a white sheet over the grave, and by the time the\nspring sun had drawn it off again, the man had taken another wife.\nThe woman had brought with her into the house two daughters,\nwho were beautiful and fair of face, but vile and black of heart.\nNow began a bad time for the poor step-child.  Is the stupid goose\nto sit in the parlor with us, they said.  He who wants to eat bread\nmust earn it.  Out with the kitchen-wench.  They took her pretty\nclothes away from her, put an old grey bedgown on her, and gave\nher wooden shoes.  Just look 

In [9]:
db.add(docs=docs)

[32mINFO[0m: [1mStoring documents in the vector store[0m


2025-01-20 19:53:56,345 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:53:59,001 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:53:59,758 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:54:00,935 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:54:01,751 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:54:02,718 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:54:03,719 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:54:04,888 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:54:05,584 - httpx - INFO - HTTP Request: POST https://api.openai.c

[32mINFO[0m: [1mDocument added successfully to the vector store.[0m
[32mINFO[0m: [1mDocuments stored successfully[0m


### Retrieve relevant information and generate an answer
The following cells query the vector store for the most relevant chunks (`top_k=5`) and pass them to the language model to generate an answer.

In [11]:
from indoxArcg.pipelines.rag import RAG


query = "How cinderella reach her happy ending?"
retriever = RAG(llm=openai_qa, vector_store=db)

The `infer()` method sends the query to the retriever, which searches the vector store for relevant text chunks and uses the language model to generate a response based on the retrieved information.

The context property returns the context the retriever used to generate the answer. It shows how the query was answered by exposing the relevant text chunks and any additional information used.
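
Conceptually, the vector store lookup ranks stored chunks by how close their embeddings are to the query embedding and keeps the best `top_k`. The sketch below illustrates that idea in plain Python. It assumes cosine similarity for simplicity; the metric Chroma actually applies depends on how the collection is configured, and `top_k_chunks` is a hypothetical name, not an indoxArcg function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, chunks, k=5):
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]


# Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
chunks = ["hazel tree", "golden slipper", "weather report"]
vectors = [[0.9, 0.1], [0.7, 0.3], [0.0, 1.0]]
print(top_k_chunks([1.0, 0.0], vectors, chunks, k=2))
# ['hazel tree', 'golden slipper']
```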

### Basic Retrieval (just vector store lookup):

In [12]:
answer = retriever.infer(question=query, top_k=5)

2025-01-20 19:55:17,630 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:55:22,263 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [13]:
from pprint import pprint
pprint(answer)


('Cinderella reached her happy ending through a series of magical events '
 'facilitated by a little bird that lived in a hazel tree planted at her '
 "mother's grave. After being mistreated by her stepmother and stepsisters, "
 'she expressed her wishes to the bird, which granted her beautiful dresses '
 "and slippers to wear to the king's festival. Despite her attempts to escape "
 "and hide from the king's son, he was determined to find her. Ultimately, "
 "when the king's son searched for the owner of a golden slipper that she left "
 'behind, Cinderella was called to try it on. The slipper fit perfectly, '
 "revealing her as the true bride. The king's son recognized her as the "
 'beautiful maiden he had danced with, and they rode away together, leading to '
 'their wedding. Additionally, as they passed by the hazel tree, two doves '
 "confirmed Cinderella's identity, ensuring her happy ending.")


### Hybrid Retrieval (validates context & uses web fallback if needed):

In [14]:
answer = retriever.infer(
    question="who is the next president of united states?",
    top_k=5,
    smart_retrieval=True,
)

[32mINFO[0m: [1mUsing smart retrieval[0m


2025-01-20 19:55:30,040 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:55:31,546 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mNot relevant doc[0m


2025-01-20 19:55:32,384 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mNot relevant doc[0m


2025-01-20 19:55:33,527 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mNot relevant doc[0m


2025-01-20 19:55:34,548 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mNot relevant doc[0m


2025-01-20 19:55:35,880 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mNot relevant doc[0m
[32mINFO[0m: [1mNo relevant documents found in initial context[0m
[32mINFO[0m: [1mPerforming web search for additional context[0m


2025-01-20 19:55:38,517 - primp - INFO - response: https://lite.duckduckgo.com/lite/ 200 20765
2025-01-20 19:55:39,384 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mRelevant doc[0m


2025-01-20 19:55:40,856 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mNot relevant doc[0m


2025-01-20 19:55:41,839 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mRelevant doc[0m


2025-01-20 19:55:42,714 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mNot relevant doc[0m


2025-01-20 19:55:43,598 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mRelevant doc[0m


2025-01-20 19:55:44,892 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [15]:
from pprint import pprint
pprint(answer)


('The next president of the United States is Republican Donald Trump, who will '
 'be sworn in for a second term.')
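
The log for this cell suggests the following control flow: each retrieved chunk is graded for relevance, none of the five passes, a web search is run, and the search results are graded in turn. A minimal sketch of that pattern is shown below. It is not indoxArcg's actual implementation: `filter_with_fallback`, `keyword_grader`, and `fake_web_search` are hypothetical names, and the grader and search client are simple stand-ins for the LLM calls and the search request seen in the log.

```python
from typing import Callable


def filter_with_fallback(
    question: str,
    retrieved: list[str],
    is_relevant: Callable[[str, str], bool],
    web_search: Callable[[str], list[str]],
) -> list[str]:
    """Keep relevant retrieved passages; if none survive, grade web results instead."""
    relevant = [doc for doc in retrieved if is_relevant(question, doc)]
    if relevant:
        return relevant
    # Nothing in the vector store matched, so fall back to an external search.
    return [doc for doc in web_search(question) if is_relevant(question, doc)]


# Stand-ins for the LLM grader and the search client (both hypothetical).
def keyword_grader(question: str, doc: str) -> bool:
    return "president" in doc.lower()


def fake_web_search(question: str) -> list[str]:
    return ["Inauguration of the president is held in January.", "Unrelated page."]


context = filter_with_fallback(
    "who is the next president of united states?",
    ["Cinderella went to the festival.", "The bird threw down a dress."],
    keyword_grader,
    fake_web_search,
)
print(context)
```

Passing the grader and the search function in as arguments keeps the fallback logic easy to test without network access or API keys.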


### Advanced Retrieval (with multi-query):

In [17]:
answer = retriever.infer(
    question=query,
    top_k=5,
    use_clustering=False,
    use_multi_query=True
)

[32mINFO[0m: [1mMulti-query retrieval initialized[0m
[32mINFO[0m: [1mRunning multi-query retrieval for: How cinderella reach her happy ending?[0m


2025-01-20 19:58:30,157 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mGenerated queries: ['Here are three different queries you can use to gather information about how Cinderella reaches her happy ending:', '1. **Query for Fairy Tale Summary**:', '- "What are the key events in the story of Cinderella that lead to her happy ending?"', '2. **Query for Character Development**:', '- "How do Cinderella\'s character traits and actions contribute to her achieving a happy ending in the fairy tale?"', '3. **Query for Themes and Motifs**:', '- "What themes and motifs in the Cinderella story illustrate how she ultimately reaches her happy ending?"'][0m


2025-01-20 19:58:31,752 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:58:33,101 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:58:33,900 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:58:35,280 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:58:36,455 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:58:37,464 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:58:39,111 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


[32mINFO[0m: [1mRetrieved 35 relevant passages[0m


2025-01-20 19:58:49,616 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mGenerated final response[0m


2025-01-20 19:58:55,239 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
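
The log for this cell lists seven generated strings, seven embedding requests, and 35 retrieved passages, which is consistent with each string being searched with `top_k=5` and the results pooled (an inference from the log, not a documented guarantee). Only three of the seven strings are actual questions; the rest are headings from the model's reply. The sketch below shows one way to keep only the questions and to pool results without duplicates. Both helpers are hypothetical illustrations and do not describe how indoxArcg implements multi-query retrieval.

```python
from typing import Callable


def keep_questions(lines: list[str]) -> list[str]:
    """Keep only lines that look like questions, stripped of list markers and quotes."""
    questions = []
    for line in lines:
        cleaned = line.strip().lstrip("-*0123456789. ").strip().strip('"')
        if cleaned.endswith("?"):
            questions.append(cleaned)
    return questions


def multi_query_retrieve(
    queries: list[str],
    search: Callable[[str, int], list[str]],
    top_k: int = 5,
) -> list[str]:
    """Search once per query and pool the passages, dropping duplicates in order."""
    seen: set[str] = set()
    pooled: list[str] = []
    for query in queries:
        for passage in search(query, top_k):
            if passage not in seen:
                seen.add(passage)
                pooled.append(passage)
    return pooled


# First three strings from the "Generated queries" log entry.
generated = [
    "Here are three different queries you can use to gather information "
    "about how Cinderella reaches her happy ending:",
    "1. **Query for Fairy Tale Summary**:",
    '- "What are the key events in the story of Cinderella that lead to her happy ending?"',
]
print(keep_questions(generated))
```

Filtering before retrieval avoids spending embedding calls on heading lines, and deduplicating afterwards keeps the prompt sent to the language model shorter.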


In [18]:
from pprint import pprint
pprint(answer)


('Cinderella reaches her happy ending through a combination of her unwavering '
 'goodness, perseverance, and the magical assistance she receives from her '
 "mother's spirit, symbolized by the hazel tree and the little bird. After "
 "planting a hazel branch on her mother's grave and weeping over it, a magical "
 'tree grows, which becomes a source of comfort and help for her. Whenever she '
 'expresses a wish beneath the tree, the little white bird grants her those '
 'wishes, providing her with beautiful dresses and shoes that allow her to '
 "attend the royal festival despite her stepmother's attempts to keep her from "
 'going.\n'
 '\n'
 'At the festival, Cinderella captures the attention of the prince, who dances '
 'only with her. However, she must leave quickly each time, leaving behind a '
 'golden slipper on the staircase. The prince then searches for the owner of '
 'the slipper, declaring he will marry the girl whose foot fits it. While her '
 'stepsisters attempt to fit in