# Quick Start

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/cookbook/indoxArcg/quick_start.ipynb)

In [3]:
import sys
import os
module_path = os.path.abspath('D:/osllm/inDox/libs/indoxArcg')
if module_path not in sys.path:
    sys.path.append(module_path)
    print("module path added to sys.path")
    




In [None]:
!pip install indoxArcg
!pip install transformers
!pip install torch
!pip install openai
!pip install chromadb
!pip install semantic_text_splitter


## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indoxArcg`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indoxArcg
```
2. **Activate the virtual environment:**
```bash
indoxArcg\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indoxArcg
```

2. **Activate the virtual environment:**
```bash
source indoxArcg/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```


In [15]:
import os
from dotenv import load_dotenv

load_dotenv()


OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
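
`os.getenv` returns `None` when the variable is missing, so a forgotten key only surfaces later, inside the first API call. A minimal sketch of an earlier check, assuming the key is provided through a `.env` file or the shell environment; the `require_env` helper is hypothetical and not part of indoxArcg:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable and an empty string.
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell"
        )
    return value
```

In the cell above, this could replace the plain lookup with `OPENAI_API_KEY = require_env("OPENAI_API_KEY")`, called after `load_dotenv()`.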

## Initial Setup

The imports below set up the Indox application: the retrieval augmentation module, the question-answering model, the embedding model, and the data loader and splitter.

### Generating responses using OpenAI's language models
The `OpenAi` class handles question answering with OpenAI's language models, and the `OpenAiEmbedding` class specifies the embedding model. `Chroma` sets up a vector store, identified by a collection name, where text embeddings are stored and queried.

If you get an error such as `ModuleNotFoundError: No module named 'torch'`, install torch by running:
```bash
pip install torch
```

If you get an error such as `ModuleNotFoundError: No module named 'transformers'`, install transformers by running:
```bash
pip install transformers
```



In [6]:
!pip install --upgrade numpy transformers torch


In [5]:
# Initialize OpenAI language model and embedding model with API key
# Set up Chroma vector store for storing and retrieving embeddings
from indoxArcg.embeddings import OpenAiEmbedding
from indoxArcg.llms import OpenAi # if you have an issue, please run "!pip install --upgrade numpy transformers"

openai_qa = OpenAi(api_key=OPENAI_API_KEY, model="gpt-4o-mini")
embed_openai = OpenAiEmbedding(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

from indoxArcg.vector_stores import Chroma
db = Chroma(collection_name="sample", embedding_function=embed_openai)

INFO: Initializing OpenAi with model: gpt-4o-mini
INFO: OpenAi initialized successfully
INFO: Initialized OpenAiEmbedding with model: text-embedding-3-small


2025-03-19 02:32:56,621 - chromadb.telemetry.product.posthog - INFO - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.


### Load and preprocess data
This section loads text from a file, splits it into chunks, and stores the chunks in the vector store set up above.

In [6]:
!wget https://raw.githubusercontent.com/osllmai/inDox/doc-v3/cookbook/indoxArcg/sample.txt

# Check if the file is downloaded and display its content
with open('sample.txt', 'r') as file:
    content = file.read()
    print(content)

The wife of a rich man fell sick, and as she felt that her end
was drawing near, she called her only daughter to her bedside and
said, dear child, be good and pious, and then the
good God will always protect you, and I will look down on you
from heaven and be near you.  Thereupon she closed her eyes and
departed.  Every day the maiden went out to her mother's grave,
and wept, and she remained pious and good.  When winter came
the snow spread a white sheet over the grave, and by the time the
spring sun had drawn it off again, the man had taken another wife.
The woman had brought with her into the house two daughters,
who were beautiful and fair of face, but vile and black of heart.
Now began a bad time for the poor step-child.  Is the stupid goose
to sit in the parlor with us, they said.  He who wants to eat bread
must earn it.  Out with the kitchen-wench.  They took her pretty
clothes away from her, put an old grey bedgown on her, and gave
her wooden shoes.  Just look at the proud prin

'wget' is not recognized as an internal or external command,
operable program or batch file.
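
The error above appears because `wget` is not available by default on Windows. A portable alternative is to download the file with Python's standard library. This is a minimal sketch, and `download_file` is a hypothetical helper name, not part of indoxArcg:

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str) -> Path:
    """Download `url` to `dest` using only the standard library."""
    dest_path = Path(dest)
    # Stream the response to disk instead of holding it all in memory.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

In the notebook, `download_file("https://raw.githubusercontent.com/osllmai/inDox/doc-v3/cookbook/indoxArcg/sample.txt", "sample.txt")` would stand in for the `!wget` line.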


In [7]:
file_path = "sample.txt"

In [8]:
from indoxArcg.data_loaders import Txt

loader = Txt(txt_path=file_path)
doc = loader.load()

doc[0:100]

'The wife of a rich man fell sick, and as she felt that her end\nwas drawing near, she called her only'

In [16]:
# Split the loaded document into smaller chunks of text using SemanticTextSplitter
from indoxArcg.splitter import SemanticTextSplitter
splitter = SemanticTextSplitter(chunk_size=400)
docs = splitter.split_text(doc)
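
To build intuition for what `chunk_size` controls, here is a deliberately naive sketch that packs whole sentences into chunks of at most `chunk_size` characters. It is not the algorithm `SemanticTextSplitter` uses, and the unit that splitter measures may differ from characters; `naive_split` is a hypothetical helper for illustration only:

```python
import re


def naive_split(text: str, chunk_size: int = 400) -> list[str]:
    """Greedily pack sentences into chunks of at most chunk_size characters.

    A single sentence longer than chunk_size is kept whole as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            # Adding this sentence would overflow the chunk: start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Smaller values of `chunk_size` give more, shorter chunks, which makes retrieval more precise but leaves each chunk with less surrounding context.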


In [9]:
docs[0]

"The wife of a rich man fell sick, and as she felt that her end\nwas drawing near, she called her only daughter to her bedside and\nsaid, dear child, be good and pious, and then the\ngood God will always protect you, and I will look down on you\nfrom heaven and be near you.  Thereupon she closed her eyes and\ndeparted.  Every day the maiden went out to her mother's grave,\nand wept, and she remained pious and good.  When winter came\nthe snow spread a white sheet over the grave, and by the time the\nspring sun had drawn it off again, the man had taken another wife.\nThe woman had brought with her into the house two daughters,\nwho were beautiful and fair of face, but vile and black of heart.\nNow began a bad time for the poor step-child.  Is the stupid goose\nto sit in the parlor with us, they said.  He who wants to eat bread\nmust earn it.  Out with the kitchen-wench.  They took her pretty\nclothes away from her, put an old grey bedgown on her, and gave\nher wooden shoes.  Just look a

In [10]:
db.add(docs=docs)

INFO: Storing documents in the vector store


2025-03-19 02:33:00,434 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-19 02:33:01,236 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-19 02:33:01,423 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-19 02:33:01,779 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-19 02:33:02,042 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-19 02:33:02,448 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-19 02:33:03,005 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-19 02:33:03,393 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-19 02:33:03,594 - httpx - INFO - HTTP Request: POST https://api.openai.c

INFO: Document added successfully to the vector store.
INFO: Documents stored successfully


In [11]:
print(dir(db))


['_Chroma__query_collection', '_INDOX_DEFAULT_COLLECTION_NAME', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_add_documents', '_add_texts', '_client', '_client_settings', '_collection', '_embedding_function', '_persist_directory', '_similarity_search', '_similarity_search_with_score', 'add', 'delete', 'delete_collection', 'embeddings', 'get', 'override_relevance_score_fn', 'update_document', 'update_documents']


In [12]:
db.get()['ids']

['3347a135-2626-45ef-a89b-bcaab5f8f6c4',
 'ae6521b1-4b62-46d1-b9d3-a930c2110b26',
 '26f36d00-a8bc-43d5-a132-c6fe98940566',
 '43a43db2-0d84-492d-882b-619034b56499',
 'f95805b8-083e-4ac6-9c9d-b3ceaa4a3364',
 '402874d9-7dc6-40f8-bba0-36c50f9112e1',
 'd3c88599-a2b1-4701-bf2c-0c87e669d8b8',
 'ac991130-b17b-45db-8674-ea00118cc6f9',
 '7b4dba54-d3c5-4a04-99fc-d3bc733fa517']

In [13]:
db.get()['documents'][0]

"The wife of a rich man fell sick, and as she felt that her end\nwas drawing near, she called her only daughter to her bedside and\nsaid, dear child, be good and pious, and then the\ngood God will always protect you, and I will look down on you\nfrom heaven and be near you.  Thereupon she closed her eyes and\ndeparted.  Every day the maiden went out to her mother's grave,\nand wept, and she remained pious and good.  When winter came\nthe snow spread a white sheet over the grave, and by the time the\nspring sun had drawn it off again, the man had taken another wife.\nThe woman had brought with her into the house two daughters,\nwho were beautiful and fair of face, but vile and black of heart.\nNow began a bad time for the poor step-child.  Is the stupid goose\nto sit in the parlor with us, they said.  He who wants to eat bread\nmust earn it.  Out with the kitchen-wench.  They took her pretty\nclothes away from her, put an old grey bedgown on her, and gave\nher wooden shoes.  Just look a

### Retrieve relevant information and generate an answer
These lines query the vector store for the most relevant chunks (`top_k=5`) and pass them to the language model to generate an answer.

In [17]:
from indoxArcg.pipelines.rag import RAG


query = "How cinderella reach her happy ending?"
retriever = RAG(llm=openai_qa, vector_store=db)

The `infer` method sends the query to the retriever, which searches the vector store for relevant text chunks and uses the language model to generate a response from them.
The context property exposes the text chunks and any additional information the retriever used to generate the answer, which shows how the query was answered.
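
Conceptually, the vector store lookup ranks stored chunks by their similarity to the query embedding and keeps the best `top_k`. A minimal sketch in plain Python, assuming cosine similarity as the metric (the metric Chroma actually applies depends on the collection's configuration); `top_k_chunks` and the toy vectors it would be called with are hypothetical:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, chunks, top_k=5):
    """Return up to top_k (score, chunk) pairs, highest similarity first."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk)
        for vec, chunk in zip(chunk_vecs, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

The chunks selected this way form the context that is handed to the language model together with the question.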

### Basic Retrieval (just vector store lookup):

In [18]:
answer = retriever.infer(question=query, top_k=5)

2025-03-19 02:34:20,574 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-19 02:34:22,705 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [19]:
from pprint import pprint
pprint(answer)

('Cinderella reached her happy ending through a series of magical events '
 'facilitated by a little white bird that lived in a hazel tree planted at her '
 "mother's grave. After being mistreated by her stepmother and stepsisters, "
 'she expressed her wishes to the bird, which provided her with beautiful '
 "dresses and slippers to wear to the king's festival. Despite her attempts to "
 "escape the king's son after each night of dancing, he devised a plan to find "
 'her by using a golden slipper that she left behind. When the slipper fit her '
 "perfectly, the king's son recognized her as the beautiful maiden he had "
 "danced with. Ultimately, Cinderella was taken away by the king's son, and as "
 'they passed the hazel tree, two doves confirmed her identity, leading to her '
 'joyful marriage and happy ending.')


### Hybrid Retrieval (validates context & uses web fallback if needed):
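
One way to read "validates context and uses web fallback" is: grade each retrieved chunk for relevance, keep the chunks that pass, and search the web only when none survive. The sketch below illustrates that control flow with caller-supplied functions. `retrieve`, `is_relevant`, and `web_search` are hypothetical stand-ins, and this is not indoxArcg's implementation of `smart_retrieval`:

```python
from typing import Callable, List


def gather_context(
    question: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    web_search: Callable[[str], List[str]],
) -> List[str]:
    """Return graded local context, falling back to web results if none pass."""
    candidates = retrieve(question)
    relevant = [chunk for chunk in candidates if is_relevant(question, chunk)]
    if relevant:
        return relevant
    # Nothing in the vector store addressed the question: try the web instead.
    return web_search(question)
```

With this shape, a question that the stored document cannot answer (such as the one about the next president used below) would be routed to the web search rather than answered from unrelated chunks.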

In [20]:
pip install duckduckgo-search==4.1.1


Note: you may need to restart the kernel to use updated packages.


In [28]:
import sys
import os

module_path = os.path.abspath('D:/osllm/inDox/libs/indoxArcg')

if module_path not in sys.path:
    sys.path.append(module_path)
    print("Module path added to sys.path")

try:
    import indoxArcg  # Replace with the actual module name inside `indoxArcg`
    print("Module imported successfully")
except ImportError as e:
    print(f"Error importing module: {e}")


Module imported successfully


In [27]:
pip install pytest


Collecting pytest
  Downloading pytest-8.3.5-py3-none-any.whl.metadata (7.6 kB)
Collecting iniconfig (from pytest)
  Downloading iniconfig-2.0.0-py3-none-any.whl.metadata (2.6 kB)
Collecting pluggy<2,>=1.5 (from pytest)
  Downloading pluggy-1.5.0-py3-none-any.whl.metadata (4.8 kB)
Downloading pytest-8.3.5-py3-none-any.whl (343 kB)
Downloading pluggy-1.5.0-py3-none-any.whl (20 kB)
Downloading iniconfig-2.0.0-py3-none-any.whl (5.9 kB)
Installing collected packages: pluggy, iniconfig, pytest
Successfully installed iniconfig-2.0.0 pluggy-1.5.0 pytest-8.3.5
Note: you may need to restart the kernel to use updated packages.


In [33]:
from unittest.mock import MagicMock
from indoxArcg.pipelines.rag.rag import StandardRetriever, RetrievalResult

# Mock vector store
vector_store = MagicMock()
vector_store._similarity_search_with_score.return_value = [
    (MagicMock(page_content="Test Document 1"), 0.95),
    (MagicMock(page_content="Test Document 2"), 0.85),
]

# Initialize a standalone retriever (named so it does not shadow the RAG `retriever` above)
standard_retriever = StandardRetriever(vector_store, top_k=2)

# Test retrieval
results = standard_retriever.retrieve("What is AI?")

# Display results
for res in results:
    print(f"Content: {res.content}, Score: {res.score}")


Content: Test Document 1, Score: 0.95
Content: Test Document 2, Score: 0.85


In [34]:
from unittest.mock import MagicMock
from indoxArcg.pipelines.rag.rag import RAG

# Mock LLM and vector store
mock_llm = MagicMock()
mock_vector_store = MagicMock()

# Mock LLM's response
mock_llm.answer_question.return_value = "AI stands for Artificial Intelligence."

# Mock vector store retrieval
mock_vector_store._similarity_search_with_score.return_value = [
    (MagicMock(page_content="AI is the simulation of human intelligence in machines."), 0.95)
]

# Initialize RAG
rag = RAG(llm=mock_llm, vector_store=mock_vector_store)

# Test inference
question = "What is AI?"
answer = rag.infer(question, top_k=1)

print(f"Question: {question}")
print(f"Answer: {answer}")


Question: What is AI?
Answer: AI stands for Artificial Intelligence.


In [35]:
import indoxArcg.pipelines.rag.rag as rag_module

dir(rag_module)


['AnswerGenerationError',
 'AnswerValidator',
 'Any',
 'BaseRetriever',
 'ContextRetrievalError',
 'Dict',
 'List',
 'MultiQueryRetriever',
 'Optional',
 'QueryResult',
 'RAG',
 'RAGError',
 'RetrievalResult',
 'StandardRetriever',
 'WebSearchFallback',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'dataclass',
 'logger',
 'sys',

In [26]:
answer = retriever.infer(
    question="who is the next president of united states?",
    top_k=5,
    smart_retrieval=True,
)

INFO: Using smart retrieval


2025-03-19 02:44:30,853 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


ERROR: Error grading document: 'generator' object has no attribute 'strip'
ERROR: Skipping this document due to an error.
ERROR: Error grading document: 'generator' object has no attribute 'strip'
ERROR: Skipping this document due to an error.
ERROR: Error grading document: 'generator' object has no attribute 'strip'
ERROR: Skipping this document due to an error.

AnswerGenerationError: Answer generation failed: No relevant context found for the question


### Advanced Retrieval (with multi-query):
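
Multi-query retrieval rewrites the question into several variants, retrieves for each one, and merges the results before generating the answer. A minimal sketch of the merge step, assuming the variants come from an LLM call that is stubbed out here; `generate_variants` and `retrieve` are hypothetical callables, not indoxArcg functions, and the de-duplication shown is a design choice of this sketch:

```python
from typing import Callable, List


def multi_query_retrieve(
    question: str,
    generate_variants: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Retrieve for the question and each variant, dropping duplicate chunks."""
    # Skip blank variants so they do not trigger pointless lookups.
    queries = [question] + [q for q in generate_variants(question) if q.strip()]
    seen = set()
    merged: List[str] = []
    for query in queries:
        for chunk in retrieve(query):
            if chunk not in seen:  # keep the first occurrence, preserve order
                seen.add(chunk)
                merged.append(chunk)
    return merged
```

Asking the same thing in different words can surface chunks that a single phrasing would miss, at the cost of one extra embedding request per variant.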

In [None]:
answer = retriever.infer(
    question=query,
    top_k=5,
    use_clustering=False,
    use_multi_query=True
)

INFO: Multi-query retrieval initialized
INFO: Running multi-query retrieval for: How cinderella reach her happy ending?


2025-01-20 19:58:30,157 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


INFO: Generated queries: ['Here are three different queries you can use to gather information about how Cinderella reaches her happy ending:', '1. **Query for Fairy Tale Summary**:', '- "What are the key events in the story of Cinderella that lead to her happy ending?"', '2. **Query for Character Development**:', '- "How do Cinderella\'s character traits and actions contribute to her achieving a happy ending in the fairy tale?"', '3. **Query for Themes and Motifs**:', '- "What themes and motifs in the Cinderella story illustrate how she ultimately reaches her happy ending?"']


2025-01-20 19:58:31,752 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:58:33,101 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:58:33,900 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:58:35,280 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:58:36,455 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:58:37,464 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-20 19:58:39,111 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


INFO: Retrieved 35 relevant passages


2025-01-20 19:58:49,616 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


INFO: Generated final response


2025-01-20 19:58:55,239 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [None]:
from pprint import pprint
pprint(answer)


('Cinderella reaches her happy ending through a combination of her unwavering '
 'goodness, perseverance, and the magical assistance she receives from her '
 "mother's spirit, symbolized by the hazel tree and the little bird. After "
 "planting a hazel branch on her mother's grave and weeping over it, a magical "
 'tree grows, which becomes a source of comfort and help for her. Whenever she '
 'expresses a wish beneath the tree, the little white bird grants her those '
 'wishes, providing her with beautiful dresses and shoes that allow her to '
 "attend the royal festival despite her stepmother's attempts to keep her from "
 'going.\n'
 '\n'
 'At the festival, Cinderella captures the attention of the prince, who dances '
 'only with her. However, she must leave quickly each time, leaving behind a '
 'golden slipper on the staircase. The prince then searches for the owner of '
 'the slipper, declaring he will marry the girl whose foot fits it. While her '
 'stepsisters attempt to fit in