# Quick Start

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/cookbook/indoxArcg/quick_start.ipynb)

In [None]:
!pip install indoxArcg
!pip install openai
!pip install chromadb
!pip install semantic_text_splitter

## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indoxArcg`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indoxArcg
```
2. **Activate the virtual environment:**
```bash
indoxArcg\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indoxArcg
```
2. **Activate the virtual environment:**
```bash
source indoxArcg/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```
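
To confirm that the interpreter you are running really belongs to the activated environment, a quick check like the one below can help. It is a minimal sketch that relies on the standard-library behaviour that `sys.prefix` differs from `sys.base_prefix` inside a `venv`; the helper name `in_virtualenv` is only illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If the first line prints `False`, the environment is probably not activated in the shell or IDE that launched Python.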


In [1]:
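import sys
import os

# Only needed when running against a local checkout of the repository;
# replace the path with the location of your own clone.
module_path = os.path.abspath('E:/Codes/inDox/libs/indoxArcg')
if module_path not in sys.path:
    sys.path.append(module_path)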




In [2]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
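
Reading the key with `os.environ['OPENAI_API_KEY']` raises a bare `KeyError` when the variable is missing. The sketch below shows one way to fail with a clearer message instead. The helper name `require_env` and the `DEMO_API_KEY` variable are hypothetical and not part of indoxArcg.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it "
            "in the shell before running the notebook."
        )
    return value


if __name__ == "__main__":
    # Demo with a throwaway variable so the sketch runs without a real key.
    os.environ["DEMO_API_KEY"] = "sk-demo"
    print(require_env("DEMO_API_KEY"))
```

With this helper, the assignment in the cell above could read `OPENAI_API_KEY = require_env("OPENAI_API_KEY")`.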

## Initial Setup

The following imports set up the Indox application: the retrieval-augmentation pipeline, the question-answering model, the embedding model, and the data loader and splitter.

### Generating responses using OpenAI's language models
The `OpenAi` class handles question answering with OpenAI's language models, and the `OpenAiEmbedding` class specifies the embedding model. `Chroma` handles storage and retrieval of vector embeddings: given a collection name and an embedding function, it sets up a vector store where text embeddings can be stored and queried.

In [3]:
from indoxArcg.embeddings import OpenAiEmbedding
from indoxArcg.llms import OpenAi

openai_qa = OpenAi(api_key=OPENAI_API_KEY, model="gpt-4o-mini")
embed_openai = OpenAiEmbedding(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

from indoxArcg.vector_stores import Chroma
db = Chroma(collection_name="sample", embedding_function=embed_openai)

[32mINFO[0m: [1mInitializing OpenAi with model: gpt-4o-mini[0m
[32mINFO[0m: [1mOpenAi initialized successfully[0m
[32mINFO[0m: [1mInitialized OpenAiEmbedding with model: text-embedding-3-small[0m


2025-01-19 19:44:45,358 - chromadb.telemetry.product.posthog - INFO - Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.


### Load and preprocess data
This section loads text from a file, splits it into chunks, and stores the chunks in the vector store created above.

In [None]:
!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt

In [5]:
file_path = "sample.txt"

In [6]:
from indoxArcg.data_loaders import Txt

loader = Txt(txt_path=file_path)
doc = loader.load()

In [7]:
from indoxArcg.splitter import SemanticTextSplitter
splitter = SemanticTextSplitter(chunk_size=400)
docs = splitter.split_text(doc)
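
To make the idea of chunking concrete, here is a deliberately simplified sketch. It is not how `SemanticTextSplitter` works internally; it only packs whole sentences into chunks up to a size limit. The function name `split_into_chunks` is hypothetical, and it measures size in characters, whereas the library's `chunk_size` may use a different unit.

```python
import re


def split_into_chunks(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "One sentence here. Another follows! Is this the third? Yes."
    for chunk in split_into_chunks(sample, max_chars=40):
        print(repr(chunk))
```

Keeping sentences whole is the point of the sketch: chunks that end mid-sentence tend to produce less useful embeddings than chunks that respect natural boundaries.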

In [8]:
docs

["The wife of a rich man fell sick, and as she felt that her end\nwas drawing near, she called her only daughter to her bedside and\nsaid, dear child, be good and pious, and then the\ngood God will always protect you, and I will look down on you\nfrom heaven and be near you.  Thereupon she closed her eyes and\ndeparted.  Every day the maiden went out to her mother's grave,\nand wept, and she remained pious and good.  When winter came\nthe snow spread a white sheet over the grave, and by the time the\nspring sun had drawn it off again, the man had taken another wife.\nThe woman had brought with her into the house two daughters,\nwho were beautiful and fair of face, but vile and black of heart.\nNow began a bad time for the poor step-child.  Is the stupid goose\nto sit in the parlor with us, they said.  He who wants to eat bread\nmust earn it.  Out with the kitchen-wench.  They took her pretty\nclothes away from her, put an old grey bedgown on her, and gave\nher wooden shoes.  Just look 

In [9]:
db.add(docs=docs)

[32mINFO[0m: [1mStoring documents in the vector store[0m


2025-01-19 19:44:57,819 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-19 19:44:59,091 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-19 19:45:00,175 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-19 19:45:01,454 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-19 19:45:02,703 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-19 19:45:03,804 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-19 19:45:04,823 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-19 19:45:05,533 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-01-19 19:45:06,810 - httpx - INFO - HTTP Request: POST https://api.openai.c

[32mINFO[0m: [1mDocument added successfully to the vector store.[0m
[32mINFO[0m: [1mDocuments stored successfully[0m


### Retrieve relevant information and generate an answer
The following cells query the vector store for the most relevant chunks (`top_k=5`) and use the language model to generate an answer from them.

In [10]:
from indoxArcg.pipelines.rag import RAG


query = "How did Cinderella reach her happy ending?"
retriever = RAG(llm=openai_qa, vector_store=db)

The `infer(question=..., top_k=...)` method sends the query to the retriever, which searches the vector store for relevant text chunks and uses the language model to generate a response from them.
The context used for each answer is kept alongside it: every entry in `retriever.conversation_history` has a `context` field holding the text chunks that were passed to the language model, which shows how the query was answered.
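
As a rough illustration of what "retrieve the most relevant chunks" means, the sketch below ranks a few toy vectors by cosine similarity and keeps the best `k`. It is a simplified stand-in, not the implementation used by Chroma or indoxArcg: the function names, the three-dimensional vectors, and the choice of cosine similarity are all assumptions made for the example, and the metric the vector store actually uses may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, indexed_chunks, k=5):
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real embedding vectors are much longer.
    index = [
        ("the slipper fit perfectly", [0.9, 0.1, 0.0]),
        ("the pigeons picked the lentils", [0.1, 0.9, 0.0]),
        ("the king's son danced with her", [0.7, 0.2, 0.1]),
    ]
    for score, text in top_k_chunks([1.0, 0.0, 0.0], index, k=2):
        print(f"{score:.3f}  {text}")
```

In the real pipeline the query vector comes from the embedding model, and the selected chunks are then handed to the language model as context.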

### Basic Retrieval (just vector store lookup):

In [None]:
answer = retriever.infer(question=query, top_k=5)

2025-01-19 19:29:39,831 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


[32mINFO[0m: [1mAnswering question[0m
[32mINFO[0m: [1mGenerating response[0m


2025-01-19 19:29:44,865 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[32mINFO[0m: [1mResponse generated successfully[0m


In [11]:
from pprint import pprint
pprint(answer)

('Cinderella reached her happy ending through a series of magical events and '
 'her perseverance despite the hardships she faced. After being mistreated by '
 'her stepmother and stepsisters, she received help from a magical hazel tree '
 'and a little white bird that granted her wishes. Each time she wished for '
 "beautiful dresses to wear to the king's festival, the bird provided her with "
 'more splendid attire, allowing her to attend the events where she captured '
 "the attention of the king's son.\n"
 '\n'
 'Despite her efforts to escape and return home after each festival, the '
 "king's son devised a plan to find her by using a golden slipper that she "
 'left behind. When he searched for the owner of the slipper, Cinderella was '
 'finally called to try it on. The slipper fit perfectly, revealing her true '
 "identity as the beautiful maiden he had danced with. The king's son "
 'recognized her as the true bride, and they rode away together, leading to '
 "their wedding. In

### Hybrid Retrieval (validates context & uses web fallback if needed):
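
The parameter names `use_smart_retrieval` and `min_relevance_score` come from the call below; how indoxArcg grades documents internally is not shown in this guide. The following is therefore only a sketch of the general pattern: keep the chunks whose relevance score clears a threshold, and fall back to another source when none do. The names `select_context` and `fake_web_search`, and the numeric scores, are hypothetical.

```python
from typing import Callable


def select_context(
    scored_chunks: list[tuple[str, float]],
    min_relevance_score: float,
    fallback: Callable[[], list[str]],
) -> tuple[list[str], str]:
    """Return (context, source) where source is 'vector_store' or 'fallback'."""
    relevant = [text for text, score in scored_chunks if score >= min_relevance_score]
    if relevant:
        return relevant, "vector_store"
    # Nothing passed the relevance check, so ask the fallback for context.
    return fallback(), "fallback"


def fake_web_search() -> list[str]:
    """Hypothetical stand-in for a web search; returns canned text."""
    return ["(result fetched from the web)"]


if __name__ == "__main__":
    on_topic = [("the slipper fit perfectly", 0.82), ("she wept at the grave", 0.41)]
    off_topic = [("the slipper fit perfectly", 0.12), ("she wept at the grave", 0.08)]
    print(select_context(on_topic, 0.7, fake_web_search))
    print(select_context(off_topic, 0.7, fake_web_search))
```

A question unrelated to the indexed text, such as the one in the next cell, is the kind of query for which a fallback of this sort would be expected to trigger.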

In [11]:
answer = retriever.infer(
    question="who is the president of united states?",
    top_k=5,
    use_smart_retrieval=True,
    min_relevance_score=0.7
)

[32mINFO[0m: [1mUsing smart retrieval[0m


2025-01-19 19:47:41,658 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


[32mERROR[0m: [31m[1mError grading document: RetryError[<Future at 0x21967c0c110 state=finished raised TypeError>][0m
[31mERROR[0m: [31m[1mError grading document: RetryError[<Future at 0x21967c0c110 state=finished raised TypeError>][0m
[32mERROR[0m: [31m[1mError grading document: RetryError[<Future at 0x21967b7ed20 state=finished raised TypeError>][0m
[31mERROR[0m: [31m[1mError grading document: RetryError[<Future at 0x21967b7ed20 state=finished raised TypeError>][0m
[32mERROR[0m: [31m[1mError grading document: RetryError[<Future at 0x21967b0edb0 state=finished raised TypeError>][0m
[31mERROR[0m: [31m[1mError grading document: RetryError[<Future at 0x21967b0edb0 state=finished raised TypeError>][0m
[32mERROR[0m: [31m[1mError grading document: RetryError[<Future at 0x21967cc4620 state=finished raised TypeError>][0m
[31mERROR[0m: [31m[1mError grading document: RetryError[<Future at 0x21967cc4620 state=finished raised TypeError>][0m
[32mERROR[0m: 

ModuleNotFoundError: No module named 'duckduckgo_search'
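
The traceback above ends with a missing `duckduckgo_search` module, which suggests that the web fallback depends on that package. Below is a minimal sketch of checking for an optional dependency before enabling a feature, using only the standard library. The helper name `has_module` is illustrative, and whether this code path needs further packages is not confirmed here.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("duckduckgo_search"):
        print("duckduckgo_search is available")
    else:
        print("duckduckgo_search is missing; try: pip install duckduckgo_search")
```

Running a check like this before calling `infer` with `use_smart_retrieval=True` turns a mid-run traceback into an early, readable message.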

In [21]:
pprint(retriever.conversation_history[1])

QueryResult(question='where does messi play right now?',
            answer='Based on the provided context, there is no information '
                   'regarding where Messi currently plays. The context appears '
                   'to be a narrative from a fairy tale, likely "Cinderella," '
                   'and does not mention any sports or athletes. Therefore, I '
                   "cannot provide an answer to the question about Messi's "
                   'current playing location. \n'
                   '\n'
                   'As of my last knowledge update in October 2023, Lionel '
                   'Messi plays for Inter Miami CF in Major League Soccer '
                   '(MLS).',
            context=['the left, and the younger at the right, and then the '
                     'pigeons\n'
                     'pecked out the other eye from each.  And thus, for '
                     'their\n'
                     'wickedness and falsehood, they were punished with '
  