# Quick Start

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/quick_start.ipynb)

In [None]:
!pip install indox
!pip install openai
!pip install chromadb
!pip install semantic_text_splitter

## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indox
```
2. **Activate the virtual environment:**
```bash
indox\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indox
```
2. **Activate the virtual environment:**
```bash
source indox/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```
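To confirm that the `indox` environment is actually active before installing anything, a quick check like the one below can help. It is a minimal sketch that relies only on the standard library; the helper name `in_virtualenv` is made up for this example and is not part of Indox.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

If the first line prints `False`, activate the environment again before running `pip install`.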


In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
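`os.environ['OPENAI_API_KEY']` raises a bare `KeyError` when the variable is missing. If a clearer message is preferred, one option is a small wrapper such as the sketch below. `require_env` is a hypothetical helper written for this example, not an Indox or python-dotenv function.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it "
            "in your shell before continuing."
        )
    return value


# Possible usage after load_dotenv() has run:
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```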

## Initial Setup

The cell below imports and initializes `IndoxRetrievalAugmentation`, the main Indox retrieval augmentation class. The question-answering model, embeddings, data loader, and splitter are imported in the cells that follow.

In [2]:
from indox import IndoxRetrievalAugmentation

indox = IndoxRetrievalAugmentation()

INFO: IndoxRetrievalAugmentation initialized

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            


### Generating a response using OpenAI's language models
The `OpenAi` class handles question answering with OpenAI's language models, and `OpenAiEmbedding` specifies the embedding model. `Chroma` handles the storage and retrieval of vector embeddings: given a collection name and an embedding function, it sets up a vector store where text embeddings can be stored and queried.

In [None]:
from indox.embeddings import OpenAiEmbedding
from indox.llms import OpenAi

openai_qa = OpenAi(api_key=OPENAI_API_KEY, model="gpt-4o-mini")
embed_openai = OpenAiEmbedding(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

from indox.vector_stores import Chroma
db = Chroma(collection_name="sample", embedding_function=embed_openai)

### Load and preprocess data
This section loads text data from a file, splits it into chunks, and stores those chunks in the vector store that was set up previously.

In [4]:
!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt


In [6]:
file_path = "sample.txt"

In [9]:
from indox.data_loaders import Txt

loader = Txt(txt_path=file_path)
doc = loader.load()

In [11]:
from indox.splitter import SemanticTextSplitter
splitter = SemanticTextSplitter(chunk_size=400)
docs = splitter.split_text(doc)
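To build intuition for what `chunk_size` controls, the sketch below packs whole sentences into chunks up to a size limit. It is not the algorithm `SemanticTextSplitter` uses: it counts characters (the library may count a different unit, such as tokens), and the function name `naive_chunks` is made up for this illustration.

```python
import re
from typing import List


def naive_chunks(text: str, chunk_size: int = 400) -> List[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size characters."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        # A single sentence longer than chunk_size is kept whole, not cut.
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(naive_chunks("One. Two. Three.", chunk_size=9))
# → ['One. Two.', 'Three.']
```

Keeping sentences intact is the main design choice here: a chunk that ends mid-sentence tends to embed poorly, so the sketch prefers a slightly short chunk over a cut sentence.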

In [12]:
docs

["The wife of a rich man fell sick, and as she felt that her end\nwas drawing near, she called her only daughter to her bedside and\nsaid, dear child, be good and pious, and then the\ngood God will always protect you, and I will look down on you\nfrom heaven and be near you.  Thereupon she closed her eyes and\ndeparted.  Every day the maiden went out to her mother's grave,\nand wept, and she remained pious and good.  When winter came\nthe snow spread a white sheet over the grave, and by the time the\nspring sun had drawn it off again, the man had taken another wife.\nThe woman had brought with her into the house two daughters,\nwho were beautiful and fair of face, but vile and black of heart.\nNow began a bad time for the poor step-child.  Is the stupid goose\nto sit in the parlor with us, they said.  He who wants to eat bread\nmust earn it.  Out with the kitchen-wench.  They took her pretty\nclothes away from her, put an old grey bedgown on her, and gave\nher wooden shoes.  Just look 

In [13]:
db.add(docs=docs)

INFO: Storing documents in the vector store
INFO: Embedding documents
INFO: Starting to fetch embeddings for texts using engine: text-embedding-3-small
INFO: Document added successfully to the vector store.
INFO: Documents stored successfully


### Retrieve relevant information and generate an answer
This step queries the vector store for the most relevant chunks (`top_k=5`) and uses the language model to generate an answer from them.

In [15]:
query = "How did Cinderella reach her happy ending?"
retriever = indox.QuestionAnswer(vector_database=db, llm=openai_qa, top_k=5)
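The `top_k=5` argument asks for the five chunks most similar to the query. The sketch below shows the general idea of top-k retrieval with plain word counts and cosine similarity. It is only an illustration: Indox and Chroma use learned embeddings (here `text-embedding-3-small`) rather than word counts, and the names `embed`, `cosine`, and `top_k` are hypothetical helpers, not library functions.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: List[str], k: int = 5) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "the prince found the golden slipper",
    "the bird dropped a dress",
    "winter snow covered the grave",
]
for score, chunk in top_k("golden slipper", chunks, k=2):
    print(f"{score:.2f}  {chunk}")
```

In a real pipeline the retrieved chunks are then passed to the language model as context for the answer.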

The `invoke(query)` method sends the query to the retriever, which searches the vector store for relevant text chunks and uses the language model to generate a response based on the retrieved information.
The `context` property returns the context the retriever used to generate the answer. It shows how the query was answered by exposing the relevant text chunks and any additional information used.

In [16]:
answer = retriever.invoke(query)
context = retriever.context

INFO: Retrieving context and scores from the vector database
INFO: Embedding documents
INFO: Starting to fetch embeddings for texts using engine: text-embedding-3-small
INFO: Generating answer without document relevancy filter
INFO: Answering question
INFO: Generating response
INFO: Response generated successfully
INFO: Query answered successfully


In [17]:
answer

"Cinderella reached her happy ending by attending the three-day festival at the king's palace with the help of the hazel tree and the little white bird. Each day, she went to her mother's grave, where she asked the tree for beautiful dresses and accessories, which the bird would then provide for her. On the third day, she received a splendid dress and golden slippers. At the festival, the king's son danced only with her and was captivated by her beauty. When Cinderella tried to leave, the king's son followed her, but she escaped. However, he found her golden slipper, and he declared that he would marry the woman whose foot fit the slipper. After trying the slipper on Cinderella, they recognized each other, and the king's son declared her as the true bride. Despite the opposition from her stepmother and stepsisters, Cinderella rode away with the king's son, and they lived happily ever after."

### With AgenticRag

AgenticRag stands for Agentic Retrieval-Augmented Generation. It combines retrieval-based and generation-based methods in natural language processing (NLP): the generative capabilities of a language model are enhanced with relevant information retrieved from a database or a vector store.
AgenticRag is designed to provide more contextually rich and accurate responses by drawing on external knowledge sources. It retrieves relevant chunks from a vector store based on a query, then uses a language model to generate a comprehensive response that incorporates the retrieved information.

In [None]:
agent = indox.AgenticRag(llm=openai_qa, vector_database=db, top_k=5)
agent.run(query)

In [3]:
import os
from dotenv import load_dotenv
load_dotenv()
INDOX_API_KEY = os.getenv("INDOX_API_KEY")

In [4]:
from indox.llms import IndoxApi
llm = IndoxApi(api_key=INDOX_API_KEY)
llm.chat("tell me a joke")

'Why did the scarecrow win an award?\n\nBecause he was outstanding in his field!'