# Quick Start

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/quick_start.ipynb)

In [None]:
!pip install indox
!pip install openai
!pip install chromadb

## Setting Up the Python Environment

If you are running this project locally rather than in Colab, create a Python virtual environment so that its dependencies stay isolated from other projects. The steps below set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indox
```
2. **Activate the virtual environment:**
```bash
indox\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indox
```
2. **Activate the virtual environment:**
```bash
source indox/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```
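
To confirm that the activation step worked before installing anything, a quick check like the one below can help. It is a minimal sketch that relies on the standard `venv` behavior of `sys.prefix` differing from `sys.base_prefix` inside an environment; the helper name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```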


In [5]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
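
If the key is missing, `os.environ['OPENAI_API_KEY']` raises a bare `KeyError`. A small wrapper can make that failure easier to diagnose. This is only a sketch: `require_env` is a hypothetical helper, not part of Indox or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        # Covers both an unset variable and one set to an empty string.
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it "
            "in your shell before running the notebook."
        )
    return value
```

With this helper in place, the assignment above could be written as `OPENAI_API_KEY = require_env("OPENAI_API_KEY")`, called after `load_dotenv()`.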

## Initial Setup

The import below brings in `IndoxRetrievalAugmentation`, the main entry point for setting up the Indox application. The question-answering model, the embedding model, and the data loader/splitter are imported in the steps that follow.

In [6]:
from indox import IndoxRetrievalAugmentation

indox = IndoxRetrievalAugmentation()

INFO: IndoxRetrievalAugmentation initialized

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            


### Generating a response using OpenAI's language models
The `OpenAi` class handles the question-answering task using OpenAI's language models, and `OpenAiEmbedding` specifies the embedding model. `Chroma` handles the storage and retrieval of vector embeddings: given a collection name and an embedding function, it sets up a vector store where text embeddings can be stored and queried.

In [7]:
from indox.llms import OpenAi
from indox.embeddings import OpenAiEmbedding
from indox.vector_stores import Chroma

# Language model used to generate answers
openai_qa = OpenAi(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo-0125")
# Embedding model used to vectorize text chunks and queries
embed_openai = OpenAiEmbedding(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

# Chroma collection that stores the embeddings
db = Chroma(collection_name="sample", embedding_function=embed_openai)
indox.connect_to_vectorstore(vectorstore_database=db)

INFO: Initializing OpenAi with model: gpt-3.5-turbo-0125
INFO: OpenAi initialized successfully
INFO: Initialized OpenAiEmbedding with model: text-embedding-3-small


### Load and preprocess data
This step loads text data from a file, splits it into chunks, and stores those chunks in the vector store that was set up in the previous step.

In [4]:
!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt

In [None]:
file_path = "sample.txt"

In [None]:
from indox.data_loader_splitter import SimpleLoadAndSplit

loader_splitter = SimpleLoadAndSplit(file_path=file_path, max_chunk_size=400)
docs = loader_splitter.load_and_chunk()
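
To give a feel for what chunking with a size limit does, here is a simplified sketch. It is not how `SimpleLoadAndSplit` works internally: it counts whitespace-separated words, whereas the library may count tokens and split on different boundaries. The function name `chunk_words` is made up for illustration.

```python
def chunk_words(text: str, max_chunk_size: int) -> list[str]:
    """Pack whitespace-separated words into chunks of at most max_chunk_size words."""
    if max_chunk_size < 1:
        raise ValueError("max_chunk_size must be at least 1")
    words = text.split()
    # Step through the word list in strides of max_chunk_size.
    return [
        " ".join(words[i:i + max_chunk_size])
        for i in range(0, len(words), max_chunk_size)
    ]


sample = "Once upon a time there lived a girl named Cinderella"
for chunk in chunk_words(sample, max_chunk_size=4):
    print(chunk)
```

Smaller chunks tend to make retrieval more precise but give the language model less surrounding context per chunk, so the limit is a trade-off worth tuning for your data.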

In [10]:
indox.store_in_vectorstore(docs=docs)

INFO: Storing documents in the vector store
INFO: Embedding documents
INFO: Starting to fetch embeddings for texts using engine: text-embedding-3-small
INFO: Document added successfully to the vector store.
INFO: Documents stored successfully



### Retrieve relevant information and generate an answer
The lines below build a retriever that queries the vector store for the most relevant chunks (`top_k=5`) and uses the language model to generate an answer from them.

In [8]:
query = "How did Cinderella reach her happy ending?"
retriever = indox.QuestionAnswer(vector_database=db, llm=openai_qa, top_k=5)

The `invoke(query)` method sends the query to the retriever, which searches the vector store for relevant text chunks and uses the language model to generate a response based on them.
The `context` property returns the text chunks, along with any additional information, that the retriever used to generate the answer, which shows how the query was answered.

In [9]:
answer = retriever.invoke(query)
context = retriever.context

INFO: Retrieving context and scores from the vector database
INFO: Embedding documents
INFO: Starting to fetch embeddings for texts using engine: text-embedding-3-small

In [14]:
answer

"Cinderella reached her happy ending by attending the wedding at the king's palace despite her stepmother's initial reluctance. She was able to go to the dance and participate in the festivities, ultimately capturing the prince's heart and living happily ever after."
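
The retrieval step behind `top_k=5` can be pictured with a toy example. The sketch below ranks chunk embeddings by cosine similarity to the query embedding and keeps the best `k`. Cosine similarity is an assumption here: the vector store may use a different distance metric or an approximate index, and the function names are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, k):
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]


# Toy 2-D "embeddings"; real embeddings have hundreds or thousands of dimensions.
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_vec = [1.0, 0.1]
print(top_k_chunks(query_vec, chunk_vecs, k=2))
```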

### With AgenticRag

AgenticRag stands for Agentic Retrieval-Augmented Generation. It combines retrieval-based and generation-based methods in natural language processing (NLP): the output of a language model is grounded in relevant information retrieved from a database or a vector store.

AgenticRag is designed to provide more contextually rich and accurate responses by drawing on external knowledge sources. It retrieves relevant chunks from a vector store based on a query, then uses a language model to generate a comprehensive response that incorporates the retrieved information.

In [None]:
agent = indox.AgenticRag(llm=openai_qa, vector_database=db, top_k=5)
agent.run(query)
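
The generation step described above, where retrieved chunks are folded into the model's input, can be sketched generically as follows. This is not the prompt that `AgenticRag` or Indox actually uses; `build_prompt` and its wording are assumptions made only to illustrate the idea.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and a question into a single prompt string."""
    # Number the chunks so an answer can refer back to its sources.
    numbered = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


demo_chunks = ["First retrieved chunk.", "Second retrieved chunk."]
print(build_prompt("What do the chunks say?", demo_chunks))
```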