[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/cookbook/indoxRag/openai_agenticrag.ipynb)

In [None]:
!pip install indoxRag
!pip install openai
!pip install chromadb
!pip install duckduckgo-search
!pip install python-dotenv

## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**
```bash
python -m venv indox
```
2. **Activate the virtual environment:**
```bash
indox\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**
```bash
python3 -m venv indox
```
2. **Activate the virtual environment:**
```bash
source indox/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```


## Agentic RAG
Here we explore how to work with Agentic RAG. We use OpenAI models, so `OPENAI_API_KEY` must be set as an environment variable; the cell below also reads it from a `.env` file via `load_dotenv()` if one is present.

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
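If `OPENAI_API_KEY` is missing, `os.getenv` silently returns `None`, and the problem only surfaces later as an authentication error from OpenAI. A minimal guard, assuming you prefer to fail early; the helper name `require_env` is hypothetical and not part of indoxRag:

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of indoxRag.
    """
    value = os.getenv(name)
    if not value:
        # Fail here instead of later inside an API call.
        raise RuntimeError(f"{name} is not set; add it to your environment or .env file")
    return value


# Usage (after load_dotenv()):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```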

## Creating an instance of IndoxRetrievalAugmentation
You must first create an instance of the `IndoxRetrievalAugmentation` class. This instance gives you access to the methods and properties defined within the class, enabling the retrieval and augmentation functionality.

In [None]:
from indoxRag import IndoxRetrievalAugmentation
from indoxRag.llms import OpenAi
from indoxRag.embeddings import OpenAiEmbedding
from indoxRag.data_loader_splitter import SimpleLoadAndSplit

Create an `OpenAi` model as the LLM and an `OpenAiEmbedding` model for embeddings; they are used to embed the text chunks and to generate responses.

In [3]:
indox = IndoxRetrievalAugmentation()
llm_model = OpenAi(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo-0125")
embed = OpenAiEmbedding(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

[32mINFO[0m: [1mIndoxRetrievalAugmentation initialized[0m

            ██  ███    ██  ██████   ██████  ██       ██
            ██  ████   ██  ██   ██ ██    ██   ██  ██
            ██  ██ ██  ██  ██   ██ ██    ██     ██
            ██  ██  ██ ██  ██   ██ ██    ██   ██   ██
            ██  ██  █████  ██████   ██████  ██       ██
            
[32mINFO[0m: [1mInitializing OpenAi with model: gpt-3.5-turbo-0125[0m
[32mINFO[0m: [1mOpenAi initialized successfully[0m
[32mINFO[0m: [1mInitialized OpenAI embeddings with model: text-embedding-3-small[0m


In [4]:
indox.__version__

'0.1.13'

### Download the sample file from the address below

In [None]:
!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt

## Preprocess Data
Use the `SimpleLoadAndSplit` class to load the text data from a file and split it into chunks.

In [6]:
loader_splitter = SimpleLoadAndSplit(file_path="sample.txt", remove_sword=False)

[32mINFO[0m: [1mUnstructuredLoadAndSplit initialized successfully[0m


In [7]:
docs = loader_splitter.load_and_chunk()

[32mINFO[0m: [1mStarting processing[0m
[32mINFO[0m: [1mCreated initial document elements[0m
[32mINFO[0m: [1mCompleted chunking process[0m
[32mINFO[0m: [1mSuccessfully obtained all documents[0m


## Create a ChromaVectorStore instance
`ChromaVectorStore` (imported here as `Chroma`) handles the storage and retrieval of vector embeddings. Given a collection name and an embedding function, it sets up a vector store in which text embeddings can be stored and queried.

In [None]:
from indoxRag.vector_stores import Chroma

# Define the collection name within the vector store
collection_name = "sample"

# Create a ChromaVectorStore instance
db = Chroma(collection_name=collection_name, embedding_function=embed)


[32mINFO[0m: [1mConnection to the vector store database established successfully[0m


<indox.vector_stores.Chroma.ChromaVectorStore at 0x208f44d0560>

Store the chunks in the vector store created above.

In [9]:
db.add(docs=docs)

[32mINFO[0m: [1mStoring documents in the vector store[0m
[32mINFO[0m: [1mDocument added successfully to the vector store.[0m
[32mINFO[0m: [1mDocuments stored successfully[0m


<indox.vector_stores.Chroma.ChromaVectorStore at 0x208f44d0560>
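Conceptually, a vector store keeps one embedding per chunk and, at query time, returns the chunks whose embeddings are closest to the query embedding. The toy in-memory version below uses exact cosine-similarity search, with hand-written 2-D vectors standing in for real embeddings; `ToyVectorStore` is hypothetical, and Chroma's actual indexing and distance metric may differ:

```python
import math


class ToyVectorStore:
    """Minimal in-memory vector store with exact cosine-similarity search.

    Illustration only; Chroma uses its own index and configurable distance metrics.
    """

    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, texts, vectors):
        # Store each text alongside its embedding vector.
        if len(texts) != len(vectors):
            raise ValueError("texts and vectors must have the same length")
        self.texts.extend(texts)
        self.vectors.extend(vectors)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def query(self, vector, top_k=3):
        """Return up to top_k (text, score) pairs, most similar first."""
        scored = [(t, self._cosine(vector, v)) for t, v in zip(self.texts, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = ToyVectorStore()
store.add(["cats", "dogs", "stocks"], [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
results = store.query([1.0, 0.0], top_k=2)  # "cats" first, then "dogs"
```

Exact search is linear in the number of stored chunks; production stores such as Chroma use approximate nearest-neighbour indexes to scale.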

## Retrieve relevant information with the question-answering model
In this step we use the `QuestionAnswer` model to answer the query from our file alone, without an agent.

In [10]:
query = "Where does messi plays right now?"
retriever = indox.QuestionAnswer(vector_database=db, llm=llm_model, top_k=3)

In [11]:
retriever.invoke(query)

[32mINFO[0m: [1mRetrieving context and scores from the vector database[0m
[32mINFO[0m: [1mGenerating answer without document relevancy filter[0m
[32mINFO[0m: [1mAnswering question[0m
[32mINFO[0m: [1mGenerating response[0m
[32mINFO[0m: [1mResponse generated successfully[0m
[32mINFO[0m: [1mQuery answered successfully[0m


"I'm sorry, but the given context does not contain any information about Lionel Messi's current football club."
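As the log suggests, `QuestionAnswer` retrieves the `top_k` most similar chunks and asks the LLM to answer from that context, which is why it declines when the file does not contain the answer. The prompt indoxRag actually sends is not shown here; the sketch below uses a hypothetical template and function name (`build_rag_prompt`) purely to illustrate how retrieved chunks are combined with the question:

```python
def build_rag_prompt(question: str, contexts: list[str]) -> str:
    """Assemble a grounded-answer prompt from retrieved chunks.

    Hypothetical template for illustration; indoxRag's own prompt wording may differ.
    """
    # Number the chunks so the model (and a reader) can refer to them.
    numbered = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The "say so" instruction is what produces refusals like the one above instead of an answer invented from the model's own memory.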

## Retrieve information using an agent
Here we use an agent to retrieve the answer. The previous attempt failed because the file does not say where Messi currently plays. The agent first grades each retrieved document for relevance; when none is relevant, it falls back to a web search and answers from the search results, then checks the generated answer for hallucination.
Note: to learn more about AgenticRAG, please read [this page](https://docs.osllm.ai/agenticRag.html).

In [12]:
agent = indox.AgenticRag(llm=llm_model, vector_database=db, top_k=3)
agent.run(query)

[32mINFO[0m: [1mGenerating response[0m
[32mERROR[0m: [31m[1mError generating response: Request timed out.[0m
[31mERROR[0m: [31m[1mError generating response: Request timed out.[0m
[32mINFO[0m: [1mGenerating response[0m
[32mINFO[0m: [1mResponse generated successfully[0m
[32mINFO[0m: [1mNot relevant doc[0m
[32mINFO[0m: [1mGenerating response[0m
[32mINFO[0m: [1mResponse generated successfully[0m
[32mINFO[0m: [1mNot relevant doc[0m
[32mINFO[0m: [1mGenerating response[0m
[32mINFO[0m: [1mResponse generated successfully[0m
[32mINFO[0m: [1mNot relevant doc[0m
[32mINFO[0m: [1mNo Relevant document found, Start web search[0m
[32mINFO[0m: [1mNo Relevant Context Found, Start Searching On Web...[0m
[32mINFO[0m: [1mAnswer Base On Web Search[0m
[32mINFO[0m: [1mAnswering question[0m
[32mINFO[0m: [1mGenerating response[0m
[32mINFO[0m: [1mResponse generated successfully[0m
[32mINFO[0m: [1mCheck For Hallucination In Generated Answ

"Lionel Messi currently plays for Major League Soccer's Inter Miami CF."
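Reading the log above, the agent grades each retrieved chunk ("Not relevant doc"), falls back to web search when none passes, answers from the search results, and finally checks the answer for hallucination. That control flow can be sketched as follows. Every callable (`retrieve`, `is_relevant`, `web_search`, `answer`, `is_grounded`) is a hypothetical placeholder, not part of the indoxRag API, and what indoxRag does when the hallucination check fails is an assumption here:

```python
from typing import Callable


def agentic_answer(
    question: str,
    retrieve: Callable[[str], list[str]],
    is_relevant: Callable[[str, str], bool],
    web_search: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], str],
    is_grounded: Callable[[str, list[str]], bool],
) -> str:
    """Sketch of an agentic RAG loop; all callables are hypothetical placeholders."""
    # 1. Retrieve candidates from the vector store; keep only those graded relevant.
    docs = [d for d in retrieve(question) if is_relevant(question, d)]
    # 2. Fall back to web search when no retrieved document is relevant.
    if not docs:
        docs = web_search(question)
    # 3. Generate an answer from whichever context survived.
    result = answer(question, docs)
    # 4. Reject answers the context does not support (assumed fallback behaviour).
    if not is_grounded(result, docs):
        return "Could not produce a grounded answer."
    return result


# Usage with stub functions in place of LLM calls and a real search tool:
reply = agentic_answer(
    "Where does Messi play?",
    retrieve=lambda q: ["unrelated chunk"],
    is_relevant=lambda q, d: False,
    web_search=lambda q: ["Messi plays for Inter Miami CF."],
    answer=lambda q, ds: ds[0],
    is_grounded=lambda a, ds: a in ds,
)
```

Passing the steps in as functions keeps the control flow testable without network access; in practice each grader would be an LLM call like the ones shown in the log.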