[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/openai_agenticrag.ipynb)

In [None]:
!pip install indox
!pip install openai
!pip install chromadb
!pip install duckduckgo-search

## Agentic RAG
This notebook shows how to work with Agentic RAG in Indox. It uses OpenAI models, so OPENAI_API_KEY must be set as an environment variable.

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
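
Because os.getenv returns None when a variable is missing, a misconfigured key would otherwise only surface later, at the first API call. Below is a minimal sketch of an early check. The helper name `require_env` is hypothetical; it is not part of indox or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell"
        )
    return value


# Usage (uncomment once the key is configured):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```

Failing at this point keeps the error message close to its cause instead of burying it in a later authentication failure.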

## Creating an instance of IndoxRetrievalAugmentation
You must first create an instance of the IndoxRetrievalAugmentation class. This instance gives you access to the methods and properties defined in the class, which provide the retrieval and augmentation functionality.

In [2]:
from indox import IndoxRetrievalAugmentation
from indox.llms import OpenAi
from indox.embeddings import OpenAiEmbedding
from indox.data_loader_splitter import SimpleLoadAndSplit

Create an OpenAi model as the LLM and an OpenAiEmbedding model as the embedding model; both are used later to generate responses.

In [3]:
indox = IndoxRetrievalAugmentation()
llm_model = OpenAi(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo-0125")
embed = OpenAiEmbedding(api_key=OPENAI_API_KEY, model="text-embedding-3-small")

2024-07-07 12:05:55,022 INFO:IndoxRetrievalAugmentation initialized
2024-07-07 12:05:55,025 INFO:Initializing OpenAi with model: gpt-3.5-turbo-0125
2024-07-07 12:05:55,647 INFO:OpenAi initialized successfully
2024-07-07 12:05:57,022 INFO:Initialized OpenAI embeddings with model: text-embedding-3-small


In [4]:
indox.__version__

'0.1.13'

### You can download the sample file from the address below

In [None]:
!wget https://raw.githubusercontent.com/osllmai/inDox/master/Demo/sample.txt

## Preprocess Data
Use the SimpleLoadAndSplit class to load text data from a file and split it into chunks.

In [5]:
loader_splitter = SimpleLoadAndSplit(file_path="sample.txt", remove_sword=False)

2024-07-07 12:05:57,830 INFO:Initializing UnstructuredLoadAndSplit
2024-07-07 12:05:57,831 INFO:UnstructuredLoadAndSplit initialized successfully


In [6]:
docs = loader_splitter.load_and_chunk()

2024-07-07 12:05:58,850 INFO:Getting all documents
2024-07-07 12:05:58,851 INFO:Starting processing
2024-07-07 12:05:58,886 INFO:Created initial document elements
2024-07-07 12:06:02,457 INFO:Completed chunking process
2024-07-07 12:06:02,458 INFO:Successfully obtained all documents
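
To make the idea of chunking concrete, here is a minimal sketch of a fixed-size splitter with overlap. It is an illustration only: `chunk_text`, `chunk_size` and `overlap` are hypothetical names, and SimpleLoadAndSplit may well use a different, more sophisticated strategy.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 5  # 50 characters of toy input
pieces = chunk_text(sample, chunk_size=20, overlap=5)
print(len(pieces), [len(p) for p in pieces])
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence that straddles a boundary is less likely to be cut off from its context.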


## Create a ChromaVectorStore instance
ChromaVectorStore handles the storage and retrieval of vector embeddings. Specifying a collection name sets up a vector store in which text embeddings can be stored and queried.

In [7]:
from indox.vector_stores import ChromaVectorStore

# Define the collection name within the vector store
collection_name = "sample"

# Create a ChromaVectorStore instance
db = ChromaVectorStore(collection_name=collection_name, embedding=embed)

# Connect to the vector store using the provided database instance
indox.connect_to_vectorstore(vectorstore_database=db)

2024-07-07 12:06:04,504 INFO:Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
2024-07-07 12:06:04,762 INFO:Attempting to connect to the vector store database
2024-07-07 12:06:04,763 INFO:Connection to the vector store database established successfully


<indox.vector_stores.Chroma.ChromaVectorStore at 0x195f52efb30>

Store the chunks in the vector store that was set up previously.

In [9]:
indox.store_in_vectorstore(docs=docs)

2024-07-07 12:06:26,394 INFO:Storing documents in the vector store
2024-07-07 12:06:28,051 INFO:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-07-07 12:06:28,589 INFO:Document added successfully to the vector store.
2024-07-07 12:06:28,589 INFO:Documents stored successfully


<indox.vector_stores.Chroma.ChromaVectorStore at 0x195f52efb30>
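
Conceptually, querying a vector store means embedding the question and returning the stored chunks whose embeddings are closest to it. The sketch below illustrates this with cosine similarity over hand-made three-dimensional vectors. The vectors, the chunk labels and the `top_k_similar` helper are made up for illustration; they say nothing about how Chroma or ChromaVectorStore index data internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, store, k=3):
    """Return the k (text, score) pairs with the highest cosine similarity."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "vector store": each entry pairs a chunk label with a made-up embedding.
toy_store = [
    ("chunk A", [1.0, 0.0, 0.0]),
    ("chunk B", [0.8, 0.6, 0.0]),
    ("chunk C", [0.0, 0.0, 1.0]),
]
results = top_k_similar([1.0, 0.1, 0.0], toy_store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The `k` argument plays the same role as the top_k parameter passed to the retriever in the next step: it caps how many chunks are handed to the LLM as context.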

## Retrieve relevant information with the question-answering model
In this step we use the QuestionAnswer model to answer the question from our file alone, without any agent.

In [17]:
query = "Where does messi plays right now?"
retriever = indox.QuestionAnswer(vector_database=db, llm=llm_model, top_k=3)

In [18]:
retriever.invoke(query)

2024-07-07 12:10:05,391 INFO:Retrieving context and scores from the vector database
2024-07-07 12:10:07,036 INFO:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-07-07 12:10:07,179 INFO:Generating answer without document relevancy filter
2024-07-07 12:10:07,179 INFO:Answering question
2024-07-07 12:10:07,180 INFO:Generating response
2024-07-07 12:10:09,567 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-07-07 12:10:09,569 INFO:Response generated successfully
2024-07-07 12:10:09,569 INFO:Query answered successfully


"I'm sorry, but the given context does not contain any information about where Lionel Messi currently plays."

## Retrieve information by using an Agent
Here we use an agent to answer the same question. The previous attempt failed because the file contains no relevant information; this time, once the agent finds that none of the retrieved chunks is relevant, it falls back to searching the web.
Note: to become more familiar with AgenticRAG, please read [this page](https://docs.osllm.ai/agenticRag.html).

In [19]:
agent = indox.AgenticRag(llm=llm_model, vector_database=db, top_k=3)
agent.run(query)

2024-07-07 12:10:17,315 INFO:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-07-07 12:10:17,412 INFO:Generating response
2024-07-07 12:10:18,649 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-07-07 12:10:18,739 INFO:Response generated successfully
2024-07-07 12:10:18,740 INFO:Not relevant doc
2024-07-07 12:10:18,740 INFO:Generating response
2024-07-07 12:10:19,505 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-07-07 12:10:19,507 INFO:Response generated successfully
2024-07-07 12:10:19,507 INFO:Not relevant doc
2024-07-07 12:10:19,508 INFO:Generating response
2024-07-07 12:10:20,220 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-07-07 12:10:20,222 INFO:Response generated successfully
2024-07-07 12:10:20,222 INFO:Not relevant doc
2024-07-07 12:10:20,223 INFO:No Relevant document found, Start web search


No Relevant Context Found, Start Searching On Web...


2024-07-07 12:10:23,084 INFO:Answering question
2024-07-07 12:10:23,084 INFO:Generating response


Answer Base On Web Search


2024-07-07 12:10:24,077 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-07-07 12:10:24,213 INFO:Response generated successfully
2024-07-07 12:10:24,214 INFO:Checking hallucination for answer
2024-07-07 12:10:24,215 INFO:Generating response


Check For Hallucination In Generated Answer Base On Web Search


2024-07-07 12:10:24,884 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-07-07 12:10:24,884 INFO:Response generated successfully
2024-07-07 12:10:24,885 INFO:Hallucination detected, Regenerate the answer...
2024-07-07 12:10:24,885 INFO:Answering question
2024-07-07 12:10:24,886 INFO:Generating response
2024-07-07 12:10:26,180 INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-07-07 12:10:26,181 INFO:Response generated successfully


"Lionel Messi currently plays for Major League Soccer's Inter Miami CF."
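
The log above suggests the following control flow: grade each retrieved chunk for relevance, fall back to a web search when none is relevant, generate an answer, and regenerate it if a hallucination check fails. A minimal sketch of that loop follows. Every name in it (`agentic_answer`, `grade`, `web_search`, `generate`, `is_grounded`, `max_retries`) is hypothetical, and the stubs stand in for LLM and search calls; it is not indox's implementation of AgenticRag.

```python
from typing import Callable


def agentic_answer(
    query: str,
    retrieved: list[str],
    grade: Callable[[str, str], bool],
    web_search: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    is_grounded: Callable[[str, list[str]], bool],
    max_retries: int = 2,
) -> str:
    # Keep only the chunks the grader judges relevant to the query.
    context = [doc for doc in retrieved if grade(query, doc)]
    # No relevant chunk: fall back to web search results as context.
    if not context:
        context = web_search(query)
    answer = generate(query, context)
    # Regenerate while the answer is not supported by the context.
    retries = 0
    while not is_grounded(answer, context) and retries < max_retries:
        answer = generate(query, context)
        retries += 1
    return answer


# Toy stubs standing in for the LLM grader, search tool and generator.
calls = {"generate": 0}


def stub_grade(query, doc):
    return "messi" in doc.lower()


def stub_search(query):
    return ["web result mentioning Messi"]


def stub_generate(query, context):
    calls["generate"] += 1
    return f"draft {calls['generate']} based on {len(context)} source(s)"


def stub_grounded(answer, context):
    # Pretend the first draft fails the hallucination check.
    return answer.startswith("draft 2")


result = agentic_answer(
    "Where does Messi play?",
    ["unrelated chunk one", "unrelated chunk two"],
    stub_grade, stub_search, stub_generate, stub_grounded,
)
print(result)
```

Capping the regeneration loop with `max_retries` is a design choice of this sketch: without a bound, a checker that never accepts an answer would keep the loop (and the API bill) running indefinitely.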