## Indox Retrieval Augmentation
Here, we will explore how to work with Indox Retrieval Augmentation. This example uses the Indox API, so we should set our INDOX_OPENAI_API_KEY as an environment variable.

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
INDOX_OPENAI_API_KEY = os.getenv("INDOX_OPENAI_API_KEY")
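
Because os.getenv returns None when a variable is missing, a typo in the .env file only surfaces later as an authentication error. A minimal sketch of an early check follows; the helper name require_env is hypothetical and is not part of Indox or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for illustration; not part of Indox or python-dotenv.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "add it to your .env file or export it in your shell."
        )
    return value


# Usage (after load_dotenv() has run):
# INDOX_OPENAI_API_KEY = require_env("INDOX_OPENAI_API_KEY")
```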

### Creating an instance of IndoxRetrievalAugmentation

To use Indox Retrieval Augmentation, you must first create an instance of the IndoxRetrievalAugmentation class. This instance gives you access to the methods used in the rest of this walkthrough: connecting to a vector store, storing documents, and answering questions.

In [4]:
from indox import IndoxRetrievalAugmentation
indox = IndoxRetrievalAugmentation()

2024-06-22 20:04:54,558 INFO:IndoxRetrievalAugmentation initialized


### Generating response using Indox
The IndoxApi class handles the question-answering task using the Indox model, and the IndoxApiEmbedding class specifies the embedding model. The ClusteredSplit class loads a PDF or text file and splits it into chunks.

In [5]:
from indox.llms import IndoxApi
from indox.embeddings import IndoxApiEmbedding
from indox.data_loader_splitter import ClusteredSplit
openai_qa_indox = IndoxApi(api_key=INDOX_OPENAI_API_KEY)
embed_openai_indox = IndoxApiEmbedding(api_key=INDOX_OPENAI_API_KEY, model="text-embedding-3-small")

file_path = "sample.txt"
loader_splitter = ClusteredSplit(file_path=file_path, embeddings=embed_openai_indox, summary_model=openai_qa_indox)
docs = loader_splitter.load_and_chunk()

2024-06-22 20:05:04,710 INFO:Initialized IndoxOpenAIEmbedding with model: text-embedding-3-small and middle URL: http://5.78.55.161/api/embedding/generate/
2024-06-22 20:05:04,710 INFO:Initializing ClusteredSplit
2024-06-22 20:05:04,711 INFO:ClusteredSplit initialized successfully
2024-06-22 20:05:04,711 INFO:Getting all documents
2024-06-22 20:05:04,712 INFO:Starting processing for documents
2024-06-22 20:05:04,869 INFO:Embedding documents with chunk size: 0
2024-06-22 20:05:04,869 INFO:Starting to fetch embeddings for 35 texts using engine: text-embedding-3-small


--Generated 6 clusters--


2024-06-22 20:06:36,626 INFO:Embedding documents with chunk size: 0
2024-06-22 20:06:36,626 INFO:Starting to fetch embeddings for 6 texts using engine: text-embedding-3-small


--Generated 1 clusters--


2024-06-22 20:06:52,311 INFO:Completed chunking & clustering process
2024-06-22 20:06:52,313 INFO:Successfully obtained all documents
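
The log above shows 35 texts embedded, then 6 clusters, then 1 cluster, and 35 + 6 + 1 = 42 matches the number of texts stored later. That pattern is consistent with a recursive cluster-and-summarize scheme, although this is an inference from the logs rather than a description of Indox internals. The sketch below illustrates the idea with stand-ins: consecutive fixed-size groups replace real embedding-based clustering, and fake_summary replaces the summary model. All names and the group_size parameter are hypothetical.

```python
from typing import Callable, List


def cluster_and_summarize(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    group_size: int = 6,
) -> List[str]:
    """Return the leaf chunks plus one summary per group at every level above them."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level shrinks")
    all_texts = list(chunks)
    level = list(chunks)
    while len(level) > 1:
        # Stand-in for embedding-based clustering: consecutive fixed-size groups.
        groups = [level[i:i + group_size] for i in range(0, len(level), group_size)]
        # One summary per group becomes the input for the next level.
        level = [summarize(group) for group in groups]
        all_texts.extend(level)
    return all_texts


def fake_summary(texts: List[str]) -> str:
    """Stand-in for a call to a summary model."""
    return f"summary of {len(texts)} texts"


leaves = [f"chunk {i}" for i in range(35)]
texts = cluster_and_summarize(leaves, fake_summary)
print(len(texts))  # 42: 35 leaves + 6 first-level summaries + 1 top-level summary
```

Storing the summaries alongside the leaf chunks lets a query match either a specific passage or a higher-level overview.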


ChromaVectorStore handles the storage and retrieval of vector embeddings. Specifying a collection name sets up a vector store where text embeddings can be stored and queried.
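
To make "stored and queried" concrete, here is a toy in-memory vector store that ranks texts by cosine similarity. It is only a conceptual sketch: TinyVectorStore, cosine, and toy_embed are hypothetical names, the keyword-count embedding is a stand-in for a real embedding model, and none of this reflects how Chroma or Indox is implemented.

```python
import math
from typing import Callable, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Keeps (text, vector) pairs in a list and searches them by brute force."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self.embed = embed
        self.items: List[Tuple[str, List[float]]] = []

    def add(self, texts: List[str]) -> None:
        for text in texts:
            self.items.append((text, self.embed(text)))

    def search(self, query: str, top_k: int = 5) -> List[Tuple[str, float]]:
        query_vec = self.embed(query)
        scored = [(text, cosine(query_vec, vec)) for text, vec in self.items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


def toy_embed(text: str) -> List[float]:
    """Hypothetical embedding: counts of three keywords."""
    words = text.lower().split()
    return [float(words.count(w)) for w in ("prince", "slipper", "football")]


store = TinyVectorStore(toy_embed)
store.add(["the prince found the slipper", "a football match"])
results = store.search("prince slipper", top_k=1)
print(results[0][0])
```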

In [6]:
from indox.vector_stores import ChromaVectorStore
db = ChromaVectorStore(collection_name="sample", embedding=embed_openai_indox)
indox.connect_to_vectorstore(vectorstore_database=db)

2024-06-22 20:07:08,810 INFO:Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
2024-06-22 20:07:09,037 INFO:Attempting to connect to the vector store database
2024-06-22 20:07:09,038 INFO:Connection to the vector store database established successfully


<indox.vector_stores.Chroma.ChromaVectorStore at 0x132a01603e0>

### Load and preprocess data
The text was already loaded from the file and split into chunks above. This cell stores those chunks in the vector store that was set up previously.

In [7]:
indox.store_in_vectorstore(docs=docs)

2024-06-22 20:07:11,994 INFO:Storing documents in the vector store
2024-06-22 20:07:11,995 INFO:Embedding documents with chunk size: 0
2024-06-22 20:07:11,996 INFO:Starting to fetch embeddings for 42 texts using engine: text-embedding-3-small
2024-06-22 20:08:27,848 INFO:Document added successfully to the vector store.
2024-06-22 20:08:27,849 INFO:Documents stored successfully


<indox.vector_stores.Chroma.ChromaVectorStore at 0x132a01603e0>

### Retrieve relevant information and generate an answer
These lines query the vector store for the most relevant chunks (top_k=5) and generate an answer using the language model.

In [8]:
query = "How cinderella reach her happy ending?"
retriever = indox.QuestionAnswer(vector_database=db, llm=openai_qa_indox, top_k=5)

The invoke(query) method sends the query to the retriever, which searches the vector store for relevant text chunks and uses the language model to generate a response based on them.
The context property returns the text chunks that the retriever used to generate the answer, which shows what the answer was based on.

In [9]:
retriever.invoke(query)

2024-06-22 20:08:27,862 INFO:Retrieving context and scores from the vector database
2024-06-22 20:08:27,862 INFO:Embedding query text: How cinderella reach her happy ending?
2024-06-22 20:08:27,863 INFO:Embedding documents with chunk size: 0
2024-06-22 20:08:27,864 INFO:Starting to fetch embeddings for 1 texts using engine: text-embedding-3-small
2024-06-22 20:08:30,005 INFO:Generating answer without document relevancy filter
2024-06-22 20:08:33,104 INFO:Query answered successfully


"Cinderella reaches her happy ending in various ways across different retellings of the classic fairy tale. In one version, she attends a royal festival with the help of a magical bird that grants her wishes, catches the prince's attention, and is eventually recognized as his true love. In another version, she is aided by a fairy godmother to attend a festival where she captures the prince's heart but has to leave before midnight, leaving behind a glass slipper. The prince searches for the owner of the slipper and eventually finds Cinderella, leading to their reunion and happily ever after. In another retelling, Cinderella proves her identity by fitting perfectly into a golden shoe, which the prince recognizes, and they are reunited while the wicked steps"

In [10]:
retriever.context

["The documentation provided is a retelling of the classic fairy tale of Cinderella. It describes how Cinderella, a kind and beautiful young girl, is mistreated by her stepmother and stepsisters but is helped by a magical bird that grants her wishes. Despite their cruelty, Cinderella helps her stepsisters prepare for a royal festival where the prince is to choose a bride. With the help of the bird, Cinderella is able to attend the festival in beautiful dresses and catches the prince's attention. Eventually, the prince recognizes Cinderella as his true love, and they ride off together on his horse. The story also includes elements such as a hazel tree, white doves, and a wedding where justice is served to the wicked stepsisters.",
 'The documentation provided is a retelling of the classic fairy tale of Cinderella. It starts with the wife of a rich man giving advice to her daughter to be good and pious. After the mother\'s death, the daughter is mistreated by her stepmother and stepsiste
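
The relationship between invoke and context can be sketched as a small retrieve-then-generate wrapper. This is an illustration only, not Indox's QuestionAnswer implementation: SimpleQA, build_prompt, and the two stub functions are hypothetical, and the prompt wording is an assumption.

```python
from typing import Callable, List


def build_prompt(query: str, context: List[str]) -> str:
    """Combine the retrieved chunks and the question into one prompt string."""
    joined = "\n\n".join(context)
    return f"Answer the question using only this context:\n{joined}\n\nQuestion: {query}"


class SimpleQA:
    """Retrieves chunks, remembers them in .context, and asks the model to answer."""

    def __init__(
        self,
        search: Callable[[str, int], List[str]],
        generate: Callable[[str], str],
        top_k: int = 5,
    ):
        self.search = search
        self.generate = generate
        self.top_k = top_k
        self.context: List[str] = []

    def invoke(self, query: str) -> str:
        # Keep the retrieved chunks so the caller can inspect them afterwards.
        self.context = self.search(query, self.top_k)
        return self.generate(build_prompt(query, self.context))


def stub_search(query: str, top_k: int) -> List[str]:
    """Stand-in for a vector store lookup."""
    return ["chunk A", "chunk B", "chunk C"][:top_k]


def stub_generate(prompt: str) -> str:
    """Stand-in for a language model call; echoes the prompt it received."""
    return "ANSWER<" + prompt + ">"


qa = SimpleQA(stub_search, stub_generate, top_k=2)
answer = qa.invoke("How does the story end?")
print(qa.context)
```

Keeping the retrieved chunks on the object is what makes it possible to audit an answer after the fact.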

### With AgenticRag

AgenticRag stands for Agentic Retrieval-Augmented Generation. It combines retrieval-based and generation-based methods in natural language processing (NLP): relevant chunks are retrieved from a vector store for a query, and a language model generates a response that incorporates them.
As the log output in the cells below shows, AgenticRag also grades each retrieved chunk for relevance, falls back to a web search when no chunk is relevant, and regenerates the answer when it detects a hallucination.
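
The control flow suggested by the log messages in this section ("Relevant doc", "No Relevant document found, Start web search", "Hallucination detected, Regenerate the answer...") can be sketched roughly as follows. This is inferred from the logs and is not Indox's implementation; every function name, the max_regenerations parameter, and the stubs are hypothetical.

```python
from typing import Callable, List, Tuple


def agentic_answer(
    query: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[str, List[str]], bool],
    max_regenerations: int = 1,
) -> Tuple[str, str]:
    """Return (answer, source), where source is 'vector_store' or 'web'."""
    # 1. Retrieve chunks and keep only those graded as relevant.
    docs = [doc for doc in retrieve(query) if is_relevant(query, doc)]
    source = "vector_store"
    # 2. Fall back to a web search when nothing relevant was found.
    if not docs:
        docs = web_search(query)
        source = "web"
    # 3. Generate, then regenerate if the answer is not supported by the docs.
    answer = generate(query, docs)
    regenerations = 0
    while not is_grounded(answer, docs) and regenerations < max_regenerations:
        answer = generate(query, docs)
        regenerations += 1
    return answer, source


# Stubs standing in for the vector store, the grader, the web, and the model.
def retrieve(query: str) -> List[str]:
    return ["Cinderella marries the prince."]


def is_relevant(query: str, doc: str) -> bool:
    return any(word in doc.lower() for word in query.lower().split())


def web_search(query: str) -> List[str]:
    return ["Messi plays for Inter Miami CF."]


def generate(query: str, docs: List[str]) -> str:
    return docs[0]


def is_grounded(answer: str, docs: List[str]) -> bool:
    return answer in docs


print(agentic_answer("cinderella ending", retrieve, is_relevant,
                     web_search, generate, is_grounded))
print(agentic_answer("messi club", retrieve, is_relevant,
                     web_search, generate, is_grounded))
```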

In [11]:
agent = indox.AgenticRag(llm=openai_qa_indox, vector_database=db, top_k=5)
agent.run(query)

2024-06-22 20:09:15,370 INFO:Embedding query text: How cinderella reach her happy ending?
2024-06-22 20:09:15,371 INFO:Embedding documents with chunk size: 0
2024-06-22 20:09:15,373 INFO:Starting to fetch embeddings for 1 texts using engine: text-embedding-3-small


Relevant doc
Relevant doc
Relevant doc
Relevant doc
Relevant doc


2024-06-22 20:09:32,539 INFO:Hallucination detected, Regenerate the answer...


"Cinderella reaches her happy ending in various ways depending on the version of the fairy tale being referenced. In most versions, Cinderella's happy ending is achieved when the prince recognizes her as the true owner of the lost glass slipper, symbolizing her as the one he danced with at the royal event. This recognition leads to their reunion, and they ride off together to live happily ever after. Additionally, justice is often served to the wicked stepmother and stepsisters, while Cinderella's kindness, resilience, and inner beauty are rewarded with a life of love and happiness with the prince."

In [12]:
query_2 = "where does messi plays right now?"

In [13]:
agent.run(query_2)

2024-06-22 20:10:03,833 INFO:Embedding query text: where does messi plays right now?
2024-06-22 20:10:03,834 INFO:Embedding documents with chunk size: 0
2024-06-22 20:10:03,836 INFO:Starting to fetch embeddings for 1 texts using engine: text-embedding-3-small


Not Relevant doc
Not Relevant doc
Not Relevant doc
Not Relevant doc


2024-06-22 20:10:14,103 INFO:No Relevant document found, Start web search


Not Relevant doc
No Relevant Context Found, Start Searching On Web...
Answer Base On Web Search
Check For Hallucination In Generated Answer Base On Web Search


2024-06-22 20:10:24,255 INFO:Hallucination detected, Regenerate the answer...


"Lionel Messi currently plays for Major League Soccer's Inter Miami CF."