## Indox Retrieval Augmentation
This notebook shows how to work with Indox Retrieval Augmentation. It uses OpenAI models through the Indox API, so set `INDOX_API_KEY` as an environment variable before running the cells.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/indox_api_openai.ipynb)

In [None]:
!pip install indox
!pip install chromadb


In [None]:
import os
from dotenv import load_dotenv
load_dotenv()
INDOX_API_KEY = os.getenv("INDOX_API_KEY")
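
Because `os.getenv` returns `None` when a variable is missing, a small guard can fail early with a clear message instead of surfacing later as an authentication error. The following is a minimal sketch; `require_env` is a hypothetical helper introduced here for illustration and is not part of Indox or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of the environment variable `name`.

    Raises RuntimeError if the variable is unset or empty.
    `require_env` is a hypothetical helper, not an Indox function.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "add it to your .env file or export it in your shell."
        )
    return value
```

With this helper, the assignment above could be written as `INDOX_API_KEY = require_env("INDOX_API_KEY")`, after `load_dotenv()` has run.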

### Creating an instance of IndoxRetrievalAugmentation

To use Indox Retrieval Augmentation, first create an instance of the `IndoxRetrievalAugmentation` class. Its methods provide the retrieval and augmentation functionality used in the rest of this notebook.

In [None]:
from indox import IndoxRetrievalAugmentation
indox = IndoxRetrievalAugmentation()

### Generating response using Indox
The `IndoxApi` class handles the question-answering task using the Indox model, and the `IndoxApiEmbedding` class specifies the embedding model. The `ClusteredSplit` class loads a PDF or text file and splits it into chunks.

In [None]:
# Import necessary classes from Indox library
from indox.llms import IndoxApi
from indox.embeddings import IndoxApiEmbedding
from indox.data_loader_splitter import ClusteredSplit

# Create instances for API access and text embedding
openai_qa_indox = IndoxApi(api_key=INDOX_API_KEY)
embed_openai_indox = IndoxApiEmbedding(api_key=INDOX_API_KEY, model="text-embedding-3-small")

# Specify the path to your text file
file_path = "sample.txt"

# Create a ClusteredSplit instance for handling file loading and chunking
loader_splitter = ClusteredSplit(file_path=file_path, embeddings=embed_openai_indox, summary_model=openai_qa_indox)

# Load and split the document into chunks using ClusteredSplit
docs = loader_splitter.load_and_chunk()

--Generated 6 clusters--
--Generated 1 clusters--


In [None]:
docs[2]

'  They took her pretty clothes away from her, put an old grey bedgown on her, and gave her wooden shoes   Just look at the proud princess, how decked out she is, they cried, and laughed, and led her into the kitchen There she had to do hard work from morning till night, get up before daybreak, carry water, light fires, cook and wash   Besides this, the sisters did her every imaginable injury - they mocked her'
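
To make the idea of chunking concrete, here is a minimal sketch of a fixed-size word splitter with overlap. It is an assumption-laden illustration only: `split_into_chunks` is a hypothetical function, and it does not reproduce what `ClusteredSplit` does internally (the clustering and summarization steps reported in the output above are not shown).

```python
from typing import List


def split_into_chunks(text: str, max_words: int = 50, overlap: int = 10) -> List[str]:
    """Split `text` into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words so that sentences cut at a
    boundary still appear with some context. Hypothetical helper, not Indox code.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

Overlap is a common design choice in chunking because it reduces the chance that a fact is split across two chunks and retrieved only in part.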

`ChromaVectorStore` handles the storage and retrieval of vector embeddings. Specifying a collection name sets up a vector store in which text embeddings can be stored and queried.

In [None]:
from indox.vector_stores import ChromaVectorStore

# Define the collection name within the vector store
collection_name = "sample"

# Create a ChromaVectorStore instance
db = ChromaVectorStore(collection_name=collection_name, embedding=embed_openai_indox)

# Connect to the vector store using the provided database instance
indox.connect_to_vectorstore(vectorstore_database=db)

<indox.vector_stores.Chroma.ChromaVectorStore at 0x7dbeb3878dc0>

### Load and preprocess data
The text file was already loaded and split into chunks by `ClusteredSplit` above. This step stores those chunks in the vector store that was set up previously.

In [None]:
indox.store_in_vectorstore(docs=docs)

<indox.vector_stores.Chroma.ChromaVectorStore at 0x7dbeb3878dc0>
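
Conceptually, a vector store keeps each chunk alongside its embedding and answers a query by ranking the stored embeddings by similarity to the query embedding. The sketch below illustrates that idea with cosine similarity in plain Python. `ToyVectorStore` is a hypothetical class for illustration; it is not how `ChromaVectorStore` or Chroma is implemented, and real embeddings would come from an embedding model rather than hand-written vectors.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """In-memory store of (text, vector) pairs. Hypothetical, for illustration only."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, vector: Sequence[float]) -> None:
        """Store a chunk of text together with its embedding vector."""
        self._items.append((text, list(vector)))

    def search(self, query_vector: Sequence[float], top_k: int = 5) -> List[Tuple[str, float]]:
        """Return up to `top_k` (text, score) pairs, most similar first."""
        scored = [
            (cosine_similarity(query_vector, vector), text)
            for text, vector in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(text, score) for score, text in scored[:top_k]]
```

A production store such as Chroma typically uses an approximate nearest-neighbor index instead of scoring every stored vector, which is what makes large collections practical.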

### Retrieve relevant information and generate an answer
The following lines define a query and set up a retriever that fetches the most relevant chunks from the vector store (`top_k=5`) and generates an answer using the language model.

In [None]:
query = "How cinderella reach her happy ending?"
retriever = indox.QuestionAnswer(vector_database=db, llm=openai_qa_indox, top_k=5)

The `invoke(query)` method sends the query to the retriever, which searches the vector store for relevant text chunks and uses the language model to generate a response based on the retrieved information.
The `context` property returns the text chunks that the retriever used to generate the answer, which shows what information the answer was based on.

In [None]:
retriever.invoke(query)

"Cinderella reaches her happy ending by attending a royal festival with the help of magical elements such as a hazel tree, birds, and a golden slipper. Despite being mistreated by her stepmother and stepsisters, Cinderella is able to capture the attention of the king's son at the festival. Ultimately, her true identity is revealed when she fits perfectly into the golden slipper, and the prince recognizes her as the one he danced with. They marry and live happily ever after."

In [None]:
retriever.context

["The documentation provided is a retelling of the classic fairy tale of Cinderella. It describes how Cinderella, despite being mistreated by her stepmother and stepsisters, is able to attend a royal festival with the help of a magical bird that grants her wishes. The bird provides Cinderella with beautiful dresses and shoes, allowing her to attend the festival and catch the eye of the king's son.\n\nAs the story progresses, Cinderella's true identity is revealed with the help of the magical bird and two white doves. The false stepsisters, who mistreated Cinderella, try to gain favor with her but are punished for their cruelty.\n\nIn the end, Cinderella marries the king's son and is able to live happily ever after.",
 'The provided documentation is a retelling of the classic fairy tale "Cinderella." It describes the story of a young maiden who is mistreated by her stepmother and stepsisters. In the story, the king\'s son is searching for a bride, and a golden slipper is used to find th
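
One common way for a retrieval-augmented pipeline to combine retrieved context with the question is to place both in a single prompt for the language model. The sketch below shows that general technique only. `build_prompt` and its wording are assumptions made for illustration; the prompt template that Indox actually uses is not shown in this notebook.

```python
from typing import List


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Combine retrieved chunks and a question into one prompt string.

    Hypothetical helper illustrating the general RAG pattern, not Indox's template.
    """
    # Number each chunk so the model (and a human reader) can tell them apart.
    numbered = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Under this pattern, the list returned by `retriever.context` plays the role of `context_chunks`, and the string returned by `retriever.invoke(query)` plays the role of the model's completion.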

### With AgenticRag

AgenticRag stands for Agentic Retrieval-Augmented Generation. It combines retrieval-based and generation-based methods in natural language processing (NLP): the generative capabilities of a language model are enhanced with relevant information retrieved from a database or a vector store.
AgenticRag aims to provide more contextually rich and accurate responses by using external knowledge sources. It retrieves relevant chunks from a vector store based on a query, then uses a language model to generate a response that incorporates the retrieved information. As the logged output of the cells below suggests, it also judges each retrieved chunk for relevance and falls back to a web search when no relevant context is found.

In [None]:
!pip install duckduckgo_search



In [None]:
agent = indox.AgenticRag(llm=openai_qa_indox, vector_database=db, top_k=5)
agent.run(query)

Relevant doc
Relevant doc
Relevant doc
Relevant doc
Relevant doc


"Cinderella reaches her happy ending by attending the royal festival with the help of magical elements such as a hazel tree, birds, and a golden slipper. Despite being mistreated by her stepmother and stepsisters, Cinderella's true identity is revealed with the assistance of the magical bird and two white doves. The false stepsisters' deception is exposed, and Cinderella fits perfectly into the golden slipper, proving she is the true bride sought by the prince. As a result, Cinderella marries the king's son and is able to live happily ever after."

In [None]:
query_2 = "where does messi plays right now?"

In [None]:
agent.run(query_2)

Not Relevant doc
Not Relevant doc
Not Relevant doc
Not Relevant doc
Not Relevant doc
No Relevant Context Found, Start Searching On Web...
Answer Base On Web Search
Check For Hallucination In Generated Answer Base On Web Search


'Lionel Messi currently plays for Inter Miami CF in Major League Soccer.'
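
The control flow suggested by the logs above (grade each retrieved chunk, answer from the relevant ones, otherwise search the web) can be sketched as follows. This is a simplified illustration under stated assumptions: `agentic_answer` and the callables it receives (`is_relevant`, `generate`, `web_search`) are hypothetical stand-ins, not Indox functions, and the hallucination check mentioned in the logs is omitted.

```python
from typing import Callable, List


def agentic_answer(
    query: str,
    retrieved_docs: List[str],
    is_relevant: Callable[[str, str], bool],
    generate: Callable[[str, List[str]], str],
    web_search: Callable[[str], List[str]],
) -> str:
    """Answer `query` from relevant retrieved docs, else from web search results.

    All callables are hypothetical stand-ins supplied by the caller:
      is_relevant(query, doc) -> True if the doc helps answer the query
      generate(query, context) -> answer text produced from the context
      web_search(query)        -> list of text snippets from the web
    """
    # Keep only the retrieved chunks judged relevant to the query.
    relevant = [doc for doc in retrieved_docs if is_relevant(query, doc)]
    if relevant:
        return generate(query, relevant)

    # No relevant context in the vector store: fall back to the web.
    return generate(query, web_search(query))
```

Passing the grader, generator, and search function in as parameters keeps the sketch independent of any particular model or search provider, so each piece can be replaced by a stub when testing.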