## Indox Retrieval Augmentation
Here, we explore how to work with Indox Retrieval Augmentation. Because this example uses OpenAI models through the Indox API, we first set INDOX_OPENAI_API_KEY as an environment variable.

In [6]:
import os
from dotenv import load_dotenv

load_dotenv()
INDOX_OPENAI_API_KEY = os.getenv("INDOX_OPENAI_API_KEY")
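
`os.getenv` returns `None` when the variable is missing, so a missing key would only surface later as an authentication error. A minimal sketch of a fail-fast check follows; the helper name `require_env` and the variable `DEMO_API_KEY` are hypothetical and are not part of Indox or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value


# Demo with a hypothetical variable so the sketch does not need a real key.
os.environ["DEMO_API_KEY"] = "dummy-value"
print(require_env("DEMO_API_KEY"))
```

In the notebook itself, the same check could be applied to `INDOX_OPENAI_API_KEY` right after `load_dotenv()`.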

### Creating an instance of IndoxRetrievalAugmentation

To effectively utilize the Indox Retrieval Augmentation capabilities, you must first create an instance of the IndoxRetrievalAugmentation class. This instance will allow you to access the methods and properties defined within the class, enabling the augmentation and retrieval functionalities.

In [2]:
from indox import IndoxRetrievalAugmentation
indox = IndoxRetrievalAugmentation()

### Generating a response using Indox
The IndoxApiOpenAiQa class handles the question-answering task through the Indox API, and the IndoxOpenAIEmbedding class specifies the embedding model. The UnstructuredLoadAndSplit class loads various file types and splits them into chunks.

In [7]:
from indox.llms import IndoxApiOpenAiQa
from indox.embeddings import IndoxOpenAIEmbedding
from indox.data_loader_splitter import UnstructuredLoadAndSplit
openai_qa_indox = IndoxApiOpenAiQa(api_key=INDOX_OPENAI_API_KEY)
embed_openai_indox = IndoxOpenAIEmbedding(api_key=INDOX_OPENAI_API_KEY, model="text-embedding-3-small")

file_path = "sample.txt"
loader_splitter = UnstructuredLoadAndSplit(file_path=file_path, max_chunk_size=400)
docs = loader_splitter.load_and_chunk()
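
To give a feel for what a `max_chunk_size` limit does, here is a minimal sketch of size-bounded chunking. It greedily packs whole words into chunks of at most `max_chunk_size` characters. This is an illustrative assumption only: the function `chunk_text` is hypothetical, and the actual splitting strategy used by UnstructuredLoadAndSplit may differ.

```python
def chunk_text(text: str, max_chunk_size: int = 400) -> list[str]:
    """Greedily pack whole words into chunks of at most max_chunk_size characters.

    A single word longer than the limit is kept as its own (oversized) chunk.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chunk_size or not current:
            current = candidate
        else:
            # Adding the word would exceed the limit: close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = "The wife of a rich man fell sick, and she called her only daughter. " * 10
for i, chunk in enumerate(chunk_text(sample, max_chunk_size=120)):
    print(i, len(chunk))
```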


Here, ChromaVectorStore handles the storage and retrieval of vector embeddings. Given a collection name and an embedding model, it sets up a vector store where text embeddings can be stored and queried.

In [4]:
from indox.vector_stores import ChromaVectorStore
db = ChromaVectorStore(collection_name="sample", embedding=embed_openai_indox)
indox.connect_to_vectorstore(vectorstore_database=db)

<indox.vector_stores.Chroma.ChromaVectorStore at 0x1f3e82ac3b0>
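
Conceptually, a vector store keeps each text next to its embedding and answers a query by returning the texts whose embeddings are closest to the query embedding. The sketch below illustrates that idea with an in-memory store and cosine similarity. `ToyVectorStore`, `toy_embed`, and `cosine` are hypothetical names, and the bag-of-words "embedding" is a stand-in for a real embedding model; none of this reflects how ChromaVectorStore is implemented.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: bag-of-words term counts (NOT a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory store that keeps (text, embedding) pairs for one collection."""

    def __init__(self, collection_name, embedding=toy_embed):
        self.collection_name = collection_name
        self.embedding = embedding
        self._items = []

    def add(self, texts):
        for text in texts:
            self._items.append((text, self.embedding(text)))

    def search(self, query, top_k=5):
        """Return up to top_k (text, score) pairs, most similar first."""
        q = self.embedding(query)
        scored = [(text, cosine(q, vec)) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = ToyVectorStore(collection_name="sample")
store.add([
    "cinderella planted the hazel branch",
    "the king held a festival",
    "pigeons picked lentils from the ashes",
])
print(store.search("hazel branch", top_k=1))
```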

### Load and preprocess data
This part of the code loads text data from a file, splits it into chunks, and stores these chunks in the vector store that was set up previously.

In [5]:
from indox.data_loader_splitter import UnstructuredLoadAndSplit
loader_splitter = UnstructuredLoadAndSplit(file_path=file_path, max_chunk_size=400)
docs = loader_splitter.load_and_chunk()

In [6]:
docs

[Document(page_content="The wife of a rich man fell sick, and as she felt that her end\n\nwas drawing near, she called her only daughter to her bedside and\n\nsaid, dear child, be good and pious, and then the\n\ngood God will always protect you, and I will look down on you\n\nfrom heaven and be near you. Thereupon she closed her eyes and\n\ndeparted. Every day the maiden went out to her mother's grave,", metadata={'filename': 'sample.txt', 'filetype': 'text/plain', 'last_modified': '2024-05-30T13:53:09'}),
 Document(page_content='and wept, and she remained pious and good. When winter came\n\nthe snow spread a white sheet over the grave, and by the time the\n\nspring sun had drawn it off again, the man had taken another wife.\n\nThe woman had brought with her into the house two daughters,\n\nwho were beautiful and fair of face, but vile and black of heart.\n\nNow began a bad time for the poor step-child. Is the stupid goose', metadata={'filename': 'sample.txt', 'filetype': 'text/plain',

In [7]:
indox.store_in_vectorstore(docs=docs)

<indox.vector_stores.Chroma.ChromaVectorStore at 0x1f3e82ac3b0>

### Retrieve relevant information and generate an answer
These lines set up a query against the vector store that retrieves the most relevant chunks (top_k=5) and generates an answer using the language model.

In [8]:
query = "How cinderella reach her happy ending?"
retriever = indox.QuestionAnswer(vector_database=db, llm=openai_qa_indox, top_k=5)

The invoke(query) method sends the query to the retriever, which searches the vector store for relevant text chunks and uses the language model to generate a response based on the retrieved information.
The context property returns the text chunks that the retriever used to generate the answer. Inspecting it shows how the query was answered.
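
The retrieve-then-generate flow described here can be sketched in a few lines. This is an assumption-laden illustration, not the Indox implementation: `ToyQuestionAnswer` is a hypothetical class, retrieval is a simple word-overlap ranking, and the "LLM" is a stub callable that returns the first context line.

```python
class ToyQuestionAnswer:
    """Rank chunks against a query, keep the top_k, and pass them to an LLM."""

    def __init__(self, chunks, llm, top_k=5):
        self.chunks = chunks
        self.llm = llm          # any callable that maps a prompt string to an answer
        self.top_k = top_k
        self.context = []       # filled by invoke(); the chunks used for the answer

    @staticmethod
    def _score(query, chunk):
        # Toy relevance score: number of distinct words shared with the query.
        return len(set(query.lower().split()) & set(chunk.lower().split()))

    def invoke(self, query):
        ranked = sorted(
            self.chunks, key=lambda c: self._score(query, c), reverse=True
        )
        self.context = ranked[: self.top_k]
        prompt = "Context:\n" + "\n".join(self.context) + f"\n\nQuestion: {query}"
        return self.llm(prompt)


def stub_llm(prompt):
    """Stand-in for a language model: returns the first context line."""
    return prompt.splitlines()[1]


qa = ToyQuestionAnswer(
    chunks=[
        "the bird threw down a golden dress",
        "the king held a festival",
        "the prince found the slipper",
    ],
    llm=stub_llm,
    top_k=2,
)
print(qa.invoke("who found the slipper"))
print(qa.context)
```

Keeping the retrieved chunks on the object after `invoke` is what makes a `context`-style property possible: the answer and its supporting evidence can be inspected side by side.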

In [9]:
retriever.invoke(query)
retriever.context

["which they had wished for, and to cinderella he gave the branch\n\nfrom the hazel-bush. Cinderella thanked him, went to her mother's\n\ngrave and planted the branch on it, and wept so much that the tears\n\nfell down on it and watered it. And it grew and became a handsome\n\ntree. Thrice a day cinderella went and sat beneath it, and wept and\n\nprayed, and a little white bird always came on the tree, and if",
 'by the hearth in the cinders. And as on that account she always\n\nlooked dusty and dirty, they called her cinderella.\n\nIt happened that the father was once going to the fair, and he\n\nasked his two step-daughters what he should bring back for them.\n\nBeautiful dresses, said one, pearls and jewels, said the second.\n\nAnd you, cinderella, said he, what will you have. Father',
 "to appear among the number, they were delighted, called cinderella\n\nand said, comb our hair for us, brush our shoes and fasten our\n\nbuckles, for we are going to the wedding at the king's palace.

### With AgenticRag

AgenticRag stands for Agentic Retrieval-Augmented Generation. It combines retrieval-based and generation-based methods in natural language processing (NLP): the key idea is to enhance the generative capabilities of a language model with relevant information retrieved from a database or a vector store.
AgenticRag is designed to provide more contextually rich and accurate responses by drawing on external knowledge sources. It retrieves relevant chunks from a vector store based on a query and then uses a language model to generate a comprehensive response that incorporates the retrieved information.

In [10]:
agent = indox.AgenticRag(llm=openai_qa_indox, vector_database=db, top_k=5)
agent.run(query)

Relevant doc
Not Relevant doc
Not Relevant doc
Not Relevant doc
Not Relevant doc


'Cinderella reached her happy ending by remaining kind and patient despite her hardships, by receiving help from the magical hazel tree and the little white bird, and by ultimately being recognized and rewarded for her goodness and inner beauty. Through her perseverance and faith, Cinderella was able to attend the royal wedding and find her true happiness.'
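
The "Relevant doc" / "Not Relevant doc" lines in the output suggest that each retrieved chunk is graded for relevance before the answer is generated. A minimal sketch of such a grade-then-generate loop follows, under that assumption. The function names `agentic_answer`, `keyword_grader`, and `stub_llm` are hypothetical, and the keyword grader stands in for what would presumably be an LLM-based judgment in a real system.

```python
def keyword_grader(query, doc):
    """Toy grader: a doc is relevant if it shares a word longer than 3 letters."""
    doc_words = set(doc.lower().split())
    return any(w in doc_words for w in query.lower().split() if len(w) > 3)


def stub_llm(query, context):
    """Stand-in for a language model call."""
    return f"answer from {len(context)} chunk(s)"


def agentic_answer(query, docs, grader=keyword_grader, llm=stub_llm):
    """Grade each retrieved doc, then generate from the relevant ones only."""
    relevant = []
    for doc in docs:
        if grader(query, doc):
            print("Relevant doc")
            relevant.append(doc)
        else:
            print("Not Relevant doc")
    return llm(query, relevant)


retrieved = [
    "cinderella went to the wedding",
    "the father rode to the fair",
]
print(agentic_answer("cinderella wedding", retrieved))
```

Filtering before generation keeps off-topic chunks out of the prompt, at the cost of one extra grading step per retrieved chunk.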

In [3]:
from indox.agents import simple_agent

In [4]:
from indox.agents.tools import wiki

In [13]:
agent = simple_agent.IndoxAgent(llm=openai_qa_indox, tools=[wiki.WikipediaTool()])

In [15]:
agent.run("who wrote cinderella story?")

ValueError: Output of LLM is not parsable for next tool use: `The Cinderella story has been written by various authors throughout history. The first literary European version was published by Giambattista Basile in Italy in 1634. Charles Perrault also wrote a version of Cinderella in French in 1697, and the Brothers Grimm published their own version in 1812. These are some of the most well-known authors who have contributed to the Cinderella story.`
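
The error above indicates that the model replied with a plain-text answer where the agent expected output describing the next tool call. The sketch below shows one way such a parsing step can work and how a lenient mode could treat unparsable text as the final answer. The `Action:` / `Action Input:` / `Final Answer:` layout is a common ReAct-style convention and is an assumption here; the format that IndoxAgent actually expects is not shown in this notebook, and `parse_step` is a hypothetical helper.

```python
import re

# Assumed ReAct-style layout: "Action: <tool>" followed by "Action Input: <text>".
ACTION_RE = re.compile(
    r"Action:\s*(?P<tool>.+?)\s*\nAction Input:\s*(?P<arg>.+)", re.DOTALL
)


def parse_step(text, strict=True):
    """Return (kind, tool, payload), where kind is "tool" or "finish"."""
    if "Final Answer:" in text:
        return ("finish", None, text.split("Final Answer:", 1)[1].strip())
    match = ACTION_RE.search(text)
    if match:
        return ("tool", match.group("tool").strip(), match.group("arg").strip())
    if strict:
        raise ValueError(
            f"Output of LLM is not parsable for next tool use: `{text}`"
        )
    # Lenient mode: treat free-form text as the final answer.
    return ("finish", None, text.strip())


print(parse_step("Action: wikipedia\nAction Input: Cinderella"))
print(parse_step("Charles Perrault wrote a version in 1697.", strict=False))
```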