# Llama Index RAG Retrieval and Ingestion

This notebook presents a straightforward implementation of a Retrieval-Augmented Generation (RAG) system using the llama_index library. The system ingests documents, creates a vector index, and retrieves the most relevant content for a query. In addition to the techniques above, the agent now integrates Llama Index for more advanced data retrieval and ingestion, which improves its ability to work with complex datasets.

## System Introduction

In this system, documents are ingested from either text strings or a directory, indexed with llama_index, and later retrieved with a query. A language model is used in other parts of the overall system (not shown here) to generate queries or hypothetical documents. Because Llama Index is now a default tool, you can customize how the ingestion and retrieval code runs by passing additional parameters described in the Llama Index documentation.

## Underlying Concept

Traditional retrieval methods may require heavy customization when working with large document corpora. By leveraging llama_index, the system automatically handles document ingestion, vector indexing, and similarity search. This creates a robust, scalable retrieval solution that easily integrates with modern language models. Furthermore, integrating the Llama Index allows for advanced customization and improved handling of complex datasets.

## System Components

1. **Document Retrieval Module:** Loads a persisted llama_index vector store and retrieves documents relevant to a query.
   - *Example Prompt for Retrieval:* "Find the latest market trends using the Llama Index."

2. **Text Ingestion Module:** Ingests a list of raw text strings into the vector store. It creates or appends to an existing index and persists the update.
   - *Example Prompt for Ingestion:* "Find the latest market trends from the web and save it in the database using the Llama Index."

3. **Corpus Ingestion Module:** Reads documents from a specified directory using the SimpleDirectoryReader, then creates or updates the llama_index vector store.
   - **Bulk Ingestion:** You can ingest a corpus by directly uploading files to the `/tools/rag/llama_index/corpus` folder. To start the batch processing, simply make an API call:
     ```bash
     curl -X POST http://localhost:5000/llama-index-ingest-corpus
     ```
     The script processes the files and transfers their contents to the vector and graph databases. By default, the SimpleDirectoryReader attempts to read any file it finds as text; beyond plain text, it explicitly supports file types such as:
     - .csv (Comma-Separated Values)
     - .docx (Microsoft Word)
     - .epub (EPUB eBook format)
     - .hwp (Hangul Word Processor)
     - .ipynb (Jupyter Notebook)
     - .jpeg, .jpg (JPEG image)
     - .mbox (MBOX email archive)
     - .md (Markdown)
     - .mp3, .mp4 (Audio and video)
     - .pdf (Portable Document Format)
     - .png (Portable Network Graphics)
     - .ppt, .pptm, .pptx (Microsoft PowerPoint)
     
     For more details, refer to the [SimpleDirectoryReader Documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).
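
Before triggering a bulk run, it can help to see which files in the corpus folder have one of the extensions listed above and which will fall back to being read as text. The following is a minimal sketch using only the standard library; the function name `split_corpus_files` is hypothetical and not part of the project, and the scan covers only the top level of the folder (whether the project reads subfolders is not stated here).

```python
from pathlib import Path

# Extensions listed above as explicitly supported by SimpleDirectoryReader.
LISTED_EXTENSIONS = {
    ".csv", ".docx", ".epub", ".hwp", ".ipynb", ".jpeg", ".jpg", ".mbox",
    ".md", ".mp3", ".mp4", ".pdf", ".png", ".ppt", ".pptm", ".pptx",
}


def split_corpus_files(corpus_dir):
    """Return (listed, other) lists of top-level files, sorted by path."""
    listed, other = [], []
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories in this sketch
        # Compare case-insensitively so "REPORT.PDF" counts as a PDF.
        if path.suffix.lower() in LISTED_EXTENSIONS:
            listed.append(path)
        else:
            other.append(path)
    return listed, other
```

Calling `split_corpus_files("tools/rag/llama_index/corpus")` from the project root (the relative path is an assumption) returns the two lists, so unexpected file types can be spotted before the API call.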

## How It Works

### 1. Text Preprocessing and Vector Indexing

- The retrieval module loads the persisted index from disk using a specified directory.
- Ingestion modules process raw text or directory files and update the vector store accordingly.
- Llama Index parameters can be customized to suit specific retrieval or ingestion needs, providing flexibility for various data types.
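
The create-or-append behavior described above could look roughly like the sketch below. It is not the project's `ingest.py`: the names `has_persisted_index` and `ingest_texts` and their signatures are hypothetical, the `llama_index.core` import path assumes a llama_index release of 0.10 or later, and an embedding model must already be configured (llama_index defaults to OpenAI embeddings, which need an API key).

```python
from pathlib import Path


def has_persisted_index(persist_dir):
    """True if persist_dir exists and contains at least one entry."""
    path = Path(persist_dir)
    return path.is_dir() and any(path.iterdir())


def ingest_texts(texts, persist_dir):
    """Create or append to a vector index, then persist it (sketch)."""
    # Imported lazily so the helper above works without llama_index installed.
    from llama_index.core import (
        Document,
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )

    documents = [Document(text=text) for text in texts]
    if has_persisted_index(persist_dir):
        # Append: load the existing index and insert each new document.
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        index = load_index_from_storage(storage_context)
        for document in documents:
            index.insert(document)
    else:
        # Create: build a fresh index from the documents.
        index = VectorStoreIndex.from_documents(documents)
    # Write the index back to disk so later runs can load it.
    index.storage_context.persist(persist_dir=persist_dir)
    return index
```

Checking for a non-empty directory is a deliberately simple test for "an index already exists"; a stricter check on specific persisted files may be preferable in practice.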

### 2. Retrieval Mechanism

- A query is submitted to the retrieval module, which loads the index and searches for the most similar nodes/documents.
- The retrieved nodes are then returned as a list of text strings.
- *Example Prompt:* "Find the latest market trends using the Llama Index."
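
To make "most similar nodes" concrete, here is a toy, library-free sketch of top-k retrieval by cosine similarity. It only illustrates the idea; llama_index performs embedding and search internally, and the three-dimensional vectors and node texts below are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_texts(query_vec, nodes, k):
    """nodes is a list of (text, vector); return the k most similar texts."""
    ranked = sorted(
        nodes,
        key=lambda node: cosine_similarity(query_vec, node[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up embeddings, for illustration only.
nodes = [
    ("market trends 2024", [0.9, 0.1, 0.0]),
    ("cooking recipes", [0.0, 0.2, 0.9]),
    ("stock market outlook", [0.8, 0.3, 0.1]),
]
print(top_k_texts([1.0, 0.0, 0.0], nodes, k=2))
# ['market trends 2024', 'stock market outlook']
```

The value of `k` here plays the same role as `similarity_top_k` in the retriever: a larger value returns more candidates at the cost of including less similar ones.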

### 3. Corpus Ingestion

- The corpus ingestion module reads all documents from a directory, ingests them into the index, and persists the updated index.
- For bulk ingestion, files placed in the designated corpus folder are processed via an API call, allowing the system to update the vector and graph databases automatically.
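
The bulk ingestion call can also be made from Python. This sketch mirrors the `curl -X POST` command shown earlier using only the standard library; the helper names and the timeout value are hypothetical, and the format of the response body is not documented here, so it is returned as raw text.

```python
import urllib.request

INGEST_URL = "http://localhost:5000/llama-index-ingest-corpus"


def build_ingest_request(url=INGEST_URL):
    """Build a body-less POST request, like `curl -X POST <url>`."""
    return urllib.request.Request(url, method="POST")


def trigger_corpus_ingestion(url=INGEST_URL, timeout=600):
    """Send the request; return (HTTP status, response body as text)."""
    request = build_ingest_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
```

The generous timeout is a guess: a large corpus may take a while to process if the endpoint responds only after ingestion finishes.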


## System Advantages

- **Automated Vector Management:** llama_index simplifies document ingestion and indexing.
- **Scalability:** Easily ingest and index new documents from text or entire corpora without significant re-engineering.
- **Seamless Integration:** Works alongside modern language models for end-to-end RAG pipelines.
- **Advanced Customization:** With Llama Index, users can adjust parameters for retrieval and ingestion to meet specific needs.

## Practical Benefits

- **Improved Retrieval Accuracy:** Efficient similarity search based on vector representations.
- **Flexibility:** Supports ingestion from both raw text and document directories, including bulk ingestion via API calls.
- **Easy Deployment:** By persisting the index to disk, the system can be restarted or updated incrementally.

With these RAG techniques, AutoCodeAgent 2.0 transforms the way you interact with data, making it easier than ever to store, retrieve, and analyze information. Whether you're working on simple tasks or tackling complex data challenges, these tools empower your workflow and unlock new possibilities.

## Implementation Insights

- The **retrieve.py** module leverages llama_index's `StorageContext` and `VectorIndexRetriever` to load and search the index.
- The **ingest.py** module creates or updates the index using a list of text strings and persists changes using the storage context.
- The **ingest_corpus.py** module uses the `SimpleDirectoryReader` to load files from a directory, then either creates a new index or appends to an existing one before persisting it.
- Bulk ingestion is supported via an API endpoint, enabling the processing of various file types as detailed above.
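
As a rough illustration of the retrieval path, the sketch below loads a persisted index through `StorageContext` and searches it with `VectorIndexRetriever`. It is an approximation, not the project's `retrieve.py`: the signature of `retrieve_documents` and the helper `nodes_to_texts` are assumptions, the import paths assume llama_index 0.10 or later, and the embedding model used at query time must match the one used at ingestion.

```python
def nodes_to_texts(nodes):
    """Convert retrieved nodes into a list of plain text strings."""
    return [node.get_content() for node in nodes]


def retrieve_documents(query, persist_dir, top_k=5):
    """Load the persisted index and return the top_k most similar texts."""
    # Imported lazily so nodes_to_texts works without llama_index installed.
    from llama_index.core import StorageContext, load_index_from_storage
    from llama_index.core.retrievers import VectorIndexRetriever

    # Rebuild the index object from the files in persist_dir.
    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    index = load_index_from_storage(storage_context)

    # similarity_top_k caps how many nodes the retriever returns.
    retriever = VectorIndexRetriever(index=index, similarity_top_k=top_k)
    return nodes_to_texts(retriever.retrieve(query))
```

Returning plain strings rather than node objects matches the behavior described earlier, where retrieved nodes are returned as a list of text strings.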

## Parameters

**LLAMA_INDEX_TOP_K_RAG_RETRIEVE:**
This environment variable controls how many top-ranked documents are retrieved per query in the Llama Index RAG system. In the `retrieve_documents` function, it sets the `similarity_top_k` parameter of the `VectorIndexRetriever`. A higher value retrieves more documents and favors recall; a lower value returns fewer, more relevant documents and favors precision.
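
One way to read this variable defensively is sketched below. The helper `read_top_k` and the fallback value of 5 are hypothetical; the project's actual default and validation rules are not stated here.

```python
import os

DEFAULT_TOP_K = 5  # hypothetical fallback, not taken from the project


def read_top_k(env=os.environ, default=DEFAULT_TOP_K):
    """Parse LLAMA_INDEX_TOP_K_RAG_RETRIEVE as a positive integer."""
    raw = env.get("LLAMA_INDEX_TOP_K_RAG_RETRIEVE")
    if raw is None or not raw.strip():
        return default  # unset or blank: use the fallback
    value = int(raw)  # raises ValueError for non-numeric input
    if value < 1:
        raise ValueError("LLAMA_INDEX_TOP_K_RAG_RETRIEVE must be >= 1")
    return value
```

The returned value can then be passed as `similarity_top_k` when the retriever is constructed. Failing loudly on a bad value is a design choice; silently falling back to the default is the alternative.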

## Conclusion

The Llama Index RAG Retrieval and Ingestion system provides a way to manage and query large document corpora. It ingests documents automatically, builds vector indices, and runs similarity search over them. Its modular components cover text ingestion, customizable retrieval, and bulk processing via an API endpoint, so the system can grow with the corpus and adapt to different data types. This makes data more accessible and simplifies integrating language models into real-world applications.