# DocumentManager Class Documentation

This document describes how to use the `DocumentManager` class to load multiple document types and run retrieval-augmented generation (RAG) queries over them.

## Overview

The `DocumentManager` class implements the facade pattern: a single object hides the format-specific loaders for PDF, Word, Excel, and PowerPoint files and prepares their content for use with large language models (LLMs).
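
To make the facade idea concrete, here is a minimal sketch of one object dispatching to format-specific loaders by file extension. It is not the actual `DocumentManager` implementation: the class `FacadeSketch` and the loader functions `load_pdf` and `load_word` are hypothetical stand-ins invented for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, List


# Hypothetical stand-ins for real format-specific loaders.
def load_pdf(path: str) -> str:
    return f"[pdf text from {path}]"


def load_word(path: str) -> str:
    return f"[word text from {path}]"


class FacadeSketch:
    """Illustrative facade; not the real DocumentManager."""

    def __init__(self) -> None:
        # Map each supported extension to the loader that handles it.
        self._loaders: Dict[str, Callable[[str], str]] = {
            ".pdf": load_pdf,
            ".docx": load_word,
        }

    def load(self, paths: List[str]) -> List[str]:
        texts = []
        for path in paths:
            loader = self._loaders.get(Path(path).suffix.lower())
            if loader is None:
                continue  # silently skip unsupported file types
            texts.append(loader(path))
        return texts
```

Callers only see `load()`; supporting another format means registering one more entry in the dictionary.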

## Import



In [None]:
from utils.document_loaders import DocumentManager



## Class Methods

### `__init__(self)`

Initializes the document manager with default loaders and settings.



In [None]:
manager = DocumentManager()



### `load_directory(directory_path: Union[str, List[str]]) -> Tuple[List, str]`

Loads and processes the documents matched by a glob pattern or named in a list of file paths.

**Parameters:**
- `directory_path`: Either a glob pattern (e.g., `"docs/*.pdf"`) or a list of file paths

**Returns:**
- `Tuple[List, str]`: The loaded documents and a formatted context string

**Example:**


In [None]:
# Using glob pattern
docs, context = manager.load_directory('rag_pdfs/*')

# Using file list
files = ['doc1.pdf', 'presentation.pptx']
docs, context = manager.load_directory(files)
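
The two accepted argument forms can be reduced to one list of file paths before loading. The sketch below shows one way to do that with the standard library; `resolve_paths` is a hypothetical helper, and the real `load_directory` may resolve its argument differently.

```python
import glob
from typing import List, Union


def resolve_paths(directory_path: Union[str, List[str]]) -> List[str]:
    """Turn a glob pattern or an explicit list into a list of file paths."""
    if isinstance(directory_path, str):
        # A string is treated as a glob pattern; sort for a stable order.
        return sorted(glob.glob(directory_path))
    # Anything else is assumed to already be a sequence of paths.
    return list(directory_path)
```

A pattern that matches nothing yields an empty list, so it is worth checking the result before starting a long processing run.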



### `chunk_documents(docs: Optional[List] = None, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]`

Splits documents into smaller, overlapping chunks and returns them as a list of strings.

**Parameters:**
- `docs`: Optional document list (uses `self.docs` if `None`)
- `chunk_size`: Characters per chunk (default: 1000)
- `chunk_overlap`: Overlap between consecutive chunks (default: 100)

**Example:**


In [None]:
chunks = manager.chunk_documents(chunk_size=500)
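
To illustrate how `chunk_size` and `chunk_overlap` interact, here is a minimal sketch of fixed-size character windows. It assumes the overlap is measured in characters; `chunk_text` is a hypothetical function, and the splitter used by `chunk_documents` may behave differently, for example by preferring paragraph or sentence boundaries.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Split text into windows of chunk_size characters sharing chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With `chunk_size=4` and `chunk_overlap=1`, the string `"abcdefghij"` splits into `"abcd"`, `"defg"`, and `"ghij"`: each chunk starts with the last character of the previous one.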



### `create_vector_store(chunks: List, path: str) -> None`

Builds a FAISS vector store from the chunks and saves it to `path`.

**Parameters:**
- `chunks`: Document chunks
- `path`: Save location

**Example:**


In [None]:
manager.create_vector_store(chunks, 'rag_vectorstore')



### `retrieve(question: str, prompt: Optional[str] = None, path: Optional[str] = None) -> str`

Performs RAG-based question answering and returns the answer as a string.

**Parameters:**
- `question`: Query text
- `prompt`: Custom prompt template (optional)
- `path`: Vector store path (optional)

**Example:**


In [None]:
# Basic usage
answer = manager.retrieve("What is a completion?")

# With custom path
answer = manager.retrieve(
    question="Summarize DevCon 2024",
    path="rag_vectorstore"
)
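
The retrieve-then-generate flow behind RAG can be sketched without any external dependencies. The example below swaps in a toy bag-of-words cosine similarity in place of real embeddings and a FAISS index, and it stops at building the prompt rather than calling an LLM. The function names and the prompt wording are hypothetical, not taken from `DocumentManager`.

```python
import math
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question."""
    query = bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Combine retrieved chunks and the question into one prompt (hypothetical template)."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

In a real pipeline the resulting prompt would be sent to the LLM, and the model's reply would be the returned answer.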



## Complete Usage Example



In [None]:
from utils.document_loaders import DocumentManager

# Initialize
manager = DocumentManager()

# Load documents
docs, context = manager.load_directory('rag_pdfs/*')

# Create chunks
chunks = manager.chunk_documents(chunk_size=500)

# Create vector store (one-time setup)
manager.create_vector_store(chunks, 'rag_vectorstore')

# Query documents
question = "What are the key takeaways from DevCon 2024?"
response = manager.retrieve(question)
print(response)



## Notes
- Vector store creation is a one-time operation; rebuild the store only when the source documents change
- The class handles PDF, Word, Excel, and PowerPoint files
- Default embeddings and LLM can be customized using `load_embedding()` and `load_llm()`
- The manager keeps the loaded context and accumulates it automatically across operations
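
Because the vector store only needs to be built once, a small guard can skip the build when a store already exists at the target path. The sketch below is one possible approach; `ensure_vector_store` is a hypothetical helper, and it only checks that the path exists, so it does not detect changed documents.

```python
import os
from typing import Callable, List


def ensure_vector_store(
    path: str,
    chunks: List[str],
    build: Callable[[List[str], str], None],
) -> bool:
    """Call build(chunks, path) only if nothing exists at path.

    Returns True if the store was built, False if it was already present.
    """
    if os.path.exists(path):
        return False
    build(chunks, path)
    return True
```

With the API described above, the call would look like `ensure_vector_store('rag_vectorstore', chunks, manager.create_vector_store)`. Delete the stored files manually to force a rebuild after the documents change.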