DocuSageAI Agent

This repository contains the DocuSageAI Agent, a tool that uses AI language models to automate workflows for processing PDF documents. It extracts insights, summarizes content, and assists with other document-driven tasks.

What is the DocuSageAI Agent?

The DocuSageAI Agent is a tool that helps users interact with PDF documents using artificial intelligence. It automates tasks such as extracting text, summarizing content, answering questions, and streamlining document-related workflows. Whether you're dealing with research papers, financial reports, or any other kind of PDF, this tool aims to provide valuable insights quickly.

Tech Stack & Modern AI Techniques

Programming Language: Python
Frameworks & Libraries:
- Streamlit (for creating interactive web interfaces)
- Deep learning libraries for natural language processing (e.g., Transformer-based models)
- PDF processing libraries
- LangChain for RAG pipeline
- Chroma for vector storage

The project uses transformer-based language models to interpret the text extracted from PDF documents. This lets it automate multi-step document workflows and answer user queries about the indexed content.

Setup & Installation

Clone the Repository:

git clone <repository_url>
cd pdf_ai_agent

Setup Virtual Environment:

It is recommended to use a virtual environment. You can set one up using:
```
python3 -m venv .venv
source .venv/bin/activate
```
Install Dependencies:

Install the required Python packages:
```
pip install -r requirements.txt
```
Or, if you use the uv package manager:
```
uv sync
```

Configuration

All configuration is handled via a YAML file (default: config.yaml). You can specify a different config file using the --config argument.

Key options:

- chunk_size, chunk_overlap: Control how documents are split into chunks before embedding.
- max_tokens, temperature: Model generation parameters.
- persist_directory: Directory where the vector databases are stored.
- model: Default model provider to use (openai, gemini, claude, ollama).
- Model/embedding names: The specific model and embedding names for each provider.
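
To make the relationship between chunk_size and chunk_overlap concrete, here is a minimal character-based sketch. It is an illustration only: the function name chunk_text is hypothetical, and the splitter the project actually uses may count tokens or split on separators instead of raw characters.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so the window
    start advances by (chunk_size - chunk_overlap) each step.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With this sketch, chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2) returns ["abcd", "cdef", "efgh", "ghij"]. A larger overlap preserves more context across chunk boundaries at the cost of storing more chunks.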

Running the Project

Streamlit App

To use the web interface (built with Streamlit), run:

streamlit run app.py

Command Line Interface (CLI)

The main script supports two commands: index and ask. Both require a config file (default: config.yaml).

Indexing Documents

To index a PDF, DOCX, TXT, or a folder of documents:

python main.py --config config.yaml index <path_to_file_or_folder> [--metadata <collection_name>]

<path_to_file_or_folder>: Path to the file or directory you want to index
--metadata <collection_name>: (Optional) Name for the indexed collection (used as a key for querying)

Example:

For a single PDF file

python main.py --config config.yaml index <file-name.pdf> --metadata deep_learning

For a folder containing multiple documents

python main.py --config config.yaml index <dir/> --metadata deep_learning

Asking Questions

To ask questions based on the indexed documents:

python main.py --config config.yaml ask <collection_name> "<your_question>"

<collection_name>: The metadata key used during indexing
<your_question>: The question you want to ask

Example:

python main.py --config config.yaml ask deep_learning "What are the main takeaways from this document?"
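
The command shapes above can be reproduced with Python's standard argparse module. The following is a sketch of one way to parse them, not the actual contents of main.py; the function name build_parser and the attribute names path, collection, and question are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: main.py [--config FILE] {index,ask} ..."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Global option, given before the subcommand as in the examples above.
    parser.add_argument("--config", default="config.yaml",
                        help="Path to the YAML config file")
    sub = parser.add_subparsers(dest="command", required=True)

    index = sub.add_parser("index", help="Index a file or folder")
    index.add_argument("path", help="File or directory to index")
    index.add_argument("--metadata", default=None,
                       help="Collection name used later for querying")

    ask = sub.add_parser("ask", help="Ask a question about a collection")
    ask.add_argument("collection", help="Collection name set at index time")
    ask.add_argument("question", help="Question text")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Placing --config on the top-level parser is what allows it to appear before index or ask on the command line.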

Notes:

Make sure to set the required API keys in your environment before running these commands (OPENAI_API_KEY, GOOGLE_API_KEY, ANTHROPIC_API_KEY as needed).
You can switch between supported models by changing the --model argument or editing the config file.
The vector database for each collection is stored in the directory specified by persist_directory in your config.
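
A quick pre-flight check can catch a missing key before a command fails partway through. The sketch below assumes a one-to-one mapping between the model value in the config and the environment variable it needs, and assumes that ollama runs locally without a key. Both are assumptions, as is the helper name missing_api_key.

```python
import os
from typing import Mapping, Optional

# Assumed mapping from the `model` config value to its environment variable.
# ollama is assumed to run locally and need no API key.
REQUIRED_KEY = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "ollama": None,
}


def missing_api_key(model: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the missing environment variable, or None if none is missing."""
    env = os.environ if env is None else env
    if model not in REQUIRED_KEY:
        raise ValueError(f"unknown model provider: {model!r}")
    key = REQUIRED_KEY[model]
    # An empty string counts as missing.
    if key is not None and not env.get(key):
        return key
    return None
```

For example, missing_api_key("openai", env={}) returns "OPENAI_API_KEY", while missing_api_key("ollama", env={}) returns None.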

Applications

The DocuSageAI Agent can be applied in various scenarios including:

Research: Automatically summarize and extract key insights from academic papers and research documents.
Business: Analyze financial reports, legal documents, and other business-related PDFs to streamline workflows.
Automation: Integrate with other systems to automate routine document processing tasks, enhancing overall productivity.

RAG Implementation: Retrieval-Augmented Generation

The Retrieval-Augmented Generation (RAG) technique has been implemented in this project to enhance document processing workflows.

Why: It combines the strengths of retrieval and generation to ensure that AI responses are grounded in the actual document content, reducing inaccuracies.
How: The system first retrieves relevant information from PDFs using vector-based similarity search, then leverages a transformer-based language model to generate contextually accurate outputs.
What: The implementation includes modules for text extraction, vector embedding, similarity search, and integration with AI language models, thereby delivering reliable and context-aware responses.
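
The retrieve-then-generate flow described above can be sketched in plain Python. This is a toy stand-in, not the project's code: word counts replace a learned embedding model, an in-memory list replaces the Chroma vector store, and the final call to a language model is left out. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real pipeline uses a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that grounds the model in the retrieved chunks."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )
```

The prompt returned by build_prompt would then be sent to the configured language model. Restricting the model to the retrieved context is what keeps the answer grounded in the document content.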

Conclusion

This project exemplifies the integration of modern AI techniques with practical application in document processing. It is modular and extensible, making it a valuable starting point for further advancements in automated document analysis and intelligent workflow management.
