A modular, privacy-first toolkit for everyday document and image tasks. Compress, merge, split, rotate, convert — all offline, no uploads, no subscriptions.
Now with an AI chat assistant: describe what you want in plain English, and the system builds and executes a processing plan automatically. It runs locally on your machine and opens in your default browser. Your files never leave your computer.
# Install with GUI support
pip install -e ".[gui]"
# Launch — opens in your browser automatically
docproc-gui
# Or with options
docproc-gui --port 8080 # custom port
docproc-gui --no-browser # don't auto-open browser
python -m docproc.web.app # alternative launch

The web UI provides drag-and-drop file upload, automatic parameter forms for every pipeline, real-time processing with progress feedback, and instant download. Your files never leave your computer; everything runs on localhost.
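As a rough illustration of the launcher's port handling, the sketch below probes localhost for a bindable port using only the standard library. `find_free_port` and `launch_url` are hypothetical helpers, not the project's actual code, and the starting port 5111 is borrowed from the API examples in this README; whether it is the real default is an assumption.

```python
import socket


def find_free_port(preferred: int = 5111, attempts: int = 20) -> int:
    """Return the first port at or above `preferred` that can be bound on localhost."""
    for port in range(preferred, preferred + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # port is taken, try the next one
            return port
    raise RuntimeError(f"no free port in {preferred}..{preferred + attempts - 1}")


def launch_url(port: int) -> str:
    """Build the localhost URL the browser would be pointed at."""
    return f"http://127.0.0.1:{port}/"


if __name__ == "__main__":
    print(launch_url(find_free_port()))
```

A launcher built this way could pass the URL to `webbrowser.open` from the standard library, and skip that call when `--no-browser` is given.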
PDF operations:
- Compress — shrink scanned PDFs to target sizes while maximizing quality
- Merge — combine multiple PDFs into one document
- Split — extract pages, split every N pages, or split each page separately
- Rotate — rotate pages by 90°, 180°, or 270°
- Pages — delete, keep, reorder, or reverse pages
- Protect — add or remove password protection (AES-256)
- Metadata — view, edit, or strip PDF metadata fields
- Stamp — overlay text or image stamps on pages
- Page numbers — add page numbers in multiple formats
- PDF → Images — export pages as PNG or JPEG
- Images → PDF — combine images into a single PDF
Image operations:
- Resize — scale, fit, crop, or pad images to target dimensions
- Convert — convert between PNG, JPEG, and WebP with quality control
- Compress — reduce file size without changing dimensions (target size mode)
- Crop — crop by pixel coordinates, percentage margins, or auto-detect content
- EXIF — view or strip EXIF/GPS metadata from images
- Background removal — AI model-based portrait segmentation (rembg)
- Watermark removal — brightness-threshold detection + OpenCV inpainting
Infrastructure:
- Recipe system — chain pipelines into named multi-step workflows (YAML/JSON)
- RAG chat assistant — describe tasks in natural language, auto-generates execution plans
- RAG document engine — upload any docs (PDF, DOCX, HTML, MD, CSV, images), get precise retrieval
- GitHub device-flow OAuth for AI features (GitHub Models API)
- Structured job logging for every run (JSONL)
- Zero-boilerplate extensibility — new pipelines auto-register into CLI
- 248 automated tests with pytest
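
To make the structured job logging item concrete, here is a minimal sketch of a JSONL log: one JSON object per line, appended after each run. The function names and record fields (`ts`, `pipeline`, `input`, `ok`) are assumptions chosen for illustration and may not match the project's job_log.py.

```python
import json
import time
from pathlib import Path


def append_job(log_path: Path, pipeline: str, input_name: str, ok: bool, **extra) -> dict:
    """Append one job record as a single JSON line and return it."""
    record = {"ts": time.time(), "pipeline": pipeline, "input": input_name, "ok": ok, **extra}
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_jobs(log_path: Path) -> list[dict]:
    """Read every record back; blank lines are skipped."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def success_rate(jobs: list[dict]) -> float:
    """Fraction of jobs that succeeded (0.0 for an empty log)."""
    return sum(1 for job in jobs if job["ok"]) / len(jobs) if jobs else 0.0
```

Because each line is an independent JSON document, history and aggregate statistics can be computed by streaming the file without loading a database.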
# Install (editable, with all extras)
pip install -e ".[all,dev]"
# --- PDF operations ---
docproc compress report.pdf # compress to the default 1MB target
docproc compress report.pdf --target 2MB # custom target size
docproc merge doc1.pdf doc2.pdf doc3.pdf -o combined.pdf
docproc split report.pdf --pages 1-5 # extract pages 1-5
docproc split report.pdf --each # each page → own file
docproc split report.pdf --every 10 # split every 10 pages
docproc rotate scan.pdf --angle 90 # rotate all pages 90°
docproc rotate scan.pdf --angle 180 --pages 3,5 # rotate specific pages
docproc pages report.pdf --delete 1,5 # delete pages 1 and 5
docproc pages report.pdf --keep 2-4 # keep only pages 2-4
docproc pages report.pdf --reverse # reverse page order
docproc pdf2img report.pdf # export as PNGs
docproc pdf2img report.pdf --format jpg --dpi 300 # high-res JPEGs
docproc img2pdf photo1.png photo2.jpg photo3.png # combine into PDF
# --- New PDF operations ---
docproc protect doc.pdf --user-pass secret # encrypt with password
docproc protect doc.pdf --unlock --user-pass secret # remove protection
docproc metadata doc.pdf --view # show PDF metadata
docproc metadata doc.pdf --title "My Doc" --author "Me"
docproc stamp doc.pdf --text "DRAFT" --position center --opacity 0.3
docproc stamp doc.pdf --image logo.png --position bottom-right
docproc pagenums doc.pdf --format page-n-of-m # "Page 1 of 10"
# --- Image operations ---
docproc resize photo.png --width 800
docproc resize photo.png --width 600 --height 600 --fit cover
# Convert formats
docproc convert photo.png --format webp --quality 85
# Compress images (keeps dimensions, reduces file size)
docproc imgcompress photo.jpg --quality 70
docproc imgcompress photo.jpg --target 500KB
# Crop images
docproc crop photo.png --box "100,50,900,700" # pixel coordinates
docproc crop photo.png --margin 10 # trim 10% from each edge
docproc crop photo.png --auto # auto-detect content bounds
# View/strip EXIF metadata
docproc exif photo.jpg --view # show all EXIF tags
docproc exif photo.jpg --strip # remove all metadata
docproc exif photo.jpg --gps # GPS coordinates only
# Remove watermarks
docproc watermark photo.png
docproc watermark photo.png --preview --corner all
# Remove background (requires the Python 3.10 environment, .venv310)
docproc background portrait.jpg
# Run a recipe (chained pipelines)
docproc recipe web-optimize photo.png
docproc recipe passport-photo portrait.jpg
docproc recipe email-ready-pdf report.pdf
docproc recipe social-media-image photo.png
# Introspection
docproc pipelines # list all pipelines + recipes
docproc recipes # list recipes with step details
docproc history # recent job log
docproc stats # aggregate statistics

docproc/ # Python package v0.5.0
  __init__.py # Version string
  __main__.py # python -m docproc support
  cli.py # Auto-generated CLI from Param descriptors
  registry.py # @register decorator + get_pipeline()/list_pipelines()
  recipes.py # YAML/JSON multi-step workflow loader
  job_log.py # Structured JSONL job log
  exceptions.py # Custom exception hierarchy
  utils.py # format_size, parse_size, parse_page_range, etc.
  pipelines/ # 18 registered pipelines
    base.py # Pipeline ABC + PipelineResult + Param descriptor
    pdf_compress.py # DPI/quality ladder compression
    pdf_merge.py # Multi-input PDF merge
    pdf_split.py # Split by pages/every-N/each
    pdf_to_images.py # PDF → PNG/JPEG
    images_to_pdf.py # Multiple images → single PDF
    pdf_rotate.py # Rotate pages 90°/180°/270°
    pdf_pages.py # Delete/keep/reorder/reverse pages
    pdf_protect.py # PDF password encrypt/unlock
    pdf_metadata.py # View/edit/strip PDF metadata
    pdf_stamp.py # Text or image stamp overlay
    pdf_pagenums.py # Add page numbers
    resize.py # Scale/fit/crop/pad
    convert.py # Format conversion (PNG/JPEG/WebP/BMP/TIFF/GIF)
    image_compress.py # Quality reduction + target-size mode
    crop.py # Crop by coords/margins/auto-detect
    exif.py # View/strip EXIF metadata
    watermark.py # Brightness detection + OpenCV inpainting
    background.py # rembg AI segmentation
  rag/ # RAG engine + chat system
    parsers.py # Parse PDF, DOCX, HTML, MD, CSV, TXT, images (OCR)
    chunking.py # Fixed / semantic / hybrid chunking strategies
    storage.py # SQLite backend: datasets, documents, chunks, BM25 index
    engine.py # Orchestrator: ingest, query, dataset CRUD
    eval.py # IR evaluation: Precision@K, Recall@K, MRR, nDCG
    retriever.py # BM25 + TF-IDF hybrid retriever, RRF, MMR
    embeddings.py # API-based embeddings with disk cache
    knowledge.py # Auto-gen pipeline docs + practices + workflows
    llm.py # GitHub Models API client (stdlib urllib)
    planner.py # Parse → validate → execute plans + quality retry
    chat.py # ChatEngine: retrieve → augment → LLM → execute
    context.py # Query intent classification + context assembly
    entities.py # Zero-LLM entity extraction + conflict detection
    auth.py # Token storage + GitHub device-flow OAuth
  web/ # Flask SPA
    app.py # Desktop launcher (port scan, browser open)
    api.py # 28 REST endpoints
    static/
      index.html # Three-mode SPA: Tools wizard + Chat + Knowledge
      style.css # Responsive design system
      app.js # Client-side state management
tests/ # 248 passing tests (pytest)
  conftest.py # Temp-dir fixtures, sample generators
  test_registry.py # Registration + discovery (6)
  test_pdf_pipelines.py # PDF pipelines (17)
  test_image_pipelines.py # Image pipelines (13)
  test_utils.py # Utility functions (24)
  test_rag.py # RAG retrieval system (48)
  test_rag_engine.py # RAG engine: parsers, chunking, storage, engine, eval (51)
  test_new_pipelines.py # New pipelines + tech debt (46)
  test_context_engineering.py # Context engineering: intents, entities, conflicts (63)
recipes/ # 8 named workflow presets
docs/ # Design docs and development notes
pyproject.toml # Package metadata + entry points

The Chat tab provides a conversational interface powered by RAG (Retrieval-Augmented Generation):
- Sign in — Settings → "Sign in with GitHub" (device flow) or paste a PAT
- Ask — "Compress my PDF to under 1MB" or "Convert these images to WebP"
- Upload — Drag files onto the chat or click the attach button
- Download — Processed files appear as inline download links
The system retrieves relevant pipeline docs via hybrid BM25 + TF-IDF search (with optional API embeddings), augments the LLM prompt, generates a JSON execution plan, validates it against the pipeline registry (checking names, params, choices, dependencies), and executes deterministically. If a size target isn't met, it binary-searches the quality parameter automatically.
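
The size-target retry can be pictured as a binary search over an integer quality setting. The sketch below is an assumption-laden illustration, not the planner's actual code: `best_quality_under` and `fake_encode` are hypothetical names, the 10 to 95 quality range is arbitrary, and the search assumes output size never shrinks as quality rises.

```python
from typing import Callable, Optional


def best_quality_under(
    encode: Callable[[int], bytes],
    max_bytes: int,
    lo: int = 10,
    hi: int = 95,
) -> Optional[tuple[int, bytes]]:
    """Binary-search the highest quality in [lo, hi] whose output fits max_bytes.

    Returns (quality, data), or None if even the lowest quality is too large.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        data = encode(mid)
        if len(data) <= max_bytes:
            best = (mid, data)  # fits: remember it and try a higher quality
            lo = mid + 1
        else:
            hi = mid - 1  # too big: try a lower quality
    return best


def fake_encode(quality: int) -> bytes:
    """Stand-in encoder for the demo: 8000 bytes per quality point."""
    return b"\0" * (quality * 8000)


if __name__ == "__main__":
    result = best_quality_under(fake_encode, max_bytes=512000)
    print(result[0] if result else "no quality fits")
```

With a real encoder, each probe is a full re-encode, so the logarithmic number of probes (about seven for this range) is what keeps the retry affordable.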
Upload any documents (PDF, DOCX, HTML, Markdown, CSV, plain text, or images with OCR) and query them with high-precision hybrid retrieval:
# Create a dataset and ingest documents via API
curl -X POST localhost:5111/api/rag/datasets -H 'Content-Type: application/json' -d '{"name": "my-docs"}'
curl -X POST localhost:5111/api/rag/datasets/<id>/ingest -F file=@guide.pdf -F file=@faq.md
curl -X POST localhost:5111/api/rag/datasets/<id>/query -H 'Content-Type: application/json' -d '{"query": "how to reset password"}'

Documents are parsed, chunked (semantic section-aware splitting), and indexed for BM25 + TF-IDF hybrid retrieval. All data persists in SQLite at ~/.docproc/rag/. Duplicate files are automatically detected by content hash.
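
One common way to combine two rankers such as BM25 and TF-IDF is reciprocal rank fusion (RRF), which the retriever module lists among its techniques. The following is a generic sketch of RRF rather than the project's implementation; the function name, the chunk IDs, and the constant `k = 60` (a conventional default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top-3 results from each ranker for the same query.
bm25_hits = ["chunk-7", "chunk-2", "chunk-9"]
tfidf_hits = ["chunk-2", "chunk-4", "chunk-7"]
fused = reciprocal_rank_fusion([bm25_hits, tfidf_hits])

if __name__ == "__main__":
    for doc_id, score in fused:
        print(f"{doc_id}: {score:.5f}")
```

RRF uses only rank positions, so the two rankers' raw scores never need to be normalized against each other.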
User: "Compress this to under 500KB"
→ TF-IDF retrieves: compress pipeline docs + email compression tips
→ LLM generates: {"plan": [{"pipeline": "compress", "params": {"target": "500KB"}, "verify": {"max_size_bytes": 512000}}]}
→ Planner validates: pipeline exists ✓, params valid ✓, deps installed ✓
→ Executor runs: compress pipeline → checks size → binary-search retry if needed
→ User gets: download link + size metrics
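
The validation step in the flow above can be sketched as a pure function that checks an LLM-produced plan against a registry before anything runs. Everything here is illustrative: `REGISTRY` is a hypothetical two-entry snapshot, and `validate_plan` is not the project's planner API. Dependency checks are omitted.

```python
import json

# Hypothetical registry snapshot: pipeline name -> {param: allowed choices or None}.
REGISTRY = {
    "compress": {"target": None},
    "convert": {"format": ("png", "jpeg", "webp"), "quality": None},
}


def validate_plan(raw: str, registry: dict = REGISTRY) -> list[str]:
    """Return a list of problems; an empty list means the plan may run."""
    try:
        plan = json.loads(raw)["plan"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return [f"unparseable plan: {exc!r}"]
    if not isinstance(plan, list):
        return ["'plan' must be a list of steps"]
    errors = []
    for i, step in enumerate(plan):
        if not isinstance(step, dict):
            errors.append(f"step {i}: not an object")
            continue
        name = step.get("pipeline")
        if name not in registry:
            errors.append(f"step {i}: unknown pipeline {name!r}")
            continue
        spec = registry[name]
        for param, value in step.get("params", {}).items():
            if param not in spec:
                errors.append(f"step {i}: unknown param {param!r} for {name}")
            elif spec[param] is not None and value not in spec[param]:
                errors.append(f"step {i}: {param}={value!r} not in {spec[param]}")
    return errors
```

Collecting all errors instead of stopping at the first one gives the LLM (or the user) a complete list to correct in a single retry.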
| Pipeline | CLI command | Category | Use case |
|---|---|---|---|
| PDF compress | docproc compress | | Scanned PDF compression via DPI/quality ladder |
| PDF merge | docproc merge | | Combine multiple PDFs into one |
| PDF split | docproc split | | Extract pages or split into chunks |
| PDF rotate | docproc rotate | | Rotate pages by 90°/180°/270° |
| PDF pages | docproc pages | | Delete, keep, reorder, reverse pages |
| PDF protect | docproc protect | | Add/remove password protection |
| PDF metadata | docproc metadata | | View, edit, or strip PDF metadata |
| PDF stamp | docproc stamp | | Overlay text or image stamps |
| PDF page numbers | docproc pagenums | | Add page numbers (arabic, roman) |
| PDF → Images | docproc pdf2img | | Export pages as PNG or JPEG |
| Images → PDF | docproc img2pdf | | Combine images into a PDF |
| Image resize | docproc resize | image | Scale, fit, crop, or pad images |
| Image convert | docproc convert | image | Convert between PNG, JPEG, WebP |
| Image compress | docproc imgcompress | image | Reduce file size (target size mode) |
| Image crop | docproc crop | image | Crop by coords, margins, or auto-detect |
| Image EXIF | docproc exif | image | View or strip EXIF metadata |
| Background removal | docproc background | image | AI portrait segmentation (Python 3.10) |
| Watermark removal | docproc watermark | image | Brightness detection + inpainting |

- Create docproc/pipelines/my_pipeline.py extending Pipeline with @register
- Declare params = {...} using Param descriptors; the CLI auto-generates flags
- Import in docproc/pipelines/__init__.py
That's it. No CLI code to write — the Param metadata drives --help, argparse, and recipe validation automatically.
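
To show how declarative parameter metadata can drive a CLI, here is a self-contained toy version of the pattern. It is a sketch under assumptions: this `Param` dataclass, the `register` decorator, and the `shout` pipeline are hypothetical stand-ins, and the real Pipeline base class and Param signature in docproc may look different.

```python
import argparse
from dataclasses import dataclass
from typing import Any, Optional

PIPELINES: dict[str, type] = {}  # name -> pipeline class


@dataclass
class Param:
    """Hypothetical descriptor: the metadata needed to build one CLI flag."""
    kind: type = str
    default: Any = None
    help: str = ""
    choices: Optional[tuple] = None


def register(name: str):
    """Class decorator that records the pipeline under `name`."""
    def wrap(cls):
        PIPELINES[name] = cls
        return cls
    return wrap


@register("shout")
class Shout:
    """Toy pipeline: repeats the input text, optionally upper-cased."""
    params = {
        "times": Param(int, 1, "how many times to repeat"),
        "style": Param(str, "plain", "output style", choices=("plain", "loud")),
    }

    def run(self, text: str, times: int, style: str) -> str:
        out = " ".join([text] * times)
        return out.upper() if style == "loud" else out


def build_parser() -> argparse.ArgumentParser:
    """Generate one subcommand per registered pipeline, one flag per Param."""
    parser = argparse.ArgumentParser(prog="demo")
    sub = parser.add_subparsers(dest="pipeline", required=True)
    for name, cls in PIPELINES.items():
        cmd = sub.add_parser(name)
        cmd.add_argument("input")
        for flag, p in cls.params.items():
            cmd.add_argument(f"--{flag}", type=p.kind, default=p.default,
                             help=p.help, choices=p.choices)
    return parser


def main(argv: list[str]) -> str:
    args = vars(build_parser().parse_args(argv))
    cls = PIPELINES[args.pop("pipeline")]
    return cls().run(args.pop("input"), **args)


if __name__ == "__main__":
    print(main(["shout", "hi", "--times", "2", "--style", "loud"]))
```

Since the parser is rebuilt from the registry on every run, adding a pipeline class is enough to make its subcommand and flags appear.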
- Input file name(s)
- Desired output (format + quality + constraints like max file size)
- Delivery preference (single final file vs multiple candidates)
- Python 3.10+ (3.10 required for background removal via rembg/ONNX)
- See pyproject.toml for full dependency list
- Prefer model-based segmentation over color-key heuristics
- Preserve foreground colors (no jacket/skin recoloring)
- Keep edges/hair natural, then composite onto pure white when requested
- Produce deterministic output filenames and keep originals untouched
- For PDFs: prefer color over grayscale, maximize DPI and JPEG quality within size budget
Root-cause first: when output quality fails, fix dependencies/runtime/model choice before tuning thresholds.
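
The PDF guideline above (maximize DPI and JPEG quality within the size budget) can be read as a best-first ladder search. The sketch below assumes a hypothetical ladder of DPI and quality values and a made-up size model; `first_fit` and `fake_render` are illustrative names, not the compress pipeline's actual code.

```python
from typing import Callable, Optional

# Hypothetical ladder, ordered best-first: higher DPI before lower,
# and within each DPI, higher JPEG quality before lower.
LADDER = [(dpi, q) for dpi in (300, 200, 150, 100) for q in (90, 75, 60)]


def first_fit(
    render: Callable[[int, int], int],
    budget: int,
    ladder: list[tuple[int, int]] = LADDER,
) -> Optional[tuple[int, int]]:
    """Return the first (dpi, quality) rung whose rendered size fits the budget."""
    for dpi, quality in ladder:
        if render(dpi, quality) <= budget:
            return dpi, quality
    return None  # nothing on the ladder fits


def fake_render(dpi: int, quality: int) -> int:
    """Stand-in size model: bytes grow with DPI squared and linearly with quality."""
    return dpi * dpi * quality // 4


if __name__ == "__main__":
    print(first_fit(fake_render, budget=1_000_000))
```

Ordering the ladder by DPI first encodes a preference for sharpness over compression level; a different ordering would express a different trade-off.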
- docs/PRD.md — Product requirements document (vision, personas, feature roadmap)
- docs/PIPELINE_ARCHITECTURE.md — System architecture and decision rules
- docs/TOOL_MEMORY.md — Proven techniques, lessons learned, canonical commands
- docs/DEVELOPMENT_LOG.md — Session log, SOTP audit, and handoff document
MIT