docproc — Local-First Document & Image Processing Toolkit

A modular, privacy-first toolkit for everyday document and image tasks. Compress, merge, split, rotate, convert — all offline, no uploads, no subscriptions.

Now with an AI chat assistant: describe what you want in plain English and the system builds and executes a processing plan automatically. The app runs locally and opens in your default browser, and your files are processed on your own machine; the chat assistant's language-model calls go to the GitHub Models API.

Web UI (GUI)

# Install with GUI support
pip install -e ".[gui]"

# Launch — opens in your browser automatically
docproc-gui

# Or with options
docproc-gui --port 8080        # custom port
docproc-gui --no-browser       # don't auto-open browser
python -m docproc.web.app      # alternative launch

The web UI provides drag-and-drop file upload, automatic parameter forms for every pipeline, real-time processing with progress feedback, and instant download. Your files never leave your computer — everything runs on localhost.
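The desktop launcher (docproc/web/app.py) is described as scanning for a port and opening the browser. Below is a minimal sketch of that behavior. It assumes the launcher tries consecutive ports on 127.0.0.1, starting from a preferred one. The names `find_free_port` and `launch_url` are illustrative, and so is the 5111 default, which is borrowed from the RAG curl examples. None of this is docproc's actual code.

```python
import socket
import webbrowser


def find_free_port(start: int = 5111, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + attempts) that can be bound on host."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is taken; try the next one
            return port  # the probe socket is closed when the with-block exits
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")


def launch_url(port: int, open_browser: bool = True, host: str = "127.0.0.1") -> str:
    """Build the local URL and open it unless the caller opted out (like --no-browser)."""
    url = f"http://{host}:{port}/"
    if open_browser:
        webbrowser.open(url)
    return url
```

Probing a port by binding and then releasing it is racy: another process could take the port before the server starts. A real launcher should therefore still handle bind errors from the web server itself.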

What this does

PDF operations:

Compress — shrink scanned PDFs to target sizes while maximizing quality
Merge — combine multiple PDFs into one document
Split — extract pages, split every N pages, or split each page separately
Rotate — rotate pages by 90°, 180°, or 270°
Pages — delete, keep, reorder, or reverse pages
Protect — add or remove password protection (AES-256)
Metadata — view, edit, or strip PDF metadata fields
Stamp — overlay text or image stamps on pages
Page numbers — add page numbers in multiple formats
PDF → Images — export pages as PNG or JPEG
Images → PDF — combine images into a single PDF
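Split, rotate, and pages all take page selections such as `1-5`, `3,5`, or `2-4`. docproc has a `parse_page_range` helper in utils.py. The sketch below is a hypothetical stand-in, not that helper, showing one way to expand such specs into 1-based page numbers. It assumes the given order is preserved (useful for reordering) and that duplicates are dropped. `pages_after_delete` is likewise an illustrative name.

```python
def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a spec like "1-3,5" into 1-based page numbers, keeping order and dropping repeats."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"descending range {part!r}")
            numbers = range(first, last + 1)
        else:
            numbers = range(int(part), int(part) + 1)
        for n in numbers:
            if not 1 <= n <= page_count:
                raise ValueError(f"page {n} is outside 1-{page_count}")
            if n not in pages:
                pages.append(n)
    return pages


def pages_after_delete(spec: str, page_count: int) -> list[int]:
    """Pages that remain after deleting the ones named in spec (e.g. --delete 1,5)."""
    doomed = set(parse_page_range(spec, page_count))
    return [n for n in range(1, page_count + 1) if n not in doomed]
```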

Image operations:

Resize — scale, fit, crop, or pad images to target dimensions
Convert — convert between PNG, JPEG, and WebP with quality control
Compress — reduce file size without changing dimensions (target size mode)
Crop — crop by pixel coordinates, percentage margins, or auto-detect content
EXIF — view or strip EXIF/GPS metadata from images
Background removal — AI model-based portrait segmentation (rembg)
Watermark removal — brightness-threshold detection + OpenCV inpainting
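The resize modes differ mainly in how the scale factor is chosen. The sketch below shows the geometry under these assumptions:

- `cover`, the mode used in the quick start, scales the image to fill the box and crops the overflow.
- `contain` and `pad` (hypothetical mode names) scale the image to fit inside the box. `contain` keeps the smaller size; `pad` centers the image on a box-sized canvas.

`plan_resize` and `ResizePlan` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResizePlan:
    scaled: tuple[int, int]  # size of the source image after scaling
    canvas: tuple[int, int]  # final output size
    offset: tuple[int, int]  # top-left of the scaled image on the canvas (negative = cropped)


def plan_resize(src_w: int, src_h: int, box_w: int, box_h: int, fit: str) -> ResizePlan:
    """Compute resize geometry for the "contain", "cover", or "pad" modes."""
    sx, sy = box_w / src_w, box_h / src_h
    if fit in ("contain", "pad"):
        scale = min(sx, sy)  # whole image fits inside the box
    elif fit == "cover":
        scale = max(sx, sy)  # image fills the box; overflow will be cropped
    else:
        raise ValueError(f"unknown fit mode: {fit!r}")
    new_w, new_h = max(1, round(src_w * scale)), max(1, round(src_h * scale))
    if fit == "contain":
        return ResizePlan((new_w, new_h), (new_w, new_h), (0, 0))
    # cover crops and pad fills; both center the scaled image on a box-sized canvas
    offset = ((box_w - new_w) // 2, (box_h - new_h) // 2)
    return ResizePlan((new_w, new_h), (box_w, box_h), offset)
```

For a 1200x800 photo and a 600x600 box, `cover` scales to 900x600 and crops 150 px from each side. `pad` scales to 600x400 and adds 100 px bands above and below.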

Infrastructure:

Recipe system — chain pipelines into named multi-step workflows (YAML/JSON)
RAG chat assistant — describe tasks in natural language, auto-generates execution plans
RAG document engine: upload documents (PDF, DOCX, HTML, MD, CSV, TXT, images via OCR) and query them with hybrid BM25 + TF-IDF retrieval
GitHub device-flow OAuth for AI features (GitHub Models API)
Structured job logging for every run (JSONL)
Zero-boilerplate extensibility — new pipelines auto-register into CLI
248 automated tests with pytest
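A recipe is an ordered list of pipeline steps, where each step's output feeds the next. The sketch below assumes a hypothetical JSON schema (`name`, `steps`, `pipeline`, `params`). Its fake pipelines only compute output names. It does not reproduce the format of the shipped recipes/ files, and YAML support would additionally need a YAML parser such as PyYAML.

```python
import json
from typing import Callable


def fake_resize(path: str, width: int) -> str:
    """Placeholder pipeline: pretends to resize and returns the new file name."""
    return f"{path}.w{width}"


def fake_convert(path: str, format: str, quality: int = 90) -> str:
    """Placeholder pipeline: pretends to convert and returns the new file name."""
    return f"{path}.q{quality}.{format}"


REGISTRY: dict[str, Callable[..., str]] = {"resize": fake_resize, "convert": fake_convert}

RECIPE_JSON = """
{
  "name": "my-web-recipe",
  "steps": [
    {"pipeline": "resize", "params": {"width": 1600}},
    {"pipeline": "convert", "params": {"format": "webp", "quality": 85}}
  ]
}
"""


def load_recipe(text: str) -> dict:
    """Parse a recipe and reject steps that name unregistered pipelines."""
    recipe = json.loads(text)
    for i, step in enumerate(recipe["steps"], start=1):
        if step["pipeline"] not in REGISTRY:
            raise ValueError(f"step {i}: unknown pipeline {step['pipeline']!r}")
    return recipe


def run_recipe(recipe: dict, input_path: str) -> str:
    """Run steps in order, feeding each step's output path into the next step."""
    current = input_path
    for step in recipe["steps"]:
        current = REGISTRY[step["pipeline"]](current, **step.get("params", {}))
    return current
```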

Quick start

# Install (editable, with all extras)
pip install -e ".[all,dev]"

# --- PDF operations ---
docproc compress report.pdf                          # compress to 1MB default
docproc compress report.pdf --target 2MB             # custom target size
docproc merge doc1.pdf doc2.pdf doc3.pdf -o combined.pdf
docproc split report.pdf --pages 1-5                 # extract pages 1-5
docproc split report.pdf --each                      # each page → own file
docproc split report.pdf --every 10                  # split every 10 pages
docproc rotate scan.pdf --angle 90                   # rotate all pages 90°
docproc rotate scan.pdf --angle 180 --pages 3,5      # rotate specific pages
docproc pages report.pdf --delete 1,5                # delete pages 1 and 5
docproc pages report.pdf --keep 2-4                  # keep only pages 2-4
docproc pages report.pdf --reverse                   # reverse page order
docproc pdf2img report.pdf                           # export as PNGs
docproc pdf2img report.pdf --format jpg --dpi 300    # high-res JPEGs
docproc img2pdf photo1.png photo2.jpg photo3.png     # combine into PDF

# --- More PDF operations ---
docproc protect doc.pdf --user-pass secret           # encrypt with password
docproc protect doc.pdf --unlock --user-pass secret  # remove protection
docproc metadata doc.pdf --view                      # show PDF metadata
docproc metadata doc.pdf --title "My Doc" --author "Me"
docproc stamp doc.pdf --text "DRAFT" --position center --opacity 0.3
docproc stamp doc.pdf --image logo.png --position bottom-right
docproc pagenums doc.pdf --format page-n-of-m        # "Page 1 of 10"

# --- Image operations ---
docproc resize photo.png --width 800
docproc resize photo.png --width 600 --height 600 --fit cover

# Convert formats
docproc convert photo.png --format webp --quality 85

# Compress images (keeps dimensions, reduces file size)
docproc imgcompress photo.jpg --quality 70
docproc imgcompress photo.jpg --target 500KB

# Crop images
docproc crop photo.png --box "100,50,900,700"        # pixel coordinates
docproc crop photo.png --margin 10                    # trim 10% from each edge
docproc crop photo.png --auto                         # auto-detect content bounds

# View/strip EXIF metadata
docproc exif photo.jpg --view                         # show all EXIF tags
docproc exif photo.jpg --strip                        # remove all metadata
docproc exif photo.jpg --gps                          # GPS coordinates only

# Remove watermarks
docproc watermark photo.png
docproc watermark photo.png --preview --corner all

# Remove background (requires a Python 3.10 environment, e.g. .venv310)
docproc background portrait.jpg

# Run a recipe (chained pipelines)
docproc recipe web-optimize photo.png
docproc recipe passport-photo portrait.jpg
docproc recipe email-ready-pdf report.pdf
docproc recipe social-media-image photo.png

# Introspection
docproc pipelines          # list all pipelines + recipes
docproc recipes            # list recipes with step details
docproc history            # recent job log
docproc stats              # aggregate statistics
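`docproc history` and `docproc stats` read the structured JSONL job log. Below is a minimal sketch of append-only JSONL logging and per-pipeline aggregation. It assumes a hypothetical record layout (`ts`, `pipeline`, `input_bytes`, `output_bytes`, `ok`) rather than docproc's real fields.

```python
import json
import time
from pathlib import Path


def log_job(log_path: Path, pipeline: str, input_bytes: int, output_bytes: int, ok: bool) -> None:
    """Append one job record as a single JSON line."""
    record = {
        "ts": time.time(),
        "pipeline": pipeline,
        "input_bytes": input_bytes,
        "output_bytes": output_bytes,
        "ok": ok,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def job_stats(log_path: Path) -> dict[str, dict[str, int]]:
    """Aggregate per-pipeline run counts, failures, and bytes saved by successful runs."""
    stats: dict[str, dict[str, int]] = {}
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            rec = json.loads(line)
            s = stats.setdefault(rec["pipeline"], {"runs": 0, "failures": 0, "bytes_saved": 0})
            s["runs"] += 1
            if rec["ok"]:
                s["bytes_saved"] += rec["input_bytes"] - rec["output_bytes"]
            else:
                s["failures"] += 1
    return stats
```

One JSON object per line keeps appends cheap and lets a crashed run corrupt at most its own line.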

Architecture

docproc/                         # Python package v0.5.0
    __init__.py                  # Version string
    __main__.py                  # python -m docproc support
    cli.py                       # Auto-generated CLI from Param descriptors
    registry.py                  # @register decorator + get_pipeline()/list_pipelines()
    recipes.py                   # YAML/JSON multi-step workflow loader
    job_log.py                   # Structured JSONL job log
    exceptions.py                # Custom exception hierarchy
    utils.py                     # format_size, parse_size, parse_page_range, etc.

    pipelines/                   # 18 registered pipelines
        base.py                  # Pipeline ABC + PipelineResult + Param descriptor
        pdf_compress.py          # DPI/quality ladder compression
        pdf_merge.py             # Multi-input PDF merge
        pdf_split.py             # Split by pages/every-N/each
        pdf_to_images.py         # PDF → PNG/JPEG
        images_to_pdf.py         # Multiple images → single PDF
        pdf_rotate.py            # Rotate pages 90°/180°/270°
        pdf_pages.py             # Delete/keep/reorder/reverse pages
        pdf_protect.py           # PDF password encrypt/unlock
        pdf_metadata.py          # View/edit/strip PDF metadata
        pdf_stamp.py             # Text or image stamp overlay
        pdf_pagenums.py          # Add page numbers
        resize.py                # Scale/fit/crop/pad
        convert.py               # Format conversion (PNG/JPEG/WebP/BMP/TIFF/GIF)
        image_compress.py        # Quality reduction + target-size mode
        crop.py                  # Crop by coords/margins/auto-detect
        exif.py                  # View/strip EXIF metadata
        watermark.py             # Brightness detection + OpenCV inpainting
        background.py            # rembg AI segmentation

    rag/                         # RAG engine + chat system
        parsers.py               # Parse PDF, DOCX, HTML, MD, CSV, TXT, images (OCR)
        chunking.py              # Fixed / semantic / hybrid chunking strategies
        storage.py               # SQLite backend: datasets, documents, chunks, BM25 index
        engine.py                # Orchestrator: ingest, query, dataset CRUD
        eval.py                  # IR evaluation: Precision@K, Recall@K, MRR, nDCG
        retriever.py             # BM25 + TF-IDF hybrid retriever, RRF, MMR
        embeddings.py            # API-based embeddings with disk cache
        knowledge.py             # Auto-gen pipeline docs + practices + workflows
        llm.py                   # GitHub Models API client (stdlib urllib)
        planner.py               # Parse → validate → execute plans + quality retry
        chat.py                  # ChatEngine: retrieve → augment → LLM → execute
        context.py               # Query intent classification + context assembly
        entities.py              # Zero-LLM entity extraction + conflict detection
        auth.py                  # Token storage + GitHub device-flow OAuth

    web/                         # Flask SPA
        app.py                   # Desktop launcher (port scan, browser open)
        api.py                   # 28 REST endpoints
        static/
            index.html           # Three-mode SPA: Tools wizard + Chat + Knowledge
            style.css            # Responsive design system
            app.js               # Client-side state management

tests/                           # 248 passing tests (pytest)
    conftest.py                  # Temp-dir fixtures, sample generators
    test_registry.py             # Registration + discovery (6)
    test_pdf_pipelines.py        # PDF pipelines (17)
    test_image_pipelines.py      # Image pipelines (13)
    test_utils.py                # Utility functions (24)
    test_rag.py                  # RAG retrieval system (48)
    test_rag_engine.py           # RAG engine: parsers, chunking, storage, engine, eval (51)
    test_new_pipelines.py        # New pipelines + tech debt (46)
    test_context_engineering.py  # Context engineering: intents, entities, conflicts (63)

recipes/                         # 8 named workflow presets
docs/                            # Design docs and development notes
pyproject.toml                   # Package metadata + entry points
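utils.py provides `parse_size` and `format_size`. The chat plan example later in this README maps a `500KB` target to `max_size_bytes: 512000`, which suggests binary units (1 KB = 1024 bytes). The block below is a hypothetical re-implementation under that assumption, not the code in utils.py.

```python
import re

_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Parse strings like "500KB", "2MB", or "1.5 mb" into a byte count (binary units)."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)?\s*", text, re.IGNORECASE)
    if not m:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = m.groups()
    return int(float(number) * _UNITS[(unit or "B").upper()])


def format_size(num_bytes: int) -> str:
    """Render a byte count with the largest unit that keeps the value under 1024."""
    value = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if value < 1024:
            return f"{value:.0f} {unit}" if unit == "B" else f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} GB"
```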

AI Chat Assistant

The Chat tab provides a conversational interface powered by RAG (Retrieval-Augmented Generation):

Sign in — Settings → "Sign in with GitHub" (device flow) or paste a PAT
Ask — "Compress my PDF to under 1MB" or "Convert these images to WebP"
Upload — Drag files onto the chat or click the attach button
Download — Processed files appear as inline download links

The system retrieves relevant pipeline docs via hybrid BM25 + TF-IDF search (with optional API embeddings), augments the LLM prompt, generates a JSON execution plan, validates it against the pipeline registry (checking names, params, choices, dependencies), and executes deterministically. If a size target isn't met, it binary-searches the quality parameter automatically.
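The validation step can be illustrated with a small checker. This sketch assumes a hypothetical registry layout (allowed params, optional choices, required modules) and reuses the plan shape from the example flow in this README. The registry contents are invented. A real planner would also need to handle malformed JSON from the LLM, which this sketch lets `json.loads` raise.

```python
import importlib.util
import json

# Hypothetical registry metadata: allowed params (None = any value, tuple = allowed
# choices) and the Python modules each pipeline imports at runtime.
REGISTRY = {
    "compress": {"params": {"target": None, "dpi": None}, "requires": []},
    "convert": {"params": {"format": ("png", "jpg", "webp"), "quality": None}, "requires": []},
    "background": {"params": {}, "requires": ["rembg"]},
}


def validate_plan(raw: str) -> list[str]:
    """Return a list of problems with an LLM-generated plan; empty means executable."""
    errors: list[str] = []
    steps = json.loads(raw).get("plan", [])
    if not steps:
        errors.append("plan has no steps")
    for i, step in enumerate(steps, start=1):
        name = step.get("pipeline")
        spec = REGISTRY.get(name)
        if spec is None:
            errors.append(f"step {i}: unknown pipeline {name!r}")
            continue
        for key, value in step.get("params", {}).items():
            if key not in spec["params"]:
                errors.append(f"step {i}: {name} has no param {key!r}")
            elif spec["params"][key] is not None and value not in spec["params"][key]:
                errors.append(f"step {i}: {key}={value!r} is not one of {spec['params'][key]}")
        for module in spec["requires"]:
            if importlib.util.find_spec(module) is None:
                errors.append(f"step {i}: {name} needs missing module {module!r}")
    return errors
```

Returning a list of errors, rather than raising on the first one, lets the planner send every problem back to the LLM in a single retry prompt.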

RAG Document Engine

Upload any documents (PDF, DOCX, HTML, Markdown, CSV, plain text, or images with OCR) and query them with high-precision hybrid retrieval:

# Create a dataset and ingest documents via API
curl -X POST localhost:5111/api/rag/datasets -H 'Content-Type: application/json' -d '{"name": "my-docs"}'
curl -X POST localhost:5111/api/rag/datasets/<id>/ingest -F file=@guide.pdf -F file=@faq.md
curl -X POST localhost:5111/api/rag/datasets/<id>/query -H 'Content-Type: application/json' -d '{"query": "how to reset password"}'

Documents are parsed, chunked (semantic section-aware splitting), and indexed for BM25 + TF-IDF hybrid retrieval. All data persists in SQLite at ~/.docproc/rag/. Duplicate files are automatically detected by content hash.
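For intuition about the ranking side, here is a textbook Okapi BM25 scorer plus Reciprocal Rank Fusion (RRF), which retriever.py is described as using to merge rankings. The parameters (k1 = 1.5, b = 0.75, RRF k = 60) are common defaults, not confirmed docproc settings. The TF-IDF and MMR stages are omitted.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document against the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs or 1.0  # guard against all-empty docs
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)  # length normalization
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal Rank Fusion: merge ranked lists of doc ids by summing 1 / (k + rank)."""
    fused: Counter = Counter()
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] += 1.0 / (k + rank)
    return [doc for doc, _ in fused.most_common()]
```

RRF uses only rank positions, so BM25 and TF-IDF scores never need to be put on a common scale before fusion.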

Example: how an AI chat request is planned and executed

User: "Compress this to under 500KB"
 → TF-IDF retrieves: compress pipeline docs + email compression tips
 → LLM generates: {"plan": [{"pipeline": "compress", "params": {"target": "500KB"}, "verify": {"max_size_bytes": 512000}}]}
 → Planner validates: pipeline exists ✓, params valid ✓, deps installed ✓
 → Executor runs: compress pipeline → checks size → binary-search retry if needed
 → User gets: download link + size metrics
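The "binary-search retry" can be sketched as a search for the highest quality whose output fits the byte limit. This assumes output size grows with quality, which is usually true for JPEG and WebP but not guaranteed. A hypothetical `encode(quality)` callable stands in for a real encoder, and the 10 to 95 search bounds are illustrative defaults.

```python
from typing import Callable, Optional


def best_quality_under(
    encode: Callable[[int], bytes], max_bytes: int, lo: int = 10, hi: int = 95
) -> Optional[tuple[int, bytes]]:
    """Binary-search the highest quality in [lo, hi] whose encoded output fits in max_bytes."""
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        data = encode(mid)
        if len(data) <= max_bytes:
            best = (mid, data)  # fits: remember it and try a higher quality
            lo = mid + 1
        else:
            hi = mid - 1  # too big: try a lower quality
    return best  # None if even the lowest quality is too large
```

Searching qualities 10 to 95 this way needs at most seven encodes instead of up to 86.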

Current pipelines

Pipeline	CLI command	Category	Use case
PDF compress	`docproc compress`	pdf	Scanned PDF compression via DPI/quality ladder
PDF merge	`docproc merge`	pdf	Combine multiple PDFs into one
PDF split	`docproc split`	pdf	Extract pages or split into chunks
PDF rotate	`docproc rotate`	pdf	Rotate pages by 90°/180°/270°
PDF pages	`docproc pages`	pdf	Delete, keep, reorder, reverse pages
PDF protect	`docproc protect`	pdf	Add/remove password protection
PDF metadata	`docproc metadata`	pdf	View, edit, or strip PDF metadata
PDF stamp	`docproc stamp`	pdf	Overlay text or image stamps
PDF page numbers	`docproc pagenums`	pdf	Add page numbers (arabic, roman)
PDF → Images	`docproc pdf2img`	pdf	Export pages as PNG or JPEG
Images → PDF	`docproc img2pdf`	pdf	Combine images into a PDF
Image resize	`docproc resize`	image	Scale, fit, crop, or pad images
Image convert	`docproc convert`	image	Convert between PNG, JPEG, WebP
Image compress	`docproc imgcompress`	image	Reduce file size (target size mode)
Image crop	`docproc crop`	image	Crop by coords, margins, or auto-detect
Image EXIF	`docproc exif`	image	View or strip EXIF metadata
Background removal	`docproc background`	image	AI portrait segmentation (Python 3.10)
Watermark removal	`docproc watermark`	image	Brightness detection + inpainting
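The page-number pipeline supports arabic and roman numerals, plus the `page-n-of-m` format from the quick start. Below is a sketch of the label formatting. It assumes lowercase roman numerals and uses the hypothetical helper names `to_roman` and `page_label`.

```python
def to_roman(n: int) -> str:
    """Convert 1-3999 to lowercase roman numerals using subtractive pairs (iv, ix, xl, ...)."""
    if not 0 < n < 4000:
        raise ValueError("roman numerals cover 1-3999")
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"), (90, "xc"),
        (50, "l"), (40, "xl"), (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in numerals:
        count, n = divmod(n, value)  # how many times this symbol fits, and the remainder
        out.append(symbol * count)
    return "".join(out)


def page_label(n: int, total: int, style: str = "arabic") -> str:
    """Text to stamp on page n of total in the given style."""
    if style == "arabic":
        return str(n)
    if style == "roman":
        return to_roman(n)
    if style == "page-n-of-m":
        return f"Page {n} of {total}"
    raise ValueError(f"unknown style: {style!r}")
```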

Adding a new pipeline

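Create docproc/pipelines/my_pipeline.py with a class that extends Pipeline and is decorated with @register
Declare params = {...} using Param descriptors; the CLI generates the matching flags automatically
Import the new module in docproc/pipelines/__init__.py so the @register decorator runs and the pipeline is registered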

That's it. No CLI code to write — the Param metadata drives --help, argparse, and recipe validation automatically.
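To make the "Param metadata drives argparse" idea concrete, here is a self-contained miniature of the pattern. `Param`, `register`, `Pipeline`, and the `grayscale` pipeline are hypothetical stand-ins written for this sketch, not docproc's actual base.py or registry.py.

```python
import argparse
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass(frozen=True)
class Param:
    """Metadata for one pipeline parameter (hypothetical stand-in)."""
    kind: type = str
    default: Any = None
    choices: Optional[tuple] = None
    help: str = ""


PIPELINES: dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator: record the pipeline under its name when the module is imported."""
    PIPELINES[cls.name] = cls
    return cls


class Pipeline:
    name: str = ""
    params: dict[str, Param] = {}

    def run(self, input_path: str, **kwargs: Any) -> str:
        raise NotImplementedError


@register
class Grayscale(Pipeline):
    """Convert an image to grayscale (placeholder: only computes the output name)."""
    name = "grayscale"
    params = {
        "mode": Param(str, "luma", choices=("luma", "average"), help="conversion formula"),
        "suffix": Param(str, "_gray", help="text appended to the output file stem"),
    }

    def run(self, input_path: str, mode: str = "luma", suffix: str = "_gray") -> str:
        src = Path(input_path)
        return str(src.with_stem(src.stem + suffix))


def build_cli() -> argparse.ArgumentParser:
    """Generate one subcommand per registered pipeline and one flag per Param."""
    parser = argparse.ArgumentParser(prog="docproc")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, cls in PIPELINES.items():
        cmd = sub.add_parser(name, help=(cls.__doc__ or "").strip())
        cmd.add_argument("input")
        for pname, p in cls.params.items():
            cmd.add_argument(
                f"--{pname.replace('_', '-')}",
                dest=pname,
                type=p.kind,
                default=p.default,
                choices=p.choices,
                help=p.help,
            )
    return parser


def main(argv: Optional[list[str]] = None) -> str:
    """Parse argv, look up the chosen pipeline, and run it with the parsed flags."""
    args = vars(build_cli().parse_args(argv))
    pipeline = PIPELINES[args.pop("command")]()
    return pipeline.run(args.pop("input"), **args)
```

Because `choices` lives in the Param metadata, argparse rejects an invalid value such as `--mode sepia` before the pipeline ever runs. A recipe validator could read the same metadata.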

How to request work

When requesting a processing job, include:

Input file name(s)
Desired output (format + quality + constraints like max file size)
Delivery preference (single final file vs multiple candidates)

Requirements

Python 3.10+ (3.10 required for background removal via rembg/ONNX)
See pyproject.toml for full dependency list

Quality standards

Prefer model-based segmentation over color-key heuristics
Preserve foreground colors (no jacket/skin recoloring)
Keep edges/hair natural, then composite onto pure white when requested
Produce deterministic output filenames and keep originals untouched
For PDFs: prefer color over grayscale, maximize DPI and JPEG quality within size budget
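The PDF compressor is described as using a DPI/quality ladder. Below is a minimal sketch under two assumptions:

- The ladder is ordered from best to worst, with every color rung ahead of the grayscale ones, following the standards above.
- A `render(dpi, quality, color)` callable returns the output size in bytes.

The rung values and `choose_rung` are invented for illustration.

```python
from typing import Callable, Optional

# Hypothetical ladder, best first: (dpi, jpeg_quality, keep_color).
# Every color rung comes before any grayscale rung, so color is only dropped
# when no color setting fits the budget.
LADDER: list[tuple[int, int, bool]] = [
    (300, 90, True), (300, 80, True), (200, 85, True), (200, 75, True),
    (150, 75, True), (150, 60, True), (150, 60, False), (100, 50, False),
]


def choose_rung(
    render: Callable[[int, int, bool], int],
    budget: int,
    ladder: list[tuple[int, int, bool]] = LADDER,
) -> Optional[tuple[int, int, bool, int]]:
    """Return the first (dpi, quality, color, size) whose rendered size fits the budget."""
    for dpi, quality, color in ladder:
        size = render(dpi, quality, color)  # e.g. re-rasterize the pages and measure the PDF
        if size <= budget:
            return dpi, quality, color, size
    return None  # even the smallest rung is too large
```

Walking the ladder top-down means the first rung that fits is also the highest-quality one, so no further search is needed.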

Operating principle

Root-cause first: when output quality fails, fix dependencies/runtime/model choice before tuning thresholds.

Documentation

docs/PRD.md — Product requirements document (vision, personas, feature roadmap)
docs/PIPELINE_ARCHITECTURE.md — System architecture and decision rules
docs/TOOL_MEMORY.md — Proven techniques, lessons learned, canonical commands
docs/DEVELOPMENT_LOG.md — Session log, SOTP audit, and handoff document

License

MIT
