Note: this repository does not build standalone yet. The following forward dependencies are not yet published:
- mly-core → https://github.com/andrewdyates/mly (cold-start: dpdf rewrites mly git dependencies to the public mly slot, but that repository has not been published yet)
claims:
- type: entrypoint name: dpdf-cli code_path: crates/dpdf-client/src/main.rs
- type: package name: dpdf-core code_path: crates/dpdf-core/
- type: package name: dpdf-pipeline code_path: crates/dpdf-pipeline/
- type: package name: dpdf-server code_path: crates/dpdf-server/
- type: package name: dpdf-wasm code_path: crates/dpdf-wasm/
- type: directory name: docs-dir code_path: docs/
Pure Rust PDF extraction and document understanding for local, server, and browser workflows.
Author: Andrew Yates (andrewyates.name@gmail.com)
Version: 0.1.0
License: Apache 2.0
Copyright: 2026 Andrew Yates
dpdf extracts structured text, tables, figures, thumbnails, and rendering output from PDF files.
The workspace includes:
- dpdf CLI for inspection, extraction, rendering, thumbnails, benchmarking, and pipeline debugging.
- dpdf-server for HTTP extraction services.
- dpdf-wasm for Tier 0 browser extraction with no server round-trip.
- Tiered ML document-understanding pipelines built around local mly-* crates.
The native extraction workspace (dpdf-core, dpdf-types, dpdf-pipeline, dpdf-client, and
dpdf-server) is implemented in Rust and avoids crates.io dependencies. ML tiers depend on local
path crates such as mly-core and mly-metal, and the optional browser wrapper
crates/dpdf-wasm uses wasm-bindgen.
dpdf is currently built from source:
cargo build --workspace
cargo test --workspace
cargo build -p dpdf-client --release
The CLI binary is written to target/release/dpdf.
If you build with CARGO_TARGET_DIR=target/user, the binary lands at target/user/release/dpdf.
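The path rule above can be sketched in a few lines of Rust, which may help when a script or test needs to locate the binary. This is only an illustration: `release_binary_path` is a hypothetical helper and not part of dpdf, it assumes a Unix-like platform (no `.exe` suffix), and it ignores other ways Cargo's target directory can be overridden, such as `build.target-dir` in a Cargo config file.

```rust
use std::path::PathBuf;

/// Hypothetical helper: where a release build of the `dpdf` binary is
/// expected to land, given an optional CARGO_TARGET_DIR value.
/// Falls back to Cargo's default `target` directory when the variable
/// is unset or empty.
fn release_binary_path(cargo_target_dir: Option<&str>) -> PathBuf {
    let target_dir = cargo_target_dir
        .filter(|dir| !dir.is_empty())
        .unwrap_or("target");
    PathBuf::from(target_dir).join("release").join("dpdf")
}

fn main() {
    // Read the override from the environment, if any.
    let from_env = std::env::var("CARGO_TARGET_DIR").ok();
    let path = release_binary_path(from_env.as_deref());
    println!("{}", path.display());
}
```

The returned path is relative to the workspace root unless CARGO_TARGET_DIR itself is absolute.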
# Inspect metadata
dpdf info paper.pdf
# Extract page 1 as Markdown
dpdf extract paper.pdf --pages 1
# Render page 1 to PNG
dpdf render paper.pdf --pages 1 -o ./output/
# Generate a first-page thumbnail
dpdf thumbnail paper.pdf
# Inspect pipeline routing and intermediate stages
dpdf pipeline route paper.pdf
For ML-backed extraction, point the CLI at a local model directory:
export DPDF_MODEL_DIR=~/dpdf-models
dpdf extract paper.pdf --tier 1
crates/
├── dpdf-core/ PDF parsing, text extraction, rendering, and image codecs
├── dpdf-types/ Shared document data model and serializers
├── dpdf-pipeline/ Tier routing, ML orchestration, and evaluation harnesses
├── dpdf-client/ `dpdf` CLI
├── dpdf-server/ HTTP server surface
└── dpdf-wasm/ Browser-facing Tier 0 wrapper
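Because dpdf-client produces the dpdf binary, the ML-backed invocation shown earlier can also be driven from Rust code, for example in a wrapper script. The sketch below only builds the command and prints it. `tier1_extract_command` is a hypothetical helper, not a dpdf API, and the paths in `main` are placeholders. The only dpdf-specific pieces it relies on are the `extract` subcommand, the `--tier 1` flag, and the DPDF_MODEL_DIR variable.

```rust
use std::ffi::OsStr;
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds (but does not run)
/// `dpdf extract <pdf> --tier 1` with DPDF_MODEL_DIR set for the child.
fn tier1_extract_command(dpdf_bin: &Path, pdf: &Path, model_dir: &Path) -> Command {
    let mut cmd = Command::new(dpdf_bin);
    cmd.arg("extract").arg(pdf).arg("--tier").arg("1");
    // No shell is involved here, so `~` would not be expanded;
    // pass a full path to the model directory instead.
    cmd.env("DPDF_MODEL_DIR", model_dir);
    cmd
}

fn main() {
    // Placeholder paths, chosen only for illustration.
    let cmd = tier1_extract_command(
        Path::new("target/release/dpdf"),
        Path::new("paper.pdf"),
        Path::new("/home/user/dpdf-models"),
    );
    let args: Vec<&OsStr> = cmd.get_args().collect();
    println!("{:?} {:?}", cmd.get_program(), args);
    // Calling `status()` or `output()` on the command would run it,
    // which requires the binary to have been built first.
}
```

Setting the variable on the `Command` keeps it scoped to the child process rather than changing the parent's environment.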
Additional top-level directories:
- docs/ for user-facing documentation that ships with the public repo.
- tests/ for CLI and integration coverage.
Apache License 2.0. See LICENSE.