dpdf

Note: this repository does not build standalone yet. The following forward dependencies are not yet published:

mly-core → https://github.com/andrewdyates/mly (cold start: dpdf rewrites mly git deps to the public mly slot, but that repository has not been published yet)

claims:

- type: entrypoint
  name: dpdf-cli
  code_path: crates/dpdf-client/src/main.rs
- type: package
  name: dpdf-core
  code_path: crates/dpdf-core/
- type: package
  name: dpdf-pipeline
  code_path: crates/dpdf-pipeline/
- type: package
  name: dpdf-server
  code_path: crates/dpdf-server/
- type: package
  name: dpdf-wasm
  code_path: crates/dpdf-wasm/
- type: directory
  name: docs-dir
  code_path: docs/

dpdf

Pure Rust PDF extraction and document understanding for local, server, and browser workflows.

Author: Andrew Yates <andrewyates.name@gmail.com>
Version: 0.1.0
License: Apache 2.0
Copyright: 2026 Andrew Yates

What Is dpdf?

dpdf extracts structured text, tables, figures, thumbnails, and rendering output from PDF files. The workspace includes:

dpdf CLI for inspection, extraction, rendering, thumbnails, benchmarking, and pipeline debugging.
dpdf-server for HTTP extraction services.
dpdf-wasm for Tier 0 browser extraction with no server round-trip.
Tiered ML document-understanding pipelines built around local mly-* crates.

Dependency Model

The native extraction workspace (dpdf-core, dpdf-types, dpdf-pipeline, dpdf-client, and dpdf-server) is implemented in Rust and avoids crates.io dependencies. ML tiers depend on local path crates such as mly-core and mly-metal, and the optional browser wrapper crates/dpdf-wasm uses wasm-bindgen.

Installation

dpdf is currently built from source:

```shell
cargo build --workspace
cargo test --workspace
cargo build -p dpdf-client --release
```

The CLI binary is written to target/release/dpdf. If you build with CARGO_TARGET_DIR=target/user, the binary lands at target/user/release/dpdf.
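The path rule above can be captured in a small shell helper. This is only a sketch and not part of dpdf: the function name `dpdf_bin_path` is hypothetical, and it assumes a release build with no `--target` triple and no `build.target-dir` override in Cargo configuration.

```shell
# Print the expected location of the release binary.
# Cargo writes build output to "target" unless CARGO_TARGET_DIR is set,
# in which case that directory is used instead.
# Assumption: release profile, no --target triple, no config override.
dpdf_bin_path() {
    printf '%s/release/dpdf\n' "${CARGO_TARGET_DIR:-target}"
}
```

From the workspace root, `"$(dpdf_bin_path)" info paper.pdf` would then run the freshly built binary whether or not `CARGO_TARGET_DIR` is set.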

Usage

```shell
# Inspect metadata
dpdf info paper.pdf

# Extract page 1 as Markdown
dpdf extract paper.pdf --pages 1

# Render page 1 to PNG
dpdf render paper.pdf --pages 1 -o ./output/

# Generate a first-page thumbnail
dpdf thumbnail paper.pdf

# Inspect pipeline routing and intermediate stages
dpdf pipeline route paper.pdf
```
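The commands above each take a single PDF. To inspect a whole directory, one option is a small wrapper loop such as the sketch below. The function name `dpdf_info_all` and the `DPDF` override variable are hypothetical helpers introduced here for illustration; only the `dpdf info <file>` invocation comes from the usage shown above.

```shell
# Run `dpdf info` on every *.pdf file in a directory (default: current dir).
# DPDF is a hypothetical override for the binary to run; it defaults to
# whatever `dpdf` resolves to on PATH.
dpdf_info_all() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        # When no PDFs exist the glob stays unexpanded; skip that literal.
        [ -e "$pdf" ] || continue
        echo "== $pdf"
        "${DPDF:-dpdf}" info "$pdf"
    done
}
```

Calling `dpdf_info_all ./papers` prints a `== <path>` separator before the metadata for each file. Setting `DPDF=target/release/dpdf` first would point the loop at a local build instead of an installed binary.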

For ML-backed extraction, point the CLI at a local model directory:

```shell
export DPDF_MODEL_DIR=~/dpdf-models
dpdf extract paper.pdf --tier 1
```
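Because ML-backed extraction depends on `DPDF_MODEL_DIR`, it can help to validate the variable before invoking the CLI. The check below is a sketch under assumptions: `check_model_dir` is a hypothetical helper, and how dpdf itself behaves when the directory is missing is not documented here.

```shell
# Return 0 if DPDF_MODEL_DIR names an existing directory; otherwise print
# a diagnostic to stderr and return 1.
# Hypothetical pre-flight check, not part of the dpdf CLI.
check_model_dir() {
    if [ -z "${DPDF_MODEL_DIR:-}" ]; then
        echo "DPDF_MODEL_DIR is not set" >&2
        return 1
    fi
    if [ ! -d "$DPDF_MODEL_DIR" ]; then
        echo "DPDF_MODEL_DIR is not a directory: $DPDF_MODEL_DIR" >&2
        return 1
    fi
}
```

A guarded call then reads `check_model_dir && dpdf extract paper.pdf --tier 1`, so the extraction only runs once the model directory has been confirmed to exist.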

Reference Docs

Workspace Layout

```
crates/
├── dpdf-core/       PDF parsing, text extraction, rendering, and image codecs
├── dpdf-types/      Shared document data model and serializers
├── dpdf-pipeline/   Tier routing, ML orchestration, and evaluation harnesses
├── dpdf-client/     `dpdf` CLI
├── dpdf-server/     HTTP server surface
└── dpdf-wasm/       Browser-facing Tier 0 wrapper
```

Additional top-level directories:

docs/ for user-facing documentation that ships with the public repo.
tests/ for CLI and integration coverage.

License

Apache License 2.0. See LICENSE.
