qntx/p2m

P2M

Fast, pure-Rust PDF to Markdown converter. No ML, no OCR, no external dependencies.

Features

  • Text extraction — CMap/ToUnicode decoding, CID fonts, TrueType fallback, ligature expansion
  • Layout analysis — multi-column detection (newspaper & tabular), reading-order reconstruction
  • Table detection — rect-based (union-find clustering) and line-based (H/V grid intersection)
  • Markdown generation — headings, lists, code blocks, bold/italic, hyperlinks, page breaks
  • Tagged PDF support — structure-tree roles (H1–H6, P, L, Code, BlockQuote, Table)
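
The rect-based table detection above mentions union-find clustering. As a rough illustration of that general technique (not p2m's actual code), the sketch below groups rectangles that overlap or nearly touch into clusters. The `Rect` type, the `touches` test, the `tol` tolerance, and the function name `cluster_rects` are all assumptions made for this example.

```rust
use std::collections::BTreeMap;

/// Axis-aligned rectangle (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x0: f32,
    y0: f32,
    x1: f32,
    y1: f32,
}

impl Rect {
    /// True if the rectangles overlap or lie within `tol` of each other on both axes.
    fn touches(&self, other: &Rect, tol: f32) -> bool {
        self.x0 <= other.x1 + tol
            && other.x0 <= self.x1 + tol
            && self.y0 <= other.y1 + tol
            && other.y0 <= self.y1 + tol
    }
}

/// Disjoint-set forest with path halving and union by size.
struct UnionFind {
    parent: Vec<usize>,
    size: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        Self { parent: (0..n).collect(), size: vec![1; n] }
    }

    /// Returns the representative of the set containing `i`.
    fn find(&mut self, mut i: usize) -> usize {
        while self.parent[i] != i {
            // Path halving: point `i` at its grandparent, then step up.
            let grandparent = self.parent[self.parent[i]];
            self.parent[i] = grandparent;
            i = grandparent;
        }
        i
    }

    /// Merges the sets containing `a` and `b`, attaching the smaller to the larger.
    fn union(&mut self, a: usize, b: usize) {
        let (mut ra, mut rb) = (self.find(a), self.find(b));
        if ra == rb {
            return;
        }
        if self.size[ra] < self.size[rb] {
            std::mem::swap(&mut ra, &mut rb);
        }
        self.parent[rb] = ra;
        self.size[ra] += self.size[rb];
    }
}

/// Groups rectangles into clusters of transitively touching rectangles.
/// Each cluster is a list of indices into `rects`, in ascending order.
fn cluster_rects(rects: &[Rect], tol: f32) -> Vec<Vec<usize>> {
    let mut uf = UnionFind::new(rects.len());
    // Pairwise comparison is O(n^2); acceptable for a sketch.
    for i in 0..rects.len() {
        for j in (i + 1)..rects.len() {
            if rects[i].touches(&rects[j], tol) {
                uf.union(i, j);
            }
        }
    }
    let mut groups: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for i in 0..rects.len() {
        let root = uf.find(i);
        groups.entry(root).or_default().push(i);
    }
    groups.into_values().collect()
}
```

Given three cells laid side by side and one isolated rectangle elsewhere on the page, `cluster_rects` places the three adjacent cells in one cluster and the isolated rectangle in its own. A real detector would presumably go on to check that a cluster forms a regular grid before treating it as a table.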

Installation

cargo install p2m-cli

Or build from source:

git clone https://github.com/qntx-labs/p2m
cd p2m
cargo build --release

CLI Usage

p2m document.pdf                     # convert to stdout
p2m document.pdf -o output.md        # write to file
p2m document.pdf --pages 1,3,5       # specific pages only
p2m document.pdf --page-breaks       # insert page markers
p2m document.pdf --raw               # no metadata on stderr
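
The `--pages 1,3,5` form above takes a comma-separated page list. The sketch below shows one way such a list could be parsed. It is a hypothetical helper, not taken from p2m, and it assumes page numbers are 1-based and that duplicates should be collapsed.

```rust
/// Parses a comma-separated page list such as "1,3,5" into sorted,
/// de-duplicated page numbers. Hypothetical helper; assumes 1-based pages.
fn parse_pages(spec: &str) -> Result<Vec<u32>, String> {
    let mut pages = Vec::new();
    for part in spec.split(',') {
        let part = part.trim();
        let n: u32 = part
            .parse()
            .map_err(|_| format!("invalid page number: {part:?}"))?;
        if n == 0 {
            return Err("page numbers start at 1".to_string());
        }
        pages.push(n);
    }
    pages.sort_unstable();
    pages.dedup();
    Ok(pages)
}
```

An empty or non-numeric entry is reported as an error here instead of being skipped, so a typo in the list does not silently drop a page.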

Library Usage

// Simple conversion
let doc = p2m::convert("document.pdf")?;
println!("{}", doc.markdown);

// With options
let opts = p2m::Options::new().pages([1, 2, 3]);
let doc = p2m::convert_with("document.pdf", &opts)?;

// From bytes
let bytes = std::fs::read("document.pdf")?;
let doc = p2m::convert_bytes(&bytes)?;

License

Licensed under either of:

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project shall be dual-licensed as above, without any additional terms or conditions.


A QNTX open-source project.

Code is law. We write both.
