pdf2md

Convert any PDF into clean, well-structured Markdown — powered by AI.

pdf2md extracts text from PDFs and uses AI to produce properly formatted Markdown with headings, tables, bullet lists, bold text, blockquotes, and more. Setup needs only python3, curl, and a free Gemini API key.

✨ What it does

pdf2md turns a PDF into Markdown, and the AI understands document structure. It produces:

Proper heading hierarchy (h1 → h2 → h3)
Markdown tables for tabular data and comparisons
✅ / ❌ for good/bad items
Blockquotes for callouts and severity labels
Bold for key terms and metrics
Numbered lists for steps, bullet lists for items

🚀 Install

One-line install:

curl -fsSL https://raw.githubusercontent.com/jschof1/pdf2md/main/install.sh | bash

Or manually:

git clone https://github.com/jschof1/pdf2md.git
cd pdf2md
sudo cp pdf2md /usr/local/bin/

Prerequisites:

python3 (with pip)
curl
A free Google Gemini API key

🔑 API Key

pdf2md uses Google Gemini (free tier). Get your key:

Go to https://aistudio.google.com/apikey
Create a key
Set it in your shell:

export GEMINI_API_KEY="your-key-here"

Add it to ~/.bashrc, ~/.zshrc, or your shell profile for persistence.

Cost: Gemini Flash is free for reasonable usage. Beyond the free tier, a 14-page report costs less than $0.01.
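
To illustrate how the key might be used, here is a minimal Python sketch that reads `GEMINI_API_KEY` and builds a request for the public Gemini REST `generateContent` endpoint without sending it. The helper names `read_api_key` and `build_request` are hypothetical, and this is not necessarily how pdf2md itself calls the API.

```python
import json
import os

# Public Gemini REST endpoint; the model name is filled in per request.
GEMINI_ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
)


def read_api_key(env=None):
    """Return the key from the environment, or raise a readable error."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; get one at https://aistudio.google.com/apikey"
        )
    return key


def build_request(text, api_key, model="gemini-2.5-flash"):
    """Build (url, headers, body) for a generateContent call; nothing is sent."""
    url = GEMINI_ENDPOINT.format(model=model)
    headers = {"Content-Type": "application/json", "x-goog-api-key": api_key}
    body = json.dumps({"contents": [{"parts": [{"text": text}]}]})
    return url, headers, body
```

The returned triple can be passed to any HTTP client, or to curl via `-H` and `-d`. Keeping the key in a header rather than the URL keeps it out of shell history and logs.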

📖 Usage

# Convert a PDF (AI-formatted)
pdf2md report.pdf

# Custom output path
pdf2md report.pdf ./output/report.md

# Raw text extraction only (no API key needed)
pdf2md report.pdf --no-ai

# Smaller chunks for very dense documents
pdf2md report.pdf --chunk 3

# Use a different Gemini model
pdf2md report.pdf --model gemini-2.5-pro

# Suppress progress output
pdf2md report.pdf --quiet

All Options

pdf2md v1.0.0 — Convert PDF to Markdown (AI-enhanced)

Usage:
  pdf2md <input.pdf> [output.md] [options]

Arguments:
  input.pdf   Path to PDF file
  output.md   Output path (default: same name, .md extension)

Options:
  --no-ai       Basic text extraction only (no AI formatting)
  --model MODEL Gemini model to use (default: gemini-2.5-flash)
  --chunk N     Pages per AI chunk (default: 5)
  -q, --quiet   Suppress progress output
  -v, --version Print version
  -h, --help    Show this help

Environment:
  GEMINI_API_KEY  Required for AI mode. Get one free:
                  https://aistudio.google.com/apikey
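
For anyone wrapping or reimplementing this interface, the options above can be mirrored with Python's `argparse`. This is a sketch under the assumption that the documented flags and defaults are complete; `build_parser` and `resolve_output` are hypothetical names, and the actual pdf2md script may parse its arguments differently.

```python
import argparse
from pathlib import Path


def build_parser():
    """Mirror the documented pdf2md options (argparse adds -h/--help itself)."""
    parser = argparse.ArgumentParser(
        prog="pdf2md", description="Convert PDF to Markdown (AI-enhanced)"
    )
    parser.add_argument("input", help="Path to PDF file")
    parser.add_argument(
        "output", nargs="?", help="Output path (default: same name, .md extension)"
    )
    parser.add_argument(
        "--no-ai", action="store_true", help="Basic text extraction only"
    )
    parser.add_argument(
        "--model", default="gemini-2.5-flash", help="Gemini model to use"
    )
    parser.add_argument(
        "--chunk", type=int, default=5, metavar="N", help="Pages per AI chunk"
    )
    parser.add_argument(
        "-q", "--quiet", action="store_true", help="Suppress progress output"
    )
    parser.add_argument(
        "-v", "--version", action="version", version="pdf2md v1.0.0"
    )
    return parser


def resolve_output(args):
    """Apply the documented default: same name as the input, .md extension."""
    if args.output:
        return Path(args.output)
    return Path(args.input).with_suffix(".md")
```

With this sketch, `build_parser().parse_args(["report.pdf", "--chunk", "3"])` yields `chunk == 3` and `resolve_output` returns `report.md`.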

⚙️ How it works

Extract — Uses PyMuPDF to pull text from every page
Chunk — Splits the document into page-based chunks (default: 5 pages) to stay within model output limits
Format — Sends each chunk to Gemini with a detailed formatting prompt
Join — Combines all chunks into a single Markdown file

Chunking means it handles documents of any length — 5 pages or 500.
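
The four steps above can be sketched in Python roughly as follows. This is an illustration, not the tool's actual source: the function names are hypothetical, and `format_chunk` stands in for whatever sends a chunk to Gemini and returns Markdown. Only `extract_pages` needs PyMuPDF, which it imports lazily.

```python
def extract_pages(pdf_path):
    """Extract: return one text string per page (requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported here so the other helpers run without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]


def chunk_pages(pages, pages_per_chunk=5):
    """Chunk: group consecutive pages; the last chunk may be shorter."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        "\n\n".join(pages[start:start + pages_per_chunk])
        for start in range(0, len(pages), pages_per_chunk)
    ]


def convert(pages, format_chunk, pages_per_chunk=5):
    """Format and join: run format_chunk on each chunk, then concatenate."""
    chunks = chunk_pages(pages, pages_per_chunk)
    formatted = [format_chunk(chunk).strip() for chunk in chunks]
    return "\n\n".join(formatted) + "\n"
```

Because each chunk is formatted independently, the number of API calls grows with page count while each individual request stays small, which is what lets the same approach cover both short and very long documents.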

📊 When to use pdf2md

| Scenario | pdf2md |
| --- | --- |
| Converting reports and audits to Markdown | ✅ |
| Making PDF content editable in a wiki or CMS | ✅ |
| Extracting structured data from PDFs | ✅ |
| Preparing PDF content for AI/RAG pipelines | ✅ |
| Converting scanned/image-based PDFs | ❌ (needs OCR first) |
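
One rough way to tell whether a PDF falls into the last row is to check how much text extraction actually returns. The sketch below is a heuristic, not something pdf2md is documented to do; the name `looks_scanned` and the 20-character threshold are assumptions chosen for illustration.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is image-based from its extracted page texts.

    page_texts is a list with one extracted string per page. A document
    that averages almost no characters per page probably needs OCR first.
    """
    if not page_texts:
        return True
    total_chars = sum(len(text.strip()) for text in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page
```

A check like this could run on the extracted text before any API call, so a scanned document fails fast instead of producing empty Markdown.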

🔧 Dependencies

| Dependency | Why | Installed how |
| --- | --- | --- |
| `python3` | Text extraction via PyMuPDF | System package |
| `pymupdf` | PDF parsing library | Auto-installed by pdf2md |
| `curl` | API calls to Gemini | Pre-installed on macOS/Linux |
| `GEMINI_API_KEY` | AI formatting | Free at Google AI Studio |

🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

Ideas for contributions:

Support for OpenAI / Anthropic / local models
OCR support for image-based PDFs
Batch processing (convert a folder of PDFs)
Config file support (~/.pdf2mdrc)
Progress bar

📝 License

MIT — use it however you like.

⭐ Star History

If pdf2md saved you time, consider giving it a star — it helps others find it.
