Convert any PDF into clean, well-structured Markdown — powered by AI.
pdf2md extracts text from PDFs and uses AI to produce properly formatted Markdown with headings, tables, bullet lists, bold text, blockquotes, and more. No setup beyond a free API key.
Turns this PDF...
...into this Markdown:
The AI understands document structure. It produces:
- Proper heading hierarchy (h1→h2→h3)
- Markdown tables for tabular data and comparisons
- ✅/❌ for good/bad items
- Blockquotes for callouts and severity labels
- Bold for key terms and metrics
- Numbered lists for steps, bullet lists for items
One-line install:
curl -fsSL https://raw.githubusercontent.com/jschof1/pdf2md/main/install.sh | bash
Or manually:
git clone https://github.com/jschof1/pdf2md.git
cd pdf2md
sudo cp pdf2md /usr/local/bin/
Prerequisites:
- python3 (with pip)
- curl
- A free Google Gemini API key
pdf2md uses Google Gemini (free tier). Get your key:
- Go to https://aistudio.google.com/apikey
- Create a key
- Set it in your shell:
export GEMINI_API_KEY="your-key-here"
Add it to ~/.bashrc, ~/.zshrc, or your shell profile for persistence.
Cost: Gemini Flash is free for reasonable usage. A 14-page report costs less than $0.01.
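If you call pdf2md from your own scripts, it can help to check for the key before starting a long run. The sketch below is a hypothetical helper (`resolve_mode` is not part of pdf2md). It only encodes the rule described here: AI mode needs `GEMINI_API_KEY`, while `--no-ai` does not.

```python
import os


def resolve_mode(env, no_ai=False):
    """Return "raw" or "ai" for a planned pdf2md run (hypothetical helper)."""
    if no_ai:
        # --no-ai does plain text extraction, so no key is needed.
        return "raw"
    if not env.get("GEMINI_API_KEY", "").strip():
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it or pass --no-ai"
        )
    return "ai"


if __name__ == "__main__":
    try:
        print(resolve_mode(os.environ))
    except RuntimeError as err:
        print(f"error: {err}")
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the check easy to test.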
# Convert a PDF (AI-formatted)
pdf2md report.pdf
# Custom output path
pdf2md report.pdf ./output/report.md
# Raw text extraction only (no API key needed)
pdf2md report.pdf --no-ai
# Smaller chunks for very dense documents
pdf2md report.pdf --chunk 3
# Use a different Gemini model
pdf2md report.pdf --model gemini-2.5-pro
# Suppress progress output
pdf2md report.pdf --quiet
pdf2md v1.0.0 — Convert PDF to Markdown (AI-enhanced)
Usage:
  pdf2md <input.pdf> [output.md] [options]

Arguments:
  input.pdf        Path to PDF file
  output.md        Output path (default: same name, .md extension)

Options:
  --no-ai          Basic text extraction only (no AI formatting)
  --model MODEL    Gemini model to use (default: gemini-2.5-flash)
  --chunk N        Pages per AI chunk (default: 5)
  -q, --quiet      Suppress progress output
  -v, --version    Print version
  -h, --help       Show this help

Environment:
  GEMINI_API_KEY   Required for AI mode. Get one free:
                   https://aistudio.google.com/apikey
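For readers who want to wrap or reimplement this interface, the options above map naturally onto Python's `argparse`. This is an illustrative mirror of the documented flags and defaults, not pdf2md's actual source; `build_parser` and `parse` are hypothetical names.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser that mirrors the documented pdf2md options."""
    parser = argparse.ArgumentParser(prog="pdf2md")
    parser.add_argument("input", help="Path to PDF file")
    parser.add_argument("output", nargs="?", help="Output path")
    parser.add_argument("--no-ai", action="store_true",
                        help="Basic text extraction only")
    parser.add_argument("--model", default="gemini-2.5-flash",
                        help="Gemini model to use")
    parser.add_argument("--chunk", type=int, default=5,
                        help="Pages per AI chunk")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="Suppress progress output")
    parser.add_argument("-v", "--version", action="version",
                        version="pdf2md v1.0.0")
    return parser


def parse(argv):
    """Parse argv and apply the documented default output path."""
    args = build_parser().parse_args(argv)
    if args.output is None:
        # Default: same name as the input, with a .md extension.
        args.output = str(Path(args.input).with_suffix(".md"))
    return args


if __name__ == "__main__":
    print(parse(["report.pdf", "--chunk", "3"]))
```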
- Extract — Uses PyMuPDF to pull text from every page
- Chunk — Splits the document into page-based chunks (default: 5 pages) to stay within model output limits
- Format — Sends each chunk to Gemini with a detailed formatting prompt
- Join — Combines all chunks into a single Markdown file
Chunking means it handles documents of any length — 5 pages or 500.
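The chunk and join steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tool's real code: `chunk_pages` and `convert` are hypothetical names, the page texts are assumed to be already extracted, and `format_chunk` stands in for the Gemini call.

```python
from typing import Callable, List


def chunk_pages(pages: List[str], size: int = 5) -> List[List[str]]:
    """Split a list of page texts into consecutive groups of `size` pages."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [pages[i:i + size] for i in range(0, len(pages), size)]


def convert(pages: List[str],
            format_chunk: Callable[[str], str],
            size: int = 5) -> str:
    """Format each chunk independently, then join the results in order."""
    parts = []
    for chunk in chunk_pages(pages, size):
        # Each chunk is sent to the formatter as one block of text.
        text = "\n\n".join(chunk)
        parts.append(format_chunk(text).strip())
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    pages = [f"Page {n} text" for n in range(1, 13)]
    # 12 pages at 5 per chunk gives chunks of 5, 5, and 2 pages.
    print([len(c) for c in chunk_pages(pages)])
    print(convert(pages, str.upper, size=5))
```

Because each chunk is formatted on its own, the number of requests grows with page count while the size of each request stays bounded.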
| Scenario | pdf2md |
|---|---|
| Converting reports and audits to Markdown | ✅ |
| Making PDF content editable in a wiki or CMS | ✅ |
| Extracting structured data from PDFs | ✅ |
| Preparing PDF content for AI/RAG pipelines | ✅ |
| Converting scanned/image-based PDFs | ❌ (needs OCR first) |
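One rough way to tell whether a PDF falls into the last row is to check how much text extraction actually returns. The heuristic below is only a sketch: `looks_scanned` is a hypothetical helper, both thresholds are arbitrary assumptions, and the page texts are assumed to come from an extractor such as PyMuPDF's `page.get_text()`.

```python
def looks_scanned(page_texts, min_chars=20, max_empty_ratio=0.8):
    """Guess whether a PDF is image-based from its extracted page texts.

    A page counts as "empty" when it yields fewer than `min_chars`
    non-whitespace-trimmed characters. Thresholds are assumptions.
    """
    if not page_texts:
        # No pages with text at all: treat as needing OCR.
        return True
    empty = sum(1 for text in page_texts if len(text.strip()) < min_chars)
    return empty / len(page_texts) >= max_empty_ratio


if __name__ == "__main__":
    print(looks_scanned(["", "  ", "\n"]))
    print(looks_scanned(["A full paragraph of extractable text."] * 3))
```

If the check returns true, running an OCR tool first and converting its output is likely to give better results.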
| Dependency | Why | Installed how |
|---|---|---|
| python3 | Text extraction via PyMuPDF | System package |
| pymupdf | PDF parsing library | Auto-installed by pdf2md |
| curl | API calls to Gemini | Pre-installed on macOS/Linux |
| GEMINI_API_KEY | AI formatting | Free at Google AI Studio |
Contributions welcome! See CONTRIBUTING.md for guidelines.
Ideas for contributions:
- Support for OpenAI / Anthropic / local models
- OCR support for image-based PDFs
- Batch processing (convert a folder of PDFs)
- Config file support (~/.pdf2mdrc)
- Progress bar
MIT — use it however you like.
If pdf2md saved you time, consider giving it a star — it helps others find it.

