Extract text, tables, and form-field values from PDF files. Bordered tables, borderless (whitespace-aligned) tables, and fillable AcroForm fields are all supported.
Unzip into one of:
~/.claude/skills/— available in every project<repo>/.claude/skills/— scoped to one project (commit it to share via git)
The archive already contains a top-level pdf-extractor/ folder, so extract it
into the skills/ directory itself. If you don't have an unzip binary, use
Python's built-in extractor (works anywhere Python is installed):
unzip pdf-extractor-skill.zip -d ~/.claude/skills/
# or, no unzip binary needed:
python3 -m zipfile -e pdf-extractor-skill.zip ~/.claude/skills/Then install the Python dependencies:
uv pip install -r ~/.claude/skills/pdf-extractor/requirements.txt
# or with plain pip:
pip install -r ~/.claude/skills/pdf-extractor/requirements.txtInvoke with /pdf-extractor, or just ask Claude to pull data out of a PDF and
it will trigger automatically.
Upload the zip via Settings → Capabilities/Features → Skills. Note that claude.ai runs skills in a restricted sandbox, so the script's local Python execution may behave differently than in Claude Code.
python scripts/extract.py <file.pdf> --mode text
python scripts/extract.py <file.pdf> --mode fields
python scripts/extract.py <file.pdf> --mode tables --table-strategy lines
python scripts/extract.py <file.pdf> --mode tables --table-strategy text \
--bbox "x0,top,x1,bottom"
python scripts/extract.py <file.pdf> --mode tables \
--bbox "x0,top,x1,bottom" --columns "x0,x1,...,xN"| Flag | Purpose |
|---|---|
--mode text|fields|tables |
What to extract (default: text) |
--table-strategy lines|text |
lines for bordered tables (default), text for borderless |
--bbox "x0,top,x1,bottom" |
Crop to a region (PDF points, top-left origin) — recommended with text |
--columns "x0,x1,...,xN" |
Pin exact column edges; best paired with --bbox |
Coordinates are in PDF points. Find them with pdfplumber's
page.extract_words() (each word has x0/x1/top/bottom).
Extract the bordered table from the bundled sample (run from the skill root):
python scripts/extract.py examples/sample.pdf --mode tables --table-strategy lines[
{"page": 1, "index": 0, "rows": [
["Product", "Qty", "Revenue"],
["Widget", "120", "$1,440"],
["Gadget", "75", "$1,125"],
["Gizmo", "40", "$800"]
]}
]See examples/README.md for runnable commands and
expected output covering every mode — plain text, borderless tables with pinned
columns, and form fields.
pdf-extractor/ (v0.1.0)
├── SKILL.md # skill definition (frontmatter + instructions)
├── README.md # this file
├── LICENSE # MIT license
├── requirements.txt # Python dependencies
├── scripts/
│ └── extract.py # the extractor CLI
├── references/
│ └── form-fields.md # field-type & table-alignment reference
└── examples/
├── README.md # runnable commands + expected output
├── sample.pdf # text + bordered + borderless tables
└── form.pdf # two fillable form fields
Released under the MIT License.