pdf-extractor skill

Extract text, tables, and form-field values from PDF files. Bordered tables, borderless (whitespace-aligned) tables, and fillable AcroForm fields are all supported.

Install

Claude Code (CLI)

Unzip into one of:

~/.claude/skills/ — available in every project
<repo>/.claude/skills/ — scoped to one project (commit it to share via git)

The archive already contains a top-level pdf-extractor/ folder, so extract it into the skills/ directory itself. If you don't have an unzip binary, use Python's built-in extractor (works anywhere Python is installed):

unzip pdf-extractor-skill.zip -d ~/.claude/skills/
# or, no unzip binary needed:
python3 -m zipfile -e pdf-extractor-skill.zip ~/.claude/skills/
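
If you script the install, the same extraction can be done with Python's standard `zipfile` module. The sketch below is illustrative and not part of the skill: `install_skill` is a hypothetical helper name, and it assumes the archive layout described above (a single top-level `pdf-extractor/` folder), refusing to extract anything else.

```python
import zipfile
from pathlib import Path


def install_skill(archive, skills_dir, expected_root="pdf-extractor"):
    """Extract a skill archive into skills_dir and return the skill folder.

    Hypothetical helper: it assumes every entry in the archive lives under
    a single top-level folder named expected_root.
    """
    skills_dir = Path(skills_dir).expanduser()
    with zipfile.ZipFile(archive) as zf:
        # Collect the first path component of every entry in the archive.
        roots = {name.split("/", 1)[0] for name in zf.namelist()}
        if roots != {expected_root}:
            raise ValueError(f"unexpected top-level entries: {sorted(roots)}")
        skills_dir.mkdir(parents=True, exist_ok=True)
        zf.extractall(skills_dir)
    return skills_dir / expected_root
```

For example, `install_skill("pdf-extractor-skill.zip", "~/.claude/skills")` would return the path of the extracted `pdf-extractor` folder. The strict top-level check is a design choice to avoid accidentally nesting the skill one directory too deep.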

Then install the Python dependencies:

uv pip install -r ~/.claude/skills/pdf-extractor/requirements.txt
# or with plain pip:
pip install -r ~/.claude/skills/pdf-extractor/requirements.txt

Invoke with /pdf-extractor, or just ask Claude to pull data out of a PDF and it will trigger automatically.

claude.ai

Upload the zip via Settings → Capabilities/Features → Skills. Note that claude.ai runs skills in a restricted sandbox, so the script's local Python execution may behave differently than in Claude Code.

Using the script directly

python scripts/extract.py <file.pdf> --mode text
python scripts/extract.py <file.pdf> --mode fields
python scripts/extract.py <file.pdf> --mode tables --table-strategy lines
python scripts/extract.py <file.pdf> --mode tables --table-strategy text \
    --bbox "x0,top,x1,bottom"
python scripts/extract.py <file.pdf> --mode tables \
    --bbox "x0,top,x1,bottom" --columns "x0,x1,...,xN"

| Flag | Purpose |
| --- | --- |
| `--mode text\|fields\|tables` | What to extract (default: `text`) |
| `--table-strategy lines\|text` | `lines` for bordered tables (default), `text` for borderless |
| `--bbox "x0,top,x1,bottom"` | Crop to a region (PDF points, top-left origin); recommended with `--table-strategy text` |
| `--columns "x0,x1,...,xN"` | Pin exact column edges; best paired with `--bbox` |

Coordinates are in PDF points. Find them with pdfplumber's page.extract_words() (each word has x0/x1/top/bottom).
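
One way to get numbers for `--bbox` and `--columns` is to derive them from the words pdfplumber reports. The sketch below is an illustration, not part of the skill: `bbox_arg`, `columns_arg`, and `page_words` are hypothetical helper names, the padding value is arbitrary, and the sample word boxes are made up to mimic the `x0`/`x1`/`top`/`bottom` keys returned by `page.extract_words()`. It also assumes that `--columns` expects the left edge of each column followed by the table's right edge, which the script may or may not require.

```python
def bbox_arg(words, pad=2.0):
    """Format a padded bounding box around words as "x0,top,x1,bottom"."""
    x0 = min(w["x0"] for w in words) - pad
    top = min(w["top"] for w in words) - pad
    x1 = max(w["x1"] for w in words) + pad
    bottom = max(w["bottom"] for w in words) + pad
    return ",".join(f"{v:.1f}" for v in (x0, top, x1, bottom))


def columns_arg(header_words, right_edge):
    """Use each header word's left edge, plus a final right edge, as column edges."""
    edges = sorted(w["x0"] for w in header_words) + [right_edge]
    return ",".join(f"{v:.1f}" for v in edges)


def page_words(pdf_path, page_number=1):
    """Read word boxes from one page (requires pdfplumber; pages are 1-based here)."""
    import pdfplumber  # imported lazily so the helpers above work without it

    with pdfplumber.open(pdf_path) as pdf:
        return pdf.pages[page_number - 1].extract_words()


# Made-up word boxes standing in for page_words("examples/sample.pdf").
words = [
    {"text": "Product", "x0": 72.0, "x1": 110.0, "top": 100.0, "bottom": 112.0},
    {"text": "Qty", "x0": 200.0, "x1": 218.0, "top": 100.0, "bottom": 112.0},
    {"text": "Widget", "x0": 72.0, "x1": 105.0, "top": 120.0, "bottom": 132.0},
    {"text": "120", "x0": 200.0, "x1": 216.0, "top": 120.0, "bottom": 132.0},
]
print(bbox_arg(words))  # 70.0,98.0,220.0,134.0
print(columns_arg(words[:2], right_edge=220.0))  # 72.0,200.0,220.0
```

In practice you would filter the words down to the table region before calling `bbox_arg`, and pick one word per column for `columns_arg`, since multi-word headers would otherwise contribute extra edges.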

Example

Extract the bordered table from the bundled sample (run from the skill root):

python scripts/extract.py examples/sample.pdf --mode tables --table-strategy lines

[
  {"page": 1, "index": 0, "rows": [
    ["Product", "Qty", "Revenue"],
    ["Widget", "120", "$1,440"],
    ["Gadget", "75", "$1,125"],
    ["Gizmo", "40", "$800"]
  ]}
]
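
If you consume this output from Python, each table's first row can serve as a header. The sketch below is illustrative only: `table_records` is a hypothetical helper, and it assumes the `page`/`index`/`rows` JSON shape shown above, with the header in the first row.

```python
import json


def table_records(table):
    """Turn one table's rows into dicts keyed by its first (header) row."""
    header, *body = table["rows"]
    return [dict(zip(header, row)) for row in body]


# The sample output shown above, pasted as a string.
output = """
[
  {"page": 1, "index": 0, "rows": [
    ["Product", "Qty", "Revenue"],
    ["Widget", "120", "$1,440"],
    ["Gadget", "75", "$1,125"],
    ["Gizmo", "40", "$800"]
  ]}
]
"""

for table in json.loads(output):
    for record in table_records(table):
        # Cells are strings, so convert numeric columns explicitly.
        print(record["Product"], int(record["Qty"]))
```

Cells stay as strings (including currency values such as `$1,440`), so any numeric conversion is left to the caller.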

See examples/README.md for runnable commands and expected output covering every mode — plain text, borderless tables with pinned columns, and form fields.

Contents

pdf-extractor/  (v0.1.0)
├── SKILL.md              # skill definition (frontmatter + instructions)
├── README.md             # this file
├── LICENSE               # MIT license
├── requirements.txt      # Python dependencies
├── scripts/
│   └── extract.py        # the extractor CLI
├── references/
│   └── form-fields.md    # field-type & table-alignment reference
└── examples/
    ├── README.md         # runnable commands + expected output
    ├── sample.pdf        # text + bordered + borderless tables
    └── form.pdf          # two fillable form fields

License

Released under the MIT License.
