pdftl ("PDF tackle") is a CLI tool for PDF manipulation written in Python. It is intended to be a command-line compatible extension of the venerable pdftk.
Leveraging the power of pikepdf (qpdf) and other modern libraries, it offers advanced capabilities like cropping, chopping, regex text replacement, adding text and arbitrary content stream injection.

    pipx install pdftl[full]

    # merge, crop to letter paper, rotate last page and output with encryption with one command
    pdftl A=a.pdf B=b.pdf cat A1-5 B2-end \
    --- crop '4-8,12(letter)' \
    --- rotate endright \
    output out.pdf owner_pw foo user_pw bar encrypt_aes256

- Familiar syntax: Command-line compatible with pdftk. Verified against Mike Haertl's php-pdftk test suite and the pdftk-java test suite logic, so `s/pdftk/pdftl/` should result in working scripts.
- Pipelining: Chain multiple operations in a single command using `---`.
- Performant: `pdftl` seems faster than `pdftk-java` for many operations (based on informal benchmarks). Reason: `pdftl` mostly drives `pikepdf` which drives `qpdf`, a fast C++ library.
- Extra/enhanced operations and features such as zooming pages, smart merging preserving links and outlines, cropping/chopping up pages, text extraction, optimizing images.
- Modern security: Supports AES-256 encryption and modern permission flags out of the box.
- Content editing: Find & replace text via regular expressions, inject raw PDF operators, or overlay dynamic text.
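The pipelining syntax is easy to generate from a script. The following is a minimal sketch, not part of pdftl: `build_pdftl_command` is a hypothetical helper that joins stages with the `---` separator and reproduces the argument order of the example command above. It only builds the argument list, so it runs without pdftl installed.

```python
import shlex
from typing import Sequence


def build_pdftl_command(
    inputs: Sequence[str],
    stages: Sequence[Sequence[str]],
    output_args: Sequence[str],
) -> list[str]:
    """Assemble an argv list: inputs, stages joined by '---', then output args.

    Hypothetical helper for illustration; pdftl itself does not ship it.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    argv = ["pdftl", *inputs]
    for index, stage in enumerate(stages):
        if index > 0:
            argv.append("---")  # stage separator, as used in the README examples
        argv.extend(stage)
    argv.extend(output_args)
    return argv


command = build_pdftl_command(
    inputs=["A=a.pdf", "B=b.pdf"],
    stages=[
        ["cat", "A1-5", "B2-end"],
        ["crop", "4-8,12(letter)"],
        ["rotate", "endright"],
    ],
    output_args=[
        "output", "out.pdf",
        "owner_pw", "foo",
        "user_pw", "bar",
        "encrypt_aes256",
    ],
)

# shlex.join quotes only the arguments that need it for a POSIX shell.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` avoids shell quoting issues altogether, since each argument reaches pdftl exactly as written.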
pdftl maintains command-line compatibility with pdftk while introducing features required for modern PDF workflows.
| Feature | pdftk (Legacy) | pdftl (Modern) |
|---|---|---|
| Pipelining | ❌ (Requires temp files) | ✅ Native (Chain ops with `---`) |
| Encryption | | ✅ AES-256 Support |
| Syntax | Standard | ✅ Compatible Extension |
| Page Geometry | ❌ | ✅ Crop to fit, Zoom, & Chop |
| Pipelined Logic | ❌ | ✅ Rotate + Stamp in one command |
| Plugins | ❌ | ✅ Custom operations/mutation scripts written in Python |
| Installation | Often complex binary | ✅ Simple `pipx install pdftl` |
| Performance | Variable | ✅ Powered by pikepdf/qpdf |
| Link Integrity | | ✅ Preserves internal cross-refs |
| Shell Completion | ❌ | ✅ bash, zsh and powershell |
| Help | | ✅ Self-documenting: `pdftl help <operation/option/topic/tag>` |
Install pipx, and then:

    pipx install pdftl[full]

A simple `pip install pdftl[full]` is also supported.
Note: The [full] install includes ocrmypdf for image optimization, reportlab for text generation, pypdfium2 for text extraction and robust flattening, and pyHanko for cryptographic signature functionality. Omit [full] to omit those features and dependencies.
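To see which of the optional dependencies are present in the current environment, a short check like the one below can help. This is a sketch that uses only the standard library; the import names (`ocrmypdf`, `reportlab`, `pypdfium2`, `pyhanko`) are the usual module names for those packages, and the feature labels are taken from the note above. It does not query pdftl itself.

```python
from importlib.util import find_spec

# Import name -> feature it enables, per the installation note.
OPTIONAL_MODULES = {
    "ocrmypdf": "image optimization",
    "reportlab": "text generation",
    "pypdfium2": "text extraction and robust flattening",
    "pyhanko": "cryptographic signatures",
}


def optional_feature_status() -> dict[str, bool]:
    """Map each optional module name to True if it can be imported here."""
    return {name: find_spec(name) is not None for name in OPTIONAL_MODULES}


for name, available in optional_feature_status().items():
    state = "available" if available else "missing"
    print(f"{name:<10} {state:<9} ({OPTIONAL_MODULES[name]})")
```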
- Combine: `cat`, `shuffle` (interleave pages from multiple docs).
- Split: `burst` (split into single pages), `delete` pages.
- Metadata: `dump_data`, `update_info`, `attach_files`, `unpack_files`.
- Watermarking: `stamp`/`background` (single page), `multistamp`/`multibackground`.
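As an illustration of what "interleave" means for `shuffle`, here is a small sketch on plain Python lists standing in for page sequences. It is not pdftl's implementation, and the handling of inputs with different page counts (the longer input simply continues) is an assumption.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking "this document has run out of pages"


def interleave(*documents):
    """Take one page from each document in turn, skipping exhausted documents."""
    result = []
    for group in zip_longest(*documents, fillvalue=_MISSING):
        result.extend(page for page in group if page is not _MISSING)
    return result


odd_pages = ["A1", "A2", "A3"]
even_pages = ["B1", "B2"]
print(interleave(odd_pages, even_pages))  # ['A1', 'B1', 'A2', 'B2', 'A3']
```

A typical use is recombining a duplex scan where odd and even pages ended up in separate files.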
- Rotate: `rotate` pages (absolute or relative).
- Crop: `crop` to margins or standard paper sizes (e.g., "A4").
- Chop: `chop` pages into grids or rows (e.g., split a scanned spread into two pages).
- Shift, scale and spin page content inside the page boundaries using `place`.
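The geometry behind chopping a page into a grid is simple arithmetic. The sketch below computes the sub-page rectangles in PDF coordinates (origin at the bottom-left, units in points). `grid_boxes` is a hypothetical name and the ordering (top row first, left to right) is an assumption; it shows the idea, not how `chop` is implemented.

```python
def grid_boxes(width, height, columns, rows):
    """Return (left, bottom, right, top) boxes, top row first, left to right."""
    if columns < 1 or rows < 1:
        raise ValueError("columns and rows must be at least 1")
    cell_width = width / columns
    cell_height = height / rows
    boxes = []
    for row in range(rows):
        top = height - row * cell_height  # PDF y axis points upward
        for column in range(columns):
            left = column * cell_width
            boxes.append((left, top - cell_height, left + cell_width, top))
    return boxes


# A scanned spread of two US Letter pages side by side: 1224 x 792 points.
print(grid_boxes(1224, 792, columns=2, rows=1))
# [(0.0, 0.0, 612.0, 792.0), (612.0, 0.0, 1224.0, 792.0)]
```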
- Forms: `fill_form`, `generate_fdf`, `dump_data_fields`.
- Annotations: `modify_annots` (surgical edits to link properties, colors, borders), `delete_annots`, `dump_annots`.
- Decryption: using `input_pw`.
- Encryption: using `owner_pw`, `user_pw` and `encrypt_aes256`, optionally setting permissions with `allow`.
- Signatures: add secure signatures using `sign_key` and `sign_cert`. List and verify signatures using `dump_signatures` (powered by `pyHanko`).
- Text replacement: `replace` text in content streams using regular expressions (experimental).
- Code injection: `inject` raw PDF operators at the head/tail of content streams.
- Optimization: `optimize_images` (smart compression via OCRmyPDF).
- Dynamic text: `add_text` supports Bates stamping and can add page numbers, filenames, timestamps, etc.
- Cleanup: `normalize` content streams, `linearize` for web viewing.
- Plugins: write your own custom operation in Python, save it to `~/.config/pdftl/operations` (*nix) or `%APPDATA%\pdftl\config` (Windows), and use it in pdftl just like the built-in operations. You can also `mutate_content` using simple Python scripts.
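For the Bates stamping mentioned under dynamic text, the `add_text` template used in this README's examples, `DEF-{page+120:06d}`, reads like Python's format mini-language: add an offset to the page number, then zero-pad to six digits. The sketch below reproduces that arithmetic in plain Python. Whether pdftl evaluates its templates exactly this way is an assumption, and `bates_label` is a hypothetical helper.

```python
def bates_label(page: int, prefix: str = "DEF-", offset: int = 120, width: int = 6) -> str:
    """Format a Bates label: prefix plus (page + offset), zero-padded to `width`."""
    return f"{prefix}{page + offset:0{width}d}"


labels = [bates_label(page) for page in (1, 2, 3)]
print(labels)  # ['DEF-000121', 'DEF-000122', 'DEF-000123']
```

With an offset of 120, page 1 becomes 000121, which matches the "starting at 000121" result described in the Bates example.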
For more than 100 other examples: pdftl help examples.

    # Merge two files
    pdftl in1.pdf in2.pdf cat output combined.pdf

    # Now with in2.pdf zoomed in
    pdftl A=in1.pdf B=in2.pdf cat A Bz1 output combined2.pdf

    # Take pages 1-5, rotate them 90 degrees East, and crop to A4
    pdftl in.pdf cat 1-5east --- crop "(a4)" output out.pdf

You can chain operations without intermediate files using `---`:

    # Burst a file, but rotate and stamp every page first
    pdftl in.pdf rotate south \
    --- stamp watermark.pdf \
    --- burst output page_%04d.pdf

    # Fill a form and flatten it (make it non-editable)
    pdftl form.pdf fill_form data.fdf flatten output signed.pdf

    # Change all Highlight annotations on odd pages to Red
    pdftl docs.pdf modify_annots "odd/Highlight(C=[1 0 0])" output red_notes.pdf

    # Add a watermark, the pdftk way
    pdftl in.pdf stamp watermark.pdf output marked1.pdf

    # Add an obnoxious semi-transparent red watermark on odd pages only
    pdftl in.pdf add_text 'odd/YOUR AD HERE/(position=mid-center, font=Helvetica-Bold, size=72, rotate=45, color=1 0 0 0.5)' output with_ads.pdf

    # Add Bates numbering starting at 000121
    # Result: DEF-000121, DEF-000122, ...
    pdftl in.pdf \
    add_text "/DEF-{page+120:06d}/(position=bottom-center, offset-y=10)" \
    output bates.pdf

    # Content stream replacement with regular expressions (YMMV)
    # Change black to red
    pdftl in.pdf replace '/0 0 0 (RG|rg)/1 0 0 \1/' output redder.pdf
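The "YMMV" caveat on the regex example is worth a closer look. The sketch below applies the same pattern and replacement to a sample content-stream fragment with Python's `re` module, and shows one way the loose pattern can misfire. It illustrates the regular expression only; that pdftl's `replace` follows Python `re` semantics (including lookbehind) is an assumption, and the sample stream is made up.

```python
import re

# The pattern from the README example, and a stricter variant that refuses
# to start matching in the middle of a number such as "0.50".
LOOSE = re.compile(r"0 0 0 (RG|rg)")
STRICT = re.compile(r"(?<![\d.])0 0 0 (RG|rg)")
REPLACEMENT = r"1 0 0 \1"

# First line: black fill colour. Second line: dark red stroke colour.
sample = "0 0 0 rg\n0.50 0 0 RG\n"

loose_result = LOOSE.sub(REPLACEMENT, sample)
strict_result = STRICT.sub(REPLACEMENT, sample)

print(loose_result)   # the dark red line is corrupted to "0.51 0 0 RG"
print(strict_result)  # only the pure black line changes
```

The loose pattern matches the trailing `0` of `0.50`, so an unrelated colour is silently altered. Anchoring the first operand, as in the stricter variant, avoids that particular failure.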
While pdftl is primarily a CLI tool, it also exposes a robust Python API for integrating PDF workflows into your scripts.
It supports both a Functional interface (similar to the CLI) and a Fluent interface (for method chaining).

    from pdftl import pipeline

    # Chain operations fluently without saving intermediate files
    (
        pipeline("input.pdf")
        .rotate("right")
        .stamp("watermark.pdf")
        .save("output.pdf")
    )

See the API Tutorial for more details.
| Operation | Description |
|---|---|
| `add_text` | Add user-specified text strings to PDF pages |
| `attach_files` | Attach files to the output PDF |
| `background` | Use a 1-page PDF as the background for each page |
| `burst` | Split a single PDF into individual page files |
| `cat` | Concatenate pages from input PDFs into a new PDF |
| `chop` | Chop pages into multiple smaller pieces |
| `crop` | Crop pages |
| `delete` | Delete pages from an input PDF |
| `delete_annots` | Delete annotation info |
| `dump_annots` | Dump annotation info |
| `dump_data` | Metadata, page and bookmark info (XML-escaped) |
| `dump_data_annots` | Dump annotation info in pdftk style |
| `dump_data_fields` | Print PDF form field data with XML-style escaping |
| `dump_data_fields_utf8` | Print PDF form field data in UTF-8 |
| `dump_data_utf8` | Metadata, page and bookmark info (in UTF-8) |
| `dump_dests` | Print PDF named destinations data to the console |
| `dump_files` | List file attachments |
| `dump_layers` | Dump layer info (JSON) |
| `dump_signatures` | List and validate digital signatures |
| `dump_text` | Print PDF text data to the console or a file |
| `fill_form` | Fill a PDF form |
| `filter` | Do nothing (the default if `<operation>` is absent) |
| `generate_fdf` | Generate an FDF file containing PDF form data |
| `inject` | Inject code at start or end of page content streams |
| `insert` | Insert blank pages |
| `modify_annots` | Modify properties of existing annotations |
| `move` | Move pages to a new location |
| `multibackground` | Use multiple pages as backgrounds |
| `multistamp` | Stamp multiple pages onto an input PDF |
| `mutate_content` | Mutate page content streams using a user-supplied Python script |
| `normalize` | Reformat page content streams |
| `optimize_images` | Optimize images |
| `place` | Shift, scale, and spin page content |
| `replace` | Regex replacement on page content streams |
| `render` | Render PDF pages as images |
| `rotate` | Rotate pages in a PDF |
| `shuffle` | Interleave pages from multiple input PDFs |
| `stamp` | Stamp a 1-page PDF onto each page of an input PDF |
| `unpack_files` | Unpack file attachments |
| `update_info` | Update PDF metadata from `dump_data` instructions |
| `update_info_utf8` | Update PDF metadata from `dump_data_utf8` instructions |
| Option | Description |
|---|---|
| `allow <perm>` | Specify permissions for encrypted files |
| `compress` | Compress output file streams (default) |
| `drop_info` | Discard document-level info metadata |
| `drop_xfa` | Discard form XFA data if present |
| `drop_xmp` | Discard document-level XMP metadata |
| `encrypt_128bit` | Use 128 bit encryption (obsolete, maybe insecure) |
| `encrypt_40bit` | Use 40 bit encryption (obsolete, highly insecure) |
| `encrypt_aes128` | Use 128 bit AES encryption (maybe obsolete) |
| `encrypt_aes256` | Use 256 bit AES encryption |
| `flatten` | Flatten all annotations |
| `keep_final_id` | Copy final input PDF's ID metadata to output |
| `keep_first_id` | Copy first input PDF's ID metadata to output |
| `linearize` | Linearize output file(s) |
| `no_encrypt_metadata` | Leave metadata unencrypted |
| `need_appearances` | Set a form rendering flag in the output PDF |
| `output <file>` | The output file path, or a template for burst |
| `owner_pw <pw>` | Set owner password and encrypt output |
| `replacement_font <file>` | Replace the font used for all form fields with a TTF file |
| `sign_cert <file>` | Path to certificate PEM |
| `sign_field <name>` | Signature field name (default: Signature1) |
| `sign_key <file>` | Path to private key PEM |
| `sign_pass_env <var>` | Environment variable with sign_cert passphrase |
| `sign_pass_prompt` | Prompt for sign_cert passphrase |
| `uncompress` | Disable compression of output file streams |
| `user_pw <pw>` | Set user password and encrypt output |
| `verbose` | Turn on verbose output |
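The `output` option doubles as a filename template for `burst`, as in the `page_%04d.pdf` example earlier in this README. The template looks like a printf-style format, so the resulting names can be previewed in Python with the `%` operator. This is a sketch under two assumptions: that pdftl expands the template printf-style, and that numbering starts at 1. `burst_filenames` is a hypothetical helper.

```python
def burst_filenames(template: str, page_count: int) -> list[str]:
    """Expand a printf-style template once per page, numbering from 1 (assumed)."""
    return [template % page_number for page_number in range(1, page_count + 1)]


print(burst_filenames("page_%04d.pdf", 3))
# ['page_0001.pdf', 'page_0002.pdf', 'page_0003.pdf']
```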
- License: This project is licensed under the Mozilla Public License 2.0.
- Changelog: CHANGELOG.md.
- Documentation: pdftl.readthedocs.io.