PDF Ghost is a Python library designed for performing a wide range of operations on PDF files, including merging,
splitting, rotating, compressing, watermarking, converting, encrypting/decrypting, extracting text/images, adding page
numbers, batch processing, and comparing PDFs. It also supports generating PDFs from Markdown or LaTeX files.
- Merge PDFs: Combine multiple PDFs into a single file.
- Split PDFs: Split a PDF into smaller files based on page ranges.
- Remove Pages: Remove specific pages with page index, remove page from start and end.
- Rotate Pages: Rotate all or specific pages in a PDF.
- Insert Pages: Insert pages or specific pages in a PDF.
- Rearrange Pages: Rearrange pages of a pdf file or merge pdf files and then rearrange all the pages.
- Compress PDFs: Reduce the file size of a PDF by optimizing images and removing unnecessary metadata.
- Watermarking: Add or remove text or image watermarks to PDFs.
- Image to PDF: Covert Images to PDF file.
- PDF to Image: Convert pages of a PDF file to images.
- Encrypt/Decrypt PDFs: Add password protection to PDFs and decrypt them with the correct password.
- Extract Text/Images: Extract text or images from a PDF.
- Add Page Numbers: Insert page numbers at the bottom or top of each page.
- Convert PDFs to HTML: Convert PDFs into structured HTML files.
- Generate PDFs from Markdown/LaTeX: Convert Markdown or LaTeX files into well-formatted PDFs.
- Compare PDFs: Identify differences between two PDF files.
- PDF Signing: Add digital signatures to PDFs using cryptographic certificates.
- Batch Processing: Apply operations (merge, split, rotate, etc.) on multiple PDFs at once.
- Python 3.7+
pip install pdfghostFor Markdown-to-PDF and LaTeX-to-PDF conversion, the following external tools are required:
- Pandoc: For converting Markdown to PDF.
- BasicTeX: A lightweight LaTeX distribution for converting LaTeX to PDF.
If you have Homebrew installed, run:
brew install pandocsudo apt-get update
sudo apt-get install pandocDownload the Pandoc installer from the official website here and follow the installation instructions.
- Download BasicTeX from here.
- Install it by following the on-screen instructions.
- Add the following to your
.bashrcor.zshrcfile:export PATH="/usr/local/texlive/2023/bin/universal-darwin:$PATH"
- Install
texlive(a full LaTeX distribution):sudo apt-get update sudo apt-get install texlive
- Download and install MiKTeX (a lightweight LaTeX distribution) from here.
- Follow the installation instructions.
from pdfghost import merge_pdfs
merge_pdfs("output.pdf", "file1.pdf", "file2.pdf")from pdfghost import split_pdf
split_pdf("input.pdf", "output_folder", split_range=(0, 2))from pdfghost import remove_pages
# Remove pages with indices 0, 2, and 4 (0-based)
remove_pages("input.pdf", "output.pdf", pages_to_remove=[0, 2, 4])from pdfghost import remove_pages_from_start
# Remove the first 3 pages
remove_pages_from_start("input.pdf", "output.pdf", num_pages=3)from pdfghost import remove_pages_from_end
# Remove the last 2 pages
remove_pages_from_end("input.pdf", "output.pdf", num_pages=2)from pdfghost import rotate_pdf
# Rotate all pages by 90 degrees
rotate_pdf("input.pdf", "output.pdf", rotation=90)
# Rotate specific pages by 180 degrees
rotate_pdf("input.pdf", "output.pdf", rotation=180, pages_to_rotate=[0, 2])from pdfghost import insert_pages
# Insert pages at specific positions
insertions = [
(1, "insert1.pdf"), # Insert pages from insert1.pdf at position 1
(4, "insert2.pdf"), # Insert pages from insert2.pdf at position 4
]
insert_pages("input.pdf", "output.pdf", insertions)from pdfghost import rearrange_pdf
# Rearrange pages in a PDF
page_order = [2, 0, 1] # New order: Page 3, Page 1, Page 2
rearrange_pdf("input.pdf", "output.pdf", page_order)from pdfghost import merge_and_rearrange
# Merge multiple PDFs and rearrange their pages
page_order = [
(0, 0), # Page 1 from file1.pdf
(1, 0), # Page 1 from file2.pdf
(0, 1), # Page 2 from file1.pdf
]
merge_and_rearrange("output.pdf", page_order, "file1.pdf", "file2.pdf")from pdfghost import compress_pdf
# Compress a PDF with medium compression
compress_pdf("input.pdf", "output.pdf", power=3)
# Compress a PDF with maximum compression
compress_pdf("input.pdf", "output.pdf", power=5)from pdfghost import add_text_watermark
# Add a text watermark to all pages
add_text_watermark("input.pdf", "output.pdf", text="Confidential")
# Add a text watermark to specific pages
add_text_watermark("input.pdf", "output.pdf", text="Confidential", pages_to_watermark=[0, 2])from pdfghost import add_image_watermark
# Add an image watermark to all pages
add_image_watermark("input.pdf", "output.pdf", image_path="watermark.png")
# Add an image watermark to specific pages
add_image_watermark("input.pdf", "output.pdf", image_path="watermark.png", pages_to_watermark=[1])from pdfghost import remove_watermark
# Remove watermarks from all pages
remove_watermark("input.pdf", "output.pdf")
# Remove watermarks from specific pages
remove_watermark("input.pdf", "output.pdf", pages_to_clean=[0, 2])from pdfghost import pdf_to_images
# Convert each page of a PDF into PNG images
pdf_to_images("input.pdf", "output_folder", format="png")
# Convert each page of a PDF into JPG images
pdf_to_images("input.pdf", "output_folder", format="jpg")from pdfghost import images_to_pdf
# Convert multiple image files into a single PDF
images_to_pdf("output.pdf", "image1.png", "image2.jpg")from pdfghost import encrypt_pdf
# Encrypt a PDF with a password
encrypt_pdf("input.pdf", "output.pdf", password="mypassword")from pdfghost import decrypt_pdf
# Decrypt a PDF with a password
decrypt_pdf("input.pdf", "output.pdf", password="mypassword")from pdfghost import extract_text
# Extract text from a PDF and save it as a .txt file
extract_text("input.pdf", "output.txt", format="txt")
# Extract text from a PDF and save it as a .csv file
extract_text("input.pdf", "output.csv", format="csv")from pdfghost import extract_images
# Extract all images from a PDF and save them as separate image files
extract_images("input.pdf", "output_folder")from pdfghost import add_page_numbers
# Add page numbers at the bottom of each page
add_page_numbers("input.pdf", "output.pdf", position="bottom")
# Add page numbers at the top of each page
add_page_numbers("input.pdf", "output.pdf", position="top")from pdfghost import pdf_to_html
# Convert a PDF into a structured HTML file
pdf_to_html("input.pdf", "output.html")from pdfghost import markdown_to_pdf
# Convert a Markdown file into a PDF
markdown_to_pdf("input.md", "output.pdf")from pdfghost import latex_to_pdf
# Convert a LaTeX file into a PDF
latex_to_pdf("input.tex", "output.pdf")from pdfghost import compare_pdfs
# Compare two PDFs and generate a summary of differences
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="summary")
print(result)
# Compare two PDFs with side-by-side output
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="side_by_side")
print(result)
# Compare two PDFs with highlighted differences
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="highlight_differences")
print(result)
# Compare two PDFs with version control-style output
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="version_control")
print(result)
# Compare two PDFs with annotations
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="annotations")
print(result)from pdfghost import sign_pdf
# Sign a PDF with a cryptographic certificate
sign_pdf("input.pdf", "signed.pdf", "certificate.pfx", password="mypassword")from pdfghost import batch_process, rotate_pdf
# Rotate all PDFs in a folder by 90 degrees
batch_process("input_folder", "output_folder", rotate_pdf, rotation=90)To run unit tests, first install the development dependencies, and then use:
python -m unittest discover tests/- Fork the repository.
- Create your feature branch (
git checkout -b feature/your-feature). - Commit your changes (
git commit -am 'Add new feature'). - Push to the branch (
git push origin feature/your-feature). - Open a new Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.