Awesome PDF

A curated list of resources around PDF files

The File Format

Viewers

  • KOReader: a document viewer primarily aimed at e-ink readers
  • react-native-pdf: a React Native PDF view component
  • PdfViewPager: Android widget to display PDF documents in your Activities or Fragments
  • vue-pdf: a Vue.js PDF viewer

Data Extraction

  • pdftotext: an application that converts Portable Document Format (PDF) files to plain text. Part of poppler-utils.
  • pdfminer.six: a Python library for extracting information from PDF documents
    • pdfplumber: Plumb a PDF for detailed information about each text character, rectangle, and line. Plus: Table extraction and visual debugging.
  • Tabula: an application for extracting tables
  • camelot: PDF Table Extraction
  • awesome-document-understanding: a curated list of resources on the topic of Document Understanding (DU)

Generators

Anything that can produce PDF files from scratch:

  • fpdf2: An Open Source Python library for generating PDFs
  • pdflatex (e.g. in TeX Live): A LaTeX-to-PDF converter
  • reportlab: An Open Source Python library for generating PDFs and graphics.
  • prawn: a pure Ruby PDF generation library
  • react-pdf: Create PDF files using React
  • markdown-pdf: Markdown to PDF converter
  • mpdf: PHP library generating PDF files from UTF-8 encoded HTML
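
To give a rough idea of what "from scratch" means for the generators above, here is a minimal sketch that assembles a one-page PDF using only the Python standard library. The function name build_minimal_pdf is hypothetical and not part of any listed project, and the sketch assumes ASCII text, a US Letter page, and the built-in Helvetica font. The listed libraries handle fonts, compression, layout, and much more.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Return the bytes of a one-page PDF showing `text` (ASCII only)."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: select font F1 at 24 pt, move to (72, 720), draw the text.
    content = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("ascii")

    # Object bodies, numbered 1..5 in this order.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, needed for xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # Trailer points at the catalog (object 1) and at the xref table.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the result to disk, for example with pathlib.Path("hello.pdf").write_bytes(build_minimal_pdf("Hello")), should produce a file that common PDF viewers can open.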

Manipulators

Anything that's used to edit an existing PDF file:

  • pdfarranger: a small Python-GTK application that helps the user merge or split PDF documents and rotate, crop, and rearrange their pages using a graphical interface
  • OCRmyPDF: adds an OCR text layer to scanned PDF files, allowing them to be searched

File Analysis / Security

  • Pdfalyzer: PDF analysis tool that visualizes the internal data structure of a PDF in large, colorful diagrams and scans the binary streams embedded in the PDF against a collection of PDF-specific YARA rules for malicious content.
  • Malicious PDF Generator: generates a set of malicious PDF files with phone-home functionality
  • PDFBox: a Java tool for browsing the internal structure of a PDF. Download it and run java -jar pdfbox-app-x.y.z.jar debug pdf_file

Multi-Purpose Libraries

  • pdftk: command-line tool for working with PDFs. It is commonly used for client-side scripting or server-side processing of PDFs.
  • pypdf: a free and open-source pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
  • pikepdf: a Python library for reading and writing PDF, powered by qpdf
  • PyMuPDF: Python bindings to MuPDF.
  • pypdfium2: Python bindings to PDFium.
  • borb: reading, creating and manipulating PDF files in Python
  • pdfcpu: batch processing and scripting via a rich command line
  • pdf-lib: Create and modify PDF documents in any JavaScript environment
  • HexaPDF: A pure Ruby PDF creation and manipulation library