Origamindee

Mindee's fork of the popular Origami library.

Overview

Origami is a framework written in pure Ruby to manipulate PDF files.

It can parse PDF contents, modify and save the PDF structure, and create new documents.

Origami supports some advanced features of the PDF specification:

  • Compression filters with predictor functions
  • Encryption using RC4 (now obsolete) or AES, including the undocumented Revision 6 derivation algorithm
  • Digital signatures and Usage Rights
  • File attachments
  • AcroForm and XFA forms
  • Object streams

Origami is able to parse PDF, FDF and PPKLite (Adobe certificate store) files.

Requirements

The following Ruby versions are tested and supported: 2.6, 2.7, 3.0, 3.1, 3.2

Some optional features require additional gems:

Quick start

First install Origamindee (this fork) using the latest gem available:

$ gem install origamindee

You'll need to import it under the original name:

require 'origami'

To process a PDF document, you can use the PDF.read method:

pdf = Origami::PDF.read "something.pdf"

puts "This document has #{pdf.pages.size} page(s)"

The default behavior is to parse the entire contents of the document at once. This can be changed by passing the lazy flag to parse objects on demand.

pdf = Origami::PDF.read "something.pdf", lazy: true

pdf.each_page do |page|
    page.each_font do |name, font|
        # ... only parse the necessary bits
    end
end

You can also create documents directly by instantiating a new PDF object:

pdf = Origami::PDF.new

pdf.append_page
pdf.pages.first.write "Hello", size: 30

pdf.save("example.pdf")

# Another way of doing it
Origami::PDF.write("example.pdf") do |pdf|
    pdf.append_page do |page|
        page.write "Hello", size: 30
    end
end
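
The reading and writing calls shown above can be combined into a quick round-trip check. The following is a minimal sketch, not part of Origami itself: `round_trip_page_count` is a hypothetical helper name, and the guarded `require` is only there so the script degrades gracefully when the gem is not installed. It assumes that the `verbosity: Origami::Parser::VERBOSE_QUIET` option silences the parser's log output.

```ruby
require 'tempfile'

begin
  require 'origami' # provided by the origamindee gem
  ORIGAMI_LOADED = true
rescue LoadError
  ORIGAMI_LOADED = false
  warn 'origami not found; run `gem install origamindee` first'
end

# Hypothetical helper (not part of Origami): builds a document with
# `count` blank pages, saves it to a temporary file, reads it back and
# returns the number of pages found. Returns nil if the gem is missing.
def round_trip_page_count(count)
  return nil unless ORIGAMI_LOADED

  pdf = Origami::PDF.new
  count.times { pdf.append_page }

  Tempfile.create(['round_trip', '.pdf']) do |file|
    file.close # Origami reopens the path itself when saving
    pdf.save(file.path)

    # Assumption: VERBOSE_QUIET keeps the parser from logging to STDERR.
    reread = Origami::PDF.read(file.path, verbosity: Origami::Parser::VERBOSE_QUIET)
    reread.pages.size
  end
end

puts "Pages after round trip: #{round_trip_page_count(2)}" if ORIGAMI_LOADED
```

Writing to a temporary file keeps the sketch from leaving stray PDFs behind; in a real script you would save to the path you actually want to keep.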

See the examples and bin directories for more advanced usage.

Tools

Origami comes with a set of tools to manipulate PDF documents from the command line.

  • pdfcop: Runs some heuristic checks to detect dangerous contents.
  • pdfdecompress: Strips compression filters out of a document.
  • pdfdecrypt: Decrypts the contents of an encrypted document.
  • pdfencrypt: Encrypts a PDF document.
  • pdfexplode: Explodes a document into several documents, each of them having one deleted resource. Useful for reduction of crash cases after a fuzzing session.
  • pdfextract: Extracts binary resources of a document (images, scripts, fonts, etc.).
  • pdfmetadata: Displays the metadata contained in a document.
  • pdf2ruby: Converts a PDF into an Origami script rebuilding an equivalent document (experimental).
  • pdfsh: An IRB shell running inside the Origami namespace.

Note: Since version 2.1, pdfwalker has been moved to a separate repository.

Motivation

We were using the excellent Origami library for our Ruby OCR client library.

Unfortunately, the Origami project now appears to be inactive, and since we needed to add Ruby 3.0 support, we decided to fork it.

We also noticed that the colorize library, which Origami depended on, is licensed under the GPL, meaning that Origami cannot be licensed under the LGPL.
We therefore replaced it with Rainbow, which offers similar functionality and is licensed under MIT.

Furthermore, we are now in a better position to fix any problems related to PDF parsing that are encountered by our users.

As such, we intend to support functionality that falls within the scope of our client library.

We do not claim to be an official successor to Origami.

License

Origami is distributed under the LGPL license.

Copyright © 2019 Guillaume Delugré

Copyright © 2022 Mindee, SA