Nicolas Canceill edited this page Nov 10, 2013 · 15 revisions

Welcome to the pdf_hide wiki! This page will give you a quick start guide.

You should also check the algorithm page if you want to know how things are done in detail.

Quick Start Guide

This short page will guide you through a simple series of steps that will help you get pdf_hide working. If you run into trouble, see the FAQ or try troubleshooting.

Requirements

Supported systems

pdf_hide should work on most Unix-based systems supported by Python 3, including BSD and Solaris distributions.

It will probably not work on Windows — at least not out-of-the-box.

It is currently tested on MacOS and Ubuntu.

Mandatory dependencies

This tool is a Python 3 program: it requires a basic Python 3 installation (any version from 3.1 onward should work). Python 3 is distributed under the Python Software Foundation License. The Python 3 installation must include the hashlib module.

You will need QPDF in order to modify compressed PDF files. QPDF is distributed under the Artistic License 2.0 from the Perl Foundation. Your system must be able to call qpdf through Python's os.system command, e.g. by putting it in your $PATH. The QPDF installation must include support for the QDF format.
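If you want to check these requirements before going further, a small Python script can do it. The sketch below is not part of pdf_hide: the function name missing_requirements and its parameters are made up for illustration, and it only checks that the tools can be found in your $PATH, not that QPDF was built with QDF support.

```python
import importlib.util
import shutil
import sys


def missing_requirements(tools=("qpdf",), which=shutil.which):
    """Return a list of requirements that could not be found.

    `tools` are executable names looked up in $PATH; `which` is
    injectable so the lookup can be replaced in tests.
    """
    problems = []
    # The guide asks for Python 3.1 or later.
    if sys.version_info < (3, 1):
        problems.append("python>=3.1")
    # hashlib must be importable.
    if importlib.util.find_spec("hashlib") is None:
        problems.append("hashlib")
    # External tools must be reachable through $PATH.
    for tool in tools:
        if which(tool) is None:
            problems.append(tool)
    return problems
```

Calling missing_requirements() with no arguments checks the mandatory dependencies only; passing tools=("qpdf", "make", "pdflatex") also covers the tools needed to run the tests.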

Optional dependencies

You will need GNU Make in order to run the tests and, more generally, to take advantage of the Makefiles provided with the source. Make is distributed under GPLv3 from the Free Software Foundation.

Additionally, if you want to run the tests, you will need pdflatex in order to build the samples. pdfTeX, which provides pdflatex, is distributed under the GNU GPL (version 2 or later). In order to run the tests, your system must be able to call pdflatex through Python's os.system command, e.g. by putting it in your $PATH.

Install

You can find the latest version packaged on the releases page. The current version is 0.0 beta, available as tgz or zip.

Alternatively, you can clone the git repository at: github.com/ncanceill/pdf_hide.git

Once your system meets the requirements, you can go on with the install. First though, you can run the tests with make tests, provided you have also satisfied the corresponding optional requirements.

Tests can be run out-of-the-box because Python is interpreted. This also means that you can directly use the pdf_hide script from the top level of the source distribution. If you prefer to have it in your execution path, you can install it on your system.

The install can be performed through:

  • the Makefile: make install
  • the setup script: ./setup.py install

Usually, installing will require root privileges.

First steps

The pdf_hide script supports two commands: embed and extract.

Do not hesitate to get help using the --help option. It can be used without a command to get generic help for the script, or with a command to get specific help about that command.

By default, the script will only print errors. If you want more info, you can use the -v option; you can use it multiple times to increase verbosity, e.g. -vvv. If you want the script to be quiet instead, you can use the -q option.

Embedding

Given a file containing data to hide, and an innocent PDF file containing enough paragraphs of text (preferably justified), you can generate a PDF file with embedded data using:

pdf_hide -o embedded.pdf embed data_file innocent.pdf

You will be prompted for a password that will be used for extracting; alternatively, you can specify the password directly from the command line using the -k option.

Extracting

Given a PDF file containing embedded data, you can extract the hidden data using:

pdf_hide -o extracted_data_file extract embedded.pdf

You will be prompted for the password that was used during embedding; alternatively, you can specify the password directly from the command line using the -k option.
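If you want to drive both commands from your own Python code, you can build the command lines shown above as argument lists. This is only a sketch: embed_command and extract_command are hypothetical helpers, not part of pdf_hide, and placing -k before the command (next to -o) is an assumption, so check pdf_hide --help on your installation.

```python
import subprocess


def embed_command(data_file, innocent_pdf, output_pdf, key=None):
    """Build the argument list for: pdf_hide -o OUT embed DATA PDF."""
    cmd = ["pdf_hide", "-o", output_pdf]
    if key is not None:
        # Assumption: -k is accepted before the command, like -o.
        cmd += ["-k", key]
    return cmd + ["embed", data_file, innocent_pdf]


def extract_command(embedded_pdf, output_file, key=None):
    """Build the argument list for: pdf_hide -o OUT extract PDF."""
    cmd = ["pdf_hide", "-o", output_file]
    if key is not None:
        cmd += ["-k", key]
    return cmd + ["extract", embedded_pdf]


def run(cmd):
    """Run a command built above; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True)
```

For instance, run(embed_command("data_file", "innocent.pdf", "embedded.pdf")) prompts for the password as usual. Keep in mind that a key passed with -k is visible in your shell history and in the process list.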

Advanced features

Verbosity control

You can get info messages using the -v option, and further increase verbosity by repeating the letter v, e.g. -vvv.

Alternatively, you can suppress all output using the -q option.
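If you are curious how repeatable flags such as -v are commonly handled in a Python script, the sketch below uses the count action of the standard argparse module. It is an illustration only: the parser, the program name verbosity_demo and the attribute names are assumptions, and pdf_hide's own option parsing may differ.

```python
import argparse


def build_parser():
    """Build a tiny parser with a repeatable -v and a -q switch."""
    parser = argparse.ArgumentParser(prog="verbosity_demo")
    # Each occurrence of -v adds one to the counter, so -vvv gives 3.
    parser.add_argument("-v", action="count", default=0, dest="verbosity")
    # -q is a plain on/off switch.
    parser.add_argument("-q", action="store_true", dest="quiet")
    return parser


args = build_parser().parse_args(["-vvv"])
print(args.verbosity)  # 3
```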

Embedding from standard input

Instead of specifying the path to a data file when embedding, you can use - as a shortcut for Python's sys.stdin file object, which reads from the standard input. As a result, you can pipe the data into the command line:

echo "data" | pdf_hide -o embedded.pdf embed - innocent.pdf

WARNING: using this feature MAY break the getpass module!

The getpass module will prompt for the key when it is not specified through the -k option. On some systems (for instance, this behavior has been observed on recent versions of Ubuntu), using the - shortcut locks sys.stdin, which breaks the normal getpass behavior, and may even prevent you from entering the key.
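One way to stay clear of this problem is to always pass the key with -k when the data comes from standard input. The sketch below does that from Python. It is not part of pdf_hide: embed_from_bytes and its runner parameter are hypothetical, and placing -k before the command is an assumption.

```python
import subprocess


def embed_from_bytes(data, innocent_pdf, output_pdf, key, runner=subprocess.run):
    """Pipe `data` (bytes) to pdf_hide on stdin, using - as the data file.

    The key is mandatory here, so that pdf_hide never needs to prompt
    for it while stdin is busy carrying the data.
    """
    if not key:
        raise ValueError("pass the key with -k when piping data on stdin")
    cmd = ["pdf_hide", "-o", output_pdf, "-k", key, "embed", "-", innocent_pdf]
    # `input` feeds the bytes to the child's standard input.
    return runner(cmd, input=data, check=True)
```

A call such as embed_from_bytes(b"data", "innocent.pdf", "embedded.pdf", "secret") is then roughly equivalent to the echo pipeline above with -k added.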

Tuning the algorithm

Bit depth

WARNING: a low bit depth will require more space for the same amount of data!

By default, data is embedded as 4-bit integers, using values of the original data in the [1,16] range (or [-1,-16]). This can be tuned using the -n option:

pdf_hide -n 7 -o embedded.pdf embed data_file innocent.pdf

The algorithm supports a bit depth from 1 to 8.
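To see why a low bit depth needs more space, you can count how many small integers a given piece of data turns into. The sketch below simply cuts the bits of the data into groups of the chosen size. It is an illustration under assumptions: split_bits is a made-up helper, and the way pdf_hide actually orders, pads and maps the bits is described on the algorithm page, not here.

```python
def split_bits(data, depth):
    """Cut `data` (bytes) into integers of `depth` bits each."""
    if not 1 <= depth <= 8:
        raise ValueError("bit depth must be between 1 and 8")
    # Write every byte as 8 binary digits, most significant bit first.
    bits = "".join(format(byte, "08b") for byte in data)
    # Pad with zeros so the length is a multiple of the depth.
    bits += "0" * (-len(bits) % depth)
    return [int(bits[i:i + depth], 2) for i in range(0, len(bits), depth)]


print(len(split_bits(b"data", 4)))  # 8 integers of 4 bits
print(len(split_bits(b"data", 1)))  # 32 integers of 1 bit
```

The same four bytes need four times as many integers at depth 1 as at depth 4, and each integer has to be hidden somewhere in the PDF file.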

Redundancy

WARNING: a high redundancy will require more space for the same amount of data!

By default, on average a tenth (0.1) of the original data is not used. This can be tuned using the -r option:

pdf_hide -r 0.5 -o embedded.pdf embed data_file innocent.pdf

The algorithm supports a redundancy strictly between 0 and 1.
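As a rough rule of thumb, you can estimate how many original values a piece of data will need for a given bit depth and redundancy. The sketch below is a back-of-the-envelope calculation, not the tool's own accounting: values_needed is a hypothetical helper, it reads the redundancy as the average fraction of values left unused, and it ignores any extra information the algorithm may embed alongside your data.

```python
import math


def values_needed(n_bytes, depth=4, redundancy=0.1):
    """Estimate how many original values are needed to hide n_bytes."""
    if not 1 <= depth <= 8:
        raise ValueError("bit depth must be between 1 and 8")
    if not 0 < redundancy < 1:
        raise ValueError("redundancy must be strictly between 0 and 1")
    # Number of depth-bit integers required to carry the data.
    chunks = math.ceil(8 * n_bytes / depth)
    # On average only a fraction (1 - redundancy) of the values is used.
    return math.ceil(chunks / (1 - redundancy))


print(values_needed(100))                  # 223
print(values_needed(100, redundancy=0.5))  # 400
print(values_needed(100, depth=1))         # 889
```

With the defaults, 100 bytes need about 223 values; raising the redundancy to 0.5 or lowering the bit depth to 1 makes that number grow quickly, which is what the two warnings above are about.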

The random values

By default, the algorithm will embed random values when data is not used, in order to make the embedded data harder to detect. This behavior can be disabled with the --no-random flag.

Using improvements

Enabling the algorithm improvements with the -i flag makes more space available for embedding, and uses techniques that make the modifications harder to notice.

LaTeX custom range

If you have enabled the improvements, and if your innocent PDF file has been made by pdflatex, you can use the --custom-range flag.

This will restrict the embedding process to the [-250,-450] range (also excluding values -333 and -334), in order to exploit a covert channel known in LaTeX.

Enabling this feature will greatly reduce the space available for embedding, and it does not support a bit depth higher than 6, but it will make the algorithm much more secure.
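To get a feel for how narrow the custom range is, you can count the values it allows. The sketch below assumes that both bounds of the range are included; the names in_custom_range and usable_values are made up for this example and do not come from pdf_hide.

```python
CUSTOM_LOW, CUSTOM_HIGH = -450, -250  # bounds assumed to be inclusive
EXCLUDED = {-333, -334}               # values left out of the range


def in_custom_range(value):
    """Tell whether `value` belongs to the LaTeX custom range."""
    return CUSTOM_LOW <= value <= CUSTOM_HIGH and value not in EXCLUDED


def usable_values():
    """Count the distinct values allowed by the custom range."""
    return sum(1 for v in range(CUSTOM_LOW, CUSTOM_HIGH + 1) if in_custom_range(v))


print(usable_values())  # 199
```

Only 199 distinct values qualify under this reading, so far fewer places in a typical PDF file are eligible for embedding than without the flag.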


Distributed under GPLv3. Copyright © 2013 Nicolas Canceill.
