Quickstart
Welcome to the pdf_hide wiki! This page will give you a quick start guide.
You should also check the algorithm page if you want to know how things are done in detail.
This short page will guide you through a simple series of steps that will help you get pdf_hide working. If you run into trouble, see the FAQ or try troubleshooting.
pdf_hide should work on most Unix-based systems supported by Python 3, including BSD and Solaris distributions.
It will probably not work on Windows — at least not out-of-the-box.
It is currently tested on MacOS and Ubuntu.
This tool is a Python 3 program: it requires a basic Python 3 installation (any version from 3.1 onward should work). Python 3 is distributed under the Python Software Foundation License. The Python 3 installation must include the hashlib module.
You will need QPDF in order to modify compressed PDF files. QPDF is distributed under the Artistic License 2.0 from the Perl Foundation. Your system must be able to call qpdf through Python's os.system command, e.g. by putting it in your $PATH. The QPDF installation must include support for the QDF format.
You will need GNU Make in order to run the tests and, more generally, to take advantage of the Makefiles provided with the source. GNU Make is distributed under GPLv3 by the Free Software Foundation.
Additionally, if you want to run the tests, you will need pdflatex in order to build the samples. pdfTeX, which provides pdflatex, is distributed under the GNU General Public License. In order to run the tests, your system must be able to call pdflatex through Python's os.system command, e.g. by putting it in your $PATH.
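The requirements above can be checked before going further. The following is a minimal sketch, not part of pdf_hide: the function names `find_missing` and `has_module` are made up for illustration, and the check only looks for the executables on $PATH. It does not verify that QPDF was built with QDF support.

```python
import importlib.util
import shutil

# Tools named in the requirements above; make and pdflatex are only
# needed for running the tests.
REQUIRED_TOOLS = ["qpdf"]
OPTIONAL_TOOLS = ["make", "pdflatex"]


def find_missing(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on $PATH."""
    return [tool for tool in tools if which(tool) is None]


def has_module(name):
    """Return True if the named Python module can be located for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("hashlib available:", has_module("hashlib"))
    print("missing required tools:", find_missing(REQUIRED_TOOLS))
    print("missing optional tools:", find_missing(OPTIONAL_TOOLS))
```

An empty list for the required tools suggests that qpdf can be called from Python in the way described above.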
You can find the latest version packaged on the releases page. The current version is 0.0 beta: tgz — zip.
Alternatively, you can clone the git repository at: github.com/ncanceill/pdf_hide.git
Once your system meets the requirements, you can go on with the install. First though, you can run the tests with make tests — provided you also satisfied the corresponding optional requirements.
Tests can be run out-of-the-box because Python is interpreted. This also means that you can directly use the pdf_hide script from the top level of the source distribution. If you prefer to have it in your execution path, you can install it on your system.
The install can be performed through:
- the Makefile: make install
- the setup script: ./setup.py install
Usually, installing will require root privileges.
The pdf_hide script supports two commands: either embed or extract.
Do not hesitate to get help using the --help option. It can be used without a command to get generic help for the script, or with a command to get specific help about that command.
By default, the script will only print errors. If you want more info, you can use the -v option; you can use it multiple times to increase verbosity — eg -vvv. Otherwise, if you want the script to be quiet, you can use the -q option.
Given a file containing data to hide, and an innocent PDF file containing enough paragraphs of text (preferably justified), you can generate a PDF file with embedded data using:
pdf_hide -o embedded.pdf embed data_file innocent.pdf
You will be prompted for a password that will be used for extracting; alternatively, you can specify the password directly from the command line using the -k option.
Given a PDF file containing embedded data, you can extract the hidden data using:
pdf_hide -o extracted_data_file extract embedded.pdf
You will be prompted for the password that was used during embedding; alternatively, you can specify the password directly from the command line using the -k option.
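If you drive pdf_hide from another Python program, the two commands can be assembled as argument lists. This is a rough sketch using only the options mentioned on this page (-o, -k, -v). The helper names `build_command` and `run` are hypothetical, and placing -k before the command, next to -o, is an assumption; check --help for the exact syntax.

```python
import subprocess


def build_command(action, inputs, output, key=None, verbosity=0):
    """Build a pdf_hide argument list for the 'embed' or 'extract' command."""
    if action not in ("embed", "extract"):
        raise ValueError("action must be 'embed' or 'extract'")
    args = ["pdf_hide"]
    if verbosity > 0:
        # -v can be repeated to increase verbosity, e.g. -vvv
        args.append("-" + "v" * verbosity)
    if key is not None:
        # Assumption: -k is given with the other options, before the command.
        args += ["-k", key]
    args += ["-o", output, action]
    args += list(inputs)
    return args


def run(args):
    """Run the command and raise CalledProcessError if it fails."""
    return subprocess.run(args, check=True)
```

For example, `build_command("embed", ["data_file", "innocent.pdf"], "embedded.pdf")` yields the same arguments as the embed command shown above. When no key is given, pdf_hide prompts for it as usual.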
You can get info messages using the -v option, and further increase verbosity by repeating the v, e.g. -vvv.
Alternatively, you can suppress all output using the -q option.
Instead of specifying the path to a data file when embedding, you can use - as a shortcut for Python's sys.stdin file descriptor, which will read from the standard input. As a result, you can pipe the data into the command line:
echo "data" | pdf_hide -o embedded.pdf embed - innocent.pdf
WARNING: using this feature MAY break the getpass module!
The getpass module will prompt for the key when it is not specified through the -k option. On some systems (for instance, this behavior has been observed on recent versions of Ubuntu), using the - shortcut locks sys.stdin, which breaks the normal getpass behavior, and may even prevent you from entering the key.
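A wrapper script can guard against this situation by checking whether standard input is a terminal before relying on the prompt. The sketch below is an illustration only: `key_must_be_explicit` is a hypothetical helper, not something pdf_hide provides, and it simply encodes the advice to pass -k whenever data is piped in through the - shortcut.

```python
import io
import sys


def key_must_be_explicit(data_path, stdin=sys.stdin):
    """Return True when the key should be passed with -k rather than prompted."""
    # "-" means the data to hide is read from standard input.
    if data_path != "-":
        return False
    # A piped or redirected stdin is not a terminal, so an interactive
    # prompt may not be able to read the key from it.
    return not stdin.isatty()


if __name__ == "__main__":
    # io.StringIO stands in for a piped stdin: it is never a terminal.
    print(key_must_be_explicit("-", io.StringIO("data")))
    print(key_must_be_explicit("data_file", io.StringIO("")))
```

When the helper returns True, the safest choice is to supply the key with the -k option instead of waiting for the prompt.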
WARNING: a low bit depth will require more space for the same amount of data!
By default, data is embedded as 4-bit integers, using original values in the [1,16] range (or the [-1,-16] range for negative values). This can be tuned using the -n option:
pdf_hide -n 7 -o embedded.pdf embed data_file innocent.pdf
The algorithm supports a bit depth from 1 to 8.
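To get a feel for the trade-off, here is a back-of-the-envelope sketch of how many embedded integers a payload needs at a given bit depth. It is an estimate under stated assumptions, not the tool's actual accounting: it ignores redundancy and any overhead the algorithm adds, and the [1, 2^n] range for depths other than 4 is extrapolated from the default described above. Both function names are made up for this example.

```python
def value_range(bit_depth):
    """Positive range holding one embedded integer of `bit_depth` bits."""
    if not 1 <= bit_depth <= 8:
        raise ValueError("bit depth must be between 1 and 8")
    # Assumption: n bits map to [1, 2**n], as [1, 16] does for n = 4.
    return (1, 2 ** bit_depth)


def values_needed(payload_bytes, bit_depth):
    """Number of bit_depth-bit integers needed to carry the payload bits."""
    value_range(bit_depth)  # reuse the range check
    total_bits = payload_bytes * 8
    # Integer ceiling division: a partially filled last value still counts.
    return (total_bits + bit_depth - 1) // bit_depth


if __name__ == "__main__":
    for depth in (1, 4, 7):
        print(depth, value_range(depth), values_needed(100, depth))
```

Under these assumptions, a 100-byte payload needs 800 values at depth 1, 200 at depth 4, and 115 at depth 7, which is why a low bit depth requires more space.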
WARNING: a high redundancy will require more space for the same amount of data!
By default, on average a tenth (0.1) of the original data is not used. This can be tuned using the -r option:
pdf_hide -r 0.5 -o embedded.pdf embed data_file innocent.pdf
The algorithm supports a redundancy strictly between 0 and 1.
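The effect of redundancy on space can be estimated with simple arithmetic. The sketch below assumes that the redundancy r is the average fraction of usable original values that are skipped, as the description above suggests, so carrying k embedded values takes roughly k / (1 - r) usable values. The function name `candidates_needed` is hypothetical and the result is only an average-case estimate.

```python
import math


def candidates_needed(embedded_values, redundancy=0.1):
    """Estimate how many usable original values are needed on average."""
    if not 0 < redundancy < 1:
        raise ValueError("redundancy must be strictly between 0 and 1")
    # On average only a (1 - redundancy) share of the values carries data.
    return math.ceil(embedded_values / (1 - redundancy))


if __name__ == "__main__":
    for r in (0.1, 0.5):
        print(r, candidates_needed(200, r))
```

With 200 values to embed, the default redundancy of 0.1 calls for about 223 usable values, while a redundancy of 0.5 calls for about 400.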
By default, the algorithm will embed random values when data is not used, in order to make the embedded data harder to detect. This can be overruled by the --no-random flag.
Enabling the algorithm improvements with the -i flag allows more space to be used for embedding, and applies techniques that make the modifications harder to notice.
If you have enabled the improvements, and if your innocent PDF file has been made by pdflatex, you can use the --custom-range flag.
This will restrict the embedding process to the [-250,-450] range (also excluding values -333 and -334), in order to exploit a covert channel known in LaTeX.
Enabling this feature will greatly reduce the space available for embedding, and it does not support a bit depth higher than 6, but it will make the algorithm much more secure.
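The restrictions that come with --custom-range can be written down as two small checks. This is only an illustration of the rules stated above, not the tool's code: the names are invented for the example, and treating both ends of the range as inclusive is an assumption.

```python
# Range used with --custom-range, as described above.
CUSTOM_LOW, CUSTOM_HIGH = -450, -250
EXCLUDED_VALUES = {-333, -334}
MAX_CUSTOM_BIT_DEPTH = 6


def in_custom_range(value):
    """Return True if `value` may be used for embedding with --custom-range."""
    # Assumption: both bounds are inclusive.
    if not CUSTOM_LOW <= value <= CUSTOM_HIGH:
        return False
    return value not in EXCLUDED_VALUES


def bit_depth_allowed(bit_depth, custom_range=False):
    """Return True if the bit depth is supported in the chosen mode."""
    upper = MAX_CUSTOM_BIT_DEPTH if custom_range else 8
    return 1 <= bit_depth <= upper


if __name__ == "__main__":
    sample = [-200, -250, -300, -333, -334, -450, -451]
    print([v for v in sample if in_custom_range(v)])
    print(bit_depth_allowed(7), bit_depth_allowed(7, custom_range=True))
```

Filtering a list of original values this way shows how much the usable space shrinks when the feature is enabled.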
Distributed under GPLv3. Copyright © 2013 Nicolas Canceill.