ImagesFromPDF

A python script to pull images from a PDF, and then make backgrounds transparent with ImageMagick. Images are saved as png files.

Requirements

The first requirement is pdfreader, which is a Python package for manipulating PDFs. It can be installed with pip using

$ pip install pdfreader

There are other install options detailed in that repository. Note that you must be using Python 3.X to use pdfreader.

The other requirement is ImageMagick, which is a command-line program for editing images. It is used in this script to remove white and/or black backgrounds from images, but is not required to simply extract images. ImageMagick must be installed following the instructions in their documentation. You can use brew if you have a Mac, but the steps are slightly more complicated for Windows or Unix users.

Usage

The script in thie repository is extract_images_from_pdf.py and it can be ran from the command line with

$ python extract_images_from_pdf.py /path/to/MyFile.pdf

You can run this script with a number of flags that you can set to enable different features. These include:

-o or --output; controls the output directory, by default it will output to <filename>_images/ where the input file is called /path/to/<filename>.pdf.
-v or --verbose; set to True by default, controls the amount of output provided.
-fp or --first_page; default 0, first page to export from.
-lp or --last_page; default 1000, last page to export from.
-mw or --min_width; default 200, minimum pixel width of pictures to export.
-mh or --min_height; default 200, minimum pixel height of pictures to export.
-xw or --max_width; default 1210, maximum pixel width of pictures to export.
-xh or --max_height; default 1517, maximum pixel height of pictures to export.
-mt or --make_transparent; default False, flag to attempt to make backgrounds transparent.
-wt or --white_to_trans; default True, if -mt=True then set this flag to make white pixels transparent
-bt or --black_to_trans; default True, if -mt=True then set this flag to make black pixels transparent.
-wf or --white_fuzz; default 1, if white pixels are made transparent, sets the ImageMagick fuzz percentage (i.e. sets almost white pixels to transparent as well). Can be 0-100.
-bf or --black_fuzz; default 1, if black pixels are made transparent, sets the ImageMagick fuzz percentage (i.e. sets almost black pixels to transparent as well). Can be 0-100.
-ims or --image_string; default is "Im", string that appears in all image names used to indicate which images to pull from the document.

Note that the max pixel height and width values correspond to just under the size of pages in standard Pathfinder bestiaries.

Example usage

I have a PDF of the Pathfinder Bestiary released by Paizo. I was able to use this script to pull out all monster images into png files with transparent backgrounds using:

$ python extract_images_from_pdf.py Bestiary1.pdf -mt=True

These images can then be inserted into virtual tabletop software.

Tests

This sript has been tested on Python 3.7 on a Mac.

Contributing

PRs are welcome. Big upgrades could include giving sensible names to images based on nearby text in the documents, as well as a test suite that I was too lazy to make.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
extract_images_from_pdf.py		extract_images_from_pdf.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ImagesFromPDF

Requirements

Usage

Example usage

Tests

Contributing

About

Releases

Packages

Languages

License

jammcc/ImagesFromPDF

Folders and files

Latest commit

History

Repository files navigation

ImagesFromPDF

Requirements

Usage

Example usage

Tests

Contributing

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages