jammcc/ImagesFromPDF

ImagesFromPDF

A Python script that pulls images from a PDF and can optionally make their backgrounds transparent with ImageMagick. Images are saved as PNG files.

Requirements

The first requirement is pdfreader, which is a Python package for manipulating PDFs. It can be installed with pip using

$ pip install pdfreader

There are other install options detailed in that repository. Note that you must be using Python 3.X to use pdfreader.

The other requirement is ImageMagick, which is a command-line program for editing images. It is used in this script to remove white and/or black backgrounds from images, but is not required to simply extract images. ImageMagick must be installed following the instructions in their documentation. You can use brew if you have a Mac, but the steps are slightly more complicated for Windows or Unix users.
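As a rough illustration of the background-removal step, the sketch below builds and runs an ImageMagick command from Python. It is not taken from the script itself: the function names are hypothetical, and it assumes the ImageMagick 6-style `convert` executable is on your PATH (ImageMagick 7 exposes the same options through `magick`). `-fuzz` and `-transparent` are standard ImageMagick options.

```python
import shutil
import subprocess


def build_transparency_command(src, dst, color="white", fuzz_percent=1):
    """Build an ImageMagick command that makes one colour transparent.

    Assumption: the `convert` executable is used; this README does not show
    the exact command the script runs.
    """
    if not 0 <= fuzz_percent <= 100:
        raise ValueError("fuzz_percent must be between 0 and 100")
    return [
        "convert", str(src),
        "-fuzz", f"{fuzz_percent}%",   # also match colours close to `color`
        "-transparent", color,          # turn matching pixels transparent
        str(dst),
    ]


def make_transparent(src, dst, color="white", fuzz_percent=1):
    """Run the command; return False if ImageMagick is not found on PATH."""
    command = build_transparency_command(src, dst, color, fuzz_percent)
    if shutil.which(command[0]) is None:
        return False  # ImageMagick missing: leave the extracted image as-is
    subprocess.run(command, check=True)
    return True


print(build_transparency_command("monster.png", "monster_clear.png"))
```

Keeping the command construction separate from the call to `subprocess.run` makes it easy to inspect what would be executed before touching any files.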

Usage

The script in this repository is extract_images_from_pdf.py, and it can be run from the command line with

$ python extract_images_from_pdf.py /path/to/MyFile.pdf

The script accepts a number of optional flags that enable or tune different features:

  • -o or --output; controls the output directory, by default it will output to <filename>_images/ where the input file is called /path/to/<filename>.pdf.
  • -v or --verbose; set to True by default, controls the amount of output provided.
  • -fp or --first_page; default 0, first page to export from.
  • -lp or --last_page; default 1000, last page to export from.
  • -mw or --min_width; default 200, minimum pixel width of pictures to export.
  • -mh or --min_height; default 200, minimum pixel height of pictures to export.
  • -xw or --max_width; default 1210, maximum pixel width of pictures to export.
  • -xh or --max_height; default 1517, maximum pixel height of pictures to export.
  • -mt or --make_transparent; default False, flag to attempt to make backgrounds transparent.
  • -wt or --white_to_trans; default True, if -mt=True then set this flag to make white pixels transparent.
  • -bt or --black_to_trans; default True, if -mt=True then set this flag to make black pixels transparent.
  • -wf or --white_fuzz; default 1, if white pixels are made transparent, sets the ImageMagick fuzz percentage (i.e. sets almost white pixels to transparent as well). Can be 0-100.
  • -bf or --black_fuzz; default 1, if black pixels are made transparent, sets the ImageMagick fuzz percentage (i.e. sets almost black pixels to transparent as well). Can be 0-100.
  • -ims or --image_string; default is "Im", string that appears in all image names used to indicate which images to pull from the document.
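For readers curious how flags like these might be wired up, here is a minimal sketch using `argparse`. It covers only a subset of the options listed above and is not the script's actual parser: `str_to_bool`, `build_parser`, and `default_output_dir` are hypothetical names, and placing the default output directory relative to the current working directory is an assumption.

```python
import argparse
from pathlib import Path


def str_to_bool(value):
    """Parse 'True'/'False' style strings, as in `-mt=True`."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring some of the documented flags."""
    parser = argparse.ArgumentParser(description="Extract images from a PDF.")
    parser.add_argument("pdf_path", help="path to the input PDF")
    parser.add_argument("-o", "--output", default=None)
    parser.add_argument("-fp", "--first_page", type=int, default=0)
    parser.add_argument("-lp", "--last_page", type=int, default=1000)
    parser.add_argument("-mw", "--min_width", type=int, default=200)
    parser.add_argument("-mt", "--make_transparent", type=str_to_bool,
                        default=False)
    parser.add_argument("-wf", "--white_fuzz", type=float, default=1)
    return parser


def default_output_dir(pdf_path):
    """Return <filename>_images for an input of /path/to/<filename>.pdf."""
    return Path(f"{Path(pdf_path).stem}_images")


args = build_parser().parse_args(["/path/to/MyFile.pdf", "-mt=True", "-mw", "300"])
output_dir = Path(args.output) if args.output else default_output_dir(args.pdf_path)
print(output_dir, args.make_transparent, args.min_width)
```

A custom converter is used for the boolean flag because `type=bool` would treat any non-empty string, including "False", as true.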

Note that the default maximum pixel height and width are just under the page dimensions of the standard Pathfinder bestiaries.
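The name and size filters described above can be pictured with the sketch below. It is an illustration only: `should_export` is a hypothetical helper, the image names and dimensions are made up, and treating the limits as inclusive is an assumption, since the README does not say how boundary values are handled.

```python
def should_export(name, width, height, *, image_string="Im",
                  min_width=200, min_height=200,
                  max_width=1210, max_height=1517):
    """Decide whether an image passes the name and size filters.

    Assumption: the size limits are inclusive on both ends.
    """
    if image_string not in name:
        return False  # name lacks the marker string, so skip it
    return (min_width <= width <= max_width
            and min_height <= height <= max_height)


# Made-up examples: a normal illustration, a small icon, a full-page scan,
# and an object whose name does not contain "Im".
candidates = {
    "Im0": (640, 480),
    "Im1": (64, 64),
    "Im2": (1275, 1650),
    "Fm3": (640, 480),
}
kept = [name for name, (w, h) in candidates.items() if should_export(name, w, h)]
print(kept)
```

With the default limits, only the mid-sized image whose name contains "Im" is kept; the icon is too small and the full-page image exceeds the maximum dimensions.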

Example usage

I have a PDF of the Pathfinder Bestiary released by Paizo. I was able to use this script to pull all of the monster images out into PNG files with transparent backgrounds using:

$ python extract_images_from_pdf.py Bestiary1.pdf -mt=True

These images can then be inserted into virtual tabletop software.

Tests

This script has been tested on Python 3.7 on a Mac.

Contributing

PRs are welcome. Big upgrades could include giving sensible names to images based on nearby text in the documents, as well as a test suite that I was too lazy to make.
