PDF Checker

PDF Checker is a simple framework for creating tests/checks of PDF documents for certain text patterns, text organization, and so on. For instance, it can check for "forbidden" words or verify that all numbered headings are in order, among many other checks.
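To make the "numbered headings are in order" idea concrete, here is a minimal sketch in plain Python. It is not taken from the project: the function name `out_of_order_headings` and the heading pattern are assumptions made for illustration, and the real check may use different rules.

```python
import re

# Assumed heading shape: "1 Introduction", "2.3 Results", ...
# This pattern is an illustration, not the project's own rule.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+\S")


def out_of_order_headings(lines):
    """Return (line_index, line) pairs for headings whose number does not
    increase compared with the previous heading."""
    problems = []
    previous = None
    for index, line in enumerate(lines):
        match = HEADING_RE.match(line.strip())
        if not match:
            continue
        # "2.3" becomes (2, 3); tuples compare the way section numbers sort.
        number = tuple(int(part) for part in match.group(1).split("."))
        if previous is not None and number <= previous:
            problems.append((index, line))
        previous = number
    return problems
```

Note that this sketch only detects headings that go backwards or repeat; it does not flag skipped numbers, and an ordinary line that happens to start with a number would be treated as a heading.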

Check project page for details.

How it works

The PDF document is decomposed into individual text lines, which are supplied to a checker. A checker can use text tools such as regular expressions to decide whether the document is fine. Checkers are plugins, so additional checkers are easy to provide: each checker is a subclass of the CheckStrategy class (see the plugins directory).
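The plugin idea can be sketched as follows. This is only an illustration: the `CheckStrategy` class defined here is a stand-in with an assumed `check(lines)` method, not the project's real base class, and `ForbiddenWordsCheck` is a hypothetical plugin name. Consult the plugins directory for the actual interface.

```python
import re


class CheckStrategy(object):
    """Stand-in base class; the project's real interface may differ."""

    def check(self, lines):
        raise NotImplementedError


class ForbiddenWordsCheck(CheckStrategy):
    """Hypothetical plugin that flags forbidden words in the text lines."""

    def __init__(self, words):
        # One case-insensitive pattern with word boundaries for all words.
        alternatives = "|".join(re.escape(word) for word in words)
        self.pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    def check(self, lines):
        """Yield (line_number, matched_word) for every occurrence."""
        for number, line in enumerate(lines, 1):
            for match in self.pattern.finditer(line):
                yield number, match.group(1)
```

A possible usage, again under the assumed interface: `list(ForbiddenWordsCheck(["todo"]).check(text_lines))` collects every hit with its 1-based line number.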

A checker reports errors together with their position in the document: page, percentage of page height, and text bounding box coordinates.
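One way to picture such an error report is a small record holding those three pieces of position information. The sketch below is an assumption, not the project's data model: the names `CheckError` and `make_error` are hypothetical, the bounding box is assumed to be `(x0, y0, x1, y1)` in PDF points with the origin at the bottom-left of the page, and the percentage is assumed to be measured from the top of the page.

```python
from collections import namedtuple

# Hypothetical error record; field names are illustrative only.
CheckError = namedtuple("CheckError", "message page percent_of_page bbox")


def make_error(message, page, bbox, page_height):
    """Build an error record from a text bounding box.

    bbox is (x0, y0, x1, y1) with the origin at the bottom-left (assumed).
    """
    x0, y0, x1, y1 = bbox
    # Distance of the box's top edge from the top of the page, in percent.
    percent = round(100.0 * (page_height - y1) / page_height, 1)
    return CheckError(message, page, percent, bbox)
```

Storing the percentage alongside the raw box lets a user interface scroll to roughly the right place on the page before drawing the highlight itself.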

PDF Checker has a command-line interface and a web-based user interface; the web interface also displays the PDF document and highlights the errors in it.

Install (on Debian Wheezy)

apt-get install build-essential python python-dev git python-pip uwsgi uwsgi-plugin-python nginx-full 
# Below are dependencies for python-saml (SSO integration) - optional if you are not using SSO, but must modify
apt-get install libxml2-dev libxslt1-dev libxmlsec1-dev swig
# Install python-saml from git to get the latest version - a critical fix for logout landed on Jan 9th 2015

cd /opt
git clone checker
cd checker
pip install -r requirements.pip
cp checker-uwsgi.ini /etc/uwsgi/apps-available/
ln -s /etc/uwsgi/apps-available/checker-uwsgi.ini /etc/uwsgi/apps-enabled
/etc/init.d/uwsgi restart
cp checker-nginx /etc/nginx/sites-available/
ln -s /etc/nginx/sites-available/checker-nginx  /etc/nginx/sites-enabled
/etc/init.d/nginx restart


License

GPL version 3


Requirements

Python 2.7
pdfminer >= 20140328
Flask >= 0.10.1


Version history

v0.1 - initial version, alpha quality

v0.2 - production version

v0.2.1 - small fixes

v0.3 - categories for checks, better UI

v0.3.3 - SSO integration (SAML), results stored in a DB, a couple of core checker changes to enable new custom checks
