Tool for parsing and working with documents


The Padhana framework is designed to enable you to work with PDF and other types of documents in a formal way. By combining a simple document format based on a node hierarchy with a set of parsers and document analysis tools, we parse and then structure/annotate document content to enable rich interactions.
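To make the idea of a node hierarchy concrete, here is a minimal sketch of how a document could be represented as a tree of nodes and then annotated. It is an illustration only and is not Padhana's actual API: the `Node` class, its fields, and the `tag_matching` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical document node; not a class from the Padhana package."""
    node_type: str
    content: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all descendants in depth-first order."""
        yield self
        for child in self.children:
            yield from child.walk()


def tag_matching(root: Node, keyword: str, tag: str) -> int:
    """Add `tag` to every node whose content contains `keyword`.

    Returns the number of nodes that were newly tagged.
    """
    count = 0
    for node in root.walk():
        if node.content and keyword in node.content and tag not in node.tags:
            node.tags.append(tag)
            count += 1
    return count


# Build a tiny document: document -> page -> lines.
document = Node("document")
page = document.add_child(Node("page"))
page.add_child(Node("line", content="Invoice Number: 12345"))
page.add_child(Node("line", content="Total due: 99.00"))

# Annotate the lines that mention an invoice.
tagged = tag_matching(document, "Invoice", "invoice_header")
```

In this sketch the parser's job would be to produce the tree, and analysis tools would then walk it to add tags, which is the parse-then-annotate flow described above.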

Documentation & Examples

Documentation can be found here:


Ensure you have Anaconda 3 or greater installed, then run:

conda env create -f conda.yml --force

Activate the padhana Conda environment with the command:

conda activate padhana

Additional Steps

To use the Tesseract Parser, you will also need to install Tesseract.
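After installing Tesseract, you may want to confirm that the executable is visible from the padhana environment. The sketch below uses only the Python standard library and assumes the executable is named `tesseract` and is on your `PATH`; the helper names `find_tesseract` and `tesseract_version` are hypothetical and not part of Padhana.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract(binary_name: str = "tesseract") -> Optional[str]:
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(binary_name)


def tesseract_version(path: str) -> str:
    """Return the first line printed by `<path> --version`, or '' if nothing is printed."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract releases print the version to stderr instead of stdout.
    output = result.stdout or result.stderr
    lines = output.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract was not found on PATH.")
    else:
        print(f"Found Tesseract at {path}: {tesseract_version(path)}")
```

If the check reports that Tesseract was not found, make sure its install directory has been added to `PATH` and that the Conda environment has been reactivated.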

