Tool for parsing and working with documents

Padhana

The Padhana framework is designed to let you work with PDFs and other types of documents in a structured way. It combines a simple document format based on a node hierarchy with a set of parsers and document analysis tools, so that document content can be parsed and then structured and annotated to enable rich interactions.
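
To make the idea of a node hierarchy concrete, here is a minimal sketch in plain Python. It does not use Padhana's API: the `Node` class, its fields (`node_type`, `content`, `tags`, `children`), and the `add_child` and `walk` methods are all hypothetical names chosen for illustration. The sketch only shows the general parse-then-annotate pattern described above; see the documentation site for the real classes.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical document node; NOT Padhana's actual class."""
    node_type: str
    content: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


# "Parse" step: build a tiny hierarchy by hand in place of a real parser.
document = Node("document")
page = document.add_child(Node("page"))
page.add_child(Node("line", content="Invoice 2019-11"))
page.add_child(Node("line", content="Total: 42.00"))

# "Annotate" step: tag every line whose text mentions a total.
for node in document.walk():
    if node.node_type == "line" and node.content and "Total" in node.content:
        node.tags.append("total")

# Collect the text of all tagged nodes.
tagged = [n.content for n in document.walk() if "total" in n.tags]
print(tagged)
```

Keeping annotations as tags on the nodes, rather than in a separate structure, means later analysis steps can query the same tree the parser produced.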

Documentation & Examples

Documentation can be found here: https://hohonu.github.io/padhana-docs/

Set-up

Ensure you have Anaconda 3 or greater installed, then run:

conda env create -f conda.yml --force

Activate the padhana Conda environment with the command:

conda activate padhana
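
If you want to confirm from inside Python that the environment is active, a small check along these lines may help. It relies on the `CONDA_DEFAULT_ENV` environment variable, which Conda sets on activation; the helper name `active_conda_env` is made up for this example and is not part of Padhana.

```python
import os


def active_conda_env() -> str:
    """Return the name of the active Conda environment, or '' if none is set."""
    # Conda exports CONDA_DEFAULT_ENV when an environment is activated.
    return os.environ.get("CONDA_DEFAULT_ENV", "")


env_name = active_conda_env()
if env_name == "padhana":
    print("The padhana environment is active.")
else:
    print(f"Active environment: {env_name or 'none'}; run 'conda activate padhana'.")
```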

Additional Steps

If you want to use the Tesseract Parser, you will also need to install Tesseract.

See https://github.com/tesseract-ocr/tesseract/wiki
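
After installing Tesseract, you may want to check that its executable is reachable before trying the Tesseract Parser. The sketch below uses only the Python standard library and assumes the executable is named `tesseract` and is expected on your `PATH`; the helper `find_tesseract` is a hypothetical name, and how Padhana itself locates Tesseract is not covered here.

```python
import shutil
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable, or None if not on PATH."""
    # Assumption: the binary is called "tesseract", as in standard installs.
    return shutil.which("tesseract")


tesseract_path = find_tesseract()
if tesseract_path is None:
    print("Tesseract was not found on PATH; install it before using the Tesseract Parser.")
else:
    print(f"Found Tesseract at {tesseract_path}")
```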
