Tool for parsing and working with documents

Padhana

The Padhana framework is designed to let you work with PDF and other types of documents in a formal, structured way. It combines a simple document format based on a node hierarchy with a set of parsers and document analysis tools, so that document content can be parsed and then structured and annotated to enable rich interactions.
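
To make the idea of a node hierarchy concrete, here is a minimal sketch of what such a document model could look like. It is not Padhana's actual API: the `Node` class, its fields (`node_type`, `content`, `children`, `tags`), and the `add_child` and `walk` methods are all hypothetical names chosen for illustration. See the documentation linked below for the real classes.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical document node (illustration only, not Padhana's class)."""
    node_type: str
    content: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        # Attach a child node and return it so calls can be chained.
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Node"]:
        # Depth-first traversal: this node first, then each subtree in order.
        yield self
        for child in self.children:
            yield from child.walk()


# Build a tiny hierarchy: document -> page -> lines.
document = Node("document")
page = document.add_child(Node("page"))
page.add_child(Node("line", content="Invoice 2041"))
page.add_child(Node("line", content="Total due: 120.00"))

# "Annotate" the structure by tagging nodes that match a simple rule.
for node in document.walk():
    if node.content and node.content.startswith("Total"):
        node.tags.append("total")

# Collect the content of every node carrying the "total" tag.
tagged = [n.content for n in document.walk() if "total" in n.tags]
print(tagged)
```

In this sketch a parser would be responsible for producing the tree, and analysis tools would add tags to it afterwards, which keeps parsing and annotation as separate steps.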

Documentation & Examples

Documentation can be found here: https://hohonu.github.io/padhana-docs/

Set-up

Ensure you have Anaconda 3 or greater installed, then run:

conda env create -f conda.yml --force

Activate the padhana Conda environment with the command:

conda activate padhana

Additional Steps

If you want to use the Tesseract Parser, you will need to install Tesseract separately.

See https://github.com/tesseract-ocr/tesseract/wiki for installation instructions.
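
Before using the Tesseract Parser, it can help to confirm that the Tesseract executable is reachable from your environment. The following is a small sketch using only the Python standard library. The helper name `tesseract_available` is hypothetical, and it assumes the executable is called `tesseract` and is found via `PATH`; it does not call any Padhana API.

```python
import shutil


def tesseract_available(executable: str = "tesseract") -> bool:
    """Return True if the named executable can be found on PATH.

    Hypothetical helper for illustration; assumes the binary is named
    "tesseract" unless another name is passed in.
    """
    return shutil.which(executable) is not None


if tesseract_available():
    print("Tesseract found at", shutil.which("tesseract"))
else:
    print("Tesseract not found on PATH; install it before using the Tesseract Parser.")
```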
