Scrape structured data from PDFs
Python
examples
.gitignore
README.rst
scrapepdf.py


Tools for scraping data from PDF documents

Run "pdftohtml -xml" on a PDF to get an XML document; scrapepdf.py can then process that XML to recover some of the document's structure.
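
To give a rough idea of what "getting structure out of the XML" can mean, here is a minimal sketch. It is not the interface of scrapepdf.py, which this README does not describe. The function name rows_from_xml and the sample data are hypothetical. The sample assumes the usual layout of pdftohtml's XML output: a pdf2xml root, page elements, and text elements with top and left attributes. The sketch groups text fragments that share the same top coordinate into rows, ordered left to right.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample, shaped like the output of "pdftohtml -xml" (assumed layout).
SAMPLE_XML = """
<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="200" width="30" height="12" font="0">Qty</text>
<text top="100" left="50" width="40" height="12" font="0">Name</text>
<text top="120" left="50" width="45" height="12" font="0"><b>Apples</b></text>
<text top="120" left="200" width="10" height="12" font="0">3</text>
</page>
</pdf2xml>
""".strip()


def rows_from_xml(xml_text):
    """Return a list of (page_number, [cell, ...]) tuples.

    Text fragments on the same page with an identical "top" value are
    treated as one row; cells within a row are ordered by "left".
    """
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        page_number = int(page.get("number"))
        by_top = defaultdict(list)
        for text in page.iter("text"):
            # itertext() also collects text nested in <b>/<i> children.
            content = "".join(text.itertext()).strip()
            if content:
                by_top[int(text.get("top"))].append((int(text.get("left")), content))
        for top in sorted(by_top):
            cells = [content for _, content in sorted(by_top[top])]
            rows.append((page_number, cells))
    return rows


if __name__ == "__main__":
    for page_number, cells in rows_from_xml(SAMPLE_XML):
        print(page_number, cells)
```

Real PDFs rarely align this neatly: fragments on the same visual line often differ by a pixel or two in top, so a practical version would likely need a tolerance when grouping rows.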