Pdf2Dom

Pdf2Dom is a PDF parser that converts PDF documents to an HTML DOM representation. The resulting DOM tree can then be serialized to an HTML file or processed further. Inline CSS definitions in the resulting document make the HTML page look as similar as possible to the PDF input. The distribution package includes a command-line utility for converting PDF documents to HTML. Pdf2Dom can also be used as an independent Java library with a standard DOM interface for your DOM-based applications, or as an alternative parser for the CSSBox rendering engine, adding PDF processing capability to CSSBox.
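
Because the library exposes a standard DOM interface, the serialization step described above can be handled with the XML APIs that ship with the JDK. The following is a minimal sketch, not the project's own code: the class name `DomSerializationSketch` and its methods are hypothetical, and the sample tree is built by hand as a stand-in for a `Document` that Pdf2Dom would produce (the Pdf2Dom call itself is omitted because this README does not document it). The absolutely positioned `div` with inline CSS only imitates the style of output described above.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class: illustrates serializing any org.w3c.dom.Document to HTML.
class DomSerializationSketch {

    // Stand-in for a DOM tree obtained from Pdf2Dom (assumption: hand-built sample).
    public static Document buildSampleDom() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element div = doc.createElement("div");
        // Inline CSS positions the text box, imitating a PDF page layout.
        div.setAttribute("style", "position:absolute;left:72pt;top:90pt;font-size:12pt");
        div.setTextContent("Hello PDF");
        body.appendChild(div);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    // Serializes a DOM tree to an HTML string using the JDK's identity transformer.
    public static String toHtml(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml(buildSampleDom()));
    }
}
```

To write to a file instead of a string, pass a `java.io.File` to `StreamResult`. The same `Document` could instead be traversed with the usual DOM methods for further processing.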

Pdf2Dom is based on the Apache PDFBox™ library.

See the project page for more information and downloads: http://cssbox.sourceforge.net/pdf2dom