sbhar/PDF2HTMLjava

Pdf2Dom

Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree may then be serialized to an HTML file or processed further. Inline CSS definitions in the resulting document make the HTML page resemble the PDF input as closely as possible. The distribution package includes a command-line utility for converting PDF documents to HTML. Pdf2Dom may also be used as an independent Java library with a standard DOM interface for your DOM-based applications.
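
As a rough illustration of the "serialize the DOM tree to an HTML file" step, the sketch below uses only the JDK's standard DOM and transformer APIs. It does not call Pdf2Dom itself: the class name DomToHtmlSketch and the helper buildSampleDom are hypothetical, and the hand-built document (one absolutely positioned div) merely stands in for whatever tree the parser would return.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToHtmlSketch {

    // Hypothetical stand-in for a parser-produced DOM tree:
    // a single page body holding one absolutely positioned text box.
    public static Document buildSampleDom() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element box = doc.createElement("div");
        box.setAttribute("style", "position:absolute;top:72pt;left:90pt");
        box.setTextContent("Hello from page 1");
        body.appendChild(box);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    // Serializes any org.w3c.dom.Document to an HTML string
    // using the JDK's identity transformer with the "html" output method.
    public static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Print the serialized markup; redirect to a file to obtain an .html page.
        System.out.println(serialize(buildSampleDom()));
    }
}
```

Because serialize accepts a plain org.w3c.dom.Document, the same method should work on any tree exposed through the standard DOM interface, which is the main reason a DOM-based design is convenient for further processing.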

Pdf2Dom is based on the Apache PDFBox™ library.

See the project page for more information and downloads: http://cssbox.sourceforge.net/pdf2dom

See also the Pdf2Dom-lite fork that provides a lightweight version of Pdf2Dom with no font decoding support but significantly reduced dependencies.


Notes from Sugata Bhar (tested with only one file, test.pdf)

This is a Maven-based project.

The output folder holds the responsive implementation.

PDFToHTML.jar converts a PDF file to HTML with absolutely positioned elements. Example command:

java -jar PDFToHTML.jar test.pdf test.pdf.html

License

LGPL-3.0 (see LICENSE). The repository also contains license.pdf, whose license type is not identified.