Skip to content

cnsuhao/pdftohtml

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

70 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This is a modified version of the pdftohtml project. It includes rectangles and paths in the XML output so that we can detect lines. Also information about images in the document. We can split the strings or coalesce them as they are processed.

sample.pdf is generated from mkPDF.R. This illustrates rectangles and lines. Using pdftohtml to convert this to XML gives us these elements.

See examples/

About

copy of pdftohtml code with enhancements

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 94.0%
  • C 2.9%
  • Objective-C 1.8%
  • Makefile 1.1%
  • DIGITAL Command Language 0.2%