A parser for the hOCR format

styrowolf/hocr-parser

hocr-parser

A parser for the hOCR format, "an open standard for representing document layout analysis and OCR results as a subset of HTML."

Design

This parser uses roxmltree to parse the XHTML. It provides easy access to the embedded hOCR data through the HOCR and Element structs, as well as through their "borrowed" counterparts, which avoid allocating for property names.
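To illustrate what "borrowed" means here, the following is a minimal sketch, not the crate's actual API. It assumes the usual hOCR convention that properties live in an element's `title` attribute, separated by semicolons, with the property name followed by space-separated values. The type `BorrowedProperty` and the function `parse_title` are hypothetical names introduced for this example, and quoted values containing spaces or semicolons are deliberately not handled.

```rust
/// A single hOCR property whose name and values are slices of the input text.
/// Hypothetical type for illustration; not the crate's `Element` API.
#[derive(Debug, PartialEq)]
pub struct BorrowedProperty<'a> {
    pub name: &'a str,
    pub values: Vec<&'a str>,
}

/// Splits a `title` attribute such as `bbox 0 0 100 50; x_wconf 93`
/// into properties. No string data is copied: every name and value
/// borrows from `title`. Only the vectors holding the slices allocate.
/// Assumption: values are not quoted (no embedded spaces or semicolons).
pub fn parse_title(title: &str) -> Vec<BorrowedProperty<'_>> {
    title
        .split(';')
        .filter_map(|chunk| {
            let mut tokens = chunk.split_whitespace();
            // An empty chunk (for example after a trailing ';') has no name.
            let name = tokens.next()?;
            Some(BorrowedProperty {
                name,
                values: tokens.collect(),
            })
        })
        .collect()
}
```

Because the returned slices carry the lifetime of the input, the compiler ensures they cannot outlive the document text they point into. An owned variant would store `String` values instead and could be kept after the source text is dropped.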

The parser does not fully validate whether a file adheres to the hOCR specification. It checks that the required metadata is present and that hOCR element and property names are valid, but it does not check property values.
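The distinction between checking names and checking values can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: `check_names`, `NameError`, and the two constant tables are hypothetical, and the tables list only a handful of names from the hOCR specification rather than the full set.

```rust
/// A few element class names from the hOCR specification (not exhaustive).
const KNOWN_ELEMENTS: &[&str] = &["ocr_page", "ocr_carea", "ocr_par", "ocr_line", "ocrx_word"];

/// A few property names from the hOCR specification (not exhaustive).
const KNOWN_PROPERTIES: &[&str] = &["bbox", "baseline", "image", "ppageno", "x_wconf"];

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum NameError {
    UnknownElement(String),
    UnknownProperty(String),
}

/// Checks that the element class and every property name are known.
/// Property values are intentionally not inspected.
pub fn check_names(class: &str, title: &str) -> Result<(), NameError> {
    if !KNOWN_ELEMENTS.contains(&class) {
        return Err(NameError::UnknownElement(class.to_string()));
    }
    for chunk in title.split(';') {
        // The first whitespace-separated token of each chunk is the name.
        if let Some(name) = chunk.split_whitespace().next() {
            if !KNOWN_PROPERTIES.contains(&name) {
                return Err(NameError::UnknownProperty(name.to_string()));
            }
        }
    }
    Ok(())
}
```

With this approach, a misspelled property name is rejected, while a `bbox` whose values are not four integers is accepted, so callers that rely on values being well-formed need to check them separately.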

License

Licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE)
MIT license (LICENSE-MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
