Python script for performing various operations on ALTO XML files

mjdhasan/alto-tools

 
 

alto-tools

Warning: not fully implemented; this is a work in progress.

A Python 3 script for performing various operations on ALTO XML files.

Planned features:

  • extract OCR confidence of the ALTO document(s)
  • extract text content of the ALTO document(s)
  • extract graphical elements of the ALTO document(s)
  • extract metadata of the ALTO document(s)
  • apply XSL transformations to convert the ALTO document(s) to target format(s)
  • run XPath queries against the content of the ALTO document(s)
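
Since these features are only planned, the following is a minimal sketch of what text and OCR-confidence extraction could look like, not the script's actual implementation. It assumes the usual ALTO layout, in which each TextLine holds String elements carrying the word in a CONTENT attribute and the word confidence in a WC attribute. The function names (local_name, extract_text, mean_confidence) and the sample document are hypothetical, and the sketch uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-line ALTO v3 document, used only for illustration.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="p1">
      <PrintSpace>
        <TextBlock ID="b1">
          <TextLine ID="l1">
            <String CONTENT="Hello" WC="0.75"/>
            <SP/>
            <String CONTENT="ALTO" WC="0.25"/>
          </TextLine>
          <TextLine ID="l2">
            <String CONTENT="world" WC="0.5"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix so any ALTO schema version matches."""
    return tag.rsplit("}", 1)[-1]


def extract_text(root):
    """Return one output line per TextLine, joining String/@CONTENT with spaces."""
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return "\n".join(lines)


def mean_confidence(root):
    """Average the WC (word confidence) values; None if no String has one."""
    scores = [
        float(elem.get("WC"))
        for elem in root.iter()
        if local_name(elem.tag) == "String" and elem.get("WC") is not None
    ]
    return sum(scores) / len(scores) if scores else None


root = ET.fromstring(SAMPLE_ALTO)
print(extract_text(root))
print(mean_confidence(root))
```

Matching on the local tag name instead of a fixed namespace is a design choice of this sketch: ALTO files in the wild use different schema versions, each with its own namespace URI.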

Requirements:

  • lxml for XPath and XSLT support
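
As a rough illustration of why lxml is listed, the sketch below runs a namespaced XPath query and a small XSLT stylesheet over an ALTO document. It is an assumption about how the planned features might use lxml, not code from this repository: the sample document, the stylesheet, the "alto" prefix and the helper names (query_words, to_text) are all hypothetical. It relies only on etree.fromstring, the xpath() method with a namespaces mapping, and etree.XSLT.

```python
from lxml import etree

# Prefix-to-URI mapping for XPath; the "alto" prefix is an arbitrary choice.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}

# Hypothetical ALTO v3 fragment, used only for illustration.
SAMPLE_ALTO = b"""
<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Hello" WC="0.75"/>
      <String CONTENT="ALTO" WC="0.25"/>
    </TextLine>
    <TextLine>
      <String CONTENT="world" WC="0.5"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>
"""

# Hypothetical stylesheet: one line of plain text per TextLine.
ALTO_TO_TEXT = b"""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:alto="http://www.loc.gov/standards/alto/ns-v3#">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//alto:TextLine">
      <xsl:for-each select="alto:String">
        <xsl:if test="position() &gt; 1"><xsl:text> </xsl:text></xsl:if>
        <xsl:value-of select="@CONTENT"/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
"""


def query_words(root):
    """XPath query: every String/@CONTENT value, in document order."""
    return [str(w) for w in root.xpath("//alto:String/@CONTENT", namespaces=ALTO_NS)]


def to_text(root):
    """XSLT transform: apply the stylesheet and return the text output."""
    transform = etree.XSLT(etree.fromstring(ALTO_TO_TEXT))
    return str(transform(root.getroottree()))


root = etree.fromstring(SAMPLE_ALTO)
print(query_words(root))
print(to_text(root))
```

Note that the namespace URI is fixed to ALTO v3 here; a document using another schema version would need a different URI in both the mapping and the stylesheet.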
