Apache Tika OCR Demo Java Project


Advantages:

  • Open source.
  • Extracts text from plain-text, PDF, JPEG/JPG, HTML, XML, and Excel documents.
  • Shows file metadata for text, PDF, JPEG/JPG, HTML, XML, Excel, MP3, ODP, MP4, JAR, and other file types.
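
The two capabilities listed above (text extraction and metadata display) can be sketched with Tika's `Tika` facade class. This is a minimal illustration, not necessarily how this project is implemented: the class name `TikaDemoSketch` and the helper `extractText` are hypothetical, and the sketch assumes the Tika core and parser libraries are on the classpath. For image files, Tika's OCR support relies on an external Tesseract installation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

// Hypothetical demo class; the name is not taken from this project.
public class TikaDemoSketch {

    // The facade auto-detects the file type and picks a matching parser.
    private static final Tika TIKA = new Tika();

    /**
     * Returns the text extracted from the file and fills the given
     * Metadata object as a side effect of parsing.
     */
    static String extractText(Path file, Metadata metadata)
            throws IOException, TikaException {
        try (InputStream in = Files.newInputStream(file)) {
            // By default the facade truncates the result after a fixed
            // number of characters; setMaxStringLength changes that limit.
            return TIKA.parseToString(in, metadata);
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Paths.get(args[0]);
        Metadata metadata = new Metadata();

        // Text content of the document.
        System.out.println(extractText(file, metadata));

        // Metadata collected during the same parse.
        for (String name : metadata.names()) {
            System.out.println(name + " = " + metadata.get(name));
        }
    }
}
```

Run it with the path of a document as the only argument. Which metadata keys are printed depends on the file type and on the parsers available at run time.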

Disadvantages:

  • There are problems with Turkish characters.
  • There are problems with Russian (Cyrillic alphabet) characters.