Apache Tika OCR Demo Java Project
Advantages:
- Open source.
- Extracts text from plain text, PDF, JPEG/JPG, HTML, XML, and Excel documents.
- Shows file metadata for plain text, PDF, JPEG/JPG, HTML, XML, Excel, MP3, ODP, MP4, JAR, and other file types.
Disadvantages:
- Has problems with Turkish characters.
- Has problems with Russian (Cyrillic alphabet) characters.
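
This project does not state what causes the character problems. One common source of garbled Turkish and Cyrillic text in Java programs is a mismatch between the charset a text was written in and the charset used to read it. The sketch below uses only the JDK and illustrates that general effect; it is an assumption, not a confirmed diagnosis of this demo. The class name CharsetDemo and the method decodeAs are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Encode text with one charset and decode the bytes with another,
    // simulating a reader that assumes the wrong encoding.
    static String decodeAs(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        // Turkish sample: dotless i, g with breve, s with cedilla.
        String turkish = "\u0131\u011F\u015F";
        // Russian sample: the word "Privet" in Cyrillic letters.
        String russian = "\u041F\u0440\u0438\u0432\u0435\u0442";

        Charset utf8 = StandardCharsets.UTF_8;
        Charset latin1 = StandardCharsets.ISO_8859_1;

        // Matching charsets: the text survives the round trip.
        System.out.println(decodeAs(turkish, utf8, utf8).equals(turkish));
        System.out.println(decodeAs(russian, utf8, utf8).equals(russian));

        // Mismatched charsets: every 2-byte UTF-8 character turns into
        // two unrelated Latin-1 characters.
        System.out.println(decodeAs(turkish, utf8, latin1).equals(turkish));
        System.out.println(decodeAs(russian, utf8, latin1).length());

        // Writing with a charset that lacks these letters replaces each
        // of them with '?', so the original text cannot be recovered.
        System.out.println(decodeAs(turkish, latin1, latin1));
    }
}
```

If the affected text comes from OCR of images rather than from text-based files, the cause may lie elsewhere, for example in the language data available to the OCR engine, which this sketch does not cover.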