Increase OCR timeout #10
The build script of the Debian package now extracts the OCR config org/apache/tika/parser/ocr/TesseractOCRConfig.properties from the Tika Server JAR, changes the timeout setting, and writes the changed config back into the package's Tika Server JAR, adding or overwriting the entry.
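The build script itself is not shown here. Below is a minimal Python sketch of the same extract/modify/write-back idea. It assumes the timeout key in TesseractOCRConfig.properties is named `timeout` and holds seconds. `patch_jar` and `set_property` are hypothetical helpers, not part of the package. Python's `zipfile` cannot replace an entry in place, so the sketch copies every entry into a new JAR and swaps in the edited config.

```python
import re
import zipfile

# Entry path inside the Tika Server JAR, as named in the comment above.
CONFIG_ENTRY = "org/apache/tika/parser/ocr/TesseractOCRConfig.properties"


def set_property(text, key, value):
    """Set key=value in .properties text: replace an existing line or append a new one."""
    # Assumption: the key appears uncommented as "key=..." or "key: ..." at line start.
    pattern = re.compile(r"^([ \t]*" + re.escape(key) + r"[ \t]*[=:]).*$", re.MULTILINE)
    if pattern.search(text):
        return pattern.sub(lambda m: m.group(1) + str(value), text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + f"{key}={value}\n"


def patch_jar(src_jar, dst_jar, key="timeout", value=200):
    """Copy src_jar to dst_jar, adding or overwriting the OCR config with the new setting."""
    found = False
    with zipfile.ZipFile(src_jar) as src, zipfile.ZipFile(dst_jar, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == CONFIG_ENTRY:
                # Java .properties files loaded from streams are ISO-8859-1 encoded.
                text = data.decode("iso-8859-1")
                data = set_property(text, key, value).encode("iso-8859-1")
                found = True
            # Reusing the original ZipInfo keeps each entry's name and compression type.
            dst.writestr(info, data)
        if not found:
            # Entry missing from the JAR: add it.
            dst.writestr(CONFIG_ENTRY, set_property("", key, value))
```

For example, `patch_jar("tika-server.jar", "tika-server-patched.jar", value=200)` would write a copy with `timeout=200`. The JAR file names here are placeholders.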
If anybody needs to change the timeout via REST, just add the header "X-Tika-OCRTimeout: 200" for a timeout of 200 seconds. Example: curl -T file_to_ocr.jpg localhost:9998/tika --header "X-Tika-OCRTimeout: 200"
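The same request can be sent from Python with only the standard library. This is a minimal sketch that mirrors the curl example: `curl -T` uploads the file with PUT, and the OCR timeout goes in the `X-Tika-OCRTimeout` header. The helper names are hypothetical. The `Accept: text/plain` header and the extra 30 seconds on the client-side socket timeout are assumptions added for illustration.

```python
import urllib.request

# Tika Server endpoint from the curl example above.
TIKA_URL = "http://localhost:9998/tika"


def build_ocr_request(path, timeout_seconds=200, url=TIKA_URL):
    """Build a PUT request (like curl -T) asking Tika Server for a longer OCR timeout."""
    with open(path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "X-Tika-OCRTimeout": str(timeout_seconds),  # OCR timeout in seconds
            "Accept": "text/plain",  # assumption: ask for plain extracted text
        },
    )


def extract_text(path, timeout_seconds=200, url=TIKA_URL):
    """Send the file to a running Tika Server and return the response body as text."""
    req = build_ocr_request(path, timeout_seconds, url)
    # Assumption: keep the client's socket timeout above the OCR timeout,
    # so the client does not give up before Tika finishes OCR.
    with urllib.request.urlopen(req, timeout=timeout_seconds + 30) as resp:
        return resp.read().decode("utf-8")
```

Usage, with a Tika Server running on localhost:9998: `print(extract_text("file_to_ocr.jpg", timeout_seconds=200))`.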
Thanks for the tip. I will add that to the ETL plugin for cases where someone uses Tika on another server or installation that is not our preconfigured Tika deb package.
Open Semantic ETL now sets the timeout for Tika Server using the X-Tika-OCRTimeout header.
I am getting this now; it's for the fake Tika server.
Tika's default OCR timeout of 120 seconds is not enough when multiple documents or images are processed with OCR in parallel. This leads to Tika OCR timeouts and therefore to a Tika exception for the whole document(s).