OCRWorker.php fails to automatically ocr newly uploaded pdf file #55

Closed
chris001 opened this issue Feb 6, 2017 · 3 comments

chris001 commented Feb 6, 2017

OCRWorker.php is running as a systemd service, as described in the documentation, with the user and group set to www-data, the same as the web server.
Yet OCRWorker fails to OCR a newly uploaded PDF file.

Bug report / Feature request

Expected Behavior

The documentation states that OCRWorker.php is supposed to automatically OCR a newly added PDF file.

Current Behavior

It doesn't OCR the PDF file, even after waiting about an hour. However, the command in the overlay menu does OCR the file.
I would prefer the automatic OCR to work.

Possible Solution

Where to look to troubleshoot this?
Is there a log?
Has anyone had the same issue with OCRWorker and solved it?

Steps to Reproduce (for bugs)

  1. Install the ocr app and its prerequisites, as described in the documentation.
  2. Install the OCRWorker.php daemon using the systemd option as detailed in the wiki.
  3. Upload a pdf file containing a scanned document, to owncloud/nextcloud.
  4. Watch the process list, OCRWorker.php is doing nothing, even after a long time.
  5. Click on the overlay menu and start the OCR manually. In the process list, the OCRWorker daemon and tesseract run with high load for about 10 seconds, and a new file with the _OCR.pdf suffix is produced next to the original file, correctly containing the OCR'ed data.
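
To check from a script whether OCR has run on a given upload, the naming described in step 5 can be sketched as follows. This is a minimal Python sketch, not the app's own code: `ocr_output_path` and `has_ocr_output` are hypothetical names, and it assumes the output is the original base name with `_OCR.pdf` appended, in the same directory.

```python
from pathlib import Path

# Suffix reported in step 5; the exact naming rule is an assumption.
OCR_SUFFIX = "_OCR.pdf"


def ocr_output_path(original: Path) -> Path:
    """Return the path where the OCR'ed copy is assumed to appear."""
    # Same directory, original base name (without extension) plus the suffix.
    return original.with_name(original.stem + OCR_SUFFIX)


def has_ocr_output(original: Path) -> bool:
    """Report whether the assumed OCR output exists next to the original."""
    return ocr_output_path(original).is_file()
```

Calling `has_ocr_output(Path("/path/to/data/scan.pdf"))` some time after an upload would show whether any OCR output appeared without watching the process list; the path here is only a placeholder.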

Context

Your Environment

  • OCR version used: latest version from here.
  • Browser Name and version: Firefox 52.0b3 latest version.
  • Operating System and version (desktop or mobile): Windows 10 latest updates. Linux Debian 8 server.
  • ownCloud/nextcloud version: (see ownCloud admin page or version.php) latest version nextcloud.
  • PHP version 7.0
  • Database version: MySQL/MariaDB 5.6
  • Are you using encryption: No.

Log File Content (nextcloud/owncloud.log of the "data"-directory)

Owner

janis91 commented Feb 6, 2017

Actually, the steps you describe are exactly what the app is meant to do. It offers the possibility to OCR a file (image or PDF) on request. It does not run automatically right after a new file is added to Nextcloud, because that isn't the behaviour others would expect. One example:
Suppose you upload the photos (JPG, a supported type for OCR) from your last vacation, and there isn't much text in them; they're just sightseeing pictures. If the app triggered an OCR process for every newly added file, all of those photos would be processed too.
So the OCR has to be started manually by the user.

If you need different behaviour, you can, for example, fork the GitHub repository and add a hook for newly uploaded files.
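
If such a hook is added in a fork, the vacation-photo concern above suggests filtering which uploads get queued. The following is a minimal sketch of that decision in Python, only to illustrate the logic; it is not part of the app and not Nextcloud's hook API. The name `should_auto_ocr` and the PDF-only policy are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical policy: only PDFs are OCR'ed automatically. Images such as
# vacation photos are left for the manual overlay-menu action.
AUTO_OCR_EXTENSIONS = {".pdf"}

# Suffix of files produced by the OCR step, as reported in this issue.
OCR_SUFFIX = "_OCR.pdf"


def should_auto_ocr(path: str) -> bool:
    """Decide whether a newly uploaded file should be queued for OCR."""
    p = PurePosixPath(path)
    # Skip files the OCR step produced itself, so the hook does not
    # re-trigger on its own output.
    if p.name.endswith(OCR_SUFFIX):
        return False
    # Compare extensions case-insensitively (.PDF and .pdf both match).
    return p.suffix.lower() in AUTO_OCR_EXTENSIONS
```

A hook handler in the fork would call a check like this with the uploaded file's path and hand the file to the worker only when it returns true. The guard on the `_OCR.pdf` suffix matters because the OCR output is itself a newly created PDF.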

I will close this issue, as the app is supposed to work like this.

janis91 closed this as completed Feb 6, 2017
Author

chris001 commented Feb 7, 2017

When you upload a new PDF file, would it be accurate to say that ownCloud creates the file or opens it for writing, writes it completely, and then closes it? After it is closed for the first time, which hook gets triggered: postCreate or postWrite?

Owner

janis91 commented Feb 7, 2017

Actually I don't know this. Maybe you can ask in the nextcloud/server repo. The guys over there should know this.
