This project is not updated or maintained anymore. At the moment there is too much to do in other projects, so I won't have time for this in the near future. Sorry :-/
Nextcloud OCR (optical character recognition) brings OCR processing for images and PDFs to your Nextcloud, using tesseract-ocr and OCRmyPDF. The app relies on a Docker container that runs tesseract-ocr and OCRmyPDF, and communicates with it over Redis to process images (PNG, JPEG, TIFF) and PDFs asynchronously. The output file is saved to the source folder in Nextcloud, which, for example, lets you search its text. (Hint: not all PDF types are currently supported; for more information see here.)
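The asynchronous flow described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the app's actual implementation: the in-memory queue stands in for Redis, and the message fields (`path`, `mime_type`, `source`, `output`), the function names, and the `.ocr.pdf` output suffix are all hypothetical.

```python
import json
from collections import deque

# File types named in this README: PNG, JPEG, TIFF and PDF.
SUPPORTED_TYPES = {"image/png", "image/jpeg", "image/tiff", "application/pdf"}


class InMemoryQueue:
    """Stand-in for a Redis list (assumption): first in, first out."""

    def __init__(self):
        self._items = deque()

    def push(self, message):
        self._items.appendleft(message)

    def pop(self):
        # Returns None when the queue is empty.
        return self._items.pop() if self._items else None


def enqueue_job(jobs, path, mime_type):
    """App side: queue a file for OCR and return immediately."""
    if mime_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported type: {mime_type}")
    jobs.push(json.dumps({"path": path, "mime_type": mime_type}))


def run_worker_once(jobs, results, ocr):
    """Worker side: take one job, run OCR, report where the output went."""
    raw = jobs.pop()
    if raw is None:
        return False  # nothing to do
    job = json.loads(raw)
    output_path = ocr(job["path"])
    results.push(json.dumps({"source": job["path"], "output": output_path}))
    return True


def fake_ocr(path):
    """Placeholder for the real OCR step (tesseract-ocr or OCRmyPDF in the container)."""
    return path + ".ocr.pdf"  # hypothetical naming, next to the source file
```

The point of the design is that the app only queues work and never waits for OCR to finish; the worker in the container picks jobs up on its own schedule and reports back through a second queue.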
Prerequisites, Requirements and Dependencies
The OCR app has some prerequisites:
- Nextcloud 12 or 13. For older Nextcloud versions, use an older major version of this app.
- A Linux server as the environment (tested with Debian 8 and Ubuntu 14.04 (Trusty)). Currently not compatible with ARM processors such as the Raspberry Pi.
- Docker is used for processing files. tesseract-ocr and OCRmyPDF reside in a Docker container.
- php-redis is used for the communication and has to be installed and enabled in your PHP.
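The prerequisites above can be roughly checked on a host with a short script. This is a minimal sketch, not part of the app: the function names are made up for illustration, it assumes the `docker` and `php` executables are on the `PATH`, and it assumes the php-redis extension shows up as `redis` in the output of `php -m`. It does not check the Nextcloud version or whether the container is running.

```python
import platform
import shutil
import subprocess


def is_arm(machine: str) -> bool:
    """Return True if the machine string names an ARM CPU (e.g. a Raspberry Pi)."""
    m = machine.lower()
    return m.startswith("arm") or m.startswith("aarch64")


def php_has_redis(php_modules_output: str) -> bool:
    """Return True if 'redis' is listed as a module in the output of `php -m`."""
    return any(
        line.strip().lower() == "redis"
        for line in php_modules_output.splitlines()
    )


def check_prerequisites() -> dict:
    """Collect a pass/fail result per prerequisite for the current host."""
    results = {
        "linux": platform.system() == "Linux",
        "not_arm": not is_arm(platform.machine()),
        "docker_installed": shutil.which("docker") is not None,
        "php_redis": False,
    }
    php = shutil.which("php")
    if php is not None:
        try:
            out = subprocess.run(
                [php, "-m"], capture_output=True, text=True, timeout=10
            ).stdout
            results["php_redis"] = php_has_redis(out)
        except (OSError, subprocess.SubprocessError):
            pass  # leave php_redis as False if PHP could not be queried
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Note that the command-line PHP may load a different configuration than the PHP used by your web server, so a passing `php_redis` check here is a hint rather than proof.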
Currently the app does not work with any encryption enabled, nor with files shared via external storage or federated sharing. To process such a file, it must first be copied to the local environment.
For further information see the homepage or the appropriate documentation in the wiki.
Install the app from the Nextcloud App Store, or download the release package (NOT the sources) from GitHub and place its contents in nextcloud/apps/ocr/.
Please note: the app will not work unless the Docker container is running (more information in the wiki).
Administration and Usage
Please read the related topics in the wiki.
The software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.