Docker container to provide Apache Tika RESTful API

Apache Tika Server w/ Tesseract in Docker

Sets up a container based on java:7

Includes

If you prefer the latest stable version of Tika-server (including OCR via Tesseract), you may want to consider logicalspark/docker-tikaserver

Usage

To use the image from the Docker registry, run:

sudo docker run -d -p 9998:9998 mattfullerton/tika-tesseract-docker

N.B.: The automated build of this image currently has a problem that prevents the Tika process from running, and the cause is still under investigation. A manually built alternative is available at mattfullerton/tika-tesseract-docker-no-automation. You can also try building the image yourself (see below), although that is not guaranteed to work either.

To use the manually built image instead:

sudo docker run -d -p 9998:9998 mattfullerton/tika-tesseract-docker-no-automation

To build the image and run the container yourself:

sudo docker build -t tika github.com/mattfullerton/tika
sudo docker run -d -p 9998:9998 tika
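
Because the container starts detached (`-d`), the server may not be accepting requests the moment `docker run` returns. The following is a minimal sketch of a readiness check, not part of this repository: the `wait_for_tika` helper, the `TIKA_URL` variable, and the retry defaults are all hypothetical names chosen for illustration. It assumes that a plain GET on `/tika` answers with a short greeting once the server is listening, which is how Tika Server normally behaves.

```shell
#!/bin/sh
# Assumed base URL; matches the -p 9998:9998 mapping used above.
TIKA_URL="${TIKA_URL:-http://localhost:9998}"

# Hypothetical helper: poll the server until it answers or the retries run out.
#   $1 = maximum number of attempts (default 30)
#   $2 = seconds to sleep between attempts (default 1)
# Returns 0 as soon as a request succeeds, 1 if every attempt fails.
wait_for_tika() {
    retries="${1:-30}"
    delay="${2:-1}"
    i=0
    while [ "$i" -lt "$retries" ]; do
        # -s silences progress output, -f turns HTTP errors into a non-zero exit.
        if curl -sf "$TIKA_URL/tika" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

A typical call would be `wait_for_tika 60 1 || echo "Tika did not start" >&2`, placed between `docker run` and the first request.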

Test with commands like:

curl -T testpdf.pdf http://localhost:9998/tika
curl -T multipage_tiff_example.tif http://localhost:9998/tika

The second command exercises OCR via Tesseract.
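
To keep the extracted text rather than printing it to the terminal, the same PUT request can be wrapped in a small script. This is a sketch under stated assumptions: `out_name`, `extract_text`, `extract_all`, and `TIKA_URL` are hypothetical names, and the `.txt` naming rule is only an illustration. The `Accept: text/plain` header asks Tika Server for plain text output.

```shell
#!/bin/sh
# Assumed base URL; matches the -p 9998:9998 mapping used above.
TIKA_URL="${TIKA_URL:-http://localhost:9998}"

# Hypothetical naming rule: store the text next to the input, with .txt appended.
out_name() {
    printf '%s.txt\n' "$1"
}

# PUT one file to /tika (curl -T uploads with PUT) and save the response.
# Note: the redirect creates the output file even when curl fails,
# so a failed request leaves an empty .txt file behind.
extract_text() {
    curl -sf -T "$1" -H "Accept: text/plain" "$TIKA_URL/tika" > "$(out_name "$1")"
}

# Process every file given as an argument; report failures and keep going.
extract_all() {
    for f in "$@"; do
        extract_text "$f" || echo "extraction failed: $f" >&2
    done
}
```

For example, `extract_all testpdf.pdf multipage_tiff_example.tif` would write `testpdf.pdf.txt` and `multipage_tiff_example.tif.txt` in the current directory.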

Author

Credits