This branch is 13 commits ahead of ministryofjustice:master.


Apache Tika Server w/ Tesseract in Docker

Sets up a container based on java:7

Includes

If you prefer the latest stable version of Tika-server (including OCR via Tesseract), you may want to consider logicalspark/docker-tikaserver

Usage

To use the image from the Docker registry, run:

sudo docker run -d -p 9998:9998 mattfullerton/tika-tesseract-docker

N.B.: The automated build of this image currently has a problem that prevents the process from running, and I am still trying to understand the cause. A manually built alternative is available at mattfullerton/tika-tesseract-docker-no-automation. Building the image yourself (below) may also work, though that is not guaranteed.

Alternatively, try:

sudo docker run -d -p 9998:9998 mattfullerton/tika-tesseract-docker-no-automation

To build and run the container, do the following:

sudo docker build -t tika github.com/mattfullerton/tika
sudo docker run -d -p 9998:9998 tika
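
Because `docker run -d` returns as soon as the container is started, the server may not be accepting requests yet. The following is a minimal sketch of a polling helper, not part of this repository: the function name `wait_for_tika`, the default of 30 attempts, and the one-second delay are illustrative assumptions, and it assumes that a plain GET on `/tika` returns a successful response once the server is up.

```shell
# Hypothetical helper: poll the Tika endpoint until it answers
# or the attempts run out. Returns 0 on success, 1 on timeout.
wait_for_tika() {
    url="${1:-http://localhost:9998/tika}"   # endpoint to poll (assumed default)
    attempts="${2:-30}"                      # number of tries (assumed default)
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: treat HTTP errors as failure, -o /dev/null: discard body
        if curl -sf -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

It could be used as `wait_for_tika && curl -T testpdf.pdf http://localhost:9998/tika`, so the upload only runs once the server responds.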

Test with commands like:

curl -T testpdf.pdf http://localhost:9998/tika
curl -T multipage_tiff_example.tif http://localhost:9998/tika

The second command uses OCR.
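
When only the extracted text is of interest, Tika server lets the client request it explicitly through the `Accept` header. The wrapper below is a sketch under assumptions: the function name `tika_text` and its argument order are made up for illustration, and the Tika version bundled in this image has not been checked against it.

```shell
# Hypothetical wrapper: upload a file to Tika and ask for plain text.
# Usage: tika_text FILE [BASE_URL]
tika_text() {
    file="$1"
    base="${2:-http://localhost:9998}"   # assumed default, matching the port mapping above
    if [ ! -f "$file" ]; then
        echo "tika_text: no such file: $file" >&2
        return 2
    fi
    # -T uploads the file with PUT; the Accept header requests text/plain output
    curl -s -T "$file" -H "Accept: text/plain" "$base/tika"
}
```

For example, `tika_text multipage_tiff_example.tif > ocr_output.txt` would store the OCR result in a file, where `ocr_output.txt` is an arbitrary name.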

Author

Credits

About

Docker container to provide Apache Tika RESTful API
