This extension is currently under development.
$ git clone https://github.com/insecia/php-tesseract.git
$ cd php-tesseract
$ docker-compose build extension-builder
$ docker-compose run --rm extension-builder
$ docker-compose build php
$ docker-compose run --rm php make test
$ docker-compose run --rm php php script_name.php
$tesseract = \Tesseract\Tesseract::fromFile('image.jpg');
$textContent = $tesseract->getText();
Text extraction can also be restricted to a rectangular area of the image.
$tesseract = \Tesseract\Tesseract::fromFile('image.jpg');
$textContent = $tesseract->getRectangle(500, 500, 1000, 1000)->getText();
A Tesseract instance can also be created from a string that contains the binary content of an image. This avoids having to write a temporary file.
$textContent = \Tesseract\Tesseract::fromString($imageContent)->getText();
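The snippet above assumes `$imageContent` already holds the raw image bytes. A minimal sketch of one way to obtain them, using PHP's standard `file_get_contents`; the helper name `loadImageBytes` is hypothetical and not part of this extension, and the `class_exists` guard is only there so the script fails gracefully when the extension is not loaded:

```php
<?php
// Hypothetical helper (not part of the extension): read raw image bytes
// from a path and fail loudly if the file cannot be read.
function loadImageBytes(string $path): string
{
    $bytes = @file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read image: {$path}");
    }
    return $bytes;
}

// Only call into the extension when it is loaded and the sample file exists.
if (class_exists(\Tesseract\Tesseract::class) && is_file('image.jpg')) {
    $imageContent = loadImageBytes('image.jpg');
    echo \Tesseract\Tesseract::fromString($imageContent)->getText();
} else {
    error_log('Tesseract extension not loaded or image.jpg missing.');
}
```

The same approach should work for bytes that never touch the disk, such as an uploaded file body or an HTTP response, as long as the string contains the complete binary image.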
One or more languages can also be specified. Note that the language files for the specified languages must be installed. Refer to the Dockerfile for how to install them under Alpine, or to the tesseract-ocr documentation.
$tesseract = \Tesseract\Tesseract::fromFile('image.jpg', [
    \Tesseract\Language\GERMAN,
    \Tesseract\Language\ENGLISH
]);
$textContent = $tesseract->getText();
It is also possible to choose a different page segmentation mode.
$tesseract = \Tesseract\Tesseract::fromFile('image.jpg');
echo $tesseract->setPageSegMode(\Tesseract\PageSegMode\SINGLE_WORD)->getText();