ofxTesseract is an addon for openFrameworks that provides access to the Tesseract OCR library.
Warning: the training data provided here causes a segmentation fault in some cases. Please use the latest version available from the Tesseract project.
I've included a copy of eng.traineddata in the example, but if you need a newer one or a different language you can get it from the tesseract-ocr website.
Once downloaded, place it in your data directory, inside a special directory named 'tessdata'. The bundled English file, for example, sits at tessdata/eng.traineddata within the data directory.
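Because a missing or misplaced training file is an easy mistake to make, it can help to check the layout before running OCR. The following is only a minimal sketch using the C++17 standard library. The helper names `trainedDataPath` and `hasTrainedData` are hypothetical and not part of ofxTesseract, and the data directory is passed in by the caller rather than assumed.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not part of ofxTesseract):
// builds <dataDir>/tessdata/<language>.traineddata.
fs::path trainedDataPath(const fs::path& dataDir, const std::string& language) {
    return dataDir / "tessdata" / (language + ".traineddata");
}

// Hypothetical helper (not part of ofxTesseract):
// returns true only if the training file exists as a regular file.
bool hasTrainedData(const fs::path& dataDir, const std::string& language) {
    std::error_code ec; // use the non-throwing overload
    return fs::is_regular_file(trainedDataPath(dataDir, language), ec);
}
```

For instance, an application whose data directory is bin/data (the usual openFrameworks default, assumed here) could call `hasTrainedData("bin/data", "eng")` and log an error instead of proceeding when it returns false.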
Example usage of ofxTesseract looks like:
    #include "ofxTesseract.h"
    ...
    ofxTesseract ocr;
    ofImage img;
    ...
    ocr.setup();
    ocr.setWhitelist("0123456789");
    ocr.setAccuracy(ofxTesseract::ACCURATE);
    img.loadImage("text.png");
    string result = ocr.findText(img);
    cout << result << endl;
If you need to build the library from scratch, you should be familiar with building static libraries. For Tesseract, after downloading the source, it looks something like this:
    cd /Users/username/tesseract
    ./runautoconf
    ./configure --disable-shared --enable-static --prefix=/Users/username/tesseract
    make
    sudo make install