HOCRReader

A utility to read hOCR (HTML-based OCR) output from Tesseract.
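In hOCR, every layout element (page, line, word) records its position in a title attribute, for example title="bbox 36 92 618 184; x_size 30". The sketch below shows how such a box can be pulled out with the plain .NET base class library. The helper ParseBbox is hypothetical and is not part of HOCRReader's API. It also assumes the usual hOCR convention that (x0, y0) is the top-left corner and (x1, y1) the bottom-right corner.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HOCRReader: extracts the "bbox x0 y0 x1 y1"
// property from an hOCR title attribute such as "bbox 36 92 618 184; x_size 30".
// Returns null when the title has no bbox property.
static (int X0, int Y0, int X1, int Y1)? ParseBbox(string title)
{
    // [0-9] rather than \d, because \d in .NET also matches non-ASCII digits.
    Match m = Regex.Match(title, @"\bbbox\s+([0-9]+)\s+([0-9]+)\s+([0-9]+)\s+([0-9]+)");
    if (!m.Success)
    {
        return null;
    }

    int[] v = new int[4];
    for (int i = 0; i < 4; i++)
    {
        v[i] = int.Parse(m.Groups[i + 1].Value, CultureInfo.InvariantCulture);
    }
    return (v[0], v[1], v[2], v[3]);
}

// Usage: (x0, y0) is the top-left corner and (x1, y1) the bottom-right corner,
// so the box converts to a rectangle with width x1 - x0 and height y1 - y0.
var box = ParseBbox("bbox 36 92 618 184; baseline 0.005 -5; x_size 30");
if (box is { } b)
{
    // prints: x=36, y=92, width=582, height=92
    Console.WriteLine($"x={b.X0}, y={b.Y0}, width={b.X1 - b.X0}, height={b.Y1 - b.Y0}");
}
```

Regular expressions are used here only because the title attribute is a small semicolon-separated property list. A full hOCR reader would also walk the element tree.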

Installation

The package is available on NuGet: Quellatalo.Nin.HOCRReader

Example code

(This example also uses the TheEyes library to draw the highlights.)

using System.Collections.Generic;
using System.Drawing;
using Emgu.CV;
using Emgu.CV.OCR;
using Emgu.CV.Structure;
using Quellatalo.Nin.HOCRReader;
using Quellatalo.Nin.TheEyes.Imaging;

/// <summary>
/// A test intended to run on Windows 8 or later.
/// It sets up Tesseract OCR for English, finds the lines that contain the word "great",
/// highlights those lines, and saves the result as a new image file.
/// Before running, install Tesseract, prepare the tessdata folder and the original image,
/// and update the paths below accordingly.
/// </summary>
void HOCRTest()
{
    // Tesseract wraps native resources, so it is disposed together with the images.
    using (Tesseract tesseract = new Tesseract(@"path\to\tessdata", "eng", OcrEngineMode.TesseractLstmCombined))
    using (Bitmap img = new Bitmap(@"path\to\OriginalImage.png"))
    using (Graphics g = Graphics.FromImage(img))
    using (Image<Bgr, byte> b = new Image<Bgr, byte>(img))
    {
        tesseract.PageSegMode = PageSegMode.SparseText;
        tesseract.SetImage(b);
        HOCR hOCR = new HOCR(tesseract.GetHOCRText());
        // Find all lines that contain the word "great" and highlight them.
        List<OCRLine> foundLines = hOCR.FindAllText("great");
        foreach (OCRLine line in foundLines)
        {
            GraphicX.Instance.Highlight(g, line.Rectangle, Pens.Red);
        }
        // Save the highlighted result to a separate file.
        img.Save(@"path\to\HighlightedImageOutput.png");
    }
}
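To make the search step above more concrete, here is a standalone sketch of a line search over raw hOCR, written only with the .NET base class library. It is an assumption about how such a search can work, not HOCRReader's implementation of FindAllText. The helper FindLines, the tuple it returns, and the hand-written hOCR sample are all hypothetical, and the sample is much simpler than real Tesseract output.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper, not HOCRReader's FindAllText: returns every ocr_line whose
// words, joined by single spaces, contain the query, together with the line's
// bounding box converted to (X, Y, Width, Height).
static List<(string Text, int X, int Y, int Width, int Height)> FindLines(string hocrText, string query)
{
    var results = new List<(string Text, int X, int Y, int Width, int Height)>();
    XElement root = XElement.Parse(hocrText);
    foreach (XElement line in root.Descendants().Where(e => e.Attribute("class")?.Value == "ocr_line"))
    {
        // Rebuild the line's text from its word elements.
        string text = string.Join(" ", line.Descendants()
            .Where(e => e.Attribute("class")?.Value == "ocrx_word")
            .Select(e => e.Value));
        if (text.IndexOf(query, StringComparison.Ordinal) < 0)
        {
            continue;
        }

        // The line's title attribute holds "bbox x0 y0 x1 y1" (corner coordinates).
        Match m = Regex.Match(line.Attribute("title")?.Value ?? "",
            @"\bbbox\s+([0-9]+)\s+([0-9]+)\s+([0-9]+)\s+([0-9]+)");
        if (!m.Success)
        {
            continue;
        }
        int[] v = m.Groups.Cast<Group>().Skip(1)
            .Select(g => int.Parse(g.Value, CultureInfo.InvariantCulture)).ToArray();
        results.Add((text, v[0], v[1], v[2] - v[0], v[3] - v[1]));
    }
    return results;
}

// Hand-written, simplified hOCR. Real Tesseract output also has ocr_carea and ocr_par
// levels, ids and more properties; Descendants() looks through such nesting.
string sample =
    "<div class='ocr_page' title='bbox 0 0 800 600'>" +
    "<span class='ocr_line' title='bbox 10 20 300 50'>" +
    "<span class='ocrx_word' title='bbox 10 20 90 50'>This</span>" +
    "<span class='ocrx_word' title='bbox 100 20 140 50'>is</span>" +
    "<span class='ocrx_word' title='bbox 150 20 300 50'>great</span>" +
    "</span>" +
    "<span class='ocr_line' title='bbox 10 60 200 90'>" +
    "<span class='ocrx_word' title='bbox 10 60 200 90'>Nothing</span>" +
    "</span>" +
    "</div>";

foreach (var hit in FindLines(sample, "great"))
{
    // prints: This is great @ x=10, y=20, 290x30
    Console.WriteLine($"{hit.Text} @ x={hit.X}, y={hit.Y}, {hit.Width}x{hit.Height}");
}
```

Matching on the rebuilt line text, rather than on single words, is what lets a multi-word query match across word boundaries. The line's own bbox is then the natural rectangle to highlight.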

License

MIT

It's free. El Psy Congroo!
