Is it possible to obtain the Thresholded Image from tesseract? #588

Closed
diogoalmiro opened this issue Dec 5, 2021 · 7 comments

Comments

@diogoalmiro

I would like to have access to the thresholded image created by tesseract. (See function: GetThresholdedImage)

The Python wrapper library tesserocr implements this feature as a function: https://github.com/sirfz/tesserocr/blob/master/tesserocr.pyx#L1737
There is also a parameter, "tessedit_write_images", that writes the thresholded image to the file "tessinput.tif".

I am using Windows 10 and Node.js with some TIFF files. Setting the parameter "tessedit_write_images" to "T" or "1" does nothing.

@Balearica
Member

This library does not currently include an interface for retrieving the thresholded image. It's theoretically possible to do so using low-level functions to read the contents of the wasm filesystem and/or memory--but not something we currently support with high-level functions.

Feel free to thumbs up this issue if you're reading this and would use this feature if added--can probably add in a future release if there's enough demand.

@Balearica
Member

@diogoalmiro This feature has been added in the development branch for version 4 and will be included in that release. That branch is functional at present if you would like to try it out, and is described in more detail in #662. An example has also been included to demonstrate usage.

Balearica pushed a commit that referenced this issue Oct 14, 2022
Balearica added a commit that referenced this issue Nov 25, 2022
See #662 for explanation of Tesseract.js Version 4 changes.  List below is auto-generated from commits. 

* Added image preprocessing functions (rotate + save images)

* Updated createWorker to be async

* Reworked createWorker to be async and throw errors per #654

* Reworked createWorker to be async and throw errors per #654

* Edited detect to return null when detection fails rather than throwing error per #526

* Updated types per #606 and #580 (#663) (#664)

* Removed unused files

* Added savePDF option to recognize per #488; cleaned up code for linter

* Updated download-pdf example for node to use new savePDF option

* Added OutputFormats option/interface for setting output

* Allowed for Tesseract parameters to be set through recognition options per #665

* Updated docs

* Edited loadLanguage to no longer overwrite cache with data from cache per #666

* Added interface for setting 'init only' options per #613

* Wrapped caching in try block per #609

* Fixed unit tests

* Updated setImage to resolve memory leak per #678

* Added debug output option per #681

* Fixed bug with saving images per #588

* Updated examples

* Updated readme and Tesseract.js-core version

@Balearica
Member

Closing as this was added in Version 4.

@GitMurf

GitMurf commented Dec 9, 2022

@Balearica I see the example is for browser but can you retrieve the created "threshold" images with Node as well? Thanks

@Balearica
Member

@GitMurf Yes, you can also do this on Node. You should be able to adapt the browser example fairly easily.

@Balearica
Member

@GitMurf I added an example for retrieving processed images using Node, which can be found here.
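
Since the link to that example did not survive in this copy of the thread, a minimal sketch of the Node approach follows. It is not the maintainer's example. It assumes the Version 4 API, in which `worker.recognize` accepts a third "output" argument with an `imageBinary` flag and returns the thresholded image on `data.imageBinary` as a base64 data URL string. The helper names `dataUrlToBuffer` and `saveThresholdedImage` are hypothetical, and the language (`eng`) and file paths are placeholders.

```javascript
const fs = require('fs');

// Hypothetical helper: convert a "data:image/png;base64,..." string
// (the assumed format of data.imageBinary) into raw bytes.
function dataUrlToBuffer(dataUrl) {
  const comma = dataUrl.indexOf(',');
  if (comma === -1) {
    throw new Error('Expected a data URL containing a comma');
  }
  return Buffer.from(dataUrl.slice(comma + 1), 'base64');
}

// Hypothetical helper: run OCR on inputPath and write the thresholded
// (binarized) image to outputPath. Assumes Tesseract.js Version 4.
async function saveThresholdedImage(inputPath, outputPath) {
  // Required lazily so dataUrlToBuffer can be used without the package.
  const { createWorker } = require('tesseract.js');
  const worker = await createWorker();
  try {
    await worker.loadLanguage('eng');
    await worker.initialize('eng');
    // Second argument: recognition options; third: requested outputs.
    const { data } = await worker.recognize(inputPath, {}, { imageBinary: true });
    fs.writeFileSync(outputPath, dataUrlToBuffer(data.imageBinary));
  } finally {
    // Always release the worker, even if recognition fails.
    await worker.terminate();
  }
}
```

Calling `saveThresholdedImage('input.png', 'thresholded.png')` from an async context should then write the binarized image to disk, provided the assumptions above hold for the installed version.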

@GitMurf

GitMurf commented Dec 14, 2022

@Balearica thanks a ton! This is exactly what I needed :) Much appreciated!
