Add a way to set "Init Only" parameters (user_word_suffix, etc.) #613

Closed
clnoel opened this issue May 16, 2022 · 3 comments


@clnoel

clnoel commented May 16, 2022

I would like to use the command-line parameters "user_word_suffix", "load_freq_dawg", and "load_system_dawg". After sorting through a lot of documentation and code, I realized that these are "init only" parameters. In the TessBaseAPI code, they need to be passed to Init(), either as a set of keys/values or in a config file. Setting them after initialization has no effect, because by then the traineddata files have already been read and the dictionaries built.

Suggested fix:
Add a config filename optional parameter (string) to worker.Initialize(...) that gets passed to api.Init(...).

Other fixes:
Add a worker.SetInitParameters() function, just like worker.SetParameters(), that must be called before worker.Initialize() and whose keys/values are passed to api.Init().
Add an optional "initParams" parameter to worker.Initialize() containing key/value pairs that get passed to api.Init().

I'm suggesting the config file option because it feels like the least work to get the desired result.
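To make the relationship between the two forms concrete, here is a minimal sketch of how key/value pairs could be turned into config-file text. It assumes the plain Tesseract config format of one `name value` pair per line. The helper name `buildTesseractConfig` is hypothetical and is not part of Tesseract.js.

```javascript
// Hypothetical helper (not a Tesseract.js API): serialize init-only
// key/value pairs into Tesseract config-file text, one "name value" per line.
function buildTesseractConfig(params) {
  const lines = Object.entries(params).map(([name, value]) => {
    const text = String(value);
    // Names cannot contain whitespace, and a value must stay on one line,
    // otherwise the line-based format would be ambiguous.
    if (name === '' || /\s/.test(name) || /[\r\n]/.test(text)) {
      throw new Error(`Cannot serialize parameter "${name}"`);
    }
    return `${name} ${text}`;
  });
  return lines.join('\n') + '\n';
}

// Example: disable two of the dictionaries mentioned above.
console.log(buildTesseractConfig({
  load_system_dawg: '0',
  load_freq_dawg: '0',
}));
```

Either way, the resulting text or the original pairs would have to reach api.Init() before the traineddata is loaded.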

@Balearica
Collaborator

I looked it up, and it seems the init-only parameters are few in number and relatively fringe (notably, they disable various dictionaries). However, I agree that it would be nice to have some way for advanced users to specify a config file (as is possible on desktop). I will look into whether this can be easily added in a future release.

@Balearica
Collaborator

This feature has been added to the dev/v4 branch, and will be released with version 4. If you would like to test before then, instructions are in #662.
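For readers trying this out, the sketch below shows one way the new interface might be called. It rests on assumptions that should be checked against #662 and the Version 4 docs: that `createWorker()` is async, that `worker.initialize(langs, oem, config)` accepts the init-only options as its third argument, and that `recognize()` resolves to an object with `data.text`. The wrapper `recognizeWithInitConfig` is a hypothetical name, and `createWorker` is passed in so the snippet does not depend on how Tesseract.js is loaded.

```javascript
// Sketch under the assumptions stated above; not taken from the Tesseract.js docs.
async function recognizeWithInitConfig(createWorker, image, initConfig, oem) {
  const worker = await createWorker();
  try {
    await worker.loadLanguage('eng');
    // Assumption: init-only options go in the third argument, so they are
    // applied while the traineddata is loaded rather than afterwards.
    await worker.initialize('eng', oem, initConfig);
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Always release the worker, even if recognition fails.
    await worker.terminate();
  }
}

// Possible usage (the image path is a placeholder):
// const { createWorker } = require('tesseract.js');
// recognizeWithInitConfig(createWorker, 'path/to/image.png',
//   { load_number_dawg: '0' }, '0').then(console.log);
```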

To easily verify that these options are indeed being set, I am attaching a test image with significantly different results for the legacy model (oem: "0") depending on whether load_number_dawg is enabled.

[Attached image: number_test2]

Results with load_number_dawg: "1"

1823747 72460000
271.83 1223.00
3164675.10 1512895284

Results with load_number_dawg: "0"

18237.47 724600.00
271.83 1223.00
3164675.10 15128952.84
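To see exactly where the two outputs diverge, a small token-by-token comparison can help. The sketch below uses the two results quoted above verbatim; the function name `diffTokens` is illustrative only.

```javascript
// The two outputs quoted above, copied verbatim.
const withNumberDawg = `1823747 72460000
271.83 1223.00
3164675.10 1512895284`;

const withoutNumberDawg = `18237.47 724600.00
271.83 1223.00
3164675.10 15128952.84`;

// Compare whitespace-separated tokens position by position.
function diffTokens(enabledText, disabledText) {
  const enabled = enabledText.trim().split(/\s+/);
  const disabled = disabledText.trim().split(/\s+/);
  const diffs = [];
  for (let i = 0; i < Math.max(enabled.length, disabled.length); i++) {
    if (enabled[i] !== disabled[i]) {
      diffs.push({ index: i, enabled: enabled[i], disabled: disabled[i] });
    }
  }
  return diffs;
}

const diffs = diffTokens(withNumberDawg, withoutNumberDawg);
console.log(diffs);

// Check whether each difference is only a decimal point.
const onlyDecimalPoint = diffs.every(
  (d) => d.disabled.replace('.', '') === d.enabled
);
console.log(onlyDecimalPoint);
```

Three of the six tokens differ, and in each case the two readings are identical apart from a decimal point.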

Balearica added a commit that referenced this issue Nov 25, 2022
See #662 for an explanation of the Tesseract.js Version 4 changes. The list below is auto-generated from commits.

* Added image preprocessing functions (rotate + save images)

* Updated createWorker to be async

* Reworked createWorker to be async and throw errors per #654

* Edited detect to return null when detection fails rather than throwing error per #526

* Updated types per #606 and #580 (#663) (#664)

* Removed unused files

* Added savePDF option to recognize per #488; cleaned up code for linter

* Updated download-pdf example for node to use new savePDF option

* Added OutputFormats option/interface for setting output

* Allowed for Tesseract parameters to be set through recognition options per #665

* Updated docs

* Edited loadLanguage to no longer overwrite cache with data from cache per #666

* Added interface for setting 'init only' options per #613

* Wrapped caching in try block per #609

* Fixed unit tests

* Updated setImage to resolve memory leak per #678

* Added debug output option per #681

* Fixed bug with saving images per #588

* Updated examples

* Updated readme and Tesseract.js-core version
@Balearica
Collaborator

Closing as this was added in Version 4.
