Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
tesseract.js: 4.1.0
react: 18.2.0
npm: 8.5.0
Describe the bug
I'm doing my thesis on OCR scanners and I found tesseract.js. I wanted to create some examples for my professor and ran into this issue (or maybe I did not configure it correctly). I think Tesseract does not recognize the new paragraphs. I don't know whether I need to configure it differently or whether this is a bug, but I really need your help.
(Sorry if I was confusing)
To Reproduce
Steps to reproduce the behavior:
1. Choose the image to work with
2. Wait for the results
Please attach any input image required to replicate this behavior.
Result:
Example image:
Expected behavior
It should give me a huge text output
It looks like this is an issue with Tesseract rather than any code specific to Tesseract.js or your implementation.
Tesseract uses a binarization algorithm provided by the Leptonica library by default (there are some configuration options related to binarization that you can look up in the Tesseract repo/documentation). This example shows how to access the intermediate images used by Tesseract. For your image, the binarization algorithm performs extremely poorly with the text being almost entirely erased--I've attached the binarized image.
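As a rough sketch of how the binarized image might be retrieved alongside the text: the helper names below (`dataUrlToBuffer`, `recognizeWithBinaryImage`) are hypothetical, and the code assumes that in v4 `worker.recognize` accepts an output-options object as its third argument with an `imageBinary` flag, and that the result comes back as a base64 data URL, as in the project's image-processing example. Please check this against the API docs for your installed version.

```javascript
// Decode a "data:image/png;base64,..." string into raw image bytes.
function dataUrlToBuffer(dataUrl) {
  const commaIndex = dataUrl.indexOf(',');
  if (commaIndex === -1) throw new Error('Expected a data URL');
  return Buffer.from(dataUrl.slice(commaIndex + 1), 'base64');
}

// Hypothetical helper: run OCR and also request the binarized image.
// Assumption: the third argument of recognize() selects output formats,
// and data.imageBinary is a base64 data URL.
async function recognizeWithBinaryImage(worker, image) {
  const { data } = await worker.recognize(image, {}, { text: true, imageBinary: true });
  return { text: data.text, binaryPng: dataUrlToBuffer(data.imageBinary) };
}

// Usage sketch (left commented out; assumes tesseract.js v4 is installed
// and that 'input.png' is a placeholder for your own file):
//   const { createWorker } = require('tesseract.js');
//   const fs = require('fs');
//   const worker = await createWorker();
//   await worker.loadLanguage('eng');
//   await worker.initialize('eng');
//   const { text, binaryPng } = await recognizeWithBinaryImage(worker, 'input.png');
//   fs.writeFileSync('binarized.png', binaryPng);
//   await worker.terminate();
```

Opening the saved file shows what Tesseract actually sees after binarization, which is usually the quickest way to tell whether a poor result comes from preprocessing or from recognition itself.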
I do not have a strong understanding of the Leptonica binarization algorithm used by default, but the fact that the text is very light gray compared to some of the darker blacks found in the image appears to be what is tripping it up. To demonstrate, I removed the image, and the text no longer disappears during the binarization step.
If you use the version with the image removed, Tesseract will output some text. I do not expect it to be accurate, though: I can hardly make out the words in the input image myself, so I would not expect Tesseract to recognize them reliably.
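To illustrate why a few very dark regions can wipe out light gray text, here is a minimal sketch using a plain global Otsu threshold. This is an assumption for illustration only: it may not match the algorithm Tesseract actually runs, and the gray levels and pixel counts are made up rather than measured from the attached image.

```javascript
// Plain global Otsu threshold over 8-bit grayscale values (illustrative only).
// Pixels with value <= threshold are treated as ink, the rest as background.
function otsuThreshold(pixels) {
  const hist = new Array(256).fill(0);
  for (const p of pixels) hist[p] += 1;

  const total = pixels.length;
  let sumAll = 0;
  for (let v = 0; v < 256; v++) sumAll += v * hist[v];

  let weightDark = 0;
  let sumDark = 0;
  let bestScore = -1;
  let bestThreshold = 0;
  for (let t = 0; t < 256; t++) {
    weightDark += hist[t];
    sumDark += t * hist[t];
    if (weightDark === 0) continue;   // no dark pixels yet
    const weightLight = total - weightDark;
    if (weightLight === 0) break;     // everything is already "dark"
    const meanDark = sumDark / weightDark;
    const meanLight = (sumAll - sumDark) / weightLight;
    // Between-class variance (up to a constant factor).
    const score = weightDark * weightLight * (meanDark - meanLight) ** 2;
    if (score > bestScore) {
      bestScore = score;
      bestThreshold = t;
    }
  }
  return bestThreshold;
}

const repeat = (value, count) => new Array(count).fill(value);
const isInk = (value, threshold) => value <= threshold;

// Made-up gray levels: white page (255), light gray text (200), dark picture (20).
const TEXT_GRAY = 200;
const withPicture = [...repeat(255, 9000), ...repeat(TEXT_GRAY, 500), ...repeat(20, 2000)];
const withoutPicture = [...repeat(255, 11000), ...repeat(TEXT_GRAY, 500)];

const tWith = otsuThreshold(withPicture);
const tWithout = otsuThreshold(withoutPicture);
console.log('with picture: threshold', tWith, 'text kept?', isInk(TEXT_GRAY, tWith));
console.log('without picture: threshold', tWithout, 'text kept?', isInk(TEXT_GRAY, tWithout));
```

With the dark picture present, the best split separates the picture from everything else, so the gray text lands on the background side and is erased. Once the picture is gone, the only remaining split is between the text and the page, and the text survives.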
One final small thing--the worker.load function can be removed. That was necessary in old versions but does not do anything in v4 and above.