Detecting Text-Lines only #20
Comments
They are not completely independent. For example, the text region is used to obtain robust and clean textlines. In fact, the textlines are not a direct result of the pixelwise textline segmentation: after that segmentation, I apply some additional methods inside each text region to detect the textlines.
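To illustrate why the region step helps, here is a minimal sketch and not the tool's actual implementation. It assumes a binary pixelwise textline mask stored as nested lists and a rectangular region box; the function name `lines_in_region` and the row-projection heuristic are hypothetical stand-ins for the "methods inside text region" mentioned above.

```python
from typing import List, Tuple

Mask = List[List[int]]             # 1 = pixel predicted as textline, 0 = background
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), bottom/right exclusive


def lines_in_region(line_mask: Mask, region: Box, min_pixels: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) spans, end exclusive, of textlines inside a region.

    Hypothetical heuristic: a row belongs to a line if it contains at least
    `min_pixels` textline pixels within the region's column range.
    """
    top, left, bottom, right = region
    spans = []
    start = None
    for y in range(top, bottom):
        has_text = sum(line_mask[y][left:right]) >= min_pixels
        if has_text and start is None:
            start = y                  # a new line begins
        elif not has_text and start is not None:
            spans.append((start, y))   # the current line ends
            start = None
    if start is not None:
        spans.append((start, bottom))  # line touches the region's bottom edge
    return spans


# Toy mask: two lines on the left, plus a vertical streak of noise in column 5.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1],
    [0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]

with_region = lines_in_region(mask, (0, 0, 6, 4))     # restricted to the text region
without_region = lines_in_region(mask, (0, 0, 6, 6))  # whole page, noise included
```

In this toy case, restricting the search to the region separates the two lines, while running on the whole page merges them because of the noise column.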
I do not think so: as @vahidrezanezhad wrote, the segmentation of the page into lines depends on (benefits from) the segmentation of the page into regions, but is not a direct consequence of that step. Setting up a completely modular workflow would therefore worsen the results. Since the coordinates of the two representation levels are not consistent (text lines may reach outside the regions they are part of), you can consider the whole tool as “detecting text lines only”, i.e. the resulting regions are merely a helper and not a result in their own right.
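As a rough way to see this inconsistency in one's own output, the following sketch measures how far a line's bounding box extends past its parent region's box. The `(x0, y0, x1, y1)` box format, the function name `overflow`, and the sample coordinates are assumptions made for illustration; they are not taken from the tool's output format.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixel coordinates


def overflow(line: Box, region: Box) -> Dict[str, int]:
    """Return, per side, by how many pixels the line box exceeds the region box."""
    lx0, ly0, lx1, ly1 = line
    rx0, ry0, rx1, ry1 = region
    return {
        "left": max(0, rx0 - lx0),
        "top": max(0, ry0 - ly0),
        "right": max(0, lx1 - rx1),
        "bottom": max(0, ly1 - ry1),
    }


# Made-up coordinates: the line sticks out on the left and right of its region.
region_box = (100, 100, 500, 300)
line_box = (95, 120, 510, 150)
excess = overflow(line_box, region_box)
```

Any non-zero value means the line is not fully contained in its region, which is the case described above.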
This would indeed make a very interesting scientific publication. I bet something like this is on their agenda. @mikegerber @vahidrezanezhad @cneud ??? Again, many thanks for making this available.
Dear @deepseek, let me explain a little about what we are doing and what the main goal of this tool is. Please do not forget that the one and only goal of this work was to provide textlines for OCR.
So, to make a long story short: if you wish to use this tool for great OCR, then you have to keep all those models in :)
Keep up the good work.
Thank you for your hard work.
If I understand correctly, each level of segmentation is independent of the others, so I would suggest a flag option to detect text lines only, without region or page detection.
This would reduce the memory footprint and speed up the process.