
Detecting Text-Lines only #20

Closed
ghost opened this issue Dec 11, 2019 · 5 comments
Comments

@ghost

ghost commented Dec 11, 2019

Thank you for your hard work,

If I understand correctly, each level of segmentation is independent of the others; therefore, I would recommend a flag option to detect text lines only, without region or page detection.
This would help reduce the memory footprint and speed up the process.

@vahidrezanezhad
Member

vahidrezanezhad commented Dec 11, 2019

They are not completely independent. For example, the text regions are used to get robust and clean textlines. Actually, the textlines are not a direct result of the pixelwise textline segmentation: after the pixelwise textline segmentation, I use some methods inside each text region to detect the textlines.
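As a rough illustration of what "detecting textlines inside a text region" from a pixelwise mask can look like, here is a minimal Python sketch. It assumes a binary textline mask and axis-aligned region boxes and uses a simple horizontal projection profile; the helper names (`lines_in_region`, `_band_box`) are hypothetical, and this is not the tool's actual method.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]           # binary mask: 1 = textline pixel, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive

def _band_box(mask: Mask, y0: int, y1: int, left: int, right: int) -> Box:
    # Tighten a horizontal band to the columns that actually contain ink.
    cols = [x for x in range(left, right) if any(mask[y][x] for y in range(y0, y1))]
    if not cols:
        return (y0, left, y1, right)
    return (y0, cols[0], y1, cols[-1] + 1)

def lines_in_region(mask: Mask, region: Box, min_pixels: int = 1) -> List[Box]:
    """Split one text region into line boxes using a horizontal projection profile.

    Hypothetical helper for illustration; not the tool's own algorithm.
    """
    top, left, bottom, right = region
    lines: List[Box] = []
    start: Optional[int] = None
    for y in range(top, bottom):
        ink = sum(mask[y][left:right])  # textline pixels in this row, inside the region only
        if ink >= min_pixels and start is None:
            start = y                    # a new line band begins
        elif ink < min_pixels and start is not None:
            lines.append(_band_box(mask, start, y, left, right))
            start = None
    if start is not None:                # a band that runs to the region's bottom edge
        lines.append(_band_box(mask, start, bottom, left, right))
    return lines
```

Because the profile is computed per region, rows of neighboring columns or regions do not merge into one line band, which is one intuition for why the regions help produce clean lines.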

@ghost
Author

ghost commented Dec 11, 2019

#19

If I want to restrict my textline rectangles to the text region (masking by text region), it can worsen the result of textline detection.

  • Am I missing something here? As per @wrznr's examples, the detected text lines don't adhere to the detected regions.
  • Have you compared the text-line detection results with and without the page and region steps?
    Criteria: (detection rate, memory used, images per second, ...)
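One way to collect the throughput and memory criteria for such a comparison is a small harness like the minimal sketch below. `segment` stands for any hypothetical callable (for example a full pipeline versus a lines-only variant); note that `tracemalloc` only sees allocations made through Python's allocator, so memory held by native deep-learning backends (e.g. on a GPU) is not captured. Detection rate would additionally need ground-truth annotations and is left out here.

```python
import time
import tracemalloc
from typing import Any, Callable, Dict, Sequence

def benchmark(segment: Callable[[Any], Any], images: Sequence[Any]) -> Dict[str, float]:
    """Run `segment` on every image; report throughput and peak traced memory."""
    tracemalloc.start()
    try:
        t0 = time.perf_counter()
        for image in images:
            segment(image)
        elapsed = time.perf_counter() - t0
        _, peak_bytes = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()
    return {
        "images_per_second": len(images) / elapsed if elapsed > 0 else float("inf"),
        "peak_traced_mib": peak_bytes / 2**20,
    }
```

Running it once per variant on the same image set (e.g. `benchmark(full_pipeline, imgs)` vs. `benchmark(lines_only, imgs)`, both hypothetical) gives directly comparable numbers.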

@wrznr
Contributor

wrznr commented Dec 12, 2019

Am I missing something here?

I do not think so: As @vahidrezanezhad wrote, the segmentation of the page into lines depends on (benefits from) the segmentation of the page into regions but is not a direct consequence of this step. Setting up a completely modular workflow would therefore worsen the results. Since the coordinates of the two representation levels are not consistent (text lines may reach outside of the regions they are part of), you can consider the whole tool as “detecting text lines only”, i.e. the resulting regions are merely a helper and not a result in their own right.
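The inconsistency between line and region coordinates is also why masking lines by their regions (as mentioned in #19) can hurt: any part of a line that sticks out of its region gets cut off. A minimal sketch, assuming axis-aligned rectangles (real outputs may be polygons) and hypothetical helper names, that measures how much of a line box such masking would discard:

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive

def clip_to_region(line: Box, region: Box) -> Optional[Box]:
    """Intersect a line box with its region box; None if they do not overlap."""
    top, left = max(line[0], region[0]), max(line[1], region[1])
    bottom, right = min(line[2], region[2]), min(line[3], region[3])
    if bottom <= top or right <= left:
        return None
    return (top, left, bottom, right)

def area(box: Box) -> int:
    return (box[2] - box[0]) * (box[3] - box[1])

def fraction_lost(line: Box, region: Box) -> float:
    """Share of the line box's area that masking by the region would discard."""
    clipped = clip_to_region(line, region)
    kept = area(clipped) if clipped else 0
    return 1 - kept / area(line)
```

For example, a line box `(10, 0, 20, 120)` inside a region `(0, 0, 100, 100)` loses one sixth of its area (the 20 columns beyond the region's right edge), which could mean truncated characters for OCR.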

Have you compared the Text-Line detection results with and without the page & region steps?

This would indeed make a very interesting scientific publication. I bet something like this is on their agenda. @mikegerber @vahidrezanezhad @cneud ???

Again, many thanks for making this available.

@vahidrezanezhad
Member

vahidrezanezhad commented Dec 12, 2019

Dear @deepseek, let me explain a little about what we are doing and what the main goal of this tool is. Please do not forget that the one and only goal of this work was to provide textlines for OCR.

  • Is page extraction important? Of course it is. You can still do a great textline detection even without page extraction, but as you may know, many of our documents include part of the neighboring page alongside the main page we are interested in. This causes noise in your OCR even though you have done a great textline detection!!

  • Can textline detection be done without text regions? I would say no!!
    Let me be clear here.

    • When can we do textline detection without text regions? Only once we have a great pixelwise textline segmentation in which the contour of every textline is isolated and detected perfectly (practically impossible).
    • Why are text regions important? Text regions are important for deskewing in order to find textlines. Regions are often skewed differently from one another, so deskewing the whole image at once is not a good idea; instead, you have to deskew each text region separately (and this is not the only reason why we need text regions).
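To make the per-region deskewing point concrete, here is a minimal sketch of skew estimation with the classic projection-profile method: rotate a region's foreground pixels by each candidate angle and keep the angle whose row histogram is sharpest. The function names and the ±5° search range are assumptions for illustration; the tool may estimate skew differently. Running it separately on each region's pixels yields one angle per region, which a single whole-image angle cannot capture.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (x, y) of one foreground pixel inside a region

def profile_score(points: Iterable[Point], angle_deg: float) -> int:
    """Sharpness of the row histogram after rotating the points by -angle_deg."""
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    rows = Counter(round(-x * sin_a + y * cos_a) for x, y in points)
    # Sum of squared bin counts is largest when each text line collapses into one row.
    return sum(c * c for c in rows.values())

def estimate_skew(points: List[Point], max_angle: float = 5.0, step: float = 0.5) -> float:
    """Return the candidate angle (degrees) whose projection profile is sharpest."""
    n = int(round(max_angle / step))
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda angle: profile_score(points, angle))
```

The angle is measured in whatever coordinate frame the points use (in image coordinates with y pointing down, a positive angle means lines descend to the right). Two regions on the same page, one skewed by 2° and one by -1.5°, would each get their own estimate.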

So, to make a long story short: if you wish to use this tool to do a great OCR, then you have to keep all those models in :)

@ghost
Author

ghost commented Dec 12, 2019

Keep up the good work.

ghost closed this as completed Dec 12, 2019