Combining word boxes into lines or paragraphs #22
Comments
Hi @mrm8488 -- I do not think it is possible to detect the end of a line directly in the detector or recognizer models. To synthesize word boxes into lines or paragraphs (as I believe you imply), the user would have to apply their own logic for stitching together the pieces. I would very much like to have a starter implementation of that logic in this repository, but I just haven't had a chance to think it through and implement it. That said, if you have thoughts and an approach in progress, a PR for this feature would be very much appreciated! Please post back here if that's something you are interested in working on.
Hello @mrm8488, just to give you an idea: the predictions are a list of (text, box) tuples, where each item represents a word and its position in the image (starting from the top left).
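To make that structure concrete, here is a minimal sketch. It assumes each box is four (x, y) corner points listed from the top-left corner; the sample words and coordinates are made up, and `box_extent` is a hypothetical helper, not part of the library.

```python
def box_extent(box):
    """Return (x_min, y_min, x_max, y_max) for a box given as four (x, y) corners."""
    xs = [float(p[0]) for p in box]
    ys = [float(p[1]) for p in box]
    return min(xs), min(ys), max(xs), max(ys)


# Made-up predictions in the (text, box) shape described above.
predictions = [
    ("world", [(70, 12), (120, 12), (120, 30), (70, 30)]),
    ("hello", [(10, 10), (60, 10), (60, 28), (10, 28)]),
]

for text, box in predictions:
    print(text, box_extent(box))
```

Reducing each box to its extent like this makes the later sorting and overlap checks simple comparisons.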
Thank you. I was thinking of doing something like that.
I would do it like @MounaBC said. First sort the bounding boxes along the y-axis (top to bottom, highest endY value first), but then I would just put everything that overlaps on the y-axis into the same line. After this, it's just a matter of sorting each line along the x-axis. EDIT 5
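A rough sketch of the steps described above, not the code originally posted in this comment. It assumes image coordinates (y grows downward), boxes given as four (x, y) corner points, and sorting by the top edge of each box; `group_into_lines` and the sample data are hypothetical.

```python
def group_into_lines(predictions):
    """Group (text, box) word predictions into lines of text.

    Each box is assumed to be four (x, y) corner points.
    """
    words = []
    for text, box in predictions:
        xs = [float(p[0]) for p in box]
        ys = [float(p[1]) for p in box]
        words.append({"text": text, "x": min(xs), "top": min(ys), "bottom": max(ys)})

    # Step 1: sort the words from top to bottom.
    words.sort(key=lambda w: w["top"])

    # Step 2: a word joins the current line if it starts above the line's bottom edge,
    # i.e. its vertical span overlaps the line's span.
    lines = []
    for w in words:
        if lines and w["top"] < lines[-1]["bottom"]:
            lines[-1]["words"].append(w)
            lines[-1]["bottom"] = max(lines[-1]["bottom"], w["bottom"])
        else:
            lines.append({"bottom": w["bottom"], "words": [w]})

    # Step 3: sort every line from left to right and join the words.
    return [
        " ".join(w["text"] for w in sorted(line["words"], key=lambda w: w["x"]))
        for line in lines
    ]


predictions = [
    ("world", [(70, 12), (120, 12), (120, 30), (70, 30)]),
    ("hello", [(10, 10), (60, 10), (60, 28), (10, 28)]),
    ("line", [(75, 41), (110, 41), (110, 58), (75, 58)]),
    ("second", [(10, 40), (70, 40), (70, 58), (10, 58)]),
]
print(group_into_lines(predictions))
```

Because the line's bottom edge grows as words are added, skewed or tightly spaced text can chain neighbouring lines together, so the overlap test may need a stricter threshold in practice.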
Hi, thanks for the code. I am trying to arrange all the text in one line, in ascending order.
I don't know if I had to do some corrections later on so here is my most recent code:
I get the actual frame to analyse for text from a cv2.VideoCapture and then do the following:
Do you have any idea how to group the boxes into blocks / paragraphs? I tried to find algorithms to do this but failed. I only succeeded in improving the line segmenter by taking into account the y-distance between the centers of two boxes. Thank you
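One possible reading of the center-distance idea mentioned above, as a sketch only: two boxes count as being on the same line when their vertical centers are close relative to the box height. The `same_line` helper, the `max_ratio` threshold of 0.5, and the sample boxes are assumptions for illustration, not the commenter's code.

```python
def y_center_and_height(box):
    """Return the vertical center and height of a box given as four (x, y) corners."""
    ys = [float(p[1]) for p in box]
    return (min(ys) + max(ys)) / 2.0, max(ys) - min(ys)


def same_line(box_a, box_b, max_ratio=0.5):
    """True if the centers are within max_ratio of the smaller box height."""
    center_a, height_a = y_center_and_height(box_a)
    center_b, height_b = y_center_and_height(box_b)
    return abs(center_a - center_b) <= max_ratio * min(height_a, height_b)


hello = [(10, 10), (60, 10), (60, 28), (10, 28)]
world = [(70, 12), (120, 12), (120, 30), (70, 30)]
second = [(10, 40), (70, 40), (70, 58), (10, 58)]

print(same_line(hello, world))   # centers 19 and 21, heights 18
print(same_line(hello, second))  # centers 19 and 49, heights 18
```

Compared with a plain overlap test, this is less likely to merge boxes that only touch at their edges, at the cost of one extra parameter to tune.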
Is there any way (maybe a built-in function) to detect EOL characters in a large text? Or must it be done by the client, by comparing the words' position vectors?
Thanks in advance.