Combining word boxes into lines or paragraphs #22

Open
mrm8488 opened this issue Jan 17, 2020 · 7 comments
Labels
enhancement New feature or request

Comments

@mrm8488

mrm8488 commented Jan 17, 2020

Is there any way (perhaps a built-in function) to detect end-of-line (EOL) characters in a large block of text? Or must the client do it by comparing the word position vectors?
Thanks in advance.

@faustomorales faustomorales changed the title Query about end of line Combining word boxes into lines or paragraphs Jan 19, 2020
@faustomorales
Owner

Hi @mrm8488 -- I do not think it is possible to detect the end of a line directly in the detector or recognizer models.

To synthesize word boxes into lines or paragraphs (as I believe you imply), the user would have to apply their own logic for stitching together the pieces. I would very much like to have a starter implementation of that logic in this repository, but I just haven't had a chance to think it through and implement it. That said, if you have thoughts and an approach in progress, a PR for this feature would be very much appreciated! Please post back here if that's something you are interested in working on.

@MounaBC

MounaBC commented Jan 20, 2020

Hello @mrm8488, just to give you an idea: the predictions are a list of (text, box) tuples, where each item represents a word and its position in the image (with the origin at the top left).
box is an array of 4 items, each one a corner of the word's box (its X and Y). Its structure is [[startX, startY], [endX, startY], [endX, endY], [startX, endY]].
With that information, you can organise your text the way you want. How to do it depends a lot on your images: how the text is laid out, the size of the images, and so on.
For example, if your image has straight lines of text, you can first sort your results by ascending Y. Then, depending on the size of your image and text, you can define a threshold on the difference between Y coordinates to separate your lines.
Once you have that, you just sort each line by ascending X.
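
The steps above can be sketched roughly as follows. This is only a minimal illustration, not part of the library: the function name `group_into_lines` and the `y_threshold` parameter are made up for this example, and the threshold has to be tuned to the image and text size.

```python
def group_into_lines(predictions, y_threshold):
    """Group (text, box) tuples into lines: top to bottom, then left to right.

    Hypothetical helper. Each box is assumed to be
    [[startX, startY], [endX, startY], [endX, endY], [startX, endY]].
    """
    # Sort words by the Y coordinate of their top-left corner.
    ordered = sorted(predictions, key=lambda p: p[1][0][1])

    lines = []
    previous_top = None
    for text, box in ordered:
        top = box[0][1]
        # Start a new line when the jump in Y exceeds the threshold.
        if previous_top is None or top - previous_top > y_threshold:
            lines.append([])
        lines[-1].append((text, box))
        previous_top = top

    # Within each line, order the words by ascending X.
    return [sorted(line, key=lambda p: p[1][0][0]) for line in lines]


def lines_to_text(lines):
    """Join the words of each line with spaces and the lines with newlines."""
    return "\n".join(" ".join(text for text, _ in line) for line in lines)
```

Comparing each word with the previous word (rather than with the first word of the line) tolerates slightly slanted text, but it can also merge lines that are very close together, so the threshold matters.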

@mrm8488
Author

mrm8488 commented Jan 20, 2020

Thank you. I was thinking of doing something like that.

@Johndirr

Johndirr commented Feb 9, 2020

I would do it like @MounaBC said. First sort the bounding boxes along the y-axis (top-bottom, highest endY value first), but then I would simply put every box that overlaps on the y-axis into the same line. After this it's just sorting along the x-axis for every line.

EDIT:
I posted my approach here: https://stackoverflow.com/a/60684094/5459124
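
As a rough illustration of the overlap idea (a sketch only, not the code from the linked answer), two boxes can be treated as being on the same line when their vertical extents intersect. The names `y_ranges_overlap` and `group_by_y_overlap` are invented for this example, and the boxes are assumed to use the [[startX, startY], [endX, startY], [endX, endY], [startX, endY]] layout described earlier in the thread.

```python
def y_ranges_overlap(box_a, box_b):
    """Return True if the vertical extents of two boxes intersect."""
    top_a, bottom_a = box_a[0][1], box_a[2][1]
    top_b, bottom_b = box_b[0][1], box_b[2][1]
    # Two closed intervals overlap when the later start is not past the earlier end.
    return max(top_a, top_b) <= min(bottom_a, bottom_b)


def group_by_y_overlap(boxes):
    """Group boxes into lines by vertical overlap, each line sorted left to right."""
    # Process boxes from top to bottom.
    ordered = sorted(boxes, key=lambda b: b[0][1])

    lines = []
    for box in ordered:
        # Compare against the first (topmost) box of the current line.
        if lines and y_ranges_overlap(lines[-1][0], box):
            lines[-1].append(box)
        else:
            lines.append([box])

    # Within each line, order the boxes by their starting X.
    return [sorted(line, key=lambda b: b[0][0]) for line in lines]
```

Comparing every candidate with the first box of the line is a design choice made here for simplicity; it keeps each line contiguous in the sorted order, but it may split a strongly slanted line into several.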

@rawat123

rawat123 commented May 28, 2020

Hi, thanks for the code. I am trying to arrange all the text line by line, in ascending order.
I am using the code below; my image has multi-line text.
Each box is ([x1, y1], [x2, y2], [x3, y3], [x4, y4]).
When I run the code below, I get this error after it arranges two boxes on the same line:
IndexError: arrays used as indices must be of integer (or boolean) type

import numpy as np
def isOnSameLine(boxOne, boxTwo):
    print(boxOne)
    print(boxTwo)
    boxOneStartY = boxOne[0, 1]
    boxOneEndY = boxOne[2, 1]
    boxTwoStartY = boxTwo[0, 1]
    boxTwoEndY = boxTwo[2, 1]
    if ((boxTwoStartY <= boxOneEndY and boxTwoStartY >= boxOneStartY)
            or (boxTwoEndY <= boxOneEndY and boxTwoEndY >= boxOneStartY)
            or (boxTwoEndY >= boxOneEndY and boxTwoStartY <= boxOneStartY)):
        return True
    else:
        return False
# list of indexes
temp = []
i = 0
box_groups = np.array([[[292.17706, 10.344554], [431.73145, 15.781749], [427.96115, 112.55261],
                        [288.40674, 107.11542]],
                       [[292.17706, 10.344554], [431.73145, 15.781749], [427.96115, 112.55261],
                        [288.40674, 107.11542]],
                       [[104.10318, 25.434502], [251.24907, 17.586721], [256.33423, 112.93329],
                        [109.18835, 120.78107]],
                       [[191.8359, 116.40875], [472.261, 113.45691], [473.46985, 228.30032],
                        [193.04477, 231.25217]]])
# TODO: check if there is more than one box_group
sorted_box_group = [4, 4]
while i < len(box_groups):
    for j in range(i + 1, len(box_groups)):
        if (isOnSameLine(box_groups[i], box_groups[j])):
            print(str(i) + " and " + str(j) + " on same line")
            if i not in temp:
                temp.append(i)
            if j not in temp:
                temp.append(j)
        else:
            print(str(i) + " and " + str(j) + " not on same line")
        # append temp with i if the current box (i) is not on the same line with any other box
        if len(temp) == 0:
            temp.append(i)
    print("-----------------")
    print(temp)
    print(sorted_box_group)
    # put boxes on same line into lined_box_group array
    lined_box_group = box_groups[np.array(temp)]
    # sort boxes by startX value
    lined_box_group = lined_box_group[np.argsort(lined_box_group[:, 0, 0])]
    # copy sorted boxes on same line into sorted_box_group
    print(i)
    print(temp[-1] + 1)
    sorted_box_group[i:temp[-1] + 1] = lined_box_group
    # skip to the index of the box that is not on the same line
    i = temp[-1] + 1
    # clear list of indexes
    temp = []
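
A likely cause of the IndexError in the code above: the `if len(temp) == 0` check sits inside the `for` loop, so when `i` reaches the last box the loop body never runs and `temp` stays empty. `np.array([])` has a floating-point dtype, which NumPy does not accept as an index. The snippet below reproduces this with placeholder data (the zero-filled `boxes` array is an assumption for illustration) and shows the check moved after the loop.

```python
import numpy as np

boxes = np.zeros((4, 4, 2))  # placeholder data: 4 boxes, 4 corners, (x, y)
temp = []                    # state of the index list on the last pass

# An empty Python list becomes a float64 array, which NumPy rejects as an index.
try:
    boxes[np.array(temp)]
    raised = False
except IndexError:
    raised = True

# Running the emptiness check after the inner loop guarantees at least one index.
i = len(boxes) - 1
for j in range(i + 1, len(boxes)):
    pass  # no iterations happen for the last box
if len(temp) == 0:
    temp.append(i)

last_line = boxes[np.array(temp)]  # integer indices now, shape (1, 4, 2)
```

Separately, `sorted_box_group = [4, 4]` is a plain list rather than an array with the same shape as `box_groups`, and the boxes are not sorted by Y before the loop, so the slice assignment may not behave as intended even once the error is gone.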

@Johndirr

I don't know if I had to do some corrections later on so here is my most recent code:

import numpy as np

def isOnSameLine(boxOne, boxTwo):
    # box layout: [startX,startY], [endX,startY], [endX,endY], [startX,endY]
    boxOneStartY = boxOne[0,1]
    boxOneEndY = boxOne[2,1]
    boxTwoStartY = boxTwo[0,1]
    boxTwoEndY = boxTwo[2,1]
    # True if box two starts inside box one, ends inside box one, or spans it vertically
    return bool((boxTwoStartY <= boxOneEndY and boxTwoStartY >= boxOneStartY)
        or (boxTwoEndY <= boxOneEndY and boxTwoEndY >= boxOneStartY)
        or (boxTwoEndY >= boxOneEndY and boxTwoStartY <= boxOneStartY))

def segmentLines(box_group):
    # sort by ascending startY value (top left corner of box - [startX,startY], [endX,startY], [endX,endY], [startX, endY])
    box_group = box_group[np.argsort(box_group[:, 0, 1])]

    lined_box_group = np.zeros(box_group.shape)
    sorted_box_group = np.zeros(box_group.shape)

    # list of indexes
    temp = []
    i = 0

    # check if there is more than one box in the box_group
    if len(box_group) > 1:
        while i < len(box_group):
            for j in range(i + 1, len(box_group)):
                if(isOnSameLine(box_group[i],box_group[j])):
                    # print(str(i) + " and " + str(j) + " on same line")
                    if i not in temp:
                        temp.append(i)
                    if j not in temp:
                        temp.append(j)
                # else:
                    # print(str(i) + " and " + str(j) + " not on same line")
            # append temp with i if the current box (i) is not on the same line with any other box
            if len(temp) == 0:
                temp.append(i)
            
            # put boxes on same line into lined_box_group array
            lined_box_group = box_group[np.array(temp)]
            # sort boxes by startX value
            lined_box_group = lined_box_group[np.argsort(lined_box_group[:, 0, 0])]
            # copy sorted boxes on same line into sorted_box_group
            sorted_box_group[i:temp[-1]+1] = lined_box_group
            
            # skip to the index of the box that is not on the same line
            i = temp[-1] + 1
            # clear list of indexes
            temp = []
    else:
        # there is only one box in the box_group, so just return it unchanged
        # print("only one box in boxgroup")
        sorted_box_group = box_group
        
    return sorted_box_group

I get the actual frame to analyse for text from a cv2.VideoCapture and then do the following:

box_groups = detector.detect([frame])
box_group = box_groups[0]

if len(box_group) > 0:
    # sort bounding boxes into lines
    sorted_box_group = segmentLines(box_group)
    # recognize text
    recognizedtext = recognizer.recognize_from_boxes([frame], [sorted_box_group])
    text = " ".join(recognizedtext[0])
else:
    text = ""
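
Joining everything with spaces drops the line breaks that the original question asked about. A possible way to keep them, sketched under the assumption that the recognized words come back in the same order as the sorted boxes, is to start a new output line whenever a box no longer overlaps vertically with the first box of the current line. The helper name `words_to_text` is hypothetical.

```python
def words_to_text(words, boxes):
    """Join words with spaces, inserting a newline where a new line of boxes starts.

    Hypothetical helper: assumes words[k] belongs to boxes[k] and that the
    boxes are already ordered line by line, left to right within each line.
    """
    lines = []
    anchor = None  # (top, bottom) of the first box on the current line
    for word, box in zip(words, boxes):
        top, bottom = box[0][1], box[2][1]
        if anchor is not None and top <= anchor[1] and bottom >= anchor[0]:
            # Vertical extents intersect: same line as the anchor box.
            lines[-1].append(word)
        else:
            # No overlap (or very first word): start a new line.
            lines.append([word])
            anchor = (top, bottom)
    return "\n".join(" ".join(line) for line in lines)
```

With the snippet above, this would be called as `words_to_text(recognizedtext[0], sorted_box_group)` in place of the `" ".join(...)` line.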

@faustomorales faustomorales added the enhancement New feature or request label Aug 16, 2020
@Duv54

Duv54 commented Sep 11, 2020

Do you have any ideas for grouping the boxes into blocks / paragraphs? I tried to find algorithms to do this but failed. I only managed to improve the line segmenter by taking into account the y-distance between the centers of the two boxes.

Thank you
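
The center-distance refinement mentioned above was not posted in the thread, so the following is only a guess at what such a test might look like. The function name `same_line_by_centers` and the `max_ratio` parameter are assumptions made for illustration.

```python
import numpy as np


def same_line_by_centers(box_a, box_b, max_ratio=0.5):
    """Treat two boxes as being on one line if their vertical centers are close.

    Hypothetical sketch: the centers must be within `max_ratio` times the
    height of the shorter box. Boxes are 4x2 arrays of (x, y) corners.
    """
    a = np.asarray(box_a, dtype=float)
    b = np.asarray(box_b, dtype=float)

    # Vertical center and height of each box, computed from all four corners.
    center_a, center_b = a[:, 1].mean(), b[:, 1].mean()
    height_a = a[:, 1].max() - a[:, 1].min()
    height_b = b[:, 1].max() - b[:, 1].min()

    return bool(abs(center_a - center_b) <= max_ratio * min(height_a, height_b))
```

Compared with a pure overlap test, this rejects pairs that only touch at their edges, such as a tall box that slightly overlaps the line below it.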
