Simplify gen_txt.py #12

philipp-schmidt · 2021-01-09T18:59:02Z

I believe gen_txt.py is doing too much work. It should not be necessary to process the dataset separately for each network input resolution, because the yolo label coordinates are basically UV coordinates: they are relative to the dimensions of the input image. (Unless you do letterbox rescaling, I guess...)

AlexeyAB/Yolo_mark#60 (comment)

Current implementation:

def txt_line(cls, bbox, img_w, img_h):
    """Generate 1 line in the txt file."""
    assert INPUT_WIDTH > 0 and INPUT_HEIGHT > 0
    x, y, w, h = bbox
    x = max(int(x), 0)
    y = max(int(y), 0)
    w = min(int(w), img_w - x)
    h = min(int(h), img_h - y)
    w_rescaled = float(w) * INPUT_WIDTH  / img_w
    h_rescaled = float(h) * INPUT_HEIGHT / img_h
    if w_rescaled < MIN_W or h_rescaled < MIN_H:
        return ''
    else:
        cx = (x + w / 2.) / img_w
        cy = (y + h / 2.) / img_h
        nw = float(w) / img_w
        nh = float(h) / img_h
        return '%d %.6f %.6f %.6f %.6f\n' % (cls, cx, cy, nw, nh)

Implementation from https://github.com/theAIGuysCode/OIDv4_ToolKit:

# function that turns XMin, YMin, XMax, YMax coordinates to normalized yolo format
def convert(filename_str, coords):
    os.chdir("..")
    image = cv2.imread(filename_str + ".jpg")
    coords[2] -= coords[0]
    coords[3] -= coords[1]
    x_diff = int(coords[2]/2)
    y_diff = int(coords[3]/2)
    coords[0] = coords[0]+x_diff
    coords[1] = coords[1]+y_diff
    coords[0] /= int(image.shape[1])
    coords[1] /= int(image.shape[0])
    coords[2] /= int(image.shape[1])
    coords[3] /= int(image.shape[0])
    os.chdir("Label")
    return coords

Note the lack of INPUT_WIDTH and INPUT_HEIGHT.
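
To illustrate the point about relative coordinates, here is a minimal sketch (not code from either repo) that normalizes the same box before and after a plain, non-letterboxed resize to two network input sizes. The helper names `to_yolo` and `resize_box` and the sample box are hypothetical; the check assumes the resize stretches the whole image to the network size, which is exactly the case where the letterbox caveat above does not apply.

```python
import math

def to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert corner coordinates to normalized (cx, cy, w, h)."""
    w = xmax - xmin
    h = ymax - ymin
    return ((xmin + w / 2.0) / img_w,
            (ymin + h / 2.0) / img_h,
            w / img_w,
            h / img_h)

def resize_box(box, img_w, img_h, net_w, net_h):
    """Scale a corner-format box as a plain (stretching) resize would."""
    sx, sy = net_w / img_w, net_h / img_h
    xmin, ymin, xmax, ymax = box
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)

box = (120, 80, 360, 240)   # hypothetical box in pixels
img_w, img_h = 640, 480     # hypothetical source image size

original = to_yolo(*box, img_w, img_h)
for net in (416, 608):
    resized = to_yolo(*resize_box(box, img_w, img_h, net, net), net, net)
    # The normalized label is the same regardless of the network input size.
    assert all(math.isclose(a, b) for a, b in zip(original, resized))
```

The scale factors cancel during normalization, so the label line does not change with the network resolution.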

philipp-schmidt · 2021-01-09T19:36:27Z

Do you require this parameter to filter small objects? Or can we calculate the size of objects independent of network input size?

jkjung-avt · 2021-01-10T01:33:35Z

> Do you require this parameter to filter small objects?

Yes. By filtering out objects that are too small, we would be asking the yolo model to learn more realistic targets. It would produce an object detector that performs better at inference time. (Don't just take my word for it. You could experiment and verify it yourself.)

philipp-schmidt · 2021-01-10T01:43:39Z

Hi, I totally agree with you regarding the performance improvements. However, by replacing the height/width with relative targets you can omit the input dimensions of the network and prepare the dataset only once for all input resolutions, which is more convenient for MLOps pipelines.

MIN_W = 0.01
MIN_H = 0.01
#
w_rescaled = float(w) / img_w
h_rescaled = float(h) / img_h
if w_rescaled < MIN_W or h_rescaled < MIN_H:
    return ''

0.01 times 608 is roughly 6 pixels and 0.01 times 416 is roughly 4 pixels, so you get about the same result for both common yolov4 configurations without having to know the input dimensions in advance.
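
Put together, the fragment above could be folded into the existing function roughly as follows. This is only a sketch: the name `txt_line_relative` and the `min_w`/`min_h` keyword arguments are hypothetical, and the clamping and output format are copied from the current `txt_line`.

```python
MIN_W = 0.01  # minimum box width as a fraction of the image width
MIN_H = 0.01  # minimum box height as a fraction of the image height

def txt_line_relative(cls, bbox, img_w, img_h, min_w=MIN_W, min_h=MIN_H):
    """Generate 1 line in the txt file, filtering by relative box size."""
    x, y, w, h = bbox
    # Clamp the box to the image, as the current implementation does.
    x = max(int(x), 0)
    y = max(int(y), 0)
    w = min(int(w), img_w - x)
    h = min(int(h), img_h - y)
    nw = float(w) / img_w
    nh = float(h) / img_h
    # Drop boxes that are too small relative to the image itself;
    # no network input size is needed here.
    if nw < min_w or nh < min_h:
        return ''
    cx = (x + w / 2.) / img_w
    cy = (y + h / 2.) / img_h
    return '%d %.6f %.6f %.6f %.6f\n' % (cls, cx, cy, nw, nh)

# Pixel size the relative threshold corresponds to at each input size.
for side in (416, 608):
    print('%d -> %.2f px' % (side, MIN_W * side))
```

For example, `txt_line_relative(0, (100, 50, 200, 100), 1000, 500)` keeps the box, while a 5x5 box in a 1000x1000 image is dropped because it covers less than 1% of each dimension.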

This change is very opinionated of course. I was including this in my automated data preparation workflow and wanted to share it with you in case you will automate this repo in the future. Considering it closed.

philipp-schmidt closed this as completed Jan 10, 2021

jkjung-avt mentioned this issue Feb 4, 2021

meaning of 608 in "Prepare_data.sh" execution #18

Closed
