Simplify gen_txt.py #12

Closed
philipp-schmidt opened this issue Jan 9, 2021 · 3 comments

@philipp-schmidt
Contributor

philipp-schmidt commented Jan 9, 2021

I believe gen_txt.py is doing more work than it needs to. It should not be necessary to process the dataset separately for each input resolution of the network, because YOLO label coordinates are basically UV coordinates: they are normalized by the dimensions of the source image. (Unless you do letterbox rescaling, I guess...)

AlexeyAB/Yolo_mark#60 (comment)

Current implementation:

def txt_line(cls, bbox, img_w, img_h):
    """Generate 1 line in the txt file."""
    assert INPUT_WIDTH > 0 and INPUT_HEIGHT > 0
    x, y, w, h = bbox
    x = max(int(x), 0)
    y = max(int(y), 0)
    w = min(int(w), img_w - x)
    h = min(int(h), img_h - y)
    w_rescaled = float(w) * INPUT_WIDTH  / img_w
    h_rescaled = float(h) * INPUT_HEIGHT / img_h
    if w_rescaled < MIN_W or h_rescaled < MIN_H:
        return ''
    else:
        cx = (x + w / 2.) / img_w
        cy = (y + h / 2.) / img_h
        nw = float(w) / img_w
        nh = float(h) / img_h
        return '%d %.6f %.6f %.6f %.6f\n' % (cls, cx, cy, nw, nh)

Implementation from https://github.com/theAIGuysCode/OIDv4_ToolKit:

# function that turns XMin, YMin, XMax, YMax coordinates to normalized yolo format
def convert(filename_str, coords):
    os.chdir("..")
    image = cv2.imread(filename_str + ".jpg")
    coords[2] -= coords[0]
    coords[3] -= coords[1]
    x_diff = int(coords[2]/2)
    y_diff = int(coords[3]/2)
    coords[0] = coords[0]+x_diff
    coords[1] = coords[1]+y_diff
    coords[0] /= int(image.shape[1])
    coords[1] /= int(image.shape[0])
    coords[2] /= int(image.shape[1])
    coords[3] /= int(image.shape[0])
    os.chdir("Label")
    return coords

Note the lack of INPUT_WIDTH and INPUT_HEIGHT.
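
To illustrate the point, here is a minimal sketch (not code from either repository) showing that a label normalized once can be scaled to any network input size afterwards. The helper names `to_yolo` and `to_network_pixels` are hypothetical, the box is assumed to be given as top-left corner plus width and height in pixels, and a plain resize without letterboxing is assumed.

```python
def to_yolo(bbox, img_w, img_h):
    """Convert a pixel box (x, y, w, h), top-left corner plus size,
    to normalized YOLO format (cx, cy, w, h)."""
    x, y, w, h = bbox
    return ((x + w / 2.0) / img_w,
            (y + h / 2.0) / img_h,
            w / img_w,
            h / img_h)


def to_network_pixels(norm_bbox, net_w, net_h):
    """Scale a normalized box to a network input size.
    Assumes a plain resize (no letterbox padding)."""
    cx, cy, nw, nh = norm_bbox
    return (cx * net_w, cy * net_h, nw * net_w, nh * net_h)


# The label is computed once, independent of the network input size.
norm = to_yolo((100, 50, 200, 100), img_w=800, img_h=400)
print(norm)                                # (0.25, 0.25, 0.25, 0.25)

# The same label serves both common input resolutions.
print(to_network_pixels(norm, 416, 416))   # (104.0, 104.0, 104.0, 104.0)
print(to_network_pixels(norm, 608, 608))   # (152.0, 152.0, 152.0, 152.0)
```

With letterbox rescaling the padding offsets would have to be applied as well, which is the caveat mentioned at the top.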

@philipp-schmidt
Contributor Author

philipp-schmidt commented Jan 9, 2021

Do you require this parameter to filter small objects? Or can we calculate the size of objects independent of network input size?

@jkjung-avt
Owner

Do you require this parameter to filter small objects?

Yes. By filtering out objects that are too small, we would be asking the YOLO model to learn more realistic targets. That produces an object detector that performs better at inference time. (Don't just take my word for it. You could experiment and verify it yourself.)

@philipp-schmidt
Contributor Author

Hi, I totally agree with you regarding the performance improvements. However, by replacing the minimum height/width with thresholds relative to the image size, you can omit the input dimensions of the network and prepare the dataset only once for all input resolutions, which is more convenient for MLOps pipelines.

MIN_W = 0.01
MIN_H = 0.01
#
w_rescaled = float(w) / img_w
h_rescaled = float(h) / img_h
if w_rescaled < MIN_W or h_rescaled < MIN_H:
    return ''

0.01 times 608 is roughly 6 pixels and 0.01 times 416 is roughly 4 pixels, so you get about the same result for both common YOLOv4 configurations without having to know the input dimensions in advance.
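
For completeness, a sketch of how the whole function might look with the relative threshold folded in. This is only an illustration of the proposal, not the repository's code: the constant names `MIN_REL_W` and `MIN_REL_H` are hypothetical, and the rest follows the current `txt_line` shown earlier.

```python
# Hypothetical names: minimum box size as a fraction of the image size.
MIN_REL_W = 0.01
MIN_REL_H = 0.01


def txt_line(cls, bbox, img_w, img_h):
    """Generate 1 line in the txt file, or '' if the box is too small."""
    x, y, w, h = bbox
    # Clip the box to the image, as in the current implementation.
    x = max(int(x), 0)
    y = max(int(y), 0)
    w = min(int(w), img_w - x)
    h = min(int(h), img_h - y)
    # Normalized size doubles as the filter criterion, so no
    # INPUT_WIDTH / INPUT_HEIGHT is needed.
    nw = float(w) / img_w
    nh = float(h) / img_h
    if nw < MIN_REL_W or nh < MIN_REL_H:
        return ''
    cx = (x + w / 2.) / img_w
    cy = (y + h / 2.) / img_h
    return '%d %.6f %.6f %.6f %.6f\n' % (cls, cx, cy, nw, nh)


print(repr(txt_line(0, (100, 50, 200, 100), 800, 400)))
# '0 0.250000 0.250000 0.250000 0.250000\n'
print(repr(txt_line(0, (10, 10, 4, 40), 800, 400)))
# '' (box is only 0.5% of the image width)
```

The trade-off is that the filter no longer matches one exact pixel size at a given network resolution, only approximately the same size across resolutions.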

This change is very opinionated of course. I included it in my automated data preparation workflow and wanted to share it with you in case you automate this repo in the future. I consider this issue closed.
