
Network Predictions and Ground Truth Segmentations Should Match in Shape #266

Closed

siavashk opened this issue Apr 28, 2019 · 2 comments

siavashk (Contributor) commented Apr 28, 2019

I know this issue has been raised multiple times before. I have gone through the issues, both open and closed, and many people have the same or a related question.

There are two classes of issues related to this:

  1. People who directly ask how to get a prediction that matches their input in width and height. For example, see prediction.shape different from input.shape #41, How to get output which is in same size as input image? #138, Question about padding #175, and Output dimension #183.

  2. People who ask about training with padding='SAME' instead of 'VALID'. They do this because they do not know how to properly align the prediction with the input. For example, see Padding Options #93, Question about padding #175, and Error: logits and labels must be broadcastable: logits_size=[1447680,2] labels_size=[0,2] #215.

There have been three kinds of responses:

  1. This is expected, because the original paper implemented it this way.

  2. Simply pad the input so that the prediction matches the unpadded input in size, for example here and here.

  3. Resize the prediction to match the input, as mentioned here.

The first response, while correct, is not really helpful. The second and third responses are simply incorrect. Padding the input can change the distribution of pixels in the input image, which can introduce errors into the prediction. The third suggestion is also wrong because the prediction map is both downsampled and shifted (i.e. spatially translated) with respect to the input, so upsampling the prediction map without accounting for the shift results in a misalignment; the sketch below makes the shrinkage and shift concrete.
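As a minimal sketch (the function name and defaults are mine, chosen to match the original paper's configuration rather than tf_unet's defaults), here is a size trace through a valid-padding U-Net with two 3x3 convolutions per level and 2x pooling. For the paper's 572-pixel input tiles it reproduces the 388-pixel output, i.e. a crop of (572 - 388) / 2 = 92 pixels on each side that a naive resize would ignore:

def valid_unet_output_size(in_size, layers=5, filter_size=3, pool_size=2):
    """Trace the spatial size through a valid-padding U-Net
    (two convs per level, 2x pooling) and return the output size."""
    shrink = filter_size - 1              # pixels lost per valid convolution
    size = in_size
    # contracting path
    for _ in range(layers - 1):
        size -= 2 * shrink                # two valid convs per level
        assert size % pool_size == 0, "size must be divisible before pooling"
        size //= pool_size
    size -= 2 * shrink                    # bottom level
    # expanding path
    for _ in range(layers - 1):
        size = size * pool_size - 2 * shrink  # upsample, then two valid convs
    return size

print(valid_unet_output_size(572))        # 388, as in the original U-Net paper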

What this repository is missing is a function that is the inverse of crop_and_concat:
https://github.com/jakeret/tf_unet/blob/master/tf_unet/layers.py#L50

I am going to write this because I need it for my own research.

@siavashk siavashk changed the title Network Predictions and Ground Truth Segmentation Should Match in Shape The Network Prediction and Ground Truth Segmentation Should Match in Shape Apr 28, 2019
@siavashk siavashk changed the title The Network Prediction and Ground Truth Segmentation Should Match in Shape Network Predictions and Ground Truth Segmentations Should Match in Shape Apr 28, 2019
siavashk (Contributor, Author) commented

I made a mistake: the relevant piece of code is not crop_and_concat, it is actually crop_to_shape.
I added an inverse function, expand_to_shape, that pads the prediction so that it aligns with the input.

siavashk (Contributor, Author) commented

import numpy as np

def expand_to_shape(data, shape, border=0):
    """
    Expands the array to the given image shape by padding it with a border
    (expects a tensor of shape [batches, nx, ny, channels]).

    :param data: the array to expand
    :param shape: the target shape
    :param border: the constant value to pad with
    """
    diff_nx = shape[1] - data.shape[1]
    diff_ny = shape[2] - data.shape[2]

    offset_nx_left = diff_nx // 2
    offset_ny_left = diff_ny // 2

    expanded = np.full(shape, border, dtype=np.float32)
    # Use explicit end indices rather than negative ones so that a zero
    # border on either axis does not produce an empty slice
    # (data[0:-0] would select nothing).
    expanded[:,
             offset_nx_left:offset_nx_left + data.shape[1],
             offset_ny_left:offset_ny_left + data.shape[2]] = data

    return expanded
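
For reference, a minimal usage sketch (the shapes are hypothetical, borrowed from the original paper's 572 -> 388 configuration):

import numpy as np

# hypothetical shapes: a 572x572 input tile and the 388x388
# valid-padding prediction it produces (two classes)
x = np.zeros((1, 572, 572, 1), dtype=np.float32)
prediction = np.zeros((1, 388, 388, 2), dtype=np.float32)

# pad the prediction back out so it is centered on, and aligned with, the input
aligned = expand_to_shape(prediction, (1, 572, 572, 2), border=0)
assert aligned.shape[1:3] == x.shape[1:3]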
