Fill arg and _apply_grid_transform improvements #6517

Open
vfdev-5 opened this issue Aug 30, 2022 · 2 comments · May be fixed by #8099

Comments

@vfdev-5
Collaborator

vfdev-5 commented Aug 30, 2022

A few years ago we introduced non-constant fill value handling in _apply_grid_transform using a mask approach:

# Append a dummy mask for customized fill colors, should be faster than grid_sample() twice
if fill is not None:
    dummy = torch.ones((img.shape[0], 1, img.shape[2], img.shape[3]), dtype=img.dtype, device=img.device)
    img = torch.cat((img, dummy), dim=1)
img = grid_sample(img, grid, mode=mode, padding_mode="zeros", align_corners=False)
# Fill with required color
if fill is not None:
    mask = img[:, -1:, :, :]  # N * 1 * H * W
    img = img[:, :-1, :, :]  # N * C * H * W
    mask = mask.expand_as(img)
    len_fill = len(fill) if isinstance(fill, (tuple, list)) else 1
    fill_img = torch.tensor(fill, dtype=img.dtype, device=img.device).view(1, len_fill, 1, 1).expand_as(img)
    if mode == "nearest":
        mask = mask < 0.5
        img[mask] = fill_img[mask]
    else:  # 'bilinear'
        img = img * mask + (1.0 - mask) * fill_img
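To see what the appended mask channel contains in practice, here is a minimal sketch that reproduces the trick outside torchvision. The helper name rotate_with_mask and the 32x32 all-ones test image are assumptions made for illustration; only affine_grid and grid_sample are real torch.nn.functional calls.

```python
import math

import torch
import torch.nn.functional as F


def rotate_with_mask(img, angle_deg, mode="bilinear"):
    """Rotate an (N, C, H, W) float image; return (sampled image, coverage mask).

    Hypothetical helper, not part of torchvision.
    """
    theta = math.radians(angle_deg)
    # 2x3 affine matrix expressed in the normalized [-1, 1] coordinates of affine_grid
    mat = torch.tensor(
        [[math.cos(theta), -math.sin(theta), 0.0],
         [math.sin(theta), math.cos(theta), 0.0]],
        dtype=img.dtype,
    ).unsqueeze(0).repeat(img.shape[0], 1, 1)
    grid = F.affine_grid(mat, list(img.shape), align_corners=False)
    # Append an all-ones channel: after sampling with zero padding it holds,
    # for every output pixel, the total interpolation weight that fell inside the image.
    ones = torch.ones_like(img[:, :1])
    out = F.grid_sample(
        torch.cat((img, ones), dim=1), grid,
        mode=mode, padding_mode="zeros", align_corners=False,
    )
    return out[:, :-1], out[:, -1:]


img = torch.ones(1, 3, 32, 32)
sampled, mask = rotate_with_mask(img, 50.0)
# Fractional mask values appear only along the rotated border in bilinear mode.
print(mask.min().item(), mask.max().item())
```

For an all-ones image the sampled channels coincide with the mask, which makes the border darkening easy to inspect.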

There are a few minor problems with this approach:

  1. If we pass fill = [0.0, ], we would expect a result similar to fill=None. This is not exactly true for bilinear interpolation mode, where the sampled image is linearly blended with the fill image:
         else:  # 'bilinear'
             img = img * mask + (1.0 - mask) * fill_img

Most probably, we would like to skip creating fill_img for all fill values that have sum(fill) == 0, since grid_sample already pads with zeros.

- if fill is not None:
+ if fill is not None and sum(fill) > 0:
  2. The linear interpolation between fill_img and img may be replaced by directly applying a mask:
         mask = mask < 0.9999
         img[mask] = fill_img[mask]

That would better match PIL Image behaviour and would replace the current branch:

else:  # 'bilinear'
    img = img * mask + (1.0 - mask) * fill_img
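Both points can be checked on a single border pixel with plain arithmetic. The sketch below is a simplified model, not torchvision code: it assumes that all in-bounds neighbours of the pixel share one value, so grid_sample with zero padding returns pixel_value * inside_weight for the image channel and inside_weight for the mask channel. The function name border_pixel and its parameters are hypothetical.

```python
def border_pixel(pixel_value, inside_weight, fill=None, threshold=None):
    """Model one output pixel whose bilinear footprint is only partly inside the image."""
    sampled = pixel_value * inside_weight  # image channel after zero padding
    mask = inside_weight                   # appended all-ones channel after zero padding
    if fill is None:
        return sampled
    if threshold is not None:
        # Point 2: hard replacement instead of blending
        return fill if mask < threshold else sampled
    # Current bilinear branch
    return sampled * mask + (1.0 - mask) * fill


no_fill = border_pixel(1.0, 0.5)                                  # 0.5
zero_fill = border_pixel(1.0, 0.5, fill=0.0)                      # 0.25, not 0.5
white_fill = border_pixel(1.0, 0.5, fill=1.0)                     # 0.75, darker than image and fill
hard_white = border_pixel(1.0, 0.5, fill=1.0, threshold=0.9999)   # 1.0
print(no_fill, zero_fill, white_fill, hard_white)
```

Under this model the blend multiplies by the mask a second time, which is why fill = [0.0, ] differs from fill=None and why a white image on a white fill still shows a darker seam.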

image

cc @datumbox

@pmeier
Collaborator

pmeier commented Nov 7, 2023

Since we have another report in #8083, do we want to tackle this? IMO, we should just align the two branches

if mode == "nearest":
    bool_mask = mask < 0.5
    float_img[bool_mask] = fill_img.expand_as(float_img)[bool_mask]
else:  # 'bilinear'
    # The following is mathematically equivalent to:
    # img * mask + (1.0 - mask) * fill = img * mask - fill * mask + fill = mask * (img - fill) + fill
    float_img = float_img.sub_(fill_img).mul_(mask).add_(fill_img)

with something like

bool_mask = mask < 1
float_img[bool_mask] = fill_img.expand_as(float_img)[bool_mask] 

This removes the blending and in turn the "shadow" for bilinear interpolation. Plus, this is equivalent for nearest interpolation, since the mask in that case only contains 0.0 and 1.0 entries.
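A small sketch of the comparison on a toy tensor, for anyone who wants to check the claims numerically. The helper names blend_fill and hard_fill and the 1 x 1 x 1 x 3 test strip are assumptions for illustration, not torchvision functions; the strip models a white image whose sampled values equal the mask.

```python
import torch


def blend_fill(float_img, mask, fill_img):
    # Current bilinear branch, written out of place: mask * (img - fill) + fill
    return (float_img - fill_img) * mask + fill_img


def hard_fill(float_img, mask, fill_img, threshold=1.0):
    # Proposed replacement: overwrite every pixel whose mask value is below the threshold
    out = float_img.clone()
    bool_mask = (mask < threshold).expand_as(out)
    out[bool_mask] = fill_img.expand_as(out)[bool_mask]
    return out


# Toy strip: fully outside, on the border, fully inside.
mask = torch.tensor([0.0, 0.5, 1.0]).view(1, 1, 1, 3)
float_img = mask.clone()  # white image sampled with zero padding
fill_img = torch.tensor([1.0]).view(1, 1, 1, 1)  # white fill

blended = blend_fill(float_img, mask, fill_img)  # border pixel becomes 0.75: the "shadow"
hard = hard_fill(float_img, mask, fill_img)      # border pixel becomes 1.0
inplace = float_img.clone().sub_(fill_img).mul_(mask).add_(fill_img)

# Nearest mode only yields 0.0 and 1.0 in the mask, so both thresholds agree.
nn_mask = torch.tensor([0.0, 1.0, 1.0]).view(1, 1, 1, 3)
nn_img = nn_mask.clone()
same = torch.equal(
    hard_fill(nn_img, nn_mask, fill_img, threshold=0.5),
    hard_fill(nn_img, nn_mask, fill_img, threshold=1.0),
)
print(blended.flatten().tolist(), hard.flatten().tolist(), same)
```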

@vfdev-5
Collaborator Author

vfdev-5 commented Nov 7, 2023

@pmeier the value 0.9999 for the mask threshold was sort of on purpose. In the example from the description, an affine rotation by 50 degrees in bilinear mode creates a rotated mask with the following unique values:

tensor([0.00000000, 0.02883029, 0.02883148, 0.10955429, 0.10955477, 0.11125469,
        0.11125565, 0.19197845, 0.19197917, 0.19367909, 0.19367981, 0.27440262,
        0.27440357, 0.35512805, 0.35512924, 0.35682678, 0.35682797, 0.43755341,
        0.43755519, 0.43925095, 0.43925512, 0.51997960, 0.51998138, 0.60240537,
        0.60240555, 0.68312985, 0.68313217, 0.68482971, 0.68482977, 0.76555562,
        0.76555634, 0.76725388, 0.76725554, 0.84798002, 0.84798050, 0.92870331,
        0.92870587, 0.93040466, 0.93040580, 0.99999994, 1.00000000])

and 0.99999994 can appear inside the mask:

plt.imshow(((mask > 0.999) & (mask < 1.0))[0, 0, ...], interpolation="none")

image

so, using mask < 1 gives:
image
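The effect of the two thresholds on such a value can be checked without torch. In the sketch below, to_float32 is a hypothetical helper that rounds a Python float to float32 via struct; the assumption is that 0.99999994 in the printout is the largest float32 below 1.0, i.e. 1 - 2**-24, presumably produced by rounding in the interpolation weights.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest float32 value (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


near_one = to_float32(1.0 - 2.0 ** -24)  # largest float32 below 1.0
print(f"{near_one:.8f}")                 # 0.99999994

filled_with_strict = near_one < 1.0      # True: an interior pixel would be replaced by fill
filled_with_margin = near_one < 0.9999   # False: the pixel keeps its sampled value
print(filled_with_strict, filled_with_margin)
```

A threshold slightly below 1 leaves these interior pixels untouched, at the cost of also keeping border pixels whose mask lies between the threshold and 1.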
