
downsampling with grid_sample doesn't match interpolate #21457

@fmassa

I'm moving this discussion here from #20785 (comment), as it is unrelated to grid_sample not being aligned.


There is another point I'd like to make about grid_sample: even after we make this change to make it aligned, it still won't match interpolate 1:1 in some cases.

Indeed, if we use grid_sample to downsample an image with bilinear interpolation, each output value is always computed from the 4 input pixels closest to the sampling location in image space, regardless of the downsampling factor. For large downsampling factors, this makes the bilinear interpolation look almost like nearest-neighbor interpolation.

Here is where this is defined:

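// The NE, SW and SE corners sit at fixed +1 pixel offsets from the
// north-west pixel (ix_nw, iy_nw), whatever the spacing of the grid.
int ix_ne = ix_nw + 1;
int iy_ne = iy_nw;
int ix_sw = ix_nw;
int iy_sw = iy_nw + 1;
int ix_se = ix_nw + 1;
int iy_se = iy_nw + 1;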

We might want to replace the +1 with the corresponding image offset, obtained by looking up the coordinates of the next elements in the grid, to get smoother interpolation. But I haven't thought about it thoroughly, and we might be overfitting to a particular case. In particular, I don't know how what I proposed just above would behave with grids that have holes.
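To make the nearest-neighbor-like behaviour concrete, here is a minimal pure-Python sketch (not the PyTorch implementation). It assumes the sampling location for each output pixel is the center of its block, which is the `align_corners=False` convention, and the helper names `bilinear_sample`, `downsample_bilinear` and `downsample_area` are made up for illustration.

```python
import math

def bilinear_sample(img, x, y):
    """Bilinearly sample `img` (a list of rows) at continuous pixel coords (x, y)."""
    h, w = len(img), len(img[0])
    x_nw, y_nw = math.floor(x), math.floor(y)
    tx, ty = x - x_nw, y - y_nw  # fractional offsets inside the pixel cell

    def px(ix, iy):
        # Clamp to the border so the sketch never indexes out of range.
        return img[min(max(iy, 0), h - 1)][min(max(ix, 0), w - 1)]

    # Only these 4 pixels contribute, matching the fixed +1 offsets quoted above.
    return (px(x_nw, y_nw) * (1 - tx) * (1 - ty)
            + px(x_nw + 1, y_nw) * tx * (1 - ty)
            + px(x_nw, y_nw + 1) * (1 - tx) * ty
            + px(x_nw + 1, y_nw + 1) * tx * ty)

def downsample_bilinear(img, factor):
    """Sample once at the center of each factor x factor block."""
    h, w = len(img), len(img[0])
    return [[bilinear_sample(img, (ox + 0.5) * factor - 0.5, (oy + 0.5) * factor - 0.5)
             for ox in range(w // factor)]
            for oy in range(h // factor)]

def downsample_area(img, factor):
    """Average every pixel in each factor x factor block."""
    h, w = len(img), len(img[0])
    return [[sum(img[oy * factor + dy][ox * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for ox in range(w // factor)]
            for oy in range(h // factor)]

# 8x8 image that is zero except for one bright pixel in the corner.
img = [[0.0] * 8 for _ in range(8)]
img[0][0] = 64.0

bilinear_out = downsample_bilinear(img, 8)  # reads only pixels (3..4, 3..4)
area_out = downsample_area(img, 8)          # reads all 64 pixels
print(bilinear_out, area_out)  # [[0.0]] [[1.0]]
```

With a downsampling factor of 8, the bilinear result ignores the bright pixel entirely, while the area average still reflects it.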


Comment from @bnehoran in #20785 (comment):

I am not sure I understand how what you are suggesting would work for a more general grid.
First of all, if I understand what you are suggesting, you are proposing to allow grid_sample to also do the job of upsampling and/or downsampling tensors, right?
Just to avoid confusion: as you mention, this is different from the point made in this issue, which is more about using the two functionalities in conjunction with one another (that is, using grid_sample on a tensor that has been upsampled or downsampled using interpolate). Incidentally, though, it seems that these changes should also cover the case of using grid_sample to upsample/downsample, for the most part.

I agree that calling grid_sample on a tensor using an identity grid of larger size than the tensor should have the same effect as upsampling it using interpolate with bi/tri/linear modes, and I think these changes should do that. Similarly, attempting to downsample a tensor by using grid_sample with an identity grid that is smaller than the tensor would bi/tri/linearly interpolate between the nearest whole pixels (note: rather than average pooling over the nearby area), which I believe should also be equivalent to the bi/tri/linear modes of interpolate. I think this equivalence is what you meant to point out, right?
The nearest modes should similarly become equivalent.
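The claimed equivalence can be illustrated with a 1D pure-Python model (a sketch, not PyTorch code). It assumes the corner-aligned convention, where normalized coordinates -1 and +1 map to the first and last pixel centers. The function names are hypothetical stand-ins for grid_sample and interpolate, not their real signatures.

```python
import math

def unnormalize(coord, size):
    # Corner-aligned convention (assumed): -1 -> pixel 0, +1 -> pixel size - 1.
    return (coord + 1) / 2 * (size - 1)

def linear_sample(signal, x):
    """Linearly interpolate between the two pixels nearest to position x."""
    x0 = math.floor(x)
    x1 = min(x0 + 1, len(signal) - 1)  # clamp at the right border
    t = x - x0
    return signal[x0] * (1 - t) + signal[x1] * t

def grid_sample_1d(signal, grid):
    """Stand-in for grid_sample: sample at arbitrary normalized coordinates."""
    return [linear_sample(signal, unnormalize(g, len(signal))) for g in grid]

def identity_grid(n):
    """n evenly spaced normalized coordinates covering [-1, 1] (n >= 2)."""
    return [-1 + 2 * i / (n - 1) for i in range(n)]

def interpolate_linear_1d(signal, out_size):
    """Stand-in for interpolate: corner-aligned linear resize to out_size."""
    scale = (len(signal) - 1) / (out_size - 1)
    return [linear_sample(signal, i * scale) for i in range(out_size)]

signal = [0.0, 10.0, 20.0, 30.0]
via_grid = grid_sample_1d(signal, identity_grid(7))  # identity grid larger than the input
via_resize = interpolate_linear_1d(signal, 7)

# Both routes agree up to floating-point rounding.
agree = all(abs(a - b) < 1e-9 for a, b in zip(via_grid, via_resize))
print(agree)
```

In PyTorch itself, the analogous comparison would be between `F.grid_sample` and `F.interpolate` called with matching `align_corners` settings. That comparison is not run here.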

There would just be no equivalent to the area mode of interpolate. However, it's not immediately clear to me how such a mode should really behave for grid_sample. If the identity grid is evenly spaced, then all is well, since the rectangular areas that each grid pixel covers are well defined. However, as soon as the grid starts deviating significantly from the identity (for example, in an extreme case, suppose that multiple grid points land on the same location of the sampled tensor), how do you then define the "area" covered by each grid point?
Maybe there does exist some way to define it that is both natural and consistent, but at the very least, it seems non-trivial.
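One way to see the difficulty is to try a candidate definition and watch it degenerate. The sketch below is in 1D and pure Python. It uses a hypothetical rule that is not proposed anywhere in this thread: each grid point's footprint is the average spacing to its neighbouring grid points, in input-pixel units.

```python
def footprint_widths(coords):
    """Hypothetical footprint per grid point: mean spacing to its neighbours.

    `coords` holds the sampling locations in input-pixel units (len >= 2).
    Interior points use a central difference, end points a one-sided one.
    """
    n = len(coords)
    widths = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        widths.append(abs(coords[hi] - coords[lo]) / (hi - lo))
    return widths

# Evenly spaced identity-like grid: every point covers a well-defined 2-pixel area.
even = footprint_widths([0.5, 2.5, 4.5, 6.5])

# All grid points land on the same location: every footprint collapses to zero.
collapsed = footprint_widths([3.0, 3.0, 3.0, 3.0])

# Folded grid: interior points get zero width although their neighbours are far away.
folded = footprint_widths([0.5, 4.5, 0.5, 4.5])

print(even, collapsed, folded)
```

Under this particular rule the evenly spaced case behaves as expected, but the collapsed and folded grids give footprints that are hard to interpret as an "area". Other rules may behave differently.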

cc @fmassa @vfdev-5
