Sampling for feature vector vs ground truth mesh #39

Closed
gordon-lim opened this issue Jun 29, 2020 · 8 comments

Comments

@gordon-lim

gordon-lim commented Jun 29, 2020

Hi. I am having some trouble with my understanding. I hope you can enlighten me; I would truly appreciate it! You use spatial sampling with the ground truth mesh. Does this not mean you have a ground truth 3D occupancy field that is incomplete? For corresponding pixels without a ground truth inside/outside label, will their feature vectors and z-values still be fed through PIFu?

I noticed in an earlier paragraph that you mentioned the use of bilinear sampling to obtain the feature vectors. How is the purpose of this sampling different from spatial sampling used with the ground truth meshes?

I also checked out the script you attached in the previous issue.
surface_points, _ = trimesh.sample.sample_surface(mesh, 4 * self.num_sample_inout)
sample_points = surface_points + np.random.normal(scale=self.opt.sigma, size=surface_points.shape)
Is the code above the spatial sampling? I do not see a direct connection to the spatial sampling described in the paper, so I just want to confirm.

Thank you for your patience. I have picked this paper to try and learn as much as possible. Hopefully I'm not annoying you. I look forward to your reply.

@shunsukesaito
Owner

I guess you are confused between sampling of 3d points and sampling of image features. What you mentioned above is basically sampling of 3d points for training. During training, we sample points around ground truth meshes, obtain GT occupancy labels in the data loader, and supervise PIFu prediction with these labels. We found that the final reconstruction quality is highly influenced by this sampling strategy. Please refer to the supplemental material for our ablation study on this.
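To make the point-sampling step concrete, here is a minimal sketch of sampling 3D points near a surface and labeling each one as inside or outside. It is not the repository's data loader: as an assumption, a unit sphere stands in for the ground truth mesh so that the inside/outside test is analytic, and the function name `sample_points_with_labels` is hypothetical. Note that every sampled point receives its own label, so supervision is only ever requested at points where a label exists.

```python
import numpy as np

def sample_points_with_labels(num_points, sigma, rng):
    """Sample points near a unit sphere (stand-in for a GT mesh) with occupancy labels."""
    # Uniform points on the sphere surface: normalize isotropic Gaussian draws.
    directions = rng.normal(size=(num_points, 3))
    surface_points = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    # Perturb the surface points with Gaussian noise, as in the snippet quoted in the question.
    sample_points = surface_points + rng.normal(scale=sigma, size=surface_points.shape)
    # Occupancy label: 1.0 inside the surface, 0.0 outside (analytic for a sphere).
    labels = (np.linalg.norm(sample_points, axis=1) < 1.0).astype(np.float32)
    return sample_points, labels

rng = np.random.default_rng(0)
points, labels = sample_points_with_labels(1000, sigma=0.05, rng=rng)
```

With a real mesh, the analytic test would be replaced by a mesh containment query; the rest of the flow stays the same.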

Image feature sampling (bilinear sampling) has nothing to do with the ground truth data sampling above. In PIFu, we query scalar/vector fields at arbitrary points. To do so, we combine the localized image feature, obtained from the 2D camera projection of each 3D point, with that point's z value. Bilinear sampling appears in this image feature extraction process. Please refer to the query function in

def query(self, points, calibs, transforms=None, labels=None):

for details.
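As a rough illustration of the query step described above, the following NumPy sketch projects 3D points with a calibration matrix, bilinearly samples a precomputed feature map at the projected 2D locations, and appends z to each sampled feature. This is a sketch under stated assumptions, not the repository's `query` implementation: the helper names `project`, `bilinear_sample`, and `query_features` are hypothetical, the projection is assumed to be a plain 4x4 affine transform, coordinates are assumed to already lie in [-1, 1], and any z normalization or MLP that follows is omitted.

```python
import numpy as np

def project(points, calib):
    """Apply a 4x4 calibration matrix to 3D points of shape [3, N]."""
    rot, trans = calib[:3, :3], calib[:3, 3:4]
    return rot @ points + trans  # [3, N]

def bilinear_sample(feat, xy):
    """Sample feat [C, H, W] at normalized xy [2, N], each coordinate in [-1, 1]."""
    _, H, W = feat.shape
    # Map [-1, 1] to pixel coordinates so that -1 and +1 land on the border pixel centers.
    x = (xy[0] + 1.0) * 0.5 * (W - 1)
    y = (xy[1] + 1.0) * 0.5 * (H - 1)
    # Top-left neighbor, clipped so that the +1 neighbor stays inside the map.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    ax, ay = x - x0, y - y0  # fractional offsets used as interpolation weights
    return (feat[:, y0, x0] * (1 - ax) * (1 - ay)
            + feat[:, y0, x0 + 1] * ax * (1 - ay)
            + feat[:, y0 + 1, x0] * (1 - ax) * ay
            + feat[:, y0 + 1, x0 + 1] * ax * ay)  # [C, N]

def query_features(feat, points, calib):
    """Return per-point features [C + 1, N]: sampled image feature plus z."""
    xyz = project(points, calib)
    xy, z = xyz[:2], xyz[2:3]
    return np.concatenate([bilinear_sample(feat, xy), z], axis=0)

feat = np.arange(18, dtype=float).reshape(2, 3, 3)      # toy 2-channel 3x3 feature map
points = np.array([[0.0, -1.0], [0.0, -1.0], [0.5, 0.2]])  # two points as columns (x, y, z)
out = query_features(feat, points, np.eye(4))
```

Here the first point projects to the center pixel and the second to the top-left corner, so the sampled features equal the stored pixel values exactly; points between pixel centers receive a weighted blend.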

@gordon-lim
Author

There's a comment under the query method that says:

Image features should be pre-computed before this call.

So it seems this isn't the code for the image feature extraction. I also could not match the code with what I found online with regard to bilinear resampling.

I did a search on bilinear sampling and got results for bilinear interpolation and bilinear resampling, which have to do with how pixels are filled/removed when making images bigger or smaller. Is this relevant? I recognise that you are using a continuous space instead of pixels. Is bilinear sampling used to get a "pixel value" where the coordinate is not originally on a pixel?

@shunsukesaito
Owner

Note that "computation" of image features through fully convolutional networks and "extraction" of these features are different steps.

I did a search on bilinear sampling and got results for bilinear interpolation and bilinear resampling that got to do with how pixels are filled/removed when making images bigger or smaller. Is this relevant?

Algorithm-wise, bilinear interpolation and bilinear sampling do similar things, but the focus of bilinear sampling is to extract pixel values at non-discretized pixel coordinates (already normalized to [-1, 1]) using the bilinear interpolation scheme. Resizing is just one application of it; it can be used in various other scenarios, such as PIFu.

I recognise that you are using a continuous space instead of pixels. Is bilinear sampling used to get a "pixel value" where the coordinate is not originally on a pixel?

I think you are right. You can refer to https://en.wikipedia.org/wiki/Bilinear_interpolation#:~:text=Bilinear%20interpolation%20is%20performed%20using,quadratic%20in%20the%20sample%20location.
to get a better sense of how these non-discretized coordinates are used to extract a "pixel value".
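For reference, the interpolation can be written out as follows. This is a sketch under assumptions: the feature map $F$ has $H \times W$ pixels, the query location $(u, v)$ is normalized to $[-1, 1]^2$, and $-1$ and $+1$ are taken to map to the centers of the border pixels (the `align_corners=True` convention). The symbols $\alpha$ and $\beta$ are introduced here for illustration.

```latex
x = \frac{u + 1}{2}\,(W - 1), \qquad y = \frac{v + 1}{2}\,(H - 1), \qquad
x_0 = \lfloor x \rfloor, \quad y_0 = \lfloor y \rfloor, \quad
\alpha = x - x_0, \quad \beta = y - y_0

F(u, v) = (1-\alpha)(1-\beta)\,F[y_0, x_0] + \alpha(1-\beta)\,F[y_0, x_0 + 1]
        + (1-\alpha)\,\beta\,F[y_0 + 1, x_0] + \alpha\beta\,F[y_0 + 1, x_0 + 1]
```

As a worked example, take a $2 \times 2$ map with rows $(0, 10)$ and $(20, 30)$ and query $(u, v) = (0, 0)$. Then $x = y = 0.5$, so $\alpha = \beta = 0.5$ and $F(0, 0) = 0.25\,(0 + 10 + 20 + 30) = 15$.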

@gordon-lim
Author

I have checked out the wikipedia page.

I'm not sure if I'm oversimplifying things but... if a non-discretized coordinate falls within the aligned pixel, why not just take that aligned pixel's value? What's the need for bilinear sampling then?

@shunsukesaito
Owner

That's an option too (if you use mode='nearest', that's exactly what you said). However, this way reconstruction tends to be blocky as the feature sampling is discontinuous between pixels. Bilinear sampling makes it C0 continuous to alleviate these blocky artifacts.
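The difference in continuity can be seen in a small one-dimensional sketch. This is an illustration only: it uses a single row of pixel values and linear interpolation as the 1D analogue of bilinear sampling, and the helper names `sample_nearest` and `sample_linear` are hypothetical.

```python
import numpy as np

def sample_nearest(row, x):
    """Return the value of the pixel closest to the continuous coordinate x."""
    return row[int(np.clip(np.rint(x), 0, len(row) - 1))]

def sample_linear(row, x):
    """Linearly interpolate between the two pixels surrounding x."""
    x0 = int(np.clip(np.floor(x), 0, len(row) - 2))
    a = x - x0  # fractional offset from the left pixel
    return (1.0 - a) * row[x0] + a * row[x0 + 1]

row = np.array([0.0, 1.0, 4.0])
eps = 1e-3
left, right = 0.5 - eps, 0.5 + eps  # two queries straddling the boundary between pixels 0 and 1

# Nearest lookup switches pixels at the boundary, so the value jumps by a full pixel difference.
nearest_jump = abs(sample_nearest(row, right) - sample_nearest(row, left))
# Linear interpolation changes only in proportion to the distance moved.
linear_jump = abs(sample_linear(row, right) - sample_linear(row, left))
```

Moving the query by 0.002 pixels changes the nearest-neighbor value by 1.0 but the interpolated value by only about 0.002, which is the discontinuity that shows up as blocky artifacts.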

@gordon-lim
Author

use mode='nearest'
Did you use nn.Upsample?

@shunsukesaito
Owner

No. Please take a look here:

samples = torch.nn.functional.grid_sample(feat, uv, align_corners=True) # [B, C, N, 1]
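For readers who want to try this call in isolation, here is a minimal sketch of how `grid_sample` can be driven with a list of points. The wrapper name `sample_features` and the input layout `uv` of shape [B, 2, N] are assumptions made for this example, not necessarily what the repository uses; only the `grid_sample` call itself mirrors the line above.

```python
import torch
import torch.nn.functional as F

def sample_features(feat, uv, mode='bilinear'):
    """Sample feat [B, C, H, W] at normalized uv [B, 2, N] in [-1, 1]; returns [B, C, N]."""
    # grid_sample expects a grid of shape [B, H_out, W_out, 2] with (x, y) in the last dim,
    # so the N points are arranged as an N x 1 "image" of sampling locations.
    grid = uv.transpose(1, 2).unsqueeze(2)  # [B, N, 1, 2]
    samples = F.grid_sample(feat, grid, mode=mode, align_corners=True)  # [B, C, N, 1]
    return samples[:, :, :, 0]

feat = torch.arange(4.0).reshape(1, 1, 2, 2)  # one channel, pixels [[0, 1], [2, 3]]
# Three points: top-left corner, image center, bottom-right corner.
uv = torch.tensor([[[-1.0, 0.0, 1.0],
                    [-1.0, 0.0, 1.0]]])       # [1, 2, 3]
out = sample_features(feat, uv)               # [1, 1, 3]
```

With `align_corners=True`, the corner queries return the corner pixel values and the center query returns the average of all four pixels. Passing `mode='nearest'` to the same call gives the nearest-pixel lookup discussed earlier in the thread.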

@gordon-lim
Author

Thank you very much!
