
question about projection #16

Open
yejr0229 opened this issue Jan 21, 2024 · 12 comments
yejr0229 commented Jan 21, 2024

Hi, I want to project world_src_pts onto the 2D image plane. I use the projection() function from your previous work SHERF, but the projected 2D points seem to be wrong. Could you help me figure out what is going on? Here is my code:

src_uv = projection(world_src_pts.reshape(bs, -1, 3), camera_R, camera_T, camera_K) # [bs, N, 6890, 3]
src_uv = src_uv.view(-1, *src_uv.shape[2:])

Here camera_K is the camera intrinsic matrix, camera_R is the ['R'] in smpl_param, and camera_T is the ['Th'] in smpl_param.
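
For reference, this is roughly what I expect projection() to do. It is just a minimal pinhole-projection sketch I wrote for sanity checking, not the SHERF implementation, and it assumes camera_K is [bs, 3, 3], camera_R is [bs, 3, 3], and camera_T is [bs, 1, 3]:

import torch

def project_points(xyz_world, K, R, T):
    # xyz_world: [bs, N, 3] points in world coordinates
    # K: [bs, 3, 3] intrinsics, R: [bs, 3, 3] rotation, T: [bs, 1, 3] translation
    # World -> camera coordinates: x_cam = R @ x_world + T
    xyz_cam = torch.einsum('bij,bnj->bni', R, xyz_world) + T
    # Apply the intrinsics, then perspective divide to get pixel coordinates
    xyz_img = torch.einsum('bij,bnj->bni', K, xyz_cam)
    uv = xyz_img[..., :2] / (xyz_img[..., 2:] + 1e-5)
    return uv  # [bs, N, 2]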

yejr0229 (Author):

This is the world_src_pts:
[image: world_src_pts]

yejr0229 (Author):

These are my projected 2D points:
[image: src_uv]

yejr0229 (Author):

Then I tried adding a translation of 400 to the 2D coordinates and obtained the following result:
[image: src_uv+400]

skhu101 (Owner) commented Jan 22, 2024

Hi, thanks for your interest in our work. You can put the following code in train.py to get the projection results.

# Camera extrinsics [R | T] for the current view: [1, 1, 3, 4]
RT = torch.cat([torch.tensor(viewpoint_cam.R.transpose()), torch.tensor(viewpoint_cam.T).reshape(3, 1)], -1)[None, None].cuda()
# SMPL vertices in world coordinates, repeated per view: [bs, view_num, 6890, 3]
xyz = torch.repeat_interleave(torch.tensor(viewpoint_cam.world_vertex)[None, None], repeats=RT.shape[1], dim=1)
# World -> camera coordinates, then apply the intrinsics K
xyz = torch.matmul(RT[:, :, None, :, :3].float(), xyz[..., None].float()) + RT[:, :, None, :, 3:].float()
xyz = torch.matmul(torch.tensor(viewpoint_cam.K)[None, None][:, :, None].float().cuda(), xyz)[..., 0]
# Perspective divide to get pixel coordinates
xy = xyz[..., :2] / (xyz[..., 2:] + 1e-5)
src_uv = xy.view(-1, *xy.shape[2:])

# Mark the projected vertices on the ground-truth image and save it for a visual check
test_image = gt_image.clone().permute(1, 2, 0)
test_image[src_uv[0, :, 1].type(torch.LongTensor), src_uv[0, :, 0].type(torch.LongTensor)] = 1
imageio.imwrite('vertex_img.png', (255 * test_image).cpu().numpy().astype(np.uint8))

yejr0229 (Author):

Thanks a lot! I'd also like to know which coordinate system mean3D is defined in: the world coordinate system or the SMPL coordinate system?

skhu101 (Owner) commented Jan 23, 2024

In the world coordinate system.

yejr0229 (Author):

Thanks. I found that the evaluation metrics are calculated over the entire image, but I want to compute PSNR, SSIM, and LPIPS only within the human body mask. Could you help me with this?

skhu101 (Owner) commented Jan 23, 2024

Hi, you can calculate PSNR, SSIM, and LPIPS with the help of a bounding box mask.

yejr0229 (Author):

Thanks for replying. I'd like to calculate PSNR, SSIM, and LPIPS inside the human mask (denoted as bkgd_mask in your code). I found this code in render.py:

rendering.permute(1,2,0)[bound_mask[0]==0] = 0 if background.sum().item() == 0 else 1

Can I replace bound_mask with bkgd_mask to achieve this?
By the way, I'm a little confused about the code. In my opinion, GauHuman still seems to calculate the metrics over the whole image, since the zeroed-out pixels are not removed. I think we need to flatten the image and the mask and keep only the pixels inside the mask, so that the metrics are computed within the mask, as in the sketch below.
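
For example, something like this rough sketch is what I have in mind (my own code, assuming rendering and gt_image are [3, H, W] tensors in [0, 1] and bkgd_mask is a [1, H, W] human mask, as in render.py):

import torch

# Flatten to [3, H*W] and keep only the pixels inside the human mask
mask = bkgd_mask.reshape(-1) > 0                # [H*W] boolean mask
pred = rendering.reshape(3, -1)[:, mask]        # [3, M] predicted pixels inside the mask
gt = gt_image.reshape(3, -1)[:, mask]           # [3, M] ground-truth pixels inside the mask

# PSNR computed over the masked pixels only
mse = ((pred - gt) ** 2).mean()
psnr_masked = 10 * torch.log10(1.0 / mse)

SSIM and LPIPS expect rectangular images, so for those I guess one would crop by the mask's bounding box instead of flattening.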

skhu101 (Owner) commented Jan 25, 2024

Thanks for your question. In 3D human reconstruction, we learn both the 3D human and the background. Following the convention of HumanNeRF papers, we can either calculate the metrics on the whole image or on the image cropped by a bounding box mask, e.g. as sketched below.
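
A minimal sketch of the bounding-box variant (assuming gt_image and rendering are [3, H, W] tensors and bkgd_mask is a [1, H, W] human mask; psnr, ssim, and lpips_fn here are placeholders for whatever metric functions your evaluation script already uses, not the exact released code):

import torch

# Bounding box of the human mask
ys, xs = torch.nonzero(bkgd_mask[0] > 0, as_tuple=True)
y0, y1 = ys.min().item(), ys.max().item() + 1
x0, x1 = xs.min().item(), xs.max().item() + 1

# Crop prediction and ground truth to the box and evaluate on the crop
pred_crop = rendering[:, y0:y1, x0:x1].unsqueeze(0)   # [1, 3, h, w]
gt_crop = gt_image[:, y0:y1, x0:x1].unsqueeze(0)      # [1, 3, h, w]

psnr_box = psnr(pred_crop, gt_crop)        # placeholder metric functions
ssim_box = ssim(pred_crop, gt_crop)
lpips_box = lpips_fn(pred_crop, gt_crop)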

yejr0229 (Author):

Thanks a lot! Additionally, I'd like to ask about the requirements on training views and the number of training images in GauHuman. I found that GauHuman selects training view [4] and samples 100 images in total, taking one frame every 5 frames. In order to compare with other methods, I modified the training setting to use training view [0] with 570 consecutive images for training, but the results were very poor. Why did the results drop so badly?

skhu101 (Owner) commented Apr 3, 2024

Hi, we follow the setting of instant-nvr for performance comparison. Is the performance drop consistent for both the ZJU_MoCap and MonoCap datasets?

skhu101 closed this as completed Apr 12, 2024
skhu101 reopened this Apr 12, 2024