Some questions about rend_util.py #12

Closed
DavidXu-JJ opened this issue Dec 3, 2022 · 3 comments

@DavidXu-JJ

Hi, thank you for your nice work. I have been trying to follow your work recently, and I have run into some problems that I hope to get answered in this issue.

  1. First question:
    In the function load_K_Rt_from_P, at line 48 of rend_util.py:
    pose = np.eye(4, dtype=np.float32)
    pose[:3, :3] = R.transpose()
    pose[:3,3] = (t[:3] / t[3])[:,0]

    This code really confuses me, and I'm not able to come up with an explanation for it.
    I read the following code at line 78 in rend_util.py:
    pixel_points_cam = lift(x_cam, y_cam, z_cam, intrinsics=intrinsics)
    # permute for batch matrix product
    pixel_points_cam = pixel_points_cam.permute(0, 2, 1)
    world_coords = torch.bmm(p, pixel_points_cam).permute(0, 2, 1)[:, :, :3]

    It seems that you use pose as a cameraToWorld matrix.
    I ran an experiment beforehand; the following code is from Stack Overflow:
import numpy as np
import cv2

k = np.array([[631,   0, 384],
              [  0, 631, 288],
              [  0,   0,   1]])
r = np.array([[-0.30164902,  0.68282439, -0.66540117],
              [-0.63417301,  0.37743435,  0.67480953],
              [ 0.71192167,  0.6255351 ,  0.3191761 ]])
t = np.array([ 3.75082481, -1.18089565,  1.06138781])

C = np.eye(4)
C[:3, :3] = k @ r
C[:3, 3] = k @ r @ t

out = cv2.decomposeProjectionMatrix(C[:3, :])
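
For completeness, this is how I unpack what the decomposition returns (my own snippet, not from rend_util.py, assuming OpenCV's documented output order of intrinsics, rotation, and a homogeneous translation vector):

# Unpack the outputs of cv2.decomposeProjectionMatrix (my own check).
# out[2] is the translation that the pose code quoted above dehomogenises.
K_dec, R_dec, t_dec = out[0], out[1], out[2]
print(np.allclose(K_dec / K_dec[2, 2], k))   # True: intrinsics recovered
print(np.allclose(R_dec, r))                 # True: rotation recovered
print((t_dec[:3] / t_dec[3]).ravel())        # [-3.7508,  1.1809, -1.0614]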

If I convert r and t into homogeneous coordinates and then compute R@T, which is the worldToCamera matrix, I get:

>>> T=np.eye(4)
>>> T[:3,3]=t
>>> R=np.eye(4)
>>> R[:3,:3]=r
>>> R@T
array([[-0.30164902,  0.68282439, -0.66540117, -2.64402567],
       [-0.63417301,  0.37743435,  0.67480953, -2.10814783],
       [ 0.71192167,  0.6255351 ,  0.3191761 ,  2.27037141],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

Then if I take the inverse of R@T, which I think is the cameraToWorld matrix, I get:

>>> np.linalg.inv((R@T))
array([[-0.30164902, -0.63417301,  0.71192166, -3.75082481],
       [ 0.6828244 ,  0.37743435,  0.6255351 ,  1.18089565],
       [-0.66540117,  0.67480953,  0.3191761 , -1.06138781],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

This result suggests that, to get the cameraToWorld matrix, we should concatenate R^(-1) and -T, instead of R^(-1) and T as is done at line 31 in rend_util.py:

pose = np.eye(4, dtype=np.float32)
pose[:3, :3] = R.transpose()
pose[:3,3] = (t[:3] / t[3])[:,0]

I don't know why it takes R^(-1) and T here.

  2. Second question:
    In the function lift, at line 96 in rend_util.py:
    def lift(x, y, z, intrinsics):
        # parse intrinsics
        intrinsics = intrinsics.cuda()
        fx = intrinsics[:, 0, 0]
        fy = intrinsics[:, 1, 1]
        cx = intrinsics[:, 0, 2]
        cy = intrinsics[:, 1, 2]
        sk = intrinsics[:, 0, 1]
        x_lift = (x - cx.unsqueeze(-1) + cy.unsqueeze(-1)*sk.unsqueeze(-1)/fy.unsqueeze(-1) - sk.unsqueeze(-1)*y/fy.unsqueeze(-1)) / fx.unsqueeze(-1) * z
        y_lift = (y - cy.unsqueeze(-1)) / fy.unsqueeze(-1) * z
        # homogeneous
        return torch.stack((x_lift, y_lift, z, torch.ones_like(z).cuda()), dim=-1)

    I don't know why x_lift takes y and fy into consideration.
    It seems that sk should be 0, but I checked it at runtime and got:
intrinsics
tensor([[[ 2.8923e+03, -2.1742e-04,  8.2320e+02,  0.0000e+00],
         [ 0.0000e+00,  2.8832e+03,  6.1907e+02,  0.0000e+00],
         [ 0.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  1.0000e+00]]],
       device='cuda:0')

It seems that sk is not 0. So the transformation becomes:

$$ \begin{bmatrix} x'\\y'\\z \end{bmatrix}= \begin{bmatrix} f_x&sk&c_x&0\\ 0&f_y&c_y&0\\ 0&0&1&0 \end{bmatrix} \begin{bmatrix} x\_lift\\y\_lift\\z\\1 \end{bmatrix} $$

Here [x_lift, y_lift, z, 1] is the point in camera coordinates.
I find that:

$$ x'=f_x \cdot x\_lift + sk \cdot y\_lift + c_x \cdot z $$

Solving for x_lift, the correct expression is:

$$ x\_lift = \cfrac{x'-c_x \cdot z - sk \cdot y\_lift}{f_x} $$

But in rend_util.py, x_lift is effectively computed as:

$$ x\_lift = \cfrac{(x'-c_x)\cdot z - sk \cdot y\_lift}{f_x} $$

So the code is correct only when z=1. Would it be better if it were simply changed to:

x_lift = (x / z - cx.unsqueeze(-1) + cy.unsqueeze(-1)*sk.unsqueeze(-1)/fy.unsqueeze(-1) - sk.unsqueeze(-1)*y/fy.unsqueeze(-1)) / fx.unsqueeze(-1) * z

(a division by z is added to x)
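
As a toy check of where the two expressions diverge (my own sketch; the pixel value is made up and the intrinsics are roughly the ones printed above):

fx, fy, cx, cy, sk = 2892.3, 2883.2, 823.2, 619.1, -2.1742e-04
x, y = 500.0, 300.0

def x_lift_current(z):    # expression as it currently stands in rend_util.py
    return (x - cx + cy * sk / fy - sk * y / fy) / fx * z

def x_lift_suggested(z):  # with the extra division of x by z suggested above
    return (x / z - cx + cy * sk / fy - sk * y / fy) / fx * z

print(x_lift_current(1.0) - x_lift_suggested(1.0))  # 0.0: identical at z = 1
print(x_lift_current(2.0) - x_lift_suggested(2.0))  # non-zero once z != 1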

The first question matters more to me than the second. Would you please explain the logic of the pose matrix to me?

Hope this issue would help other people as well.

I have tried my best to express my questions as clearly as possible. If anything is unclear or wrong on my side, please let me know.

@DavidXu-JJ (Author)

I have figured out the answer to the confusing Problem 1:

world_coords = torch.bmm(p, pixel_points_cam).permute(0, 2, 1)[:, :, :3]
ray_dirs = world_coords - cam_loc[:, None, :]

Here at line 79, the camera location is set to the T vector:
cam_loc = pose[:, :3, 3]

However, the actual camera location is at the -T vector. What matters in this function is the relative position between the pixel location and the camera location, so the cameraToWorld matrix doesn't need to take -T as its translation part.
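
A toy check of this argument (my own sketch, not code from the repository): shifting the pose translation moves world_coords and cam_loc by the same offset, so the ray directions don't change.

import torch

pose = torch.eye(4).unsqueeze(0)                     # (1, 4, 4) camera-to-world
pixel_points_cam = torch.rand(1, 4, 5)               # (1, 4, n) homogeneous camera points
pixel_points_cam[:, 3, :] = 1.0

def rays(p):
    world_coords = torch.bmm(p, pixel_points_cam).permute(0, 2, 1)[:, :, :3]
    cam_loc = p[:, :3, 3]
    return world_coords - cam_loc[:, None, :]

shifted = pose.clone()
shifted[:, :3, 3] += torch.tensor([1.0, -2.0, 3.0])  # move the camera location
print(torch.allclose(rays(pose), rays(shifted)))     # True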
I maintain my opinion on Problem 2, but since it's not a crucial part, I'm closing this issue.
Finally, I'm sorry for the annoying opening and closing of this issue (I'm not very familiar with how issues work).

@raynehe

raynehe commented May 16, 2023

@DavidXu-JJ Hi! Sorry to bother you. I encountered a similar problem related to the DTU dataset's coordinate-system convention, and I'm wondering if you know about it.

My dataset follows NeRF's coordinate-system convention, that is, the OpenGL convention (x-axis to the right, y-axis upward, and z-axis backward along the camera's focal axis).

My issue is that if I apply the dataset to VolSDF directly, the computed ray_dir is incorrect. I think the problem is in the rotation matrix; DTU/BlendedMVS might follow a different convention. But I couldn't find anything about the coordinate-system convention of the DTU dataset. Do you know about this?

Thank you very much!

@DavidXu-JJ (Author)

@raynehe
[image: coords]
If I'm not mistaken, I remember that most datasets follow the OpenCV coordinate convention. Maybe you can try simply reversing the y and z axes.
I'm sorry if my suggestion doesn't help or is wrong.
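
Something like this sketch is what I mean (my own rough code, assuming your pose is a camera-to-world matrix in the OpenGL/NeRF convention; please double-check it on your data):

import numpy as np

def opengl_to_opencv_c2w(c2w_gl):
    # Flip the camera's y and z axes: OpenGL/NeRF (x right, y up, z backward)
    # to an OpenCV-style convention (x right, y down, z forward).
    flip_yz = np.diag([1.0, -1.0, -1.0, 1.0])
    return c2w_gl @ flip_yz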
