Camera Optimizer has different pose adjustments when applied to Raybundle and Camera. #3073

nexuslrf · 2024-04-13T03:41:30Z

Hi developers,

I found a pretty tricky computation inconsistency issue on camera_optimizer's apply_to_* functions:

nerfstudio/nerfstudio/cameras/camera_optimizers.py

Lines 148 to 168 in babf577

    
           def apply_to_raybundle(self, raybundle: RayBundle) -> None: 
        
               """Apply the pose correction to the raybundle""" 
        
               if self.config.mode != "off": 
        
                   correction_matrices = self(raybundle.camera_indices.squeeze())  # type: ignore 
        
                   raybundle.origins = raybundle.origins + correction_matrices[:, :3, 3] 
        
                   raybundle.directions = torch.bmm(correction_matrices[:, :3, :3], raybundle.directions[..., None]).squeeze() 
        
           def apply_to_camera(self, camera: Cameras) -> torch.Tensor: 
        
               """Apply the pose correction to the world-to-camera matrix in a Camera object""" 
        
               if self.config.mode == "off": 
        
                   return camera.camera_to_worlds 
        
               assert camera.metadata is not None, "Must provide id of camera in its metadata" 
        
               if "cam_idx" not in camera.metadata: 
        
                   # Evalutaion cams? 
        
                   return camera.camera_to_worlds 
        
               camera_idx = camera.metadata["cam_idx"] 
        
               adj = self(torch.tensor([camera_idx], dtype=torch.long, device=camera.device))  # type: ignore 
        
               adj = torch.cat([adj, torch.Tensor([0, 0, 0, 1])[None, None].to(adj)], dim=1) 
        
               return torch.bmm(camera.camera_to_worlds, adj)

The rotation and translation shown in the above two functions are completely different.
Assume the original pose is [R, t], and pose adjustment in CameraOptimizer is [dR, dt]

apply_to_raybundle: simply translates the origins (t+dt), and rotates the ray directions (dR @ R @ ray_vec)
apply_to_camera: new camera is [R@dR, t+R@dt], when using this new camera to generate rays origins will be t+R@dt, directions transformation will be R @ dR @ ray_vec.

I initially wanted to evaluate the quality of camera optimization of a trained NeRF model by applying this apply_to_camera function but found such a computation mismatch. I guess these functions would also be confusing for other users (for sure 🤣

It would be better in the future version to add some caution notes and an option to switch to different forms of pose adjustments.
For example,

# raybundle_cam: bool
if raybundle_cam:
    R = torch.bmm(adj[:, :3, :3], camera.camera_to_worlds[:, :3, :3])
    t = adj[:, :3, 3:] + camera.camera_to_worlds[:, :3, 3:]
    camera.camera_to_worlds = torch.cat([R, t], dim=-1)
else:
    adj = torch.cat([adj, torch.Tensor([0, 0, 0, 1])[None, None].to(adj)], dim=1)
    camera.camera_to_worlds = torch.bmm(camera.camera_to_worlds, adj)

The text was updated successfully, but these errors were encountered:

jb-ye · 2024-04-15T17:49:27Z

Thanks for sharing the knowledge. I think the correction transform (dR, dt) should be multiply to T_camera_to_world transform from the left, which is (dR @ R, dR t + dt ). This is neither of what was implemented in apply_to_raybundle and apply_to_camera.

Think about the case when we have pose estimate of a sensor rig that contains two cameras. The two cameras have their own pose estimates subject to their fixed extrinsics T_cam1_to_cam2, such that T_cam2_to_world * T_cam1_to_cam2 = T_cam1_to_world. A shared correction transform to the pose estimate of this sensor rig must be multiplied from the left to each of its cameras.

@oseiskar Just want to learn your thoughts on this?

oseiskar · 2024-04-15T19:06:07Z

@jb-ye I did not actually check this in detail in e8e05ff, and just kept the multiplication order from the original 3DGS PR

I agree with you that it would be better to multiply camera-to-world from the left since that would also work for multi-camera rigs: If you have a single adjusment matrix for all cameras in the rig, the multiplying with that from the left would work for adjusting the pose of the rig correctly.

However, as long as there are no such rig constraints but all cameras are optimized independently, then all of the versions discussed in this thread are OK. They are just different ways to parametrize adjustments to a SE(3) pose.

The main problem in the current version seems to be that the NeRF (ray bundle) and 3DGS use different interpretations for the parameters, which is not nice since it breaks in cases like @nexuslrf 's test.

It might be safest to fix the 3DGS version to match the NeRF one, since at least for 3DGS, all versions should be fine, but for NeRFs, there could be more performance implications (just guessing here, though). If support for multi-camera setups is added, then both version should probably be fixed to use the multiply-from-the-left form.

brentyi · 2024-04-15T22:53:07Z

Thanks everyone for the discussion! Some quick thoughts.

The reason for the original apply_to_raybundle() implementation was likely runtime: the reconstruction quality for all of these implementations (left-multiply, right-multiply, the current behavior) should all be basically the same, but (dR @ R, t + dt) (what's implemented) has one less matrix multiplication than (dR @ R, dR @ t + dt). I recall some tests being run and the overhead being nontrivial.

Assuming this is true, one suggestion is to keep the current behavior for NeRFs, but match the splatfacto behavior for consistency and document it better: it's a separate rotation around the camera frame origin + translation in world coordinates.¹

Think about the case when we have pose estimate of a sensor rig that contains two cameras. The two cameras have their own pose estimates subject to their fixed extrinsics T_cam1_to_cam2, such that T_cam2_to_world * T_cam1_to_cam2 = T_cam1_to_world. A shared correction transform to the pose estimate of this sensor rig must be multiplied from the left to each of its cameras.

If we want to switch to a single rigid transform, between left- vs right- multiplied my very weak personal bias is actually for right-multiplied, mostly because it seems like the standard in packages like Ceres and gtsam. I don't find the multi-sensor rig case super convincing, since (a) you have to write custom code anyways and (b) right-multiplication is more convenient for the complementary case where you have good global rig poses but don't trust per-sensor calibrations relative to the rig.

Edit: on a second read this is exactly what @oseiskar suggested in the comment above. sorry for not reading more thoroughly! ↩

oseiskar · 2024-04-16T06:44:32Z

If we want to switch to a single rigid transform, between left- vs right- multiplied my very weak personal bias is actually for right-multiplied, mostly because it seems like the standard in packages like Ceres and gtsam. I don't find the multi-sensor rig case super convincing, since (a) you have to write custom code anyways and (b) right-multiplication is more convenient for the complementary case where you have good global rig poses but don't trust per-sensor calibrations relative to the rig.

I would like to add that being able to handle all relevant cases with a single rigid transform, which is either left or right multiplication is not a very sustainable goal. In a system that can deal with cases such as multi-camera rigs and possibly extrinsic calibration correction, you will anyway end up simultaneously transforming the poses from both left (tune the pose of the rig) and right (fixed or adjustable transform within the local rig coordinates) and some of these transforms will be shared between multiple cameras.

... and, without multi-camera support, it does not really matter so I also support this:

Assuming this is true, one suggestion is to keep the current behavior for NeRFs, but match the splatfacto behavior for consistency and document it better: it's a separate rotation around the camera frame origin + translation in world coordinates.

Also allow exporting Nerfacto-optimized poses. Fixes nerfstudio-project#3073.

Also allow exporting Nerfacto-optimized poses. Fixes nerfstudio-project#3073. Second revision where the rotation and translation parts are handled separately.

oseiskar mentioned this issue Apr 28, 2024

Export Splatfacto-optimized poses #3116

Merged

oseiskar added a commit to SpectacularAI/nerfstudio that referenced this issue May 26, 2024

Make Splatfacto pose adjustments consistent with Nerfacto

8b13f2d

Also allow exporting Nerfacto-optimized poses. Fixes nerfstudio-project#3073.

oseiskar added a commit to SpectacularAI/nerfstudio that referenced this issue May 26, 2024

Make Splatfacto pose adjustments consistent with Nerfacto

f1a5210

Also allow exporting Nerfacto-optimized poses. Fixes nerfstudio-project#3073.

jb-ye closed this as completed in 5491df9 May 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Camera Optimizer has different pose adjustments when applied to Raybundle and Camera. #3073

Camera Optimizer has different pose adjustments when applied to Raybundle and Camera. #3073

nexuslrf commented Apr 13, 2024

jb-ye commented Apr 15, 2024 •

edited

oseiskar commented Apr 15, 2024

brentyi commented Apr 15, 2024 •

edited

oseiskar commented Apr 16, 2024

Camera Optimizer has different pose adjustments when applied to Raybundle and Camera. #3073

Camera Optimizer has different pose adjustments when applied to Raybundle and Camera. #3073

Comments

nexuslrf commented Apr 13, 2024

jb-ye commented Apr 15, 2024 • edited

oseiskar commented Apr 15, 2024

brentyi commented Apr 15, 2024 • edited

Footnotes

oseiskar commented Apr 16, 2024

jb-ye commented Apr 15, 2024 •

edited

brentyi commented Apr 15, 2024 •

edited