Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Camera Optimizer has different pose adjustments when applied to Raybundle and Camera. #3073

Closed
nexuslrf opened this issue Apr 13, 2024 · 4 comments

Comments

@nexuslrf
Copy link

Hi developers,

I found a pretty tricky computation inconsistency issue on camera_optimizer's apply_to_* functions:

def apply_to_raybundle(self, raybundle: RayBundle) -> None:
"""Apply the pose correction to the raybundle"""
if self.config.mode != "off":
correction_matrices = self(raybundle.camera_indices.squeeze()) # type: ignore
raybundle.origins = raybundle.origins + correction_matrices[:, :3, 3]
raybundle.directions = torch.bmm(correction_matrices[:, :3, :3], raybundle.directions[..., None]).squeeze()
def apply_to_camera(self, camera: Cameras) -> torch.Tensor:
"""Apply the pose correction to the world-to-camera matrix in a Camera object"""
if self.config.mode == "off":
return camera.camera_to_worlds
assert camera.metadata is not None, "Must provide id of camera in its metadata"
if "cam_idx" not in camera.metadata:
# Evalutaion cams?
return camera.camera_to_worlds
camera_idx = camera.metadata["cam_idx"]
adj = self(torch.tensor([camera_idx], dtype=torch.long, device=camera.device)) # type: ignore
adj = torch.cat([adj, torch.Tensor([0, 0, 0, 1])[None, None].to(adj)], dim=1)
return torch.bmm(camera.camera_to_worlds, adj)

The rotation and translation shown in the above two functions are completely different.
Assume the original pose is [R, t], and pose adjustment in CameraOptimizer is [dR, dt]

  • apply_to_raybundle: simply translates the origins (t+dt), and rotates the ray directions (dR @ R @ ray_vec)

  • apply_to_camera: new camera is [R@dR, t+R@dt], when using this new camera to generate rays origins will be t+R@dt, directions transformation will be R @ dR @ ray_vec.

I initially wanted to evaluate the quality of camera optimization of a trained NeRF model by applying this apply_to_camera function but found such a computation mismatch. I guess these functions would also be confusing for other users (for sure 🤣

It would be better in the future version to add some caution notes and an option to switch to different forms of pose adjustments.
For example,

# raybundle_cam: bool
if raybundle_cam:
    R = torch.bmm(adj[:, :3, :3], camera.camera_to_worlds[:, :3, :3])
    t = adj[:, :3, 3:] + camera.camera_to_worlds[:, :3, 3:]
    camera.camera_to_worlds = torch.cat([R, t], dim=-1)
else:
    adj = torch.cat([adj, torch.Tensor([0, 0, 0, 1])[None, None].to(adj)], dim=1)
    camera.camera_to_worlds = torch.bmm(camera.camera_to_worlds, adj)
@jb-ye
Copy link
Collaborator

jb-ye commented Apr 15, 2024

Thanks for sharing the knowledge. I think the correction transform (dR, dt) should be multiply to T_camera_to_world transform from the left, which is (dR @ R, dR t + dt ). This is neither of what was implemented in apply_to_raybundle and apply_to_camera.

Think about the case when we have pose estimate of a sensor rig that contains two cameras. The two cameras have their own pose estimates subject to their fixed extrinsics T_cam1_to_cam2, such that T_cam2_to_world * T_cam1_to_cam2 = T_cam1_to_world. A shared correction transform to the pose estimate of this sensor rig must be multiplied from the left to each of its cameras.

@oseiskar Just want to learn your thoughts on this?

@oseiskar
Copy link
Contributor

@jb-ye I did not actually check this in detail in e8e05ff, and just kept the multiplication order from the original 3DGS PR

I agree with you that it would be better to multiply camera-to-world from the left since that would also work for multi-camera rigs: If you have a single adjusment matrix for all cameras in the rig, the multiplying with that from the left would work for adjusting the pose of the rig correctly.

However, as long as there are no such rig constraints but all cameras are optimized independently, then all of the versions discussed in this thread are OK. They are just different ways to parametrize adjustments to a SE(3) pose.

The main problem in the current version seems to be that the NeRF (ray bundle) and 3DGS use different interpretations for the parameters, which is not nice since it breaks in cases like @nexuslrf 's test.

It might be safest to fix the 3DGS version to match the NeRF one, since at least for 3DGS, all versions should be fine, but for NeRFs, there could be more performance implications (just guessing here, though). If support for multi-camera setups is added, then both version should probably be fixed to use the multiply-from-the-left form.

@brentyi
Copy link
Collaborator

brentyi commented Apr 15, 2024

Thanks everyone for the discussion! Some quick thoughts.

The reason for the original apply_to_raybundle() implementation was likely runtime: the reconstruction quality for all of these implementations (left-multiply, right-multiply, the current behavior) should all be basically the same, but (dR @ R, t + dt) (what's implemented) has one less matrix multiplication than (dR @ R, dR @ t + dt). I recall some tests being run and the overhead being nontrivial.

Assuming this is true, one suggestion is to keep the current behavior for NeRFs, but match the splatfacto behavior for consistency and document it better: it's a separate rotation around the camera frame origin + translation in world coordinates.1

Think about the case when we have pose estimate of a sensor rig that contains two cameras. The two cameras have their own pose estimates subject to their fixed extrinsics T_cam1_to_cam2, such that T_cam2_to_world * T_cam1_to_cam2 = T_cam1_to_world. A shared correction transform to the pose estimate of this sensor rig must be multiplied from the left to each of its cameras.

If we want to switch to a single rigid transform, between left- vs right- multiplied my very weak personal bias is actually for right-multiplied, mostly because it seems like the standard in packages like Ceres and gtsam. I don't find the multi-sensor rig case super convincing, since (a) you have to write custom code anyways and (b) right-multiplication is more convenient for the complementary case where you have good global rig poses but don't trust per-sensor calibrations relative to the rig.

Footnotes

  1. Edit: on a second read this is exactly what @oseiskar suggested in the comment above. sorry for not reading more thoroughly!

@oseiskar
Copy link
Contributor

If we want to switch to a single rigid transform, between left- vs right- multiplied my very weak personal bias is actually for right-multiplied, mostly because it seems like the standard in packages like Ceres and gtsam. I don't find the multi-sensor rig case super convincing, since (a) you have to write custom code anyways and (b) right-multiplication is more convenient for the complementary case where you have good global rig poses but don't trust per-sensor calibrations relative to the rig.

I would like to add that being able to handle all relevant cases with a single rigid transform, which is either left or right multiplication is not a very sustainable goal. In a system that can deal with cases such as multi-camera rigs and possibly extrinsic calibration correction, you will anyway end up simultaneously transforming the poses from both left (tune the pose of the rig) and right (fixed or adjustable transform within the local rig coordinates) and some of these transforms will be shared between multiple cameras.

... and, without multi-camera support, it does not really matter so I also support this:

Assuming this is true, one suggestion is to keep the current behavior for NeRFs, but match the splatfacto behavior for consistency and document it better: it's a separate rotation around the camera frame origin + translation in world coordinates.

oseiskar added a commit to SpectacularAI/nerfstudio that referenced this issue May 26, 2024
Also allow exporting Nerfacto-optimized poses. Fixes nerfstudio-project#3073.
oseiskar added a commit to SpectacularAI/nerfstudio that referenced this issue May 26, 2024
Also allow exporting Nerfacto-optimized poses. Fixes nerfstudio-project#3073.
oseiskar added a commit to SpectacularAI/nerfstudio that referenced this issue May 29, 2024
Also allow exporting Nerfacto-optimized poses. Fixes nerfstudio-project#3073.
Second revision where the rotation and translation parts
are handled separately.
oseiskar added a commit to SpectacularAI/nerfstudio that referenced this issue May 29, 2024
Also allow exporting Nerfacto-optimized poses. Fixes nerfstudio-project#3073.
Second revision where the rotation and translation parts
are handled separately.
@jb-ye jb-ye closed this as completed in 5491df9 May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants