
Add pose optimization to Splatfacto #2885

Conversation

jh-surh
Contributor

@jh-surh jh-surh commented Feb 8, 2024

Requires nerfstudio-project/gsplat#123

Installation requirements:

```bash
git clone https://github.com/jh-surh/gsplat.git
cd gsplat
git submodule update --init --recursive
git checkout jhsurh/add-pose-grad
pip install -e .
```

@jb-ye
Collaborator

jb-ye commented Feb 8, 2024

Do you have more experiments? I am not sure back-propagating gradients to camera poses would work robustly for splatfacto, considering the following factors:

(1) Training splatfacto is not like training a typical NeRF: it has non-grad operations (splitting, culling, resetting gaussians), and computing gradients right before/after those operations can be very unstable.
(2) The gradient of each step is computed w.r.t. a single image. If a user has 1000 input images, the 30k training steps only allow each camera to be updated with gradients about 30 times. I would imagine this easily breaks cross-frame consistency (I might be wrong).

Regardless of my concerns, I think this is a research area worth exploring. I recommend experimenting with more datasets. One way to validate the work is to start from poses that are known to be less accurate than SfM poses and check whether pose optimization can bring the quality back to the level of SfM poses. For example, the iPhone provides online pose estimates in ARKit ( https://github.com/apple/ARKitScenes ); one can test whether training directly from ARKit poses with the pose optimizer produces equally good results.

Besides qualitative evaluation of rendered videos, one can at least monitor the training loss and see if camera_opt reduces it by a significant margin.

Another quantitative evaluation method is to backpropagate gradients to optimize only the validation cameras using validation images and then evaluate against standard metrics; this is similar to what was done in the original nerfstudio paper.
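
A minimal sketch of that last idea: freeze the reconstruction and optimize only a per-camera SE(3) correction on the validation images (illustrative only, not code from this PR; `model.render` is a hypothetical hook for the trained splatting model):

```python
import torch

def skew(w):
    # 3x3 cross-product matrix of an axis-angle vector w.
    zero = torch.zeros((), dtype=w.dtype)
    return torch.stack([
        torch.stack([zero, -w[2], w[1]]),
        torch.stack([w[2], zero, -w[0]]),
        torch.stack([-w[1], w[0], zero]),
    ])

def apply_delta(pose, delta):
    # First-order SE(3) update of a 3x4 [R|t] camera-to-world matrix.
    R = (torch.eye(3) + skew(delta[:3])) @ pose[:, :3]
    t = pose[:, 3] + delta[3:]
    return torch.cat([R, t[:, None]], dim=1)

def refine_val_poses(model, val_poses, val_images, steps=100, lr=1e-4):
    # One 6-DoF correction per validation camera; the model stays frozen.
    deltas = torch.zeros(len(val_poses), 6, requires_grad=True)
    opt = torch.optim.Adam([deltas], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(
            torch.nn.functional.l1_loss(model.render(apply_delta(p, d)), img)
            for p, d, img in zip(val_poses, deltas, val_images)
        )
        loss.backward()  # gradients flow only into the pose corrections
        opt.step()
    return deltas.detach()
```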

@ichsan2895

Interesting. Let me check the quality on another dataset.


@ichsan2895

Hey, is there any documentation on how to test it? Or do I just git clone the repo from your PR and it runs out of the box?

[image]

I see new metrics called Train Loss Dict/camera_opt_regularizer and Train Metrics Dict/camera_opt_rotation.

But their values are always zero. Is that right?

Also, there is no change in quality. PSNR, SSIM, LPIPS are still the same as with Nerfstudio v1.0.1.

[image]

I am afraid I didn't install it properly, or maybe I am missing a new setting.

@jh-surh
Contributor Author

jh-surh commented Feb 8, 2024

> Do you have more experiments? I am not sure back-propagating gradients to camera poses would work robustly for splatfacto, considering the following factors:
>
> (1) Training splatfacto is not like training a typical NeRF: it has non-grad operations (splitting, culling, resetting gaussians), and computing gradients right before/after those operations can be very unstable.
> (2) The gradient of each step is computed w.r.t. a single image. If a user has 1000 input images, the 30k training steps only allow each camera to be updated with gradients about 30 times. I would imagine this easily breaks cross-frame consistency (I might be wrong).
>
> Regardless of my concerns, I think this is a research area worth exploring. I recommend experimenting with more datasets. One way to validate the work is to start from poses that are known to be less accurate than SfM poses and check whether pose optimization can bring the quality back to the level of SfM poses. For example, the iPhone provides online pose estimates in ARKit ( https://github.com/apple/ARKitScenes ); one can test whether training directly from ARKit poses with the pose optimizer produces equally good results.
>
> Besides qualitative evaluation of rendered videos, one can at least monitor the training loss and see if camera_opt reduces it by a significant margin.
>
> Another quantitative evaluation method is to backpropagate gradients to optimize only the validation cameras using validation images and then evaluate against standard metrics; this is similar to what was done in the original nerfstudio paper.

@jb-ye

As per your concerns:
(1) The non-grad operations you mention happen in between training iterations, so they should not impact the gradient flow during the training phase. The refinement of the poses happens at the same time as that of the gaussians, so if the timing of the gradient updates were a problem for the poses, the same would be true for the gaussians.
(2) As long as the magnitude of the learning rate is not too large, this should not be a problem, like most things in deep learning. However, it is true that if there are a lot of images, each camera will get few pose updates. I think the next thing to add to gsplat to address this issue would be multi-image splatting, although I question its feasibility.

Regarding your suggestion of using poses from Apple's ARKit: the dataset I captured already uses poses acquired from it, since Record3D uses Apple's native AR routines to estimate poses online.

I agree with your suggestion that quantitative results are needed. I will try to find a way to show some with the tools given. I'm thinking of comparing the Record3D poses, the poses from my splatfacto update, and those from COLMAP. I'll have to find a way to extract the optimized poses.

@jh-surh
Contributor Author

jh-surh commented Feb 8, 2024

> Hey, is there any documentation on how to test it? Or do I just git clone the repo from your PR and it runs out of the box?
>
> [image]
>
> I see new metrics called Train Loss Dict/camera_opt_regularizer and Train Metrics Dict/camera_opt_rotation.
>
> But their values are always zero. Is that right?
>
> Also, there is no change in quality. PSNR, SSIM, LPIPS are still the same as with Nerfstudio v1.0.1.
>
> [image]
>
> I am afraid I didn't install it properly, or maybe I am missing a new setting.

@ichsan2895

Hey, have you installed my implementation of gsplat?

```bash
git clone https://github.com/jh-surh/gsplat.git
cd gsplat
git checkout jhsurh/add-pose-grad
pip install -e .
```

@ichsan2895

> > Hey, is there any documentation on how to test it? Or do I just git clone the repo from your PR and it runs out of the box?
> >
> > [image]
> >
> > I see new metrics called Train Loss Dict/camera_opt_regularizer and Train Metrics Dict/camera_opt_rotation.
> >
> > But their values are always zero. Is that right?
> >
> > Also, there is no change in quality. PSNR, SSIM, LPIPS are still the same as with Nerfstudio v1.0.1.
> >
> > [image]
> >
> > I am afraid I didn't install it properly, or maybe I am missing a new setting.
>
> @ichsan2895
>
> Hey, have you installed my implementation of gsplat?
>
> ```bash
> git clone https://github.com/jh-surh/gsplat.git
> cd gsplat
> git checkout jhsurh/add-pose-grad
> pip install -e .
> ```

I think so, but let me reinstall gsplat again. I will report back with the result.

@ichsan2895

ichsan2895 commented Feb 8, 2024

Thanks for the fast response, @jh-surh.

I got an error this way:

```bash
sudo apt-get install cuda-toolkit-11-8 -y
git clone https://github.com/jh-surh/gsplat.git
cd gsplat
git checkout jhsurh/add-pose-grad
pip install -e .
```

This is the error:

```
In file included from gsplat/cuda/csrc/backward.cu:2:
gsplat/cuda/csrc/helpers.cuh:3:10: fatal error: third_party/glm/glm/glm.hpp: No such file or directory
    3 | #include "third_party/glm/glm/glm.hpp"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
```


Another attempt:

```bash
sudo apt-get install cuda-toolkit-11-8 -y
git clone --recursive https://github.com/jh-surh/gsplat.git
cd gsplat
git checkout jhsurh/add-pose-grad
pip install -e .
```

Yes, it works, but a new error happens when I run ns-train splatfacto:

File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Function _ProjectGaussiansBackward returned an invalid gradient at index 3 - got [4, 4] but expected shape compatible with [3, 4]
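
For context, this class of error appears when a custom torch.autograd.Function's backward returns a gradient whose shape doesn't match the corresponding forward input, here a 4x4 gradient for a 3x4 view matrix. A minimal, self-contained reproduction (illustrative, not the gsplat code):

```python
import torch

class BadProject(torch.autograd.Function):
    @staticmethod
    def forward(ctx, viewmat):  # viewmat: 3x4 [R|t]
        return viewmat.sum()

    @staticmethod
    def backward(ctx, grad_out):
        # Bug: returns a 4x4 gradient for a 3x4 input; autograd rejects it.
        return grad_out * torch.ones(4, 4)

viewmat = torch.rand(3, 4, requires_grad=True)
BadProject.apply(viewmat).backward()
# RuntimeError: ... returned an invalid gradient at index 0 - got [4, 4]
# but expected shape compatible with [3, 4]
```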

@jh-surh
Contributor Author

jh-surh commented Feb 8, 2024

@ichsan2895

Hey, sorry for the confusion; you have to initialize the submodules as well, i.e.:

```bash
git submodule update --init --recursive
```

@jb-ye
Collaborator

jb-ye commented Feb 8, 2024

> As per your concerns: (1) The non-grad operations you mention happen in between training iterations, so they should not impact the gradient flow during the training phase. The refinement of the poses happens at the same time as that of the gaussians, so if the timing of the gradient updates were a problem for the poses, the same would be true for the gaussians.

I am afraid this is case by case: the culling and splitting operations happen locally to the specific gaussians being edited, so they have a limited impact on the overall loss function. But the impact of the alpha resetting operation is very global; I am interested in how the pose gradients change right after alpha resetting.

> (2) As long as the magnitude of the learning rate is not too large, this should not be a problem, like most things in deep learning. However, it is true that if there are a lot of images, each camera will get few pose updates. I think the next thing to add to gsplat to address this issue would be multi-image splatting, although I question its feasibility.

Deep learning doesn't care whether the solution converges to a global minimum, but there is only one globally optimal solution for pose estimation. That's why most popular pose estimation methods optimize over a bundle of frames together, to reduce the noise of the gradient estimate. For our problem, we essentially assume the initial poses are close enough that stochastic gradients won't move them away from the global optimum. This is something that has no guarantees.

Setting the learning rate too small has no impact on the final results, and setting it too large makes the optimization converge to something unwanted. I found this to be very non-trivial.

> Regarding your suggestion of using poses from Apple's ARKit: the dataset I captured already uses poses acquired from it, since Record3D uses Apple's native AR routines to estimate poses online.

I appreciate your attempt. But would it make sense to use public datasets so others can reproduce your experiment more easily?

> I agree with your suggestion that quantitative results are needed. I will try to find a way to show some with the tools given. I'm thinking of comparing the Record3D poses, the poses from my splatfacto update, and those from COLMAP. I'll have to find a way to extract the optimized poses.

Looking forward to your results. Putting my concerns aside, I think we can have this camera-opt option in the main branch as long as we find it useful on some datasets and other people are aware of how to use it properly. Nerfstudio is a research project and should welcome innovations, but I am more comfortable setting this option to false by default.

```diff
@@ -157,6 +158,8 @@ class SplatfactoModelConfig(ModelConfig):
     """
     output_depth_during_training: bool = False
     """If True, output depth during training. Otherwise, only output depth during evaluation."""
+    camera_optimizer: CameraOptimizerConfig = field(default_factory=lambda: CameraOptimizerConfig(mode="SO3xR3"))
```
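
With this config field in place, the optimizer can presumably be toggled through nerfstudio's usual command-line config overrides; a sketch, assuming the camera optimizer lands under pipeline.model as in this diff (`<path-to-data>` is a placeholder):

```bash
# Keep the default SO3xR3 pose optimization:
ns-train splatfacto --data <path-to-data>

# Turn pose optimization off (hypothetical flag path based on this diff):
ns-train splatfacto --pipeline.model.camera-optimizer.mode off --data <path-to-data>
```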
Collaborator

@jb-ye jb-ye Feb 8, 2024

If I set the camera optimizer to off, would it still trigger computation overhead from computing the gradients w.r.t. view_mat and proj_mat? If yes, how much overhead does it bring?

Contributor Author

I will check this. Thank you for the suggestion!

@jh-surh
Contributor Author

jh-surh commented Feb 8, 2024

@ichsan2895, sorry for the confusion: I had some changes that were not yet pushed to nerfstudio-project/gsplat#123.
I have pushed them now; that should resolve your problem.

@jh-surh
Contributor Author

jh-surh commented Feb 8, 2024

@jb-ye

> Looking forward to your results. Putting my concerns aside, I think we can have this camera-opt option in the main branch as long as we find it useful on some datasets and other people are aware of how to use it properly. Nerfstudio is a research project and should welcome innovations, but I am more comfortable setting this option to false by default.

Sounds good to me! I will try to get the results ASAP.

@ichsan2895

ichsan2895 commented Feb 8, 2024

Unfortunately, the result is not good. @jh-surh @jb-ye

This is the kitchen dataset from ns-download-data. The red line is this implementation; the other one is Nerfstudio v1.0.1, with the dataset processed by COLMAP 3.8.

[image]

This is the playroom dataset from the original Inria gaussian-splatting. The red line is this implementation; the other one is Nerfstudio v1.0.1, with the dataset processed by COLMAP 3.8.

[image]

My suggestion: try tweaking the hyperparameters, for example making sure the magnitude of the learning rate is not too large.

@kerrj
Collaborator

kerrj commented Feb 8, 2024

Thanks for the effort on this! @jh-surh, regarding "I think the next thing to add to gsplat to address this issue would be multi-image splatting, although I question its feasibility": you can already roughly simulate this behavior with gradient accumulation (I believe this is already on for camera optimization; see here). You can try adding it to the gaussian parameter groups too, to accumulate positional gradients from multiple cameras. I experimented with this behavior early on and found it helped with floaters at the cost of overall quality, but a lot has changed since then, so it's worth experimenting with again.
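
A minimal sketch of gradient accumulation over several cameras before one optimizer step (generic PyTorch, not the nerfstudio implementation; `render_and_loss` is a hypothetical closure over the model for one camera/image pair):

```python
import torch

def accumulated_step(optimizer, render_and_loss, batch, accum=4):
    # Accumulate gradients from `accum` cameras, then update once; this
    # approximates a multi-image (bundle) gradient for the shared gaussians.
    optimizer.zero_grad()
    for camera, image in batch[:accum]:
        loss = render_and_loss(camera, image) / accum  # average, not sum
        loss.backward()  # gradients add up across iterations
    optimizer.step()
```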

As for testing, I agree with the suggestions @jb-ye made; the two main things to test are 1) performance on non-COLMAP datasets like ARKit or Polycam, where pose optimization should help a great deal, and 2) performance on COLMAP datasets, where pose optimization could actually hurt, but it's important to quantify by how much. Learning-rate scheduling and regularizers on each camera's pose deviation should hopefully reduce the quality drop in 2). If you want to be fancy, you can also try adding coarse-to-fine optimization by blurring images early in training and slowly re-adding high-frequency information (similar in spirit to BARF); a sketch of that idea follows.
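
An illustration of that coarse-to-fine idea (not code from this PR): blur the ground-truth images with a sigma that decays over training, so early gradients only see low frequencies; the schedule constants are made-up defaults.

```python
import torch
from torchvision.transforms.functional import gaussian_blur

def coarse_to_fine_gt(image, step, max_sigma=4.0, end_step=10_000):
    # Linearly anneal the blur from max_sigma down to none by end_step.
    sigma = max_sigma * max(0.0, 1.0 - step / end_step)
    if sigma < 0.1:
        return image  # past the schedule: train on sharp images
    k = 2 * round(3 * sigma) + 1  # odd kernel covering ~3 sigma
    return gaussian_blur(image, kernel_size=[k, k], sigma=[sigma, sigma])
```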

@jh-surh
Contributor Author

jh-surh commented Feb 9, 2024

@ichsan2895

Oof, I'll try to tweak some of the hyperparameters. Thank you for your experimentation!

@kerrj

Thank you for your suggestions! I will try them and see how things work out.

@oseiskar
Contributor

oseiskar commented Feb 9, 2024

We (Spectacular AI) have also been experimenting with pose optimization with Splatfacto/gsplat and could add our contributions here (I created a new PR #2891 for them, since that was technically simplest at this point).

To recap what has already been said in the thread: pose optimization is not required, and can hurt, with still-image datasets successfully optimized with COLMAP (or synthetic data). This is what most people in academia benchmark with, which makes it seem like the primary use case. However, if the image data has been collected with a moving camera, the situation is very different. COLMAP does not give perfect results there, and the alternatives do not necessarily produce pixel-perfect SfM results either, for various reasons, or even aim to do so. These include proprietary systems like ARKit/Record3D, PolyCam, and our SDK & tools. For this latter use case, some level of pose optimization is very useful.

I fully agree with #2885 (comment). It's unclear how, in theory, things like gradient accumulation should interact with the alpha reset. However, this approach seems to work in practice nevertheless.

Main additions:

  1. I don't think apply_to_camera currently works well for gaussian splatting. The approach is not numerically stable, and the priors do not work as intended. This commit attempts to fix this: 5fe47c9
  2. Things seem to generally work better with hyperparameters like these: 2e37af2, which start the pose optimization only after the initial reconstruction has somewhat converged (I added a separate "step" ramp mode to support this: 04149f7; see the sketch after this list). When the optimization should start may also depend on how far off the initial poses are expected to be.
  3. There is a simpler (approximate) alternative for the view matrix gradients in gsplat: Approximate view matrix gradient for pose optimization gsplat#127 (a drop-in replacement for Add camera pose and projection gradient flow gsplat#123)
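
A sketch of what such a delayed "step" ramp for the pose-optimizer learning rate might look like (illustrative only; the actual schedule is in the linked commits, and the constants here are made up):

```python
def pose_lr(step, base_lr=1e-4, start=5_000, ramp=1_000):
    # Keep the pose optimizer off while the reconstruction converges,
    # then ramp the learning rate linearly from 0 up to base_lr.
    if step < start:
        return 0.0
    return base_lr * min(1.0, (step - start) / ramp)
```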

@jb-ye
Collaborator

jb-ye commented Apr 11, 2024

#2891 is merged, closing this PR.

@jb-ye jb-ye closed this Apr 11, 2024