Tracking convergence criteria #98
I am also encountering unusual results in camera pose estimation. I replaced the differentiable rasterizer in the original 3DGS work and pre-trained on the Blender dataset for 30,000 iterations, after which I saved the Gaussians. I then froze the Gaussian model and optimized only the camera pose. I introduced an error of +/- 0.2 meters and +/- 5 degrees to the camera, yet the final camera position merely fluctuates within a small range and fails to converge to the correct location. I use both RGB and depth as loss terms. I am unsure whether the issue lies in the design of the supervision loss or in the gradient of the camera delta pose. Your insights would be highly appreciated.
I've experienced similar issues. After adding gradient computation w.r.t. the pose to the rasterizer, I've observed that optimizing both the rotational and translational components diverges, while I steadily get convergence to ground-truth poses (usually within a 1-5 cm margin) when NOT updating the rotational component. The magnitude of the added translational noise varies within 5-50 cm. So, when using gradients to update the translation vector only, in my tests the poses converge. I'm using the original photometric loss from 3DGS (both L1 and SSIM terms, without outlier rejection by image gradients or opacity). I also use the lowest order of spherical harmonics so that color doesn't depend on rotation. So I think there is an issue with optimizing the rotational component. I've been investigating the gradients, but haven't found them to be computed incorrectly. I wonder if somebody else sees the same behavior.
@WFram Thank you very much for sharing your insights. I ran a similar test: using a pre-trained Gaussian model from the Blender datasets (optimized for 30,000 iterations), I manually add varying degrees of translation and rotation error and then optimize only the camera pose. However, I found that the camera position is very sensitive to the image-plane UV coordinates, and despite trying different learning rates, I have not succeeded in optimizing to the correct camera position. As you mentioned, there is indeed an issue with optimizing the camera's rotation. I suspect that even if the gradient direction is correct, the loss may not converge if the gradient scale is excessively large in a particular direction. It is also possible that the loss function is inherently less sensitive to rotation. I have some experience with differentiable rendering, and I wonder if you would be interested in discussing this further. Perhaps through an exchange of ideas we might come up with some solutions. Looking forward to your response.
@YufengJin Yes, it would be interesting to discuss this further. I also have concerns about the sensitivity of the loss to the rotation matrix. In my last experiments I used image pyramids to optimize camera poses in a coarse-to-fine manner, as in Direct Sparse Odometry (DSO). But I used downscaling factors of 2 and 4, while in DSO the pyramid has 5 levels and the image at the coarsest level is downscaled by 16 w.r.t. the original. It's better to start optimizing rotation at the coarsest pyramid level due to the high non-linearity of the cost function. You can try experimenting with this.
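The coarse-to-fine pyramid described above can be sketched as follows. This is a minimal illustration, not the DSO or MonoGS implementation: `build_pyramid` is a hypothetical helper that average-pools a grayscale image by each factor, coarsest first, so that pose optimization can start at the coarsest level and warm-start the finer ones.

```python
import numpy as np

def build_pyramid(image, factors=(16, 8, 4, 2, 1)):
    """Average-pool an H x W image by each downscaling factor.

    Levels are returned coarsest first, mirroring the DSO-style schedule
    where rotation is optimized at the coarsest level before refining.
    """
    levels = []
    for f in factors:
        h, w = image.shape[0] // f, image.shape[1] // f
        # crop to a multiple of f, then block-average each f x f patch
        lvl = image[:h * f, :w * f].reshape(h, f, w, f).mean(axis=(1, 3))
        levels.append(lvl)
    return levels
```

One would then run a few pose-update iterations per level, passing the estimate from each coarse level down as the initialization for the next finer one.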
Also, I think differing exposure times might impact the optimization. It's not the case for me, since I've been using synthetic data with fixed exposure. In MonoGS they estimate affine brightness parameters, but those are applied only between two frames in the loss. Before being projected to the image plane, the Gaussians are blended from a larger number of frames, so simply compensating brightness in the loss computation might not be enough. I think it's better to correct for exposure in a preprocessing step, or just fix the exposure when recording the image stream.
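For reference, a two-frame affine brightness correction of the kind mentioned above can be sketched as a least-squares fit of `observed ≈ a * rendered + b`. This is a toy sketch of the idea, not MonoGS's actual estimator (which folds the parameters into the tracking loss); the function name is hypothetical.

```python
import numpy as np

def estimate_affine_brightness(rendered, observed):
    """Fit observed ~= a * rendered + b over all pixels (least squares).

    A preprocessing-style correction would apply (a, b) to the rendered
    image before computing the photometric loss.
    """
    x = rendered.ravel()
    y = observed.ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)  # [pixel, 1] design matrix
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b
```

As noted in the comment, such a per-frame correction only reconciles the two images in the loss; it does not fix brightness inconsistencies already baked into Gaussians blended from many frames.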
@WFram First of all, thank you for your suggestions; I will give them a try. Additionally, due to reflections and inconsistent lighting, the colors of the splats become inconsistent during training. Not using the SH color encoding to represent colors mitigates this somewhat. Regarding camera pose optimization, I have another idea: we could use multiple viewpoints to optimize the camera pose.
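The multi-view idea above could be sketched as averaging a photometric loss over several viewpoints, so the pose gradient is constrained by more than one image. This is purely illustrative; `render_fn` is a hypothetical stand-in for a differentiable rasterizer call.

```python
import numpy as np

def multiview_photometric_loss(render_fn, pose, targets):
    """Average an L1 photometric loss over several target viewpoints.

    targets maps a view id to its observed image; render_fn(pose, view_id)
    is assumed to render the scene from that view under the given pose.
    """
    losses = [np.abs(render_fn(pose, view_id) - image).mean()
              for view_id, image in targets.items()]
    return float(np.mean(losses))
```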
Can I ask how you isolated the translation optimization from the rotational one? I think my tests somewhat validate what you are experiencing: when the process converges (and it does not converge often), I usually get a reasonable translational error and a huge rotational error. I'll run further tests to really validate this on my side.
By setting the rotational state update to zero. P.S.: Make sure that your rotation is actually expressed in the camera frame (world-to-camera).
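Zeroing the rotational part of the pose update can be sketched as masking the first three components of the 6-DoF tangent-space delta before applying it. This is a minimal numpy sketch assuming an `[omega, v]` (rotation, translation) ordering and a left-multiplied rotation increment; it is not the repository's actual update code.

```python
import numpy as np

def apply_pose_update(R, t, delta, update_rotation=True):
    """Apply a 6-DoF update delta = [omega, v] to a camera pose (R, t).

    With update_rotation=False the rotational part is zeroed, isolating
    the translational optimization as described above.
    """
    omega, v = delta[:3].copy(), delta[3:]
    if not update_rotation:
        omega[:] = 0.0  # freeze rotation: only translation is updated
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        dR = np.eye(3)
    else:
        # Rodrigues' formula for the rotation increment exp([omega]_x)
        k = omega / theta
        K = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
        dR = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    return dR @ R, t + v
```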
Hello! How do you train the Gaussian splatting on my own scene? I have already written code to load my dataset and a YAML file. Which command did you use? Thank you; your response would be very helpful.
@Il-castor You are on the wrong issue here. I can only encourage you to check out the README and other issues on custom dataset loading :)
Hello everyone, I'm encountering a similar issue as described above. After training my model on a Blender dataset for 30,000 iterations using the original pipeline, I saved the Gaussian splats and then shifted focus to camera pose optimization with the diff-gaussian-rasterization pipeline. I introduced a small offset to the camera position (+/- 0.2 meters and +/- 5 degrees) to test if the optimization would correct the pose back to its original state. However, the results are concerning:
Despite trying different setups, the camera pose fails to converge as expected. I'm unsure whether the issue is due to the supervision loss design, the gradient computation of the camera delta pose, or an interaction between the differentiable rasterization and pose optimization. Has anyone found a solution to this problem? Thank you for your time and assistance!
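The controlled perturbation test described in these comments (inject a known +/- 0.2 m, +/- 5 degree error and check whether optimization recovers the original pose) can be sketched as below. The helper name is hypothetical; it just builds a random rotation of exactly `rot_deg` degrees (via Rodrigues' formula) and a random translation offset of exactly `trans_mag` meters.

```python
import numpy as np

def perturb_pose(R, t, trans_mag=0.2, rot_deg=5.0, rng=None):
    """Inject a known error into a pose for a convergence test.

    Returns a pose offset by exactly trans_mag meters and rot_deg degrees
    about a random axis, so recovery can be measured against ground truth.
    """
    rng = np.random.default_rng(rng)
    dt = rng.standard_normal(3)
    dt *= trans_mag / np.linalg.norm(dt)          # fixed-magnitude offset
    axis = rng.standard_normal(3)
    axis /= np.linalg.norm(axis)
    theta = np.deg2rad(rot_deg)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    dR = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    return dR @ R, t + dt
```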
Dear community, authors,
As explained in #93, I'm trying to use the MonoGS tracker to retrieve a camera pose.
In my particular case, I am training a Gaussian splatting scene and then trying to retrieve the pose between two close cameras belonging to my training dataset.
To do so, I used the MonoGS rasterizer to get the Jacobian computation, and the tracking function from MonoGS/utils/slam_frontend.py. I'm training the scene with a simple Gaussian splatting training loop and trying to optimize the pose at the end.
So far I'm unable to get a proper pose estimation with my experiment.
I'm unable to reach convergence within 100 gradient-descent iterations, and when I do reach convergence with 2000 iterations, my result is really far from the goal pose. In fact, I even see the estimate diverging from my goal pose...
As an example, I initialize my translation vector as `[1.96003743, 0.59285834, -0.86803944]`; my goal is to estimate the translation vector `[1.96642986, 0.66022297, -0.83239669]`, and after 100 iterations I get the estimate `[2.1265576, -0.12794144, -0.41376677]`.
I'm looking at the learning rates and the convergence criterion (`update_pose(camera, converged_threshold=1e-4)`) as possible ways to improve the estimates, though the learning rate is constant in all the config files provided...
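The interplay between the learning rate and `converged_threshold` can be illustrated with a toy optimizer that uses the same kind of stopping rule (declare convergence when the pose-update norm drops below the threshold). This is a sketch with a quadratic stand-in for the photometric gradient, not the MonoGS `update_pose` implementation: too large a threshold stops early near a flat region, while too small a learning rate makes the update norm tiny long before the pose is correct.

```python
import numpy as np

def optimize_translation(grad_fn, t_init, lr=0.1,
                         converged_threshold=1e-4, max_iters=2000):
    """Gradient descent on a translation vector with a MonoGS-style
    stopping rule: stop once the update norm falls below the threshold.

    grad_fn is a hypothetical stand-in for the rasterizer's pose gradient.
    """
    t = np.asarray(t_init, dtype=float).copy()
    for i in range(1, max_iters + 1):
        step = -lr * grad_fn(t)
        t += step
        if np.linalg.norm(step) < converged_threshold:
            return t, i, True   # converged
    return t, max_iters, False  # hit the iteration budget
```

With a well-scaled gradient this recovers the goal translation from the initialization above in a few dozen iterations, which suggests that when 2000 iterations still diverge, the problem lies in the gradient itself (scale or sign), not in the threshold.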
Has somebody faced a similar problem, or can you guess what's causing the issue?
Any ideas on how to set the `converged_threshold` value or the learning rates to fit my use case?
Thanks in advance,
Best,
Hugo
Edit: I tried changing the learning rates, but it seems that the gradient is simply not being descended... No matter what learning rate I use, I don't see the result converging towards the goal pose. I checked the attachment of the projection matrices and extrinsics to the computational graph, and that doesn't seem to be the issue here :/