Question about Gaussian normalization in the paper and alpha blending implementation in the code #294

Open
KaziiBotashev opened this issue Oct 9, 2023 · 15 comments

@KaziiBotashev

KaziiBotashev commented Oct 9, 2023

Dear authors, thank you for this outstanding work.

I have some questions related to the alpha blending implementation in the code.

In lines 336-359 of forward.cu, alpha blending is done with the following procedure:

float4 con_o = collected_conic_opacity[j];
float power = -0.5f * (con_o.x * d.x * d.x + con_o.z * d.y * d.y) - con_o.y * d.x * d.y;
if (power > 0.0f)
    continue;

// Eq. (2) from 3D Gaussian splatting paper.
// Obtain alpha by multiplying with Gaussian opacity
// and its exponential falloff from mean.
// Avoid numerical instabilities (see paper appendix).
float alpha = min(0.99f, con_o.w * exp(power));
if (alpha < 1.0f / 255.0f)
    continue;
float test_T = T * (1 - alpha);
if (test_T < 0.0001f)
{
    done = true;
    continue;
}

// Eq. (3) from 3D Gaussian splatting paper.
for (int ch = 0; ch < CHANNELS; ch++)
    C[ch] += features[collected_id[j] * CHANNELS + ch] * alpha * T;
T = test_T;
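
For readers who want to experiment with the loop above outside the rasterizer, here is a minimal host-side C++ sketch of the same front-to-back compositing for a single pixel and a single color channel. The `Splat` struct and the `compositePixel` function are hypothetical names introduced only for illustration; the thresholds (0.99, 1/255, 0.0001) are copied from the snippet, and `break` stands in for the `done = true; continue;` pair, which is assumed to end the loop for that pixel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// One projected Gaussian as seen from a single pixel (hypothetical helper type).
struct Splat {
    float dx, dy;    // offset between the Gaussian's 2D mean and the pixel
    float a, b, c;   // conic, i.e. inverse 2D covariance [a b; b c]
    float opacity;   // learned per-Gaussian opacity (con_o.w in the snippet)
    float color;     // one channel only, for brevity
};

// Front-to-back alpha compositing of depth-sorted splats for one pixel.
float compositePixel(const std::vector<Splat>& splats) {
    float T = 1.0f;  // transmittance accumulated so far
    float C = 0.0f;  // accumulated color
    for (const Splat& s : splats) {
        // power = -0.5 * d^T * conic * d, with no normalization constant
        float power = -0.5f * (s.a * s.dx * s.dx + s.c * s.dy * s.dy) - s.b * s.dx * s.dy;
        if (power > 0.0f) continue;
        float alpha = std::min(0.99f, s.opacity * std::exp(power));
        if (alpha < 1.0f / 255.0f) continue;   // negligible contribution
        float testT = T * (1.0f - alpha);
        if (testT < 0.0001f) break;            // pixel is effectively saturated
        C += s.color * alpha * T;
        T = testT;
    }
    return C;
}
```

With two identical splats centered on the pixel (opacity 0.5, color 1), the first contributes 0.5 and the second 0.5 * 0.5, so the pixel ends at 0.75. Note that the only per-Gaussian scalar entering the result is `opacity`; no determinant appears anywhere.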

Following the EWA splatting paper, the final C[ch] is equivalent to this (omitting the low-pass filter):
[equation image]
with the following:
[equation image]
and the following:
[equation image]

It seems to me that, in order to compute the final color value, we also need to multiply it by the normalization factor, which is the product of the determinants of the Jacobian, the camera rotation (this one is 1 because the rotation is orthonormal), and the square root of the covariance matrix [equation image]. If I do this, I am left with just the square root of the V_k (world reference frame) matrix.

However, in the code I can't find any of these determinants or related multiplications in either the forward or the backward pass. Only the exponential part is used, without normalization, and this confuses me a lot. The Jacobian is not a constant; it depends on the positions (3D means) of our Gaussians, so we can't simply omit it. The same goes for det(V_k), which depends directly on our optimized parameters.
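
To make the question concrete, the difference between a normalized 2D Gaussian and what the code snippet evaluates can be written out as below. The symbols $\Sigma'$ (projected 2D covariance), $d$ (pixel offset from the projected mean) and $o$ (stored opacity, `con_o.w`) are notation assumed here for illustration; the first line is the standard normalized Gaussian density, not a transcription of the missing equation images.

```latex
\begin{align}
  G_{\Sigma'}(d) &= \frac{1}{2\pi\,\lvert\Sigma'\rvert^{1/2}}
                    \exp\!\left(-\tfrac{1}{2}\, d^{\top} \Sigma'^{-1} d\right)
                    && \text{(normalized 2D Gaussian)} \\
  \alpha(d)      &= o \,\exp\!\left(-\tfrac{1}{2}\, d^{\top} \Sigma'^{-1} d\right)
                    && \text{(what the code evaluates)} \\
  \alpha(d)      &= o \cdot 2\pi\,\lvert\Sigma'\rvert^{1/2}\; G_{\Sigma'}(d)
                    && \text{(relation between the two)}
\end{align}
```

Read this way, the stored opacity $o$ is the peak value of $\alpha$ at the center of the splat, and the factor $2\pi\lvert\Sigma'\rvert^{1/2}$ is exactly the part that depends on the projection and that the code never computes.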

I would be very grateful if you could clarify either where we do that part or why we don't need to do it.

Thank you in advance!

@KaziiBotashev KaziiBotashev changed the title Question about paper and alpha blending implementation in the code Question about Gaussian normalization in the paper and alpha blending implementation in the code Oct 9, 2023
@KaziiBotashev
Author

KaziiBotashev commented Oct 10, 2023

Dear @grgkopanas,

Could you please take a look at this question? Many thanks in advance!

@grgkopanas
Collaborator

We have our best guy looking at it :) It's indeed an interesting observation.

@f-dy

f-dy commented Oct 13, 2023

Normalization and alpha play the same role in the equations, so you can think of alpha as "normalization*the_real_alpha". I actually prefer not having the normalization term (as it is now), because the Gaussians are not the result of blurring a Dirac: I see them as a "mass of stuff". If there were a normalization term, a large Gaussian would need an alpha value way larger than 1, which makes little sense. I prefer to see the Gaussians as blobs, where alpha is the opacity at the center.

@slefkimmiatis

> Normalization and alpha play the same role in the equations, so you can think of alpha as "normalization*the_real_alpha". I actually prefer not having the normalization term (as it is now), because the Gaussians are not the result of blurring a Dirac: I see them as a "mass of stuff". If there were a normalization term, a large Gaussian would need an alpha value way larger than 1, which makes little sense. I prefer to see the Gaussians as blobs, where alpha is the opacity at the center.

I am not sure that the normalization term that @KaziiBotashev refers to can be absorbed by the alpha parameter (that would indeed be very convenient in terms of implementation simplicity). The reason, unless I am wrong, is that the opacity of the volume is independent of the camera view, while the normalization term depends on it directly through the Jacobian J_k, which internally involves the camera rotation and translation.

@grgkopanas do you have any updates that you could share with us about this issue?

@KaziiBotashev
Author

KaziiBotashev commented Oct 19, 2023

@f-dy If there is a normalization term, a large Gaussian would have a normalization term much larger than 1. That effect might be compensated by the learned "real alpha" value (for large Gaussians we would have a large normalization term and a small trained real alpha). Could you please elaborate a bit more on why it makes little sense?

@adam-ce
Collaborator

adam-ce commented Oct 19, 2023

Regarding the normalisation of the Gaussian based on det(covariance): to my understanding, mathematically it makes no difference, since it can be baked into alpha. Numerically it might make a difference, I don't know. Performance-wise it's faster not to compute the normalisation, but that's done in the preprocess phase, so it should not really matter.

Regarding the normalisation based on the Jacobian:
To my understanding it boils down to keeping the integral of the transformed Gaussian the same as that of the untransformed one. If the Gaussian (or its 1-sigma iso-ellipsoid) becomes larger, the scaling factor is < 1. So let's say we have a Gaussian with alpha 1.0, i.e. completely opaque. Now let's say we move closer to that Gaussian. It will eventually become stretched. With the Jacobian normalisation it would become transparent; without it, it stays opaque. The authors apparently decided to keep it opaque. That's at least my theory; I still don't understand all of it completely.
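
The "becomes transparent when stretched" argument can be illustrated with a small C++ sketch. It assumes the plain normalized-density convention, where a 2D Gaussian integrates to 1 and its peak is 1 / (2π sqrt(det Σ)); this is a simplification for illustration, not the exact EWA factor, and `normalizedPeak` and `peakRatioAfterMagnification` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>

// Peak value of a unit-integral 2D Gaussian with covariance determinant detCov.
double normalizedPeak(double detCov) {
    assert(detCov > 0.0);
    const double kPi = 3.14159265358979323846;
    return 1.0 / (2.0 * kPi * std::sqrt(detCov));
}

// Ratio of peak values after the footprint is magnified by a linear factor s.
// The covariance scales by s^2, so its 2x2 determinant scales by s^4.
double peakRatioAfterMagnification(double detCov, double s) {
    double magnifiedDet = detCov * s * s * s * s;
    return normalizedPeak(magnifiedDet) / normalizedPeak(detCov);
}
```

Under this convention, moving close enough that a splat appears four times larger on screen (`s = 4`) divides its peak by 16, which is the fading that the unnormalized formulation in the code avoids.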

@f-dy

f-dy commented Oct 20, 2023

OK I found one place where normalization consideration is missing, this is where the 2D Gaussian is convolved with an isotropic 2D Gaussian of sigma sqrt(0.3) to simulate pixel integration (this is not in the paper):

  • here in forward.cu
  • here in backward.cu

Let us say you have a 2D Gaussian with an opacity of 1 at the center. When it is convolved with another 2D Gaussian and the opacity is left unchanged, as is currently the case, the Gaussian becomes larger while remaining opaque and may obscure Gaussians that are behind it (we observed that on grid patterns).

Take an extreme case where the original Gaussian has size 0.1 and opacity 1, and we blur it with a Gaussian of sigma 10. The result is a Gaussian with sigma=sqrt(0.1^2+10^2), but the opacity shouldn't be 1!

Instead, the opacity should be reduced so that the integrated opacity of the resulting Gaussian is the same as the original one. Thus in the 3DGS code the opacity should be multiplied by the factor sqrt(det(Sigma)/det(Sigma+diag(0.3,0.3))). @grgkopanas

In the above example, the factor (and thus the final opacity at the center of the 2D Gaussian) would be sqrt(0.1^4/(0.1^2+10^2)^2) = 0.01/100.01 ≈ 0.0001.
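
The compensation factor described above can be sketched in a few lines of host-side C++. `Cov2D`, `det` and `opacityCompensation` are hypothetical names introduced for illustration, and the blur is passed as a variance added to the diagonal (0.3 for the filter mentioned above, 10^2 for the extreme example). This is not taken from the 3DGS code base.

```cpp
#include <cassert>
#include <cmath>

// Symmetric 2x2 covariance [xx xy; xy yy] (hypothetical helper type).
struct Cov2D {
    double xx, xy, yy;
};

double det(const Cov2D& c) {
    return c.xx * c.yy - c.xy * c.xy;
}

// Factor sqrt(det(Sigma) / det(Sigma + blurVariance * I)) that keeps the
// integrated opacity of the blurred Gaussian equal to that of the original.
double opacityCompensation(const Cov2D& sigma, double blurVariance) {
    Cov2D blurred{sigma.xx + blurVariance, sigma.xy, sigma.yy + blurVariance};
    double d0 = det(sigma);
    double d1 = det(blurred);
    assert(d0 > 0.0 && d1 > 0.0);  // both must be valid covariances
    return std::sqrt(d0 / d1);
}
```

For the extreme example, `opacityCompensation({0.01, 0.0, 0.01}, 100.0)` evaluates to 0.01/100.01, and with the 0.3 blur the same tiny Gaussian gets a factor of 0.01/0.31. The factor approaches 1 when the Gaussian is much larger than the blur, so large splats are left essentially untouched.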

@Snosixtyboo
Collaborator

Snosixtyboo commented Oct 20, 2023

> OK I found one place where normalization consideration is missing, this is where the 2D Gaussian is convolved with an isotropic 2D Gaussian of sigma sqrt(0.3) to simulate pixel integration (this is not in the paper):

That is entirely correct!

We discovered this some time ago. We tested it with and without proper compensation, but we found it has no measurable impact on image quality according to standard metrics. So we left it the same way it was used for the paper evaluation.

Hth,
Bernhard

@jb-ye

jb-ye commented Oct 20, 2023

> > OK I found one place where normalization consideration is missing, this is where the 2D Gaussian is convolved with an isotropic 2D Gaussian of sigma sqrt(0.3) to simulate pixel integration (this is not in the paper):
>
> That is entirely correct!
>
> We discovered this some time ago. We tested it with and without proper compensation, but we found it has no measurable impact on image quality according to standard metrics. So we left it the same way it was used for the paper evaluation.
>
> Hth, Bernhard

The impact on standard metrics may be small because the validation images are chosen to be rendered at distances similar to those of the training images. If you capture data at a distance of 0.5 m and render it at 2 m, or with a different focal length, the effect can be obvious.

@f-dy

f-dy commented Oct 20, 2023

+1 we've seen a very visible impact when rendering from a different distance

@f-dy

f-dy commented Oct 24, 2023

> So we left it the same way it was used for the paper evaluation.

Hi Bernhard @Snosixtyboo, would it be possible to have that at least as an option? I'm not a CUDA expert, and find it difficult to compute a scalar here and use it somewhere else, but since you did it before, could you share the solution?

@jb-ye

jb-ye commented Oct 24, 2023

Here are two videos demonstrating the effect mentioned by @f-dy. The first video uses the 2D convolution without opacity compensation: as the render camera moves from far to close (near the capture distance), the color of the grid pattern on the acoustic amplifier changes, creating an aliasing-like effect (though it is not aliasing). The second video uses the 2D convolution with opacity compensation.

The demonstration was done using a third party implementation (https://github.com/wanmeihuali/taichi_3d_gaussian_splatting/tree/main/taichi_3d_gaussian_splatting).

aliasing_like.mp4
no_aliasing_like.mp4

@tdzdog

tdzdog commented Nov 17, 2023

Another question about alpha: I notice that the alpha formulation in Eq. (2) of the paper differs from the implementation in the code. It is 1 - exp in the paper but exp in the code. Can anyone explain the reason? @KaziiBotashev @Snosixtyboo @grgkopanas

@ys-koshelev

@tdzdog I believe the typo is not in the code but in the paper, where Eq. 2 should be $\alpha_i = \textrm{exp} \left(-\sigma_i \delta_i \right)$ instead of $\alpha_i = \left(1 - \textrm{exp} \left(-\sigma_i \delta_i \right) \right)$.

@AlexRoss-WHS

@tdzdog @ys-koshelev Eq. 2 in the paper is definitely correct, $\alpha_i = (1-\text{exp}(-\sigma_i \delta_i))$ (The opacity $\alpha$ should get bigger for greater density $\sigma$ or interval $\delta$). But this is not done in the code at all, as Eq. 2 describes the raymarching approach of NeRF-like volumetric representations.
In Gaussian Splatting each Gaussian stores the opacity $\alpha$ directly. The exp(power) in line 343 of forward.cu (that I assume @tdzdog is referring to) represents the evaluation of the projected 2D Gaussian.
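
The distinction drawn above can be sketched as two separate functions in C++. `alphaFromDensity` follows Eq. 2 as quoted in this thread (ray-marching with density and interval length), while `alphaFromSplat` mirrors the `min(0.99f, con_o.w * exp(power))` line from the code; both function names are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// NeRF-style ray marching: opacity of one interval of length delta with
// density sigma. Grows from 0 towards 1 as sigma * delta increases.
double alphaFromDensity(double sigma, double delta) {
    assert(sigma >= 0.0 && delta >= 0.0);
    return 1.0 - std::exp(-sigma * delta);
}

// Splatting: the Gaussian stores its opacity directly; exp(power) is only
// the falloff of the projected 2D Gaussian (power <= 0, and 0 at its center).
double alphaFromSplat(double opacity, double power) {
    assert(power <= 0.0);
    return std::min(0.99, opacity * std::exp(power));
}
```

At the center of a splat (`power = 0`) `alphaFromSplat` returns the stored opacity, clamped to 0.99, and it decays away from the center. The two exponentials therefore answer different questions: one converts density to opacity, the other evaluates a spatial falloff.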

10 participants