Question on the cameras and the projection matrices #826

Closed
MatteoFusconi opened this issue May 22, 2024 · 9 comments

@MatteoFusconi

In my use case, I need to reproject points from camera pixels to the world reference frame, using the depth obtained from rasterizing the Gaussians.

Until now I have been using the FoV variables and the extrinsics and performing the pinhole camera model calculations, but I get wrong results.

Does it have something to do with the projection matrix returned by getProjectionMatrix?
Could someone clarify what projection_matrix and full_proj_transform are, and why they are 4x4?

Any help would be much appreciated.

@PanagiotisP

Camera models are always a pain to get right. I haven't looked into the intrinsics at all, so I cannot help you with that. Regarding the world-to-pixel mapping, though, I think I can help.
First of all, you have the world_view_transform, which simply transforms points from world space to view/camera space.
Then you have the projection_matrix, which transforms points from view/camera space to NDC space.
The full_proj_transform is just the combination of the two: it transforms a point from world space to NDC space.
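
As a minimal sketch of that chain (not the repository's code), the NumPy snippet below composes a world-to-view matrix with a projection matrix and applies the perspective divide. It assumes column vectors, a toy camera with no rotation, and a symmetric 90-degree frustum; `world_to_ndc` and all sample values are made up for illustration.

```python
import numpy as np

def world_to_ndc(point_world, world_to_view, projection):
    """Map a 3D world point to NDC using column-vector 4x4 matrices."""
    p = np.append(np.asarray(point_world, dtype=float), 1.0)  # homogeneous point
    full = projection @ world_to_view   # world -> clip space in a single matrix
    clip = full @ p
    return clip[:3] / clip[3]           # perspective divide gives NDC

# Toy camera (assumption): no rotation, world origin 5 units in front of it (+z).
world_to_view = np.eye(4)
world_to_view[2, 3] = 5.0

# Projection with +z forward and depth in [0, 1]; n = 1, f = 100, r = t = 1.
n, f = 1.0, 100.0
projection = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, f / (f - n), -f * n / (f - n)],
    [0.0, 0.0, 1.0, 0.0],
])

ndc = world_to_ndc([1.0, 0.0, 0.0], world_to_view, projection)
print(ndc)
```

The point ends up at view-space depth 5, so its NDC x is 1/5 and its NDC depth lies between 0 and 1.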

Regarding getProjectionMatrix, the code uses the OpenGL projection matrix. A very nice blog post about that can be found here. However, there are two differences from the matrix shown in that post. The first is that the sign used is the positive one (so entries [2,2] and [3,2] are positive). The second is that, instead of using a cube spanning [-1, 1] in all coordinates for the NDC space, the z coordinate spans just [0, 1]. So at entry [2,2], instead of (f + n) / (f - n), you have f / (f - n).
As for the 4x4 shape, it is there to handle homogeneous coordinates.
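
A hedged sketch of a matrix with those two properties, written from the description above rather than copied from the repository. The function name `projection_matrix`, its parameters, and the symmetric-frustum assumption are illustrative. The check at the end confirms that camera depths `znear` and `zfar` land on NDC depths 0 and 1.

```python
import math
import numpy as np

def projection_matrix(znear, zfar, fov_x, fov_y):
    """Perspective matrix with +z forward and NDC depth in [0, 1] (column vectors)."""
    top = math.tan(fov_y / 2) * znear
    right = math.tan(fov_x / 2) * znear
    bottom, left = -top, -right          # assumed: frustum symmetric about the axis
    P = np.zeros((4, 4))
    P[0, 0] = 2 * znear / (right - left)
    P[1, 1] = 2 * znear / (top - bottom)
    P[0, 2] = (right + left) / (right - left)
    P[1, 2] = (top + bottom) / (top - bottom)
    P[2, 2] = zfar / (zfar - znear)           # f / (f - n), positive sign
    P[2, 3] = -zfar * znear / (zfar - znear)
    P[3, 2] = 1.0                             # w_clip = +z_cam
    return P

def ndc_depth(P, z_cam):
    """NDC depth of a point on the optical axis at camera depth z_cam."""
    clip = P @ np.array([0.0, 0.0, z_cam, 1.0])   # 4 components: homogeneous coordinates
    return clip[2] / clip[3]

P = projection_matrix(0.01, 100.0, math.radians(60), math.radians(45))
near_depth = ndc_depth(P, 0.01)
far_depth = ndc_depth(P, 100.0)
print(near_depth, far_depth)
```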

I also want to point out that OpenGL uses column-major matrices, so they might be the transposed version of what you would expect (this is covered in the blog too).
Hope I helped
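
To illustrate the transposition remark, here is a small sketch with random stand-in matrices. It only demonstrates the general identity that multiplying a row vector by transposed matrices gives the same result as the column-vector form; whether a given codebase stores its matrices transposed has to be checked in that codebase.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(4, 4))             # stand-in world-to-view matrix (column-vector form)
P = rng.normal(size=(4, 4))             # stand-in projection matrix (column-vector form)
x = np.append(rng.normal(size=3), 1.0)  # homogeneous point

clip_col = P @ V @ x          # column vectors: matrices applied right to left
clip_row = x @ (V.T @ P.T)    # row vectors: transposed matrices applied left to right

print(np.allclose(clip_col, clip_row))
```

In the row-vector form, the combined matrix is the product of the transposed view matrix and the transposed projection matrix, in that order.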

@MatteoFusconi
Author

Thank you for the answer, now it is way clearer.

It is still not clear to me what the point of projecting into NDC space is.
Does the rasterizer need the Gaussians to be in NDC in order to work?
After all, points can still be projected from image space to world space without passing through NDC.

Thanks a lot for your availability

@PanagiotisP

NDC is nothing more than a 3D representation of the distorted space (after the perspective transform). You could definitely skip it, but in general I find it useful because it is still a 3D representation (you keep a sense of depth) in which you have the comforts of orthographic projection: rays are parallel to each other, the ray direction is just [0, 0, ±1], hit and occlusion tests are trivial, etc.
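
For the route that skips NDC, a minimal pinhole unprojection sketch follows. It is not the repository's code, and every convention in it is an assumption that must be matched to the actual data: the depth is the camera-space z value (not the distance along the ray), the principal point is at the image center, the camera looks along +z with x to the right and y down, and `cam_to_world` is the inverse of the world-to-view matrix in column-vector form. The helper names are hypothetical.

```python
import math
import numpy as np

def fov_to_focal(fov, pixels):
    """Focal length in pixels from a field of view in radians."""
    return pixels / (2.0 * math.tan(fov / 2.0))

def unproject(u, v, depth, fov_x, fov_y, width, height, cam_to_world):
    """Pixel (u, v) with camera-space z-depth -> 3D point in world space."""
    fx = fov_to_focal(fov_x, width)
    fy = fov_to_focal(fov_y, height)
    cx, cy = width / 2.0, height / 2.0      # assumed principal point
    x_cam = (u - cx) / fx * depth
    y_cam = (v - cy) / fy * depth
    p_cam = np.array([x_cam, y_cam, depth, 1.0])
    return (cam_to_world @ p_cam)[:3]

# Toy camera (assumption): no rotation, located at world position (0, 0, -2).
cam_to_world = np.eye(4)
cam_to_world[:3, 3] = [0.0, 0.0, -2.0]

point = unproject(600.0, 300.0, 4.0, math.radians(90), math.radians(90),
                  800, 600, cam_to_world)
print(point)
```

Wrong results from this kind of calculation often come from a mismatch in one of the listed conventions, for example a half-pixel offset, a flipped axis, or a transposed extrinsic matrix.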

@lihao2333

lihao2333 commented Jun 6, 2024

I think you missed a third difference: your camera z-axis is opposite to that of the OpenGL camera.
Otherwise, I think the projection matrix should be this:
[image]

@PanagiotisP

The sign I mentioned affects just the z axis. The code is pretty clear on how the sign is used.
To make it clear, the projection matrix that post mentions is

$$\begin{bmatrix} \dfrac{2n}{r-l} & 0 & \dfrac{r+l}{r-l} & 0 \\ 0 & \dfrac{2n}{t-b} & \dfrac{t+b}{t-b} & 0 \\ 0 & 0 & -\dfrac{f+n}{f-n} & -\dfrac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{bmatrix}$$

while this code uses:

$$\begin{bmatrix} \dfrac{2n}{r-l} & 0 & \dfrac{r+l}{r-l} & 0 \\ 0 & \dfrac{2n}{t-b} & \dfrac{t+b}{t-b} & 0 \\ 0 & 0 & \dfrac{f}{f-n} & -\dfrac{fn}{f-n} \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

@lihao2333

If we put $z=-n$ into your matrix, we get $\frac{\frac{f}{f-n}(-n) - \frac{fn}{f-n}}{-n} = \frac{2f}{f-n} \neq 0$.
So I think your projection is not meant to map z from (-n, -f) to (0, 1).
Instead, it seems to map (n, f) to (0, 1).
In other words, your camera z-axis is opposite to the OpenGL camera z-axis.
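
For comparison, the same third and fourth rows can be evaluated with a positive camera depth $z_c$. This is a worked check using only the entries of the matrix quoted above, with $z_n'$ denoting the NDC depth after the divide by $w_n = z_c$:

$$z_n' = \frac{\dfrac{f}{f-n}\,z_c - \dfrac{fn}{f-n}}{z_c} = \frac{f\,(z_c - n)}{(f-n)\,z_c}, \qquad z_n'\big|_{z_c = n} = 0, \qquad z_n'\big|_{z_c = f} = 1$$

So the interval $[n, f]$ is the one that maps to $[0, 1]$.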

@lihao2333

lihao2333 commented Jun 7, 2024

I found a way to understand your projection matrix.
The OpenGL projection matrix is responsible for mapping the frustum into the NDC cube, with $w_n=-z_c$, that is:

x: [l, r] -> [-1, 1]
y: [b, t] -> [-1, 1]
z: [-n, -f] -> [-1, 1]

Your projection matrix is responsible for mapping the frustum mirrored through the origin into the NDC cube, mapping z only to [0, 1], with $w_n=z_c$, that is:

x: [-r, -l] -> [-1, 1]
y: [-t, -b] -> [-1, 1]
z: [n, f] -> [0, 1]
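
These ranges can be checked numerically. The sketch below builds the matrix quoted earlier in the thread for a deliberately off-center frustum (the values of l, r, b, t, n, f are arbitrary test values) and projects two opposite frustum corners, one on the near plane and one on the far plane. Column vectors are assumed.

```python
import numpy as np

def project(P, point_cam):
    """Apply a 4x4 projection to a camera-space point and do the perspective divide."""
    clip = P @ np.append(np.asarray(point_cam, dtype=float), 1.0)
    return clip[:3] / clip[3]

l, r, b, t, n, f = -0.5, 1.5, -1.0, 2.0, 1.0, 10.0   # arbitrary off-center frustum
P = np.array([
    [2 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
    [0.0, 2 * n / (t - b), (t + b) / (t - b), 0.0],
    [0.0, 0.0, f / (f - n), -f * n / (f - n)],
    [0.0, 0.0, 1.0, 0.0],
])

# Near-plane corner (-r, -t, n) and far-plane corner (-l, -b) scaled by f / n.
low = project(P, [-r, -t, n])
high = project(P, [-l * f / n, -b * f / n, f])
print(low, high)
```

For a frustum that is symmetric about the optical axis (l = -r, b = -t), the intervals [-r, -l] and [l, r] coincide, so the difference only shows up for off-center frusta like this one.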

Here is my derivation:

[four images of the derivation]

Here $x_c$ is the x coordinate in camera coordinates, $x_n$ is the x coordinate in homogeneous NDC coordinates, and $x_n'$ is the x coordinate in NDC coordinates.

I tried many ways to derive your projection matrix, and this is the only interpretation that works. Please correct me if I have misunderstood something.

@PanagiotisP

Ok, I see where the mixup is. You are right, the camera space here, unlike OpenGL, has a positive z-axis, so the boundaries are [n, f]. I'm sorry, I didn't even know that OpenGL had a negative z-axis even for the camera space. I thought it was only the NDC/clip space.

@zzg-zzg

zzg-zzg commented Jun 17, 2024

I have tried many ways to understand this function, and your answer is the most convincing explanation I've seen so far.
