Fix missing +0.5 in calculating uv coordinate #151

Merged (1 commit) on Apr 2, 2024


jb-ye
Collaborator

@jb-ye jb-ye commented Mar 30, 2024

I realized that in the previous PR #97 the author didn't add the +0.5 term back to the rasterize_forward() calls.

Just to clarify, we need this 0.5 term. In an image, the pixel at index (i, j) represents an integrated box of area 1 whose center is actually at (i + 0.5, j + 0.5). So when a point (x, y, z) projects to the center of pixel (i, j), the following equations hold:

i + 0.5 = fx * x / z + cx
j + 0.5 = fy * y / z + cy

For example, given an image of size 999x999 with cx = 499.5 and cy = 499.5 (half the width and height), the pixel at index (499, 499) corresponds exactly to the principal point of the image.
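
As a minimal sketch of this convention (the helper names `project_to_image` and `pixel_index` and the focal length of 500 are assumptions for illustration, not part of gsplat), the relation between a camera-space point, its continuous image coordinate, and the index of the pixel that contains it could look like this:

```python
import math

def project_to_image(x, y, z, fx, fy, cx, cy):
    """Pinhole projection to continuous image coordinates (u, v)."""
    return fx * x / z + cx, fy * y / z + cy

def pixel_index(u, v):
    """Index (i, j) of the unit box containing (u, v); its center is (i + 0.5, j + 0.5)."""
    return math.floor(u), math.floor(v)

# 999x999 image with the principal point at half the width and height.
# fx = fy = 500 is an arbitrary value chosen for this example.
u, v = project_to_image(0.0, 0.0, 1.0, fx=500.0, fy=500.0, cx=499.5, cy=499.5)
i, j = pixel_index(u, v)
print((u, v), (i, j))  # (499.5, 499.5) (499, 499)
```

A point on the optical axis lands at (499.5, 499.5), which is the center of pixel (499, 499), matching the example above.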

In our implementation before #97, we subtracted 0.5 from the xys, which I don't think makes much sense either. The proper way is to add 0.5 to px and py in the rasterize_forward() calls. Without this fix, project_gaussians() and rasterize_forward() are mismatched by 0.5 pixel, which degrades splatfacto training.
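
To illustrate the size of the mismatch, here is a simplified 1D sketch. It is not the actual gsplat kernel: `gaussian_weight` is a hypothetical stand-in for the per-pixel falloff, and sigma = 1 is an arbitrary choice. It compares sampling at the pixel index with sampling at the pixel center for a Gaussian whose projected center sits exactly at the center of pixel 499.

```python
import math

def gaussian_weight(center, sample, sigma=1.0):
    """Unnormalized 1D Gaussian falloff evaluated at a sample position."""
    d = sample - center
    return math.exp(-0.5 * d * d / (sigma * sigma))

# Projected center in the (0.5, 0.5) pixel-center convention:
# the center of pixel 499 is at 499.5.
center_x = 499.5
i = 499

w_without = gaussian_weight(center_x, float(i))  # samples at i (offset by 0.5 px)
w_with = gaussian_weight(center_x, i + 0.5)      # samples at the pixel center

print(w_with, w_without)
```

With the +0.5 term the sample coincides with the projected center and the weight is at its peak; without it, every pixel evaluates the Gaussian half a pixel away from where the projection placed it.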

This PR also bumps the version to 0.1.10, since this is an important bug fix for splatfacto.

@jb-ye jb-ye requested a review from kerrj March 30, 2024 02:13
@ichsan2895

My experiment GSplat 0.1.8 vs 0.1.9 vs 0.1.10 (your PR)

Red = GSplat 0.1.10
Blue = GSplat 0.1.9
Black = GSplat 0.1.8

ns-train splatfacto --logging.steps-per-log 200 --vis viewer+wandb --viewer.websocket-port 7007 \
    --pipeline.model.resolution-schedule 3000 \
    --pipeline.model.num-downscales 2 \
    nerfstudio-data \
    --data path/to/scene --downscale-factor 1

Truck (251 images @ 1957 x 1051, already undistorted with colmap)

image

Purancak (my private dataset, 599 images @ 1995 x 1494 already undistorted with colmap)

image

@ichsan2895

GSPLAT 0.1.8

0000_GSplat018_convertINRIA-NP_DEVJbYe

0006_GSplat018_convertINRIA-NP_DEVJbYe

0009_GSPlat018_convertINRIA-NP_DEVJbYe

GSPLAT 0.1.10

0000_GSplat0110_convertINRIA-NP_DEVJbYe

0006_GSplat0110_convertINRIA-NP_DEVJbYe

0009_GSplat0110_convertINRIA-NP_DEVJbYe

@oseiskar
Contributor

oseiskar commented Mar 30, 2024

@jb-ye I don't think this is correct. Whether or not the center of the top left pixel is (0.5, 0.5) or (0.0, 0.0) is a matter of convention. In OpenGL, it's (0.5, 0.5), in OpenCV, for example, it's (0, 0). (EDIT: see links).

Which one is correct? It depends on the camera calibration. AFAIK, some real-world calibration tools assume (0, 0). This includes COLMAP, OpenCV, Kalibr and Basalt. (EDIT: actually mostly (0.5, 0.5), thanks for pointing this out)

EDIT: The previous version had +0.5, because it converted between the OpenCV and OpenGL conventions. Now the OpenGL NDC coordinates are gone and the conversion is not needed.

Links:

@jb-ye
Collaborator Author

jb-ye commented Mar 30, 2024

@ichsan2895 Just want to confirm, your results are based on colmap, right?

@jb-ye
Collaborator Author

jb-ye commented Mar 30, 2024

@oseiskar I did some research, here are my findings:

(1) nerfstudio assumes a +0.5 offset for all image coordinates when applying calibration intrinsics. See this line.

(2) Calibration software actually uses the convention of its feature detection outputs to decide whether to apply +0.5. For example, as described here, colmap assumes the center of the top-left pixel to be (0.5, 0.5) because its feature detection inputs follow the same convention.

The section https://colmap.github.io/tutorial.html#feature-detection-and-extraction says:

Note that by convention the upper left corner of an image has coordinate (0, 0) and the center of the upper left most pixel has coordinate (0.5, 0.5).

Kalibr uses AprilTag detection, and a +0.5 offset is added to the detected coordinates here.

(3) This observation is further confirmed by how those libraries implement their rescale functions (e.g. here for colmap and here for nerfstudio). If and only if they assume the image coordinate offset to be +0.5 can one apply the simple rescale formula cx' = cx * scale_x. Why? Consider a camera of width 2000 and cx=1000: a point (0, 0, 1) in view space projects to image coordinate (1000, 1000), which sits between the pixels at indices (999, 999) and (1000, 1000). When the camera is rescaled by half, the same point projects to image coordinate (500, 500), which sits between the pixels at indices (499, 499) and (500, 500).

(4) PR #97 actually changed the behavior of project_gaussians(), and we observed a slight decrease in splatfacto quality.
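
The rescale argument in point (3) can be checked with a small sketch. Both function names are hypothetical and are not taken from colmap or nerfstudio; the second one shows what the formula would have to look like if pixel centers sat at integer coordinates instead.

```python
def rescale_cx_half_pixel(cx, scale):
    """Pixel centers at (i + 0.5): continuous coordinates scale directly."""
    return cx * scale

def rescale_cx_integer_center(cx, scale):
    """Pixel centers at integer i: shift to the corner-origin frame, scale, shift back."""
    return (cx + 0.5) * scale - 0.5

# Camera of width 2000, downscaled by half. The principal point lies on the
# boundary between pixels 999 and 1000, i.e. at 1000.0 in the half-pixel
# convention and at 999.5 in the integer-center convention.
print(rescale_cx_half_pixel(1000.0, 0.5))     # 500.0
print(rescale_cx_integer_center(999.5, 0.5))  # 499.5
print(999.5 * 0.5)                            # 499.75
```

In both conventions the correctly rescaled principal point lies on the boundary between pixels 499 and 500. Applying the plain cx * scale formula to an integer-center calibration instead yields 499.75, a quarter pixel off.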

@ichsan2895

@ichsan2895 Just want to confirm, your results are based on colmap, right?

Yes, colmap 3.8 with CUDA.

@oseiskar
Contributor

@jb-ye Thank you for pointing that out 👍 I stand corrected. So apparently only OpenCV uses the (0,0) convention and even there it's a bit unclearly documented (opencv/opencv#10130).

Another possibility could be to just change the calibration. Subtracting 0.5 from the principal point X/Y could achieve the same, and it would then be possible to support both the (0,0) and (0.5,0.5) conventions if necessary. However, the approach in this PR is cleaner if the (0.5,0.5) convention is consistently followed in the gsplat & Nerfstudio codebases.
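
A rough sketch of that calibration-side alternative, assuming hypothetical helper names that exist in neither gsplat nor Nerfstudio, would be to shift the principal point when moving between the two conventions:

```python
def to_integer_center(cx, cy):
    """Convert a principal point from the (0.5, 0.5) pixel-center convention
    to the convention where the top-left pixel center is (0, 0)."""
    return cx - 0.5, cy - 0.5

def to_half_pixel_center(cx, cy):
    """Inverse conversion: from the (0, 0) convention to (0.5, 0.5)."""
    return cx + 0.5, cy + 0.5

# 999x999 image whose principal point is the center of pixel (499, 499).
cx0, cy0 = to_integer_center(499.5, 499.5)
print((cx0, cy0))                      # (499.0, 499.0)
print(to_half_pixel_center(cx0, cy0))  # (499.5, 499.5)
```

Only the principal point changes; the focal lengths are unaffected by the choice of convention.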

Perhaps documenting this more clearly than most software does could be a good extra improvement.

@yzslab
Contributor

yzslab commented Apr 1, 2024

(Quoting @ichsan2895's comparison of GSplat 0.1.8, 0.1.9 and 0.1.10 above.)

Experiments on the blender dataset show a larger difference. Here are the metrics produced by ns-eval for gsplat 0.1.8 and 0.1.9:
image

This is the training command: ns-train splatfacto --data ~/data/nerf/nerf_synthetic/lego --experiment-name ... --vis tensorboard --pipeline.model.background_color "white" --optimizers.camera-opt.optimizer.lr 0. --pipeline.model.random_scale 2.6 blender-data

@jb-ye jb-ye requested a review from vye16 April 1, 2024 16:53
Collaborator

@vye16 vye16 left a comment

This seems reasonable to me, thank you for bringing this up. For future reference, let's document this somewhere.

@jb-ye jb-ye merged commit 4609d3b into nerfstudio-project:main Apr 2, 2024
2 checks passed