
enable coarse-to-fine training #2984

Merged: 2 commits, Mar 11, 2024
Conversation

@jb-ye (Collaborator) commented Mar 5, 2024

In the original Inria GS code, coarse-to-fine training was never implemented, because the authors didn't observe benefits for training images no larger than 1600 pixels. However, many users want to train splatfacto at very high resolutions (e.g. 4k images). I found that when training at high resolutions, the optimization very likely gets stuck in local minima (for various reasons, such as SfM errors, thin structures, aliasing, etc.): many fine details are not reconstructed properly through densification of Gaussians. One counter-intuitive phenomenon is that when training at higher resolution, splatfacto actually creates fewer Gaussians.

One way to work around this issue is to re-enable coarse-to-fine training in splatfacto. I found that the default resolution_schedule=250 is too short to recover fine details, so I set it to 3000 to allow more densification while training on coarser images. The change also significantly shortens the time to finish the first 6k iterations.

I experimented with the mip360 dataset's bicycle and garden scenes, and some private datasets, and found the setting can meaningfully improve metrics, and also visual quality, when training with images at about 1k resolution or higher. I'd also like to hear from other users about this setting.
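For reference, the coarse-to-fine schedule amounts to a step-wise image downscale factor. The function below is a minimal sketch (not the actual splatfacto code), assuming the factor starts at `2**num_downscales` and halves every `resolution_schedule` steps until full resolution:

```python
def downscale_factor(step: int, num_downscales: int = 2, resolution_schedule: int = 3000) -> int:
    """Image downscale factor at a given training step (a sketch, not splatfacto's code).

    Starts at 2**num_downscales and halves every `resolution_schedule`
    steps until the images are used at full resolution (factor 1).
    """
    return 2 ** max(num_downscales - step // resolution_schedule, 0)

# With the defaults above, training runs at 1/4 resolution for the first
# 3000 steps, 1/2 resolution until step 6000, then full resolution.
print([downscale_factor(s) for s in (0, 2999, 3000, 5999, 6000, 30000)])
# → [4, 4, 2, 2, 1, 1]
```

This also explains the speedup on the early iterations: the first 6k steps rasterize against images at 1/4 and 1/2 resolution rather than full size.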

@Zunhammer (Contributor)

Hi, and thanks for your efforts in testing this improvement. I've tested with a single high-res dataset (slightly above 4k) but couldn't really confirm better visual results. PSNR oscillates a lot in this dataset, which is why I cannot compare those values. I think it might also depend on how the images were taken/aligned? Could you share some more details about the datasets where you see improvements?

@ichsan2895

Interesting. If I have free time, I will try this implementation and share the results.

@ichsan2895 commented Mar 11, 2024

My experiment:

Desolation dataset

- Image size: 1899x1064
- Already processed with `colmap image_undistorter`, then `sparse_pc.ply` and `transforms.json` were recreated.
- Using `splatfacto-big` and the antialiased rasterizer.

Blue: without coarse-to-fine training

```shell
ns-train splatfacto-big --logging.steps-per-log 200 --vis viewer+wandb --viewer.websocket-port 7007 \
    --pipeline.model.rasterize_mode antialiased \
    nerfstudio-data \
    --data path/to/scene --downscale-factor 1
```

Red: with coarse-to-fine training

```shell
ns-train splatfacto-big --logging.steps-per-log 200 --vis viewer+wandb --viewer.websocket-port 7007 \
    --pipeline.model.rasterize_mode antialiased \
    --pipeline.model.resolution-schedule 3000 \
    --pipeline.model.num-downscales 2 \
    nerfstudio-data \
    --data path/to/scene --downscale-factor 1
```

image

@ichsan2895 commented Mar 11, 2024

My second experiment:

Truck dataset

- Image size: 3904x2176 (enlarged 4x with Waifu4X super-resolution)
- Already processed with `colmap image_undistorter`, then `sparse_pc.ply` and `transforms.json` were recreated.
- Using `splatfacto-big` and the classic rasterizer.

Blue: without coarse-to-fine training

```shell
ns-train splatfacto-big --logging.steps-per-log 200 --vis viewer+wandb --viewer.websocket-port 7007 \
    nerfstudio-data \
    --data path/to/scene --downscale-factor 1
```

Red: with coarse-to-fine training

```shell
ns-train splatfacto-big --logging.steps-per-log 200 --vis viewer+wandb --viewer.websocket-port 7007 \
    --pipeline.model.resolution-schedule 3000 \
    --pipeline.model.num-downscales 2 \
    nerfstudio-data \
    --data path/to/scene --downscale-factor 1
```

image

As @jb-ye said:

> One counter-intuitive phenomenon is that when training at higher resolution, splatfacto actually creates fewer Gaussians.

Yes, that seems right: the lower-resolution run (976x544) ends up with more Gaussians than both the current repo and this PR. Please see the black line:
image

@kerrj (Collaborator) left a review:

lgtm!

@kerrj kerrj enabled auto-merge (squash) March 11, 2024 17:45
@kerrj kerrj merged commit 8e0c687 into nerfstudio-project:main Mar 11, 2024
2 checks passed
ichsan2895 added a commit to ichsan2895/nerfstudio that referenced this pull request Mar 12, 2024
@lxzbg commented Mar 13, 2024

Good work! I have similar findings: at high resolution, when shooting a high-frequency signal (the sofa part), the downsampled image produces more Gaussians (left: original, right: downsampled).

[images]

@jb-ye (Collaborator, Author) commented Mar 13, 2024

> Good work! I have similar findings: at high resolution, when shooting a high-frequency signal (the sofa part), the downsampled image produces more Gaussians (left: original, right: downsampled).

Did this PR improve your results at the original resolution?

@lxzbg commented Mar 14, 2024

> Did this PR improve your results at the original resolution?

I haven't tried it yet. I'll let you know when I get the results.

@ichsan2895 commented Mar 17, 2024

My third experiment:

Purancak dataset (private dataset)

- Image size: 2000x1500
- Already processed with `ns-process-data image --data /path/to/images --output-dir /path/to/output --skip-image-processing`, then the original images were copied to the output dir.
- Using `splatfacto` and the classic rasterizer.

Blue: without coarse-to-fine training

```shell
ns-train splatfacto --logging.steps-per-log 200 --vis viewer+wandb --viewer.websocket-port 7007 \
    nerfstudio-data \
    --data path/to/scene --downscale-factor 1
```

Red: with coarse-to-fine training

```shell
ns-train splatfacto --logging.steps-per-log 200 --vis viewer+wandb --viewer.websocket-port 7007 \
    --pipeline.model.resolution-schedule 3000 \
    --pipeline.model.num-downscales 2 \
    nerfstudio-data \
    --data path/to/scene --downscale-factor 1
```

image

It gives a bit of an improvement so far in my third experiment. @jb-ye

Michael-Spleenlab pushed a commit to Michael-Spleenlab/nerfstudio that referenced this pull request Apr 26, 2024
5 participants