Poor results for 360 captured scene #50
Comments
There are two reasons:
Hence, the closest depth is computed from the upper cameras (because they are closer to the scene). In this case, the predefined object range (lines 244 to 245 in 748d817) DOES NOT include the object at all for the images taken at the lower two rounds (because they are farther from the object, and the far value is set too small)! Therefore, the model learns a weird 3D structure.
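For intuition, here is a minimal sketch (Python, with assumed depth values and variable names, not the repo's exact code) of how a far plane clamped as min(8 * near, bounds.max()) can exclude the object entirely for the cameras that sit farther away:

```python
import numpy as np

# Assumed depth bounds (in scene units): two rings of cameras at different distances.
upper_ring_depths = np.array([2.0, 3.0])    # cameras close to the object
lower_ring_depths = np.array([20.0, 30.0])  # cameras much farther away

bounds = np.concatenate([upper_ring_depths, lower_ring_depths])
near = bounds.min()                  # 2.0, dictated by the closest (upper) cameras
far = min(8 * near, bounds.max())    # min(16.0, 30.0) = 16.0

# Rays from the lower rings would need samples out to ~20-30 units, but the
# sampling range stops at 16, so those rays never reach the object.
print(near, far, lower_ring_depths > far)  # 2.0 16.0 [ True  True]
```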
To solve the issues caused by the two reasons above, you can:
After making the above changes, use the following command to train:
You could do better by training for more epochs, increasing the number of samples along the ray, or increasing the resolution. Finally, the other implementation doesn't fail because
Let me know if there's still an issue (my result is noisier than I'd expect; I'd like to know whether it gets resolved if you train longer at a higher resolution).
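To make the cost of "more samples on the ray" concrete, here is a back-of-the-envelope sketch (assumed batch size and sample counts, not measured values): in a standard coarse+fine NeRF setup, the number of network queries per batch, and hence activation memory, grows roughly linearly with the samples-per-ray settings.

```python
# Rough query count for a coarse+fine NeRF: the coarse model sees N_samples
# points per ray and the fine model sees N_samples + N_importance points.
def queries_per_batch(rays, n_samples, n_importance):
    coarse = rays * n_samples
    fine = rays * (n_samples + n_importance)
    return coarse + fine

print(queries_per_batch(1024, 64, 128))   # 262144 with modest settings
print(queries_per_batch(1024, 256, 512))  # 1048576, i.e. ~4x the queries
```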
Thanks for the quick reply! I removed the first part (i.e., removed the min in "min(8 * near, self.bounds.max())"). I also tried setting the scale factor as follows:
Which gives a scale factor of 87.20806618688762, versus something around 70 for the jaxnerf repo. Is this derivation correct / something you'd like me to submit a PR for, or should it be refined further to match the values of the other repository? I also noted that the suggested command above does not use the use_disp flag: is that intentional, or an oversight and a flag I should indeed be using?

In terms of further improving the quality of the reconstruction, I've tried setting the --N_importance and --N_samples flags to values matching https://github.com/google-research/google-research/blob/master/jaxnerf/configs/llff_360.yaml, but the GPU memory demands seem to be very high, even with small batch sizes (820) and chunk sizes (1024), and training is generally slower than the other repo's implementation. With 4 2080Ti GPUs an epoch seems to take about 10 hours, and I actually ended up getting a segmentation fault near the end:

Epoch 0: 94% | 13578/14441 [9:48:04<37:22, 2.60s/it, loss=0.024, val/psnr=8.02, train/psnr=20.9] Segmentation fault

And when trying to use 10 2080Ti GPUs instead of 4, the initial validation sanity check takes an extremely long time (about 1 hour). Do you have suggestions on what might be happening and how to make things more performant?
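On the memory side, a common pattern (sketched here with hypothetical names; it is not tied to this repo's internals) is to evaluate rays in fixed-size chunks so that peak GPU memory stays bounded, at the cost of some speed:

```python
import torch

def render_in_chunks(rays, model, chunk=1024):
    # Evaluate `model` on `rays` in fixed-size chunks: smaller chunks lower
    # peak GPU memory but make the forward pass slower overall.
    results = []
    for i in range(0, rays.shape[0], chunk):
        results.append(model(rays[i:i + chunk]))
    return torch.cat(results, dim=0)
```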
I'll answer the last question first: the validation should take the same time no matter how many GPUs are used. Maybe there's a synchronization problem between the GPUs (some run slower than others)?
Re: the slowness with 10 GPUs, could this just be due to how long it takes to copy the validation data to the GPUs (which is done sequentially)?

The more vexing issue to me is the blurriness of the results relative to other implementations like the jaxnerf one. For an easier reproduction, consider the training set at https://drive.google.com/file/d/1QqRfYaKNrH98VGl8GAd9SYds88iNEyM0/view?usp=sharing. Here the captured poses are just one orbit, unlike the fountain set in the original post. Training with a downscale factor of 4 (320x180 instead of 1280x720) and the same sample settings (256 coarse and 512 fine samples) is pretty quick in both implementations, but the quality varies significantly. In this repo I get a low-quality background after 30 epochs, even with the default settings, with the min(8 * near) section you mentioned earlier removed, with near and far manually set to 0.2 and 100 as in the jaxnerf implementation, with training for more than 30 epochs, etc. Whereas the jaxnerf implementation very quickly converges to a nearly perfect result.

My full training command is:
To sum up, for the fountain scene you'd need to scale the poses fairly aggressively and set the far plane farther, since the scene has a large portion of background. I will not apply these changes to the branches because doing so would break the pretrained models. Finally, there is nerf++, which handles this kind of far background better than NeRF, and which you might want to try.
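For concreteness, a minimal sketch (assumed names and made-up numbers, not the repository's actual loader code) of the two adjustments summarized above: rescale the poses/bounds and keep the far plane large enough to cover the background:

```python
import numpy as np

# Placeholder poses (N x 3 x 5) and per-image depth bounds (N x [near, far]).
poses = np.tile(np.eye(3, 5), (4, 1, 1))
poses[..., 3] = np.array([[0., 0., 60.], [0., 0., 80.], [0., 0., 90.], [0., 0., 100.]])
bounds = np.array([[50., 120.], [60., 150.], [70., 160.], [80., 180.]])

scale_factor = bounds.min() * 0.75   # scale the scene so the nearest depth ~ 1.33
bounds /= scale_factor
poses[..., 3] /= scale_factor        # camera centers must be rescaled consistently

near = bounds.min()
far = bounds.max()                   # no min(8 * near, ...) clamp, so the far
                                     # background still lies inside [near, far]
print(scale_factor, near, far)
```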
First of all, thanks for the great implementation!
I've managed to get good results with the code in this repository for frontal scenes, but I'm struggling to get it to work properly with 360 captures. Attached is an example capture of a fountain taken from a variety of different angles (where the camera poses are "ground truth" poses gathered from the simulation): https://drive.google.com/file/d/1FbtrupOXURc0eTDtDOmD1oKZz5e2MIAE/view?usp=sharing
Below are the results after 6 epochs of training (I've trained for longer but it never converges to anything useful).
In contrast, other NeRF implementations such as https://github.com/google-research/google-research/tree/master/jaxnerf seem to produce more sensible results even after a few thousand iterations:
I'm using the spherify flag and have tried both with and without the use_disp option. I've also tried setting N_importance and N_samples to match the config of the other NeRF implementation that I tried (https://github.com/google-research/google-research/blob/master/jaxnerf/configs/llff_360.yaml). Would you have any pointers as to where the difference could be coming from?