Implement nerfacto alpha transparency training #2165
Conversation
@SamDM, @jkulhanek, @f-dy let me know if you have any comments :) I tried to address all the suggestions mentioned in #2025 while keeping the functionality the same.
Force-pushed from a4bf22a to a28c90a.
```python
def blend_background(
    cls,
    image: Tensor,
    outputs: Dict[str, Union[Tensor, list]],
```
Can you please pass the optional `background_color` directly instead of passing the `outputs`?
Also, can you please pass the RGB image and the "opacity" as two separate arguments?
Passed them as separate arguments. Also, it is probably better not to allow optional arguments and to only call this function when we actually want to blend the background. We can resolve this if you agree with removing the optionality of the arguments.
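For reference, a minimal sketch of what the separated-argument version could look like: a standard alpha-over composite, with argument names assumed for illustration rather than taken from the merged code:

```python
from torch import Tensor


def blend_background(image: Tensor, opacity: Tensor, background_color: Tensor) -> Tensor:
    """Composite an RGB image over a background color using its opacity.

    image: [..., 3] RGB values in [0, 1].
    opacity: [..., 1] alpha values in [0, 1] (1 = fully opaque).
    background_color: [3] (or broadcastable) color blended behind the image.
    """
    return image * opacity + background_color * (1.0 - opacity)
```

Passing the color and opacity explicitly keeps the function independent of the `outputs` dictionary layout.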
But shouldn't blending happen always? When `background_color=random` it would use the provided `background_color`, otherwise it would use the renderer's background color? Perhaps I am missing something...
It should only happen if the input data includes an alpha channel. If this is the case, we would also have a `background_color` to pass to this method. So you are right: the `if "background_color" in outputs` check is not needed.
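Concretely, the resolution above might look like this sketch, building on the hypothetical `blend_background` from earlier (the 4-channel check is an assumption):

```python
def maybe_blend_background(image: Tensor, background_color: Tensor) -> Tensor:
    # Blend only when the ground-truth image carries an alpha channel (RGBA).
    if image.shape[-1] == 4:
        rgb, opacity = image[..., :3], image[..., 3:]
        return blend_background(rgb, opacity, background_color)
    return image  # plain RGB: nothing to blend
```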
I remember looking at this a good while ago and thinking that a scoreboard of transient values, like this run's background color selection, could be more generally useful.
The other use case I had in my head was some way to prevent single bad iterations from having a catastrophic impact, maybe pause/stop training if metrics plummet instead of saving a checkpoint.
What are you thinking in terms of a scoreboard? Do you mean having a dedicated object/place to write values like `background_color` to, instead of storing them in `outputs`?
> The other use case I had in my head was some way to prevent single bad iterations from having a catastrophic impact, maybe pause/stop training if metrics plummet instead of saving a checkpoint.
I'm sorry, but I'm not quite following how this relates to always blending the background color if an alpha channel is present. Or do you mean as a use case for the scoreboard idea?
> Do you mean having a dedicated object/place to write values like `background_color` to instead of storing them in `outputs`?
That was the thought, although I'm not sure whether there is a need.
> Or do you mean as a use case for the scoreboard idea?
Yes, this (sorry)
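For concreteness, a purely hypothetical sketch of such a scoreboard (every name here is invented for illustration; this is not existing nerfstudio API):

```python
from typing import Any, Dict


class TransientScoreboard:
    """Holds values that are only valid for the current training iteration."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self._values[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)

    def clear(self) -> None:
        # Call once per iteration so stale values (e.g. last iteration's
        # background color) cannot leak into the next one.
        self._values.clear()
```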
This seems like something that would warrant a separate pull request that then also implements one of your suggested functionalities.
However, I do see one reason for implementing something like this as part of this PR: all the outputs are currently passed through this resizing operation, which doesn't make much sense for `background_color`. It does not crash, as the background color is a per-image pixel color, but it still isn't great.
Any opinions on this, or separate PR?
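As an illustration of the concern, a hedged sketch of the kind of guard that would skip non-spatial outputs during resizing (loop structure, key name, and tensor layout are all assumptions):

```python
from typing import Dict

import torch
import torch.nn.functional as F

NON_SPATIAL_KEYS = {"background_color"}  # per-image values, not H x W maps


def resize_outputs(
    outputs: Dict[str, torch.Tensor], new_h: int, new_w: int
) -> Dict[str, torch.Tensor]:
    resized = {}
    for key, value in outputs.items():
        if key in NON_SPATIAL_KEYS:
            resized[key] = value  # resizing a per-image color is meaningless
            continue
        # F.interpolate expects [N, C, H, W]; assume float [H, W, C] maps.
        resized[key] = F.interpolate(
            value.permute(2, 0, 1)[None], size=(new_h, new_w), mode="bilinear"
        )[0].permute(1, 2, 0)
    return resized
```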
Separate PR probably - unless it would fit neatly into this one
Do we think this is in a condition to be merged?
If I set the background to random, rendering works fine, but there seems to be a problem with exporting the mesh.
What does it mean that the function is not working properly? Can you please post the error message or something?
I didn't get amazing results with Poisson on that dataset. This isn't dependent on alpha-transparent training though... TSDF fusion works great and is significantly improved by alpha-transparent training. This is actually my sole use case for this feature. Currently on the way to an airport, but I will post some mesh export results once I'm there :)
@nepfaff, can you please add me as a contributor to your fork so I can try implementing some changes? If you are not comfortable with this idea, I can alternatively create a new branch...
If you export a point cloud from a learned model, it will not proceed past 0%, but it will continue to consume hardware resources (GPU). I don't know the exact cause, but I think it's because of the random value in the background.
I did manage to run Poisson, but I can try with a point cloud. Will report the results soon.
Done :)
What I tested is my custom data, but I will train and test again with the mustard data and share the results.
Can you share the script you used to train the mustard data?
Does the command in the description not work?
Thanks! Will try to implement some changes; then let me know if you like them or not...
It would also be great if you could share your data. The more diverse the test data, the better :)
Looks great to me! Will share some of the testing results soon
It looks good to me now. The export results are as desired and as shown in the previous images.
Great! @nepfaff, thank you very much for your work. This has been a pleasure.
I will now merge this into main, ok?
Nice! Just to confirm, this mode is automatic when input images contain alpha, but separate mask images keep the previous non-carving behavior - is that right? Can both be used simultaneously?
Almost. It is only automatic if the images have an alpha channel and the background color is random (random is not the default). Hence, alpha images can be used with the previous behavior if the color is not random.
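For illustration, enabling this on the command line would presumably look like the following; the exact flag spelling is an assumption based on nerfstudio's config conventions, not quoted from this thread:

```bash
ns-train nerfacto --data <data_dir> --pipeline.model.background-color random
```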
Thanks for working on this! Just wondering if this should work with equirectangular images with an alpha channel? The old workflow with separate mask images did not work with equirectangular for some reason. If so, do I just need to add an alpha channel and specify a random background color?

Also, do masks in the alpha channel use less VRAM than having separate masks? I was never able to use those with large datasets, as I would get CUDA out-of-memory errors.
Hello, trying to follow up on this great work. Is there any way to transform the existing data format (nerfstudio's original .png images and camera positions in transforms.json) for alpha-transparent training?
It should work. Just add an alpha channel and specify the random background color as you suggested.
You need to add an alpha channel to your images. This is a 4th channel with values between 0 and 1. For alpha-transparent training, you would use binary values, where zero is transparent. Then additionally specify a random background color as described in the PR description.
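For illustration, a minimal sketch of adding a binary alpha channel from a separate mask image, assuming OpenCV and 8-bit PNGs (file names are placeholders):

```python
import cv2
import numpy as np

# Load the image and a binary mask (255 = keep, 0 = transparent).
bgr = cv2.imread("image.png", cv2.IMREAD_COLOR)      # [H, W, 3] uint8, BGR order
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)  # [H, W] uint8

# Binarize the mask and append it as the 4th (alpha) channel.
alpha = np.where(mask > 127, 255, 0).astype(np.uint8)
bgra = np.dstack([bgr, alpha])                       # [H, W, 4]

# PNG stores the alpha channel directly; cv2.imwrite expects BGRA here.
cv2.imwrite("image_rgba.png", bgra)
```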
Using rembg: `rembg p <input.img.folder> <output.img.folder>`
Hey, thanks for getting back to me. I tested a few scenes last night, but got strange results with equirectangular and fisheye. My use case is quite different to the mustard example - I'm masking myself out of 360 equirectangular images, and the black border on circular fisheye images. I exported png images with masks in the alpha from Metashape (black pixels masked), used "ns-process metashape" to create the downscales, and trained nerfacto-huge with a random background color. The masks were definitely doing something, but the colours are strange and the masked objects were still kind of visible in the NeRF. It definitely seems to use less VRAM than separate masks though, as I didn't get CUDA OOM errors for the first time! :)
@gradeeterna to mask yourself out, you shouldn't use these masks ("alpha transparency training" / "alpha carving") but the "ignore" masks:

```bash
cat > add_mask_to_transforms_json.py <<EOF
#!/usr/bin/env python
import sys
import json

if len(sys.argv) != 3:
    print(f"Usage: {sys.argv[0]} input_transforms.json output_transforms.json")
    sys.exit(1)

with open(sys.argv[1]) as input_file:
    file_contents = input_file.read()

parsed_json = json.loads(file_contents)
for frame in parsed_json["frames"]:
    frame["mask_path"] = "masks/mask.png"

with open(sys.argv[2], "w") as output_file:
    json.dump(parsed_json, output_file, indent=4)
EOF
```

```bash
cat > downsize_mask.py <<EOF
#!/usr/bin/env python
import sys
from pathlib import Path

import cv2

if len(sys.argv) != 2:
    print(f"Usage: {sys.argv[0]} path_to/mask.png")
    print("Output is path_to/masks_<downscale>/mask.png")
    sys.exit(1)

mask_path = Path(sys.argv[1])
mask = cv2.imread(str(mask_path), cv2.IMREAD_GRAYSCALE)
height, width = mask.shape[:2]
processed_data_dir = mask_path.parent

downscale_factors = [2, 4, 8]
for downscale in downscale_factors:
    mask_dir = processed_data_dir / f"masks_{downscale}"
    mask_dir.mkdir(exist_ok=True)
    mask_path_i = mask_dir / "mask.png"
    mask_i = cv2.resize(
        mask, (width // downscale, height // downscale), interpolation=cv2.INTER_NEAREST
    )
    cv2.imwrite(str(mask_path_i), mask_i)
    print(f"Wrote {mask_path_i}")
EOF
```
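Usage might then look like this, assuming the full-resolution mask already sits where transforms.json expects it (the exact layout depends on your dataset):

```bash
python add_mask_to_transforms_json.py transforms.json transforms.json
python downsize_mask.py masks/mask.png
```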
@f-dy Hey, using a single ignore mask for all my fisheye images does work, but it slows down training by about 5x. When I use one mask per image with a big dataset, I get out-of-memory errors on my 3090 24GB. I was trying this method in case it was faster and used less memory. The way masks work with NGP is great and doesn't seem to slow down training or increase memory usage. One mask per image goes in the main images folder, and they don't need to be added to the transforms.json. Thanks for sharing those scripts, very useful!
(IIRC there is an option that can be set which puts the masks on the GPU, which significantly speeds up training.)
Oh yeah, just found it - training speed is almost the same as without masks, thanks a lot!
This PR cleans up and replaces #2025.
Test data
Mustard data set with alpha masks.
https://drive.google.com/file/d/1XX4ioj9NgaRoMIA00Negp5x9XM8gWxjD/view?usp=sharing
Results
Accumulation without alpha-transparent training:
Accumulation with alpha-transparent training:
Closes #1498
Closes #2025