
Tracking loss in code seems inconsistent with the paper. #90

Closed
identxxy opened this issue May 9, 2024 · 2 comments

Comments

@identxxy

identxxy commented May 9, 2024

To my understanding, the tracking loss in the code is opacity * L1 loss * a complicated mask (a silhouette/edge mask), but in the paper it is simply the L1 loss.

In the code:

def get_loss_tracking_rgb(config, image, depth, opacity, viewpoint):
    gt_image = viewpoint.original_image.cuda()
    _, h, w = gt_image.shape
    mask_shape = (1, h, w)
    rgb_boundary_threshold = config["Training"]["rgb_boundary_threshold"]
    # keep only pixels whose summed RGB value exceeds the boundary threshold
    rgb_pixel_mask = (gt_image.sum(dim=0) > rgb_boundary_threshold).view(*mask_shape)
    # restrict further to high-gradient (edge) pixels
    rgb_pixel_mask = rgb_pixel_mask * viewpoint.grad_mask
    # opacity-weighted, masked per-pixel L1 residual
    l1 = opacity * torch.abs(image * rgb_pixel_mask - gt_image * rgb_pixel_mask)
    return l1.mean()
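
To see what the opacity weighting does in isolation, here is a small CPU-only toy example (my own, not repository code): two pixels with the same photometric error, one covered by high-opacity splats and one by low-opacity (immature) splats.

import torch

# two pixels with identical |rendered - gt| error of 0.5
err = torch.tensor([0.5, 0.5])
opacity = torch.tensor([0.9, 0.1])  # well-observed vs. immature region

weighted = opacity * err
print(weighted)  # tensor([0.4500, 0.0500])
# the low-opacity pixel contributes 9x less to the tracking residual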

where viewpoint.grad_mask is computed here:

    def compute_grad_mask(self, config):
        edge_threshold = config["Training"]["edge_threshold"]

        # per-pixel gradient intensity of the grayscale image
        gray_img = self.original_image.mean(dim=0, keepdim=True)
        gray_grad_v, gray_grad_h = image_gradient(gray_img)
        mask_v, mask_h = image_gradient_mask(gray_img)
        gray_grad_v = gray_grad_v * mask_v
        gray_grad_h = gray_grad_h * mask_h
        img_grad_intensity = torch.sqrt(gray_grad_v**2 + gray_grad_h**2)

        if config["Dataset"]["type"] == "replica":
            # binarise block-by-block: split the image into a 32x32 grid and,
            # within each block, set pixels above (block median * edge_threshold)
            # to 1 and the rest to 0
            row, col = 32, 32
            multiplier = edge_threshold
            _, h, w = self.original_image.shape
            for r in range(row):
                for c in range(col):
                    block = img_grad_intensity[
                        :,
                        r * int(h / row) : (r + 1) * int(h / row),
                        c * int(w / col) : (c + 1) * int(w / col),
                    ]
                    th_median = block.median()
                    block[block > (th_median * multiplier)] = 1
                    block[block <= (th_median * multiplier)] = 0
            self.grad_mask = img_grad_intensity
        else:
            # global threshold: keep pixels above (image median * edge_threshold)
            median_img_grad_intensity = img_grad_intensity.median()
            self.grad_mask = (
                img_grad_intensity > median_img_grad_intensity * edge_threshold
            )
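
image_gradient and image_gradient_mask are helpers defined elsewhere in the repository. As a rough sketch of what a Sobel-style gradient helper can look like (an assumption for illustration; the actual implementation may differ in kernels, normalisation, and padding):

import torch
import torch.nn.functional as F

def sobel_gradients(gray_img):
    # hypothetical helper, not the repository's image_gradient
    # gray_img: (1, H, W) grayscale tensor
    # returns (vertical, horizontal) gradient maps of the same shape
    kx = torch.tensor([[-1.0, 0.0, 1.0],
                       [-2.0, 0.0, 2.0],
                       [-1.0, 0.0, 1.0]]) / 8.0
    ky = kx.t()
    img = gray_img.unsqueeze(0)  # (1, 1, H, W) for conv2d
    grad_h = F.conv2d(img, kx.view(1, 1, 3, 3), padding=1)
    grad_v = F.conv2d(img, ky.view(1, 1, 3, 3), padding=1)
    return grad_v.squeeze(0), grad_h.squeeze(0)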

But in the paper, the tracking loss is simply:

[image: the photometric tracking loss equation from the paper]
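
For readers without the image, the photometric tracking loss in the paper is, to my recollection (a hedged reconstruction, not copied from the figure):

E_{\text{pho}} = \left\lVert I(\mathcal{G}, T_{CW}) - \bar{I} \right\rVert_1

where I(\mathcal{G}, T_{CW}) is the image rendered from the Gaussians \mathcal{G} at camera pose T_{CW} and \bar{I} is the observed image.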

I would like to ask: is my understanding correct, or am I missing something?

@WFram

WFram commented May 23, 2024

I would like to ask: is my understanding correct, or am I missing something?

In my opinion, they use the opacity and gradient masks to reduce the impact of immature splats on the pose estimate: opacity depends on the covariance of the ellipsoids that define each Gaussian (which describes their uncertainty), while the gradient mask captures high-contrast parts of the image (e.g., edges).

The paper says:

We further <...> penalise non-edge or low-opacity pixels

In this sense, the L1 loss provides the residual, and the opacity and gradient masks are then applied on top of it, as is done in the code.
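
As a toy illustration of the difference (again my own example, not repository code): the paper's plain L1 averages over every pixel, while the code's version zeroes out non-edge pixels and down-weights low-opacity ones.

import torch

rendered = torch.tensor([0.8, 0.3, 0.6, 0.1])
gt       = torch.tensor([1.0, 0.3, 0.4, 0.5])
opacity  = torch.tensor([1.0, 1.0, 0.2, 0.1])  # last two: immature splats
edge     = torch.tensor([1.0, 0.0, 1.0, 1.0])  # second pixel: non-edge

plain_l1  = torch.abs(rendered - gt).mean()
masked_l1 = (opacity * torch.abs(rendered * edge - gt * edge)).mean()
print(plain_l1, masked_l1)  # tensor(0.2000) tensor(0.0700)
# the masked, opacity-weighted loss largely ignores the unreliable pixels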

@identxxy
Author

Thanks so much for your explanation. I just realized that I was reading the old version of the paper, which does not mention the penalization. My bad...
