Inconsistent behavior between torchsparse's conv3d downsampling and MinkowskiEngine's #242

Closed
Jovendish opened this issue Sep 21, 2023 · 5 comments


Jovendish commented Sep 21, 2023

I'm using torchsparse's conv3d to do a downsampling operation with stride 2, but I found that this operation not only reduces the size of the feature tensor but also scales down the coordinates, which is inconsistent with MinkowskiEngine's behavior. I was hoping to find a way to make torchsparse's conv3d downsampling consistent with MinkowskiEngine.

I checked the torchsparse documentation but didn't find a relevant solution.

Is there any other parameter setting or custom operation that can make torchsparse's conv3d downsampling consistent with MinkowskiEngine? I would be very grateful for any suggestions or guidance.

[Screenshot: WechatIMG10]

![WechatIMG11](https://github.com/mit-han-lab/torchsparse/assets/25397930/d9d041e8-1903-4308-847c-b3987b67c739)

@zhijian-liu

When applying a downsampling operation with a stride of 2, the coordinates are effectively halved. If you wish to maintain the original coordinate scale, you can easily achieve this by multiplying the coordinates by 2.
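
For example, something like the following minimal sketch (assuming torchsparse v2.1 with batch-first integer coordinates and a CUDA device; the exact SparseTensor constructor arguments may vary across versions):

import torch
import torchsparse.nn as spnn
from torchsparse import SparseTensor

# [batch, x, y, z] integer coordinates and random features
coords = torch.tensor([[0, 0, 0, 0],
                       [0, 2, 0, 0],
                       [0, 4, 2, 6]], dtype=torch.int32).cuda()
feats = torch.randn(coords.shape[0], 4).cuda()

x = SparseTensor(feats=feats, coords=coords)
conv = spnn.Conv3d(4, 8, kernel_size=2, stride=2).cuda()
y = conv(x)

print(y.C)       # spatial coordinates come out divided by the stride
y.C[:, 1:] *= 2  # multiply by 2 to restore the original coordinate scale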


Jovendish commented Oct 4, 2023

Thanks for your reply.

In my code, I have three downsampling layers. I tried to scale the coordinates back to their original scale by multiplying them by two after each downsampling layer. However, this only works for the first downsampling layer; the subsequent downsampling layers no longer reduce the number of points. I'm unsure whether I made a mistake in my implementation or ran into some specific mechanism in torchsparse.

Partial code:

# imports presumed from the omitted parts of the script
import torch
from torch import nn

import torchsparse.nn as spnn
import torchsparse.nn.functional as F

F.set_conv_mode(2)
F.set_kmap_mode('hashmap')
F.set_downsample_mode('minkowski')

class Encoder(torch.nn.Module):
    def __init__(self, channels=[1, 16, 32, 64, 32, 8]):
        super().__init__()

        self.stack_0 = nn.Sequential(
            spnn.Conv3d(channels[0], channels[1], 3, 1, bias=True),
            spnn.ReLU(inplace=True),
            spnn.Conv3d(channels[1], channels[2], 2, 2, bias=True),  # DownScale
            spnn.ReLU(inplace=True),
        )

        self.stack_1 = nn.Sequential(
            spnn.Conv3d(channels[2], channels[2], 3, 1, bias=True),
            spnn.ReLU(inplace=True),
            spnn.Conv3d(channels[2], channels[3], 2, 2, bias=True),  # DownScale
            spnn.ReLU(inplace=True),
        )

        self.stack_2 = nn.Sequential(
            spnn.Conv3d(channels[3], channels[3], 3, 1, bias=True),
            spnn.ReLU(inplace=True),
            spnn.Conv3d(channels[3], channels[4], 2, 2, bias=True),  # DownScale
            spnn.ReLU(inplace=True),
        )

    def forward(self, x):
        out_0 = self.stack_0(x)
        out_0.C[:, 1:] *= 2  # rescale coordinates back to the original scale

        out_1 = self.stack_1(out_0)
        out_1.C[:, 1:] *= 2

        out_2 = self.stack_2(out_1)
        out_2.C[:, 1:] *= 2

        return [out_2, out_1, out_0]

Results:

[Screenshots: WechatIMG21893, WechatIMG21894, WechatIMG21895, WechatIMG21896]

ys-2020 commented Oct 9, 2023

Hi @Jovendish, in the 2nd and 3rd layers you are downsampling by stride=2 on coordinates that have already been multiplied by 2, so the number of points remains the same as in the previous layer.

A potential solution might be:

def forward(self, x):
    out_0 = self.stack_0(x)
    out_1 = self.stack_1(out_0)
    out_2 = self.stack_2(out_1)

    # rescale each output by its cumulative stride after all downsampling is done
    out_0.C[:, 1:] *= 2
    out_1.C[:, 1:] *= 4
    out_2.C[:, 1:] *= 8

    return [out_2, out_1, out_0]

@Jovendish

Thank you very much for your patience. Actually, I want to scale the coordinates back right after each layer, because I need to do some extra work in between the layers. I'm also wondering why torchsparse v2.1 changed the behavior of the downsampling layer. Were there any particular considerations?

@zhijian-liu

You can follow @ys-2020's approach: clone the coordinate tensor and do the scaling on the copy in the middle. We changed this behavior to follow SpConv.
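
A minimal sketch of that clone-and-scale idea, adapted from the Encoder.forward above, might look like this (the extra_work hook is hypothetical):

def forward(self, x):
    out_0 = self.stack_0(x)
    coords_0 = out_0.C.clone()
    coords_0[:, 1:] *= 2              # scaled copy at the original scale
    # extra_work(out_0.F, coords_0)   # hypothetical per-stage processing
    # out_0.C itself stays at the downsampled scale for the next stack

    out_1 = self.stack_1(out_0)
    coords_1 = out_1.C.clone()
    coords_1[:, 1:] *= 4

    out_2 = self.stack_2(out_1)
    coords_2 = out_2.C.clone()
    coords_2[:, 1:] *= 8

    return [out_2, out_1, out_0], [coords_2, coords_1, coords_0]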
