
Conversation

@swahtz swahtz commented Nov 17, 2025

While fixing fvdb.nn.SimpleUNet for the issue reported in #335, where the previously removed fvdb.nn.ReLU module was still being used, a few other problems were discovered. This PR fixes the following:

  1. In SimpleUNet, replace the use of fvdb.nn.ReLU(inplace=True) with the functional fvdb.relu_
  2. Remove an erroneous check in SparseConvPackInfo::SparseConvPackInfo that threw an exception if a kmap was being built for a 1x1 conv
  3. Fix the semantics of the parameters to transposed convolution to match PyTorch's conventions. Per PyTorch's ConvTranspose module docs, to express the transposed convolution B of a convolution A (i.e. the convolution B that inverts the operation performed by A), the channel parameters of B are the reverse of A's (e.g. if A is in/out 4/16, B is in/out 16/4), while the 'topological' parameters carry over from A unchanged (e.g. if A has stride=2, B also has stride=2, not 1/2).
    Our API, by contrast, seemed to expect inverse expressions of both kinds of parameters (and our SimpleUNet code was expressing neither inversely).
    This PR therefore fixes SparseConvolutionKernelMap::forward and SparseConvolutionKernelMap::backward to treat normal and transposed in/out channel specifications the same way (it is up to the user to reverse them if that's the desired effect, per PyTorch's pattern), and changes SimpleUNet to reverse the ordering of the supplied source/target grids so that they match the source/target grids of the convolution it wants to invert (again following PyTorch's pattern of carrying arguments like stride over from the inverse convolution).
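As a sanity check of the convention described above, PyTorch's documented 1-D output-size formulas can be compared directly. This is a minimal sketch in plain Python (not fvdb code), assuming the standard Conv*d and ConvTranspose*d shape formulas from the PyTorch docs:

```python
def conv_out_len(n, k, stride, pad):
    # PyTorch Conv*d output length: floor((n + 2*pad - k) / stride) + 1
    return (n + 2 * pad - k) // stride + 1

def conv_transpose_out_len(n, k, stride, pad, out_pad=0):
    # PyTorch ConvTranspose*d output length:
    # (n - 1) * stride - 2*pad + k + out_pad
    return (n - 1) * stride - 2 * pad + k + out_pad

# A stride-2 conv A maps 9 voxels -> 5 voxels (channels, say, 4 -> 16).
n_a = conv_out_len(9, k=3, stride=2, pad=1)               # 5
# Its transpose B uses the SAME kernel/stride/pad (and reversed
# channels, 16 -> 4), and maps those 5 voxels back to 9.
n_b = conv_transpose_out_len(n_a, k=3, stride=2, pad=1)   # 9
print(n_a, n_b)
```

Note that the round trip is only exact for sizes where the forward conv's floor division loses nothing; otherwise PyTorch's output_padding parameter is needed to disambiguate.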

Fixes #335

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
…ation, I believe in/out channels should be expressed as the desired in/out (not transposed) and the topology parameters (like stride, source/target grid) should be expressed as if they are the 'original' conv operator this transposed conv is meant to invert.

Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
@swahtz swahtz changed the title Fix use of fvdb.nn.ReLU in SimpleUNet Fixes for fvdb.nn.SimpleUNet Nov 18, 2025
@swahtz swahtz marked this pull request as ready for review November 18, 2025 23:43
@swahtz swahtz requested a review from a team as a code owner November 18, 2025 23:43
@swahtz swahtz closed this Nov 18, 2025
@swahtz swahtz reopened this Nov 18, 2025
@swahtz swahtz added the core library Core fVDB library. i.e. anything in the _Cpp module (C++) or fvdb python module label Nov 18, 2025
@swahtz swahtz self-assigned this Nov 18, 2025
@swahtz swahtz added this to fVDB Nov 18, 2025
@swahtz swahtz added the bug Something isn't working label Nov 18, 2025
@blackencino (Contributor) left a comment


I want to hold off on the transpose argument reordering until we validate the default transposed convolution with the same rigor that we validated the regular default convolution. I think the errors you're encountering here are in our transpose implementation.

The relu changes are great.

```diff
 def forward(self, data: JaggedTensor, padded_grid: GridBatch, grid: GridBatch) -> JaggedTensor:
     plan = ConvolutionPlan.from_grid_batch_transposed(
-        kernel_size=self.kernel_size, stride=1, source_grid=padded_grid, target_grid=grid
+        kernel_size=self.kernel_size, stride=1, source_grid=grid, target_grid=padded_grid
```
Contributor

This looks backwards to me, and counter to the intention. I need to look at the torch.convtranspose3d code more carefully, and create a testing framework that confirms the ordering. Generally speaking, we should not be reordering arguments like this, it indicates a semantic mismatch. If the transpose plans are wrong compared to pytorch, then we should fix it in the plan, not in the unet.

Contributor Author

Currently, with source_grid=padded_grid, target_grid=grid, an exception is thrown as if the order were incorrect. Whether that's a mistake in the transposed convolution code or in the ordering of the arguments is exactly the question I was raising in the PR description above. If you look at the 'topological' arguments to a PyTorch transposed conv (like stride), a stride of 2 upsamples (inserts zeros into the input), inverting the way a stride of 2 in a regular conv downsamples. Doesn't that imply we should order our source and target arguments to the transposed conv as if they were the source and target of the conv operator being inverted (since we also pass stride=2, etc., with that same meaning, rather than stride=1/2)?

I'm fine with not doing this and letting source/target keep their natural meanings; I'm just trying to determine the intent both of PyTorch and of the transposed conv kmap code as it stands. You could also read the PyTorch docs for transposed conv as redefining what 'stride' means there as essentially 'inverse stride'.
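The "inserts zeros in the input" reading above can be made concrete: a stride-s transposed convolution is equivalent to inserting s-1 zeros between input samples and then running a stride-1 convolution (ignoring padding details at the borders). A 1-D sketch in plain Python, with hypothetical helper names (not fvdb code):

```python
def upsample_with_zeros(xs, stride):
    # Insert (stride - 1) zeros between consecutive samples.
    out = []
    for i, x in enumerate(xs):
        out.append(x)
        if i < len(xs) - 1:
            out.extend([0.0] * (stride - 1))
    return out

def conv1d_valid(xs, kernel):
    # Plain stride-1 'valid' correlation.
    k = len(kernel)
    return [sum(xs[i + j] * kernel[j] for j in range(k))
            for i in range(len(xs) - k + 1)]

x = [1.0, 2.0, 3.0]
up = upsample_with_zeros(x, stride=2)    # [1.0, 0.0, 2.0, 0.0, 3.0]
y = conv1d_valid(up, [1.0, 1.0, 1.0])    # [3.0, 2.0, 5.0]
print(up, y)
```

On this reading, stride=2 on the transposed conv describes the upsampling factor, i.e. the inverse of what stride=2 does on a forward conv, which is consistent with carrying the forward conv's stride over unchanged.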

```cpp
    kernels = kernels.permute({2, 3, 4, 0, 1}).reshape({-1, inC, outC}).contiguous();
}

TORCH_CHECK_VALUE(!transposed ? inFeatures.size(0) == sizes[0] : inFeatures.size(0) == sizes[1],
```
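To unpack what the check is selecting (assuming sizes[0]/sizes[1] are the source/target grid voxel counts): under the semantics this PR proposes, a transposed convolution's input features live on what the forward convolution would call the target grid, so the expected row count flips. A tiny plain-Python sketch of that selection logic, with hypothetical names (not fvdb code):

```python
def expected_input_rows(transposed, source_voxels, target_voxels):
    # Forward conv: input features are laid out on the source grid.
    # Transposed conv: input features are laid out on the grid the
    # forward conv would have produced, i.e. the target grid.
    return target_voxels if transposed else source_voxels

print(expected_input_rows(False, 100, 40))  # 100
print(expected_input_rows(True, 100, 40))   # 40
```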
Contributor

This looks backwards to me. I need to confirm what is expected in torch before agreeing that this is how it should behave. It may be that we need to switch how our transposed convolution works.



Development

Successfully merging this pull request may close these issues.

fvdb.nn.ReLU usage in fvdb.nn.simple_unet
