Fixes for fvdb.nn.SimpleUNet
#336
Conversation
Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
…ation, I believe in/out channels should be expressed as the desired in/out (not transposed) and the topology parameters (like stride, source/target grid) should be expressed as if they are the 'original' conv operator this transposed conv is meant to invert. Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
`fvdb.nn.ReLU` in `fvdb.nn.SimpleUNet`
blackencino left a comment
I want to hold off on the transpose argument reordering until we validate the default transposed convolution with the same rigor that we validated the regular default convolution. I think the errors you're encountering here are in our transpose implementation.
The relu changes are great.
```diff
 def forward(self, data: JaggedTensor, padded_grid: GridBatch, grid: GridBatch) -> JaggedTensor:
     plan = ConvolutionPlan.from_grid_batch_transposed(
-        kernel_size=self.kernel_size, stride=1, source_grid=padded_grid, target_grid=grid
+        kernel_size=self.kernel_size, stride=1, source_grid=grid, target_grid=padded_grid
```
This looks backwards to me, and counter to the intention. I need to look at the torch.convtranspose3d code more carefully, and create a testing framework that confirms the ordering. Generally speaking, we should not be reordering arguments like this, it indicates a semantic mismatch. If the transpose plans are wrong compared to pytorch, then we should fix it in the plan, not in the unet.
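One possible oracle for such a testing framework (a sketch using dense PyTorch ops only, not the sparse fvdb implementation): a transposed convolution with the same weight and stride computes exactly the input-gradient of the corresponding forward convolution, so autograd gives a ground truth for the source/target ordering.

```python
import torch
import torch.nn.functional as F

# Dense oracle: conv_transpose3d(g, w, stride=s) must equal the
# gradient of conv3d(x, w, stride=s) with respect to x.
torch.manual_seed(0)
x = torch.randn(1, 2, 5, 5, 5, requires_grad=True)
w = torch.randn(3, 2, 3, 3, 3)          # conv3d weight layout: (outC, inC, kD, kH, kW)

y = F.conv3d(x, w, stride=2)            # downsamples: spatial 5 -> 2
g = torch.randn_like(y)
(grad_x,) = torch.autograd.grad(y, x, g)

# The same weight tensor, reinterpreted as (inC, outC, ...) by
# conv_transpose3d, maps the conv's "target" grid back to its "source" grid.
z = F.conv_transpose3d(g, w, stride=2)  # upsamples: spatial 2 -> 5
assert z.shape == x.shape
assert torch.allclose(grad_x, z, atol=1e-4)
```

Any ordering convention the sparse transposed plan adopts could be checked against this identity.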
Currently, with source_grid=padded_grid, target_grid=grid, an exception is thrown as if the order were incorrect. Investigating whether that's a mistake in the transposed convolution code or in the ordering of the arguments is what I was explaining in the PR description above, as to which direction to take this in. If you look at the 'topological' arguments to a PyTorch transposed conv (like stride), a stride of 2 will upsample (insert zeros in the input) in the inverse way that a stride of 2 in a conv operator will downsample. So does that imply that we should order our source and target arguments to the transposed conv as if they were the source and target of the conv operator (since we'd also use stride=2, etc. to have the same meaning, rather than expecting stride=1/2)?
I'm fine with not doing this and having source/target take their natural meanings; I'm just trying to determine the intent both of PyTorch and of the current transposed conv kmap code. You could also read the PyTorch docs for transposed conv as redefining what 'stride' means in a transposed conv as basically the 'inverse stride'.
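The stride semantics in question can be illustrated with dense PyTorch ops (an illustrative sketch; shapes and padding are chosen so the round trip is size-exact): stride=2 in `ConvTranspose3d` undoes the downsampling that stride=2 in `Conv3d` performs, rather than meaning "stride 1/2".

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8, 8)
w = torch.randn(1, 1, 3, 3, 3)

# Forward conv with stride=2 downsamples: floor((8 + 2*1 - 3) / 2) + 1 = 4
y = F.conv3d(x, w, stride=2, padding=1)
assert y.shape[-1] == 4

# The transposed conv reuses stride=2 (the *forward* conv's stride) to
# upsample back: (4 - 1)*2 - 2*1 + 3 + 1 = 8
z = F.conv_transpose3d(y, w, stride=2, padding=1, output_padding=1)
assert z.shape[-1] == 8
```

In other words, the stride parameter of the transposed conv is expressed from the perspective of the convolution being inverted, which is the same question being raised here for source/target grids.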
```cpp
        kernels = kernels.permute({2, 3, 4, 0, 1}).reshape({-1, inC, outC}).contiguous();
    }

    TORCH_CHECK_VALUE(!transposed ? inFeatures.size(0) == sizes[0] : inFeatures.size(0) == sizes[1],
```
This looks backwards to me. I need to confirm what is expected in torch before agreeing that this is how it should behave. It may be that we need to switch how our transposed convolution works.
While working on fixing `fvdb.nn.SimpleUNet` for the issues reported in #335, where the previously removed `fvdb.nn.ReLU` module was being used, a few other issues were discovered. This PR proposes to fix the following:

- In `SimpleUNet`, replace the use of `fvdb.nn.ReLU(inplace=True)` with the functional `fvdb.relu_`
- Fix `SparseConvPackInfo::SparseConvPackInfo`, which threw an exception if a kmap was being built for a 1x1 conv

In our API, we seem to expect inverse expressions of both types of parameters (but our `SimpleUNet` code was expressing neither inversely). Therefore, in this PR, `SparseConvolutionKernelMap::forward` and `SparseConvolutionKernelMap::backward` have been fixed to treat normal and transposed in/out channel specifications the same way (it is up to the user to reverse them if that's the desired effect, per PyTorch's pattern), and `SimpleUNet` was changed to reverse the ordering of the supplied source/target grid, to match the source/target grids of the convolution it wants to invert (also per PyTorch's pattern of specifying arguments like stride from the inverse convolution).

Fixes #335