Optimizer initialization issue in DeepLabv3+ #21

Closed · SunnerLi opened this issue Jul 21, 2018 · 4 comments

@SunnerLi commented Jul 21, 2018

Sorry to bother you!
Recently I tried to use DeepLabv3+ to train a new model.
Thank you very much for providing the model code.
However, the following error occurs:

TypeError: optimizer can only optimize Tensors, but one of the params is NoneType

I think the issue is that the bias term of the first convolution layer is set to False, which is the default setting in the standard ResNet.
However, the parameter collection still yields this bias term into the SGD constructor, so SGD raises an exception because the param is NoneType.
Here is the relevant part of the SGD source:

for param in param_group['params']:
    if not isinstance(param, Variable):
        raise TypeError("optimizer can only optimize Variables, "
                        "but one of the params is " + torch.typename(param))
    if not param.requires_grad:
        raise ValueError("optimizing a parameter that doesn't require gradients")
    if not param.is_leaf:
        raise ValueError("can't optimize a non-leaf Variable")

Let me offer a suggestion at the end!
Maybe we can add a check in train.py to skip bias terms that are None, like the following:

def get_lr_params(model, key):
    # For Dilated FCN
    if key == "1x":
        for m in model.named_modules():
            if "layer" in m[0]:
                if isinstance(m[1], nn.Conv2d):
                    for p in m[1].parameters():
                        yield p
    # For conv weight in the ASPP module
    if key == "10x":
        for m in model.named_modules():
            if "aspp" in m[0]:
                if isinstance(m[1], nn.Conv2d):
                    yield m[1].weight
    # For conv bias in the ASPP module
    if key == "20x":
        for m in model.named_modules():
            if "aspp" in m[0]:
                if isinstance(m[1], nn.Conv2d):
                    if m[1].bias is not None:    # Add this line
                        yield m[1].bias

After this small revision, the code runs normally.
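
For context, train.py consumes these generators roughly as below; the base learning rate, the 10x/20x multipliers, and the momentum here are my assumptions, not necessarily the exact values in the script:

base_lr = 2.5e-4  # assumed base learning rate

# Three param groups with different learning rates, fed by the generators above;
# model is assumed to be the DeepLab network
optimizer = torch.optim.SGD(
    params=[
        {"params": get_lr_params(model, key="1x"), "lr": base_lr},
        {"params": get_lr_params(model, key="10x"), "lr": 10 * base_lr},
        {"params": get_lr_params(model, key="20x"), "lr": 20 * base_lr},
    ],
    lr=base_lr,
    momentum=0.9,
)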

@kazuto1011 (Owner) commented Jul 23, 2018

Thank you for suggesting the revision. As far as I can see from the last snippet, I think the issue is related to the improved ASPP module in v3+ rather than the bias-free conv in ResNet. The ResNet part is yielded in the "1x" scope without causing the NoneType error. The script train.py is made just for parsing and training the params of the v2 model. The reported error is due to the fact that the v3+ ASPP does not have biases, while the v2 one does.
Anyway, I think we need a stricter modification to adapt it to v3/v3+; e.g., the batch norms should also be observed/trained (see the sketch below). I'm sorry, but this codebase does not assume v3/v3+ training for now.
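
(For illustration only, collecting the batch-norm parameters might look like the following hypothetical helper; it is not part of the current codebase:)

import torch.nn as nn

def get_bn_params(model):
    # Hypothetical helper: yield the affine parameters (weight and bias)
    # of every batch-norm layer so they can be trained as well
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            for p in m.parameters():
                yield p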

@SunnerLi (Author) commented Jul 23, 2018

I think you are right!
Still, I hope the training code for v3/v3+ can be available soon.
Thank you very much for your contribution!

@Ericargus commented

class _ConvBatchNormReLU(nn.Sequential):
    def __init__(
        self,
        in_channels,
        out_channels,
        kernel_size,
        stride,
        padding,
        dilation,
        relu=True,
    ):
        super(_ConvBatchNormReLU, self).__init__()
        self.add_module(
            "conv",
            nn.Conv2d(
                in_channels=in_channels,
                out_channels=out_channels,
                kernel_size=kernel_size,
                stride=stride,
                padding=padding,
                dilation=dilation,
                bias=False,
            ),
        )
I think the problem is in your _ConvBatchNormReLU class, because you set the conv's bias to False.

@kazuto1011 (Owner) commented

Do you mean the _ConvBatchNormReLU in the v3+ ASPP? I have already mentioned this above:

The reported error is due to the fact that the v3+ ASPP does not have biases, while the v2 one does.

The bias-free conv follows the official implementation: a conv immediately followed by batch norm does not need a bias anyway, since the normalization subtracts the per-channel mean and cancels it out (see the sketch below). And the init part here is just for v2.
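
(A quick self-contained check of this point, assuming PyTorch with the batch norm in training mode; the shapes are arbitrary:)

import torch
import torch.nn as nn

x = torch.randn(1, 3, 8, 8)
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=True)
bn = nn.BatchNorm2d(16)

# In training mode, BN normalizes each channel by the batch mean/variance,
# so a constant per-channel bias added by the conv is subtracted right back out.
with torch.no_grad():
    y_with_bias = bn(conv(x))
    conv.bias.zero_()
    y_without_bias = bn(conv(x))

print(torch.allclose(y_with_bias, y_without_bias, atol=1e-6))  # True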
