
Bias term in Conv operator #54

Closed
zhreshold opened this issue Sep 24, 2017 · 8 comments

@zhreshold

Is there a possible bug in the Conv operator defined in ConvOpSchemaGenerator?

```cpp
schema.NumInputs(2, 3);
schema.NumOutputs(1);
schema.Input(0,
             "X",
             "Input data tensor from previous layer; has size (N x C x H x W)"
             ", where N is the batch size, C is the number of channels, and"
             " H and W are the height and width. Note that this is for the 2D image."
             "Otherwise the size is (N x D1 x D2 ... x Dn)");
schema.Input(1,
             "weights",
             "The weight tensor that will be used in the convolutions; "
             "has size (M x C x kH x kW), where C is the number of channels, "
             "and kH and kW are the height and width of the kernel, and M is the number "
             "of feature maps. For more than 2 dimensions, the kernel shape will be "
             "(M x C x k1 x k2 x ... x kn), where is the dimension of the kernel");
schema.Output(0,
              "Y",
              "Output data tensor that contains the result of the convolution. The "
              "output dimensions are functions of the kernel size, stride size, "
              "and pad lengths.");
```

Is input 2, the bias, missing here?

BTW, there is no 'num_output' attribute for conv, conv_transpose, or fullyconnected, and no 'use_bias' attribute either.
In other words, these attributes are not retrievable from the graph; instead, we have to infer them from the shapes of the initializers. This design is obviously not friendly to frameworks other than Caffe2.

Am I missing something?
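
For illustration, a minimal sketch of the shape-parsing workaround described above, assuming the `onnx` Python package (the model path is illustrative):

```python
import onnx

# Load a model and index its initializers by name (path is illustrative).
model = onnx.load("model.onnx")
initializers = {init.name: init for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type == "Conv":
        # Input 1 is the weight tensor of shape (M x C x kH x kW); the
        # output-channel count M is only recoverable from its dims, because
        # there is no 'num_output' attribute on the node itself.
        weight = initializers.get(node.input[1])
        if weight is not None:
            print(node.name, "num_output =", weight.dims[0])
```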

@zhreshold (Author)

Okay, it seems like bias terms are explicitly excluded from the convolution op and registered as a separate 'Add' op. Is that on purpose?
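
For reference, a hedged sketch of that decomposition using `onnx.helper` (tensor names and shapes are illustrative; the bias is shaped (1, M, 1, 1) so a plain Add broadcasts over the N x M x H x W convolution output):

```python
from onnx import helper, TensorProto

# Conv takes only data and weights; the bias is applied by a separate Add node.
conv = helper.make_node("Conv", inputs=["X", "W"], outputs=["conv_out"],
                        kernel_shape=[3, 3], pads=[1, 1, 1, 1])
add = helper.make_node("Add", inputs=["conv_out", "B"], outputs=["Y"])

graph = helper.make_graph(
    [conv, add], "conv_with_separate_bias",
    inputs=[helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 3, 224, 224]),
            helper.make_tensor_value_info("W", TensorProto.FLOAT, [64, 3, 3, 3]),
            helper.make_tensor_value_info("B", TensorProto.FLOAT, [1, 64, 1, 1])],
    outputs=[helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 64, 224, 224])],
)
```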

@ebarsoum (Contributor)

@zhreshold It isn't missing. The reason we still allow 2 required inputs with an optional third is to avoid breaking PyTorch or Caffe2. In my previous PR, I updated the schema to have only 2 inputs, and I reverted it temporarily to avoid breaking the current back-ends. So yes, it is on purpose; for the bias, simply use the Add op.

@ezyang (Contributor) commented Sep 25, 2017

This is related to the situation in #24, where there is a tension between keeping the operator definitions minimal (and thus easier to implement from first principles) and having fused operators that accurately reflect the state-of-the-art implementations everyone uses (e.g., cuDNN).

@bddppq (Member) commented Sep 25, 2017

@zhreshold How is Caffe2 different from other frameworks with regard to retrieving this information from the graph?

@zhreshold (Author)

@bddppq Is there any way I can get the number of channels for a convolution or the number of hidden nodes in FC from the GraphProto without checking the shape of the weights?
In the onnx-caffe2 example, https://github.com/onnx/onnx-caffe2/blob/master/onnx_caffe2/backend.py, it seems that no such information is required for Caffe2?

@ezyang (Contributor) commented Oct 9, 2017

@zhreshold Unfortunately not. If you have a need for it, we should consider adding it. There is currently a debate about how much information should or should not go into operators: the more info we add, the harder it is for producers but the easier it is for consumers. Use cases will help us make a better decision.

@ebarsoum (Contributor)

FC is experimental, and we put it there temporarily because Caffe2 depends on it. But I was surprised that the ONNX model we share uses FC instead of the core ONNX ops that we agreed on.

@ezyang (Contributor) commented Oct 11, 2017

PyTorch HEAD has been fixed to use Gemm instead of FC, but I think @zhreshold will have the same problem with Gemm.
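
For illustration, the same shape inspection applied to Gemm, assuming the common transB=1 layout that exporters typically emit (the model path is illustrative):

```python
import onnx

model = onnx.load("model.onnx")  # illustrative path
initializers = {init.name: init for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type == "Gemm":
        # With transB=1, the weight input has shape (hidden_units, in_features),
        # so the layer width must again be read from the initializer's dims
        # rather than from any attribute on the node.
        weight = initializers.get(node.input[1])
        if weight is not None:
            print(node.name, "hidden_units =", weight.dims[0])
```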

bddppq closed this as completed Aug 14, 2018