Add Layout field to Conv and Pool nodes and remove OCL specific versions #3367
Conversation
IMO, this looks pretty clean. A few comments.
llvm::ArrayRef<unsigned_t> strides,
llvm::ArrayRef<unsigned_t> pads, unsigned_t group,
unsigned_t dilation = 1);
ConvolutionNode *
Update the doxygen comments with the new param everywhere.
@@ -203,6 +203,9 @@ inline ShapeHWD calculate3DConvPoolOutputDims(
/// Modes of the padding operation.
enum PaddingMode { CONSTANT = 0, REFLECT, EDGE };

/// Convolution Layouts.
enum ConvolutionLayout { NHWC = 0, NCHW };
Since this is applicable to pools as well, I could see this as a TensorLayout.
A slight downside in that case is that TensorLayout might be too generic and overloaded a term.
I think TensorLayout is too general, but I can't think of a better name :/
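As background for the NHWC/NCHW distinction the enum names: the two layouts store the same logical (n, c, h, w) element at different flat offsets, which is why kernels need to know the layout. A minimal self-contained sketch of the standard index arithmetic (not code from this PR):

```cpp
#include <cstddef>

// Flat offset of logical element (n, c, h, w) in an NHWC-ordered buffer
// with shape {N, H, W, C}: channels are the innermost (fastest-varying) dim.
inline size_t offsetNHWC(size_t n, size_t c, size_t h, size_t w,
                         size_t H, size_t W, size_t C) {
  return ((n * H + h) * W + w) * C + c;
}

// Flat offset of the same element in an NCHW-ordered buffer
// with shape {N, C, H, W}: width is the innermost dim.
inline size_t offsetNCHW(size_t n, size_t c, size_t h, size_t w,
                         size_t C, size_t H, size_t W) {
  return ((n * C + c) * H + h) * W + w;
}
```

Adjacent channels are contiguous in NHWC but strided by H*W in NCHW, which is the locality difference that makes NCHW preferable for many GPU kernels.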
@@ -753,6 +755,7 @@ static void fwdMaxPool(Tensor *inW, Tensor *outW, Tensor *argmaxW,
}

void BoundInterpreterFunction::fwdMaxPoolInst(const MaxPoolInst *I) {
assert(I->getLayout() == NHWC && "Glow Interpreter supports only NHWC Pools");
Same checks for the gradient version?
@@ -1804,14 +1793,6 @@ bool OCLBackend::isOpSupported(const NodeInfo &NI) const { | |||
{ConvolutionNode::BiasIdx}) && | |||
(NI.getInElemTy(ConvolutionNode::BiasIdx) == ElemKind::Int32QTy); | |||
|
|||
case Kinded::Kind::OCLConvolutionNodeKind: |
For convolution: we do not support NCHW with group > 1.
True, but the node doesn't get transformed if group is > 1. That's also a runtime parameter that can't be checked here in isOpSupported.
tools/ClassGen/NodeGen.cpp
.setDocstring(
    "Performs an Average Pool operation on the Input given "
    "provided Kernels, Strides, and Pads. Supported layouts are defined "
    "in the COnvolutionLayout enum: NHWC and NCHW.");
Typo: COnvolutionLayout → ConvolutionLayout.
@@ -138,6 +140,7 @@ int main(int argc, char **argv) {
.addMember(MemberType::VectorUnsigned, "Kernels")
.addMember(MemberType::VectorUnsigned, "Strides")
.addMember(MemberType::VectorUnsigned, "Pads")
.addMember(MemberType::Unsigned, "Layout")
Why not an enum as well?
Enums are not supported for Instructions (an enum is basically a typedef of Unsigned in the NodeGen anyway, so there's no real difference).
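The claim above, that storing the layout as Unsigned loses nothing relative to an enum member, can be illustrated with a minimal sketch (the `PoolParams` struct and `isNHWC` helper are hypothetical, not Glow code; only the `ConvolutionLayout` values mirror the PR):

```cpp
// Enum values as added by this PR.
enum ConvolutionLayout { NHWC = 0, NCHW };

// Hypothetical instruction-style storage: the layout is kept as a plain
// unsigned, the pattern the Unsigned member type implies.
struct PoolParams {
  unsigned layout; // holds a ConvolutionLayout value
};

// At the use site the unsigned converts back to the enum losslessly.
inline bool isNHWC(const PoolParams &p) {
  return static_cast<ConvolutionLayout>(p.layout) == NHWC;
}
```

The enum value converts implicitly to `unsigned` on the way in and is recovered by a `static_cast` on the way out, so the round trip is exact.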
Addressed comments.
@nickgg has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@nickgg are you going to merge this?
Summary: For convolutions in Glow we use the NHWC layout, but it's more efficient to use NCHW on GPUs. To enable this, the OCL backend adds specific OCLConvolution, OCLMaxPool and OCLAvgPool nodes and transforms general Conv and Pool nodes into them by shuffling dimensions and adding TransposeNodes.
This PR adds a `Layout` field to ConvolutionNode, MaxPoolNode and AvgPoolNode, which can be either NHWC or NCHW. The OCL backend still transforms these nodes, but into the same node type with the layout set to NCHW (we still add transposes). This will allow us to reuse this logic in other backends with the same NCHW preference without needing multiple identical BackendConvolutionNodes (etc.). This DOES NOT add NCHW convolution support to any backends or change any OCL kernels; it just removes the OCL specific nodes in favour of more general ones.
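The dimension shuffling the summary describes boils down to applying a permutation to the 4-D shape on the way into and out of the NCHW node. A self-contained sketch of the standard NHWC↔NCHW permutations (the constants are the usual transpose shuffles, not copied from the PR's code):

```cpp
#include <array>

// NHWC -> NCHW permutation: output dim i takes input dim perm[i].
constexpr std::array<unsigned, 4> kNHWC2NCHW = {0, 3, 1, 2};
// NCHW -> NHWC permutation (the inverse shuffle).
constexpr std::array<unsigned, 4> kNCHW2NHWC = {0, 2, 3, 1};

// Apply a permutation to a 4-D shape, as a TransposeNode would.
inline std::array<unsigned, 4> shuffle(const std::array<unsigned, 4> &dims,
                                       const std::array<unsigned, 4> &perm) {
  return {dims[perm[0]], dims[perm[1]], dims[perm[2]], dims[perm[3]]};
}
```

Since the two shuffles are inverses, wrapping a layout-tagged node in a transpose pair leaves the surrounding graph's layout unchanged.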
This was a consensus decision, but it's worth thinking about whether or not the code is clearer after this change.
Documentation: deleted the section in docs referencing these nodes.
Test Plan: `ninja test` in various modes (debug, release, asan). Ran resnet-runtime on OCL, ran image-classifier on OCL, and ran tracing-compare across interp, cpu and ocl.