
[ONNXModelLoader] Enabling operator instance based mixed precision support #5145

Closed
Wants to merge 1 commit from the mixed_precision branch

Conversation

@quic-grathina commented Dec 8, 2020

Summary:
Mixed precision mode enables running certain operator instances in FP16, certain operator instances in FP32, and the rest in INT8 (PGQ).
The operator instances required to run in FP16 are specified by the name of their "first output" in a YAML file. For such operator instances, convertTofp16 nodes are added at their inputs (activations) and convertTofp32 nodes are added at their outputs. For constant inputs, the tensor payload is converted to FP16. When used along with quantization, the operator instances that need to run in INT8 precision are not specified in the YAML file; they will run in INT8 through the regular quantization path if the operator is supported by the backend. The operator kinds that need to remain in their original precision can be specified with the existing compiler option "keep-original-precision-for-nodes".
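
For illustration, such a YAML file might look like the sketch below. The exact schema is not shown in this thread, so the key name (fp16OpInstanceNames) is a hypothetical placeholder; the only confirmed convention is that each entry names the "first output" of an operator instance that should run in FP16.

# mixed_precision.yaml (hypothetical schema)
fp16OpInstanceNames:
  - conv1_output     # first output of a Conv instance to run in FP16
  - block3/add_out   # first output of an elementwise Add instance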

What this PR contains

  1. Code changes in the ONNX model loader where the ops indicated in the YAML file to run in FP16 are collected into a map. At loadOperator() time the op is loaded in FP16 if the instance is present in the map.
  2. Specific handling of ops like NMS and Resize whose inputs are mapped as attributes in the Glow IR node. Other ops in this category will be updated along similar lines.
  3. Additions to the graph optimization layer that fuse/optimize out quantize(convertTo16(x)) -> quantize(x), convertTo(dequantize(x)) -> dequantize(x), and dequantize(quantize(x)) -> ConvertTo if "x" is in FP32 and the output was required in FP16, or "x" is in FP16 and the output was required in FP32 (a sketch of this last rewrite follows this list).
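
For concreteness, a minimal sketch of the third rewrite (dequantize(quantize(x)) -> ConvertTo) against Glow's graph API; the function name and wiring here are illustrative, not necessarily what the PR implements:

static bool tryFoldDequantizeQuantize(Function *F, DequantizeNode *DQ) {
  // Only fires when the dequantize is fed directly by a quantize.
  auto *QN = llvm::dyn_cast<QuantizeNode>(DQ->getInput().getNode());
  if (!QN) {
    return false;
  }
  NodeValue in = QN->getInput();
  ElemKind srcK = in.getElementType();
  ElemKind dstK = DQ->getResult().getElementType();
  // An FP32 -> INT8 -> FP16 (or FP16 -> INT8 -> FP32) round trip
  // collapses into a single ConvertTo node.
  if ((srcK == ElemKind::FloatTy && dstK == ElemKind::Float16Ty) ||
      (srcK == ElemKind::Float16Ty && dstK == ElemKind::FloatTy)) {
    auto *CT = F->createConvertTo(DQ->getName().str() + ".convert", in, dstK);
    DQ->getResult().replaceAllUsesOfWith(CT->getResult());
    return true;
  }
  return false;
}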

Documentation:
Will be updated at the time of submission if the approach is fine.

Test Plan:
Test cases added in OnnxImporterTest and GraphOptzTest.

@facebook-github-bot

Hi @rgopinath8!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!


@quic-grathina force-pushed the mixed_precision branch 3 times, most recently from 4908f60 to a70e143 on December 9, 2020 at 15:56.
@jfix71 (Contributor) left a comment:

@rgopinath8 Great to see this functionality upstreamed!

I added comments -- one main one about the way this is implemented in the loader and the need for fixups when there are multiple users. LMK your thoughts!

#ifndef GLOW_IMPORTER_MODELLOADERPRECISION_H
#define GLOW_IMPORTER_MODELLOADERPRECISION_H

#include "llvm/ADT/APInt.h"
@jfix71 (Contributor):
nit: unused?

@jfix71 (Contributor):
bump -- is this used?

/// Holds info about mixed precision details which can be used across model
/// loaders
struct ModelLoaderPrecisionConfiguration{
/// Used during operator loading while costructing glow graph to keep the
@jfix71 (Contributor):
typo: costructing

/// conversion is skipped and FP16 conversion is done for any node kinds
/// found here). This creates a graph where some nodes execute in quantized
/// or FP32 precision and remaining in FP16 precision. If the node kind
/// specified via it's name is unsupported by the backend in FP16 precision
@jfix71 (Contributor):
grammar: it's -> its

// Open YAML input stream.
llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> text =
llvm::MemoryBuffer::getFileAsStream(fileName);
CHECK(!text.getError()) << "Unable to open file with name: "
@jfix71 (Contributor):
Perhaps we can return MAKE_ERR() here instead of CHECK?

void setModelLoaderPrecisionOpt(llvm::StringRef fileName);

/// Deserialize Model loader precision info from the \p YAML file
bool deserializeModelLoaderPrecisionInfosFromYaml(
@jfix71 (Contributor):
We never check the return value here/it seems irrelevant for now. And we do some error checking inside of the function, though it dies via CHECK if there's an issue reading the yaml. Can we instead return an Error?
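
For illustration, the Error-returning version the reviewer suggests might look like the sketch below, using Glow's MAKE_ERR macro and Error type; the exact signature is an assumption, since the final code is not shown in this thread:

Error deserializeModelLoaderPrecisionInfosFromYaml(
    llvm::StringRef fileName, ModelLoaderPrecisionConfiguration *precConfig) {
  // Open YAML input stream, propagating failure instead of aborting.
  llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> text =
      llvm::MemoryBuffer::getFileAsStream(fileName);
  if (!text) {
    return MAKE_ERR("Unable to open file with name: " + fileName.str());
  }
  // ... parse *text and populate *precConfig ...
  return Error::success();
}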

Comment on lines 461 to 472
//Re-assign non constant inputs to original nodevalues
for (auto itr = nonConstantsNodeValueMap.begin();
itr != nonConstantsNodeValueMap.end(); ++itr) {
@jfix71 (Contributor):
Suggested change
//Re-assign non constant inputs to original nodevalues
for (auto itr = nonConstantsNodeValueMap.begin();
itr != nonConstantsNodeValueMap.end(); ++itr) {
// Re-assign non constant inputs to original nodevalues
for (auto itr = nonConstantsNodeValueMap.begin(),
e = nonConstantsNodeValueMap.end(); itr != e; ++itr) {

tests/unittests/GraphOptzTest.cpp (review thread resolved)
@@ -5087,7 +5089,54 @@ Error ONNXModelLoader::loadNetwork(ONNX_NAMESPACE::GraphProto &net,
continue;
}
}

// Find if OpOutPutName is specified to set in fp16. Node precision can
@jfix71 (Contributor):
nit: for consistency with code

Suggested change
// Find if OpOutPutName is specified to set in fp16. Node precision can
// Find if OpOutputName is specified to set in fp16. Node precision can

RETURN_IF_ERR(loadOperator(op));

// If OpOutPutName is specified to run in fp16, the operator outputs will
// be FP16 kind but next operator maynot run in fp16 precision. So add
@jfix71 (Contributor):
typo: maynot

Comment on lines 412 to 423
// If input type is fp32 add convert to fp16 node and update
// Global nodeValueByName_ map, so that operator in the model
// recieves appropriate input value.
@jfix71 (Contributor):
So, I'm not sure I like this approach very much. I understand it makes things cleaner in that all ops can retrieve their inputs naturally as they currently do via nodeValueByName_. However as you have had to implement it requires some sort of fixup via the nonConstantsNodeValueMap. I wonder if instead we could e.g. modify getNodeValueByName(), so that it checks if it should insert a ConvertToNode to FP16 before returning the NodeValue. Then we never modify nodeValueByName_ and don't require the fixup. Additionally, CSE during graph optimizations should combine many ConvertToNodes that all use the same op. WDYT?
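
A rough sketch of that idea, for concreteness. nodeValueByName_ and createConvertTo follow Glow's ProtobufLoader and Function APIs of this era, while currentOpNeedsFP16_ is a hypothetical flag standing in for the precision-map lookup:

Expected<NodeValue> ProtobufLoader::getNodeValueByName(llvm::StringRef name) {
  RETURN_ERR_IF_NOT(hasNodeByName(name),
                    "No node value with name " + name.str());
  NodeValue NV = nodeValueByName_[name];
  // If the op currently being loaded is marked FP16 in the precision
  // config, hand back a converted value instead of mutating
  // nodeValueByName_; CSE later merges the duplicate ConvertTo nodes
  // created when several FP16 ops consume the same value.
  if (currentOpNeedsFP16_ && NV.getElementType() == ElemKind::FloatTy) {
    NV = G_->createConvertTo(name.str() + ".to_fp16", NV, ElemKind::Float16Ty)
             ->getResult();
  }
  return NV;
}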

@quic-grathina (Author) replied:

Hi @jfix71,
I agree with you that by changing getNodeValueByName() and getConstantByName() we can avoid the fixup needed for nodes with multiple users via nonConstantsNodeValueMap. To handle operator output precision I can think of two approaches:

  1. Modify the addNodeAsOutput() method to add a ConvertToFP32 node to its outputs and update nodeValueByName_.

  2. In getNodeValueByName(), check whether the operator output name is not specified to run in FP16 while the input comes from a ConvertToFP16 node; if so, add a ConvertToFP32 node and return that NodeValue. getNodeValueByName() would then return one of three possible NodeValues:

     - the original NodeValue;
     - the NodeValue of an added ConvertToFP16 node, if the operator needs to run in FP16 precision;
     - the NodeValue of an added ConvertToFP32 node, if the operator needs to run in FP32 precision but its input comes from a node set to FP16 precision.

Kindly provide your suggestions.

@jfix71 (Contributor) replied:

@rgopinath8 I think option 1 is fine, except that there are a non-trivial number of places where we do not go through addNodeAsOutput() and instead update nodeValueByName_ directly. If you update other places and make sure that is consistent then it's a reasonable approach, and we'd need to ensure future cases always use addNodeAsOutput() too. It's probably a better approach than 2 if you want to go update all such places.

@quic-grathina (Author) commented:

Hi @jfix71,

  1. We are observing some accuracy loss when running certain instances of operators in FP16 along with quantization.
  2. If we select an operator instance to run in FP16 that would otherwise have been fused/sunk/hoisted later by the Glow optimization pipeline, the fusion no longer happens because of the datatype difference between consecutive nodes, and we see inaccuracy.
     Example: Conv instances are set to FP16 in the model loader, so the BatchNorm nodes following Conv will not be fused with Conv due to the datatype mismatch. If we run only the Convolutions in FP16 we get bad accuracy (as per point 2), but if we run both the Convolutions and BatchNorms in FP16 there is no accuracy loss.

So do you think it is advisable to set BatchNorm to FP16 if the preceding Convolution is set to FP16? If yes, this can be implemented in two places (see the sketch after this list):

  1. A new optimization pass in FunctionPassPipeline, which should come before the OptimizeBatchNorm pass.
  2. In the model loader.

Kindly let me know your suggestions.
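
For concreteness, the first option might look roughly like the following; the node classes are Glow's, while convertNodeToFP16 is a hypothetical helper standing in for whatever conversion machinery the pass actually uses:

static bool convertBNToFP16AfterFP16Conv(Function *F) {
  bool changed = false;
  for (Node &node : F->getNodes()) {
    auto *BN = llvm::dyn_cast<BatchNormalizationNode>(&node);
    if (!BN) {
      continue;
    }
    // Look through any ConvertTo the loader placed on the Conv's output.
    NodeValue in = BN->getInput();
    if (auto *CT = llvm::dyn_cast<ConvertToNode>(in.getNode())) {
      in = CT->getInput();
    }
    auto *CV = llvm::dyn_cast<ConvolutionNode>(in.getNode());
    if (!CV || CV->getResult().getElementType() != ElemKind::Float16Ty) {
      continue;
    }
    // Rebuild the BN with FP16 input/output and ConvertTo nodes on its
    // FP32 scale/bias/mean/var operands so OptimizeBatchNorm can later
    // fuse it into the FP16 Conv. convertNodeToFP16 is hypothetical.
    convertNodeToFP16(F, BN);
    changed = true;
  }
  return changed;
}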

@jfix71 (Contributor) left a comment:
Hi @rgopinath8 -- so it seems to me that a preferable option is that the model should simply specify that the BN should be in FP16 too. I.e. it doesn't seem to me like Glow should be making precision decisions without the user specifying this to be the case -- if the input BN is specified to run in FP32 then that should be respected unless we are explicitly told otherwise. Why isn't the BN set to FP16?

Another option could be to just update the optimization to fuse FP32 BNs into FP16 Convs.


@quic-grathina (Author) commented Jan 27, 2021:

Hi @jfix71,
Requested changes are done. Sorry for the delay; could you please review these changes? I have added a new pass in FunctionPassPipeline which converts BN to FP16 if the preceding Conv is set to FP16, so that the OptimizeBatchNorm pass can fuse Conv and BN.

@jfix71 (Contributor) left a comment:

Looking better -- added some more comments

Review threads on the following files (resolved):
  include/glow/Importer/CommonOperatorLoader.h (outdated)
  include/glow/Importer/ModelLoaderPrecisionConfiguration.h (4 threads, outdated)
  tests/unittests/OnnxImporterTest.cpp (3 threads, 1 outdated)
  lib/Optimizer/GraphOptimizer/GraphOptimizer.cpp (outdated)
  lib/Importer/ModelLoaderPrecisionConfiguration.cpp (outdated)
@quic-grathina force-pushed the mixed_precision branch 3 times, most recently from 2e2451e to adface4 on February 4, 2021 at 07:53.
@jfix71 (Contributor) left a comment:
Getting pretty close! Few more things, mostly nits.

Review threads on the following files (all outdated, resolved):
  include/glow/Importer/ProtobufLoader.h (2 threads)
  lib/Importer/ProtobufLoader.cpp (3 threads)
  lib/Importer/ONNXModelLoader.cpp
  tests/unittests/OnnxImporterTest.cpp (2 threads)
  include/glow/Importer/CommonOperatorLoader.h

@quic-grathina force-pushed the mixed_precision branch 2 times, most recently from 35fe221 to 5cfabd6 on February 12, 2021 at 11:43.
@jfix71 (Contributor) left a comment:
LGTM, thanks for your patience! I added some small nits/suggestions, and one place I think we should add an explicit error check.

Review threads on the following files (resolved):
  lib/Importer/ProtobufLoader.cpp
  tests/unittests/OnnxImporterTest.cpp (7 threads, 5 outdated)
@jfix71 (Contributor) left a comment:
Thanks for iterating on this @rgopinath8, looks great!

@facebook-github-bot

@jfix71 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@stale

stale bot commented Jun 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@stale

stale bot commented Jul 21, 2021

This PR has been automatically closed due to being stale for 15 days. Thank you for your contributions and feel free to reopen it in case of further progress.

stale bot closed this on Jul 21, 2021.