[ONNXModelLoader] Enabling operator-instance-based mixed precision support #5145
Conversation
Hi @rgopinath8! Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
Force-pushed bed0ba5 to 598afa0
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
Force-pushed 4908f60 to a70e143
@rgopinath8 Great to see this functionality upstreamed!
I added comments -- one main one about the way this is implemented in the loader and the need for fixups when there are multiple users. LMK your thoughts!
#ifndef GLOW_IMPORTER_MODELLOADERPRECISION_H
#define GLOW_IMPORTER_MODELLOADERPRECISION_H

#include "llvm/ADT/APInt.h"
nit: unused?
bump -- is this used?
/// Holds info about mixed precision details which can be used across model
/// loaders
struct ModelLoaderPrecisionConfiguration {
  /// Used during operator loading while costructing glow graph to keep the
typo: costructing
  /// conversion is skipped and FP16 conversion is done for any node kinds
  /// found here). This creates a graph where some nodes execute in quantized
  /// or FP32 precision and remaining in FP16 precision. If the node kind
  /// specified via it's name is unsupported by the backend in FP16 precision
grammar: it's -> its
// Open YAML input stream.
llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> text =
    llvm::MemoryBuffer::getFileAsStream(fileName);
CHECK(!text.getError()) << "Unable to open file with name: "
Perhaps we can return MAKE_ERR() here instead of CHECK?
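For illustration, a minimal sketch of that suggestion, assuming Glow's MAKE_ERR and strFormat helpers and a caller whose return type is llvm::Error (the exact message formatting here is hypothetical):

// Open YAML input stream, propagating a recoverable Error instead of
// aborting the process via CHECK.
llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> text =
    llvm::MemoryBuffer::getFileAsStream(fileName);
if (!text) {
  return MAKE_ERR(strFormat("Unable to open file with name: %s",
                            fileName.str().c_str()));
}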
void setModelLoaderPrecisionOpt(llvm::StringRef fileName);

/// Deserialize Model loader precision info from the \p YAML file
bool deserializeModelLoaderPrecisionInfosFromYaml(
We never check the return value here; it seems irrelevant for now. And we do some error checking inside of the function, though it dies via CHECK if there's an issue reading the yaml. Can we instead return an Error?
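A rough sketch of the suggested signature change (the config parameter name is hypothetical; the real declaration lives in this PR):

/// Deserialize model loader precision info from the \p fileName YAML file.
/// Returning llvm::Error lets callers propagate failures, e.g.:
///   RETURN_IF_ERR(deserializeModelLoaderPrecisionInfosFromYaml(file, cfg));
llvm::Error deserializeModelLoaderPrecisionInfosFromYaml(
    llvm::StringRef fileName, ModelLoaderPrecisionConfiguration &cfg);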
lib/Importer/ProtobufLoader.cpp
Outdated
//Re-assign non constant inputs to original nodevalues
for (auto itr = nonConstantsNodeValueMap.begin();
     itr != nonConstantsNodeValueMap.end(); ++itr) {
Suggested change:
- //Re-assign non constant inputs to original nodevalues
- for (auto itr = nonConstantsNodeValueMap.begin();
-      itr != nonConstantsNodeValueMap.end(); ++itr) {
+ // Re-assign non constant inputs to original nodevalues
+ for (auto itr = nonConstantsNodeValueMap.begin(),
+           e = nonConstantsNodeValueMap.end(); itr != e; ++itr) {
lib/Importer/ONNXModelLoader.cpp
Outdated
@@ -5087,7 +5089,54 @@ Error ONNXModelLoader::loadNetwork(ONNX_NAMESPACE::GraphProto &net,
      continue;
    }
  }

  // Find if OpOutPutName is specified to set in fp16. Node precision can
nit: for consistency with code
Suggested change:
- // Find if OpOutPutName is specified to set in fp16. Node precision can
+ // Find if OpOutputName is specified to set in fp16. Node precision can
lib/Importer/ONNXModelLoader.cpp
Outdated
RETURN_IF_ERR(loadOperator(op));

// If OpOutPutName is specified to run in fp16, the operator outputs will
// be FP16 kind but next operator maynot run in fp16 precision. So add
typo: maynot
lib/Importer/ProtobufLoader.cpp
Outdated
// If input type is fp32 add convert to fp16 node and update
// Global nodeValueByName_ map, so that operator in the model
// recieves appropriate input value.
So, I'm not sure I like this approach very much. I understand it makes things cleaner in that all ops can retrieve their inputs naturally as they currently do via nodeValueByName_. However, as you have implemented it, it requires some sort of fixup via the nonConstantsNodeValueMap. I wonder if instead we could e.g. modify getNodeValueByName(), so that it checks if it should insert a ConvertToNode to FP16 before returning the NodeValue. Then we never modify nodeValueByName_ and don't require the fixup. Additionally, CSE during graph optimizations should combine many ConvertToNodes that all use the same op. WDYT?
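A minimal sketch of that idea, assuming a hypothetical shouldConvertToFP16() lookup against the YAML-driven configuration (the surrounding names mirror Glow's loader, but this is illustrative, not the PR's code):

Expected<NodeValue> ProtobufLoader::getNodeValueByName(llvm::StringRef name) {
  RETURN_ERR_IF_NOT(hasNodeByName(name),
                    strFormat("No node value with name %s",
                              name.str().c_str()));
  NodeValue value = nodeValueByName_[name];
  // shouldConvertToFP16() is a hypothetical check against the precision
  // configuration; only FP32 values need converting.
  if (shouldConvertToFP16(name) &&
      value.getElementType() == ElemKind::FloatTy) {
    // Create a fresh conversion per use; CSE later merges duplicates that
    // hang off the same producer.
    value = G_->createConvertTo(name.str() + ".to_fp16", value,
                                ElemKind::Float16Ty);
  }
  return value;
}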
Hi @jfix71,
I agree that by changing getNodeValueByName() and getConstantByName() we can avoid the fixup needed for nodes with multiple users via nonConstantsNodeValueMap. To handle operator output precision I can think of two approaches:
1. Modify the addNodeAsOutput() method to add a ConvertToFP32 node to its outputs and update nodeValueByName_.
2. In getNodeValueByName(), check whether the operator output name is not specified to run in fp16 while the input is coming from a ConvertToFP16 node; if so, add a ConvertToFP32 node and return its NodeValue. getNodeValueByName() would then return one of three possible NodeValues:
   - the original NodeValue;
   - the NodeValue of an added ConvertToFP16 node, if the operator needs to run in FP16 precision;
   - the NodeValue of an added ConvertToFP32 node, if the operator needs to run in FP32 precision but the input comes from a node set to FP16 precision.
Kindly provide your suggestions.
@rgopinath8 I think option 1 is fine, except that there are a non-trivial number of places where we do not go through addNodeAsOutput() and instead update nodeValueByName_ directly. If you update other places and make sure that is consistent then it's a reasonable approach, and we'd need to ensure future cases always use addNodeAsOutput() too. It's probably a better approach than 2 if you want to go update all such places.
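For concreteness, a hedged sketch of option 1 (illustrative only; assumes the loader's G_ function pointer and nodeValueByName_ map, with the check for whether this instance actually ran in FP16 reduced to an element-type test):

void ProtobufLoader::addNodeAsOutput(const OpType &op, Node *R) {
  for (int i = 0, e = op.output_size(); i < e; i++) {
    NodeValue out = R->getNthResult(i);
    // If this instance produced FP16, convert back to FP32 so downstream
    // consumers that were not selected for FP16 see the type they expect.
    if (out.getElementType() == ElemKind::Float16Ty) {
      out = G_->createConvertTo(op.output(i) + ".to_fp32", out,
                                ElemKind::FloatTy);
    }
    nodeValueByName_[op.output(i)] = out;
  }
}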
Hi @jfix71,
So do you think it is advisable to set BatchNorm to FP16 if the preceding Convolution is set to FP16? If yes, this can be implemented in two places.
Kindly let me know your suggestions.
Hi @rgopinath8 -- so it seems to me that a preferable option is that the model should simply specify that the BN should be in FP16 too. I.e. it doesn't seem to me like Glow should be making precision decisions without the user specifying this to be the case -- if the BN is specified to run in FP32 then that should be respected unless explicitly told otherwise. Why isn't the BN set to FP16?
Another option could be to just update the optimization to fuse FP32 BNs into FP16 Convs.
Force-pushed 528992d to ff71fe2
Force-pushed ff71fe2 to b5a17e5
Hi @jfix71,
Force-pushed b5a17e5 to 7a4a36c
Looking better -- added some more comments
Force-pushed 2e2451e to adface4
Getting pretty close! Few more things, mostly nits.
Force-pushed 35fe221 to 5cfabd6
LGTM, thanks for your patience! I added some small nits/suggestions, and one place I think we should add an explicit error check.
Force-pushed 5cfabd6 to 1c41961
Force-pushed 1c41961 to 7c05a56
Thanks for iterating on this @rgopinath8, looks great!
@jfix71 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
This PR has been automatically closed due to being stale for 15 days. Thank you for your contributions and feel free to reopen it in case of further progress.
Summary:
Mixed precision mode enables running certain operator instances in FP16, certain operator instances in FP32, and the rest in INT8 (PGQ).
The operator instances required to run in FP16 are specified by the name of their "first output" in a YAML file. For such operator instances, ConvertToFP16 nodes are added at their inputs (activations) and ConvertToFP32 nodes are added at their outputs. For constant inputs, the tensor payload is converted to FP16. When used along with quantization, the operator instances that need to run in INT8 precision are not specified in the YAML file; they will run in INT8 through the regular quantization path if the operator is supported by the backend. The operator kinds that need to remain in their original precision can be specified with the existing compiler option "keep-original-precision-for-nodes".
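For illustration, such a YAML file might look like the following; the key and entry names here are hypothetical, and the actual schema is whatever this PR's deserializeModelLoaderPrecisionInfosFromYaml defines:

# Hypothetical precision file: operator instances are identified by the
# name of their first output; listed instances are loaded to run in FP16.
fp16OpInstanceNames:
  - conv1_output_0
  - fc7_output_0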
What this PR contains
Documentation:
Will be updated at the time of submission if the approach is fine.
Test Plan:
Test cases added in OnnxImporterTest and GraphOptzTest.