[ONNXModelLoader] Enabling operator-instance-based mixed precision support #5145
Conversation
Hi @rgopinath8! Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
Force-pushed bed0ba5 to 598afa0
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
Force-pushed 4908f60 to a70e143
@rgopinath8 Great to see this functionality upstreamed!
I added comments -- one main one about the way this is implemented in the loader and the need for fixups when there are multiple users. LMK your thoughts!
#ifndef GLOW_IMPORTER_MODELLOADERPRECISION_H
#define GLOW_IMPORTER_MODELLOADERPRECISION_H

#include "llvm/ADT/APInt.h"
nit: unused?
bump -- is this used?
/// Holds info about mixed precision details which can be used across model
/// loaders
struct ModelLoaderPrecisionConfiguration {
  /// Used during operator loading while costructing glow graph to keep the
typo: costructing
  /// conversion is skipped and FP16 conversion is done for any node kinds
  /// found here). This creates a graph where some nodes execute in quantized
  /// or FP32 precision and remaining in FP16 precision. If the node kind
  /// specified via it's name is unsupported by the backend in FP16 precision
grammar: it's -> its
// Open YAML input stream.
llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> text =
    llvm::MemoryBuffer::getFileAsStream(fileName);
CHECK(!text.getError()) << "Unable to open file with name: "
Perhaps we can return MAKE_ERR() here instead of CHECK?
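For illustration, a minimal sketch of that suggestion, assuming Glow's MAKE_ERR and strFormat helpers and a caller whose return type is llvm::Error (the exact message formatting here is hypothetical):

// Open YAML input stream, propagating a recoverable Error instead of
// aborting the process via CHECK.
llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> text =
    llvm::MemoryBuffer::getFileAsStream(fileName);
if (!text) {
  return MAKE_ERR(strFormat("Unable to open file with name: %s",
                            fileName.str().c_str()));
}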
void setModelLoaderPrecisionOpt(llvm::StringRef fileName);

/// Deserialize Model loader precision info from the \p YAML file
bool deserializeModelLoaderPrecisionInfosFromYaml(
We never check the return value here; it seems irrelevant for now. And we do some error checking inside of the function, though it dies via CHECK if there's an issue reading the yaml. Can we instead return an Error?
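A rough sketch of the suggested signature change (the config parameter name is hypothetical; the real declaration lives in this PR):

/// Deserialize model loader precision info from the \p fileName YAML file.
/// Returning llvm::Error lets callers propagate failures, e.g.:
///   RETURN_IF_ERR(deserializeModelLoaderPrecisionInfosFromYaml(file, cfg));
llvm::Error deserializeModelLoaderPrecisionInfosFromYaml(
    llvm::StringRef fileName, ModelLoaderPrecisionConfiguration &cfg);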
lib/Importer/ProtobufLoader.cpp
Outdated
//Re-assign non constant inputs to original nodevalues
for (auto itr = nonConstantsNodeValueMap.begin();
     itr != nonConstantsNodeValueMap.end(); ++itr) {
Suggested change:
- //Re-assign non constant inputs to original nodevalues
- for (auto itr = nonConstantsNodeValueMap.begin();
-      itr != nonConstantsNodeValueMap.end(); ++itr) {
+ // Re-assign non constant inputs to original nodevalues
+ for (auto itr = nonConstantsNodeValueMap.begin(),
+           e = nonConstantsNodeValueMap.end(); itr != e; ++itr) {
lib/Importer/ONNXModelLoader.cpp
Outdated
@@ -5087,7 +5089,54 @@ Error ONNXModelLoader::loadNetwork(ONNX_NAMESPACE::GraphProto &net,
      continue;
    }
  }

  // Find if OpOutPutName is specified to set in fp16. Node precision can
nit: for consistency with code
Suggested change:
- // Find if OpOutPutName is specified to set in fp16. Node precision can
+ // Find if OpOutputName is specified to set in fp16. Node precision can
lib/Importer/ONNXModelLoader.cpp
Outdated
RETURN_IF_ERR(loadOperator(op));

// If OpOutPutName is specified to run in fp16, the operator outputs will
// be FP16 kind but next operator maynot run in fp16 precision. So add
typo: maynot
lib/Importer/ProtobufLoader.cpp
Outdated
// If input type is fp32 add convert to fp16 node and update
// Global nodeValueByName_ map, so that operator in the model
// recieves appropriate input value.
So, I'm not sure I like this approach very much. I understand it makes things cleaner in that all ops can retrieve their inputs naturally as they currently do via nodeValueByName_. However, as you have implemented it, it requires some sort of fixup via the nonConstantsNodeValueMap. I wonder if instead we could e.g. modify getNodeValueByName(), so that it checks if it should insert a ConvertToNode to FP16 before returning the NodeValue. Then we never modify nodeValueByName_ and don't require the fixup. Additionally, CSE during graph optimizations should combine many ConvertToNodes that all use the same op. WDYT?
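A minimal sketch of that idea, assuming a hypothetical shouldConvertToFP16() lookup against the YAML-driven configuration (the surrounding names mirror Glow's loader, but this is illustrative, not the PR's code):

Expected<NodeValue> ProtobufLoader::getNodeValueByName(llvm::StringRef name) {
  RETURN_ERR_IF_NOT(hasNodeByName(name),
                    strFormat("No node value with name %s",
                              name.str().c_str()));
  NodeValue value = nodeValueByName_[name];
  // shouldConvertToFP16() is a hypothetical check against the precision
  // configuration; only FP32 values need converting.
  if (shouldConvertToFP16(name) &&
      value.getElementType() == ElemKind::FloatTy) {
    // Create a fresh conversion per use; CSE later merges duplicates that
    // hang off the same producer.
    value = G_->createConvertTo(name.str() + ".to_fp16", value,
                                ElemKind::Float16Ty);
  }
  return value;
}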
Hi @jfix71,
I agree that by changing getNodeValueByName() and getConstantByName() we can avoid the fixup needed for nodes with multiple users via nonConstantsNodeValueMap. To handle operator output precision I can think of two approaches:
1. Modify the addNodeAsOutput() method to add a ConvertToFP32 node to its outputs and update nodeValueByName_.
2. In getNodeValueByName(), check whether the operator output name is not specified to run in fp16 while the input is coming from a ConvertToFP16 node; if so, add a ConvertToFP32 node and return its NodeValue. getNodeValueByName() would then return one of three possible NodeValues:
   - the original NodeValue;
   - the NodeValue of an added ConvertToFP16 node, if the operator needs to run in FP16 precision;
   - the NodeValue of an added ConvertToFP32 node, if the operator needs to run in FP32 precision but the input comes from a node set to FP16 precision.
Kindly provide your suggestions.
@rgopinath8 I think option 1 is fine, except that there are a non-trivial number of places where we do not go through addNodeAsOutput() and instead update nodeValueByName_ directly. If you update other places and make sure that is consistent then it's a reasonable approach, and we'd need to ensure future cases always use addNodeAsOutput() too. It's probably a better approach than 2 if you want to go update all such places.
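For concreteness, a hedged sketch of option 1 (illustrative only; assumes the loader's G_ function pointer and nodeValueByName_ map, with the check for whether this instance actually ran in FP16 reduced to an element-type test):

void ProtobufLoader::addNodeAsOutput(const OpType &op, Node *R) {
  for (int i = 0, e = op.output_size(); i < e; i++) {
    NodeValue out = R->getNthResult(i);
    // If this instance produced FP16, convert back to FP32 so downstream
    // consumers that were not selected for FP16 see the type they expect.
    if (out.getElementType() == ElemKind::Float16Ty) {
      out = G_->createConvertTo(op.output(i) + ".to_fp32", out,
                                ElemKind::FloatTy);
    }
    nodeValueByName_[op.output(i)] = out;
  }
}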
Hi @jfix71,
So do you think it is advisable to set BatchNorm to FP16 if the preceding Convolution is set to FP16? If yes, this can be implemented in two places.
Kindly let me know your suggestions.
Hi @rgopinath8 -- so it seems to me that a preferable option is that the model should simply specify that the BN should be in FP16 too. I.e. it doesn't seem to me like Glow should be making precision decisions without the user specifying this to be the case -- if the BN is specified to run in FP32 then that should be respected unless explicitly told otherwise. Why isn't the BN set to FP16?
Another option could be to just update the optimization to fuse FP32 BNs into FP16 Convs.
Force-pushed 528992d to ff71fe2
Force-pushed ff71fe2 to b5a17e5
Hi @jfix71,
Force-pushed b5a17e5 to 7a4a36c
Looking better -- added some more comments
Force-pushed 2e2451e to adface4
Getting pretty close! Few more things, mostly nits.
Force-pushed 35fe221 to 5cfabd6
LGTM, thanks for your patience! I added some small nits/suggestions, and one place I think we should add an explicit error check.
Force-pushed 5cfabd6 to 1c41961
Force-pushed 1c41961 to 7c05a56
Thanks for iterating on this @rgopinath8, looks great!
@jfix71 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
This PR has been automatically closed due to being stale for 15 days. Thank you for your contributions and feel free to reopen it in case of further progress.
Summary:
Mixed precision mode enables running certain operator instances in FP16, certain operator instances in FP32, and the rest in INT8 (PGQ).
The operator instances required to run in FP16 are specified by the name of their "first output" in a YAML file. For such operator instances, ConvertToFP16 nodes are added at their inputs (activations) and ConvertToFP32 nodes are added at their outputs. For constant inputs, the tensor payload is converted to FP16. When used along with quantization, the operator instances that need to run in INT8 precision are not specified in the YAML file; they will run in INT8 through the regular quantization path if the operator is supported by the backend. The operator kinds that need to remain in their original precision can be specified with the existing compiler option "keep-original-precision-for-nodes".
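For illustration, such a YAML file might look like the following; the key and entry names here are hypothetical, and the actual schema is whatever this PR's deserializeModelLoaderPrecisionInfosFromYaml defines:

# Hypothetical precision file: operator instances are identified by the
# name of their first output; listed instances are loaded to run in FP16.
fp16OpInstanceNames:
  - conv1_output_0
  - fc7_output_0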
What this PR contains
Documentation:
Will be updated at the time of submission if the approach is fine.
Test Plan:
Test cases added in OnnxImporterTest and GraphOptzTest.