
[Quantization][Refactor] Make profile include only min/max #4326

Closed


mciprian13 commented Mar 20, 2020

Summary

Until now, the profile generated after the profiling phase included the quantization parameters scale and offset, which were computed for a given quantization configuration (schema, precision, precisionBias). This approach was extremely error prone (I ran into this problem multiple times) because, for the flow to be consistent, the same configuration had to be provided during profiling (e.g. when using image-classifier or model-profiler) and during quantization (e.g. when using model-compiler).

With this PR the profile includes only the profiling-specific information (min and max), while the actual quantization parameters are computed during the quantization phase (a sketch of that computation is shown after the list below). This has the following benefits:

  • the profiling phase becomes totally independent of the quantization configuration (schema, precision, precisionBias), making the front-end tools less error prone (model-profiler and image-classifier no longer require any quantization parameters)
  • the information in the profile.yml is much more readable in terms of the tensor dynamic range (which I think was the purpose of using the YAML format in the first place). Having only the scale and offset was not indicative of the tensor dynamic range because:
    • the information was not sufficient without the data precision (a given offset & scale pair means totally different things for int8 vs int32)
    • some information was lost; for example, with the Symmetric quantization schema the original (min, max) pair was lost due to symmetrization
  • the profiling information is kept closer to where the actual quantization is performed, allowing more elaborate and precise quantization schemes (joint operand quantization, etc.)
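To make the new split concrete, here is a minimal, self-contained sketch of how the quantization phase can derive scale and offset from a profiled (min, max) pair for an asymmetric int8 schema. The function name `chooseInt8Params` is hypothetical and this is an illustration only, not Glow's actual implementation, which also handles the other schemas, precisions, and the precisionBias:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <utility>

// Illustrative only: derive (scale, offset) for asymmetric int8
// quantization, with the convention real = scale * (quantized - offset).
std::pair<float, int32_t> chooseInt8Params(float min, float max) {
  // Extend the range to contain 0.0 so that it is exactly representable.
  min = std::min(min, 0.0f);
  max = std::max(max, 0.0f);
  constexpr int32_t qmin = -128, qmax = 127;
  float scale = (max - min) / float(qmax - qmin);
  if (scale == 0.0f)
    scale = 1.0f; // degenerate all-zero profile
  // The offset is the quantized value to which the real minimum maps.
  int32_t offset = qmin - int32_t(std::round(min / scale));
  return {scale, std::max(qmin, std::min(qmax, offset))};
}
```

Note that reversing this computation to recover (min, max) from scale/offset requires knowing the schema and precision, while storing (min, max) directly avoids that dependency; for a Symmetric schema the range would first be widened to (-absMax, absMax), which is exactly the step that used to destroy the original (min, max) information.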

The functionality of serializing the tensor quantization parameters (scale & offset) is still there, and we might use it for debugging (although the scale and offset values are also readable in the dumped graph).

Fixes
#2600
#2599 (No longer required)

Documentation
Updated AOT.md.

Test Plan
The quantization unit tests are passing.
Added a new unit test for the new getNodeValueByName() utility function.
I manually tested the LeNet MNIST model and verified that the quantized bundle produces exactly the same (bit-exact) results before and after this change. This is possible because the serialization of the min/max preserves floating-point bit-exactness (just as it did for the floating-point scale); a sketch of how such round-tripping can work is shown below.
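One general way to get bit-exact round-tripping of floats through a text format (a sketch of the principle with a hypothetical function name, not necessarily Glow's exact serialization code) is to print with 9 significant decimal digits, which is enough for any IEEE-754 binary32 value to parse back to the identical bit pattern:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Sketch: 9 significant digits (FLT_DECIMAL_DIG) guarantee that a
// float survives a print/parse round trip with its exact bit pattern.
bool roundTripsExactly(float value) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.9g", value);
  float parsed = std::strtof(buf, nullptr);
  uint32_t a, b;
  std::memcpy(&a, &value, sizeof(a));
  std::memcpy(&b, &parsed, sizeof(b));
  return a == b; // compare bit patterns, not float values (handles -0.0)
}
```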

Next
We should also include the histogram (which is already computed) in the profile and use smarter quantization schemes (outlier removal, KL divergence minimization, etc.), as sketched below.
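As a rough illustration of the direction (purely hypothetical, not part of this PR), a histogram-based calibration could clip the profiled range so that only a small fraction of the observed mass is discarded as outliers; KL-divergence minimization would be a smarter criterion for choosing the clip point:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: given a histogram over [min, max], shrink the
// range until at most `fraction` of the total mass lies outside it.
// A smarter scheme would instead pick the clip that minimizes the KL
// divergence between the original and the quantized distribution.
std::pair<float, float> clipOutliers(const std::vector<double> &hist,
                                     float min, float max,
                                     double fraction = 0.0001) {
  if (hist.empty())
    return {min, max};
  double total = 0;
  for (double h : hist)
    total += h;
  double budget = total * fraction;
  size_t lo = 0, hi = hist.size();
  double dropped = 0;
  // Greedily drop the smaller of the two outermost bins.
  while (hi - lo > 1) {
    double next = std::min(hist[lo], hist[hi - 1]);
    if (dropped + next > budget)
      break;
    dropped += next;
    if (hist[lo] <= hist[hi - 1]) ++lo; else --hi;
  }
  float binWidth = (max - min) / hist.size();
  return {min + lo * binWidth, min + hi * binWidth};
}
```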

mciprian13 added 2 commits Mar 20, 2020

facebook-github-bot left a comment

@jackm321 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

mciprian13 added 2 commits Mar 20, 2020

mciprian13 commented Mar 20, 2020

@jackm321 Can we have this reviewed? Thanks!


jackm321 left a comment

This is great! Looks really good.

Review comments on lib/Quantization/Quantization.cpp and include/glow/Quantization/Quantization.h (outdated; resolved).

mciprian13 commented Mar 21, 2020

@jackm321 Can we have this landed?
This should also close the associated issues #2600 and #2599.
Thanks!


mciprian13 commented Mar 21, 2020

Side question: why the preference for llvm::StringRef and llvm::ArrayRef instead of std::string and std::vector? What are the benefits?


facebook-github-bot left a comment

@jackm321 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


jackm321 commented Mar 21, 2020

@mciprian13 llvm::StringRef and llvm::ArrayRef are preferred simply because they don't mandate the underlying container of the argument.
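For illustration (a minimal sketch with hypothetical helper functions, not code from this PR): the same llvm::ArrayRef parameter accepts a std::vector, a std::array, or a plain C array without copies or overloads, and llvm::StringRef likewise accepts a std::string or a string literal:

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringRef.h"

#include <array>
#include <string>
#include <vector>

// Hypothetical helper: works for any contiguous float storage.
float sumAll(llvm::ArrayRef<float> vals) {
  float sum = 0.0f;
  for (float v : vals)
    sum += v;
  return sum;
}

// Hypothetical helper: accepts std::string, string literals, char*, ...
bool hasPrefix(llvm::StringRef name, llvm::StringRef prefix) {
  return name.startswith(prefix);
}

void demo() {
  std::vector<float> vec = {1.0f, 2.0f};
  std::array<float, 2> arr = {3.0f, 4.0f};
  float carr[2] = {5.0f, 6.0f};
  float total = sumAll(vec) + sumAll(arr) + sumAll(carr); // no copies
  (void)total;
  std::string name = "conv1:0";
  (void)hasPrefix(name, "conv1"); // std::string binds to StringRef
}
```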


mciprian13 commented Mar 23, 2020

@jackm321 Thanks for the explanation. Can we have this landed?


mciprian13 commented Mar 24, 2020

@jfix71 Can we have this landed? I'm working on other quantization refactorings and I need this landed. Thanks!


facebook-github-bot left a comment

@jfix71 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


facebook-github-bot commented Mar 25, 2020

@jfix71 merged this pull request in 5f11b6a.
