[FrontEnd] Add Model Tuner front-end tool. #3816

Closed · wants to merge 6 commits
115 changes: 115 additions & 0 deletions docs/ModelTuner.md
@@ -0,0 +1,115 @@
## ModelTuner

This front end tool is used for tuning (calibrating) the quantization parameters of a model.
During the quantization flow, the model is first profiled by gathering the dynamic range (min/max)
of each tensor in the graph. Next, the quantization parameters are chosen such that, for the
given profile, no saturation occurs. Although this makes sense at first glance, there is actually
a tradeoff when choosing the quantization parameters for a given tensor: it might be beneficial
overall to choose quantization parameters that provide a smaller quantization step (e.g. a smaller
**scale** parameter), which gives a better representation of most of the tensor values (the bulk
of the histogram) at the expense of saturating the extreme values (outliers).
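
To make the tradeoff concrete, here is a small numeric illustration (the numbers are invented for
the example and assume int8 asymmetric quantization with 255 steps; this is not output from the
tool):

```cpp
#include <cstdio>

int main() {
  // Invented example: a tensor profiled with min = -2.0 and max = +2.0.
  float minVal = -2.0f, maxVal = 2.0f;
  // int8 asymmetric quantization: 255 steps cover the whole profiled range,
  // so no value saturates.
  float scale = (maxVal - minVal) / 255.0f; // ~0.0157
  // Halving the scale shrinks the representable span to roughly [-1.0, 1.0]:
  // outliers beyond it saturate, but values in the bulk of the histogram
  // (say [-0.5, 0.5]) are represented twice as finely.
  float tunedScale = scale / 2.0f; // ~0.0078
  std::printf("initial scale = %.4f, tuned scale = %.4f\n", scale, tunedScale);
  return 0;
}
```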

This tool tunes the quantization parameters using the following simple algorithm (a sketch of the
search loop is given right after this list):
- For each node in the graph, try different quantization parameters in the vicinity of the values
chosen initially (right after profiling). For example, this is done by successively dividing the
**scale** parameter by 2, for a maximum of 3 iterations.
- Among the quantization parameters tested for a node, keep the ones which provide the best
accuracy with respect to a given dataset.
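
A minimal C++ sketch of this search loop is shown below. The `NodeParams` structure and the
`evalAccuracy` callback are assumptions for illustration, not the actual Glow API (the real
implementation lives in `tools/loader/ModelTuner.cpp`); the `targetAccuracy`, `maxIterPerNode`
and `accDropSkip` parameters correspond to the extra command line options described later.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical per-node quantization parameters (one entry per node profile).
struct NodeParams {
  float scale;
  int32_t offset;
};

// Greedy per-node search: repeatedly halve each node's scale, keeping the
// parameters that give the best accuracy on the tuning dataset.
float tuneProfile(
    std::vector<NodeParams> &profile,
    const std::function<float(const std::vector<NodeParams> &)> &evalAccuracy,
    float targetAccuracy = 1.0f, unsigned maxIterPerNode = 3,
    float accDropSkip = 0.05f) {
  float bestAcc = evalAccuracy(profile);
  for (auto &node : profile) {
    if (bestAcc >= targetAccuracy) {
      break; // Stop early once the target accuracy is reached.
    }
    float bestScale = node.scale;
    for (unsigned iter = 0; iter < maxIterPerNode; ++iter) {
      node.scale /= 2.0f; // Try a smaller quantization step.
      float acc = evalAccuracy(profile);
      if (acc > bestAcc) {
        bestAcc = acc;
        bestScale = node.scale;
      } else if (bestAcc - acc > accDropSkip) {
        break; // Accuracy dropped too much; stop tuning this node.
      }
    }
    node.scale = bestScale; // Keep the best parameters found for this node.
  }
  return bestAcc;
}
```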

### Command line options

The specific command line options for running this tool are presented below. Apart from these,
the tool accepts the generic options shared with the other front end tools (see the
image-classifier documentation): options for specifying the model, the quantization options
(schema, precision), the backend, and the image preprocessing options (layout, channel order,
normalization).

```
model-tuner -model=<model-path> <image-options> <quantization-options> -dataset-path=<dataset-folder>
-dataset-file=<dataset-file> -load-profile=<input-profile> -dump-tuned-profile=<tuned-profile>
```

where:
- *dataset-path* - the folder where the dataset files are located. All the dataset files are
assumed to be located in the same directory.
- *dataset-file* - the path to the dataset description file which contains, on each line, a data
path and an integer label separated by a space (" ") or a comma (","). The integer labels start
at 0 (0, 1, ...). One example might look like this (a parsing sketch is given after this list):
image0.png 0
image1.png 13
.............
Another example might look like this:
image0.png,0,
image1.png,13,
..............
- *load-profile* - the path of the input profile which is loaded and tuned.
- *dump-tuned-profile* - the path where the tuned profile is written.
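
A minimal C++ sketch of parsing one line in the dataset description format (illustrative only;
`parseDatasetLine` is a hypothetical helper, not the tool's actual parser):

```cpp
#include <sstream>
#include <string>
#include <utility>

// Parse one dataset description line: a data path followed by an integer
// label, separated by a space or a comma (trailing commas are tolerated).
std::pair<std::string, unsigned> parseDatasetLine(std::string line) {
  for (char &c : line) {
    if (c == ',') {
      c = ' '; // Treat commas and spaces uniformly as separators.
    }
  }
  std::istringstream ss(line);
  std::string path;
  unsigned label = 0;
  ss >> path >> label;
  return {path, label};
}
```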

More information can be obtained by typing the following command:
```
model-tuner -help
```

### Extra command line options

There are a couple of extra command line parameters which can be used to tweak the behavior of the
algorithm (they correspond to the `targetAccuracy`, `maxIterPerNode` and `accDropSkip` parameters
in the sketch above):
- *target-accuracy* - The tuning procedure is stopped when the accuracy reaches or surpasses the
given value. A float value between 0.0 and 1.0 is expected. If not specified, the
tuning runs until completion.
- *max-iter-per-node* - The maximum number of tuning iterations per node (default is 3).
- *acc-drop-skip* - The accuracy drop beyond which the tuning of a node is skipped. The default value is 0.05 (5%).

### Command line output

When running this tool, the console output might look like this:

```
Computing initial accuracy ...
Initial accuracy: 81.0180 %
Number of nodes: 277
Target accuracy: 100.0000 %

[1/277] Tuning node "broadcast_B_tile0_save__1:0"
[1/3] Testing scale = 0.00195
Accuracy = 81.0180 %
Tuning stopped for this node (no effect)
Best accuracy : 81.0180 %
Iteration time: 34 seconds
Remaining time: 2 hours 36 minutes

[2/277] Tuning node "W52__1:0"
[1/3] Testing scale = 0.06250
Accuracy = 81.4422 %
[2/3] Testing scale = 0.03125
Accuracy = 79.0032 %
[3/3] Testing scale = 0.01562
Accuracy = 67.1262 %
Best accuracy : 81.4422 %
Iteration time: 68 seconds
Remaining time: 5 hours 11 minutes

..................................
..................................

[277/277] Tuning node "W42__1:0"
[1/3] Testing scale = 0.01562
Accuracy = 90.2439 %
Tuning stopped for this node
Best accuracy : 97.9852 %
Iteration time: 66 seconds
Remaining time: 0 hours 0 minutes


Final accuracy: 97.9852 %

Total time: 5 hours 6 minutes
```

Notes:
- The quantization tuning procedure is lengthy: the time required to run it is of the same order
of magnitude as training. For example, the model tuned in the example above is a medium size
model (similar to a MobileNet with a scale factor of 0.5). For this reason the tool also prints
an estimated remaining time (the estimate improves as more nodes are calibrated).
- When the estimated tuning time is too long, one might use a smaller tuning dataset.
19 changes: 12 additions & 7 deletions include/glow/Importer/ProtobufLoader.h
@@ -165,8 +165,8 @@ class ProtobufLoader {
bool hasNodeByName(llvm::StringRef name) const;

/// Constructs new ProtobufLoader object. It will populate the network into
/// \p F. The list \p types and \p names are used to initialized the inputs
/// and outputs with specific names and types. If \p errPtr is not null then
/// \p F. The list \p types and \p names are used to initialize the inputs
/// of the model with specific names and types. If \p errPtr is not null then
/// if an error occurs it will get assigned there otherwise if an error
/// occurs it will abort.
ProtobufLoader(llvm::ArrayRef<const char *> tensorNames,
@@ -191,16 +191,21 @@ class ProtobufLoader {
/// that there is only one output, returns Error otherwise. For image
/// classification, this single final output is usually the result of the
/// last softmax or regression layer.
Expected<Placeholder *> getSingleOutput() {
RETURN_ERR_IF_NOT(outputVarsByName_.size() == 1,
"There must be only one output.");
return outputVarsByName_.begin()->second;
}
Expected<Placeholder *> getSingleOutput() const;

/// \returns the single input of the network. The function assumes that there
/// is only one input, returns Error otherwise. For most of the models the
/// single input is usually an image tensor.
Expected<Placeholder *> getSingleInput() const;

/// \returns the Placeholder for the external output with \p name.
/// \pre outputVarsByName_.find(name) != outputVarsByName_.end()
Expected<Placeholder *> getOutputByName(llvm::StringRef name) const;

/// \returns the Placeholder for the external input with \p name.
/// \pre inputVarsByName_.find(name) != inputVarsByName_.end()
Expected<Placeholder *> getInputByName(llvm::StringRef name) const;

/// \returns True if the operator with name \p typeName having input node
/// list as \p inputs is constant foldable.
bool isConstantFoldable(llvm::ArrayRef<NodeValue> inputs,
9 changes: 9 additions & 0 deletions include/glow/Quantization/Base/Base.h
@@ -255,6 +255,15 @@ Tensor tensor4BitsFusedRowwiseDequantization(const Tensor &input);
QuantizationTransform32To8 quantizeScaleOffset32To8(float scale,
int32_t offset);

/// Function to get the quantized range for a given precision type \p qTy.
/// \returns the range as a (min, max) pair.
std::pair<int64_t, int64_t> getQuantizationRange(ElemKind qTy);

/// Function to validate that the given quantization parameters \p qParams
/// comply with the given quantization \p schema and precision \p qTy.
void validateQuantizationParams(TensorQuantizationParams qParams, Schema schema,
ElemKind qTy);

/// Calculate TensorQuantizationParams based on the clipped \p min and \p max
/// floating point range and using the base quantization type \p qTy and the
/// quantization method described by \p schema.
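A minimal sketch of how the two new helpers might be used together (the namespace layout follows
the surrounding Glow code; the int8 range noted in the comment is an expectation for that
precision, not verified output):

```cpp
#include "glow/Quantization/Base/Base.h"

// Check int8 quantization parameters against the asymmetric schema.
void checkParams(glow::TensorQuantizationParams qParams) {
  namespace quant = glow::quantization;
  // Query the representable integer range for the precision; for Int8QTy
  // this should be the pair (-128, 127).
  auto range = quant::getQuantizationRange(glow::ElemKind::Int8QTy);
  (void)range;
  // Assert that the parameters comply with the schema and precision.
  quant::validateQuantizationParams(qParams, quant::Schema::Asymmetric,
                                    glow::ElemKind::Int8QTy);
}
```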
31 changes: 26 additions & 5 deletions lib/Importer/ProtobufLoader.cpp
@@ -84,6 +84,18 @@ bool ProtobufLoader::hasConstantByName(llvm::StringRef name) const {
return getConstantByNameOrNull(name) != nullptr;
}

Expected<Placeholder *> ProtobufLoader::getSingleOutput() const {
RETURN_ERR_IF_NOT(outputVarsByName_.size() == 1,
"There must be only one output.");
return outputVarsByName_.begin()->second;
}

Expected<Placeholder *> ProtobufLoader::getSingleInput() const {
RETURN_ERR_IF_NOT(inputVarsByName_.size() == 1,
"There must be only one input.");
return inputVarsByName_.begin()->second;
}

Expected<Placeholder *>
ProtobufLoader::getOutputByName(llvm::StringRef name) const {
auto it = outputVarsByName_.find(name);
@@ -94,6 +106,16 @@ ProtobufLoader::getOutputByName(llvm::StringRef name) const {
return it->second;
}

Expected<Placeholder *>
ProtobufLoader::getInputByName(llvm::StringRef name) const {
auto it = inputVarsByName_.find(name);
RETURN_ERR_IF_NOT(
it != inputVarsByName_.end(),
llvm::Twine("No external input Variable was registered with name ", name)
.str());
return it->second;
}

NodeValue
ProtobufLoader::getNodeValueByNameOrNullNodeValue(llvm::StringRef name) const {
auto it = nodeValueByName_.find(name);
@@ -187,11 +209,10 @@ ProtobufLoader::ProtobufLoader(llvm::ArrayRef<const char *> tensorNames,
for (size_t i = 0, e = tensorNames.size(); i < e; i++) {
RETURN_ERR_IF_NOT(!hasNodeByName(tensorNames[i]),
"Input names have duplicate");
auto placeholderOrErr =
createAndRegisterPlaceholder(tensorNames[i], types[i]);
if (!placeholderOrErr) {
return placeholderOrErr.takeError();
}
Placeholder *placeholder;
ASSIGN_VALUE_OR_RETURN_ERR(
placeholder, createAndRegisterPlaceholder(tensorNames[i], types[i]));
inputVarsByName_.try_emplace(tensorNames[i], placeholder);
}
return Error::success();
};
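Given a constructed `ProtobufLoader` named `loader`, a front end tool might use the new accessors
as sketched below (`EXIT_ON_ERR` is Glow's macro for unwrapping an `Expected<T>`; the placeholder
names `"data"` and `"prob"` are made up for the example):

```cpp
// Single-input/single-output models (e.g. a typical image classifier):
glow::Placeholder *input = EXIT_ON_ERR(loader.getSingleInput());
glow::Placeholder *output = EXIT_ON_ERR(loader.getSingleOutput());

// Models with multiple inputs/outputs, when the names are known:
glow::Placeholder *in = EXIT_ON_ERR(loader.getInputByName("data"));
glow::Placeholder *out = EXIT_ON_ERR(loader.getOutputByName("prob"));
```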
67 changes: 41 additions & 26 deletions lib/Quantization/Base/Base.cpp
@@ -265,11 +265,7 @@ QuantizationTransform32To8 quantizeScaleOffset32To8(float scale,
offset);
}

TensorQuantizationParams chooseQuantizationParams(float min, float max,
Schema schema, ElemKind qTy) {
assert(min <= max && "min must not be bigger than max");

// Compute the quantized int range.
std::pair<int64_t, int64_t> getQuantizationRange(ElemKind qTy) {
// Pick int64_t in order to cover the uint32_t range.
int64_t qmin;
int64_t qmax;
Expand Down Expand Up @@ -310,6 +306,45 @@ TensorQuantizationParams chooseQuantizationParams(float min, float max,
default:
llvm_unreachable("Quantized type not supported");
}
return std::pair<int64_t, int64_t>(qmin, qmax);
}

void validateQuantizationParams(TensorQuantizationParams qParams, Schema schema,
ElemKind qTy) {

// Get the quantized range.
auto minMaxPair = getQuantizationRange(qTy);
int64_t qmin = minMaxPair.first;
int64_t qmax = minMaxPair.second;

// Validate params.
(void)(qmin);
(void)(qmax);
assert((qmin <= qParams.offset) && (qParams.offset <= qmax) &&
"The offset must be within the quantized range");
if (schema == quantization::Schema::Symmetric) {
assert((qParams.offset == 0) &&
"Symmetric quantization should have offset 0");
} else if (schema == quantization::Schema::SymmetricWithUnsigned) {
assert((qParams.offset == qmin || qParams.offset == 0) &&
"SymmetricWithUnsigned quantization should have offset 0 or qmin");
} else if (schema == quantization::Schema::SymmetricWithPower2Scale) {
assert((qParams.offset == 0) &&
"SymmetricWithPower2Scale quantization should have offset 0");
assert(isFloatPowerOf2(qParams.scale) &&
"SymmetricWithPower2Scale quantization parameter should be a power "
"of 2");
}
}

TensorQuantizationParams chooseQuantizationParams(float min, float max,
Schema schema, ElemKind qTy) {
assert(min <= max && "min must not be bigger than max");

// Get the quantized range.
auto minMaxPair = getQuantizationRange(qTy);
int64_t qmin = minMaxPair.first;
int64_t qmax = minMaxPair.second;

// We extend the [min, max] interval to ensure that it contains 0.
// Otherwise, we would not meet the requirement that 0 be an exactly
@@ -403,27 +438,7 @@ TensorQuantizationParams chooseQuantizationParams(float min, float max,
}

TensorQuantizationParams result{static_cast<float>(scale), nudgedZeroPoint};
// The only valid offset for symmetric quantization is 0.
assert((result.offset == 0 || schema != quantization::Schema::Symmetric) &&
"Symmetric quantization should be centered on 0");

// The only valid offsets for symmetric quantization with unsigned support are
// 0 and qmin.
assert((result.offset == qmin || result.offset == 0 ||
schema != quantization::Schema::SymmetricWithUnsigned) &&
"Symmetric quantization with unsigned should be centered on 0 or on "
"-qmin");

// For SymmetricWithPower2Scale schema the offset should be 0.
assert((result.offset == 0 ||
schema != quantization::Schema::SymmetricWithPower2Scale) &&
"Symmetric quantization should be centered on 0");

// For SymmetricWithPower2Scale schema the scale should be a power of 2.
assert((isFloatPowerOf2(result.scale) ||
schema != quantization::Schema::SymmetricWithPower2Scale) &&
"Scale quantization parameter should be a power of 2");

validateQuantizationParams(result, schema, qTy);
return result;
}

18 changes: 18 additions & 0 deletions tools/loader/CMakeLists.txt
@@ -82,3 +82,21 @@ target_link_libraries(model-compiler
GraphOptimizer
Quantization
LLVMSupport)

add_executable(model-tuner
Loader.cpp
LoaderUtils.cpp
ModelTuner.cpp)

target_link_libraries(model-tuner
PRIVATE
Backends
Base
Converter
Graph
HostManager
Importer
ExecutionEngine
GraphOptimizer
Quantization
LLVMSupport)
3 changes: 2 additions & 1 deletion tools/loader/ImageClassifier.cpp
@@ -252,7 +252,8 @@ buildAndCompileAndGetInAndOutPair(Loader &loader, PlaceholderBindings &bindings,

// Compile the model, and perform quantization/emit a bundle/dump debug info
// if requested from command line.
CompilationContext cctx{&bindings};
CompilationContext cctx = loader.getCompilationContext();
cctx.bindings = &bindings;
cctx.backendOpts.autoInstrument = autoInstrument;
loader.compile(cctx);
