[FrontEnd] Add Model Tuner front-end tool. #3816

Closed · wants to merge 6 commits
115 changes: 115 additions & 0 deletions docs/ModelTuner.md
@@ -0,0 +1,115 @@
## ModelTuner

This front end tool is used for tuning (calibrating) the quantization parameters of a model.
During the quantization flow, the model is first profiled by gathering the dynamic range (min/max)
of each tensor in the graph. Next, the quantization parameters are chosen such that, for the
given profile, no saturation occurs. Although this makes sense at first glance, there is actually
a tradeoff when choosing the quantization parameters for a given tensor: it might be beneficial
overall to choose quantization parameters that provide a smaller quantization step (e.g. a smaller
**scale** parameter), which gives a better representation of most of the tensor values (the bulk
of the histogram) at the expense of saturating the extreme values (outliers).
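
To make the tradeoff concrete, here is a small numeric illustration (the numbers are invented for
the example and assume int8 asymmetric quantization with 255 steps; this is not output from the
tool):

```cpp
#include <cstdio>

int main() {
  // Invented example: a tensor profiled with min = -2.0 and max = +2.0.
  float minVal = -2.0f, maxVal = 2.0f;
  // int8 asymmetric quantization: 255 steps cover the whole profiled range,
  // so no value saturates.
  float scale = (maxVal - minVal) / 255.0f; // ~0.0157
  // Halving the scale shrinks the representable span to roughly [-1.0, 1.0]:
  // outliers beyond it saturate, but values in the bulk of the histogram
  // (say [-0.5, 0.5]) are represented twice as finely.
  float tunedScale = scale / 2.0f; // ~0.0078
  std::printf("initial scale = %.4f, tuned scale = %.4f\n", scale, tunedScale);
  return 0;
}
```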

This tool tunes the quantization parameters using the following simple algorithm (a sketch of the
search loop is given right after this list):
- For each node in the graph, try different quantization parameters in the vicinity of the values
chosen initially (right after profiling). For example, this is done by successively dividing the
**scale** parameter by 2, for a maximum of 3 iterations.
- Among the quantization parameters tested for a node, keep the ones which provide the best
accuracy with respect to a given dataset.
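
A minimal C++ sketch of this search loop is shown below. The `NodeParams` structure and the
`evalAccuracy` callback are assumptions for illustration, not the actual Glow API (the real
implementation lives in `tools/loader/ModelTuner.cpp`); the `targetAccuracy`, `maxIterPerNode`
and `accDropSkip` parameters correspond to the extra command line options described later.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical per-node quantization parameters (one entry per node profile).
struct NodeParams {
  float scale;
  int32_t offset;
};

// Greedy per-node search: repeatedly halve each node's scale, keeping the
// parameters that give the best accuracy on the tuning dataset.
float tuneProfile(
    std::vector<NodeParams> &profile,
    const std::function<float(const std::vector<NodeParams> &)> &evalAccuracy,
    float targetAccuracy = 1.0f, unsigned maxIterPerNode = 3,
    float accDropSkip = 0.05f) {
  float bestAcc = evalAccuracy(profile);
  for (auto &node : profile) {
    if (bestAcc >= targetAccuracy) {
      break; // Stop early once the target accuracy is reached.
    }
    float bestScale = node.scale;
    for (unsigned iter = 0; iter < maxIterPerNode; ++iter) {
      node.scale /= 2.0f; // Try a smaller quantization step.
      float acc = evalAccuracy(profile);
      if (acc > bestAcc) {
        bestAcc = acc;
        bestScale = node.scale;
      } else if (bestAcc - acc > accDropSkip) {
        break; // Accuracy dropped too much; stop tuning this node.
      }
    }
    node.scale = bestScale; // Keep the best parameters found for this node.
  }
  return bestAcc;
}
```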

### Command line options

The specific command line options for running this tool are presented below. Apart from these,
the tool accepts the generic options shared with the other front end tools (see the
image-classifier documentation): options for specifying the model, the quantization options
(schema, precision), the backend, and the image preprocessing options (layout, channel order,
normalization).

```
model-tuner -model=<model-path> <image-options> <quantization-options> -dataset-path=<dataset-folder>
-dataset-file=<dataset-file> -load-profile=<input-profile> -dump-tuned-profile=<tuned-profile>
```

where:
- *dataset-path* - the folder where the dataset files are located. All the dataset files are
assumed to be located in the same directory.
- *dataset-file* - the path to the dataset description file which contains, on each line, a data
path and an integer label separated by a space (" ") or a comma (","). The integer labels start
at 0 (0, 1, ...). One example might look like this (a parsing sketch is given after this list):
image0.png 0
image1.png 13
.............
Another example might look like this:
image0.png,0,
image1.png,13,
..............
- *load-profile* - the path of the input profile which is loaded and tuned.
- *dump-tuned-profile* - the path where the tuned profile is written.
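
A minimal C++ sketch of parsing one line in the dataset description format (illustrative only;
`parseDatasetLine` is a hypothetical helper, not the tool's actual parser):

```cpp
#include <sstream>
#include <string>
#include <utility>

// Parse one dataset description line: a data path followed by an integer
// label, separated by a space or a comma (trailing commas are tolerated).
std::pair<std::string, unsigned> parseDatasetLine(std::string line) {
  for (char &c : line) {
    if (c == ',') {
      c = ' '; // Treat commas and spaces uniformly as separators.
    }
  }
  std::istringstream ss(line);
  std::string path;
  unsigned label = 0;
  ss >> path >> label;
  return {path, label};
}
```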

More information can be obtained by typing the following command:
```
model-tuner -help
```

### Extra command line options

There are a couple of extra command line parameters which can be used to tweak the behavior of the
algorithm (they correspond to the `targetAccuracy`, `maxIterPerNode` and `accDropSkip` parameters
in the sketch above):
- *target-accuracy* - The tuning procedure is stopped when the accuracy reaches or surpasses the
given value. A float value between 0.0 and 1.0 is expected. If not specified, the
tuning runs until completion.
- *max-iter-per-node* - The maximum number of tuning iterations per node (default is 3).
- *acc-drop-skip* - The accuracy drop beyond which the tuning of a node is skipped. The default value is 0.05 (5%).

### Command line output

When running this tool, the console output might look like this:

```
Computing initial accuracy ...
Initial accuracy: 81.0180 %
Number of nodes: 277
Target accuracy: 100.0000 %

[1/277] Tuning node "broadcast_B_tile0_save__1:0"
[1/3] Testing scale = 0.00195
Accuracy = 81.0180 %
Tuning stopped for this node (no effect)
Best accuracy : 81.0180 %
Iteration time: 34 seconds
Remaining time: 2 hours 36 minutes

[2/277] Tuning node "W52__1:0"
[1/3] Testing scale = 0.06250
Accuracy = 81.4422 %
[2/3] Testing scale = 0.03125
Accuracy = 79.0032 %
[3/3] Testing scale = 0.01562
Accuracy = 67.1262 %
Best accuracy : 81.4422 %
Iteration time: 68 seconds
Remaining time: 5 hours 11 minutes

..................................
..................................

[277/277] Tuning node "W42__1:0"
[1/3] Testing scale = 0.01562
Accuracy = 90.2439 %
Tuning stopped for this node
Best accuracy : 97.9852 %
Iteration time: 66 seconds
Remaining time: 0 hours 0 minutes


Final accuracy: 97.9852 %

Total time: 5 hours 6 minutes
```

Notes:
- The quantization tuning procedure is lengthy: the time required to run it is of the same order
of magnitude as training. For example, the model tuned in the example above is a medium size
model (similar to a MobileNet with a scale factor of 0.5). For this reason the tool also prints
an estimated remaining time (the estimate improves as more nodes are calibrated).
- When the estimated tuning time is too long, one might use a smaller tuning dataset.
19 changes: 12 additions & 7 deletions include/glow/Importer/ProtobufLoader.h
@@ -165,8 +165,8 @@ class ProtobufLoader {
bool hasNodeByName(llvm::StringRef name) const;

/// Constructs new ProtobufLoader object. It will populate the network into
/// \p F. The list \p types and \p names are used to initialized the inputs
/// and outputs with specific names and types. If \p errPtr is not null then
/// \p F. The list \p types and \p names are used to initialize the inputs
/// of the model with specific names and types. If \p errPtr is not null then
/// if an error occurs it will get assigned there otherwise if an error
/// occurs it will abort.
ProtobufLoader(llvm::ArrayRef<const char *> tensorNames,
@@ -191,16 +191,21 @@ class ProtobufLoader {
/// that there is only one output, returns Error otherwise. For image
/// classification, this single final output is usually the result of the
/// last softmax or regression layer.
Expected<Placeholder *> getSingleOutput() {
RETURN_ERR_IF_NOT(outputVarsByName_.size() == 1,
"There must be only one output.");
return outputVarsByName_.begin()->second;
}
Expected<Placeholder *> getSingleOutput() const;

/// \returns the single input of the network. The function assumes that there
/// is only one input, returns Error otherwise. For most of the models the
/// single input is usually an image tensor.
Expected<Placeholder *> getSingleInput() const;

/// \returns the Placeholder for the external output with \p name.
/// \pre outputVarsByName_.find(name) != outputVarsByName_.end()
Expected<Placeholder *> getOutputByName(llvm::StringRef name) const;

/// \returns the Placeholder for the external input with \p name.
/// \pre inputVarsByName_.find(name) != inputVarsByName_.end()
Expected<Placeholder *> getInputByName(llvm::StringRef name) const;

/// \returns True if the operator with name \p typeName having input node
/// list as \p inputs is constant foldable.
bool isConstantFoldable(llvm::ArrayRef<NodeValue> inputs,
9 changes: 9 additions & 0 deletions include/glow/Quantization/Base/Base.h
@@ -255,6 +255,15 @@ Tensor tensor4BitsFusedRowwiseDequantization(const Tensor &input);
QuantizationTransform32To8 quantizeScaleOffset32To8(float scale,
int32_t offset);

/// Function to get the quantized range for a given precision type \p qTy.
/// \returns the range as a (min, max) pair.
std::pair<int64_t, int64_t> getQuantizationRange(ElemKind qTy);

/// Function to validate that the given quantization parameters \p qParams
/// comply with the given quantization \p schema and precision \p qTy.
void validateQuantizationParams(TensorQuantizationParams qParams, Schema schema,
ElemKind qTy);

/// Calculate TensorQuantizationParams based on the clipped \p min and \p max
/// floating point range and using the base quantization type \p qTy and the
/// quantization method described by \p schema.
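A minimal sketch of how the two new helpers might be used together (the namespace layout follows
the surrounding Glow code; the int8 range noted in the comment is an expectation for that
precision, not verified output):

```cpp
#include "glow/Quantization/Base/Base.h"

// Check int8 quantization parameters against the asymmetric schema.
void checkParams(glow::TensorQuantizationParams qParams) {
  namespace quant = glow::quantization;
  // Query the representable integer range for the precision; for Int8QTy
  // this should be the pair (-128, 127).
  auto range = quant::getQuantizationRange(glow::ElemKind::Int8QTy);
  (void)range;
  // Assert that the parameters comply with the schema and precision.
  quant::validateQuantizationParams(qParams, quant::Schema::Asymmetric,
                                    glow::ElemKind::Int8QTy);
}
```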
31 changes: 26 additions & 5 deletions lib/Importer/ProtobufLoader.cpp
@@ -84,6 +84,18 @@ bool ProtobufLoader::hasConstantByName(llvm::StringRef name) const {
return getConstantByNameOrNull(name) != nullptr;
}

Expected<Placeholder *> ProtobufLoader::getSingleOutput() const {
RETURN_ERR_IF_NOT(outputVarsByName_.size() == 1,
"There must be only one output.");
return outputVarsByName_.begin()->second;
}

Expected<Placeholder *> ProtobufLoader::getSingleInput() const {
RETURN_ERR_IF_NOT(inputVarsByName_.size() == 1,
"There must be only one input.");
return inputVarsByName_.begin()->second;
}

Expected<Placeholder *>
ProtobufLoader::getOutputByName(llvm::StringRef name) const {
auto it = outputVarsByName_.find(name);
@@ -94,6 +106,16 @@ ProtobufLoader::getOutputByName(llvm::StringRef name) const {
return it->second;
}

Expected<Placeholder *>
ProtobufLoader::getInputByName(llvm::StringRef name) const {
auto it = inputVarsByName_.find(name);
RETURN_ERR_IF_NOT(
it != inputVarsByName_.end(),
llvm::Twine("No external input Variable was registered with name ", name)
.str());
return it->second;
}

NodeValue
ProtobufLoader::getNodeValueByNameOrNullNodeValue(llvm::StringRef name) const {
auto it = nodeValueByName_.find(name);
@@ -187,11 +209,10 @@ ProtobufLoader::ProtobufLoader(llvm::ArrayRef<const char *> tensorNames,
for (size_t i = 0, e = tensorNames.size(); i < e; i++) {
RETURN_ERR_IF_NOT(!hasNodeByName(tensorNames[i]),
"Input names have duplicate");
auto placeholderOrErr =
createAndRegisterPlaceholder(tensorNames[i], types[i]);
if (!placeholderOrErr) {
return placeholderOrErr.takeError();
}
Placeholder *placeholder;
ASSIGN_VALUE_OR_RETURN_ERR(
placeholder, createAndRegisterPlaceholder(tensorNames[i], types[i]));
inputVarsByName_.try_emplace(tensorNames[i], placeholder);
}
return Error::success();
};
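Given a constructed `ProtobufLoader` named `loader`, a front end tool might use the new accessors
as sketched below (`EXIT_ON_ERR` is Glow's macro for unwrapping an `Expected<T>`; the placeholder
names `"data"` and `"prob"` are made up for the example):

```cpp
// Single-input/single-output models (e.g. a typical image classifier):
glow::Placeholder *input = EXIT_ON_ERR(loader.getSingleInput());
glow::Placeholder *output = EXIT_ON_ERR(loader.getSingleOutput());

// Models with multiple inputs/outputs, when the names are known:
glow::Placeholder *in = EXIT_ON_ERR(loader.getInputByName("data"));
glow::Placeholder *out = EXIT_ON_ERR(loader.getOutputByName("prob"));
```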
67 changes: 41 additions & 26 deletions lib/Quantization/Base/Base.cpp
@@ -265,11 +265,7 @@ QuantizationTransform32To8 quantizeScaleOffset32To8(float scale,
offset);
}

TensorQuantizationParams chooseQuantizationParams(float min, float max,
Schema schema, ElemKind qTy) {
assert(min <= max && "min must not be bigger than max");

// Compute the quantized int range.
std::pair<int64_t, int64_t> getQuantizationRange(ElemKind qTy) {
// Pick int64_t in order to cover the uint32_t range.
int64_t qmin;
int64_t qmax;
Expand Down Expand Up @@ -310,6 +306,45 @@ TensorQuantizationParams chooseQuantizationParams(float min, float max,
default:
llvm_unreachable("Quantized type not supported");
}
return std::pair<int64_t, int64_t>(qmin, qmax);
}

void validateQuantizationParams(TensorQuantizationParams qParams, Schema schema,
ElemKind qTy) {

// Get the quantized range.
auto minMaxPair = getQuantizationRange(qTy);
int64_t qmin = minMaxPair.first;
int64_t qmax = minMaxPair.second;

// Validate params.
(void)(qmin);
(void)(qmax);
assert((qmin <= qParams.offset) && (qParams.offset <= qmax) &&
"The offset must be within the quantized range");
if (schema == quantization::Schema::Symmetric) {
assert((qParams.offset == 0) &&
"Symmetric quantization should have offset 0");
} else if (schema == quantization::Schema::SymmetricWithUnsigned) {
assert((qParams.offset == qmin || qParams.offset == 0) &&
"SymmetricWithUnsigned quantization should have offset 0 or qmin");
} else if (schema == quantization::Schema::SymmetricWithPower2Scale) {
assert((qParams.offset == 0) &&
"SymmetricWithPower2Scale quantization should have offset 0");
assert(isFloatPowerOf2(qParams.scale) &&
"SymmetricWithPower2Scale quantization parameter should be a power "
"of 2");
}
}

TensorQuantizationParams chooseQuantizationParams(float min, float max,
Schema schema, ElemKind qTy) {
assert(min <= max && "min must not be bigger than max");

// Get the quantized range.
auto minMaxPair = getQuantizationRange(qTy);
int64_t qmin = minMaxPair.first;
int64_t qmax = minMaxPair.second;

// We extend the [min, max] interval to ensure that it contains 0.
// Otherwise, we would not meet the requirement that 0 be an exactly
@@ -403,27 +438,7 @@ TensorQuantizationParams chooseQuantizationParams(float min, float max,
}

TensorQuantizationParams result{static_cast<float>(scale), nudgedZeroPoint};
// The only valid offset for symmetric quantization is 0.
assert((result.offset == 0 || schema != quantization::Schema::Symmetric) &&
"Symmetric quantization should be centered on 0");

// The only valid offsets for symmetric quantization with unsigned support are
// 0 and qmin.
assert((result.offset == qmin || result.offset == 0 ||
schema != quantization::Schema::SymmetricWithUnsigned) &&
"Symmetric quantization with unsigned should be centered on 0 or on "
"-qmin");

// For SymmetricWithPower2Scale schema the offset should be 0.
assert((result.offset == 0 ||
schema != quantization::Schema::SymmetricWithPower2Scale) &&
"Symmetric quantization should be centered on 0");

// For SymmetricWithPower2Scale schema the scale should be a power of 2.
assert((isFloatPowerOf2(result.scale) ||
schema != quantization::Schema::SymmetricWithPower2Scale) &&
"Scale quantization parameter should be a power of 2");

validateQuantizationParams(result, schema, qTy);
return result;
}

18 changes: 18 additions & 0 deletions tools/loader/CMakeLists.txt
@@ -82,3 +82,21 @@ target_link_libraries(model-compiler
GraphOptimizer
Quantization
LLVMSupport)

add_executable(model-tuner
Loader.cpp
LoaderUtils.cpp
ModelTuner.cpp)

target_link_libraries(model-tuner
PRIVATE
Backends
Base
Converter
Graph
HostManager
Importer
ExecutionEngine
GraphOptimizer
Quantization
LLVMSupport)
3 changes: 2 additions & 1 deletion tools/loader/ImageClassifier.cpp
@@ -252,7 +252,8 @@ buildAndCompileAndGetInAndOutPair(Loader &loader, PlaceholderBindings &bindings,

// Compile the model, and perform quantization/emit a bundle/dump debug info
// if requested from command line.
CompilationContext cctx{&bindings};
CompilationContext cctx = loader.getCompilationContext();
cctx.bindings = &bindings;
cctx.backendOpts.autoInstrument = autoInstrument;
loader.compile(cctx);
