v1.0.0: ONNX Runtime optimization and quantization support
ONNX Runtime support
- An ORTConfig class was introduced, allowing the user to define the desired export, optimization and quantization strategies.
- The ORTOptimizer class takes care of the model's ONNX export as well as the graph optimization provided by ONNX Runtime. To create an instance of ORTOptimizer, the user needs to provide an ORTConfig object that defines the export and graph-level transformation settings. Optimization can then be performed by calling the ORTOptimizer.fit method.
- ONNX Runtime static and dynamic quantization can also be applied to a model by using the newly added ORTQuantizer class. To create an instance of ORTQuantizer, the user needs to provide an ORTConfig object that defines the export and quantization settings, such as the quantization approach to use or the data types of the activations and weights. Quantization can then be applied by calling the ORTQuantizer.fit method.
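The configure-then-fit workflow described above can be illustrated with a small, self-contained sketch. It does not use the library: SketchConfig and SketchQuantizer are hypothetical stand-ins for ORTConfig and ORTQuantizer, and their field names, arguments and return values are assumptions made for illustration only. To give fit something to do, the stand-in applies plain symmetric per-tensor integer quantization to a toy list of weights, which is a simplification and not how ONNX Runtime quantizes a graph.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SketchConfig:
    """Hypothetical stand-in for ORTConfig; field names are assumptions."""
    quantization_approach: str = "dynamic"  # "dynamic" or "static"
    weights_bits: int = 8                   # signed integer width for weights


class SketchQuantizer:
    """Hypothetical stand-in for ORTQuantizer: built from a config, run with fit."""

    def __init__(self, config: SketchConfig) -> None:
        # Reject unknown strategies up front, before any work is done.
        if config.quantization_approach not in ("dynamic", "static"):
            raise ValueError(f"unknown approach: {config.quantization_approach!r}")
        self.config = config

    def fit(self, weights: List[float]) -> Tuple[List[int], float]:
        """Apply symmetric per-tensor quantization to a toy weight list."""
        qmax = 2 ** (self.config.weights_bits - 1) - 1  # 127 for 8 bits
        max_abs = max((abs(w) for w in weights), default=0.0)
        # Map the largest magnitude onto qmax; fall back to 1.0 for all-zero input.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
        return quantized, scale


# 1. Describe the strategy, 2. build the quantizer from it, 3. call fit.
config = SketchConfig(quantization_approach="dynamic")
quantizer = SketchQuantizer(config)
q_weights, scale = quantizer.fit([0.5, -1.0, 0.25])
```

Keeping every strategy choice in the configuration object means the same fit call can serve both dynamic and static quantization, which is the design the release notes describe for ORTConfig.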
Additional features for Intel Neural Compressor
We have also added a new class called IncOptimizer, which takes care of combining the pruning and quantization processes.