v1.1.0: ORTTrainer, Seq2SeqORTTrainer, ONNX Runtime optimization and quantization API improvements
ORTTrainer and Seq2SeqORTTrainer
The ORTTrainer and Seq2SeqORTTrainer are two new experimental classes.
- Both ORTTrainer and Seq2SeqORTTrainer were created to have a user-facing API similar to that of the Trainer and Seq2SeqTrainer of the Transformers library.
- ORTTrainer allows the usage of the ONNX Runtime backend to train a given PyTorch model in order to accelerate training. ONNX Runtime will run the forward and backward passes using an optimized, automatically exported ONNX computation graph, while the rest of the training loop is executed by native PyTorch.
- ORTTrainer allows the usage of ONNX Runtime inferencing during both the evaluation and the prediction step.
- For Seq2SeqORTTrainer, ONNX Runtime inferencing is incompatible with --predict_with_generate, as the generate method is not supported yet.
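The incompatibility between ONNX Runtime inferencing and --predict_with_generate can be caught before a run starts. The following is a minimal sketch of such a guard, not part of the library: the helper name validate_seq2seq_ort_args and the use_ort_inference parameter are hypothetical and chosen only for illustration.

```python
def validate_seq2seq_ort_args(predict_with_generate: bool, use_ort_inference: bool) -> None:
    """Fail early when generation is requested together with ONNX Runtime inference.

    Hypothetical helper: both parameter names are illustrative, not library API.
    """
    if predict_with_generate and use_ort_inference:
        raise ValueError(
            "ONNX Runtime inferencing cannot be combined with predict_with_generate: "
            "the generate method is not supported yet."
        )


# Either option alone is accepted by the guard.
validate_seq2seq_ort_args(predict_with_generate=True, use_ort_inference=False)
validate_seq2seq_ort_args(predict_with_generate=False, use_ort_inference=True)
```

Calling the helper with both options enabled raises a ValueError, so a misconfigured evaluation fails before any model is loaded.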
ONNX Runtime optimization and quantization API improvements
The ORTQuantizer and ORTOptimizer classes underwent a major refactoring that should make the user-facing API simpler and more flexible.
- It is now possible to iteratively compute the quantization activation ranges when applying static quantization by using the ORTQuantizer method partial_fit. This is especially useful with memory-hungry calibration methods such as the Entropy and Percentile methods.
- When using the MinMax calibration method, it is now possible to compute the moving average of the minimum and maximum values representing the activation quantization ranges instead of the global minimum and maximum (feature available with onnxruntime v1.11.0 or higher).
- The classes OptimizationConfig, QuantizationConfig and CalibrationConfig were added in order to better segment the different ONNX Runtime related parameters instead of having a single ORTConfig configuration.
- The QuantizationPreprocessor class was added in order to find the nodes to include in and/or exclude from quantization, by finding the nodes following a given pattern (such as the nodes forming LayerNorm). This is particularly useful in the context of static quantization, where quantizing modules such as LayerNorm or GELU is responsible for a significant drop in accuracy.
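To make the difference between the global and the moving-average MinMax ranges concrete, here is a toy sketch in plain Python. It is not the ORTQuantizer or ONNX Runtime implementation: the MinMaxRangeTracker class, its averaging_constant parameter, and the exact update rule are assumptions made for illustration. Only the idea of updating the range one calibration batch at a time, in the spirit of partial_fit, comes from the notes above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class MinMaxRangeTracker:
    """Toy activation-range tracker updated one calibration batch at a time."""

    moving_average: bool = False
    averaging_constant: float = 0.01  # hypothetical smoothing factor
    low: Optional[float] = None
    high: Optional[float] = None

    def partial_fit(self, batch: Iterable[float]) -> None:
        values = list(batch)
        batch_low, batch_high = min(values), max(values)
        if self.low is None or self.high is None:
            # The first batch initializes the range.
            self.low, self.high = batch_low, batch_high
        elif self.moving_average:
            # Move the range a fraction of the way toward the batch extremes.
            c = self.averaging_constant
            self.low += c * (batch_low - self.low)
            self.high += c * (batch_high - self.high)
        else:
            # Global range: keep the most extreme values seen so far.
            self.low = min(self.low, batch_low)
            self.high = max(self.high, batch_high)

    def compute_range(self) -> Tuple[float, float]:
        if self.low is None or self.high is None:
            raise RuntimeError("partial_fit must be called at least once")
        return self.low, self.high


# Three calibration batches; the last one contains an outlier (-8.0).
batches = [[-1.0, 0.5, 2.0], [-0.5, 0.2, 1.0], [-8.0, 0.1, 1.5]]

global_tracker = MinMaxRangeTracker(moving_average=False)
smoothed_tracker = MinMaxRangeTracker(moving_average=True, averaging_constant=0.5)
for batch in batches:
    global_tracker.partial_fit(batch)
    smoothed_tracker.partial_fit(batch)

print(global_tracker.compute_range())    # (-8.0, 2.0)
print(smoothed_tracker.compute_range())  # (-4.375, 1.5)
```

In this sketch the global range is stretched all the way to the outlier, while the moving average only moves partway toward it. Because each call sees a single batch, the full calibration set never has to be held in memory at once.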