v1.4.0: ORTQuantizer and ORTOptimizer refactorization

echarlaix released this 08 Sep 17:56


1b08fb5

ONNX Runtime

Refactor ORTQuantizer (#270) and ORTOptimizer (#294)
Add ONNX Runtime fused Adam Optimizer (#295)
Add ORTModelForCustomTasks, which enables ONNX Runtime inference for custom tasks (#303)
Add ORTModelForMultipleChoice, which enables ONNX Runtime inference for models with a multiple-choice classification head (#358)

Torch FX

Add FuseBiasInLinear, a transformation that fuses the weight and the bias of linear modules (#253)
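
The idea behind fusing the bias into the weight can be sketched in plain Python. This is only an illustration of the underlying identity W·x + b = [W | b]·[x; 1], not the code from #253; the helper names `linear`, `fuse_bias` and `fused_linear` are hypothetical and are not part of optimum or torch.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def linear(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    """Reference linear layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def fuse_bias(weight: Matrix, bias: Vector) -> Matrix:
    """Append the bias as an extra column of the weight matrix (hypothetical helper)."""
    return [row + [b] for row, b in zip(weight, bias)]


def fused_linear(fused_weight: Matrix, x: Vector) -> Vector:
    """Bias-free linear layer applied to the input extended with a constant 1."""
    x_ext = x + [1.0]
    return [sum(w * v for w, v in zip(row, x_ext)) for row in fused_weight]


weight = [[1.0, 2.0], [3.0, 4.0]]
bias = [0.5, -1.0]
x = [2.0, 1.0]

reference = linear(weight, bias, x)
fused = fused_linear(fuse_bias(weight, bias), x)
print(reference, fused)
```

Both calls compute the same output; the fused form trades the separate bias addition for one extra input feature that is always 1.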

Improvements and bugfixes

Allow disregarding the precomputed past_key_values during ONNX Runtime inference of Seq2Seq models (#241)
Enable node exclusion from quantization in the benchmark suite (#284)
Allow using an authentication token when loading a calibration dataset (#289)
Fix optimum pipeline when no model is given (#301)
