Olive-ai 0.5.0

@xiaoyu-work xiaoyu-work released this 07 Mar 23:48

Examples

The following examples are added:

Passes (optimization techniques)

New Passes

  • PyTorch
    • Introduce GenAIModelExporter pass to export a PyTorch model using the GenAI exporter.
    • Introduce LoftQ pass, which fine-tunes a model using the LoftQ initialization proposed in https://arxiv.org/abs/2310.08659.
  • ONNXRuntime
    • Introduce DynamicToFixedShape pass to convert dynamic input shapes to fixed shapes in an ONNX model.
    • Introduce OnnxOpVersionConversion pass to convert an existing ONNX model to another target opset.
    • [QNN-EP] Add a prepare_qnn_config: bool option for quantization under QNN-EP, where int16/uint16 are supported for both weights and activations.
    • [QNN-EP] Introduce QNNPreprocess pass to preprocess the model before quantization.
  • QNN
    • Introduce QNNConversion pass to convert models to a QNN C++ model.
    • Introduce QNNContextBinaryGenerator pass to generate a context binary from a compiled model library using a specific backend.
    • Introduce QNNModelLibGenerator pass to compile the C++ model into a model library for the desired target.
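
A new pass such as DynamicToFixedShape is typically enabled through a pass entry in the Olive workflow config. A minimal sketch in Python, where the option names (dim_param, dim_value) are assumptions to be checked against the Olive documentation for your version:

```python
# Illustrative Olive workflow "passes" entry enabling DynamicToFixedShape.
# The option names below are assumptions; consult the Olive docs for the
# exact schema of your installed version.
workflow_passes = {
    "to_fixed_shape": {
        "type": "DynamicToFixedShape",
        "config": {
            # Assumed options: pin the symbolic "batch_size" dimension to 1
            # so downstream tooling sees only fixed shapes.
            "dim_param": ["batch_size"],
            "dim_value": [1],
        },
    },
}
```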

Updates

  • OnnxConversion
    • Support both past_key_values.index.key/value and past_key_value.index.
  • OptimumConversion
    • Add a components parameter for users who want to export only some of the component models, such as decoder_model and decoder_with_past_model.
    • Use the default exporter arguments and behavior of the underlying optimum version. For versions 1.14.0+, this means legacy=False and no_post_process=False. Users must provide these via extra_args if legacy behavior is desired.
  • OpenVINO
    • Upgrade OpenVINO API to 2023.2.0.
  • OrtPerTuning
    • Add tunable_op_enable and tunable_op_tuning_enable for the ROCm EP to improve performance.
  • LoRA/QLoRA
    • Support bfloat16 with ort-training.
    • Support resuming training from a checkpoint via the resume_from_checkpoint and overwrite_output_dir options.
  • MoEExpertsDistributor
    • Add an option to configure the number of parallel jobs.
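
The checkpoint-resume support above can be sketched as a LoRA pass entry. Only resume_from_checkpoint and overwrite_output_dir come from these notes; the surrounding field layout is an assumption:

```python
# Sketch of a LoRA pass entry with checkpoint resumption enabled.
# Only resume_from_checkpoint and overwrite_output_dir are named in the
# release notes; the nesting under "training_args" is an assumption.
lora_pass = {
    "type": "LoRA",
    "config": {
        "training_args": {
            "resume_from_checkpoint": True,   # resume from the latest checkpoint
            "overwrite_output_dir": False,    # do not wipe existing checkpoints
        },
    },
}
```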

Engine

  • For Zipfile packaging, add a model rank JSON file that ranks all output models from different EPs and includes model_config and metrics.
  • Add Auto Optimizer, a tool that automatically searches for the best combination of Olive passes.
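
As a rough sketch of how Auto Optimizer might be switched on, assuming an auto_optimizer_config section (the key names below are assumptions, not confirmed by these notes):

```python
# Hypothetical workflow fragment: omitting an explicit "passes" section and
# relying on Auto Optimizer to search pass combinations automatically.
# Key names are assumptions.
workflow = {
    "auto_optimizer_config": {
        "disable_auto_optimizer": False,  # let Olive pick the pass combination
    },
}
```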

System

  • Add hf_token support for Olive systems.
  • AzureMLSystem
    • The Olive config file is uploaded to AML jobs under the codes folder.
    • Support adding tags to the AML jobs.
    • Support using an existing AML workspace Environment for AzureMLSystem.
  • DockerSystem
    • Support running Olive Pass.
  • PythonEnvironmentSystem requires Olive to be installed in the environment. It can run passes and evaluate models.
  • New IsolatedORTSystem introduced that only supports evaluation of ONNX models. It requires onnxruntime to be installed in the environment and can be used for packages like onnxruntime-qnn, which can only be run in a Windows ARM64 Python environment.
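
A sketch of a systems entry for the new IsolatedORTSystem; the type string and field names are assumptions modeled on Olive's system config style:

```python
# Hypothetical "systems" entry targeting a Windows ARM64 onnxruntime-qnn
# environment via the new IsolatedORTSystem. Type string and fields are
# assumptions; per the notes, this system only supports evaluation.
systems = {
    "qnn_eval": {
        "type": "IsolatedORT",
        "config": {
            # Hypothetical path to the Python env with onnxruntime-qnn installed.
            "python_environment_path": "C:/envs/ort-qnn/Scripts",
        },
    },
}
```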

Data

  • Add AML resource support for data configs.
  • Add audio classification data preprocess function.

Model

  • Rename model_loading_args to from_pretrained_args in hf_config.

Metrics

  • Add throughput metric support.
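
A hedged sketch of a throughput metric entry; only the existence of a throughput metric is stated in these notes, and the field layout below is an assumption:

```python
# Hypothetical metric entry measuring throughput (e.g. inferences/second).
# Field names mirror Olive's metric config style but are assumptions.
metrics = [
    {
        "name": "performance",
        "type": "throughput",
        "sub_types": [{"name": "avg", "priority": 1}],
    },
]
```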

Dependencies

  • Support onnxruntime 1.17.1.