# How to use Olive to search for the best optimization across different EPs

In many cases, users are not familiar with the different execution providers (EPs) and their capabilities.

For example:
1. With `CUDAExecutionProvider`, users cannot enable `trt_fp16_enable` in `PerfTuning`, whereas with `TensorrtExecutionProvider`, enabling `trt_fp16_enable` in `PerfTuning` is recommended.
2. With `CUDAExecutionProvider`, int8 quantization is not recommended in ONNX Runtime.
3. With `CUDAExecutionProvider`, `opt_level=2` is sometimes better for model A while `opt_level=1` is better for model B.
4. ...
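
To make the first example above concrete, here is a minimal sketch of how an EP-specific option ends up in the `providers` argument that ONNX Runtime's `InferenceSession` accepts. The helper name `build_providers` is hypothetical and is not part of Olive or ONNX Runtime; Olive's `PerfTuning` pass makes this choice internally and may do so differently.

```python
def build_providers(ep, fp16=False):
    """Return a providers list in the form accepted by onnxruntime.InferenceSession.

    Hypothetical helper for illustration only.
    """
    if ep == "TensorrtExecutionProvider":
        # trt_fp16_enable is an option of the TensorRT EP, so it is attached
        # as a (name, options) tuple.
        return [(ep, {"trt_fp16_enable": fp16})]
    if fp16:
        # Other EPs do not understand TensorRT-specific options.
        raise ValueError(f"trt_fp16_enable is not an option of {ep}")
    return [ep]


print(build_providers("TensorrtExecutionProvider", fp16=True))
print(build_providers("CUDAExecutionProvider"))

# With a real model file (the path below is a placeholder), the list would be
# passed on like this:
#   import onnxruntime
#   session = onnxruntime.InferenceSession(
#       "model.onnx", providers=build_providers("TensorrtExecutionProvider", fp16=True)
#   )
```

Keeping track of which option belongs to which EP by hand is exactly the kind of bookkeeping the Olive search is meant to take over.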

In this notebook, we show how to use Olive to search for the best optimization across different EPs, given a model, the evaluation criteria, and the target systems and EPs.

### Prerequisites
Before running this notebook, make sure you have installed the Olive package. See the [installation instructions](https://github.com/microsoft/Olive?tab=readme-ov-file#installation) for more details.

### Olive Optimization Configs

#### Input model
In this notebook, we use `Intel/bert-base-uncased-mrpc`, a `bert-base-uncased` model fine-tuned on MRPC, as an example:

```json
"input_model":{
    "type": "PyTorchModel",
    "config": {
        "hf_config": {
            "model_name": "Intel/bert-base-uncased-mrpc",
            "task": "text-classification"
        }
    }
}
```

With the above input model, Olive downloads the `Intel/bert-base-uncased-mrpc` model from the Hugging Face model hub. It is a text classification model.

#### Data configurations

```json
"data_configs": [
    {
        "name": "glue_mrpc",
        "type": "HuggingfaceContainer",
        "params_config": {
            "data_name": "glue",
            "subset": "mrpc",
            "split": "validation",
            "input_cols": [
                "sentence1",
                "sentence2"
            ],
            "label_cols": [
                "label"
            ],
            "batch_size": 1
        }
    }
]
```

The JSON object above selects the `validation` split of the MRPC subset of the GLUE dataset. Each input is a pair of sentences (`sentence1`, `sentence2`) and the output is a label.

#### Evaluation Criteria
```json
"evaluators": {
    "common_evaluator": {
        "metrics":[
            {
                "name": "accuracy",
                "type": "accuracy",
                "backend": "huggingface_metrics",
                "data_config": "glue_mrpc",
                "sub_types": [
                    {"name": "accuracy", "priority": 1, "goal": {"type": "max-degradation", "value": 0.01}},
                    {"name": "f1"}
                ]
            },
            {
                "name": "latency",
                "type": "latency",
                "data_config": "glue_mrpc",
                "sub_types": [
                    {"name": "avg", "priority": 2, "goal": {"type": "percent-min-improvement", "value": 20}},
                    {"name": "max"},
                    {"name": "min"}
                ]
            }
        ]
    }
}
```
We use `accuracy` and `latency` as the evaluation criteria. For `accuracy`, the sub-metrics are `accuracy` and `f1`; for `latency`, they are `avg`, `max` and `min`. The two metrics have different goals. Accuracy should stay as high as possible: the `max-degradation` goal of 0.01 on the `accuracy` sub-metric (priority 1) limits how far it may drop relative to the input model. Latency should be as low as possible: the `percent-min-improvement` goal of 20 on the `avg` sub-metric (priority 2) asks for at least a 20% improvement over the input model.
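
As a rough illustration of how the two goal types turn into concrete thresholds, the sketch below resolves them against the input-model numbers reported in the comparison table at the end of this notebook. The function names are hypothetical and the arithmetic reflects our reading of the goal semantics, not Olive's actual implementation.

```python
def goal_threshold(baseline, goal_type, value, higher_is_better):
    """Return the worst metric value that still satisfies the goal (illustrative)."""
    sign = 1 if higher_is_better else -1
    if goal_type == "max-degradation":
        # The metric may get worse by at most `value` (absolute amount).
        return baseline - sign * value
    if goal_type == "percent-min-improvement":
        # The metric must get better by at least `value` percent of the baseline.
        return baseline * (1 + sign * value / 100)
    raise ValueError(f"unsupported goal type: {goal_type}")


def meets_goal(candidate, threshold, higher_is_better):
    """Check a candidate metric value against a resolved threshold."""
    return candidate >= threshold if higher_is_better else candidate <= threshold


# Baselines taken from the input-model row of the results table.
acc_threshold = goal_threshold(0.8603, "max-degradation", 0.01, higher_is_better=True)
lat_threshold = goal_threshold(12.5278, "percent-min-improvement", 20, higher_is_better=False)

print(f"accuracy must be >= {acc_threshold:.4f}")
print(f"avg latency must be <= {lat_threshold:.4f}")
```

Under this reading, a candidate model is acceptable only if it clears both thresholds, and the priorities decide which metric is compared first when ranking the candidates that do.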


#### Devices

We use `local_system` as the device in this notebook. We enable `CUDAExecutionProvider` and `TensorrtExecutionProvider` in the `accelerators` field. Olive will search different optimization configs among these two EPs.

```json
"systems": {
    "local_system": {
        "type": "LocalSystem",
        "config": {
            "accelerators": [
                {
                    "device": "gpu",
                    "execution_providers": [
                        "CUDAExecutionProvider",
                        "TensorrtExecutionProvider"
                    ]
                }
            ]
        }
    }
}
```
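
Before starting a search, it can be worth checking that the installed ONNX Runtime build actually exposes the requested EPs. The following is a minimal sketch that uses `onnxruntime.get_available_providers()`; the helper names are hypothetical, and Olive performs its own checks that may differ.

```python
def installed_providers():
    """Providers exposed by the installed onnxruntime build, or [] if it is not installed."""
    try:
        import onnxruntime
    except ImportError:
        return []
    return onnxruntime.get_available_providers()


def missing_providers(requested, available):
    """Return the requested EPs that are not in the available list, in order."""
    return [ep for ep in requested if ep not in available]


requested = ["CUDAExecutionProvider", "TensorrtExecutionProvider"]
missing = missing_providers(requested, installed_providers())
if missing:
    print("Not available in this environment:", ", ".join(missing))
else:
    print("All requested execution providers are available.")
```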

#### Engine and search strategy

The engine manages the optimization process: optimization runs on the host device and evaluation runs on the target device.
The search strategy controls how Olive searches for the best optimization across the EPs. In this notebook, we use `joint` as the `execution_order` and `tpe` as the `search_algorithm`, with `num_samples` set to 1 and `seed` set to 0.

```json
"engine": {
    "search_strategy": {
        "execution_order": "joint",
        "search_algorithm": "tpe",
        "search_algorithm_config": {
            "num_samples": 1,
            "seed": 0
        }
    },
    "evaluator": "common_evaluator",
    "host": "local_system",
    "target": "local_system",
    "cache_dir": "cache",
    "output_dir": "models/bert_gpu"
}
```
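
The sections above refer to each other by name: the engine names an evaluator and two systems, and each metric names a data config. A small consistency check, sketched below with a hypothetical helper `check_references` and an abbreviated copy of the config, can catch a misspelled name before a long run. It assumes that references are plain strings as in this notebook; it is not part of Olive.

```python
def check_references(config):
    """Return a list of dangling name references in an Olive-style config dict."""
    problems = []
    evaluators = config.get("evaluators", {})
    systems = config.get("systems", {})
    data_names = {d["name"] for d in config.get("data_configs", [])}
    engine = config.get("engine", {})

    # The engine must point at a defined evaluator.
    if engine.get("evaluator") not in evaluators:
        problems.append(f"engine.evaluator: {engine.get('evaluator')!r} is not defined")
    # Host and target must point at defined systems.
    for role in ("host", "target"):
        if engine.get(role) not in systems:
            problems.append(f"engine.{role}: {engine.get(role)!r} is not defined")
    # Every metric must point at a defined data config.
    for ev_name, evaluator in evaluators.items():
        for metric in evaluator.get("metrics", []):
            if metric.get("data_config") not in data_names:
                problems.append(
                    f"{ev_name}.{metric.get('name')}: data_config "
                    f"{metric.get('data_config')!r} is not defined"
                )
    return problems


# Abbreviated version of the config fragments shown in this notebook.
config = {
    "data_configs": [{"name": "glue_mrpc"}],
    "evaluators": {
        "common_evaluator": {
            "metrics": [
                {"name": "accuracy", "data_config": "glue_mrpc"},
                {"name": "latency", "data_config": "glue_mrpc"},
            ]
        }
    },
    "systems": {"local_system": {"type": "LocalSystem"}},
    "engine": {
        "evaluator": "common_evaluator",
        "host": "local_system",
        "target": "local_system",
    },
}

print(check_references(config) or "all references resolve")
```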

### Start Optimization

! python -m olive.workflows.run --config bert_auto_opt_gpu.json

[2024-04-18 17:28:05,856] [INFO] [run.py:261:run] Loading Olive module configuration from: /home/dummy_user/venv/lib/python3.8/site-packages/olive/olive_config.json
[2024-04-18 17:28:05,857] [INFO] [run.py:267:run] Loading run configuration from: bert_auto_opt_gpu.json
2024-04-18 17:28:06.856491: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[2024-04-18 17:28:16,070] [INFO] [accelerator.py:336:create_accelerators] Running workflow on accelerator specs: gpu-cuda,gpu-tensorrt
[2024-04-18 17:28:16,138] [INFO] [engine.py:106:initialize] Using cache directory: cache
[2024-04-18 17:28:16,139] [INFO] [engine.py:262:run] Running Olive on accelerator: gpu-cuda
[2024-04-18 17:28:16,263] [INFO] [engine.py:324:run_accelerator] Input model evaluation results: {
  "

Based on the Olive run log above, we can see that:
1. Olive searched the configs over several rounds, found the best optimization, and packaged it as a zip file.
2. During the search, invalid configs were pruned and valid configs were evaluated.
3. The best config is saved in the `models/bert_gpu` folder. Here is a comparison between the input model and the output model.

| model type | accuracy-accuracy | accuracy-f1 | latency-avg | latency-max | latency-min |
| --- | --- | --- | --- | --- | --- |
| PyTorch | 0.8603 | 0.9042 | 12.5278 | 12.8072 | 12.2083 |
| Olive Optimization | 0.8603 | 0.9042 | 1.3364 | 1.3426 | 1.3302 |
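
To put the table in perspective, the short sketch below computes the average-latency speedup and the accuracy change from the two rows. The numbers are copied from the table; the latency unit is not stated in this notebook, but the ratios do not depend on it.

```python
# Values copied from the comparison table above.
baseline = {"accuracy": 0.8603, "f1": 0.9042, "latency_avg": 12.5278}
optimized = {"accuracy": 0.8603, "f1": 0.9042, "latency_avg": 1.3364}

# How many times faster the optimized model is on average.
speedup = baseline["latency_avg"] / optimized["latency_avg"]
# The same change expressed as a percentage reduction in average latency.
reduction_pct = 100 * (1 - optimized["latency_avg"] / baseline["latency_avg"])
# Positive means the optimized model lost accuracy.
accuracy_drop = baseline["accuracy"] - optimized["accuracy"]

print(f"avg latency speedup: {speedup:.2f}x")
print(f"avg latency reduction: {reduction_pct:.1f}%")
print(f"accuracy drop: {accuracy_drop:.4f}")
```

The reduction in average latency is well above the 20% requested by the `percent-min-improvement` goal, and accuracy is unchanged, so the `max-degradation` goal of 0.01 is met as well.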
