OrtPerfTuning failed when running with the example. #475

Closed
wunianqing opened this issue Aug 10, 2023 · 10 comments

@wunianqing

I was trying to run the DirectML SqueezeNet example, but I got an error.

[2023-08-10 14:09:39,485] [DEBUG] [engine.py:539:resolve_goals] Resolving goals: {'latency': {'avg': None}}
[2023-08-10 14:09:39,487] [DEBUG] [engine.py:558:resolve_goals] No baseline got as no goal is provided the the goal is threshold
[2023-08-10 14:09:39,491] [DEBUG] [engine.py:460:run_search] Step 1 with search point {'OnnxConversion': {}, 'OnnxFloatToFloat16': {}, 'OrtPerfTuning': {}} ...
[2023-08-10 14:09:39,493] [DEBUG] [engine.py:725:_run_passes] Running pass OnnxConversion
Using cache found in C:\Users\yzhou/.cache\torch\hub\pytorch_vision_v0.10.0
============== Diagnostic Run torch.onnx.export version 2.0.1+cpu ==============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

[2023-08-10 14:09:41,043] [DEBUG] [engine.py:725:_run_passes] Running pass OnnxFloatToFloat16
e:\Users\yzhou\onnx projects\Olive_test\venv\lib\site-packages\onnxconverter_common\float16.py:43: UserWarning: the float32 number 1.7538578589437748e-08 will be truncated to 1e-07
  warnings.warn("the float32 number {} will be truncated to {}".format(pos_min, min_positive_val))
[2023-08-10 14:09:41,302] [DEBUG] [engine.py:725:_run_passes] Running pass OrtPerfTuning
[2023-08-10 14:09:42,588] [INFO] [perf_tuning.py:72:tune_onnx_model] Run tuning for: [('provider', 'DmlExecutionProvider'), ('execution_mode', 'ORT_SEQUENTIAL'), ('ort_opt_level', 99), ('io_bind', False)]
ERROR:root:Optimization failed for tuning combo ('DmlExecutionProvider', 'ORT_SEQUENTIAL', 99, False)
[2023-08-10 14:11:15,823] [INFO] [perf_tuning.py:81:tune_onnx_model] Best result: {'test_name': 'pretuning', 'latency_ms': 2.4117}
[2023-08-10 14:11:15,844] [DEBUG] [engine.py:898:_evaluate_model] Evaluating model ...
[2023-08-10 14:11:16,349] [DEBUG] [footprint.py:90:resolve_metrics] There is no goal set for metric: {metric_name}.
[2023-08-10 14:11:16,350] [DEBUG] [footprint.py:90:resolve_metrics] There is no goal set for metric: {metric_name}.
[2023-08-10 14:11:16,350] [DEBUG] [engine.py:765:_run_passes] Signal: {'latency-avg': 13.23219, 'latency-max': 14.0571, 'latency-min': 11.7933}
[2023-08-10 14:11:16,355] [INFO] [footprint.py:168:get_pareto_frontier] pareto frontier points: 2_OrtPerfTuning-1-db4d7470ffd292aaa61802d59d85b66a-gpu-cpu {'latency-avg': 13.23219, 'latency-max': 14.0571, 'latency-min': 11.7933}
[2023-08-10 14:11:16,356] [INFO] [engine.py:475:run_search] Output all 1 models
[2023-08-10 14:11:16,501] [DEBUG] [engine.py:539:resolve_goals] Resolving goals: {'latency': {'avg': None}}
[2023-08-10 14:11:16,502] [DEBUG] [engine.py:558:resolve_goals] No baseline got as no goal is provided the the goal is threshold
[2023-08-10 14:11:16,504] [DEBUG] [engine.py:460:run_search] Step 1 with search point {'OnnxConversion': {}, 'OnnxFloatToFloat16': {}, 'OrtPerfTuning': {}} ...
[2023-08-10 14:11:16,504] [DEBUG] [engine.py:725:_run_passes] Running pass OnnxConversion
[2023-08-10 14:11:16,507] [DEBUG] [engine.py:789:_run_pass] Loading model from cache ...
[2023-08-10 14:11:16,510] [DEBUG] [engine.py:725:_run_passes] Running pass OnnxFloatToFloat16
[2023-08-10 14:11:16,513] [DEBUG] [engine.py:789:_run_pass] Loading model from cache ...
[2023-08-10 14:11:16,527] [DEBUG] [engine.py:725:_run_passes] Running pass OrtPerfTuning
[2023-08-10 14:11:16,539] [DEBUG] [engine.py:789:_run_pass] Loading model from cache ...
[2023-08-10 14:11:16,542] [DEBUG] [engine.py:898:_evaluate_model] Evaluating model ...
[2023-08-10 14:11:16,550] [DEBUG] [engine.py:902:_evaluate_model] Loading evaluation from cache ...
[2023-08-10 14:11:16,551] [DEBUG] [footprint.py:90:resolve_metrics] There is no goal set for metric: {metric_name}.
[2023-08-10 14:11:16,552] [DEBUG] [footprint.py:90:resolve_metrics] There is no goal set for metric: {metric_name}.
[2023-08-10 14:11:16,552] [DEBUG] [engine.py:765:_run_passes] Signal: {'latency-avg': 13.23219, 'latency-max': 14.0571, 'latency-min': 11.7933}
[2023-08-10 14:11:16,557] [INFO] [footprint.py:168:get_pareto_frontier] pareto frontier points: 2_OrtPerfTuning-1-db4d7470ffd292aaa61802d59d85b66a-gpu-cpu {'latency-avg': 13.23219, 'latency-max': 14.0571, 'latency-min': 11.7933}
[2023-08-10 14:11:16,557] [INFO] [engine.py:475:run_search] Output all 1 models
[2023-08-10 14:11:16,559] [INFO] [engine.py:318:run] No packaging config provided, skip packaging artifacts

It only generated two models. I think it should generate three: there should be one for OrtPerfTuning.
[screenshot of the generated models]

I made no changes to the code. Does anybody have the same issue?

@jambayk
Contributor

jambayk commented Aug 10, 2023

This happened because it found that the original model fed to the perf_tuning pass had the best inference latency. You can see it in the logs where it says "Best result: {'test_name': 'pretuning', 'latency_ms': 2.4117}". So the output model of the perf_tuning pass is the same as the input. If you open the "2_OrtPerfTuning-....json" file, you should see that the model path is the same as "1_OnnxFloat....".

Update: Actually, please ignore the above. The OrtPerfTuning pass never creates a new .onnx file; it only creates a .json file pointing to the path of the original .onnx model and the best inference settings found during tuning. So this is expected. You can see this by opening the 2_OrtPerfTuning-....json, where model_path will be the same as that of 1_OnnxFloat....

@wunianqing
Author

OK.
But the log message is not friendly for users like me. Maybe you could use the warning log level and say the model is already good enough, so OrtPerfTuning is skipped?
And if the model is already good enough, why was it chosen for the example?

@jambayk
Copy link
Contributor

jambayk commented Aug 11, 2023

Sorry, I was not clear in my response. My initial response was incorrect, so I struck through that part of the comment before adding an update.

For context, the output model of Olive consists of the actual .onnx file and the corresponding inference settings dict that is used to create the onnxruntime inference session.
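
As a rough illustration (a minimal sketch, not Olive's internal code), such an inference settings dict maps onto an onnxruntime session roughly like this; the model path and thread counts below are only examples:

```python
# Minimal sketch (not Olive's internal code) of how the recorded inference
# settings translate into an onnxruntime session. The model path is illustrative.
import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
# graph_optimization_level 99 corresponds to "enable all" optimizations
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.inter_op_num_threads = 6   # values picked by tuning; yours may differ
sess_options.intra_op_num_threads = 9

session = ort.InferenceSession(
    "model.onnx",                        # the .onnx produced by the previous pass
    sess_options=sess_options,
    providers=["DmlExecutionProvider"],  # requires onnxruntime-directml
)
```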

PerfTuning was not skipped. It ran, but found the original model's inference settings to be the best in the environment you ran it on; on ours, they were not. This is not what caused the missing third model; that is expected behavior for perf tuning regardless.

The perf-tuning pass doesn't create a new .onnx file since it would be identical to the original .onnx file. What might differ is the inference_settings, which can be seen in the 2_OrtPerfTuning-...json file. That file also points you to the .onnx file (=1_OnnxFloat.../model.onnx); see the sketch below.
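
For illustration only, the contents of that .json are along these lines (this is not the exact Olive schema; field names and nesting may differ between versions, and the values are taken from the logs in this thread):

```python
# Illustrative only -- not the exact Olive schema. Values mirror the logs in
# this thread; the path is elided just as in the folder names above.
output_model_config = {
    "model_path": ".../1_OnnxFloatToFloat16-.../model.onnx",  # same .onnx as the previous pass
    "inference_settings": {
        "execution_provider": [("DmlExecutionProvider", {})],
        "session_options": {
            "execution_mode": "ORT_SEQUENTIAL",
            "graph_optimization_level": 99,
            "inter_op_num_threads": 6,
            "intra_op_num_threads": 9,
        },
    },
}
```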

@wunianqing
Author

[2023-08-10 14:09:42,588] [INFO] [perf_tuning.py:72:tune_onnx_model] Run tuning for: [('provider', 'DmlExecutionProvider'), ('execution_mode', 'ORT_SEQUENTIAL'), ('ort_opt_level', 99), ('io_bind', False)]
ERROR:root:Optimization failed for tuning combo ('DmlExecutionProvider', 'ORT_SEQUENTIAL', 99, False)
[2023-08-10 14:11:15,823] [INFO] [perf_tuning.py:81:tune_onnx_model] Best result: {'test_name': 'pretuning', 'latency_ms': 2.4117}

So, you mean this error is trying to say that no better solution was found?

> PerfTuning was not skipped. It ran, but found the original model's inference settings to be the best in the environment you ran it on; on ours, they were not. This is not what caused the missing third model; that is expected behavior for perf tuning regardless.

Is that possible? The code downloads the original model from the Internet. I don't think we could get different results unless the file on the Internet changed.

@jambayk
Contributor

jambayk commented Aug 11, 2023

> So, you mean this error is trying to say that no better solution was found?

This error means that that specific combo of inference settings failed, but it was not the reason a third model file was not created, since the perf tuning pass still ran the other combos, if any.
I think I might have conflated the two issues here. The perf tuning pass ran fine, but some of the combos it looked at failed. We might have to look into that second issue more to see why it happened.

> Is that possible? The code downloads the original model from the Internet. I don't think we could get different results unless the file on the Internet changed.

Yes, this is possible for perf tuning. Perf tuning searches through different settings for the onnxruntime inference session and chooses the one with the best latency (conceptually, something like the sketch below). The latency depends on many factors, such as your hardware.
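
```python
# Rough conceptual sketch of perf tuning: measure latency for several candidate
# session configurations and keep the fastest one. Not Olive's actual code;
# the candidates and input shape are illustrative (SqueezeNet takes 1x3x224x224).
import time
import numpy as np
import onnxruntime as ort

def measure_latency_ms(model_path, providers, intra_threads, runs=50):
    opts = ort.SessionOptions()
    opts.intra_op_num_threads = intra_threads
    sess = ort.InferenceSession(model_path, sess_options=opts, providers=providers)
    input_name = sess.get_inputs()[0].name
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # adjust dtype if the model is fp16
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {input_name: x})
    return (time.perf_counter() - start) / runs * 1000

candidates = [("DmlExecutionProvider", n) for n in (1, 2, 4, 8)]
best = min(candidates, key=lambda c: measure_latency_ms("model.onnx", [c[0]], c[1]))
print("best (provider, intra_op_num_threads):", best)
```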

@wunianqing
Author

Thanks for the explanation. I now understand why no model was generated for PerfTuning.
But where does this error come from? How can I fix it? Do I need to create a new issue for this?

@jambayk
Contributor

jambayk commented Aug 11, 2023

We can use this issue itself. Did you install Olive from PyPI or use a specific tag?
There was a recent change in Olive where the error log also shows the error trace: https://github.com/microsoft/Olive/blob/main/olive/passes/onnx/perf_tuning.py#L154

Could you install Olive from source (https://microsoft.github.io/Olive/getstarted/installation.html#install-from-source) and run the example again? The log will have more information about why this exception was triggered.

@wunianqing
Author

wunianqing commented Aug 11, 2023

I installed it via pip install olive-ai[directml]

@jambayk
Contributor

jambayk commented Aug 11, 2023

Please try with pip install git+https://github.com/microsoft/Olive#egg=olive-ai[directml] to use the latest Olive code with better logging. We haven't released a new version to PyPI in the past couple of months, so it is quite far behind main.

@wunianqing
Author

pip install git+https://github.com/microsoft/Olive#egg=olive-ai[directml] solves the problem.

[2023-08-14 11:08:18,424] [DEBUG] [engine.py:942:_run_pass] Loading model from cache ...
[2023-08-14 11:08:18,434] [INFO] [engine.py:934:_run_pass] Running pass OrtPerfTuning
[2023-08-14 11:08:18,438] [DEBUG] [engine.py:796:_prepare_non_local_model] Model path is None, local or string name. No need to prepare
[2023-08-14 11:08:19,638] [INFO] [perf_tuning.py:89:tune_onnx_model] Run tuning for: [('provider', 'DmlExecutionProvider'), ('execution_mode', 'ORT_SEQUENTIAL'), ('ort_opt_level', 99), ('io_bind', False)]
[2023-08-14 11:08:30,341] [DEBUG] [perf_tuning.py:95:tune_onnx_model] Tuning result: 2.41861
[2023-08-14 11:08:30,341] [DEBUG] [perf_tuning.py:95:tune_onnx_model] Tuning result: 2.43242
[2023-08-14 11:08:30,351] [DEBUG] [perf_tuning.py:95:tune_onnx_model] Tuning result: 2.4001
[2023-08-14 11:08:30,351] [DEBUG] [perf_tuning.py:95:tune_onnx_model] Tuning result: 2.51
[2023-08-14 11:08:30,351] [DEBUG] [perf_tuning.py:95:tune_onnx_model] Tuning result: 2.501
[2023-08-14 11:08:30,351] [DEBUG] [perf_tuning.py:95:tune_onnx_model] Tuning result: 2.3979
[2023-08-14 11:08:30,351] [DEBUG] [perf_tuning.py:95:tune_onnx_model] Tuning result: 2.39171
[2023-08-14 11:08:30,351] [DEBUG] [perf_tuning.py:95:tune_onnx_model] Tuning result: 2.47895
[2023-08-14 11:08:30,351] [DEBUG] [perf_tuning.py:95:tune_onnx_model] Tuning result: 2.37015
[2023-08-14 11:08:30,359] [INFO] [perf_tuning.py:98:tune_onnx_model] Best result: {'test_name': "execution_provider_[('DmlExecutionProvider', {})]_session_options_{'execution_mode': 'ORT_SEQUENTIAL', 'graph_optimization_level': 99, 'extra_session_config': None, 'inter_op_num_threads': 6, 'intra_op_num_threads': 9}__io_bind_False", 'execution_provider': [('DmlExecutionProvider', {})], 'session_options': {'execution_mode': 'ORT_SEQUENTIAL', 'graph_optimization_level': 99, 'extra_session_config': None, 'inter_op_num_threads': 6, 'intra_op_num_threads': 9}, 'io_bind': False, 'latency_ms': 2.37015}
