
Crashes in the middle of the optimization process (KeyError: 'throughput') #90

Closed
shonigs opened this issue May 19, 2022 · 14 comments

@shonigs

shonigs commented May 19, 2022

Hi,
The program crashes in the middle of the optimization process.

Steps to reproduce
Installation

wget https://olivewheels.blob.core.windows.net/repo/onnxruntime_olive-0.4.0-py3-none-any.whl
pip install onnxruntime_olive-0.4.0-py3-none-any.whl
pip install --extra-index-url https://olivewheels.azureedge.net/test mlperf_loadgen
pip install --extra-index-url https://olivewheels.azureedge.net/test onnxruntime_gpu_tensorrt==1.11.0

Usage

from olive.optimization_config import OptimizationConfig
from olive.optimize import optimize

opt_config = OptimizationConfig(
    model_path="models.onnx",
    result_path="opt_throughput_result",
    throughput_tuning_enabled=True,
    inputs_spec={
        "input": [
            -1,
            3,
            512,
            512,
        ]
    },
    max_latency_percentile=0.95,
    max_latency_ms=1000,
    threads_num=4,
    dynamic_batching_size=32,
    min_duration_sec=10,
)
if __name__ == "__main__":
    result = optimize(opt_config)

This runs for some time, then crashes:

2022-05-19 09:19:09,930 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:19:09,943 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:19:11,625 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:19:11,638 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:19:13,204 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:19:13,224 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:21:07,504 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:21:07,675 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:21:14,154 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:21:14,179 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:24:23,212 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'TensorrtExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-05-19 09:24:28,503 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:24:28,809 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:24:34,735 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:24:34,761 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:27:43,921 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'TensorrtExecutionProvider', <ExecutionMode.ORT_PARALLEL: 1>, 99)
2022-05-19 09:27:49,552 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:27:49,774 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:27:55,796 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:27:55,822 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:29:40,752 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CUDAExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-05-19 09:29:47,356 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:29:47,603 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:29:52,975 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:29:53,001 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:31:38,742 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CUDAExecutionProvider', <ExecutionMode.ORT_PARALLEL: 1>, 99)
2022-05-19 09:31:44,725 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:31:44,947 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:31:50,856 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:31:50,884 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:34:16,662 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-05-19 09:34:22,604 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:34:22,820 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:34:28,909 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:34:28,934 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:36:22,542 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_PARALLEL: 1>, 99)
Traceback (most recent call last):
  File "/cnvrg/onnx_opt/onnx_optimization.py", line 23, in <module>
    result = optimize(opt_config)
  File "/usr/local/lib/python3.8/dist-packages/olive/optimize.py", line 36, in optimize
    olive_result = parse_tuning_result(optimization_config, *tuning_results, pretuning_inference_result)
  File "/usr/local/lib/python3.8/dist-packages/olive/optimize.py", line 59, in parse_tuning_result
    best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
  File "/usr/local/lib/python3.8/dist-packages/olive/optimize.py", line 59, in <lambda>
    best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
KeyError: 'throughput'

I am not sure about the exact issue, but could this maybe be wrapped in a try-except so the whole process doesn't fail?
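Something like this rough guard in parse_tuning_result is what I have in mind (a sketch based only on the traceback above, not on the actual OLive source):

def parse_tuning_result(optimization_config, *tuning_results):
    # Failed tuning combos apparently end up in tuning_results without a
    # "throughput" key, so keep only the entries that have one.
    valid_results = [r for r in tuning_results if "throughput" in r]
    if not valid_results:
        raise RuntimeError(
            "No tuning combo produced a valid throughput result; "
            "see the per-combo errors logged above."
        )
    return max(valid_results, key=lambda r: r["throughput"])["test_name"]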

P.S. Are there any details about the environment that I should add?

@leqiao-1
Contributor

Hi @shonigs, I tried with the model in the notebook tutorials and no issues appeared. I am not sure if the issue is related to your ONNX model. Could you please share the model you used? Thanks.

@kbraun-axio

kbraun-axio commented Aug 3, 2022

Hi, I am getting the same error: KeyError: 'throughput'.

The complete error log is:

ERROR conda.cli.main_run:execute(41): `conda run olive optimize --model_path onnx-object-detection-model.onnx --throughput_tuning_enabled --max_latency_percentile 0.95 --max_latency_ms 100 --threads_num 1 --dynamic_batching_size 1 --min_duration_sec 10 --providers_list cpu` failed. (See above for error)
2022-08-03 12:54:12,827 - olive.__main__ - WARNING - OLive will call "olive setup" to setup environment first
2022-08-03 12:54:13,474 - olive.optimization_config - INFO - Checking the model file...
2022-08-03 12:54:14,821 - olive.optimization_config - INFO - Providers will be tested for optimization: ['CPUExecutionProvider']
2022-08-03 13:06:48,111 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-08-03 13:44:02,303 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_PARALLEL: 1>, 99)
Traceback (most recent call last):
  File "/home/axio/miniconda3/envs/oonxoptimizer/bin/olive", line 8, in <module>
    sys.exit(main())
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/__main__.py", line 438, in main
    options.func(options)
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/__main__.py", line 322, in model_opt
    optimize(opt_config)
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/optimize.py", line 36, in optimize
    olive_result = parse_tuning_result(optimization_config, *tuning_results, pretuning_inference_result)
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/optimize.py", line 59, in parse_tuning_result
    best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/optimize.py", line 59, in <lambda>
    best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
KeyError: 'throughput'

I am executing the optimization with conda run -n onnxoptimizer olive optimize --model_path onnx-object-detection-model.onnx --throughput_tuning_enabled --max_latency_percentile 0.95 --max_latency_ms 100 --threads_num 1 --dynamic_batching_size 1 --min_duration_sec 10 --providers_list cpu >& log.txt

The above error message is the contents of log.txt (see final part of the execution command above).

Please find my ONNX model here: https://get.hidrive.com/2qErePEy (Link valid until August 10, 2022)

@leqiao-1
Contributor

leqiao-1 commented Aug 4, 2022

Hi @kbraun-axio,
I think this error happened because max_latency_ms is too small for CPU inference.
You can increase max_latency_ms, or change the execution provider from cpu to cuda.
Here is the test result on my local machine with the command olive optimize --model_path onnx-object-detection-model.onnx --throughput_tuning_enabled --max_latency_percentile 0.95 --max_latency_ms 100 --threads_num 1 --dynamic_batching_size 1 --min_duration_sec 10 --providers_list cuda >& log_olive.txt

log_olive.txt

@kbraun-axio

Hi @leqiao-1,
Thanks for your reply and the log output.
I will increase the max_latency_ms and try running the optimization again. I will post the results here.
Unfortunately, our inference machine does not have an Nvidia GPU (we only use one in our training server), so I cannot set the execution provider to CUDA.

@kbraun-axio

kbraun-axio commented Aug 10, 2022

Hi @leqiao-1,

Today, I tried to run the optimization again. This time, I increased the max_latency_ms to 10,000. However, I got the same error.
I attached the output log and the olive_opt_results folder (without the optimized model because it is too large) for you.

Do you think max_latency_ms of 10,000 is still not enough?

Inference with ONNX Runtime and the same ONNX model that I am trying to optimize takes about 7.5 seconds.

log_olive.txt
olive_opt_result.zip

@leqiao-1
Contributor

Hi @kbraun-axio
The latency depends on the machine. On my side, the inference takes about 400 ms on CPU. If you want, you can increase max_latency_ms further. However, even if that works, the throughput optimization may take a long time to run, since the latency is so high.

@kbraun-axio

Okay, thank you. The machine on which we want to run the inference has a 6-core AMD CPU with 8 GB RAM from 2012. It runs in a manufacturing / shop floor environment, where the hardware is not the newest. Maybe it would be better to use a more powerful machine, like an Nvidia Jetson device, which supports CUDA.

Besides that, I realized the optimization uses a lot of RAM. Watching the processes with htop showed memory consumption of up to 12 GB for the Python process running OLive. But the machine only has 8 GB of RAM, so Ubuntu started using swap space on the hard disk, which is very slow. Is that intended, or is 8 GB of RAM too little for OLive?

@leqiao-1
Contributor

Hi @kbraun-axio,
Are you using the onnxruntime GPU package with --providers_list cpu? I can reproduce the memory consumption issue that way.

If so, it may be because OLive tries to create a session with CUDA when checking the model input info with ORT inference sessions. I think it's a bug in OLive, and we will fix it. As a workaround, you can uninstall the onnxruntime GPU package and install the CPU version.
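For example, something along these lines, depending on which GPU build is installed (onnxruntime-gpu is the standard PyPI name; the TensorRT wheel from the install steps above would be uninstalled by its own name, onnxruntime_gpu_tensorrt):

pip uninstall onnxruntime-gpu
pip install onnxruntime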

If not, please let me know your onnxruntime package version (from pip list). I will check whether I can reproduce the same issue.

@kbraun-axio

Hi @leqiao-1,
Yes, I was running the GPU package with --providers_list cpu. My colleague uninstalled that package and installed the default (CPU) package. Now the memory consumption is back in the normal range. Thanks for the hint.

But the other issue, the KeyError: 'throughput', persists even with the CPU package and even when we set max_latency_ms to higher values. Maybe it fails because the system is too old; it is from 2012.

@leqiao-1
Contributor

Hi @kbraun-axio
That is possible, since the inference latency is very high on your side.

@PasaOpasen

PasaOpasen commented Dec 26, 2022

I have the same issue. Log:

2022-12-26 23:59:42,091 - olive.optimization_config - INFO - Checking the model file...
2022-12-26 23:59:42,547 - olive.optimization_config - INFO - Providers will be tested for optimization: ['CPUExecutionProvider', 'DnnlExecutionProvider']
2022-12-26 23:59:52,402 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-12-26 23:59:56,936 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'DnnlExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
j:\aprbot\tmp\Optimize_ONNX_Models_Throughput_with_OLive.ipynb Cell 9 in <cell line: 27>()
      1 opt_config = OptimizationConfig(
      2 
      3     model_path = "./craft.onnx",
   (...)
     24     test_num = 200
     25 )
---> 27 result = optimize(opt_config)

File c:\Users\qtckp\anaconda3\envs\lib\site-packages\olive\optimize.py:36, in optimize(optimization_config)
     32     quantization_optimize(optimization_config)
     34 tuning_results = tune_onnx_model(optimization_config)
---> 36 olive_result = parse_tuning_result(optimization_config, *tuning_results, pretuning_inference_result)
     38 result_json_path = os.path.join(optimization_config.result_path, "olive_result.json")
     40 with open(result_json_path, 'w') as f:

File c:\Users\qtckp\anaconda3\envs\lib\site-packages\olive\optimize.py:59, in parse_tuning_result(optimization_config, *tuning_results)
     57 def parse_tuning_result(optimization_config, *tuning_results):
     58     if optimization_config.throughput_tuning_enabled:
---> 59         best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
     60     else:
     61         best_test_name = min(tuning_results, key=lambda x: x["latency_ms"]["avg"])["test_name"]

File c:\Users\qtckp\anaconda3\envs\lib\site-packages\olive\optimize.py:59, in parse_tuning_result.<locals>.<lambda>(x)
     57 def parse_tuning_result(optimization_config, *tuning_results):
     58     if optimization_config.throughput_tuning_enabled:
---> 59         best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
     60     else:
     61         best_test_name = min(tuning_results, key=lambda x: x["latency_ms"]["avg"])["test_name"]

KeyError: 'throughput'

Running it with:

from olive.optimization_config import OptimizationConfig
from olive.optimize import optimize

opt_config = OptimizationConfig(
    model_path="./model.onnx",
    sample_input_data_path="./input.npz",
    result_path="olive_opt_latency_result",
    throughput_tuning_enabled=True,
    openmp_enabled=False,
    max_latency_percentile=0.95,
    max_latency_ms=1000000,
    threads_num=1,
    min_duration_sec=10000,
    providers_list=["cpu", "dnnl"],
    inter_thread_num_list=[1],
    intra_thread_num_list=[1],
    execution_mode_list=["sequential"],
    ort_opt_level_list=["all"],
    concurrency_num=4,
    warmup_num=20,
    test_num=200,
)

result = optimize(opt_config)

The model is huge and its inference takes over 15 seconds, but what am I doing wrong? What does None mean in the tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)? What other params should I set?

Are inputs_spec and output_names really necessary? What shape should I write in inputs_spec if the model has a dynamic input like [batches, 3, height, width]?

@leqiao-1
Contributor

Hi @PasaOpasen,
Q: What does None mean in the tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)?
A: It means there was no valid inference run within max_latency_ms. That might be because the inference latency is too long, or because the input data is not valid. You can try to increase max_latency_ms, or share the model so that I can have a look.

Q: inputs_spec and output_names
A: If you provide sample_input_data_path, or there are no dynamic input shapes, these two arguments are not necessary. If you have inputs with dynamic shapes, like [batches, 3, height, width], you need to provide inputs_spec. batches, height, and width should be set to ints with values that are realistic for your inference scenario.
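For example, a minimal sketch (the input name "input" and the concrete sizes below are hypothetical; choose values that match your real inference scenario):

from olive.optimization_config import OptimizationConfig

opt_config = OptimizationConfig(
    model_path="./model.onnx",
    # dynamic dims [batches, 3, height, width] pinned to concrete ints
    inputs_spec={"input": [1, 3, 640, 640]},
)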

@PasaOpasen

@leqiao-1 Thank you for the fast response!

Can you please try this model: https://github.com/PasaOpasen/_olive_craft ?

I tried several configurations but nothing changed. Inference takes about 15 seconds with 2 cores, and the optimization runs for too long with a large test_num or warmup_num and gives almost no output.

Also, the optimization uses 6-8 cores with concurrency_num=1, all 12 of my cores with concurrency_num=2, and all 16 GB of my memory with concurrency_num>2.

@leqiao-1
Contributor

If you have any further concerns or questions, please reopen this issue.
