
Crashes in the middle of the optimization process (KeyError: 'throughput') #90

Closed
shonigs opened this issue May 19, 2022 · 14 comments

@shonigs

shonigs commented May 19, 2022

Hi,
The program crashes in the middle of the optimization process.

Steps to reproduce
Installation

wget https://olivewheels.blob.core.windows.net/repo/onnxruntime_olive-0.4.0-py3-none-any.whl
pip install onnxruntime_olive-0.4.0-py3-none-any.whl
pip install --extra-index-url https://olivewheels.azureedge.net/test mlperf_loadgen
pip install --extra-index-url https://olivewheels.azureedge.net/test onnxruntime_gpu_tensorrt==1.11.0

Usage

from olive.optimization_config import OptimizationConfig
from olive.optimize import optimize

opt_config = OptimizationConfig(
    model_path="models.onnx",
    result_path="opt_throughput_result",
    throughput_tuning_enabled=True,
    inputs_spec={
        "input": [
            -1,
            3,
            512,
            512,
        ]
    },
    max_latency_percentile=0.95,
    max_latency_ms=1000,
    threads_num=4,
    dynamic_batching_size=32,
    min_duration_sec=10,
)
if __name__ == "__main__":
    result = optimize(opt_config)

This runs for some time, then crashes:

2022-05-19 09:19:09,930 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:19:09,943 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:19:11,625 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:19:11,638 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:19:13,204 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:19:13,224 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:21:07,504 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:21:07,675 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:21:14,154 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:21:14,179 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:24:23,212 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'TensorrtExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-05-19 09:24:28,503 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:24:28,809 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:24:34,735 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:24:34,761 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:27:43,921 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'TensorrtExecutionProvider', <ExecutionMode.ORT_PARALLEL: 1>, 99)
2022-05-19 09:27:49,552 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:27:49,774 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:27:55,796 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:27:55,822 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:29:40,752 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CUDAExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-05-19 09:29:47,356 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:29:47,603 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:29:52,975 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:29:53,001 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:31:38,742 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CUDAExecutionProvider', <ExecutionMode.ORT_PARALLEL: 1>, 99)
2022-05-19 09:31:44,725 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:31:44,947 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:31:50,856 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:31:50,884 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:34:16,662 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-05-19 09:34:22,604 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:34:22,820 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:34:28,909 - olive.optimization_config - INFO - Checking the model file...
2022-05-19 09:34:28,934 - olive.optimization_config - INFO - Providers will be tested for optimization: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
2022-05-19 09:36:22,542 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_PARALLEL: 1>, 99)
Traceback (most recent call last):
  File "/cnvrg/onnx_opt/onnx_optimization.py", line 23, in <module>
    result = optimize(opt_config)
  File "/usr/local/lib/python3.8/dist-packages/olive/optimize.py", line 36, in optimize
    olive_result = parse_tuning_result(optimization_config, *tuning_results, pretuning_inference_result)
  File "/usr/local/lib/python3.8/dist-packages/olive/optimize.py", line 59, in parse_tuning_result
    best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
  File "/usr/local/lib/python3.8/dist-packages/olive/optimize.py", line 59, in <lambda>
    best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
KeyError: 'throughput'

I am not sure about the exact issue, but could this maybe be wrapped in a try-except so the whole process doesn't fail?
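Something like this rough guard in parse_tuning_result is what I have in mind (a sketch based only on the traceback above, not on the actual OLive source):

def parse_tuning_result(optimization_config, *tuning_results):
    # Failed tuning combos apparently end up in tuning_results without a
    # "throughput" key, so keep only the entries that have one.
    valid_results = [r for r in tuning_results if "throughput" in r]
    if not valid_results:
        raise RuntimeError(
            "No tuning combo produced a valid throughput result; "
            "see the per-combo errors logged above."
        )
    return max(valid_results, key=lambda r: r["throughput"])["test_name"]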

P.S. Are there any details about the environment that I should add?

@leqiao-1
Contributor

Hi @shonigs, I tried with the model in the notebook tutorials and no issues appeared. I am not sure if the issue is related to your ONNX model. Could you please share the model you used? Thanks.

@kbraun-axio

kbraun-axio commented Aug 3, 2022

Hi, I am getting the same error: KeyError: 'throughput'.

The complete error log is:

ERROR conda.cli.main_run:execute(41): `conda run olive optimize --model_path onnx-object-detection-model.onnx --throughput_tuning_enabled --max_latency_percentile 0.95 --max_latency_ms 100 --threads_num 1 --dynamic_batching_size 1 --min_duration_sec 10 --providers_list cpu` failed. (See above for error)
2022-08-03 12:54:12,827 - olive.__main__ - WARNING - OLive will call "olive setup" to setup environment first
2022-08-03 12:54:13,474 - olive.optimization_config - INFO - Checking the model file...
2022-08-03 12:54:14,821 - olive.optimization_config - INFO - Providers will be tested for optimization: ['CPUExecutionProvider']
2022-08-03 13:06:48,111 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-08-03 13:44:02,303 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_PARALLEL: 1>, 99)
Traceback (most recent call last):
  File "/home/axio/miniconda3/envs/oonxoptimizer/bin/olive", line 8, in <module>
    sys.exit(main())
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/__main__.py", line 438, in main
    options.func(options)
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/__main__.py", line 322, in model_opt
    optimize(opt_config)
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/optimize.py", line 36, in optimize
    olive_result = parse_tuning_result(optimization_config, *tuning_results, pretuning_inference_result)
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/optimize.py", line 59, in parse_tuning_result
    best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
  File "/home/axio/miniconda3/envs/oonxoptimizer/lib/python3.7/site-packages/olive/optimize.py", line 59, in <lambda>
    best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
KeyError: 'throughput'

I am executing the optimization with conda run -n onnxoptimizer olive optimize --model_path onnx-object-detection-model.onnx --throughput_tuning_enabled --max_latency_percentile 0.95 --max_latency_ms 100 --threads_num 1 --dynamic_batching_size 1 --min_duration_sec 10 --providers_list cpu >& log.txt

The above error message is the contents of log.txt (see final part of the execution command above).

Please find my ONNX model here: https://get.hidrive.com/2qErePEy (Link valid until August 10, 2022)

@leqiao-1
Contributor

leqiao-1 commented Aug 4, 2022

Hi @kbraun-axio,
I think this error happened because max_latency_ms is too small for CPU inference.
You can increase max_latency_ms, or change the execution provider from cpu to cuda.
Here is the test result on my local machine with the command olive optimize --model_path onnx-object-detection-model.onnx --throughput_tuning_enabled --max_latency_percentile 0.95 --max_latency_ms 100 --threads_num 1 --dynamic_batching_size 1 --min_duration_sec 10 --providers_list cuda >& log_olive.txt

log_olive.txt

@kbraun-axio

Hi @leqiao-1,
Thanks for your reply and the log output.
I will increase the max_latency_ms and try running the optimization again. I will post the results here.
Unfortunately, our inference machine does not have an Nvidia GPU (we only use one in our training server), so I cannot set the execution provider to CUDA.

@kbraun-axio

kbraun-axio commented Aug 10, 2022

Hi @leqiao-1,

Today, I tried to run the optimization again. This time, I increased the max_latency_ms to 10,000. However, I got the same error.
I attached the output log and the olive_opt_results folder (without the optimized model because it is too large) for you.

Do you think max_latency_ms of 10,000 is still not enough?

Inference with ONNX Runtime and the same ONNX model that I am trying to optimize takes about 7.5 seconds.

log_olive.txt
olive_opt_result.zip

@leqiao-1
Contributor

Hi @kbraun-axio
The latency depends on the machine. On my side, the inference takes about 400 ms on CPU. If you want, you can increase max_latency_ms further. However, even if that works, the throughput optimization may take a long time to run, since the latency is so high.

@kbraun-axio

Okay, thank you. The machine on which we want to run the inference has a 6-core AMD CPU with 8 GB RAM from 2012. It runs in a manufacturing / shop floor environment, where the hardware is not the newest. Maybe it would be better to use a more powerful machine, like an Nvidia Jetson device, which supports CUDA.

Besides that, I realized the optimization uses a lot of RAM. Watching the processes with htop showed memory consumption of up to 12 GB for the Python process running OLive. But the machine only has 8 GB of RAM, so Ubuntu started using swap space on the hard disk, which is very slow. Is that intended, or is 8 GB of RAM too little for OLive?

@leqiao-1
Contributor

Hi @kbraun-axio,
Are you using the onnxruntime GPU package with --providers_list cpu? I can reproduce the memory consumption issue that way.

If so, it may be because OLive tries to create a session with CUDA when checking the model input info with ORT inference sessions. I think it's a bug in OLive, and we will fix it. As a workaround, you can uninstall the onnxruntime GPU package and install the CPU version.
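For example, something along these lines, depending on which GPU build is installed (onnxruntime-gpu is the standard PyPI name; the TensorRT wheel from the install steps above would be uninstalled by its own name, onnxruntime_gpu_tensorrt):

pip uninstall onnxruntime-gpu
pip install onnxruntime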

If not, please let me know your onnxruntime package version (from pip list). I will check whether I can reproduce the same issue.

@kbraun-axio

Hi @leqiao-1,
Yes, I was running the GPU package with --providers_list cpu. My colleague uninstalled that package and installed the default (CPU) package. Now the memory consumption is back in the normal range. Thanks for the hint.

But the other issue, the KeyError: 'throughput', persists even with the CPU package and even when we set max_latency_ms to higher values. Maybe it fails because the system is too old; it is from 2012.

@leqiao-1
Contributor

Hi @kbraun-axio
That is possible, since the inference latency is very high on your side.

@PasaOpasen

PasaOpasen commented Dec 26, 2022

I have the same issue. Log:

2022-12-26 23:59:42,091 - olive.optimization_config - INFO - Checking the model file...
2022-12-26 23:59:42,547 - olive.optimization_config - INFO - Providers will be tested for optimization: ['CPUExecutionProvider', 'DnnlExecutionProvider']
2022-12-26 23:59:52,402 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
2022-12-26 23:59:56,936 - olive.optimization.tuning_process - ERROR - Optimization failed for tuning combo (None, None, None, 'DnnlExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
j:\aprbot\tmp\Optimize_ONNX_Models_Throughput_with_OLive.ipynb Cell 9 in <cell line: 27>()
      1 opt_config = OptimizationConfig(
      2 
      3     model_path = "./craft.onnx",
   (...)
     24     test_num = 200
     25 )
---> 27 result = optimize(opt_config)

File c:\Users\qtckp\anaconda3\envs\lib\site-packages\olive\optimize.py:36, in optimize(optimization_config)
     32     quantization_optimize(optimization_config)
     34 tuning_results = tune_onnx_model(optimization_config)
---> 36 olive_result = parse_tuning_result(optimization_config, *tuning_results, pretuning_inference_result)
     38 result_json_path = os.path.join(optimization_config.result_path, "olive_result.json")
     40 with open(result_json_path, 'w') as f:

File c:\Users\qtckp\anaconda3\envs\lib\site-packages\olive\optimize.py:59, in parse_tuning_result(optimization_config, *tuning_results)
     57 def parse_tuning_result(optimization_config, *tuning_results):
     58     if optimization_config.throughput_tuning_enabled:
---> 59         best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
     60     else:
     61         best_test_name = min(tuning_results, key=lambda x: x["latency_ms"]["avg"])["test_name"]

File c:\Users\qtckp\anaconda3\envs\lib\site-packages\olive\optimize.py:59, in parse_tuning_result.<locals>.<lambda>(x)
     57 def parse_tuning_result(optimization_config, *tuning_results):
     58     if optimization_config.throughput_tuning_enabled:
---> 59         best_test_name = max(tuning_results, key=lambda x: x["throughput"])["test_name"]
     60     else:
     61         best_test_name = min(tuning_results, key=lambda x: x["latency_ms"]["avg"])["test_name"]

KeyError: 'throughput'

Running it with:

from olive.optimization_config import OptimizationConfig
from olive.optimize import optimize

opt_config = OptimizationConfig(
    model_path="./model.onnx",
    sample_input_data_path="./input.npz",
    result_path="olive_opt_latency_result",
    throughput_tuning_enabled=True,
    openmp_enabled=False,
    max_latency_percentile=0.95,
    max_latency_ms=1000000,
    threads_num=1,
    min_duration_sec=10000,
    providers_list=["cpu", "dnnl"],
    inter_thread_num_list=[1],
    intra_thread_num_list=[1],
    execution_mode_list=["sequential"],
    ort_opt_level_list=["all"],
    concurrency_num=4,
    warmup_num=20,
    test_num=200,
)

result = optimize(opt_config)

The model is huge and its inference takes over 15 seconds, but what am I doing wrong? What does None mean in the tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)? What other params should I set?

Are inputs_spec and output_names really necessary? What shape should I write in inputs_spec if the model has a dynamic input like [batches, 3, height, width]?

@leqiao-1
Contributor

Hi @PasaOpasen,
Q: What does None mean in the tuning combo (None, None, None, 'CPUExecutionProvider', <ExecutionMode.ORT_SEQUENTIAL: 0>, 99)?
A: It means there was no valid inference run within max_latency_ms. That might be because the inference latency is too long, or because the input data is not valid. You can try to increase max_latency_ms, or share the model so that I can have a look.

Q: inputs_spec and output_names
A: If you provide sample_input_data_path, or there are no dynamic input shapes, these two arguments are not necessary. If you have inputs with dynamic shapes, like [batches, 3, height, width], you need to provide inputs_spec. batches, height, and width should be set to ints with values that are realistic for your inference scenario.
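For example, a minimal sketch (the input name "input" and the concrete sizes below are hypothetical; choose values that match your real inference scenario):

from olive.optimization_config import OptimizationConfig

opt_config = OptimizationConfig(
    model_path="./model.onnx",
    # dynamic dims [batches, 3, height, width] pinned to concrete ints
    inputs_spec={"input": [1, 3, 640, 640]},
)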

@PasaOpasen

@leqiao-1 Thank you for the fast response!

Can you please try this model: https://github.com/PasaOpasen/_olive_craft ?

I tried several configurations but nothing changed. Inference takes about 15 seconds with 2 cores, and the optimization runs for too long with a large test_num or warmup_num and gives almost no output.

Also, the optimization uses 6-8 cores with concurrency_num=1, all 12 of my cores with concurrency_num=2, and all 16 GB of my memory with concurrency_num>2.

@leqiao-1
Contributor

If you have any further concerns or questions, please reopen this issue.
