
Seems MLC inference is not supported on multiple GPUs? #2157

@Bob123Yang

Description


Hi @arjunsuresh,

The Docker container builds successfully with the command below, but the run still reports errors.

mlcr run-mlperf,inference,_find-performance,_full,_r5.0-dev \
    --model=retinanet \
    --implementation=nvidia \
    --framework=tensorrt \
    --category=edge \
    --scenario=Offline \
    --execution_mode=test \
    --device=cuda \
    --docker --quiet \
    --test_query_count=500

I'm not sure whether the warning below is the root cause of all the failures. If it is, how can I run this inference benchmark on multiple GPUs, such as 2x NVIDIA A6000? Thanks.

[W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.

***************************************************************************
CM script::benchmark-program/run.sh

Run Directory: /home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA

CMD: make run_harness RUN_ARGS=' --benchmarks=retinanet --scenarios=offline  --test_mode=PerformanceOnly  --offline_expected_qps=1 --user_conf_path=/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/f29caee5e0e047f3811f7c0ce5f822ff.conf --mlperf_conf_path=/home/mlcuser/MLC/repos/local/cache/get-git-repo_da4c73f6/inference/mlperf.conf --gpu_batch_size=4 --use_deque_limit --no_audit_verify  ' 2>&1 | tee '/home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-results-dir_c369e3b3/test_results/2e6ba58d1633-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1/console.out'; echo \${PIPESTATUS[0]} > exitstatus

[2025-03-17 09:54:45,591 module.py:5098 DEBUG] -     - Running native script "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/benchmark-program/run-ubuntu.sh" from temporal script "tmp-run.sh" in "/home/mlcuser" ...
[2025-03-17 09:54:45,591 module.py:5105 INFO] -          ! cd /home/mlcuser
[2025-03-17 09:54:45,591 module.py:5106 INFO] -          ! call /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/benchmark-program/run-ubuntu.sh from tmp-run.sh

make run_harness RUN_ARGS=' --benchmarks=retinanet --scenarios=offline  --test_mode=PerformanceOnly  --offline_expected_qps=1 --user_conf_path=/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/f29caee5e0e047f3811f7c0ce5f822ff.conf --mlperf_conf_path=/home/mlcuser/MLC/repos/local/cache/get-git-repo_da4c73f6/inference/mlperf.conf --gpu_batch_size=4 --use_deque_limit --no_audit_verify  ' 2>&1 | tee '/home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-results-dir_c369e3b3/test_results/2e6ba58d1633-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1/console.out'; echo ${PIPESTATUS[0]} > exitstatus
[2025-03-17 09:54:50,687 main.py:229 INFO] Detected system ID: KnownSystem.Nvidia_2e6ba58d1633
[2025-03-17 09:54:50,840 harness.py:249 INFO] The harness will load 2 plugins: ['build/plugins/NMSOptPlugin/libnmsoptplugin.so', 'build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so']
[2025-03-17 09:54:50,840 generate_conf_files.py:107 INFO] Generated measurements/ entries for Nvidia_2e6ba58d1633_TRT/retinanet/Offline
[2025-03-17 09:54:50,840 __init__.py:46 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-results-dir_c369e3b3/test_results/2e6ba58d1633-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="PerformanceOnly" --use_deque_limit=true --gpu_batch_size=4 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/home/mlcuser/MLC/repos/local/cache/get-git-repo_da4c73f6/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=false --user_conf_path="/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/f29caee5e0e047f3811f7c0ce5f822ff.conf" --gpu_engines="./build/engines/Nvidia_2e6ba58d1633/retinanet/Offline/retinanet-Offline-gpu-b4-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario Offline --model retinanet --response_postprocess openimageeffnms
[2025-03-17 09:54:50,840 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.Retinanet
buffer_manager_thread_count : 0
data_dir : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_17c7d3bd/data
gpu_batch_size : 4
input_dtype : int8
input_format : linear
log_dir : /home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/build/logs/2025.03.17-09.54.48
map_path : data_maps/open-images-v6-mlperf/val_map.txt
mlperf_conf_path : /home/mlcuser/MLC/repos/local/cache/get-git-repo_da4c73f6/inference/mlperf.conf
offline_expected_qps : 1.0
precision : int8
preprocessed_data_dir : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_17c7d3bd/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='Intel(R) Xeon(R) Platinum 8480+', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=56, threads_per_core=2): 2}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=1.056300396, byte_suffix=<ByteSuffix.TB: (1000, 4)>, _num_bytes=1056300396000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA RTX A6000', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=47.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=51527024640), max_power_limit=300.0, pci_id='0x223010DE', compute_sm=86): 2})), numa_conf=None, system_id='Nvidia_2e6ba58d1633')
tensor_path : build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear
test_mode : PerformanceOnly
use_deque_limit : True
use_graphs : False
user_conf_path : /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/f29caee5e0e047f3811f7c0ce5f822ff.conf
system_id : Nvidia_2e6ba58d1633
config_name : Nvidia_2e6ba58d1633_retinanet_Offline
workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : lwis_k_99_MaxP
accuracy_level : 99%
inference_server : lwis
skip_file_checks : False
power_limit : None
cpu_freq : None
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: /home/mlcuser/MLC/repos/local/cache/get-git-repo_da4c73f6/inference/mlperf.conf
[I] user.conf path: /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/f29caee5e0e047f3811f7c0ce5f822ff.conf
Creating QSL.
Finished Creating QSL.
Setting up SUT.
[I] [TRT] Loaded engine size: 74 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 134, GPU 1085 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 136, GPU 1095 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +68, now: CPU 0, GPU 68 (MiB)
[I] Device:0.GPU: [0] ./build/engines/Nvidia_2e6ba58d1633/retinanet/Offline/retinanet-Offline-gpu-b4-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[I] [TRT] Loaded engine size: 74 MiB
[W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 175, GPU 374 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 176, GPU 384 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +69, now: CPU 0, GPU 137 (MiB)
[I] Device:1.GPU: [0] ./build/engines/Nvidia_2e6ba58d1633/retinanet/Offline/retinanet-Offline-gpu-b4-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 102, GPU 1097 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 102, GPU 1105 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +3056, now: CPU 1, GPU 3193 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 103, GPU 4169 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 103, GPU 4179 (MiB)
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +3055, now: CPU 1, GPU 6248 (MiB)
[E] [TRT] 3: [executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3: Internal Error (Profile 0 has been chosen by another IExecutionContext. Use another profileIndex or destroy the IExecutionContext that use this profile.)
F0317 09:54:52.402531  8377 lwis.cpp:245] Check failed: context->setOptimizationProfile(profileIdx) == true (0 vs. 1) 
*** Check failure stack trace: ***
    @     0x79b8bdfa81c3  google::LogMessage::Fail()
    @     0x79b8bdfad25b  google::LogMessage::SendToLog()
    @     0x79b8bdfa7ebf  google::LogMessage::Flush()
    @     0x79b8bdfa86ef  google::LogMessageFatal::~LogMessageFatal()
    @     0x5619743e2b1c  lwis::Device::Setup()
    @     0x5619743e4ceb  lwis::Server::Setup()
    @     0x5619743409d0  doInference()
    @     0x56197433e190  main
    @     0x79b8abb74083  __libc_start_main
    @     0x56197433e71e  _start
Aborted (core dumped)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/main.py", line 231, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/main.py", line 144, in main
    dispatch_action(main_args, config_dict, workload_setting)
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/main.py", line 202, in dispatch_action
    handler.run()
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/actionhandler/base.py", line 82, in run
    self.handle_failure()
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/actionhandler/run_harness.py", line 193, in handle_failure
    raise RuntimeError("Run harness failed!")
RuntimeError: Run harness failed!
Traceback (most recent call last):
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/actionhandler/run_harness.py", line 161, in handle
    result_data = self.harness.run_harness(flag_dict=self.harness_flag_dict, skip_generate_measurements=True)
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/common/harness.py", line 352, in run_harness
    output = run_command(self._construct_terminal_command(argstr), get_output=True, custom_env=self.env_vars)
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/common/__init__.py", line 67, in run_command
    raise subprocess.CalledProcessError(ret, cmd)
subprocess.CalledProcessError: Command './build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-results-dir_c369e3b3/test_results/2e6ba58d1633-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="PerformanceOnly" --use_deque_limit=true --gpu_batch_size=4 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/home/mlcuser/MLC/repos/local/cache/get-git-repo_da4c73f6/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=false --user_conf_path="/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/f29caee5e0e047f3811f7c0ce5f822ff.conf" --gpu_engines="./build/engines/Nvidia_2e6ba58d1633/retinanet/Offline/retinanet-Offline-gpu-b4-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario Offline --model retinanet --response_postprocess openimageeffnms' returned non-zero exit status 134.
make: *** [Makefile:45: run_harness] Error 1
Traceback (most recent call last):
  File "/home/mlcuser/.local/bin/mlcr", line 8, in <module>
    sys.exit(mlcr())
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/main.py", line 86, in mlcr
    main()
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/main.py", line 173, in main
    res = method(run_args)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 141, in run
    return self.call_script_module_function("run", run_args)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 121, in call_script_module_function
    result = automation_instance.run(run_args)  # Pass args to the run method
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 225, in run
    r = self._run(i)
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1850, in _run
    r = self._call_run_deps(prehook_deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3300, in _call_run_deps
    r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3470, in _run_deps
    r = self.action_object.access(ii)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/action.py", line 56, in access
    result = method(options)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 141, in run
    return self.call_script_module_function("run", run_args)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 121, in call_script_module_function
    result = automation_instance.run(run_args)  # Pass args to the run method
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 225, in run
    r = self._run(i)
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1893, in _run
    r = self._run_deps(post_deps, clean_env_keys_post_deps, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3470, in _run_deps
    r = self.action_object.access(ii)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/action.py", line 56, in access
    result = method(options)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 141, in run
    return self.call_script_module_function("run", run_args)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 121, in call_script_module_function
    result = automation_instance.run(run_args)  # Pass args to the run method
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 225, in run
    r = self._run(i)
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1893, in _run
    r = self._run_deps(post_deps, clean_env_keys_post_deps, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3470, in _run_deps
    r = self.action_object.access(ii)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/action.py", line 56, in access
    result = method(options)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 141, in run
    return self.call_script_module_function("run", run_args)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 131, in call_script_module_function
    raise ScriptExecutionError(f"Script {function_name} execution failed. Error : {error}")
mlc.script_action.ScriptExecutionError: Script run execution failed. Error : MLC script failed (name = benchmark-program, return code = 512)


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Please file an issue at https://github.com/mlcommons/mlperf-automations/issues along with the full MLC command being run and the relevant
or full console log.
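For anyone hitting the same "Profile 0 has been chosen by another IExecutionContext" failure: a possible first diagnostic (a sketch, not a verified fix) is to restrict the run to a single GPU with the standard CUDA_VISIBLE_DEVICES environment variable, then re-run the same mlcr command. If the single-GPU run succeeds, the problem is isolated to the multi-GPU profile handling rather than the engine build itself.

```shell
# Diagnostic sketch: expose only GPU 0 to the harness, then re-run
# the mlcr command from the issue in this same shell session.
export CUDA_VISIBLE_DEVICES=0
echo "Visible GPUs: $CUDA_VISIBLE_DEVICES"
# mlcr run-mlperf,inference,_find-performance,_full,_r5.0-dev ... (same flags as above)
```

CUDA_VISIBLE_DEVICES is honored by the CUDA runtime itself, so it applies to the TensorRT harness without any changes to the mlcr flags; whether the automation also needs a matching flag to skip the second device is an assumption to verify against the MLPerf automation docs.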
