
Seems MLC inference is not supported on multiple GPUs? #2157

@Bob123Yang

Description


Hi @arjunsuresh,

The Docker container builds successfully with the command below, but the run still reports errors.

mlcr run-mlperf,inference,_find-performance,_full,_r5.0-dev \
    --model=retinanet \
    --implementation=nvidia \
    --framework=tensorrt \
    --category=edge \
    --scenario=Offline \
    --execution_mode=test \
    --device=cuda \
    --docker --quiet \
    --test_query_count=500

I'm not sure whether the warning below is the root cause of all the failures. If it is, how can I run this inference benchmark on multiple GPUs, such as 2x NVIDIA A6000? Thanks.

[W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.

***************************************************************************
CM script::benchmark-program/run.sh

Run Directory: /home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA

CMD: make run_harness RUN_ARGS=' --benchmarks=retinanet --scenarios=offline  --test_mode=PerformanceOnly  --offline_expected_qps=1 --user_conf_path=/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/f29caee5e0e047f3811f7c0ce5f822ff.conf --mlperf_conf_path=/home/mlcuser/MLC/repos/local/cache/get-git-repo_da4c73f6/inference/mlperf.conf --gpu_batch_size=4 --use_deque_limit --no_audit_verify  ' 2>&1 | tee '/home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-results-dir_c369e3b3/test_results/2e6ba58d1633-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1/console.out'; echo \${PIPESTATUS[0]} > exitstatus

[2025-03-17 09:54:45,591 module.py:5098 DEBUG] -     - Running native script "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/benchmark-program/run-ubuntu.sh" from temporal script "tmp-run.sh" in "/home/mlcuser" ...
[2025-03-17 09:54:45,591 module.py:5105 INFO] -          ! cd /home/mlcuser
[2025-03-17 09:54:45,591 module.py:5106 INFO] -          ! call /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/benchmark-program/run-ubuntu.sh from tmp-run.sh

make run_harness RUN_ARGS=' --benchmarks=retinanet --scenarios=offline  --test_mode=PerformanceOnly  --offline_expected_qps=1 --user_conf_path=/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/f29caee5e0e047f3811f7c0ce5f822ff.conf --mlperf_conf_path=/home/mlcuser/MLC/repos/local/cache/get-git-repo_da4c73f6/inference/mlperf.conf --gpu_batch_size=4 --use_deque_limit --no_audit_verify  ' 2>&1 | tee '/home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-results-dir_c369e3b3/test_results/2e6ba58d1633-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1/console.out'; echo ${PIPESTATUS[0]} > exitstatus
[2025-03-17 09:54:50,687 main.py:229 INFO] Detected system ID: KnownSystem.Nvidia_2e6ba58d1633
[2025-03-17 09:54:50,840 harness.py:249 INFO] The harness will load 2 plugins: ['build/plugins/NMSOptPlugin/libnmsoptplugin.so', 'build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so']
[2025-03-17 09:54:50,840 generate_conf_files.py:107 INFO] Generated measurements/ entries for Nvidia_2e6ba58d1633_TRT/retinanet/Offline
[2025-03-17 09:54:50,840 __init__.py:46 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-results-dir_c369e3b3/test_results/2e6ba58d1633-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="PerformanceOnly" --use_deque_limit=true --gpu_batch_size=4 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/home/mlcuser/MLC/repos/local/cache/get-git-repo_da4c73f6/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=false --user_conf_path="/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/f29caee5e0e047f3811f7c0ce5f822ff.conf" --gpu_engines="./build/engines/Nvidia_2e6ba58d1633/retinanet/Offline/retinanet-Offline-gpu-b4-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario Offline --model retinanet --response_postprocess openimageeffnms
[2025-03-17 09:54:50,840 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.Retinanet
buffer_manager_thread_count : 0
data_dir : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_17c7d3bd/data
gpu_batch_size : 4
input_dtype : int8
input_format : linear
log_dir : /home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/build/logs/2025.03.17-09.54.48
map_path : data_maps/open-images-v6-mlperf/val_map.txt
mlperf_conf_path : /home/mlcuser/MLC/repos/local/cache/get-git-repo_da4c73f6/inference/mlperf.conf
offline_expected_qps : 1.0
precision : int8
preprocessed_data_dir : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_17c7d3bd/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='Intel(R) Xeon(R) Platinum 8480+', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=56, threads_per_core=2): 2}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=1.056300396, byte_suffix=<ByteSuffix.TB: (1000, 4)>, _num_bytes=1056300396000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA RTX A6000', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=47.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=51527024640), max_power_limit=300.0, pci_id='0x223010DE', compute_sm=86): 2})), numa_conf=None, system_id='Nvidia_2e6ba58d1633')
tensor_path : build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear
test_mode : PerformanceOnly
use_deque_limit : True
use_graphs : False
user_conf_path : /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/f29caee5e0e047f3811f7c0ce5f822ff.conf
system_id : Nvidia_2e6ba58d1633
config_name : Nvidia_2e6ba58d1633_retinanet_Offline
workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : lwis_k_99_MaxP
accuracy_level : 99%
inference_server : lwis
skip_file_checks : False
power_limit : None
cpu_freq : None
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: /home/mlcuser/MLC/repos/local/cache/get-git-repo_da4c73f6/inference/mlperf.conf
[I] user.conf path: /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/f29caee5e0e047f3811f7c0ce5f822ff.conf
Creating QSL.
Finished Creating QSL.
Setting up SUT.
[I] [TRT] Loaded engine size: 74 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 134, GPU 1085 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 136, GPU 1095 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +68, now: CPU 0, GPU 68 (MiB)
[I] Device:0.GPU: [0] ./build/engines/Nvidia_2e6ba58d1633/retinanet/Offline/retinanet-Offline-gpu-b4-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[I] [TRT] Loaded engine size: 74 MiB
[W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 175, GPU 374 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 176, GPU 384 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +69, now: CPU 0, GPU 137 (MiB)
[I] Device:1.GPU: [0] ./build/engines/Nvidia_2e6ba58d1633/retinanet/Offline/retinanet-Offline-gpu-b4-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 102, GPU 1097 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 102, GPU 1105 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +3056, now: CPU 1, GPU 3193 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 103, GPU 4169 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 103, GPU 4179 (MiB)
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +3055, now: CPU 1, GPU 6248 (MiB)
[E] [TRT] 3: [executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3: Internal Error (Profile 0 has been chosen by another IExecutionContext. Use another profileIndex or destroy the IExecutionContext that use this profile.)
F0317 09:54:52.402531  8377 lwis.cpp:245] Check failed: context->setOptimizationProfile(profileIdx) == true (0 vs. 1) 
*** Check failure stack trace: ***
    @     0x79b8bdfa81c3  google::LogMessage::Fail()
    @     0x79b8bdfad25b  google::LogMessage::SendToLog()
    @     0x79b8bdfa7ebf  google::LogMessage::Flush()
    @     0x79b8bdfa86ef  google::LogMessageFatal::~LogMessageFatal()
    @     0x5619743e2b1c  lwis::Device::Setup()
    @     0x5619743e4ceb  lwis::Server::Setup()
    @     0x5619743409d0  doInference()
    @     0x56197433e190  main
    @     0x79b8abb74083  __libc_start_main
    @     0x56197433e71e  _start
Aborted (core dumped)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/main.py", line 231, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/main.py", line 144, in main
    dispatch_action(main_args, config_dict, workload_setting)
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/main.py", line 202, in dispatch_action
    handler.run()
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/actionhandler/base.py", line 82, in run
    self.handle_failure()
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/actionhandler/run_harness.py", line 193, in handle_failure
    raise RuntimeError("Run harness failed!")
RuntimeError: Run harness failed!
Traceback (most recent call last):
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/actionhandler/run_harness.py", line 161, in handle
    result_data = self.harness.run_harness(flag_dict=self.harness_flag_dict, skip_generate_measurements=True)
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/common/harness.py", line 352, in run_harness
    output = run_command(self._construct_terminal_command(argstr), get_output=True, custom_env=self.env_vars)
  File "/home/mlcuser/MLC/repos/local/cache/get-git-repo_918692f6/repo/closed/NVIDIA/code/common/__init__.py", line 67, in run_command
    raise subprocess.CalledProcessError(ret, cmd)
subprocess.CalledProcessError: Command './build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-results-dir_c369e3b3/test_results/2e6ba58d1633-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="PerformanceOnly" --use_deque_limit=true --gpu_batch_size=4 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/home/mlcuser/MLC/repos/local/cache/get-git-repo_da4c73f6/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=false --user_conf_path="/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/f29caee5e0e047f3811f7c0ce5f822ff.conf" --gpu_engines="./build/engines/Nvidia_2e6ba58d1633/retinanet/Offline/retinanet-Offline-gpu-b4-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario Offline --model retinanet --response_postprocess openimageeffnms' returned non-zero exit status 134.
make: *** [Makefile:45: run_harness] Error 1
Traceback (most recent call last):
  File "/home/mlcuser/.local/bin/mlcr", line 8, in <module>
    sys.exit(mlcr())
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/main.py", line 86, in mlcr
    main()
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/main.py", line 173, in main
    res = method(run_args)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 141, in run
    return self.call_script_module_function("run", run_args)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 121, in call_script_module_function
    result = automation_instance.run(run_args)  # Pass args to the run method
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 225, in run
    r = self._run(i)
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1850, in _run
    r = self._call_run_deps(prehook_deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3300, in _call_run_deps
    r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3470, in _run_deps
    r = self.action_object.access(ii)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/action.py", line 56, in access
    result = method(options)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 141, in run
    return self.call_script_module_function("run", run_args)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 121, in call_script_module_function
    result = automation_instance.run(run_args)  # Pass args to the run method
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 225, in run
    r = self._run(i)
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1893, in _run
    r = self._run_deps(post_deps, clean_env_keys_post_deps, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3470, in _run_deps
    r = self.action_object.access(ii)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/action.py", line 56, in access
    result = method(options)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 141, in run
    return self.call_script_module_function("run", run_args)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 121, in call_script_module_function
    result = automation_instance.run(run_args)  # Pass args to the run method
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 225, in run
    r = self._run(i)
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1893, in _run
    r = self._run_deps(post_deps, clean_env_keys_post_deps, env, state, const, const_state, add_deps_recursive, recursion_spaces,
  File "/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3470, in _run_deps
    r = self.action_object.access(ii)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/action.py", line 56, in access
    result = method(options)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 141, in run
    return self.call_script_module_function("run", run_args)
  File "/home/mlcuser/.local/lib/python3.8/site-packages/mlc/script_action.py", line 131, in call_script_module_function
    raise ScriptExecutionError(f"Script {function_name} execution failed. Error : {error}")
mlc.script_action.ScriptExecutionError: Script run execution failed. Error : MLC script failed (name = benchmark-program, return code = 512)


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Please file an issue at https://github.com/mlcommons/mlperf-automations/issues along with the full MLC command being run and the relevant
or full console log.
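For anyone hitting the same "Profile 0 has been chosen by another IExecutionContext" failure: a possible first diagnostic (a sketch, not a verified fix) is to restrict the run to a single GPU with the standard CUDA_VISIBLE_DEVICES environment variable, then re-run the same mlcr command. If the single-GPU run succeeds, the problem is isolated to the multi-GPU profile handling rather than the engine build itself.

```shell
# Diagnostic sketch: expose only GPU 0 to the harness, then re-run
# the mlcr command from the issue in this same shell session.
export CUDA_VISIBLE_DEVICES=0
echo "Visible GPUs: $CUDA_VISIBLE_DEVICES"
# mlcr run-mlperf,inference,_find-performance,_full,_r5.0-dev ... (same flags as above)
```

CUDA_VISIBLE_DEVICES is honored by the CUDA runtime itself, so it applies to the TensorRT harness without any changes to the mlcr flags; whether the automation also needs a matching flag to skip the second device is an assumption to verify against the MLPerf automation docs.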
