
triton server onnxbackend deployment error #50

Closed

xiaoFine opened this issue Feb 9, 2023 · 3 comments

xiaoFine commented Feb 9, 2023

tritonserver image version: nvcr.io/nvidia/tritonserver:22.05-py3
model: ViT-H-14
Startup error:

```
error: creating server: Invalid argument - load failed for model 'clip-image-onnx': version 1 is at UNAVAILABLE state: Internal: onnx runtime error 6: Exception during initialization: /workspace/onnxruntime/onnxruntime/core/optimizer/optimizer_execution_frame.cc:78 onnxruntime::OptimizerExecutionFrame::Info::Info(const std::vector<const onnxruntime::Node*>&, const InitializedTensorSet&, const onnxruntime::Path&, const onnxruntime::IExecutionProvider&, const std::function<bool(const std::__cxx11::basic_string<char>&)>&) [ONNXRuntimeError] : 1 : FAIL : tensorprotoutils.cc:622 GetExtDataFromTensorProto External initializer: visual.transformer.resblocks.31.mlp.c_proj.weight offset: 1251033600 size to read: 13107200 given file_length: 391708810 are out of bounds or can not be read in full.
```
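For context: the error says ONNX Runtime tried to read the initializer visual.transformer.resblocks.31.mlp.c_proj.weight from the model's external-data file at byte offset 1251033600, but the file on disk is only 391708810 bytes, so the weights file next to model.onnx is truncated or is the wrong file. A minimal sketch for checking this locally before loading the model in Triton (the path is illustrative, and it assumes the export stored the weights in an external-data file referenced by model.onnx):

```python
import os
import onnx

# Hypothetical path -- adjust to your Triton model repository layout.
model_path = "clip-image-onnx/1/model.onnx"
model_dir = os.path.dirname(model_path)

# Load only the graph structure; leave the large external weights on disk.
model = onnx.load(model_path, load_external_data=False)

for tensor in model.graph.initializer:
    # Externally stored tensors carry (location, offset, length) entries.
    if tensor.data_location != onnx.TensorProto.EXTERNAL:
        continue
    info = {entry.key: entry.value for entry in tensor.external_data}
    data_file = os.path.join(model_dir, info["location"])
    offset = int(info.get("offset", 0))
    length = int(info.get("length", 0))
    file_size = os.path.getsize(data_file)
    if offset + length > file_size:
        print(f"{tensor.name}: needs bytes [{offset}, {offset + length}) "
              f"but {data_file} has only {file_size} bytes -- truncated?")
```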


xiaoFine commented Feb 9, 2023

Full log:

```

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.01 (build 52277748)
Triton Server Version 2.30.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 11.7 driver version 525.85.11 with kernel driver version 515.86.01.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

Warning: '--strict-model-config' has been deprecated! Please use '--disable-auto-complete-config' instead.
I0209 02:48:01.613148 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f9e60000000' with size 268435456
I0209 02:48:01.614967 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0209 02:48:01.620605 1 model_config_utils.cc:646] Server side auto-completed config: name: "clip-image"
max_batch_size: 2
input {
name: "INPUT"
data_type: TYPE_STRING
dims: 1
reshape {
}
}
output {
name: "FEATURE"
data_type: TYPE_FP64
dims: -1
dims: 1024
}
instance_group {
count: 1
kind: KIND_CPU
}
default_model_filename: "model.py"
dynamic_batching {
}
parameters {
key: "EXECUTION_ENV_PATH"
value {
string_value: "$$TRITON_MODEL_DIRECTORY/env.tar.gz"
}
}
backend: "python"

I0209 02:48:01.620967 1 model_config_utils.cc:646] Server side auto-completed config: name: "clip-image-onnx"
platform: "onnxruntime_onnx"
default_model_filename: "model.onnx"
backend: "onnxruntime"

I0209 02:48:01.621045 1 model_lifecycle.cc:459] loading: clip-image:1
I0209 02:48:01.621085 1 model_lifecycle.cc:459] loading: clip-image-onnx:1
I0209 02:48:01.621181 1 backend_model.cc:348] Adding default backend config setting: default-max-batch-size,4
I0209 02:48:01.621208 1 backend_model.cc:348] Adding default backend config setting: default-max-batch-size,4
I0209 02:48:01.621214 1 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/python/libtriton_python.so
I0209 02:48:01.622734 1 python_be.cc:1614] 'python' TRITONBACKEND API version: 1.11
I0209 02:48:01.622754 1 python_be.cc:1636] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0209 02:48:01.622772 1 python_be.cc:1766] Shared memory configuration is shm-default-byte-size=67108864,shm-growth-byte-size=67108864,stub-timeout-seconds=30
I0209 02:48:01.622887 1 python_be.cc:2012] TRITONBACKEND_GetBackendAttribute: setting attributes
I0209 02:48:01.622927 1 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so
I0209 02:48:01.624288 1 onnxruntime.cc:2459] TRITONBACKEND_Initialize: onnxruntime
I0209 02:48:01.624311 1 onnxruntime.cc:2469] Triton TRITONBACKEND API version: 1.11
I0209 02:48:01.624317 1 onnxruntime.cc:2475] 'onnxruntime' TRITONBACKEND API version: 1.11
I0209 02:48:01.624322 1 onnxruntime.cc:2505] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0209 02:48:01.636771 1 python_be.cc:1814] TRITONBACKEND_ModelInitialize: clip-image (version 1)
I0209 02:48:01.637211 1 model_config_utils.cc:1838] ModelConfig 64-bit fields:
I0209 02:48:01.637226 1 model_config_utils.cc:1840] ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I0209 02:48:01.637230 1 model_config_utils.cc:1840] ModelConfig::dynamic_batching::max_queue_delay_microseconds
I0209 02:48:01.637234 1 model_config_utils.cc:1840] ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I0209 02:48:01.637238 1 model_config_utils.cc:1840] ModelConfig::ensemble_scheduling::step::model_version
I0209 02:48:01.637242 1 model_config_utils.cc:1840] ModelConfig::input::dims
I0209 02:48:01.637246 1 model_config_utils.cc:1840] ModelConfig::input::reshape::shape
I0209 02:48:01.637250 1 model_config_utils.cc:1840] ModelConfig::instance_group::secondary_devices::device_id
I0209 02:48:01.637254 1 model_config_utils.cc:1840] ModelConfig::model_warmup::inputs::value::dims
I0209 02:48:01.637258 1 model_config_utils.cc:1840] ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I0209 02:48:01.637263 1 model_config_utils.cc:1840] ModelConfig::optimization::cuda::graph_spec::input::value::dim
I0209 02:48:01.637266 1 model_config_utils.cc:1840] ModelConfig::output::dims
I0209 02:48:01.637271 1 model_config_utils.cc:1840] ModelConfig::output::reshape::shape
I0209 02:48:01.637275 1 model_config_utils.cc:1840] ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I0209 02:48:01.637279 1 model_config_utils.cc:1840] ModelConfig::sequence_batching::max_sequence_idle_microseconds
I0209 02:48:01.637283 1 model_config_utils.cc:1840] ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I0209 02:48:01.637287 1 model_config_utils.cc:1840] ModelConfig::sequence_batching::state::dims
I0209 02:48:01.637291 1 model_config_utils.cc:1840] ModelConfig::sequence_batching::state::initial_state::dims
I0209 02:48:01.637295 1 model_config_utils.cc:1840] ModelConfig::version_policy::specific::versions
I0209 02:48:01.637387 1 python_be.cc:1505] Using Python execution env /mnt/models/clip-image/env.tar.gz
I0209 02:48:24.295399 1 stub_launcher.cc:251] Starting Python backend stub: source /tmp/python_env_1zugRg/0/bin/activate && exec env LD_LIBRARY_PATH=/tmp/python_env_1zugRg/0/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /mnt/models/clip-image/1/model.py triton_python_backend_shm_region_1 67108864 67108864 1 /opt/tritonserver/backends/python 336 clip-image
I0209 02:48:27.701705 1 python_be.cc:1594] model configuration:
{
"name": "clip-image",
"platform": "",
"backend": "python",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 2,
"input": [
{
"name": "INPUT",
"data_type": "TYPE_STRING",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
}
],
"output": [
{
"name": "FEATURE",
"data_type": "TYPE_FP64",
"dims": [
-1,
1024
],
"label_filename": "",
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"dynamic_batching": {
"preferred_batch_size": [
2
],
"max_queue_delay_microseconds": 0,
"preserve_ordering": false,
"priority_levels": 0,
"default_priority_level": 0,
"priority_queue_policy": {}
},
"instance_group": [
{
"name": "clip-image_0",
"kind": "KIND_CPU",
"count": 1,
"gpus": [],
"secondary_devices": [],
"profile": [],
"passive": false,
"host_policy": ""
}
],
"default_model_filename": "model.py",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {
"EXECUTION_ENV_PATH": {
"string_value": "$$TRITON_MODEL_DIRECTORY/env.tar.gz"
}
},
"model_warmup": []
}
I0209 02:48:27.701805 1 onnxruntime.cc:2563] TRITONBACKEND_ModelInitialize: clip-image-onnx (version 1)
I0209 02:48:27.840702 1 onnxruntime.cc:553] CUDA Execution Accelerator is set for 'clip-image-onnx' on device 0
2023-02-09 02:48:27.840807714 [I:onnxruntime:, inference_session.cc:263 operator()] Flush-to-zero and denormal-as-zero are off
2023-02-09 02:48:27.840836956 [I:onnxruntime:, inference_session.cc:271 ConstructorCommon] Creating and using per session threadpools since use_per_session_threads_ is true
2023-02-09 02:48:27.840844398 [I:onnxruntime:, inference_session.cc:292 ConstructorCommon] Dynamic block base set to 0
2023-02-09 02:48:27.910012148 [I:onnxruntime:, inference_session.cc:1222 Initialize] Initializing session.
2023-02-09 02:48:27.910035307 [I:onnxruntime:, inference_session.cc:1259 Initialize] Adding default CPU execution provider.
2023-02-09 02:48:27.910047050 [I:onnxruntime:log, bfc_arena.cc:26 BFCArena] Creating BFCArena for Cuda with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-02-09 02:48:27.910053411 [V:onnxruntime:log, bfc_arena.cc:62 BFCArena] Creating 21 bins of max chunk size 256 to 268435456
2023-02-09 02:48:27.910063800 [I:onnxruntime:log, bfc_arena.cc:26 BFCArena] Creating BFCArena for CudaPinned with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-02-09 02:48:27.910070835 [V:onnxruntime:log, bfc_arena.cc:62 BFCArena] Creating 21 bins of max chunk size 256 to 268435456
2023-02-09 02:48:27.910079844 [I:onnxruntime:log, bfc_arena.cc:26 BFCArena] Creating BFCArena for CUDA_CPU with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-02-09 02:48:27.910086551 [V:onnxruntime:log, bfc_arena.cc:62 BFCArena] Creating 21 bins of max chunk size 256 to 268435456
2023-02-09 02:48:27.910095675 [I:onnxruntime:log, bfc_arena.cc:26 BFCArena] Creating BFCArena for Cpu with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-02-09 02:48:27.910102877 [V:onnxruntime:log, bfc_arena.cc:62 BFCArena] Creating 21 bins of max chunk size 256 to 268435456
2023-02-09 02:48:27.945209583 [E:onnxruntime:, inference_session.cc:1499 operator()] Exception during initialization: /workspace/onnxruntime/onnxruntime/core/optimizer/optimizer_execution_frame.cc:78 onnxruntime::OptimizerExecutionFrame::Info::Info(const std::vector<const onnxruntime::Node*>&, const InitializedTensorSet&, const onnxruntime::Path&, const onnxruntime::IExecutionProvider&, const std::function<bool(const std::__cxx11::basic_string&)>&) [ONNXRuntimeError] : 1 : FAIL : tensorprotoutils.cc:622 GetExtDataFromTensorProto External initializer: visual.transformer.resblocks.31.mlp.c_proj.weight offset: 1251033600 size to read: 13107200 given file_length: 391708810 are out of bounds or can not be read in full.

I0209 02:48:27.951284 1 onnxruntime.cc:2586] TRITONBACKEND_ModelFinalize: delete model state
I0209 02:48:27.951291 1 python_be.cc:1858] TRITONBACKEND_ModelInstanceInitialize: clip-image_0 (CPU device 0)
I0209 02:48:27.951331 1 backend_model_instance.cc:68] Creating instance clip-image_0 on CPU using artifact 'model.py'
E0209 02:48:27.951333 1 model_lifecycle.cc:597] failed to load 'clip-image-onnx' version 1: Internal: onnx runtime error 6: Exception during initialization: /workspace/onnxruntime/onnxruntime/core/optimizer/optimizer_execution_frame.cc:78 onnxruntime::OptimizerExecutionFrame::Info::Info(const std::vector<const onnxruntime::Node*>&, const InitializedTensorSet&, const onnxruntime::Path&, const onnxruntime::IExecutionProvider&, const std::function<bool(const std::__cxx11::basic_string&)>&) [ONNXRuntimeError] : 1 : FAIL : tensorprotoutils.cc:622 GetExtDataFromTensorProto External initializer: visual.transformer.resblocks.31.mlp.c_proj.weight offset: 1251033600 size to read: 13107200 given file_length: 391708810 are out of bounds or can not be read in full.

I0209 02:48:27.964180 1 stub_launcher.cc:251] Starting Python backend stub: source /tmp/python_env_1zugRg/0/bin/activate && exec env LD_LIBRARY_PATH=/tmp/python_env_1zugRg/0/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /mnt/models/clip-image/1/model.py triton_python_backend_shm_region_2 67108864 67108864 1 /opt/tritonserver/backends/python 336 clip-image_0
I0209 02:48:29.337049 1 python_be.cc:1879] TRITONBACKEND_ModelInstanceInitialize: instance initialization successful clip-image_0 (device 0)
I0209 02:48:29.337230 1 backend_model_instance.cc:766] Starting backend thread for clip-image_0 at nice 0 on device 0...
I0209 02:48:29.337460 1 model_lifecycle.cc:694] successfully loaded 'clip-image' version 1
I0209 02:48:29.337467 1 dynamic_batch_scheduler.cc:284] Starting dynamic-batcher thread for clip-image at nice 0...
I0209 02:48:29.337577 1 server.cc:563]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0209 02:48:29.337638 1 server.cc:590]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0209 02:48:29.337714 1 server.cc:633]
+-----------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+-----------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| clip-image | 1 | READY |
| clip-image-onnx | 1 | UNAVAILABLE: Internal: onnx runtime error 6: Exception during initialization: /workspace/onnxruntime/onnxruntime/core/optimizer/optimizer_execution_frame.cc:78 onnxruntime::OptimizerExecutionFrame::Info::Info(const std::vector<const onnxruntime::Node*>&, const InitializedTensorSet&, const onnxruntime::Path&, const onnxruntime::IExecutionProvider&, const std::function<bool(const std::__cxx11::basic_string&)>&) [ONNXRuntimeError] : 1 : FAIL : tensorprotoutils |
| | | .cc:622 GetExtDataFromTensorProto External initializer: visual.transformer.resblocks.31.mlp.c_proj.weight offset: 1251033600 size to read: 13107200 given file_length: 391708810 are out of bounds or can not be read in full. |
+-----------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0209 02:48:29.366466 1 metrics.cc:864] Collecting metrics for GPU 0: Tesla T4
I0209 02:48:29.366729 1 metrics.cc:757] Collecting CPU metrics
I0209 02:48:29.366897 1 tritonserver.cc:2264]
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.30.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace logging |
| model_repository_path[0] | /mnt/models |
| model_control_mode | MODE_EXPLICIT |
| startup_models_0 | clip-image |
| startup_models_1 | clip-image-onnx |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0209 02:48:29.366927 1 server.cc:264] Waiting for in-flight requests to complete.
I0209 02:48:29.366935 1 server.cc:280] Timeout 30: Found 0 model versions that have in-flight inferences
I0209 02:48:29.367004 1 server.cc:295] All models are stopped, unloading models
I0209 02:48:29.367016 1 server.cc:302] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0209 02:48:29.367023 1 server.cc:309] clip-image v1: UNLOADING
I0209 02:48:29.367056 1 backend_model_instance.cc:789] Stopping backend thread for clip-image_0...
I0209 02:48:29.367107 1 python_be.cc:1998] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0209 02:48:30.367141 1 server.cc:302] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
I0209 02:48:30.367170 1 server.cc:309] clip-image v1: UNLOADING
I0209 02:48:30.629799 1 python_be.cc:1837] TRITONBACKEND_ModelFinalize: delete model state
I0209 02:48:30.629881 1 dynamic_batch_scheduler.cc:430] Stopping dynamic-batcher thread for clip-image...
I0209 02:48:30.629957 1 model_lifecycle.cc:579] successfully unloaded 'clip-image' version 1
I0209 02:48:31.367487 1 server.cc:302] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
I0209 02:48:31.367530 1 backend_manager.cc:137] unloading backend 'onnxruntime'
I0209 02:48:31.368416 1 backend_manager.cc:137] unloading backend 'python'
I0209 02:48:31.368438 1 python_be.cc:1794] TRITONBACKEND_Finalize: Start
I0209 02:48:32.129311 1 python_be.cc:1799] TRITONBACKEND_Finalize: End
error: creating server: Invalid argument - load failed for model 'clip-image-onnx': version 1 is at UNAVAILABLE state: Internal: onnx runtime error 6: Exception during initialization: /workspace/onnxruntime/onnxruntime/core/optimizer/optimizer_execution_frame.cc:78 onnxruntime::OptimizerExecutionFrame::Info::Info(const std::vector<const onnxruntime::Node*>&, const InitializedTensorSet&, const onnxruntime::Path&, const onnxruntime::IExecutionProvider&, const std::function<bool(const std::__cxx11::basic_string&)>&) [ONNXRuntimeError] : 1 : FAIL : tensorprotoutils.cc:622 GetExtDataFromTensorProto External initializer: visual.transformer.resblocks.31.mlp.c_proj.weight offset: 1251033600 size to read: 13107200 given file_length: 391708810 are out of bounds or can not be read in full.
```


yangapku commented Feb 9, 2023

Hi, from the log our initial suspicion is that the ONNX file was not read in full (i.e. the file is incomplete). Could you re-prepare it following the documentation (making sure the ONNX file's path does not change during the experiment), and see whether the problem still occurs?
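A quick way to validate a re-prepared file before handing it back to Triton, sketched under the assumption that the model sits at the illustrative path below: creating a plain onnxruntime InferenceSession exercises the same initialization path that failed in the log, so a still-truncated external-data file reproduces the GetExtDataFromTensorProto error locally.

```python
import onnxruntime as ort

# Hypothetical path -- the freshly re-exported model.
model_path = "clip-image-onnx/1/model.onnx"

# Session creation runs the same initialization that Triton's
# onnxruntime backend performs, including reading external weights.
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
print("session created OK; inputs:", [i.name for i in session.get_inputs()])
```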

xiaoFine commented Feb 9, 2023

> Hi, from the log our initial suspicion is that the ONNX file was not read in full (i.e. the file is incomplete). Could you re-prepare it following the documentation (making sure the ONNX file's path does not change during the experiment), and see whether the problem still occurs?

I re-exported the model and verified it with an md5 checksum, and the Triton server now starts up normally. Thanks!
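For anyone landing here later, a minimal sketch of that checksum step, with illustrative paths (the deployed location below matches the model_repository_path /mnt/models from the log); matching digests for the export output and the deployed copy confirm nothing was lost in transfer:

```python
import hashlib

def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through MD5 so multi-GB models need not fit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical paths -- the export output and the deployed copy.
print(md5sum("export/model.onnx"))
print(md5sum("/mnt/models/clip-image-onnx/1/model.onnx"))
```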
