
triton server onnxbackend deployment error #50

Closed

xiaoFine opened this issue Feb 9, 2023 · 3 comments

xiaoFine commented Feb 9, 2023

tritonserver image version: nvcr.io/nvidia/tritonserver:22.05-py3
model: ViT-H-14
Startup error:

```
error: creating server: Invalid argument - load failed for model 'clip-image-onnx': version 1 is at UNAVAILABLE state: Internal: onnx runtime error 6: Exception during initialization: /workspace/onnxruntime/onnxruntime/core/optimizer/optimizer_execution_frame.cc:78 onnxruntime::OptimizerExecutionFrame::Info::Info(const std::vector<const onnxruntime::Node*>&, const InitializedTensorSet&, const onnxruntime::Path&, const onnxruntime::IExecutionProvider&, const std::function<bool(const std::__cxx11::basic_string<char>&)>&) [ONNXRuntimeError] : 1 : FAIL : tensorprotoutils.cc:622 GetExtDataFromTensorProto External initializer: visual.transformer.resblocks.31.mlp.c_proj.weight offset: 1251033600 size to read: 13107200 given file_length: 391708810 are out of bounds or can not be read in full.
```
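For context: the error says ONNX Runtime tried to read the initializer visual.transformer.resblocks.31.mlp.c_proj.weight from the model's external-data file at byte offset 1251033600, but the file on disk is only 391708810 bytes, so the weights file next to model.onnx is truncated or is the wrong file. A minimal sketch for checking this locally before loading the model in Triton (the path is illustrative, and it assumes the export stored the weights in an external-data file referenced by model.onnx):

```python
import os
import onnx

# Hypothetical path -- adjust to your Triton model repository layout.
model_path = "clip-image-onnx/1/model.onnx"
model_dir = os.path.dirname(model_path)

# Load only the graph structure; leave the large external weights on disk.
model = onnx.load(model_path, load_external_data=False)

for tensor in model.graph.initializer:
    # Externally stored tensors carry (location, offset, length) entries.
    if tensor.data_location != onnx.TensorProto.EXTERNAL:
        continue
    info = {entry.key: entry.value for entry in tensor.external_data}
    data_file = os.path.join(model_dir, info["location"])
    offset = int(info.get("offset", 0))
    length = int(info.get("length", 0))
    file_size = os.path.getsize(data_file)
    if offset + length > file_size:
        print(f"{tensor.name}: needs bytes [{offset}, {offset + length}) "
              f"but {data_file} has only {file_size} bytes -- truncated?")
```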


xiaoFine commented Feb 9, 2023

Full log:

```

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.01 (build 52277748)
Triton Server Version 2.30.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 11.7 driver version 525.85.11 with kernel driver version 515.86.01.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

Warning: '--strict-model-config' has been deprecated! Please use '--disable-auto-complete-config' instead.
I0209 02:48:01.613148 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f9e60000000' with size 268435456
I0209 02:48:01.614967 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0209 02:48:01.620605 1 model_config_utils.cc:646] Server side auto-completed config: name: "clip-image"
max_batch_size: 2
input {
name: "INPUT"
data_type: TYPE_STRING
dims: 1
reshape {
}
}
output {
name: "FEATURE"
data_type: TYPE_FP64
dims: -1
dims: 1024
}
instance_group {
count: 1
kind: KIND_CPU
}
default_model_filename: "model.py"
dynamic_batching {
}
parameters {
key: "EXECUTION_ENV_PATH"
value {
string_value: "$$TRITON_MODEL_DIRECTORY/env.tar.gz"
}
}
backend: "python"

I0209 02:48:01.620967 1 model_config_utils.cc:646] Server side auto-completed config: name: "clip-image-onnx"
platform: "onnxruntime_onnx"
default_model_filename: "model.onnx"
backend: "onnxruntime"

I0209 02:48:01.621045 1 model_lifecycle.cc:459] loading: clip-image:1
I0209 02:48:01.621085 1 model_lifecycle.cc:459] loading: clip-image-onnx:1
I0209 02:48:01.621181 1 backend_model.cc:348] Adding default backend config setting: default-max-batch-size,4
I0209 02:48:01.621208 1 backend_model.cc:348] Adding default backend config setting: default-max-batch-size,4
I0209 02:48:01.621214 1 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/python/libtriton_python.so
I0209 02:48:01.622734 1 python_be.cc:1614] 'python' TRITONBACKEND API version: 1.11
I0209 02:48:01.622754 1 python_be.cc:1636] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0209 02:48:01.622772 1 python_be.cc:1766] Shared memory configuration is shm-default-byte-size=67108864,shm-growth-byte-size=67108864,stub-timeout-seconds=30
I0209 02:48:01.622887 1 python_be.cc:2012] TRITONBACKEND_GetBackendAttribute: setting attributes
I0209 02:48:01.622927 1 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so
I0209 02:48:01.624288 1 onnxruntime.cc:2459] TRITONBACKEND_Initialize: onnxruntime
I0209 02:48:01.624311 1 onnxruntime.cc:2469] Triton TRITONBACKEND API version: 1.11
I0209 02:48:01.624317 1 onnxruntime.cc:2475] 'onnxruntime' TRITONBACKEND API version: 1.11
I0209 02:48:01.624322 1 onnxruntime.cc:2505] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0209 02:48:01.636771 1 python_be.cc:1814] TRITONBACKEND_ModelInitialize: clip-image (version 1)
I0209 02:48:01.637211 1 model_config_utils.cc:1838] ModelConfig 64-bit fields:
I0209 02:48:01.637226 1 model_config_utils.cc:1840] ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I0209 02:48:01.637230 1 model_config_utils.cc:1840] ModelConfig::dynamic_batching::max_queue_delay_microseconds
I0209 02:48:01.637234 1 model_config_utils.cc:1840] ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I0209 02:48:01.637238 1 model_config_utils.cc:1840] ModelConfig::ensemble_scheduling::step::model_version
I0209 02:48:01.637242 1 model_config_utils.cc:1840] ModelConfig::input::dims
I0209 02:48:01.637246 1 model_config_utils.cc:1840] ModelConfig::input::reshape::shape
I0209 02:48:01.637250 1 model_config_utils.cc:1840] ModelConfig::instance_group::secondary_devices::device_id
I0209 02:48:01.637254 1 model_config_utils.cc:1840] ModelConfig::model_warmup::inputs::value::dims
I0209 02:48:01.637258 1 model_config_utils.cc:1840] ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I0209 02:48:01.637263 1 model_config_utils.cc:1840] ModelConfig::optimization::cuda::graph_spec::input::value::dim
I0209 02:48:01.637266 1 model_config_utils.cc:1840] ModelConfig::output::dims
I0209 02:48:01.637271 1 model_config_utils.cc:1840] ModelConfig::output::reshape::shape
I0209 02:48:01.637275 1 model_config_utils.cc:1840] ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I0209 02:48:01.637279 1 model_config_utils.cc:1840] ModelConfig::sequence_batching::max_sequence_idle_microseconds
I0209 02:48:01.637283 1 model_config_utils.cc:1840] ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I0209 02:48:01.637287 1 model_config_utils.cc:1840] ModelConfig::sequence_batching::state::dims
I0209 02:48:01.637291 1 model_config_utils.cc:1840] ModelConfig::sequence_batching::state::initial_state::dims
I0209 02:48:01.637295 1 model_config_utils.cc:1840] ModelConfig::version_policy::specific::versions
I0209 02:48:01.637387 1 python_be.cc:1505] Using Python execution env /mnt/models/clip-image/env.tar.gz
I0209 02:48:24.295399 1 stub_launcher.cc:251] Starting Python backend stub: source /tmp/python_env_1zugRg/0/bin/activate && exec env LD_LIBRARY_PATH=/tmp/python_env_1zugRg/0/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /mnt/models/clip-image/1/model.py triton_python_backend_shm_region_1 67108864 67108864 1 /opt/tritonserver/backends/python 336 clip-image
I0209 02:48:27.701705 1 python_be.cc:1594] model configuration:
{
"name": "clip-image",
"platform": "",
"backend": "python",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 2,
"input": [
{
"name": "INPUT",
"data_type": "TYPE_STRING",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
}
],
"output": [
{
"name": "FEATURE",
"data_type": "TYPE_FP64",
"dims": [
-1,
1024
],
"label_filename": "",
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"dynamic_batching": {
"preferred_batch_size": [
2
],
"max_queue_delay_microseconds": 0,
"preserve_ordering": false,
"priority_levels": 0,
"default_priority_level": 0,
"priority_queue_policy": {}
},
"instance_group": [
{
"name": "clip-image_0",
"kind": "KIND_CPU",
"count": 1,
"gpus": [],
"secondary_devices": [],
"profile": [],
"passive": false,
"host_policy": ""
}
],
"default_model_filename": "model.py",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {
"EXECUTION_ENV_PATH": {
"string_value": "$$TRITON_MODEL_DIRECTORY/env.tar.gz"
}
},
"model_warmup": []
}
I0209 02:48:27.701805 1 onnxruntime.cc:2563] TRITONBACKEND_ModelInitialize: clip-image-onnx (version 1)
I0209 02:48:27.840702 1 onnxruntime.cc:553] CUDA Execution Accelerator is set for 'clip-image-onnx' on device 0
2023-02-09 02:48:27.840807714 [I:onnxruntime:, inference_session.cc:263 operator()] Flush-to-zero and denormal-as-zero are off
2023-02-09 02:48:27.840836956 [I:onnxruntime:, inference_session.cc:271 ConstructorCommon] Creating and using per session threadpools since use_per_session_threads_ is true
2023-02-09 02:48:27.840844398 [I:onnxruntime:, inference_session.cc:292 ConstructorCommon] Dynamic block base set to 0
2023-02-09 02:48:27.910012148 [I:onnxruntime:, inference_session.cc:1222 Initialize] Initializing session.
2023-02-09 02:48:27.910035307 [I:onnxruntime:, inference_session.cc:1259 Initialize] Adding default CPU execution provider.
2023-02-09 02:48:27.910047050 [I:onnxruntime:log, bfc_arena.cc:26 BFCArena] Creating BFCArena for Cuda with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-02-09 02:48:27.910053411 [V:onnxruntime:log, bfc_arena.cc:62 BFCArena] Creating 21 bins of max chunk size 256 to 268435456
2023-02-09 02:48:27.910063800 [I:onnxruntime:log, bfc_arena.cc:26 BFCArena] Creating BFCArena for CudaPinned with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-02-09 02:48:27.910070835 [V:onnxruntime:log, bfc_arena.cc:62 BFCArena] Creating 21 bins of max chunk size 256 to 268435456
2023-02-09 02:48:27.910079844 [I:onnxruntime:log, bfc_arena.cc:26 BFCArena] Creating BFCArena for CUDA_CPU with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-02-09 02:48:27.910086551 [V:onnxruntime:log, bfc_arena.cc:62 BFCArena] Creating 21 bins of max chunk size 256 to 268435456
2023-02-09 02:48:27.910095675 [I:onnxruntime:log, bfc_arena.cc:26 BFCArena] Creating BFCArena for Cpu with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0
2023-02-09 02:48:27.910102877 [V:onnxruntime:log, bfc_arena.cc:62 BFCArena] Creating 21 bins of max chunk size 256 to 268435456
2023-02-09 02:48:27.945209583 [E:onnxruntime:, inference_session.cc:1499 operator()] Exception during initialization: /workspace/onnxruntime/onnxruntime/core/optimizer/optimizer_execution_frame.cc:78 onnxruntime::OptimizerExecutionFrame::Info::Info(const std::vector<const onnxruntime::Node*>&, const InitializedTensorSet&, const onnxruntime::Path&, const onnxruntime::IExecutionProvider&, const std::function<bool(const std::__cxx11::basic_string&)>&) [ONNXRuntimeError] : 1 : FAIL : tensorprotoutils.cc:622 GetExtDataFromTensorProto External initializer: visual.transformer.resblocks.31.mlp.c_proj.weight offset: 1251033600 size to read: 13107200 given file_length: 391708810 are out of bounds or can not be read in full.

I0209 02:48:27.951284 1 onnxruntime.cc:2586] TRITONBACKEND_ModelFinalize: delete model state
I0209 02:48:27.951291 1 python_be.cc:1858] TRITONBACKEND_ModelInstanceInitialize: clip-image_0 (CPU device 0)
I0209 02:48:27.951331 1 backend_model_instance.cc:68] Creating instance clip-image_0 on CPU using artifact 'model.py'
E0209 02:48:27.951333 1 model_lifecycle.cc:597] failed to load 'clip-image-onnx' version 1: Internal: onnx runtime error 6: Exception during initialization: /workspace/onnxruntime/onnxruntime/core/optimizer/optimizer_execution_frame.cc:78 onnxruntime::OptimizerExecutionFrame::Info::Info(const std::vector<const onnxruntime::Node*>&, const InitializedTensorSet&, const onnxruntime::Path&, const onnxruntime::IExecutionProvider&, const std::function<bool(const std::__cxx11::basic_string&)>&) [ONNXRuntimeError] : 1 : FAIL : tensorprotoutils.cc:622 GetExtDataFromTensorProto External initializer: visual.transformer.resblocks.31.mlp.c_proj.weight offset: 1251033600 size to read: 13107200 given file_length: 391708810 are out of bounds or can not be read in full.

I0209 02:48:27.964180 1 stub_launcher.cc:251] Starting Python backend stub: source /tmp/python_env_1zugRg/0/bin/activate && exec env LD_LIBRARY_PATH=/tmp/python_env_1zugRg/0/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /mnt/models/clip-image/1/model.py triton_python_backend_shm_region_2 67108864 67108864 1 /opt/tritonserver/backends/python 336 clip-image_0
I0209 02:48:29.337049 1 python_be.cc:1879] TRITONBACKEND_ModelInstanceInitialize: instance initialization successful clip-image_0 (device 0)
I0209 02:48:29.337230 1 backend_model_instance.cc:766] Starting backend thread for clip-image_0 at nice 0 on device 0...
I0209 02:48:29.337460 1 model_lifecycle.cc:694] successfully loaded 'clip-image' version 1
I0209 02:48:29.337467 1 dynamic_batch_scheduler.cc:284] Starting dynamic-batcher thread for clip-image at nice 0...
I0209 02:48:29.337577 1 server.cc:563]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0209 02:48:29.337638 1 server.cc:590]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0209 02:48:29.337714 1 server.cc:633]
+-----------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+-----------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| clip-image | 1 | READY |
| clip-image-onnx | 1 | UNAVAILABLE: Internal: onnx runtime error 6: Exception during initialization: /workspace/onnxruntime/onnxruntime/core/optimizer/optimizer_execution_frame.cc:78 onnxruntime::OptimizerExecutionFrame::Info::Info(const std::vector<const onnxruntime::Node*>&, const InitializedTensorSet&, const onnxruntime::Path&, const onnxruntime::IExecutionProvider&, const std::function<bool(const std::__cxx11::basic_string&)>&) [ONNXRuntimeError] : 1 : FAIL : tensorprotoutils |
| | | .cc:622 GetExtDataFromTensorProto External initializer: visual.transformer.resblocks.31.mlp.c_proj.weight offset: 1251033600 size to read: 13107200 given file_length: 391708810 are out of bounds or can not be read in full. |
+-----------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0209 02:48:29.366466 1 metrics.cc:864] Collecting metrics for GPU 0: Tesla T4
I0209 02:48:29.366729 1 metrics.cc:757] Collecting CPU metrics
I0209 02:48:29.366897 1 tritonserver.cc:2264]
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.30.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace logging |
| model_repository_path[0] | /mnt/models |
| model_control_mode | MODE_EXPLICIT |
| startup_models_0 | clip-image |
| startup_models_1 | clip-image-onnx |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0209 02:48:29.366927 1 server.cc:264] Waiting for in-flight requests to complete.
I0209 02:48:29.366935 1 server.cc:280] Timeout 30: Found 0 model versions that have in-flight inferences
I0209 02:48:29.367004 1 server.cc:295] All models are stopped, unloading models
I0209 02:48:29.367016 1 server.cc:302] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0209 02:48:29.367023 1 server.cc:309] clip-image v1: UNLOADING
I0209 02:48:29.367056 1 backend_model_instance.cc:789] Stopping backend thread for clip-image_0...
I0209 02:48:29.367107 1 python_be.cc:1998] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0209 02:48:30.367141 1 server.cc:302] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
I0209 02:48:30.367170 1 server.cc:309] clip-image v1: UNLOADING
I0209 02:48:30.629799 1 python_be.cc:1837] TRITONBACKEND_ModelFinalize: delete model state
I0209 02:48:30.629881 1 dynamic_batch_scheduler.cc:430] Stopping dynamic-batcher thread for clip-image...
I0209 02:48:30.629957 1 model_lifecycle.cc:579] successfully unloaded 'clip-image' version 1
I0209 02:48:31.367487 1 server.cc:302] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
I0209 02:48:31.367530 1 backend_manager.cc:137] unloading backend 'onnxruntime'
I0209 02:48:31.368416 1 backend_manager.cc:137] unloading backend 'python'
I0209 02:48:31.368438 1 python_be.cc:1794] TRITONBACKEND_Finalize: Start
I0209 02:48:32.129311 1 python_be.cc:1799] TRITONBACKEND_Finalize: End
error: creating server: Invalid argument - load failed for model 'clip-image-onnx': version 1 is at UNAVAILABLE state: Internal: onnx runtime error 6: Exception during initialization: /workspace/onnxruntime/onnxruntime/core/optimizer/optimizer_execution_frame.cc:78 onnxruntime::OptimizerExecutionFrame::Info::Info(const std::vector<const onnxruntime::Node*>&, const InitializedTensorSet&, const onnxruntime::Path&, const onnxruntime::IExecutionProvider&, const std::function<bool(const std::__cxx11::basic_string&)>&) [ONNXRuntimeError] : 1 : FAIL : tensorprotoutils.cc:622 GetExtDataFromTensorProto External initializer: visual.transformer.resblocks.31.mlp.c_proj.weight offset: 1251033600 size to read: 13107200 given file_length: 391708810 are out of bounds or can not be read in full.
```


yangapku commented Feb 9, 2023

Hi, from the log our initial suspicion is that the ONNX file was not read in full (i.e. the file is incomplete). Could you re-prepare it following the documentation (making sure the ONNX file's path does not change during the experiment), and see whether the problem still occurs?
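A quick way to validate a re-prepared file before handing it back to Triton, sketched under the assumption that the model sits at the illustrative path below: creating a plain onnxruntime InferenceSession exercises the same initialization path that failed in the log, so a still-truncated external-data file reproduces the GetExtDataFromTensorProto error locally.

```python
import onnxruntime as ort

# Hypothetical path -- the freshly re-exported model.
model_path = "clip-image-onnx/1/model.onnx"

# Session creation runs the same initialization that Triton's
# onnxruntime backend performs, including reading external weights.
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
print("session created OK; inputs:", [i.name for i in session.get_inputs()])
```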

xiaoFine commented Feb 9, 2023

> Hi, from the log our initial suspicion is that the ONNX file was not read in full (i.e. the file is incomplete). Could you re-prepare it following the documentation (making sure the ONNX file's path does not change during the experiment), and see whether the problem still occurs?

I re-exported the model and verified it with an md5 checksum, and the Triton server now starts up normally. Thanks!
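For anyone landing here later, a minimal sketch of that checksum step, with illustrative paths (the deployed location below matches the model_repository_path /mnt/models from the log); matching digests for the export output and the deployed copy confirm nothing was lost in transfer:

```python
import hashlib

def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through MD5 so multi-GB models need not fit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical paths -- the export output and the deployed copy.
print(md5sum("export/model.onnx"))
print(md5sum("/mnt/models/clip-image-onnx/1/model.onnx"))
```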
