Segmentation fault (core dumped) #1564

LIUKAI0815 · 2024-05-09T03:37:05Z

[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024050700
[TensorRT-LLM][INFO] Engine version 0.10.0.dev2024050700 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found
[TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set.
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] MPI size: 1, rank: 0
[TensorRT-LLM][WARNING] Device 0 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 7 is not available.
[TensorRT-LLM][INFO] Loaded engine size: 14495 MiB
[TensorRT-LLM][ERROR] 1: [defaultAllocator.cpp::allocate::19] Error Code 1: Cuda Runtime (out of memory)
[TensorRT-LLM][WARNING] Requested amount of GPU memory (13908726432 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT-LLM][ERROR] 2: [safeDeserialize.cpp::load::269] Error Code 2: OutOfMemory (no further information)
[01d177e8bded:22501] *** Process received signal ***
[01d177e8bded:22501] Signal: Segmentation fault (11)
[01d177e8bded:22501] Signal code: Address not mapped (1)
[01d177e8bded:22501] Failing at address: 0x8
[01d177e8bded:22501] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f02d1a61420]
[01d177e8bded:22501] [ 1] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7runtime11TllmRuntimeC2EPKvmRN8nvinfer17ILoggerE+0x19d)[0x7f01372db78d]
[01d177e8bded:22501] [ 2] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7runtime10GptSessionC2ERKNS1_6ConfigERKNS0_11ModelConfigERKNS0_11WorldConfigEPKvmSt10shared_ptrIN8nvinfer17ILoggerEE+0x395)[0x7f0137288d25]
[01d177e8bded:22501] [ 3] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xc5459)[0x7f01ab0f0459]
[01d177e8bded:22501] [ 4] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x71b99)[0x7f01ab09cb99]
[01d177e8bded:22501] [ 5] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x54a6c)[0x7f01ab07fa6c]
[01d177e8bded:22501] [ 6] python3[0x4fc697]
[01d177e8bded:22501] [ 7] python3(_PyObject_MakeTpCall+0x25b)[0x4f614b]
[01d177e8bded:22501] [ 8] python3[0x50819f]
[01d177e8bded:22501] [ 9] python3(PyVectorcall_Call+0xb9)[0x508bb9]
[01d177e8bded:22501] [10] python3[0x50607f]
[01d177e8bded:22501] [11] python3[0x4f64b6]
[01d177e8bded:22501] [12] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x540d9)[0x7f01ab07f0d9]
[01d177e8bded:22501] [13] python3(_PyObject_MakeTpCall+0x25b)[0x4f614b]
[01d177e8bded:22501] [14] python3(_PyEval_EvalFrameDefault+0x5757)[0x4f26f7]
[01d177e8bded:22501] [15] python3[0x507eae]
[01d177e8bded:22501] [16] python3(PyObject_Call+0xb8)[0x508858]
[01d177e8bded:22501] [17] python3(_PyEval_EvalFrameDefault+0x2b79)[0x4efb19]
[01d177e8bded:22501] [18] python3[0x591d92]
[01d177e8bded:22501] [19] python3(PyEval_EvalCode+0x87)[0x591cd7]
[01d177e8bded:22501] [20] python3[0x5c2967]
[01d177e8bded:22501] [21] python3[0x5bdad0]
[01d177e8bded:22501] [22] python3[0x45956b]
[01d177e8bded:22501] [23] python3(_PyRun_SimpleFileObject+0x19f)[0x5b805f]
[01d177e8bded:22501] [24] python3(_PyRun_AnyFileObject+0x43)[0x5b7dc3]
[01d177e8bded:22501] [25] python3(Py_RunMain+0x38d)[0x5b4b7d]
[01d177e8bded:22501] [26] python3(Py_BytesMain+0x39)[0x584e49]
[01d177e8bded:22501] [27] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f02d1725083]
[01d177e8bded:22501] [28] python3[0x584cfe]
[01d177e8bded:22501] *** End of error message ***
Segmentation fault (core dumped)

byshiue · 2024-05-10T08:44:16Z

Please follow the issue template and share your environment and the steps to reproduce the problem.
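For the environment part, a small script along these lines could gather the basics to paste into the issue. This is only a sketch: the `collect_environment` helper and the package names in its default list are assumptions for illustration, not part of the TensorRT-LLM issue template.

```python
import platform
import sys
from importlib import metadata


def collect_environment(packages=("tensorrt_llm", "tensorrt", "torch")):
    """Return a dict of interpreter, OS, and installed package versions.

    The default package names are assumptions; adjust them to match
    what is actually installed in your environment.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

GPU model, driver version, and the exact build and run commands still need to be added by hand.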

Tushar-ml · 2024-05-21T16:50:13Z

The process is running out of GPU memory ([TensorRT-LLM][ERROR] 2: [safeDeserialize.cpp::load::269] Error Code 2: OutOfMemory (no further information)), and the segmentation fault follows that failed allocation in the log. Could you please share your system or VM configuration?
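To make the size of the failed allocation concrete, the byte count can be pulled out of the warning line in the log above and compared with the free memory on the GPU. The following is a minimal sketch; the helper names `requested_bytes` and `fits` are hypothetical, and the free-memory figure has to be supplied by you.

```python
import re

# Warning line copied verbatim from the log in this issue.
LOG_LINE = (
    "[TensorRT-LLM][WARNING] Requested amount of GPU memory "
    "(13908726432 bytes) could not be allocated. There may not be "
    "enough free memory for allocation to succeed."
)


def requested_bytes(line):
    """Return the byte count from a 'Requested amount of GPU memory' line, or None."""
    match = re.search(r"Requested amount of GPU memory \((\d+) bytes\)", line)
    return int(match.group(1)) if match else None


def fits(requested, free_bytes):
    """Return True if the free memory covers the requested allocation."""
    return free_bytes >= requested


requested = requested_bytes(LOG_LINE)
print(f"requested: {requested / 2**20:.0f} MiB ({requested / 2**30:.2f} GiB)")
```

The free-memory value to pass to `fits` can be read, for example, from `nvidia-smi --query-gpu=memory.free,memory.total --format=csv` just before launching the script. If another process already holds memory on device 0, this single allocation may not fit even on a GPU whose total memory exceeds it.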

byshiue added the need more info label May 10, 2024

byshiue self-assigned this May 10, 2024

byshiue added the triaged Issue has been triaged by maintainers label May 10, 2024
