Segmentation fault (core dumped) #1564

Open
LIUKAI0815 opened this issue May 9, 2024 · 2 comments
Labels: need more info, triaged

Comments

@LIUKAI0815

[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024050700
[TensorRT-LLM][INFO] Engine version 0.10.0.dev2024050700 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found
[TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set.
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] MPI size: 1, rank: 0
[TensorRT-LLM][WARNING] Device 0 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 7 is not available.
[TensorRT-LLM][INFO] Loaded engine size: 14495 MiB
[TensorRT-LLM][ERROR] 1: [defaultAllocator.cpp::allocate::19] Error Code 1: Cuda Runtime (out of memory)
[TensorRT-LLM][WARNING] Requested amount of GPU memory (13908726432 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT-LLM][ERROR] 2: [safeDeserialize.cpp::load::269] Error Code 2: OutOfMemory (no further information)
[01d177e8bded:22501] *** Process received signal ***
[01d177e8bded:22501] Signal: Segmentation fault (11)
[01d177e8bded:22501] Signal code: Address not mapped (1)
[01d177e8bded:22501] Failing at address: 0x8
[01d177e8bded:22501] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f02d1a61420]
[01d177e8bded:22501] [ 1] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7runtime11TllmRuntimeC2EPKvmRN8nvinfer17ILoggerE+0x19d)[0x7f01372db78d]
[01d177e8bded:22501] [ 2] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7runtime10GptSessionC2ERKNS1_6ConfigERKNS0_11ModelConfigERKNS0_11WorldConfigEPKvmSt10shared_ptrIN8nvinfer17ILoggerEE+0x395)[0x7f0137288d25]
[01d177e8bded:22501] [ 3] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xc5459)[0x7f01ab0f0459]
[01d177e8bded:22501] [ 4] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x71b99)[0x7f01ab09cb99]
[01d177e8bded:22501] [ 5] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x54a6c)[0x7f01ab07fa6c]
[01d177e8bded:22501] [ 6] python3[0x4fc697]
[01d177e8bded:22501] [ 7] python3(_PyObject_MakeTpCall+0x25b)[0x4f614b]
[01d177e8bded:22501] [ 8] python3[0x50819f]
[01d177e8bded:22501] [ 9] python3(PyVectorcall_Call+0xb9)[0x508bb9]
[01d177e8bded:22501] [10] python3[0x50607f]
[01d177e8bded:22501] [11] python3[0x4f64b6]
[01d177e8bded:22501] [12] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x540d9)[0x7f01ab07f0d9]
[01d177e8bded:22501] [13] python3(_PyObject_MakeTpCall+0x25b)[0x4f614b]
[01d177e8bded:22501] [14] python3(_PyEval_EvalFrameDefault+0x5757)[0x4f26f7]
[01d177e8bded:22501] [15] python3[0x507eae]
[01d177e8bded:22501] [16] python3(PyObject_Call+0xb8)[0x508858]
[01d177e8bded:22501] [17] python3(_PyEval_EvalFrameDefault+0x2b79)[0x4efb19]
[01d177e8bded:22501] [18] python3[0x591d92]
[01d177e8bded:22501] [19] python3(PyEval_EvalCode+0x87)[0x591cd7]
[01d177e8bded:22501] [20] python3[0x5c2967]
[01d177e8bded:22501] [21] python3[0x5bdad0]
[01d177e8bded:22501] [22] python3[0x45956b]
[01d177e8bded:22501] [23] python3(_PyRun_SimpleFileObject+0x19f)[0x5b805f]
[01d177e8bded:22501] [24] python3(_PyRun_AnyFileObject+0x43)[0x5b7dc3]
[01d177e8bded:22501] [25] python3(Py_RunMain+0x38d)[0x5b4b7d]
[01d177e8bded:22501] [26] python3(Py_BytesMain+0x39)[0x584e49]
[01d177e8bded:22501] [27] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f02d1725083]
[01d177e8bded:22501] [28] python3[0x584cfe]
[01d177e8bded:22501] *** End of error message ***
Segmentation fault (core dumped)

@byshiue
Collaborator

byshiue commented May 10, 2024

Please follow the issue template and share your environment and the steps to reproduce.

@byshiue byshiue self-assigned this May 10, 2024
@byshiue byshiue added the triaged Issue has been triaged by maintainers label May 10, 2024
@Tushar-ml
Contributor

The process is running out of memory ([TensorRT-LLM][ERROR] 2: [safeDeserialize.cpp::load::269] Error Code 2: OutOfMemory (no further information)), and the segmentation fault follows from that failed engine load. Could you please share your system or VM configuration?
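
To make the out-of-memory reading of the log concrete, here is a minimal sketch that pulls the memory figures out of the log lines quoted in this issue. The helper name `summarize_oom` is hypothetical and not part of TensorRT-LLM; it only parses text and does not query the GPU.

```python
import re

GIB = 1024 ** 3  # bytes per GiB


def summarize_oom(log_text: str) -> dict:
    """Extract memory figures from a TensorRT-LLM log (hypothetical helper)."""
    # Engine size is logged in MiB, the failed allocation in bytes.
    engine = re.search(r"Loaded engine size: (\d+) MiB", log_text)
    requested = re.search(
        r"Requested amount of GPU memory \((\d+) bytes\)", log_text
    )
    return {
        "engine_gib": int(engine.group(1)) / 1024 if engine else None,
        "requested_gib": int(requested.group(1)) / GIB if requested else None,
        "oom_reported": "OutOfMemory" in log_text or "out of memory" in log_text,
        "segfault": "Segmentation fault" in log_text,
    }


# Lines copied from the log in this issue.
sample = """\
[TensorRT-LLM][INFO] Loaded engine size: 14495 MiB
[TensorRT-LLM][ERROR] 1: [defaultAllocator.cpp::allocate::19] Error Code 1: Cuda Runtime (out of memory)
[TensorRT-LLM][WARNING] Requested amount of GPU memory (13908726432 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT-LLM][ERROR] 2: [safeDeserialize.cpp::load::269] Error Code 2: OutOfMemory (no further information)
Segmentation fault (core dumped)
"""

summary = summarize_oom(sample)
print(summary)
```

The requested allocation works out to roughly 12.95 GiB for an engine of about 14.16 GiB, so comparing those numbers with the free memory that `nvidia-smi` reports on the target GPU is a reasonable first check.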
