This repository was archived by the owner on Jul 4, 2025. It is now read-only.

bug: TensorRT-LLM error #1315

@mafischer

Description

Cortex version

0.5.1-rc2

Describe the Bug

Running `cortex-beta run openhermes-2.5-7b-tensorrt-llm-linux-ada` fails with the logs below.

Steps to Reproduce

  1. `cortex-beta run openhermes-2.5-7b-tensorrt-llm-linux-ada`

Screenshots / Logs

20240923 19:58:25.229834 UTC 8237 DEBUG [LoadModel] Reset all resources and states before loading new model - tensorrt-llm_engine.cc:380
20240923 19:58:25.229878 UTC 8237 INFO Reset all resources and states - tensorrt-llm_engine.cc:616
20240923 19:58:25.229884 UTC 8237 DEBUG [LoadModel] n_parallel: 1, batch_size: 16 - tensorrt-llm_engine.cc:388
[TensorRT-LLM][INFO] Set logger level by INFO
20240923 19:58:25.276219 UTC 8237 INFO Successully loaded the tokenizer - tensorrt-llm_engine.h:105
20240923 19:58:25.276238 UTC 8237 INFO Loaded tokenizer from /home/ubuntu/cortexcpp-beta/models/openhermes-2.5-7b-tensorrt-llm-linux-ada/tokenizer.model - tensorrt-llm_engine.cc:414
[TensorRT-LLM][INFO] Engine version 0.11.0 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Parameter layer_types cannot be read from json:
[TensorRT-LLM][INFO] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][INFO] Parameter has_position_embedding cannot be read from json:
[TensorRT-LLM][INFO] [json.exception.out_of_range.403] key 'has_position_embedding' not found
[TensorRT-LLM][INFO] Parameter has_token_type_embedding cannot be read from json:
[TensorRT-LLM][INFO] [json.exception.out_of_range.403] key 'has_token_type_embedding' not found
[TensorRT-LLM][INFO] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][INFO] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
20240923 19:58:25.765816 UTC 8237 INFO Loaded config from /home/ubuntu/cortexcpp-beta/models/openhermes-2.5-7b-tensorrt-llm-linux-ada/config.json - tensorrt-llm_engine.cc:421
[TensorRT-LLM][INFO] Engine version 0.11.0 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Parameter layer_types cannot be read from json:
[TensorRT-LLM][INFO] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][INFO] Parameter has_position_embedding cannot be read from json:
[TensorRT-LLM][INFO] [json.exception.out_of_range.403] key 'has_position_embedding' not found
[TensorRT-LLM][INFO] Parameter has_token_type_embedding cannot be read from json:
[TensorRT-LLM][INFO] [json.exception.out_of_range.403] key 'has_token_type_embedding' not found
[TensorRT-LLM][INFO] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][INFO] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] Rank 0 is using GPU 0
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 16
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 16
[TensorRT-LLM][INFO] TRTGptModel maxBeamWidth: 1
[TensorRT-LLM][INFO] TRTGptModel maxSequenceLen: 32768
[TensorRT-LLM][INFO] TRTGptModel maxDraftLen: 0
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: 32768
[TensorRT-LLM][INFO] TRTGptModel computeContextLogits: 0
[TensorRT-LLM][INFO] TRTGptModel computeGenerationLogits: 0
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] TRTGptModel maxNumTokens: 8192
[TensorRT-LLM][INFO] TRTGptModel maxInputLen: 8192 = min(maxSequenceLen - 1, maxNumTokens) since context FMHA and usePackedInput are enabled
[TensorRT-LLM][INFO] Capacity Scheduler Policy: GUARANTEED_NO_EVICT
[TensorRT-LLM][INFO] Context Chunking Scheduler Policy: None
20240923 19:58:25.890898 UTC 8237 ERROR Failed to load model: [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported (/home/runner/actions-runner/_work/cortex.tensorrt-llm/cortex.tensorrt-llm/cpp/tensorrt_llm/runtime/bufferManager.cpp:211)
1 0x7f9ee02bfdb3 void tensorrt_llm::common::check(cudaError, char const*, char const*, int) + 147
2 0x7f9e19d56fe4 tensorrt_llm::runtime::BufferManager::initMemoryPool(int) + 148
3 0x7f9e19d58e5f tensorrt_llm::runtime::BufferManager::BufferManager(std::shared_ptr<tensorrt_llm::runtime::CudaStream>, bool) + 431
4 0x7f9e19e38593 tensorrt_llm::runtime::TllmRuntime::TllmRuntime(tensorrt_llm::runtime::RawEngine const&, nvinfer1::ILogger*, float, bool) + 451
5 0x7f9e1a086579 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::TrtGptModelInflightBatching(std::shared_ptr<nvinfer1::ILogger>, tensorrt_llm::runtime::ModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, tensorrt_llm::runtime::RawEngine const&, bool, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&) + 937
6 0x7f9e1a0a965b tensorrt_llm::executor::Executor::Impl::createModel(tensorrt_llm::runtime::RawEngine const&, tensorrt_llm::runtime::ModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, tensorrt_llm::executor::ExecutorConfig const&) + 443
7 0x7f9e1a0aee60 tensorrt_llm::executor::Executor::Impl::loadModel(std::optional<std::filesystem::__cxx11::path> const&, std::optional<std::vector<unsigned char, std::allocator<unsigned char> > > const&, tensorrt_llm::runtime::GptJsonConfig const&, tensorrt_llm::executor::ExecutorConfig const&, bool) + 1408
8 0x7f9e1a0aff6a tensorrt_llm::executor::Executor::Impl::Impl(std::filesystem::__cxx11::path const&, std::optional<std::filesystem::__cxx11::path> const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&) + 1978
9 0x7f9e1a0a4cee tensorrt_llm::executor::Executor::Executor(std::filesystem::__cxx11::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&) + 62
10 0x7f9ee02bdcef tensorrtllm::TensorrtllmEngine::LoadModel(std::shared_ptr<Json::Value>, std::function<void (Json::Value&&, Json::Value&&)>&&) + 3983
11 0x55b453c9b0a6 cortex-beta(+0x2560a6) [0x55b453c9b0a6]
12 0x55b453cabe5c cortex-beta(+0x266e5c) [0x55b453cabe5c]
13 0x55b453cabaef cortex-beta(+0x266aef) [0x55b453cabaef]
14 0x55b453cab858 cortex-beta(+0x266858) [0x55b453cab858]
15 0x55b45428f7d1 cortex-beta(+0x84a7d1) [0x55b45428f7d1]
16 0x55b4541fd9f2 cortex-beta(+0x7b89f2) [0x55b4541fd9f2]
17 0x55b45420b863 cortex-beta(+0x7c6863) [0x55b45420b863]
18 0x55b454209c30 cortex-beta(+0x7c4c30) [0x55b454209c30]
19 0x55b454207b1b cortex-beta(+0x7c2b1b) [0x55b454207b1b]
20 0x55b4541fd28a cortex-beta(+0x7b828a) [0x55b4541fd28a]
21 0x55b4541fcfd9 cortex-beta(+0x7b7fd9) [0x55b4541fcfd9]
22 0x55b4541fc619 cortex-beta(+0x7b7619) [0x55b4541fc619]
23 0x55b4541fbcae cortex-beta(+0x7b6cae) [0x55b4541fbcae]
24 0x55b45420c952 cortex-beta(+0x7c7952) [0x55b45420c952]
25 0x55b45420b024 cortex-beta(+0x7c6024) [0x55b45420b024]
26 0x55b45420911a cortex-beta(+0x7c411a) [0x55b45420911a]
27 0x55b4548cc2ef cortex-beta(+0xe872ef) [0x55b4548cc2ef]
28 0x55b4548be629 cortex-beta(+0xe79629) [0x55b4548be629]
29 0x55b4548bd12d cortex-beta(+0xe7812d) [0x55b4548bd12d]
30 0x55b4548c838c cortex-beta(+0xe8338c) [0x55b4548c838c]
31 0x55b4548c655a cortex-beta(+0xe8155a) [0x55b4548c655a]
32 0x55b4548c51ef cortex-beta(+0xe801ef) [0x55b4548c51ef]
33 0x55b453bbd11a cortex-beta(+0x17811a) [0x55b453bbd11a]
34 0x55b4548b6969 cortex-beta(+0xe71969) [0x55b4548b6969]
35 0x55b4548b6830 cortex-beta(+0xe71830) [0x55b4548b6830]
36 0x55b45489754f cortex-beta(+0xe5254f) [0x55b45489754f]
37 0x55b45489af28 cortex-beta(+0xe55f28) [0x55b45489af28]
38 0x55b45489a9eb cortex-beta(+0xe559eb) [0x55b45489a9eb]
39 0x55b45489baba cortex-beta(+0xe56aba) [0x55b45489baba]
40 0x55b45489ba7d cortex-beta(+0xe56a7d) [0x55b45489ba7d]
41 0x55b45489ba2a cortex-beta(+0xe56a2a) [0x55b45489ba2a]
42 0x55b45489b9fe cortex-beta(+0xe569fe) [0x55b45489b9fe]
43 0x55b45489b9e2 cortex-beta(+0xe569e2) [0x55b45489b9e2]
44 0x7f9ee50f5253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f9ee50f5253]
45 0x7f9ee4d7bac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f9ee4d7bac3]
46 0x7f9ee4e0d850 /lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7f9ee4e0d850] - tensorrt-llm_engine.cc:439
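
The `operation not supported` error from `cudaDeviceGetDefaultMemPool` typically means the device, driver, or platform does not expose CUDA stream-ordered memory pools, which `BufferManager::initMemoryPool` depends on. A minimal standalone check of that capability, assuming a standard CUDA runtime installation (this helper is illustrative and not part of cortex):

```cpp
// check_mempool.cu -- illustrative diagnostic, not part of cortex.
// Queries whether the device supports stream-ordered memory pools,
// the capability behind cudaDeviceGetDefaultMemPool.
#include <cstdio>
#include <cuda_runtime_api.h>

int main() {
  int device = 0;
  int poolsSupported = 0;
  // Where this attribute is 0 (e.g. some virtualized or older-driver
  // setups), cudaDeviceGetDefaultMemPool fails with
  // cudaErrorNotSupported -- the "operation not supported" seen above.
  cudaDeviceGetAttribute(&poolsSupported, cudaDevAttrMemoryPoolsSupported, device);
  std::printf("memory pools supported: %d\n", poolsSupported);
  if (poolsSupported) {
    cudaMemPool_t pool;
    cudaError_t err = cudaDeviceGetDefaultMemPool(&pool, device);
    std::printf("cudaDeviceGetDefaultMemPool: %s\n", cudaGetErrorString(err));
  }
  return 0;
}
```

The remainder of the log, from the subsequent load attempt, follows.
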
[TensorRT-LLM][INFO] Engine version 0.11.0 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Parameter layer_types cannot be read from json:
[TensorRT-LLM][INFO] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][INFO] Parameter has_position_embedding cannot be read from json:
[TensorRT-LLM][INFO] [json.exception.out_of_range.403] key 'has_position_embedding' not found
[TensorRT-LLM][INFO] Parameter has_token_type_embedding cannot be read from json:
[TensorRT-LLM][INFO] [json.exception.out_of_range.403] key 'has_token_type_embedding' not found
[TensorRT-LLM][INFO] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][INFO] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] Rank 0 is using GPU 0
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 16
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 16
[TensorRT-LLM][INFO] TRTGptModel maxBeamWidth: 1
[TensorRT-LLM][INFO] TRTGptModel maxSequenceLen: 32768
[TensorRT-LLM][INFO] TRTGptModel maxDraftLen: 0
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: 32768
[TensorRT-LLM][INFO] TRTGptModel computeContextLogits: 0
[TensorRT-LLM][INFO] TRTGptModel computeGenerationLogits: 0
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] TRTGptModel maxNumTokens: 8192
[TensorRT-LLM][INFO] TRTGptModel maxInputLen: 8192 = min(maxSequenceLen - 1, maxNumTokens) since context FMHA and usePackedInput are enabled
[TensorRT-LLM][INFO] Capacity Scheduler Policy: GUARANTEED_NO_EVICT
[TensorRT-LLM][INFO] Context Chunking Scheduler Policy: None
[TensorRT-LLM][INFO] Loaded engine size: 3958 MiB
[TensorRT-LLM][ERROR] Error Code: 6: The engine plan file is generated on an incompatible device, expecting compute 5.2 got compute 8.9, please rebuild.
[TensorRT-LLM][ERROR] [engine.cpp::deserializeEngine::1233] Error Code 2: Internal Error (Assertion engine->deserialize(start, size, allocator, runtime) failed. )
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
what(): [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported (/home/runner/actions-runner/_work/cortex.tensorrt-llm/cortex.tensorrt-llm/cpp/tensorrt_llm/runtime/bufferManager.cpp:258)
1 0x7f9ee02bfdb3 void tensorrt_llm::common::check(cudaError, char const*, char const*, int) + 147
2 0x7f9e19d56eb9 tensorrt_llm::runtime::BufferManager::memoryPoolTrimTo(int, unsigned long) + 73
3 0x7f9e180ac791 /home/ubuntu/cortexcpp-beta/engines/cortex.tensorrt-llm/libtensorrt_llm.so(+0x73a791) [0x7f9e180ac791]
4 0x7f9e1a086579 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::TrtGptModelInflightBatching(std::shared_ptr<nvinfer1::ILogger>, tensorrt_llm::runtime::ModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, tensorrt_llm::runtime::RawEngine const&, bool, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&) + 937
5 0x7f9e1a0a965b tensorrt_llm::executor::Executor::Impl::createModel(tensorrt_llm::runtime::RawEngine const&, tensorrt_llm::runtime::ModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, tensorrt_llm::executor::ExecutorConfig const&) + 443
6 0x7f9e1a0aee60 tensorrt_llm::executor::Executor::Impl::loadModel(std::optional<std::filesystem::__cxx11::path> const&, std::optional<std::vector<unsigned char, std::allocator<unsigned char> > > const&, tensorrt_llm::runtime::GptJsonConfig const&, tensorrt_llm::executor::ExecutorConfig const&, bool) + 1408
7 0x7f9e1a0aff6a tensorrt_llm::executor::Executor::Impl::Impl(std::filesystem::__cxx11::path const&, std::optional<std::filesystem::__cxx11::path> const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&) + 1978
8 0x7f9e1a0a4cee tensorrt_llm::executor::Executor::Executor(std::filesystem::__cxx11::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&) + 62
9 0x7f9ee02c147a std::_MakeUniq<tensorrt_llm::executor::Executor>::__single_object std::make_unique<tensorrt_llm::executor::Executor, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, tensorrt_llm::executor::ModelType&&, tensorrt_llm::executor::ExecutorConfig&) + 138
10 0x7f9ee02a9f57 /home/ubuntu/cortexcpp-beta/engines/cortex.tensorrt-llm/libengine.so(+0x88f57) [0x7f9ee02a9f57]
11 0x55b453c9b0a6 cortex-beta(+0x2560a6) [0x55b453c9b0a6]
12 0x55b453cabe5c cortex-beta(+0x266e5c) [0x55b453cabe5c]
13 0x55b453cabaef cortex-beta(+0x266aef) [0x55b453cabaef]
14 0x55b453cab858 cortex-beta(+0x266858) [0x55b453cab858]
15 0x55b45428f7d1 cortex-beta(+0x84a7d1) [0x55b45428f7d1]
16 0x55b4541fd9f2 cortex-beta(+0x7b89f2) [0x55b4541fd9f2]
17 0x55b45420b863 cortex-beta(+0x7c6863) [0x55b45420b863]
18 0x55b454209c30 cortex-beta(+0x7c4c30) [0x55b454209c30]
19 0x55b454207b1b cortex-beta(+0x7c2b1b) [0x55b454207b1b]
20 0x55b4541fd28a cortex-beta(+0x7b828a) [0x55b4541fd28a]
21 0x55b4541fcfd9 cortex-beta(+0x7b7fd9) [0x55b4541fcfd9]
22 0x55b4541fc619 cortex-beta(+0x7b7619) [0x55b4541fc619]
23 0x55b4541fbcae cortex-beta(+0x7b6cae) [0x55b4541fbcae]
24 0x55b45420c952 cortex-beta(+0x7c7952) [0x55b45420c952]
25 0x55b45420b024 cortex-beta(+0x7c6024) [0x55b45420b024]
26 0x55b45420911a cortex-beta(+0x7c411a) [0x55b45420911a]
27 0x55b4548cc2ef cortex-beta(+0xe872ef) [0x55b4548cc2ef]
28 0x55b4548be629 cortex-beta(+0xe79629) [0x55b4548be629]
29 0x55b4548bd12d cortex-beta(+0xe7812d) [0x55b4548bd12d]
30 0x55b4548c838c cortex-beta(+0xe8338c) [0x55b4548c838c]
31 0x55b4548c655a cortex-beta(+0xe8155a) [0x55b4548c655a]
32 0x55b4548c51ef cortex-beta(+0xe801ef) [0x55b4548c51ef]
33 0x55b453bbd11a cortex-beta(+0x17811a) [0x55b453bbd11a]
34 0x55b4548b6969 cortex-beta(+0xe71969) [0x55b4548b6969]
35 0x55b4548b6830 cortex-beta(+0xe71830) [0x55b4548b6830]
36 0x55b45489754f cortex-beta(+0xe5254f) [0x55b45489754f]
37 0x55b45489af28 cortex-beta(+0xe55f28) [0x55b45489af28]
38 0x55b45489a9eb cortex-beta(+0xe559eb) [0x55b45489a9eb]
39 0x55b45489baba cortex-beta(+0xe56aba) [0x55b45489baba]
40 0x55b45489ba7d cortex-beta(+0xe56a7d) [0x55b45489ba7d]
41 0x55b45489ba2a cortex-beta(+0xe56a2a) [0x55b45489ba2a]
42 0x55b45489b9fe cortex-beta(+0xe569fe) [0x55b45489b9fe]
43 0x55b45489b9e2 cortex-beta(+0xe569e2) [0x55b45489b9e2]
44 0x7f9ee50f5253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f9ee50f5253]
45 0x7f9ee4d7bac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f9ee4d7bac3]
46 0x7f9ee4e0d850 /lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7f9ee4e0d850]

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.

HTTP error: Failed to read connection
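
The second failure points at the likely root cause: TensorRT reports that the serialized engine plan was built for compute capability 5.2 while the GPU reports 8.9 (Ada), so deserialization is refused and the engine must be rebuilt for the device it runs on. To compare the local GPU's compute capability against what the engine expects, a small sketch using the CUDA runtime API (again illustrative, not part of cortex):

```cpp
// check_sm.cu -- illustrative diagnostic, not part of cortex.
// Prints each GPU's compute capability so it can be compared with the
// SM version the TensorRT engine plan was serialized for (8.9 = Ada).
#include <cstdio>
#include <cuda_runtime_api.h>

int main() {
  int count = 0;
  cudaGetDeviceCount(&count);
  for (int d = 0; d < count; ++d) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, d);
    std::printf("GPU %d: %s, compute %d.%d\n", d, prop.name, prop.major, prop.minor);
  }
  return 0;
}
```

Both sketches compile with `nvcc`; on recent drivers, `nvidia-smi --query-gpu=compute_cap --format=csv` reports the same information without any code.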

What is your OS?

  • [ ] MacOS
  • [ ] Windows
  • [x] Linux

What engine are you running?

  • [ ] cortex.llamacpp (default)
  • [x] cortex.tensorrt-llm (Nvidia GPUs)
  • [ ] cortex.onnx (NPUs, DirectML)
