[Bug] InternalError: Check failed: nwrite != -1 (-1 vs. -1) : Write Error: Broken pipe #2854

Open
Erxl opened this issue Aug 25, 2024 · 1 comment
Labels
bug Confirmed bugs

Comments

Erxl commented Aug 25, 2024

🐛 Bug

To Reproduce

  • After running the server, wait for a period of time. (A launch-command sketch follows this list.)
  • model: mistral-large-instruct-2407-q4f16_1
  • "tensor_parallel_shards": 4,
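A sketch of the launch command with the sharding applied on the command line (hedged: this assumes the --overrides flag of mlc_llm serve accepts tensor_parallel_shards, as in recent builds; otherwise the value comes from the "tensor_parallel_shards": 4 entry in the model's mlc-chat-config.json):

    mlc_llm serve llm/mistral-large-instruct-2407-q4f16_1 \
        --host 192.168.1.4 \
        --overrides "tensor_parallel_shards=4"
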
(mlcllm) a@aserver:~$ mlc_llm serve llm/mistral-large-instruct-2407-q4f16_1 --host 192.168.1.4
[2024-08-25 13:59:31] INFO auto_device.py:88: Not found device: cuda:0
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:0
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:1
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:2
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:3
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:4
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:5
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:6
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:7
[2024-08-25 13:59:34] INFO auto_device.py:88: Not found device: metal:0
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:0
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:1
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:2
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:3
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:4
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:5
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:6
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:7
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:8
[2024-08-25 13:59:38] INFO auto_device.py:88: Not found device: opencl:0
[2024-08-25 13:59:38] INFO auto_device.py:35: Using device: rocm:0
[2024-08-25 13:59:38] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-08-25 13:59:38] INFO jit.py:158: Using cached model lib: /home/a/.cache/mlc_llm/model_lib/cfead2d711f56e44c7fd0fa68bddd3bd.so
[2024-08-25 13:59:38] INFO engine_base.py:180: The selected engine mode is local. We choose small max batch size and KV cache capacity to use less GPU memory.
[2024-08-25 13:59:38] INFO engine_base.py:205: If you don't have concurrent requests and only use the engine interactively, please select mode "interactive".
[2024-08-25 13:59:38] INFO engine_base.py:210: If you have high concurrent requests and want to maximize the GPU memory utilization, please select mode "server".
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:687: Under mode "local", max batch size will be set to 4, max KV cache token capacity will be set to 8192, prefill chunk size will be set to 2048. 
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:687: Under mode "interactive", max batch size will be set to 1, max KV cache token capacity will be set to 41729, prefill chunk size will be set to 2048. 
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:687: Under mode "server", max batch size will be set to 80, max KV cache token capacity will be set to 41260, prefill chunk size will be set to 2048. 
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:768: The actual engine mode is "local". So max batch size is 4, max KV cache token capacity is 8192, prefill chunk size is 2048.
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:773: Estimated total single GPU memory usage: 17995.347 MB (Parameters: 16771.148 MB. KVCache: 778.401 MB. Temporary buffer: 445.798 MB). The actual usage might be slightly larger than the estimated number.
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:150: [Worker #0] Loading model to device: rocm:0
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:150: [Worker #1] Loading model to device: rocm:1
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:150: [Worker #2] Loading model to device: rocm:2
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:150: [Worker #3] Loading model to device: rocm:3
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:175: Loading parameters...
[==================================================================================================>]  [885/885]
[14:01:06] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:203: Loading done. Time used: Loading 76.568 s Preprocessing 8.240 s.
INFO:     Started server process [15112]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     192.168.1.9:55521 - "OPTIONS /v1/chat/completions HTTP/1.1" 200 OK
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/a/miniconda3/envs/mlcllm/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/home/a/miniconda3/envs/mlcllm/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 339, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 270, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 259, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 185, in tvm._ffi._cy3.core.CHECK_CALL
  File "/home/a/miniconda3/envs/mlcllm/lib/python3.11/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "/workspace/mlc-llm/cpp/serve/threaded_engine.cc", line 182, in mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
  File "/workspace/mlc-llm/cpp/serve/engine.cc", line 650, in mlc::llm::serve::EngineImpl::Step()
  File "/workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc", line 45, in mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
  File "/workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc", line 301, in mlc::llm::serve::NewRequestPrefillActionObj::MatchPrefixCache(mlc::llm::serve::EngineState, mlc::llm::serve::BatchPrefillBaseActionObj::PrefillInput*)
  File "/workspace/mlc-llm/cpp/serve/model.cc", line 642, in mlc::llm::serve::ModelImpl::AddNewSequence(long)
  File "/workspace/mlc-llm/cpp/serve/function_table.cc", line 68, in operator()
tvm.error.InternalError: Traceback (most recent call last):
  9: mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
        at /workspace/mlc-llm/cpp/serve/threaded_engine.cc:182
  8: mlc::llm::serve::EngineImpl::Step()
        at /workspace/mlc-llm/cpp/serve/engine.cc:650
  7: mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
        at /workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc:45
  6: mlc::llm::serve::NewRequestPrefillActionObj::MatchPrefixCache(mlc::llm::serve::EngineState, mlc::llm::serve::BatchPrefillBaseActionObj::PrefillInput*)
        at /workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc:301
  5: mlc::llm::serve::ModelImpl::AddNewSequence(long)
        at /workspace/mlc-llm/cpp/serve/model.cc:642
  4: operator()
        at /workspace/mlc-llm/cpp/serve/function_table.cc:68
  3: tvm::runtime::BcastSessionObj::CallWithPacked(tvm::runtime::TVMArgs const&)
  2: tvm::runtime::ProcessSessionObj::BroadcastPacked(tvm::runtime::TVMArgs const&)
  1: tvm::support::Pipe::Write(void const*, unsigned long)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/disco/../../support/pipe.h", line 129
InternalError: Check failed: nwrite != -1 (-1 vs. -1) : Write Error: Broken pipe
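
The failing check is the low-level pipe write in TVM's support/pipe.h: a POSIX write() to a disco worker process returned -1 with EPIPE, which typically means one of the tensor-parallel worker processes had already exited (crashed or was killed) before the controller tried to broadcast to it. A hedged diagnostic sketch, assuming the worker processes carry "mlc_llm" in their command line and that a GPU fault would show up in the kernel log:

    # Check whether the server and its worker processes are still alive
    ps aux | grep -i mlc_llm | grep -v grep

    # Look for GPU faults or resets around the time of the crash (ROCm/amdgpu)
    sudo dmesg -T | grep -iE "amdgpu|gpu reset|page fault" | tail -n 50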

Expected behavior

Environment

  • Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): ROCm 6.2
  • Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu 22.04
  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): 7900 XTX
  • How you installed MLC-LLM (conda, source): Python pre-built package
  • How you installed TVM-Unity (pip, source):
  • Python version (e.g. 3.10): 3.11
  • GPU driver version (if applicable):
  • CUDA/cuDNN version (if applicable):
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
  • Any other relevant information:
Erxl added the bug (Confirmed bugs) label on Aug 25, 2024
@MasterJH5574 (Member) commented:

On our side such a “broken pipe” error sometimes happens, but rather rarely. On one hand we are working on finding the cause; on the other hand, you can kill the processes and rerun the server.
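
A hedged sketch of that workaround, assuming the leftover worker processes can be matched by "mlc_llm" in their command line (adjust the pattern and the serve command to your setup):

    # Stop the server and any orphaned tensor-parallel worker processes
    pkill -f mlc_llm

    # Restart the server
    mlc_llm serve llm/mistral-large-instruct-2407-q4f16_1 --host 192.168.1.4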
