[Bug] InternalError: Check failed: nwrite != -1 (-1 vs. -1) : Write Error: Broken pipe #2854

Open
Erxl opened this issue Aug 25, 2024 · 1 comment
Labels
bug Confirmed bugs

Comments

Erxl commented Aug 25, 2024

🐛 Bug

To Reproduce

  • After running the server, wait for a period of time. (A launch-command sketch follows this list.)
  • model: mistral-large-instruct-2407-q4f16_1
  • "tensor_parallel_shards": 4,
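A sketch of the launch command with the sharding applied on the command line (hedged: this assumes the --overrides flag of mlc_llm serve accepts tensor_parallel_shards, as in recent builds; otherwise the value comes from the "tensor_parallel_shards": 4 entry in the model's mlc-chat-config.json):

    mlc_llm serve llm/mistral-large-instruct-2407-q4f16_1 \
        --host 192.168.1.4 \
        --overrides "tensor_parallel_shards=4"
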
(mlcllm) a@aserver:~$ mlc_llm serve llm/mistral-large-instruct-2407-q4f16_1 --host 192.168.1.4
[2024-08-25 13:59:31] INFO auto_device.py:88: Not found device: cuda:0
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:0
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:1
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:2
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:3
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:4
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:5
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:6
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:7
[2024-08-25 13:59:34] INFO auto_device.py:88: Not found device: metal:0
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:0
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:1
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:2
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:3
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:4
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:5
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:6
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:7
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:8
[2024-08-25 13:59:38] INFO auto_device.py:88: Not found device: opencl:0
[2024-08-25 13:59:38] INFO auto_device.py:35: Using device: rocm:0
[2024-08-25 13:59:38] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-08-25 13:59:38] INFO jit.py:158: Using cached model lib: /home/a/.cache/mlc_llm/model_lib/cfead2d711f56e44c7fd0fa68bddd3bd.so
[2024-08-25 13:59:38] INFO engine_base.py:180: The selected engine mode is local. We choose small max batch size and KV cache capacity to use less GPU memory.
[2024-08-25 13:59:38] INFO engine_base.py:205: If you don't have concurrent requests and only use the engine interactively, please select mode "interactive".
[2024-08-25 13:59:38] INFO engine_base.py:210: If you have high concurrent requests and want to maximize the GPU memory utilization, please select mode "server".
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:687: Under mode "local", max batch size will be set to 4, max KV cache token capacity will be set to 8192, prefill chunk size will be set to 2048. 
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:687: Under mode "interactive", max batch size will be set to 1, max KV cache token capacity will be set to 41729, prefill chunk size will be set to 2048. 
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:687: Under mode "server", max batch size will be set to 80, max KV cache token capacity will be set to 41260, prefill chunk size will be set to 2048. 
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:768: The actual engine mode is "local". So max batch size is 4, max KV cache token capacity is 8192, prefill chunk size is 2048.
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:773: Estimated total single GPU memory usage: 17995.347 MB (Parameters: 16771.148 MB. KVCache: 778.401 MB. Temporary buffer: 445.798 MB). The actual usage might be slightly larger than the estimated number.
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:150: [Worker #0] Loading model to device: rocm:0
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:150: [Worker #1] Loading model to device: rocm:1
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:150: [Worker #2] Loading model to device: rocm:2
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:150: [Worker #3] Loading model to device: rocm:3
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:175: Loading parameters...
[==================================================================================================>]  [885/885]
[14:01:06] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:203: Loading done. Time used: Loading 76.568 s Preprocessing 8.240 s.
INFO:     Started server process [15112]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     192.168.1.9:55521 - "OPTIONS /v1/chat/completions HTTP/1.1" 200 OK
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/a/miniconda3/envs/mlcllm/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/home/a/miniconda3/envs/mlcllm/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 339, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 270, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 259, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 185, in tvm._ffi._cy3.core.CHECK_CALL
  File "/home/a/miniconda3/envs/mlcllm/lib/python3.11/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "/workspace/mlc-llm/cpp/serve/threaded_engine.cc", line 182, in mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
  File "/workspace/mlc-llm/cpp/serve/engine.cc", line 650, in mlc::llm::serve::EngineImpl::Step()
  File "/workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc", line 45, in mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
  File "/workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc", line 301, in mlc::llm::serve::NewRequestPrefillActionObj::MatchPrefixCache(mlc::llm::serve::EngineState, mlc::llm::serve::BatchPrefillBaseActionObj::PrefillInput*)
  File "/workspace/mlc-llm/cpp/serve/model.cc", line 642, in mlc::llm::serve::ModelImpl::AddNewSequence(long)
  File "/workspace/mlc-llm/cpp/serve/function_table.cc", line 68, in operator()
tvm.error.InternalError: Traceback (most recent call last):
  9: mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
        at /workspace/mlc-llm/cpp/serve/threaded_engine.cc:182
  8: mlc::llm::serve::EngineImpl::Step()
        at /workspace/mlc-llm/cpp/serve/engine.cc:650
  7: mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
        at /workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc:45
  6: mlc::llm::serve::NewRequestPrefillActionObj::MatchPrefixCache(mlc::llm::serve::EngineState, mlc::llm::serve::BatchPrefillBaseActionObj::PrefillInput*)
        at /workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc:301
  5: mlc::llm::serve::ModelImpl::AddNewSequence(long)
        at /workspace/mlc-llm/cpp/serve/model.cc:642
  4: operator()
        at /workspace/mlc-llm/cpp/serve/function_table.cc:68
  3: tvm::runtime::BcastSessionObj::CallWithPacked(tvm::runtime::TVMArgs const&)
  2: tvm::runtime::ProcessSessionObj::BroadcastPacked(tvm::runtime::TVMArgs const&)
  1: tvm::support::Pipe::Write(void const*, unsigned long)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/disco/../../support/pipe.h", line 129
InternalError: Check failed: nwrite != -1 (-1 vs. -1) : Write Error: Broken pipe
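
The failing check is the low-level pipe write in TVM's support/pipe.h: a POSIX write() to a disco worker process returned -1 with EPIPE, which typically means one of the tensor-parallel worker processes had already exited (crashed or was killed) before the controller tried to broadcast to it. A hedged diagnostic sketch, assuming the worker processes carry "mlc_llm" in their command line and that a GPU fault would show up in the kernel log:

    # Check whether the server and its worker processes are still alive
    ps aux | grep -i mlc_llm | grep -v grep

    # Look for GPU faults or resets around the time of the crash (ROCm/amdgpu)
    sudo dmesg -T | grep -iE "amdgpu|gpu reset|page fault" | tail -n 50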

Expected behavior

Environment

  • Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): ROCm 6.2
  • Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu 22.04
  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): 7900 XTX
  • How you installed MLC-LLM (conda, source): Python pre-built package
  • How you installed TVM-Unity (pip, source):
  • Python version (e.g. 3.10): 3.11
  • GPU driver version (if applicable):
  • CUDA/cuDNN version (if applicable):
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
  • Any other relevant information:
Erxl added the bug (Confirmed bugs) label on Aug 25, 2024
@MasterJH5574 (Member) commented:

On our side such a “broken pipe” error sometimes happens, but rather rarely. On one hand we are working on finding the cause; on the other hand, you can kill the processes and rerun the server.
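
A hedged sketch of that workaround, assuming the leftover worker processes can be matched by "mlc_llm" in their command line (adjust the pattern and the serve command to your setup):

    # Stop the server and any orphaned tensor-parallel worker processes
    pkill -f mlc_llm

    # Restart the server
    mlc_llm serve llm/mistral-large-instruct-2407-q4f16_1 --host 192.168.1.4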
