
Device-side assertion triggered on Batch.prepare_for_decode, release v0.1.16 #461

Closed
noah-kim-theori opened this issue May 22, 2024 · 3 comments

noah-kim-theori commented May 22, 2024

In sglang.srt.managers.router.infer_batch, Batch.prepare_for_decode triggers a device-side assertion.

  • model=Command-R-v01, AWQ 4bit quantized
  • max_new_tokens=32384
  • mem_fraction_static=0.6 (on single A100, 81920MiB VRAM)
  • on Regex-constrained decoding

The failing line is:

self.req_to_token_pool.req_to_token[

Below is the crash log; apologies that a reproduction is not provided.

new fill batch. #seq: 1. #cached_token: 8110. #new_token: 3. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 99.66%.
new fill batch. #seq: 1. #cached_token: 8153. #new_token: 3. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 99.66%.
new fill batch. #seq: 1. #cached_token: 8162. #new_token: 3. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 99.66%.
new fill batch. #seq: 1. #cached_token: 8177. #new_token: 4. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 99.66%.
new fill batch. #seq: 1. #cached_token: 8182. #new_token: 3. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 99.67%.
new fill batch. #seq: 1. #cached_token: 8189. #new_token: 3. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 99.67%.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
Exception in ModelRpcClient:
Traceback (most recent call last):
  File "/home/noah/sglang/python/sglang/srt/managers/router/model_rpc.py", line 213, in exposed_step
    self.forward_step()
  File "/home/noah/miniconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/noah/sglang/python/sglang/srt/managers/router/model_rpc.py", line 248, in forward_step
    self.forward_decode_batch(self.running_batch)
  File "/home/noah/sglang/python/sglang/srt/managers/router/model_rpc.py", line 566, in forward_decode_batch
    batch.prepare_for_decode()
  File "/home/noah/sglang/python/sglang/srt/managers/router/infer_batch.py", line 432, in prepare_for_decode
    self.req_to_token_pool.req_to_token[
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
noah-kim-theori commented:

I discovered that in sglang.srt.model_config.ModelConfig, the default maximum context length is taken from the HuggingFace configuration.

The function sglang.srt.hf_transformers_utils.get_context_length checks the candidate fields in the following order: max_sequence_length > seq_length > max_position_embeddings > max_seq_len > model_max_length. Since Cohere Command-R v01 has a 128k context length but only 8k positional embeddings, sglang concludes the model's context length is 8k.

Consequently, mis-indexing of Batch.req_to_token_pool.req_to_token occurred when it tried to generate more than 8k tokens.

To fix this issue, the order of candidates should be reconsidered.
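For illustration, here is a hypothetical reimplementation of the candidate-order lookup described above (not sglang's actual code; the config values are illustrative), showing why a Command-R-style config resolves to 8k instead of 128k:

```python
# Candidate keys probed in order; the first one present in the config wins.
CONTEXT_LENGTH_KEYS = [
    "max_sequence_length",
    "seq_length",
    "max_position_embeddings",
    "max_seq_len",
    "model_max_length",
]

def get_context_length(config: dict) -> int:
    """Return the first matching context-length field, with a fallback default."""
    for key in CONTEXT_LENGTH_KEYS:
        if key in config:
            return config[key]
    return 2048

# Command-R-v01-style config: 8k positional embeddings but a much larger
# usable context length (illustrative values).
command_r_config = {
    "max_position_embeddings": 8192,
    "model_max_length": 131072,
}

print(get_context_length(command_r_config))  # 8192, not 131072
```

Because max_position_embeddings is probed before model_max_length, the smaller 8192 value wins, and any request generating past 8k tokens indexes req_to_token out of bounds.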

noah-kim-theori commented May 25, 2024

I think this code should also be updated:

        if server_args.context_length is not None:
            self.context_len = server_args.context_length
        else:
            self.context_len = get_context_length(self.hf_config)
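A minimal standalone sketch of that fallback (mock ServerArgs and config, not sglang's actual classes), showing how an explicit context_length override takes precedence over the HF-derived value:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ServerArgs:
    # User-supplied override; None means "derive from the HF config".
    context_length: Optional[int] = None

def resolve_context_len(server_args: ServerArgs, hf_config: dict) -> int:
    # Prefer the explicit override; otherwise fall back to the (possibly
    # too small) value read from the HF config.
    if server_args.context_length is not None:
        return server_args.context_length
    return hf_config.get("max_position_embeddings", 2048)

hf_config = {"max_position_embeddings": 8192}
print(resolve_context_len(ServerArgs(), hf_config))                      # 8192
print(resolve_context_len(ServerArgs(context_length=131072), hf_config)) # 131072
```

Until the candidate order is fixed, passing the context length explicitly works around the mis-detection.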

This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.
