Closed
Checklist
- 1. If the issue you raised is not a feature request but a question, please open a discussion at https://github.com/sgl-project/sglang/discussions/new/choose instead. Otherwise, it will be closed.
- 2. Please use English; otherwise the issue will be closed.
Motivation
TTFT is also an important online serving metric. In sglang, I found some bad cases: when there is not enough VRAM for an incoming request, the request must wait in the waiting_queue, so the TTFT observed by the user (which includes the time spent in the waiting queue) can be poor.
I would like to upstream some of my work on this. With a disable-req-waiting option, when VRAM is insufficient for an incoming request, the scheduler could return 403 to the server, and the user or the router could retry at the service level.
Which parts may be modified:
- In scheduler.py, add a free-VRAM check in "handle_generate_request"; if VRAM is insufficient, return an aborted status to the tokenizer.
- In tokenizer.py and open_ai/adapter.py, support returning this kind of error; for example, in my previous implementation, a 403 HTTP code is returned to the client.
- In schedule_batch.py, add a remain_vram property that reports the free VRAM and estimates the memory a new request would use, so we can judge whether the new request can be inserted into the waiting_queue.
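The admission check described above can be sketched with a toy scheduler. All names here (MiniScheduler, estimate_req_vram, the token-based accounting) are assumptions for illustration, not sglang's actual API; real KV-cache accounting is more involved.

```python
class MiniScheduler:
    """Toy model of a scheduler that rejects requests instead of queueing them
    when the estimated KV-cache footprint exceeds the free capacity."""

    def __init__(self, total_vram_tokens: int):
        self.total_vram_tokens = total_vram_tokens
        self.used_vram_tokens = 0
        self.waiting_queue = []

    @property
    def remain_vram(self) -> int:
        # Free KV-cache capacity, measured in tokens for simplicity.
        return self.total_vram_tokens - self.used_vram_tokens

    def estimate_req_vram(self, prompt_len: int, max_new_tokens: int) -> int:
        # Worst-case token footprint of the request (prefill + full decode).
        return prompt_len + max_new_tokens

    def handle_generate_request(self, prompt_len: int, max_new_tokens: int) -> dict:
        need = self.estimate_req_vram(prompt_len, max_new_tokens)
        if need > self.remain_vram:
            # Instead of letting the request sit in waiting_queue (hurting TTFT),
            # return an aborted status that the server can map to HTTP 403.
            return {"status": "aborted", "reason": "out_of_vram"}
        self.used_vram_tokens += need
        self.waiting_queue.append((prompt_len, max_new_tokens))
        return {"status": "queued"}
```

For example, with a 100-token budget, a request needing 80 tokens is queued, and a subsequent request needing 40 tokens is aborted immediately rather than waiting.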
What is expected:
- If a request is inserted into the waiting_queue, it can be served quickly (within roughly one forward-step of latency), so TTFT stays close to the time required for prefill.
- At the router/service level, we can achieve better load balancing.
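The retry-at-the-service-level behavior could look like the sketch below. The HTTP call is replaced by a stub, and the names (send_request, generate_with_retry) and the backoff policy are assumptions, not part of the proposal itself.

```python
import time

def send_request(server_load: list) -> int:
    """Stub for an HTTP call: returns 403 while the backend is out of VRAM,
    200 once capacity is available. server_load is a one-element list used
    as a mutable counter standing in for backend pressure."""
    return 403 if server_load[0] > 0 else 200

def generate_with_retry(server_load: list, max_retries: int = 5,
                        backoff_s: float = 0.01) -> int:
    """Retry on 403 with exponential backoff, as a router or client might."""
    for attempt in range(max_retries):
        status = send_request(server_load)
        if status != 403:
            return status
        server_load[0] -= 1  # pretend some load drains between attempts
        time.sleep(backoff_s * (2 ** attempt))
    return 403
```

A router could use the same 403 signal to route the retry to a less-loaded replica instead of sleeping.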
Timeline:
done before 5th May
Related resources
No response