server: process prompt fairly across slots #6607

Open
phymbert opened this issue Apr 11, 2024 · 1 comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed server/webui

Comments

@phymbert
Collaborator

Context

At the moment, we batch prompt tokens in FIFO order, so when a large prompt has to be processed, it blocks all other slots.

Proposal: share the prompt-processing batch fairly across all pending slots.
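
To make the proposal concrete, here is a minimal sketch of one possible fair split, assuming each pending slot knows how many prompt tokens it still has to process and that a batch has a fixed token budget. The function name `fair_prompt_split` and its parameters are hypothetical and do not exist in the server code; this only illustrates the idea of giving every pending slot an equal share and handing unused capacity to the slots that still need it.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the server code): given the number of
// prompt tokens each slot still has to process and the token budget of one
// batch, return how many tokens each slot gets in this batch.
std::vector<int> fair_prompt_split(const std::vector<int> & pending, int n_batch) {
    std::vector<int> share(pending.size(), 0);
    int budget = std::max(n_batch, 0);

    while (budget > 0) {
        // Count the slots that could still use more tokens in this batch.
        int n_active = 0;
        for (std::size_t i = 0; i < pending.size(); ++i) {
            if (pending[i] > share[i]) {
                ++n_active;
            }
        }
        if (n_active == 0) {
            break; // every prompt fits; the rest of the budget stays unused
        }

        // Equal quota per active slot; at least 1 so the loop always progresses.
        const int quota = std::max(budget / n_active, 1);

        for (std::size_t i = 0; i < pending.size() && budget > 0; ++i) {
            const int want = pending[i] - share[i];
            if (want <= 0) {
                continue;
            }
            const int take = std::min({want, quota, budget});
            share[i] += take;
            budget   -= take;
        }
    }
    return share;
}
```

With pending prompts of 1000, 10 and 10 tokens and a budget of 512, the two short prompts are fully scheduled in the same batch and the large prompt receives the remaining 492 tokens, instead of taking all 512 under FIFO.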

References:

@phymbert added the enhancement, help wanted, good first issue, and server/webui labels on Apr 11, 2024
@pudepiedj
Contributor

@phymbert What would be the side effects (or other objections or snags) of adding a SLOT_STATE_RESERVED status alongside the two existing slot states, SLOT_STATE_IDLE and SLOT_STATE_PROCESSING, so that some slots can be kept in reserve for new prompts or running chats and new requests don't bump them? It struck me while I was playing with my slot graphics that this might be desirable, and now it has come up as an issue. What do you think?
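
For illustration only, a rough sketch of how such a third state might look. The two existing state names are taken from the comment above; the `pick_slot` function, its `may_use_reserved` flag, and the rule for who may claim a reserved slot are all assumptions made for this example, not existing server behaviour.

```cpp
#include <cstddef>
#include <vector>

// The two states named above, plus the proposed third one.
enum slot_state {
    SLOT_STATE_IDLE,
    SLOT_STATE_PROCESSING,
    SLOT_STATE_RESERVED, // proposed: held back so ordinary requests cannot take it
};

// Hypothetical picker: returns the index of the first usable slot, or -1.
// Assumed policy: any request may take an idle slot; only requests that are
// allowed to (may_use_reserved == true) may fall back to a reserved slot.
int pick_slot(const std::vector<slot_state> & slots, bool may_use_reserved) {
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (slots[i] == SLOT_STATE_IDLE) {
            return static_cast<int>(i);
        }
    }
    if (may_use_reserved) {
        for (std::size_t i = 0; i < slots.size(); ++i) {
            if (slots[i] == SLOT_STATE_RESERVED) {
                return static_cast<int>(i);
            }
        }
    }
    return -1; // nothing available for this request
}
```

One open question with this shape is who decides which requests may use reserved slots; the flag here just stands in for that decision.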
