Description
Objective
- Do we need a queue system that scales to thousands of requests?
Motivation
Null Pointer Errors?
- Currently, inference requests are handled in FIFO order
- We are adopting an OpenAI-compatible API, which means we will receive requests across Chat, Audio, Vision, etc.
- Given that users are on laptops with limited RAM and VRAM, we are likely to have to switch models between requests
Preparing for Cloud Native
- Our long-term future is likely as an enterprise OpenAI alternative, which will be multi-user and have a queue system
- Should we bake in this abstraction now, and use a local file-based queue (which is later swapped out for a more sophisticated queue)? See the sketch below.
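A minimal sketch of what that abstraction could look like, assuming a TypeScript codebase; the `InferenceQueue` interface, the `InferenceJob` shape, and the JSONL file layout are all hypothetical, and the file-backed class is only a stand-in for whatever queue we eventually choose:

```ts
import { promises as fs } from "fs";

// A queued inference request; fields are illustrative only.
interface InferenceJob {
  id: string;
  endpoint: "chat" | "audio" | "vision";
  model: string;
  payload: unknown;
}

// The abstraction we would bake in now: callers only see enqueue/dequeue,
// so the backing store can later be swapped for a real message broker.
interface InferenceQueue {
  enqueue(job: InferenceJob): Promise<void>;
  dequeue(): Promise<InferenceJob | undefined>;
}

// Local file-based FIFO queue: one JSON job per line, consumed from the top.
// Fine for a single local process; not safe for concurrent writers.
class FileQueue implements InferenceQueue {
  constructor(private path: string) {}

  async enqueue(job: InferenceJob): Promise<void> {
    await fs.appendFile(this.path, JSON.stringify(job) + "\n", "utf8");
  }

  async dequeue(): Promise<InferenceJob | undefined> {
    let data: string;
    try {
      data = await fs.readFile(this.path, "utf8");
    } catch {
      return undefined; // queue file does not exist yet
    }
    const lines = data.split("\n").filter((l) => l.trim().length > 0);
    if (lines.length === 0) return undefined;
    const [head, ...rest] = lines;
    await fs.writeFile(this.path, rest.map((l) => l + "\n").join(""), "utf8");
    return JSON.parse(head) as InferenceJob;
  }
}
```

If request handlers only depend on `InferenceQueue`, a later multi-user deployment could swap in an implementation backed by Redis, SQS, or similar without touching the handlers themselves.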