
idea: Production Level Queue System #580

Open
dan-homebrew opened this issue Nov 28, 2023 · 4 comments
Labels
category: app shell (Installer, updaters, distributions) · P2: enhancement (low impact on functionality)

Comments

@dan-homebrew
Contributor

dan-homebrew commented Nov 28, 2023

Objective

  • Do we need a queue system that scales to thousands of requests?

Motivation

Null Pointer Errors?

  • Currently, inference requests are handled FIFO
  • We are adopting the OpenAI API format, which means we will receive requests across Chat, Audio, Vision, etc.
  • Given that users are on laptops with limited RAM and VRAM, we will likely have to switch models between requests (see the sketch below)
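
As a rough illustration, serialized FIFO handling with model switching might look like the following sketch. All names here (`loadModel`, `unloadModel`, `runInference`) are hypothetical placeholders standing in for the engine bindings, not actual Jan or Cortex APIs:

```typescript
// Hypothetical sketch: serialize inference requests FIFO, loading a
// different model only when the next request needs one.
type InferenceRequest = {
  model: string; // chat, audio, or vision model id
  prompt: string;
  resolve: (output: string) => void;
};

// Placeholder stubs for the real engine bindings.
async function loadModel(model: string): Promise<void> {/* engine call */}
async function unloadModel(model: string): Promise<void> {/* engine call */}
async function runInference(model: string, prompt: string): Promise<string> {
  return `response from ${model}`; // placeholder
}

const queue: InferenceRequest[] = [];
let activeModel: string | null = null;
let draining = false;

async function drain(): Promise<void> {
  if (draining) return; // a single drain loop enforces FIFO order
  draining = true;
  while (queue.length > 0) {
    const req = queue.shift()!;
    if (activeModel !== req.model) {
      // Limited RAM/VRAM: unload the current model before loading the next.
      if (activeModel !== null) await unloadModel(activeModel);
      await loadModel(req.model);
      activeModel = req.model;
    }
    req.resolve(await runInference(req.model, req.prompt));
  }
  draining = false;
}

function enqueue(model: string, prompt: string): Promise<string> {
  return new Promise((resolve) => {
    queue.push({ model, prompt, resolve });
    void drain();
  });
}
```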

Preparing for Cloud Native

  • Our long-term future is likely as an enterprise OpenAI alternative, which will be multi-user and need a queue system
  • Should we bake in this abstraction now, starting with a local file-based queue that is later swapped out for a more sophisticated one? (See the sketch below.)
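
One way that abstraction could look (a sketch only; `JobQueue` and `FileJobQueue` are hypothetical names, not existing code): callers depend on an interface, and the file-backed implementation can later be replaced by Redis, SQS, or similar without touching call sites.

```typescript
import { promises as fs } from "fs";

// Hypothetical abstraction: callers depend only on JobQueue, so the
// file-based implementation can later be swapped for a hosted queue.
interface JobQueue<T> {
  enqueue(job: T): Promise<void>;
  dequeue(): Promise<T | undefined>;
}

// Naive single-process, file-backed queue. Fine for a local desktop app;
// a multi-user deployment would need locking and durability guarantees.
class FileJobQueue<T> implements JobQueue<T> {
  constructor(private path: string) {}

  private async read(): Promise<T[]> {
    try {
      return JSON.parse(await fs.readFile(this.path, "utf8"));
    } catch {
      return []; // missing or empty file means an empty queue
    }
  }

  async enqueue(job: T): Promise<void> {
    const jobs = await this.read();
    jobs.push(job);
    await fs.writeFile(this.path, JSON.stringify(jobs));
  }

  async dequeue(): Promise<T | undefined> {
    const jobs = await this.read();
    const job = jobs.shift();
    await fs.writeFile(this.path, JSON.stringify(jobs));
    return job;
  }
}
```

The naive file version deliberately omits locking, visibility timeouts, and retries; the point of the interface is that a production swap would not change the callers.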
@dan-homebrew dan-homebrew added the type: epic (A major feature or initiative) label Nov 28, 2023
@dan-homebrew dan-homebrew changed the title epic: Queue System for Inference? epic: Queue System for Inference Dec 12, 2023
@dan-homebrew dan-homebrew changed the title epic: Queue System for Inference feat: Queue System for Inference Dec 12, 2023
@dan-homebrew dan-homebrew added this to the Jan Server milestone Dec 12, 2023
@dan-homebrew dan-homebrew changed the title feat: Queue System for Inference feat: Queue System for Inference? Dec 12, 2023
@dan-homebrew dan-homebrew removed the type: epic (A major feature or initiative) label Dec 12, 2023
@hiro-v hiro-v removed their assignment Mar 14, 2024
@Van-QA
Contributor

Van-QA commented Apr 16, 2024

Quoted from users in janhq/jan#2704:

Problem
When a generation is ongoing, entering a new prompt causes the generation to be interrupted. It would be nice if subsequent prompts to the same model were queued instead of resulting in interruptions.

Success Criteria
It would be nice if we could queue up a couple of prompts and then get back to the responses after a while.
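
A minimal sketch of that behavior (hypothetical names throughout; `generate` stands in for whatever the engine exposes): each new prompt for a model is chained onto the previous generation's promise, so it waits rather than interrupting.

```typescript
// Hypothetical sketch: per-model promise chains so a new prompt waits for
// the ongoing generation instead of interrupting it.
const tails = new Map<string, Promise<unknown>>();

function submitPrompt(
  model: string,
  prompt: string,
  generate: (model: string, prompt: string) => Promise<string>,
): Promise<string> {
  const prev = tails.get(model) ?? Promise.resolve();
  const next = prev.then(() => generate(model, prompt));
  // Swallow errors on the tail so one failed generation does not block the queue.
  tails.set(model, next.catch(() => undefined));
  return next;
}
```

Submitting several prompts back-to-back would then return promises that resolve in order, matching the success criteria above.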

@louis-jan
Contributor

Scoped for refactoring the Cortex Backend.

@Van-QA Van-QA transferred this issue from janhq/jan May 16, 2024
@0xSage 0xSage mentioned this issue Jul 1, 2024
@louis-jan
Contributor

Should be on cortex-cpp

@0xSage
Contributor

0xSage commented Sep 6, 2024

@vansangpfiev don't we already have a basic queue in place? If so, we can close this issue 🙏

Update: Nvm, modifying this issue to track a prod-level queue system long term. Out of scope for now.

@0xSage 0xSage changed the title feat: Queue System for Inference? epic: Queue System for Inference? Sep 6, 2024
@0xSage 0xSage changed the title epic: Queue System for Inference? epic: Production Level Queue System Sep 6, 2024
@0xSage 0xSage added the P2: enhancement (low impact on functionality) and category: app shell (Installer, updaters, distributions) labels and removed the type: question label Sep 6, 2024
@dan-homebrew dan-homebrew changed the title epic: Production Level Queue System idea: Production Level Queue System Sep 8, 2024
Projects
Status: Icebox
Development

No branches or pull requests

6 participants