
Epic: Jan 0.6.0 that uses Cortex Platform #3365

Closed
8 of 10 tasks
Van-QA opened this issue Aug 14, 2024 · 9 comments
Assignees
Labels
P1: important Important feature / fix type: epic A major feature or initiative

Comments


Van-QA commented Aug 14, 2024

Motivation

Main Decisions:

  • Integrate Cortex as an Experimental Backend
  • Overall: Nitro -> Cortex
  • Today: Jan roadmap for v0.6, v0.6.1 -> path to Cortex
  • Blog Post: share the roadmap with the community

v0.6 scope

  • Cortex as additional Engine
  • Existing feature improvements, bug fixes @ashley
  • UI to introduce Cortex as experimental
  • Roadmap for that incremental migration @van @louis

v0.6 out-of-scope

  • Conversation threads in DB
  • Migration from legacy to DB

Specs

Tasklist

Discontinued

Appendix

@Van-QA Van-QA added the type: epic A major feature or initiative label Aug 14, 2024
@Van-QA Van-QA modified the milestones: v0.6.1, v.0.6.0 Aug 14, 2024
@Van-QA Van-QA changed the title epic: Jan 0.5.2 with Cortex platform epic: Jan 0.5.2 with Cortex platform extensions Aug 14, 2024
@imtuyethan imtuyethan added the P1: important Important feature / fix label Aug 14, 2024
@Van-QA Van-QA mentioned this issue Aug 23, 2024
@Van-QA Van-QA pinned this issue Aug 23, 2024
@Van-QA Van-QA modified the milestones: v.0.6.0, v.0.6.1 Aug 26, 2024
@louis-jan louis-jan changed the title epic: Jan 0.5.2 with Cortex platform extensions epic: Jan 0.6.0 with Cortex platform extensions Aug 27, 2024
@louis-jan

@marknguyen1302 to attach the specs here


dan-homebrew commented Aug 30, 2024

@louis-jan @marknguyen1302 I'm adding #3325 to this Tasklist

  • we can discuss later whether we need a migration wizard (i.e., if we will continue supporting the legacy filesystem)


louis-jan commented Aug 30, 2024

@dan-homebrew @marknguyen1302 I'm listing some points to consider / concerns here so we can address them together:

  • Model prepopulation - One important point to consider when using the HF/provider models is fetching the model list (#3373, #3374) feat: Remote APIs can fetch Model List #3374 (comment)
  • Cortex process handling - For now, Cortex serves the chat function and also the server API. Possible side effects include:
    • Load model fails -> Cortex process crashes -> server stops working
    • Run Cortex on app load so the API server can always be on (previously it only started on message send)
    • Changing thread settings (context length / ngl) / stopping / deleting a running model -> unload the model instead of just killing the process, as before
    • New UX on state update? E.g. Cortex stopped working -> restarted by process watchdog -> running
  • Need a better UX for running multiple models simultaneously
    • Loading too many models could overload the user's machine
    • Unloading a model could affect a running server (or not, since sending a message would load the model again)
  • Need a better UX for engine setup (the UI update should also work with our extension architecture, without directing users to many screens)
    • Llama.cpp is bundled by default, but other engines are not; CUDA dependencies should somehow be shared among engines
  • Models can be run through the API Server
    • Legacy models are routed to Nitro (eventually routed to cortex-cpp), new models are routed to Cortex. Is it good UX to force the Cortex API server to serve legacy Nitro models?
  • Pre-populate HF Cortex models: there could be duplicate models once the 0.5.3 model list is updated. A new branch / model repo to serve Jan Hub only? Then we could filter which models to pre-populate instead of including everything.
  • Preserve thread settings - a confusing feature? Since we will work on Presets soon, should we take it out?
  • Sync settings (& dtos?) across projects (https://discord.com/channels/1107178041848909847/1239846009258119178/1279001259613093910)
  • Cortex release process - e.g. We are updating the cortex repository's CI/CD for interim CLI development
  • Cortex provides a backward-compatible FS module, so all requests from Jan go through the Cortex API? Going back to the previous integration, there are many edge cases to handle, such as:
    • The Cortex server should be ON before showing threads/models. Data caching can improve the experience, since users cannot wait through a load every time they reopen the app.
    • Cortex process status, as described above ^
    • Model imports should also go through the Cortex server
    • Deprecate core FS from Jan
    • Advanced Settings from Jan -> configure Cortex?
    • HTTP Proxy for model downloading?
    • HuggingFace API Token input from Jan -> Cortex
    • Log settings from Jan -> Cortex (on/off, cleaning interval..); how does the Jan app log to Cortex?
    • Tools support (actually just PDF retrieval for now) - depends on the Jan core FS; if we don't deprecate that, it's fine
  • IMPORTANT: Multiple instruction-set support - distributing multiple engine binaries
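
The "Cortex stopped working -> restarted by process watchdog -> running" state update above could be modeled as a small state machine. This is a hedged sketch only; the state and event names (`next_state`, `watchdog_restart`, etc.) are illustrative, not Jan's or Cortex's actual API:

```python
# Hypothetical states for the UI's view of the Cortex process.
STATES = {"stopped", "restarting", "running"}

def next_state(state: str, event: str) -> str:
    """Pure transition function: given the current state and an event,
    return the next state the UI should display."""
    if event == "crash":
        return "stopped"
    if event == "watchdog_restart":
        # Only a stopped process is restarted by the watchdog.
        return "restarting" if state == "stopped" else state
    if event == "healthy":
        return "running"
    if event == "user_stop":
        return "stopped"
    return state

# The sequence from the bullet above: running -> crash -> watchdog restart -> healthy.
state = "running"
for event in ["crash", "watchdog_restart", "healthy"]:
    state = next_state(state, event)
print(state)  # running
```

Keeping the transition logic pure makes it easy to drive both a UI badge and the process supervisor from the same source of truth.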

@marknguyen1302

Thanks @louis-jan for this. Regarding "Cortex provides a backward-compatible FS module": I think that's for the Jan API server only, so we keep the current app the same. Instead of spawning a new port like the current Jan does, we would use the Cortex Platform API for that server. What do you think?

@marknguyen1302

Regarding Cortex process handling: I think when we load a model, run inference, or start the Jan API server, we should check the health of the Cortex server and attempt to spawn it if it is not on.
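
The health-check-then-spawn idea above could look roughly like this. A minimal sketch under stated assumptions: the health endpoint URL, port, and the `cortex start` command are placeholders for illustration, not Cortex's documented interface:

```python
import subprocess
import urllib.error
import urllib.request

# Hypothetical health endpoint; the real port/path would come from config.
HEALTH_URL = "http://127.0.0.1:1337/healthz"

def is_cortex_healthy(url: str = HEALTH_URL, timeout: float = 1.0) -> bool:
    """Return True if the Cortex server answers its health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def ensure_cortex_running() -> None:
    """Before load / inference / starting the Jan API server, check health
    and attempt to spawn the Cortex process if the server is not on."""
    if not is_cortex_healthy():
        # "cortex start" is an assumed launch command for illustration.
        subprocess.Popen(["cortex", "start"])
```

A real implementation would also need to debounce repeated spawn attempts and surface failures to the UI, per the watchdog discussion above.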

@louis-jan

Yes @marknguyen1302, scoping down the other parts would help.
For now:

  • I think syncing the state between threads and the API server could be a good option when the Cortex process is terminated -> update the state in both places, and the user can either send a new message to trigger a reload or turn the server back on.

But when we migrate all requests to Cortex endpoints, there will be more to address.
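
The "update the state in both places" idea above is essentially a single shared store that both the thread UI and the API server screen subscribe to. A hedged sketch; `CortexStateStore` and the subscriber wiring are hypothetical names, not Jan's actual architecture:

```python
class CortexStateStore:
    """Single source of truth for the Cortex process state; both the
    chat threads and the API server view subscribe to changes."""

    def __init__(self):
        self.state = "running"
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set_state(self, state: str) -> None:
        self.state = state
        for callback in self._subscribers:
            callback(state)

store = CortexStateStore()
seen_by_threads, seen_by_server = [], []
store.subscribe(seen_by_threads.append)
store.subscribe(seen_by_server.append)

# Cortex process terminated -> both views update at once; the user can
# then send a new message (reload) or turn the server back on.
store.set_state("stopped")
print(seen_by_threads, seen_by_server)  # ['stopped'] ['stopped']
```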

@dan-homebrew dan-homebrew changed the title epic: Jan 0.6.0 with Cortex platform extensions epic: Jan 0.6.0 that uses Cortex Platform Sep 3, 2024
@dan-homebrew

@louis-jan Thanks for your points here - I'll likely need to break them up into separate discussions, since they cover a lot of issues.
#3365 (comment)

@louis-jan louis-jan changed the title epic: Jan 0.6.0 that uses Cortex Platform Epic: Jan 0.6.0 that uses Cortex Platform Sep 5, 2024

0xSage commented Sep 9, 2024

to be rescoped into #3599

@dan-homebrew

Discontinuing this due to Cortex Platform deprecation

@0xSage 0xSage unpinned this issue Oct 14, 2024
Projects
Archived in project
Development

No branches or pull requests

6 participants