This repository was archived by the owner on Jul 4, 2025. It is now read-only.

@louis-jan louis-jan commented Jul 22, 2024

Describe Your Changes

This PR adds model loading from the JS layer when a /chat/completions or /embeddings request arrives.

Note: If no model is specified, cortexso/nomic-embed-text-v1 is used as the default embedding model. The embedding model is pulled automatically if it does not exist.
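
The default-model fallback described in the note could look roughly like the sketch below. Only the model name comes from this PR; the `EmbeddingsRequest` shape and the `resolveEmbeddingModel` helper are hypothetical names used for illustration and may not match the actual code.

```typescript
// Default embedding model named in this PR description.
const DEFAULT_EMBEDDING_MODEL = "cortexso/nomic-embed-text-v1";

// Hypothetical request shape; the real DTO in the codebase may differ.
interface EmbeddingsRequest {
  model?: string;
  input: string | string[];
}

// Hypothetical helper: returns the requested model, or the default
// when the request leaves the model undefined or blank.
function resolveEmbeddingModel(req: EmbeddingsRequest): string {
  const requested = req.model ? req.model.trim() : "";
  return requested.length > 0 ? requested : DEFAULT_EMBEDDING_MODEL;
}
```

A request such as `{ input: "hello" }` would resolve to the default, while `{ model: "my-model", input: "hello" }` keeps the caller's choice.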

The CLI update should be covered by refactoring the CLI to send REST requests.

sequenceDiagram
    participant User
    participant API_Controller
    participant Engine_Server
    participant Model

    User->>API_Controller: Chat request
    API_Controller->>Engine_Server: Check Model Running
    Engine_Server-->>API_Controller: Model status
    alt Model is not running
        API_Controller->>Engine_Server: Start Model command
        Engine_Server->>Model: Start Model
        Model-->>Engine_Server: Model started
        Engine_Server-->>API_Controller: Model ready
    end
    API_Controller->>Model: Send Chat Completions
    Model-->>API_Controller: Return Completions
    API_Controller-->>User: Send Response
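
The flow in the diagram can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the `EngineClient` interface, its method names, `handleChatRequest`, and the in-memory `FakeEngine` are all hypothetical.

```typescript
// Hypothetical engine-facing interface; method names are illustrative only.
interface EngineClient {
  isModelRunning(modelId: string): Promise<boolean>;
  startModel(modelId: string): Promise<void>;
  chatCompletions(modelId: string, prompt: string): Promise<string>;
}

// Mirrors the diagram: check the model status, start the model only if it
// is not running, then forward the chat completions request.
async function handleChatRequest(
  engine: EngineClient,
  modelId: string,
  prompt: string
): Promise<string> {
  const running = await engine.isModelRunning(modelId);
  if (!running) {
    await engine.startModel(modelId);
  }
  return engine.chatCompletions(modelId, prompt);
}

// In-memory stand-in for the engine server that records the calls it sees.
class FakeEngine implements EngineClient {
  calls: string[] = [];
  private running: Record<string, boolean> = {};

  async isModelRunning(modelId: string): Promise<boolean> {
    this.calls.push("status:" + modelId);
    return this.running[modelId] === true;
  }

  async startModel(modelId: string): Promise<void> {
    this.calls.push("start:" + modelId);
    this.running[modelId] = true;
  }

  async chatCompletions(modelId: string, prompt: string): Promise<string> {
    this.calls.push("chat:" + modelId);
    return "echo:" + prompt;
  }
}
```

With the fake engine, a first request records a status check, a start, and a chat call; a second request for the same model skips the start step because the model is already running.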

Fixes Issues

  • Closes #

Self Checklist

  • Added relevant comments, esp in complex areas
  • Updated docs (for bug fixes / features)
  • Created issues for follow-up changes or refactoring needed

@louis-jan louis-jan force-pushed the chore/auto-load-model-on-chat-completions branch 4 times, most recently from f77b9c8 to d0ad07b Compare July 22, 2024 10:03
@louis-jan louis-jan force-pushed the chore/auto-load-model-on-chat-completions branch from d0ad07b to c39f4c1 Compare July 22, 2024 10:07
@louis-jan louis-jan merged commit 749cf0c into dev Jul 22, 2024
@louis-jan louis-jan deleted the chore/auto-load-model-on-chat-completions branch July 22, 2024 11:35
3 participants