Conversation

ServeurpersoCom (Collaborator)

Introduce OpenAI-compatible model selector in JSON payload

This PR adds a minimal model selector to the WebUI sidebar, allowing users to pick an available model exposed through the /v1/models OpenAI-compatible endpoint.

The selector automatically fetches and lists models from the server, persists the selected model in local storage, and sends it in the JSON body of subsequent /v1/chat/completions requests. The selection logic mirrors OpenAI’s client behavior while remaining fully offline-compatible with local llama.cpp instances.
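
A minimal sketch of that flow, assuming a plain fetch-based client; the helper names and the 'selectedModel' storage key are illustrative, not the actual WebUI code:

```ts
// Illustrative sketch only: fetchModels/saveSelectedModel/sendChat are hypothetical
// helpers, not the actual WebUI implementation.

interface ModelEntry {
  id: string;
}

// List models exposed by the OpenAI-compatible endpoint ({ data: [...] } envelope).
async function fetchModels(baseUrl: string): Promise<ModelEntry[]> {
  const res = await fetch(`${baseUrl}/v1/models`);
  const json = await res.json();
  return json.data ?? [];
}

// Persist the chosen model so it survives page reloads.
function saveSelectedModel(id: string): void {
  localStorage.setItem('selectedModel', id);
}

// Include the persisted model in the JSON body of subsequent chat requests.
async function sendChat(baseUrl: string, messages: { role: string; content: string }[]) {
  const model = localStorage.getItem('selectedModel') ?? undefined;
  return fetch(`${baseUrl}/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, messages, stream: true }),
  });
}
```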

This enables direct interoperability with OpenAI-compatible clients and simplifies multi-model setups in the WebUI.

Restore OpenAI-compatible model source of truth and unify metadata capture:

This change re-establishes a single, reliable source of truth for the active model, fully aligned with OpenAI-compatible API behavior.

It introduces a unified metadata flow that captures the model field from both streaming and non-streaming responses, wiring a new onModel callback through ChatService. The model name is now resolved directly from the API payload rather than relying on server /props or UI assumptions.

ChatStore records and persists the resolved model for each assistant message during streaming, ensuring consistency across the UI and database. Type definitions for API and settings were also extended to include model metadata and the onModel callback, completing the alignment with OpenAI-Compat semantics.
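
As a rough illustration of that capture path (the real ChatService and ChatStore interfaces may differ; the types and names below are assumptions, not the merged code):

```ts
// Sketch of the unified metadata capture from OpenAI-compatible responses.

type OnModel = (model: string) => void;

interface StreamChunk {
  model?: string;
  choices?: { delta?: { content?: string } }[];
}

// Non-streaming: the model name is a top-level field of the completion payload.
async function readCompletion(res: Response, onModel: OnModel): Promise<string> {
  const json = await res.json();
  if (typeof json.model === 'string') onModel(json.model);
  return json.choices?.[0]?.message?.content ?? '';
}

// Streaming: every SSE chunk repeats the model field; report it once, from the first chunk seen.
function readChunk(chunk: StreamChunk, state: { modelReported: boolean }, onModel: OnModel): string {
  if (!state.modelReported && chunk.model) {
    onModel(chunk.model);
    state.modelReported = true;
  }
  return chunk.choices?.[0]?.delta?.content ?? '';
}
```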

Remaining '/props' usage audit in the WebUI:

A repository-wide search inside 'tools/server/webui' shows that the remaining '/props' references are intentional, because the WebUI still needs to bootstrap and validate server capabilities outside of chat responses:

  • 'src/routes/+layout.svelte' and 'src/lib/stores/server.svelte.ts' fetch '/props' on application startup to populate the global server store with template, model alias, and capability metadata that never appears in chat completions (see the sketch after this list).
  • 'src/lib/components/app/server/ServerErrorSplash.svelte' and 'src/lib/components/app/chat/ChatScreen/ChatScreenWarning.svelte' surface fallback UI when '/props' is unreachable, ensuring the user understands cached data might be stale.
  • 'src/lib/utils/api-key-validation.ts' validates API keys against '/props' so that the UI can warn about incompatible keys before issuing chat requests.
  • 'src/lib/services/chat.ts' performs a last-resort fetch to '/props' when the streaming handshake fails, preserving compatibility with legacy servers that only expose model names via that endpoint.
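
For context, a sketch of the kind of startup bootstrap the first item refers to; the store shape and the '/props' field names are assumptions, not the actual server.svelte.ts code:

```ts
// Hypothetical bootstrap of a global server store from /props.
// Field names below are assumptions about the llama-server payload.

interface ServerProps {
  model_path?: string;
  chat_template?: string;
  [key: string]: unknown; // other capability metadata
}

let serverProps: ServerProps | null = null;

async function bootstrapServerProps(baseUrl: string): Promise<void> {
  try {
    const res = await fetch(`${baseUrl}/props`);
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    serverProps = (await res.json()) as ServerProps;
  } catch {
    // /props unreachable: keep any cached value and let the UI surface a warning.
  }
}
```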

ServeurpersoCom (Collaborator, Author)

TL;DR:
Adds a lightweight model selector for the WebUI using the /v1/models OpenAI-compatible endpoint.
Selected models are persisted locally and included in chat request payloads (model field).
Also unifies model metadata capture during streaming and non-streaming responses: the WebUI now uses a single source of truth for the active model across the stack.

ServeurpersoCom (Collaborator, Author) commented Oct 13, 2025

@ngxson :) What do you think about this approach? The aim is to stay compatible with:

  • the current standalone llama-server,
  • llama-swap,
  • and future multi-model evolutions of llama-server.

It introduces a unified, KISS, OpenAI-compatible model selection path while keeping everything backward-compatible with existing setups.

A standalone llama-server on a Raspberry Pi 5:
[screenshot]
I'll have to filter the model path here too (?)

ServeurpersoCom (Collaborator, Author)

@allozaur mind taking a look at those default Svelte arrows and the scrolling manager? I figured your Svelte wizardry might know the cleanest way to get rid of them 😄 I like things to be pixel-perfect, but it looks like this is built into the framework, and I’d rather not bypass Svelte just for that.
[screenshot: SvelteArrow]

allozaur (Collaborator)

> @allozaur mind taking a look at those default Svelte arrows and the scrolling manager? I figured your Svelte wizardry might know the cleanest way to get rid of them 😄 I like things to be pixel-perfect, but it looks like this is built into the framework, and I’d rather not bypass Svelte just for that.
>
> [screenshot: SvelteArrow]

Yep, will take a look at that and come back to u with an answer 😉
