Release v225 · mostlygeek/llama-swap

Model Capabilities

The new capabilities configuration option declares which modalities a model accepts as input and produces as output, along with tool calling, reranking and context length. Here is what the configuration looks like:

models:
    example_model:    
      # capabilities: defines what the model accepts for input, output and other metadata
      # - optional; omitted or all-zero means no capabilities
      # - used in v1/models to inform clients what the model can do
      capabilities:
        # in: list of modalities understood by the model
        # - default: []
        # - valid: text, audio, image
        in:
          - text
          - audio
          - image
        # out: list of modalities generated by the model
        # - default: []
        # - valid: text, audio, image
        out:
          - text
          - audio
          - image
        # tools: the model supports function calling
        # - default: false
        tools: true
  
        # reranker: the model supports the /v1/rerank endpoint
        # - default: false
        reranker: false
  
        # context: the maximum token context length supported
        # - default: 0
        # - must be an integer > 0
        context: 32000

    # capabilities can be written in a very condensed form
    image_gen:
      capabilities:
        in: [text]
        out: [image]
    speech_to_text:
      capabilities:
        in: [text]
        out: [audio]
    transcription:
      capabilities:
        in: [audio]
        out: [text]
    reranker:
      capabilities:
        reranker: true

When a client calls v1/models, llama-swap generates metadata that is compatible with the Mistral, OpenRouter and Hugging Face chat-ui formats:

{
  "data": [
    {
      "id": "image_gen",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "architecture": {
        "input_modalities": [
          "text"
        ],
        "modality": "text->image",
        "output_modalities": [
          "image"
        ]
      },
      "capabilities": {
        "image_generation": true
      }
    },
    {
      "id": "reranker",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "capabilities": {
        "reranker": true
      }
    },
    {
      "id": "speech_to_text",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "architecture": {
        "input_modalities": [
          "text"
        ],
        "modality": "text->audio",
        "output_modalities": [
          "audio"
        ]
      },
      "capabilities": {
        "audio_speech": true
      }
    },
    {
      "id": "transcription",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "architecture": {
        "input_modalities": [
          "audio"
        ],
        "modality": "audio->text",
        "output_modalities": [
          "text"
        ]
      },
      "capabilities": {
        "audio_transcriptions": true
      }
    }
  ],
  "object": "list"
}
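A client can use this metadata to pick a model for a task. The sketch below is one possible way to do that, using only the field names visible in the response above; the helper names models_with and models_accepting are hypothetical, and the sample is a trimmed copy of the response rather than a live call.

```python
import json

# Trimmed copy of the v1/models response shown above.
SAMPLE_RESPONSE = """
{
  "data": [
    {"id": "image_gen", "object": "model",
     "architecture": {"input_modalities": ["text"],
                      "modality": "text->image",
                      "output_modalities": ["image"]},
     "capabilities": {"image_generation": true}},
    {"id": "reranker", "object": "model",
     "capabilities": {"reranker": true}},
    {"id": "transcription", "object": "model",
     "architecture": {"input_modalities": ["audio"],
                      "modality": "audio->text",
                      "output_modalities": ["text"]},
     "capabilities": {"audio_transcriptions": true}}
  ],
  "object": "list"
}
"""


def models_with(response: dict, capability: str) -> list[str]:
    """IDs of models whose capabilities object sets the given flag to true."""
    return [
        m["id"]
        for m in response.get("data", [])
        if m.get("capabilities", {}).get(capability) is True
    ]


def models_accepting(response: dict, modality: str) -> list[str]:
    """IDs of models that list the given modality as an input."""
    return [
        m["id"]
        for m in response.get("data", [])
        if modality in m.get("architecture", {}).get("input_modalities", [])
    ]


response = json.loads(SAMPLE_RESPONSE)
print(models_with(response, "reranker"))       # ['reranker']
print(models_accepting(response, "audio"))     # ['transcription']
```

Both helpers tolerate entries without an architecture or capabilities object, which matters because the reranker entry above has no architecture block.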

Other changes

Implementation of a new scheduler backend (#823). There are no functional changes for users, but it will make implementing different scheduling and swapping strategies a bit easier. This is just the first step; the goal is for anyone to be able to customize llama-swap's behaviour by implementing the new interfaces.
#839 is a follow-up that improves abstractions and implementation boundaries for new schedulers / swappers:
- it also resolved the long-standing #717! If you have API keys set in the configuration, the UI will now prompt for a password :)
- the /metrics endpoint now requires an API key. HTTP Basic Auth is supported, so Prometheus integration is a single step.
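For a quick manual check of the /metrics change, a Basic Auth request can be built with the standard library. This is only a sketch under assumptions the notes above do not confirm: it assumes the API key goes in the password field, that the username is arbitrary, and that the server listens on http://localhost:8080. The function name metrics_request is hypothetical. The request is constructed but not sent.

```python
import base64
import urllib.request


def metrics_request(base_url: str, api_key: str,
                    username: str = "llama-swap") -> urllib.request.Request:
    """Build a GET request for /metrics with an HTTP Basic Auth header.

    Assumption: the API key is supplied as the Basic Auth password and the
    username is not significant. Check the llama-swap docs before relying on it.
    """
    token = base64.b64encode(f"{username}:{api_key}".encode()).decode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/metrics",
        headers={"Authorization": f"Basic {token}"},
    )


# Placeholder URL and key for illustration only.
req = metrics_request("http://localhost:8080", "my-secret-key")
print(req.full_url)
print(req.get_header("Authorization"))
# To actually send it: urllib.request.urlopen(req)
```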

Changelog

92b9044 Model capabilities 734 (#842)
62aea0e internal/router,server,shared: refactor auth, libs (#839)
8c660dc main: gofmt
f6877b8 main: show message when listening on network (#836)
9b3a33d Implement new scheduler (#823)
