This repository was archived by the owner on Jul 4, 2025. It is now read-only.

bug: some of the engine parameters in the model load request are ignored #1824

@louis-jan

Description

Cortex version

1.0.6

Describe the issue and expected behaviour

When starting a model, a number of engine parameters can be configured, as described at https://github.com/janhq/cortex.llamacpp. However, when these parameters are sent through the cortex.cpp server, most of them are filtered out because the new model.yaml implementation hardcodes a limited set of accepted parameters.

After reviewing the model.yaml implementation, I noticed that the following settings are never applied because their declarations are missing, so they all fall back to default values:

  • cpu_threads
  • n_batch
  • caching_enabled
  • grp_attn_n
  • grp_attn_w
  • mlock
  • grammar_file
  • model_type
  • model_alias
  • flash_attn
  • cache_type
  • use_mmap
  • llama_model_path
  • embedding
  • cont_batching
  • user_prompt
  • ai_prompt
  • system_prompt
  • pre_prompt
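
For illustration, a model.yaml that explicitly declared some of these settings might look like the sketch below. The key names are taken from the list above; the values are hypothetical placeholders, not recommended defaults:

```yaml
# Hypothetical model.yaml sketch: keys that would need to be declared
# for the corresponding request parameters to take effect.
# Values below are placeholders only.
cpu_threads: 8
n_batch: 512
caching_enabled: true
grp_attn_n: 1
grp_attn_w: 512
mlock: false
flash_attn: true
cache_type: f16
use_mmap: true
embedding: false
cont_batching: true
```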

Steps to Reproduce

  1. Start cortex server
  2. Start a model by sending a request with cpu_threads or n_batch settings
  3. Observe cortex.log
  4. See the error
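
The suspected behavior can be illustrated with a minimal sketch. This is NOT cortex.cpp's actual code; it only models the effect described above, where model.yaml acts as a whitelist and undeclared request parameters are silently dropped:

```python
# Hypothetical illustration of the suspected filtering: parameters not
# declared in model.yaml never reach the engine and fall back to defaults.
# The declared set and request payload below are made up for this example.
DECLARED_IN_MODEL_YAML = {"model", "ngl", "ctx_len", "prompt_template"}

def filter_load_params(request_params: dict) -> dict:
    """Keep only parameters declared in model.yaml; drop the rest."""
    return {k: v for k, v in request_params.items()
            if k in DECLARED_IN_MODEL_YAML}

request = {"model": "llama3", "cpu_threads": 8, "n_batch": 512,
           "ctx_len": 4096}
loaded = filter_load_params(request)
# cpu_threads and n_batch are silently dropped:
print(loaded)  # → {'model': 'llama3', 'ctx_len': 4096}
```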

Screenshots / Logs

No response

What is your OS?

  • Windows
  • Mac Silicon
  • Mac Intel
  • Linux / Ubuntu

What engine are you running?

  • cortex.llamacpp (default)
  • cortex.tensorrt-llm (Nvidia GPUs)
  • cortex.onnx (NPUs, DirectML)

Hardware Specs eg OS version, GPU

No response

Metadata

Labels

P1: important (Important feature / fix), type: bug (Something isn't working)
