This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Description
Cortex version
1.0.6
Describe the issue and expected behaviour
When starting a model, there are engine parameters that can be configured, as described here: https://github.com/janhq/cortex.llamacpp. However, when these parameters are sent through the cortex.cpp server, most of them are filtered out because the new model.yaml configuration hardcodes a fixed set of accepted parameters.
After reviewing the model.yaml implementation, I noticed that the following settings are never applied because their declarations are missing, so they all fall back to default values:
- cpu_threads
- n_batch
- caching_enabled
- grp_attn_n
- grp_attn_w
- mlock
- grammar_file
- model_type
- model_alias
- flash_attn
- cache_type
- use_mmap
- llama_model_path
- embedding
- cont_batching
- user_prompt
- ai_prompt
- system_prompt
- pre_prompt
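For illustration, this is a hypothetical model.yaml fragment showing a subset of the parameters above declared explicitly. The field names mirror the cortex.llamacpp engine parameter names; the surrounding schema is assumed and may differ from the actual model.yaml implementation:

```yaml
# Hypothetical model.yaml fragment (schema assumed): declaring these fields
# explicitly so the server does not silently fall back to defaults.
cpu_threads: 8
n_batch: 512
caching_enabled: true
flash_attn: true
cache_type: f16
use_mmap: true
mlock: false
cont_batching: true
embedding: false
```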
Steps to Reproduce
- Start cortex server
- Start a model by sending a request that includes the `cpu_threads` or `n_batch` settings
- Observe cortex.log
- See the error
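For reference, the request body used to start the model might look like the following. The payload shape is a sketch based on the engine parameters listed above, not the exact request I sent; the model name is a placeholder:

```json
{
  "model": "my-model",
  "cpu_threads": 8,
  "n_batch": 512
}
```

With this payload, `cpu_threads` and `n_batch` do not reach the engine and the defaults are used instead.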
Screenshots / Logs
No response
What is your OS?
What engine are you running?
Hardware Specs (e.g. OS version, GPU)
No response