feat(config): default prompt_cache_all to true by localai-bot · Pull Request #9951 · mudler/LocalAI

localai-bot · 2026-05-22T19:43:30Z

Summary

Make the per-request cache_prompt knob default to on, matching upstream llama.cpp.

backend/cpp/llama-cpp/grpc-server.cpp unconditionally forwards the proto field:

data["cache_prompt"] = predict->promptcacheall();   // grpc-server.cpp:197

Since the YAML-loaded Go value was a plain bool (zero = false), any model that didn't explicitly set prompt_cache_all: true ended up sending cache_prompt=false to llama.cpp — overriding upstream's own default (common/common.h:592: bool cache_prompt = true). With kv_unified=true and cache_idle_slots=true already default in parse_options, this was the last piece keeping the per-request prompt cache from being usable out of the box.
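To make the failure mode concrete, here is a minimal sketch of why a plain `bool` cannot carry "not set". The `legacyConfig` type and `forwardedCachePrompt` helper are hypothetical stand-ins for illustration, not the actual LocalAI types; the only assumption taken from the description above is that the backend copies the field through unconditionally.

```go
package main

import "fmt"

// legacyConfig is a hypothetical stand-in for the pre-change shape:
// a plain bool has no way to record whether the YAML key was present.
type legacyConfig struct {
	PromptCacheAll bool
}

// forwardedCachePrompt mimics a backend that copies the field into the
// request unconditionally, as the description above says grpc-server.cpp does.
func forwardedCachePrompt(c legacyConfig) bool {
	return c.PromptCacheAll
}

func main() {
	unset := legacyConfig{}                         // key absent: zero value
	explicit := legacyConfig{PromptCacheAll: false} // key explicitly false

	// The two configs are indistinguishable, so "absent" is sent as false.
	fmt.Println(unset == explicit)
	fmt.Println(forwardedCachePrompt(unset))
}
```

Both the absent and the explicit-false case reach the backend as `false`, which is why a default on the receiving side never gets a chance to apply.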

Change

Make LLMConfig.PromptCacheAll tristate (*bool), mirroring MMap, MMlock, Reranking, etc.
In SetDefaults, when nil → set to true.
Dereference at the proto boundary (gRPCPredictOpts is post-SetDefaults, so non-nil by contract — same idiom as the surrounding *c.Temperature, *c.TopK lines).
Add three Ginkgo specs covering: default, explicit false, explicit true. Mirrors the existing enable_prefix_caching precedent in hooks_test.go.
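The steps above can be sketched roughly as follows. This is an illustrative reduction, not the code in the diff: `llmConfig`, `setDefaults`, `predictOptions`, `toPredictOptions`, `boolPtr` and `resolve` are hypothetical names chosen for the example, and the real `SetDefaults` and `gRPCPredictOpts` handle many more fields.

```go
package main

import "fmt"

// llmConfig is a hypothetical reduction of the config struct.
// nil means the YAML key was absent; non-nil carries the explicit value.
type llmConfig struct {
	PromptCacheAll *bool
}

// setDefaults fills in only what the user left unset.
func (c *llmConfig) setDefaults() {
	if c.PromptCacheAll == nil {
		v := true
		c.PromptCacheAll = &v
	}
}

// predictOptions stands in for the generated proto message,
// where the field remains a plain bool.
type predictOptions struct {
	PromptCacheAll bool
}

// toPredictOptions must only be called after setDefaults,
// so the pointer is non-nil by contract and can be dereferenced.
func toPredictOptions(c llmConfig) predictOptions {
	return predictOptions{PromptCacheAll: *c.PromptCacheAll}
}

func boolPtr(b bool) *bool { return &b }

// resolve runs the whole path for one YAML state.
func resolve(in *bool) bool {
	c := llmConfig{PromptCacheAll: in}
	c.setDefaults()
	return toPredictOptions(c).PromptCacheAll
}

func main() {
	// default, explicit false, explicit true
	fmt.Println(resolve(nil), resolve(boolPtr(false)), resolve(boolPtr(true)))
	// prints: true false true
}
```

The three calls in `main` correspond to the three cases the new specs cover: an absent key resolves to `true`, while an explicit `false` or `true` is passed through unchanged.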

Notes

No proto change. Generated pb.PredictOptions.PromptCacheAll stays bool.
Users can still opt out with prompt_cache_all: false in the model YAML.
No in-tree YAML currently sets prompt_cache_all: false, so the behavior flip lands as a pure improvement.

Test plan

go test ./core/config/ ./core/backend/ — 102 specs (up from 99) pass
go vet ./core/config/ ./core/backend/ clean
CI lint job (golangci-lint) — verify in PR
Manual smoke: serve a chat model without prompt_cache_all in YAML, confirm second request with shared prefix is faster than the first

🤖 Generated with Claude Code

Upstream llama.cpp defaults `cache_prompt = true` (common/common.h), but `parse_options` in the grpc-server backend unconditionally forwards the proto `PromptCacheAll` field, so any model that didn't set `prompt_cache_all: true` in its YAML was getting `cache_prompt=false`, silently overriding llama.cpp's own default. With `kv_unified` and `cache_idle_slots` already on by default, this was the last piece preventing the per-request prompt cache from being usable out of the box.

Make `PromptCacheAll` tristate (`*bool`), default it to `true` in `SetDefaults`, and dereference at the proto boundary. Users can still opt out with an explicit `prompt_cache_all: false`. Same pattern as `MMap`, `MMlock`, `Reranking`, etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mudler merged commit c500461 into master May 22, 2026
56 checks passed

mudler deleted the default-cache-prompt-true branch May 22, 2026 20:06

localai-bot added the enhancement (New feature or request) label May 24, 2026

mudler merged 1 commit into master from default-cache-prompt-true
