From 3d90f9923e7cf501b52566ef544614d67b10f357 Mon Sep 17 00:00:00 2001
From: Oleksandr Kuvshynov <661042+okuvshynov@users.noreply.github.com>
Date: Sun, 5 Oct 2025 15:50:11 -0400
Subject: [PATCH] server: update readme to mention n_past_max metric

https://github.com/ggml-org/llama.cpp/pull/15361 added a new exported
metric, but the README was not updated to mention it.
---
 tools/server/README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/server/README.md b/tools/server/README.md
index 9f7ab229f7ddf..6825c8bf300c6 100644
--- a/tools/server/README.md
+++ b/tools/server/README.md
@@ -1045,6 +1045,7 @@ Available metrics:
 - `llamacpp:kv_cache_tokens`: KV-cache tokens.
 - `llamacpp:requests_processing`: Number of requests processing.
 - `llamacpp:requests_deferred`: Number of requests deferred.
+- `llamacpp:n_past_max`: High watermark of the observed context size.
 
 ### POST `/slots/{id_slot}?action=save`: Save the prompt cache of the specified slot to a file.