From e96590c5f0f786725c16400b4ebe6af582da1e91 Mon Sep 17 00:00:00 2001
From: Daniel Bevenius
Date: Mon, 24 Nov 2025 12:21:40 +0100
Subject: [PATCH] llama : skip output reordering for single token batches

This commit adds a check to skip the output reordering logic when
n_outputs == 1. With a single output token, the data is trivially
sorted and the reordering code is currently doing unnecessary work
(resetting and rebuilding output_ids to the same values).

The motivation for this change is improved code clarity and avoiding
confusion when debugging. While the performance impact is probably
negligible, this unnecessary work happens on every decode call in
llama-server when processing batches with single-token outputs.
---
 src/llama-context.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/llama-context.cpp b/src/llama-context.cpp
index 70a3ec62dfc..2aa6d52a242 100644
--- a/src/llama-context.cpp
+++ b/src/llama-context.cpp
@@ -1248,7 +1248,7 @@ int llama_context::decode(const llama_batch & batch_inp) {
 
     // make the outputs have the same order they had in the user-provided batch
     // note: this is mostly relevant for recurrent models atm
-    if (!sorted_output) {
+    if (!sorted_output && n_outputs > 1) {
         GGML_ASSERT((size_t) n_outputs == out_ids.size());
 
         // TODO: is there something more efficient which also minimizes swaps?
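
For illustration, below is a minimal standalone C++ sketch of the kind
of reordering the new condition short-circuits. The names out_ids,
output_ids, sorted_output, and n_outputs follow the patch context; the
helper reorder_outputs and the selection-sort loop are simplified
assumptions for the sketch, not the actual llama.cpp implementation.

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

// Sketch: restore user-provided batch order for the computed outputs.
// out_ids[i] is the batch position of output row i; output_ids maps a
// batch position back to its output row.
static void reorder_outputs(std::vector<int64_t> & out_ids,
                            std::vector<int32_t> & output_ids,
                            bool sorted_output) {
    const size_t n_outputs = out_ids.size();

    // the guarded branch: with a single output the data is trivially
    // sorted, and the work below would only rebuild the same values
    if (!sorted_output && n_outputs > 1) {
        // selection-sort style pass, chosen to minimize swaps
        for (size_t i = 0; i < n_outputs - 1; ++i) {
            size_t min = i;
            for (size_t j = i + 1; j < n_outputs; ++j) {
                if (out_ids[j] < out_ids[min]) {
                    min = j;
                }
            }
            if (min != i) {
                std::swap(out_ids[i], out_ids[min]);
                // a real implementation would also swap the
                // corresponding rows of the logits/embeddings buffers
            }
        }

        // rebuild the reverse mapping: batch position -> output row
        std::fill(output_ids.begin(), output_ids.end(), -1);
        for (size_t i = 0; i < n_outputs; ++i) {
            output_ids[out_ids[i]] = (int32_t) i;
        }
    }
}

int main() {
    // outputs produced out of batch order
    std::vector<int64_t> out_ids = {2, 0, 1};
    std::vector<int32_t> output_ids(4, -1);

    reorder_outputs(out_ids, output_ids, /*sorted_output=*/false);

    for (size_t i = 0; i < out_ids.size(); ++i) {
        std::printf("output row %zu -> batch pos %lld\n",
                    i, (long long) out_ids[i]);
    }
    return 0;
}

With out_ids = {2, 0, 1} the loop performs the swaps and rebuilds
output_ids; with a single entry it would only rewrite the same values,
which is exactly the work the patch skips.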