Skip to content

Conversation

@danbev
Copy link
Member

@danbev danbev commented Nov 24, 2025

This commit adds a check to skip the output reordering logic when n_outputs == 1. With a single output token, the data is trivially sorted and the reordering code is currently doing unnecessary work (resetting and rebuilding output_ids to the same values).

The motivation for this change is improved code clarity and avoiding confusion when debugging. While the performance impact is probably negligible, this unnecessary work happens on every decode call in llama-server when processing batches with single-token outputs.

This commit adds a check to skip the output reordering logic when
n_outputs == 1. With a single output token, the data is trivially
sorted and the reordering code is currently doing unnecessary work
(resetting and rebuilding output_ids to the same values).

The motivation for this change is improved code clarity and avoiding
confusion when debugging. While the performance impact is probably
negligible, this unnecessary work happens on every decode call in
llama-server when processing batches with single-token outputs.
@danbev danbev merged commit 134e694 into ggml-org:master Nov 24, 2025
73 of 74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants