Skip to content

Conversation

@danbev
Copy link
Member

@danbev danbev commented Feb 7, 2025

TThis commit replaces the call to common_detokenize with
common_token_to_piece in the populate_token_probs.

The motivation for this change is to avoid an issue where
common_detokenize would remove the word boundary character for tokens,
which caused a regression in the server generated token probabilities.

Resolves: #11728

@ngxson
Copy link
Collaborator

ngxson commented Feb 7, 2025

Also, a clue is to look around the common_detokenize, probably we miss certain params when constructing these detokenized string. The lines around result.probs.reserve(max_probs); should be related

@danbev danbev changed the title server : use text_to_send instead of detokenized token llama : add remove_space_prefix to llama_detokenize Feb 10, 2025
Copy link

@CyberShadow CyberShadow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment on the code, but I've tested this patch and it builds successfully + resolves the issue #11728 . Thank you!

@danbev danbev force-pushed the server-logprobs-11728 branch from 456dabf to cc1fd2f Compare February 10, 2025 15:27
@danbev danbev marked this pull request as ready for review February 10, 2025 15:28
This commit replaces the call to common_detokenize with
common_token_to_piece in the populate_token_probs.

The motivation for this change is to avoid an issue where
common_detokenize would remove the word boundary character for tokens,
which caused a regression in the server generated token probabilities.

Resolves: ggml-org#11728
@danbev danbev force-pushed the server-logprobs-11728 branch from cc1fd2f to b70fd3a Compare February 10, 2025 18:33
@ngxson
Copy link
Collaborator

ngxson commented Feb 10, 2025

IIRC there is another place having the same logic (near the speculative decode logic), that should be fixed too

Use common_token_to_piece for post_sampling_probs as well.
@danbev
Copy link
Member Author

danbev commented Feb 11, 2025

IIRC there is another place having the same logic (near the speculative decode logic), that should be fixed too

I've taken a look but could not find this in/near the speculative decoding. I did find this TODO though.

I noticed that this same logic is part of the post_sampling_probs and have updated this in 5deee0a.

@danbev danbev changed the title llama : add remove_space_prefix to llama_detokenize server : use common_token_to_piece instead of common_detokenize Feb 11, 2025
@ngxson ngxson merged commit a18f481 into ggml-org:master Feb 11, 2025
46 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
…-org#11740)

* server : use common_token_to_piece instead of common_detokenize

This commit replaces the call to common_detokenize with
common_token_to_piece in the populate_token_probs.

The motivation for this change is to avoid an issue where
common_detokenize would remove the word boundary character for tokens,
which caused a regression in the server generated token probabilities.

Resolves: ggml-org#11728

* squash! server : use common_token_to_piece instead of common_detokenize

Use common_token_to_piece for post_sampling_probs as well.
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
…-org#11740)

* server : use common_token_to_piece instead of common_detokenize

This commit replaces the call to common_detokenize with
common_token_to_piece in the populate_token_probs.

The motivation for this change is to avoid an issue where
common_detokenize would remove the word boundary character for tokens,
which caused a regression in the server generated token probabilities.

Resolves: ggml-org#11728

* squash! server : use common_token_to_piece instead of common_detokenize

Use common_token_to_piece for post_sampling_probs as well.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
…-org#11740)

* server : use common_token_to_piece instead of common_detokenize

This commit replaces the call to common_detokenize with
common_token_to_piece in the populate_token_probs.

The motivation for this change is to avoid an issue where
common_detokenize would remove the word boundary character for tokens,
which caused a regression in the server generated token probabilities.

Resolves: ggml-org#11728

* squash! server : use common_token_to_piece instead of common_detokenize

Use common_token_to_piece for post_sampling_probs as well.
ubergarm pushed a commit to ubergarm/llama.cpp that referenced this pull request Mar 1, 2025
…-org#11740)

* server : use common_token_to_piece instead of common_detokenize

This commit replaces the call to common_detokenize with
common_token_to_piece in the populate_token_probs.

The motivation for this change is to avoid an issue where
common_detokenize would remove the word boundary character for tokens,
which caused a regression in the server generated token probabilities.

Resolves: ggml-org#11728

* squash! server : use common_token_to_piece instead of common_detokenize

Use common_token_to_piece for post_sampling_probs as well.
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
…-org#11740)

* server : use common_token_to_piece instead of common_detokenize

This commit replaces the call to common_detokenize with
common_token_to_piece in the populate_token_probs.

The motivation for this change is to avoid an issue where
common_detokenize would remove the word boundary character for tokens,
which caused a regression in the server generated token probabilities.

Resolves: ggml-org#11728

* squash! server : use common_token_to_piece instead of common_detokenize

Use common_token_to_piece for post_sampling_probs as well.
@danbev danbev deleted the server-logprobs-11728 branch August 13, 2025 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Misc. bug: Tokens in top_probs / top_logprobs are missing whitespace

4 participants