
Bug: llama-server generates only garbled text after a request with invalid tokens. #10074

@morgen52

Description


What happened?

Hi there.

My llama-server initially worked fine, but after receiving a request containing illegal characters, it started generating garbled responses to all subsequent valid requests.
For example, I start my llama-server with:

./llama.cpp-b3938/build_gpu/bin/llama-server \
    -m ../models/Meta-Llama-3-8B.Q4_K_M.gguf \
    -ngl 33 \
    --cache-type-k q8_0
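
I have not yet verified whether --cache-type-k q8_0 is required to trigger this. If it helps triage, a control run with the default KV cache type would be the same command without that flag:

# Control run (untested): same setup without the quantized K cache,
# to check whether --cache-type-k q8_0 is involved.
./llama.cpp-b3938/build_gpu/bin/llama-server \
    -m ../models/Meta-Llama-3-8B.Q4_K_M.gguf \
    -ngl 33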

First, I send a request containing only legal characters:

curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{
        "prompt": "What is the meaning of life?\n\n",
        "n_predict": 256
    }'

The response is as expected:

{"content":"The meaning of life is whatever you want it to be. But in a more serious sense, the meaning of life is to experience life, and to learn from experience.\n\nWhat is the most important thing in life?\n\nThe most important thing in life is love, because without love, life would have no meaning.","id_slot":0,"stop":true,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","tokens_predicted":63,"tokens_evaluated":8,"generation_settings":{"n_ctx":8192,"n_predict":-1,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","seed":4294967295,"seed_cur":1591500953,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"tfs_z":1.0,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"penalize_nl":false,"stop":[],"max_tokens":256,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"n_probs":0,"min_keep":0,"grammar":"","samplers":["top_k","tfs_z","typ_p","top_p","min_p","xtc","temperature"]},"prompt":"What is the meaning of life?\n\n","has_new_line":true,"truncated":false,"stopped_eos":true,"stopped_word":false,"stopped_limit":false,"stopping_word":"","tokens_cached":70,"timings":{"prompt_n":8,"prompt_ms":51.577,"prompt_per_token_ms":6.447125,"prompt_per_second":155.1078969308025,"predicted_n":63,"predicted_ms":1169.143,"predicted_per_token_ms":18.557825396825397,"predicted_per_second":53.88562391426883},"index":0}

And then I send a request with illegal characters:

curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{
        "prompt": "What is the meaning of ÿþÿþÿþÿþÿþÿþÿþî?\n\n",
        "n_predict": 256
    }'

And I get an incoherent response, which is also as expected:

{"content":"GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG","id_slot":0,"stop":true,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","tokens_predicted":256,"tokens_evaluated":29,"generation_settings":{"n_ctx":8192,"n_predict":-1,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","seed":4294967295,"seed_cur":4216084932,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"tfs_z":1.0,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"penalize_nl":false,"stop":[],"max_tokens":256,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"n_probs":0,"min_keep":0,"grammar":"","samplers":["top_k","tfs_z","typ_p","top_p","min_p","xtc","temperature"]},"prompt":"What is the meaning of ÿþÿþÿþÿþÿþÿþÿþî?\n\n","has_new_line":false,"truncated":false,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":"","tokens_cached":284,"timings":{"prompt_n":29,"prompt_ms":53.786,"prompt_per_token_ms":1.854689655172414,"prompt_per_second":539.1737626891755,"predicted_n":256,"predicted_ms":4888.785,"predicted_per_token_ms":19.09681640625,"predicted_per_second":52.364749114555046},"index":0}

However, after that, even valid requests get garbled responses:

curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{
        "prompt": "What is the meaning of life?\n\n",
        "n_predict": 256
    }'

And I still get the garbled response:

{"content":"GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG","id_slot":0,"stop":true,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","tokens_predicted":256,"tokens_evaluated":8,"generation_settings":{"n_ctx":8192,"n_predict":-1,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","seed":4294967295,"seed_cur":3234876351,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"tfs_z":1.0,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"penalize_nl":false,"stop":[],"max_tokens":256,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"n_probs":0,"min_keep":0,"grammar":"","samplers":["top_k","tfs_z","typ_p","top_p","min_p","xtc","temperature"]},"prompt":"What is the meaning of life?\n\n","has_new_line":false,"truncated":false,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":"","tokens_cached":263,"timings":{"prompt_n":8,"prompt_ms":51.324,"prompt_per_token_ms":6.4155,"prompt_per_second":155.8724962980282,"predicted_n":256,"predicted_ms":4874.52,"predicted_per_token_ms":19.04109375,"predicted_per_second":52.51799151506199},"index":0}

Name and Version

./llama.cpp-b3938/build_gpu/bin/llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 7 (d9a33c5)
built with cc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

No response


Labels

bug-unconfirmed, critical severity
