
Bug: llama-server generates only garbled text after a request with invalid tokens. #10074

@morgen52

Description


What happened?

Hi there.

My llama-server initially worked fine, but after receiving a request containing illegal characters, it started generating garbled responses to all subsequent valid requests.
For example, I start my llama-server with:

./llama.cpp-b3938/build_gpu/bin/llama-server \
    -m ../models/Meta-Llama-3-8B.Q4_K_M.gguf \
    -ngl 33 \
    --cache-type-k q8_0
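
I have not yet verified whether --cache-type-k q8_0 is required to trigger this. If it helps triage, a control run with the default KV cache type would be the same command without that flag:

# Control run (untested): same setup without the quantized K cache,
# to check whether --cache-type-k q8_0 is involved.
./llama.cpp-b3938/build_gpu/bin/llama-server \
    -m ../models/Meta-Llama-3-8B.Q4_K_M.gguf \
    -ngl 33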

First, I send a request containing only legal characters:

curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{
        "prompt": "What is the meaning of life?\n\n",
        "n_predict": 256
    }'

The response is as expected:

{"content":"The meaning of life is whatever you want it to be. But in a more serious sense, the meaning of life is to experience life, and to learn from experience.\n\nWhat is the most important thing in life?\n\nThe most important thing in life is love, because without love, life would have no meaning.","id_slot":0,"stop":true,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","tokens_predicted":63,"tokens_evaluated":8,"generation_settings":{"n_ctx":8192,"n_predict":-1,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","seed":4294967295,"seed_cur":1591500953,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"tfs_z":1.0,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"penalize_nl":false,"stop":[],"max_tokens":256,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"n_probs":0,"min_keep":0,"grammar":"","samplers":["top_k","tfs_z","typ_p","top_p","min_p","xtc","temperature"]},"prompt":"What is the meaning of life?\n\n","has_new_line":true,"truncated":false,"stopped_eos":true,"stopped_word":false,"stopped_limit":false,"stopping_word":"","tokens_cached":70,"timings":{"prompt_n":8,"prompt_ms":51.577,"prompt_per_token_ms":6.447125,"prompt_per_second":155.1078969308025,"predicted_n":63,"predicted_ms":1169.143,"predicted_per_token_ms":18.557825396825397,"predicted_per_second":53.88562391426883},"index":0}

And then I send a request with illegal characters:

curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{
        "prompt": "What is the meaning of ÿþÿþÿþÿþÿþÿþÿþî?\n\n",
        "n_predict": 256
    }'

And I get an incoherent response, which is also as expected:

{"content":"GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG","id_slot":0,"stop":true,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","tokens_predicted":256,"tokens_evaluated":29,"generation_settings":{"n_ctx":8192,"n_predict":-1,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","seed":4294967295,"seed_cur":4216084932,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"tfs_z":1.0,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"penalize_nl":false,"stop":[],"max_tokens":256,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"n_probs":0,"min_keep":0,"grammar":"","samplers":["top_k","tfs_z","typ_p","top_p","min_p","xtc","temperature"]},"prompt":"What is the meaning of ÿþÿþÿþÿþÿþÿþÿþî?\n\n","has_new_line":false,"truncated":false,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":"","tokens_cached":284,"timings":{"prompt_n":29,"prompt_ms":53.786,"prompt_per_token_ms":1.854689655172414,"prompt_per_second":539.1737626891755,"predicted_n":256,"predicted_ms":4888.785,"predicted_per_token_ms":19.09681640625,"predicted_per_second":52.364749114555046},"index":0}

However, after that, even valid requests get garbled responses:

curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{
        "prompt": "What is the meaning of life?\n\n",
        "n_predict": 256
    }'

And I still get the garbled response:

{"content":"GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG","id_slot":0,"stop":true,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","tokens_predicted":256,"tokens_evaluated":8,"generation_settings":{"n_ctx":8192,"n_predict":-1,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","seed":4294967295,"seed_cur":3234876351,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"tfs_z":1.0,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"penalize_nl":false,"stop":[],"max_tokens":256,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"n_probs":0,"min_keep":0,"grammar":"","samplers":["top_k","tfs_z","typ_p","top_p","min_p","xtc","temperature"]},"prompt":"What is the meaning of life?\n\n","has_new_line":false,"truncated":false,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":"","tokens_cached":263,"timings":{"prompt_n":8,"prompt_ms":51.324,"prompt_per_token_ms":6.4155,"prompt_per_second":155.8724962980282,"predicted_n":256,"predicted_ms":4874.52,"predicted_per_token_ms":19.04109375,"predicted_per_second":52.51799151506199},"index":0}

Name and Version

./llama.cpp-b3938/build_gpu/bin/llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 7 (d9a33c5)
built with cc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

No response


Labels

bug-unconfirmed, critical severity
