Description
What happened?
Hi there.
My llama-server initially worked fine, but after receiving a request with illegal characters, it started generating garbled responses to all valid requests.
For example, I start my llama-server with:
./llama.cpp-b3938/build_gpu/bin/llama-server -m ../models/Meta-Llama-3-8B.Q4_K_M.gguf -ngl 33 --cache-type-k q8_0
First, I send a request with only valid characters:
curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{ "prompt": "What is the meaning of life?\n\n", "n_predict": 256 }'
The response is as expected:
{"content":"The meaning of life is whatever you want it to be. But in a more serious sense, the meaning of life is to experience life, and to learn from experience.\n\nWhat is the most important thing in life?\n\nThe most important thing in life is love, because without love, life would have no meaning.","id_slot":0,"stop":true,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","tokens_predicted":63,"tokens_evaluated":8,"generation_settings":{"n_ctx":8192,"n_predict":-1,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","seed":4294967295,"seed_cur":1591500953,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"tfs_z":1.0,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"penalize_nl":false,"stop":[],"max_tokens":256,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"n_probs":0,"min_keep":0,"grammar":"","samplers":["top_k","tfs_z","typ_p","top_p","min_p","xtc","temperature"]},"prompt":"What is the meaning of life?\n\n","has_new_line":true,"truncated":false,"stopped_eos":true,"stopped_word":false,"stopped_limit":false,"stopping_word":"","tokens_cached":70,"timings":{"prompt_n":8,"prompt_ms":51.577,"prompt_per_token_ms":6.447125,"prompt_per_second":155.1078969308025,"predicted_n":63,"predicted_ms":1169.143,"predicted_per_token_ms":18.557825396825397,"predicted_per_second":53.88562391426883},"index":0}
And then I send a request with illegal characters:
curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{ "prompt": "What is the meaning of ÿþÿþÿþÿþÿþÿþÿþî?\n\n", "n_predict": 256 }'
I get an incoherent response, which is also expected:
{"content":"GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG","id_slot":0,"stop":true,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","tokens_predicted":256,"tokens_evaluated":29,"generation_settings":{"n_ctx":8192,"n_predict":-1,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","seed":4294967295,"seed_cur":4216084932,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"tfs_z":1.0,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"penalize_nl":false,"stop":[],"max_tokens":256,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"n_probs":0,"min_keep":0,"grammar":"","samplers":["top_k","tfs_z","typ_p","top_p","min_p","xtc","temperature"]},"prompt":"What is the meaning of ÿþÿþÿþÿþÿþÿþÿþî?\n\n","has_new_line":false,"truncated":false,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":"","tokens_cached":284,"timings":{"prompt_n":29,"prompt_ms":53.786,"prompt_per_token_ms":1.854689655172414,"prompt_per_second":539.1737626891755,"predicted_n":256,"predicted_ms":4888.785,"predicted_per_token_ms":19.09681640625,"predicted_per_second":52.364749114555046},"index":0}
However, after that, I get garbled responses to all valid requests:
curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{ "prompt": "What is the meaning of life?\n\n", "n_predict": 256 }'
And I still get the garbled response:
{"content":"GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG","id_slot":0,"stop":true,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","tokens_predicted":256,"tokens_evaluated":8,"generation_settings":{"n_ctx":8192,"n_predict":-1,"model":"../models/Meta-Llama-3-8B.Q4_K_M.gguf","seed":4294967295,"seed_cur":3234876351,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"tfs_z":1.0,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"penalize_nl":false,"stop":[],"max_tokens":256,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"n_probs":0,"min_keep":0,"grammar":"","samplers":["top_k","tfs_z","typ_p","top_p","min_p","xtc","temperature"]},"prompt":"What is the meaning of life?\n\n","has_new_line":false,"truncated":false,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":"","tokens_cached":263,"timings":{"prompt_n":8,"prompt_ms":51.324,"prompt_per_token_ms":6.4155,"prompt_per_second":155.8724962980282,"predicted_n":256,"predicted_ms":4874.52,"predicted_per_token_ms":19.04109375,"predicted_per_second":52.51799151506199},"index":0}
Name and Version
./llama.cpp-b3938/build_gpu/bin/llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 7 (d9a33c5)
built with cc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
No response