Description
Name and Version
./llama-cli --version
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD EPYC 9654 96-Core Processor)
load_backend: failed to find ggml_backend_init in /data/ylwang/Projects/llama.cpp/build/bin/libggml-cpu.so
version: 7090 (0de8878)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./llama-server -m 0.gguf --host 0.0.0.0
Problem description & steps to reproduce
PoC
import requests

a = int(2147483648 / 2)
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "\n" * a, "content": "\n" * a}
        ],
        "max_tokens": 20,
    },
)
print(resp.text)
With breakpoints set on lines 3030 and 3037 in ./llama.cpp/common/chat.cpp inside llama-server:
(gdb) n
3030 for (size_t i = 0; i < contents.size(); ++i) {
(gdb) p alloc_size
$16 = -2147483648
(gdb) n
At this point, alloc_size has already overflowed to a negative value.
Thread 8 "llama-server" hit Breakpoint 5, common_chat_templates_apply_legacy (tmpls=0x50300029f170, inputs=...) at /data/ylwang/Projects/llama.cpp/common/chat.cpp:3037
3037 std::vector<char> buf(alloc_size);
Here, the negative alloc_size is implicitly converted to std::vector's size_type (a 64-bit unsigned integer), producing an enormous value, so the attempted allocation fails with an error:
got exception: {"error":{"code":500,"message":"cannot create std::vector larger than max_size()","type":"server_error"}}
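A standalone snippet (not from the llama.cpp tree) reproducing this conversion and the resulting exception; the exact message text depends on the C++ standard library (libstdc++ here):

#include <climits>
#include <cstdio>
#include <stdexcept>
#include <vector>

int main() {
    int alloc_size = INT_MIN;  // the value observed in gdb: -2147483648
    // std::vector's constructor takes size_type (64-bit unsigned), so the
    // negative int wraps to 2^64 - 2^31 = 18446744071562067968
    printf("as size_t: %zu\n", (size_t) alloc_size);
    try {
        std::vector<char> buf(alloc_size);  // request far beyond max_size()
    } catch (const std::exception & e) {
        // with libstdc++: "cannot create std::vector larger than max_size()"
        printf("exception: %s\n", e.what());
    }
    return 0;
}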
Root Cause Analysis
In this code (common_chat_templates_apply_legacy in common/chat.cpp):
int alloc_size = 0;
...
alloc_size += (msg.role.size() + content.size()) * 1.25;
Because alloc_size is a signed int, the accumulated estimate overflows once it exceeds INT_MAX (2147483647). With the PoC above, role and content are each 2^30 bytes, so (2^30 + 2^30) * 1.25 = 2684354560 already does not fit in an int, and the stored result becomes negative.
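A minimal standalone reproduction of the overflow, with hypothetical sizes mirroring the PoC. Strictly speaking, the out-of-range double-to-int conversion is undefined behavior; on x86-64 with GCC it typically yields INT_MIN, which matches the gdb output:

#include <cstddef>
#include <cstdio>

int main() {
    // role and content are each 2^30 newlines in the PoC above
    size_t role_size    = 1073741824;  // 2^30
    size_t content_size = 1073741824;  // 2^30

    int alloc_size = 0;
    // (2^30 + 2^30) * 1.25 = 2684354560 > INT_MAX; converting that double
    // back to int is undefined behavior and comes out as INT_MIN here
    alloc_size += (role_size + content_size) * 1.25;
    printf("alloc_size = %d\n", alloc_size);  // observed: -2147483648
    return 0;
}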
Fix Recommendation
Use unsigned int or size_t to store the value of alloc_size.
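A minimal sketch of the suggested fix (not an actual patch), with a hypothetical chat_msg type standing in for the real message structure in common/chat.cpp:

#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the real message type in common/chat.cpp.
struct chat_msg {
    std::string role;
    std::string content;
};

std::vector<char> alloc_buf(const std::vector<chat_msg> & messages) {
    // size_t instead of int: the estimate grows monotonically and can
    // never wrap to a negative value
    size_t alloc_size = 0;
    for (const auto & msg : messages) {
        alloc_size += (size_t) ((msg.role.size() + msg.content.size()) * 1.25);
    }
    return std::vector<char>(alloc_size);
}

Note that with size_t the PoC would still request a multi-gigabyte buffer; rejecting oversized messages earlier in the server may also be worth considering.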
First Bad Commit
No response