Description
Contact Details
No response
What happened?
Cmd:
./llama32.llamafile --server --nobrowser --port 8080 --gpu nvidia --n-gpu-layers 999 --temp 0 --parallel 5
For certain inputs, llamafile crashes (see attached logs). Here is an example input:
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ "stream": true, "messages": [{ "role": "system", "content": "\n The following is an audio transcript:\n\n === Begin Transcript ===\n[00:01:30.160 - 00:01:31.700]是照抗黃金的不二人選\n[00:01:31.700 - 00:01:33.700]誰知尼克聽後卻持反對意見\n[00:01:33.700 - 00:01:35.220]這座金山可是自己發現的\n[00:01:35.220 - 00:01:36.620]他死都不會離開這裡半步\n[00:01:36.620 - 00:01:38.180]眼看尼克依舊一意孤行\n[00:01:38.180 - 00:01:40.540]同伴為其科普出無人區的突發情況\n[00:01:40.540 - 00:01:42.780]在這裡不光要應對各種猛獸的襲擊\n[00:01:42.780 - 00:01:44.540]還要面臨荒漠裡的極端天氣\n[00:01:44.540 - 00:01:46.300]如果有人經過也不能打招呼\n[00:01:46.300 - 00:01:47.500]更不能暴露金子出來\n[00:01:47.500 - 00:01:49.340]如果沒有食物還得自己去狩獵\n[00:01:49.340 - 00:01:50.900]他勸尼克要三思而後行\n[00:01:50.900 - 00:01:52.620]不料尼克聽後依舊不為所動\n[00:01:52.620 - 00:01:54.200]這個守家的任務非他莫屬\n[00:01:54.200 - 00:01:55.500]眼看尼克心意已決\n[00:01:55.500 - 00:01:56.900]同伴也不好再多說什麼\n[00:01:56.900 - 00:01:59.000]隔天一早兩人便開始分配物資\n[00:01:59.000 - 00:02:00.980]同伴以冷卻汽車引擎為由\n[00:02:00.980 - 00:02:02.120]拿走一大部分水源\n[00:02:02.120 - 00:02:04.320]臨走之際他丟給尼克一部衛星電話\n[00:02:04.320 - 00:02:06.380]表示途中就靠這個來相互聯繫\n[00:02:14.560 - 00:02:16.580]說完同伴便駕車揚塵而去\n[00:02:16.580 - 00:02:17.640]碩大的無人區\n[00:02:17.640 - 00:02:19.300]頓時就剩下尼克孤身一人\n[00:02:19.300 - 00:02:21.560]隨後他用沙土掩蓋金身的表面\n[00:02:21.560 - 00:02:24.020]接著用棚布製作出一個簡易的遮陽長符\n[00:02:24.020 - 00:02:25.320]為了抵禦夜晚的野狗群\n[00:02:25.320 - 00:02:27.000]他提前撿拾一些木柴回來\n[00:02:27.000 - 00:02:28.280]憑藉火堆的加持\n[00:02:28.280 - 00:02:29.920]他有驚無險度過了第一晚\n[00:02:29.920 - 00:02:31.560]然而隔天一起床他卻發現\n[00:02:31.560 - 00:02:33.620]不遠處又有什麼東西在閃閃發光\n[00:02:33.620 - 00:02:34.920]難不成又有黃金嗎\n[00:02:34.920 - 00:02:36.640]帶著一夥尼克來到跟前一看\n[00:02:36.640 - 00:02:38.960]才發現近視下破爛不堪的飛機殘骸\n[00:02:38.960 - 00:02:40.540]來到面前打量一番後\n[00:02:40.540 - 00:02:42.180]發現這是一個完美的棲身之地\n[00:02:42.180 - 00:02:43.880]內部的隔熱效果也是極佳\n[00:02:43.880 - 00:02:45.280]但很快他便坐立難安\n[00:02:45.280 - 00:02:46.780]因為這裡根本看不到金身\n[00:02:46.780 - 00:02:48.480]為了回到心心念念的金字旁\n[00:02:48.480 - 
00:02:50.960]他果斷將飛機的零部件都一一拆卸下來\n[00:02:50.960 - 00:02:52.640]接著再將其拖回營地中\n[00:02:52.640 - 00:02:53.620]一番搭建過後\n[00:02:53.620 - 00:02:55.460]他總算是有了一處棲身之所\n[00:02:55.460 - 00:02:57.200]望著近在咫尺的狗頭金身\n[00:02:57.200 - 00:02:58.880]他才敢放心大膽的小氣\n[00:02:58.880 - 00:03:01.200]可頭頂的烈焰卻使他無法坦然入水\n[00:03:01.200 - 00:03:03.200]地表溫度甚至高達五十多度\n[00:03:03.200 - 00:03:05.160]長此以往他的額頭以及脖梗處\n === End Transcript ===\n\n You are a native Assamese speaker for over 50 years. Here are your instructions:\n\n === Begin Instructions ===\n - Rewrite the above section in Assamese.\n - Avoid anglicisms where sensible and instead formulate the words in Assamese.\n - Formulate short, concise sentences that are easy to comprehend.\n - Keep the original transcript structure intact.\n - Do not add comments.\n - Only reply with the original timestamps and translated text, nothing else.\n - Make sure that all lines contained in the transcript are translated, even if they are very short.\n - Include each timestamp with its associated translation.\n === End Instructions ===\n" }] }'
I could not reproduce the same crash locally, but something strange happens there too: after streaming successfully for a while, the server log starts looping the following endlessly:
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1024
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 512
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
update_slots : failed to decode the batch, n_batch = 1, ret = 1
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1024
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 512
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
update_slots : failed to decode the batch, n_batch = 1, ret = 1
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1024
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 512
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
update_slots : failed to decode the batch, n_batch = 1, ret = 1
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1024
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 512
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
update_slots : failed to decode the batch, n_batch = 1, ret = 1
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1024
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 512
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
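The loop above follows a halve-and-retry pattern: every failed attempt halves n_batch, the attempt at n_batch = 1 fails with ret = 1, and the cycle starts over. The sketch below is a hypothetical Python simulation of that pattern for readers who want to reason about it. It is not llamafile's actual code. The starting value of 2048 is an assumption inferred from the first logged value being 1024, and `try_decode` is a made-up stand-in for the real decode call.

```python
def simulate_retry_cycle(initial_n_batch=2048, try_decode=lambda n_batch: 1):
    """Return the log lines one retry cycle would produce.

    try_decode is a hypothetical callback: it returns 0 on success and a
    non-zero value on failure (1 is the value seen in the log above).
    """
    lines = []
    n_batch = initial_n_batch
    while True:
        ret = try_decode(n_batch)
        if ret == 0:
            return lines  # decode succeeded, nothing more to log
        if n_batch == 1 or ret < 0:
            # Cannot shrink the batch any further: give up on this cycle.
            lines.append(
                f"update_slots : failed to decode the batch, "
                f"n_batch = {n_batch}, ret = {ret}"
            )
            return lines
        n_batch //= 2
        lines.append(
            "update_slots : failed to find free space in the KV cache, "
            f"retrying with smaller n_batch = {n_batch}"
        )


if __name__ == "__main__":
    for line in simulate_retry_cycle():
        print(line)
```

With a decode that always fails, one cycle yields eleven "retrying" lines (1024 down to 1) followed by one "failed to decode" line, which matches the 12-line period in the log.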
Version
llamafile v0.9.0
What operating system are you seeing the problem on?
Linux
Relevant log output
{"function":"update_slots","level":"INFO","line":1705,"msg":"slot context shift","n_cache_tokens":1638,"n_ctx":8192,"n_discard":1974,"n_keep":1,"n_left":3948,"n_past":3949,"n_system_tokens":0,"slot_id":3,"task_id":315,"tid":"12265824","timestamp":1750175977}
libc++abi: terminating due to uncaught exception of type std::length_error: vector
error: Uncaught SIGABRT (SI_TKILL) at 0x3e90008dd44 on xv14379 pid 580932 tid 580932
/opt/llama/llama32.llamafile
No such file or directory
Linux Cosmopolitan 4.0.2 MODE=x86_64; #63-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 15 19:04:15 UTC 2025 xv14379 6.8.0-60-generic
RAX 0000000000000000 RBX 0000000000000006 RDI 000000000008dd44
RCX 000000000093fd51 RDX 0000000000000000 RSI 0000000000000006
RBP 00007ffc9e183a90 RSP 00007ffc9e183a90 RIP 000000000093fd51
R8 0000000000000000 R9 0000000000000000 R10 000000000093fd51
R11 0000000000000296 R12 0000000000ac9e48 R13 000000000094f6c0
R14 0000000000ac9e30 R15 00007ffc9e183f60
TLS 0000000000b95f00
XMM0 00000000000000000000000000000000 XMM8 00000000000000000000000035800000
XMM1 00000000000000000000000000000000 XMM9 ffffffffffffffffffffffffffffffff
XMM2 000000000000000000007ffc9e182bb0 XMM10 ffffffffffffffffffffffffffffffff
XMM3 226c6576656c222c2273746f6c735f65 XMM11 00000000000000000000000000000000
XMM4 6f6c73223a2267736d222c353037313a XMM12 00000000000000000000000000000000
XMM5 22656e696c222c224f464e49223a226c XMM13 ffffffffffffffffffffffffffffffff
XMM6 6576656c222c2273746f6c735f657461 XMM14 00007c34bc45f70800007c34bc45f710
XMM7 647075223a226e6f6974636e7566227b XMM15 00007c34bc45f71800007c34bc45f720
cosmoaddr2line /opt/llama/llama32.llamafile 93fd51 92918b 407818 94f56a 94f66e 9017e6 901412 4291d1 42a5bb 42a595 426e2f 47dd03 44b5d6 4427a0 41dde2 4040dc 4015f4
0x000000000093fd51: ?? ??:0
0x000000000092918b: ?? ??:0
0x0000000000407818: ?? ??:0
0x000000000094f56a: ?? ??:0
0x000000000094f66e: ?? ??:0
0x00000000009017e6: ?? ??:0
0x0000000000901412: ?? ??:0
0x00000000004291d1: ?? ??:0
0x000000000042a5bb: ?? ??:0
0x000000000042a595: ?? ??:0
0x0000000000426e2f: ?? ??:0
0x000000000047dd03: ?? ??:0
0x000000000044b5d6: ?? ??:0
0x00000000004427a0: ?? ??:0
0x000000000041dde2: ?? ??:0
0x00000000004040dc: ?? ??:0
0x00000000004015f4: ?? ??:0
000000400000-000000ae21e0 r-xi- 7048kb
000000ae3000-000003251000 rw-i- 39mb
000003251000-0006fe000000 28gb
0006fe000000-0006fe001000 rw-pa 4096b
0006fe001000-7c32ee000000 124tb
7c32ee000000-7c32f2000000 rw-pa 64mb
7c32f2000000-7c33cdc00000 3516mb
7c33cdc00000-7c33ce400000 rw-pa 8192kb
7c33ce400000-7c33da580000 194mb
7c33da580000-7c3477ee0080 r--s- 2521mb
7c3477ee1000-7c347c600000 71mb
7c347c600000-7c3480000000 rw-pa 58mb
7c3480000000-7c3490200000 258mb
7c3490200000-7c3491200000 rw-pa 16mb
7c3491200000-7c34be89f000 727mb
7c34be89f000-7c34be8a0000 ---pa 4096b
7c34be8a0000-7c34be8b4000 rw-pa 80kb
7c34be8b4000-7c34be8b5000 ---pa 4096b
7c34be8b5000-7c34be8c9000 rw-pa 80kb
7c34be8c9000-7c34be8ca000 ---pa 4096b
7c34be8ca000-7c34be8de000 rw-pa 80kb
7c34be8de000-7c34c5282000 106mb
7c34c5282000-7c34c5283000 ---pa 4096b
# 5'730'791'424 bytes in 31 mappings
/opt/llama/llama32.llamafile --server --nobrowser --port 8085 --parallel 5 --gpu nvidia --n-gpu-layers 999 --temp 0
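Since the backtrace is unsymbolized, the XMM registers in the dump above are one of the few readable hints. Several of them hold printable ASCII. The sketch below decodes them, assuming the dump prints each 128-bit register most significant byte first, so the bytes are reversed to get x86-64 memory order. The helper name `decode_xmm` is made up for this example.

```python
# Register values copied from the crash report above.
XMM_DUMP = {
    "XMM7": "647075223a226e6f6974636e7566227b",
    "XMM6": "6576656c222c2273746f6c735f657461",
    "XMM5": "22656e696c222c224f464e49223a226c",
    "XMM4": "6f6c73223a2267736d222c353037313a",
}


def decode_xmm(hex_value):
    """Decode one register as ASCII, lowest-addressed byte first."""
    return bytes.fromhex(hex_value)[::-1].decode("ascii")


def decode_registers(dump, order=("XMM7", "XMM6", "XMM5", "XMM4")):
    """Concatenate the decoded registers in the given order."""
    return "".join(decode_xmm(dump[name]) for name in order)


if __name__ == "__main__":
    print(decode_registers(XMM_DUMP))
```

Read in the order XMM7, XMM6, XMM5, XMM4, the registers spell out the start of the "slot context shift" JSON log line. That suggests the log message was being copied or formatted shortly before the abort. It does not identify the cause on its own.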
systemd[1]: llama.service: Main process exited, code=exited, status=134/n/a
systemd[1]: llama.service: Failed with result 'exit-code'.
systemd[1]: llama.service: Consumed 9min 5.683s CPU time.
systemd[1]: llama.service: Scheduled restart job, restart counter is at 1.
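A few numbers in the log output above can be cross-checked with simple arithmetic. The sketch below parses the "slot context shift" line and the systemd exit status. The relations n_left = n_past - n_keep and n_discard = n_left / 2 are read off the logged values. The even split of n_ctx across the 5 parallel slots is an assumption about how the server divides context, not something the log states. The function names are made up for this example.

```python
import json
import signal

# The "slot context shift" line, copied from the log output above.
LOG_LINE = (
    '{"function":"update_slots","level":"INFO","line":1705,'
    '"msg":"slot context shift","n_cache_tokens":1638,"n_ctx":8192,'
    '"n_discard":1974,"n_keep":1,"n_left":3948,"n_past":3949,'
    '"n_system_tokens":0,"slot_id":3,"task_id":315,"tid":"12265824",'
    '"timestamp":1750175977}'
)


def check_context_shift(entry, n_parallel):
    """Recompute the derived fields of a context-shift log entry."""
    n_left = entry["n_past"] - entry["n_keep"]
    per_slot_ctx = entry["n_ctx"] // n_parallel  # assumption: even split
    return {
        "n_left_matches": n_left == entry["n_left"],
        "n_discard_matches": n_left // 2 == entry["n_discard"],
        "per_slot_ctx": per_slot_ctx,
        "n_past_exceeds_slot_ctx": entry["n_past"] > per_slot_ctx,
    }


def signal_from_exit_status(status):
    """Map a shell-style exit status (128 + N) to a signal name."""
    return signal.Signals(status - 128).name


if __name__ == "__main__":
    print(check_context_shift(json.loads(LOG_LINE), n_parallel=5))
    print(signal_from_exit_status(134))
```

Under that assumption each slot gets 8192 // 5 = 1638 tokens, which equals the logged n_cache_tokens, while n_past (3949) is more than twice that. Exit status 134 is 128 + 6, consistent with the SIGABRT shown in the crash report.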