Name and Version
llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
version: 6674 (5113efd3)
built with cc (Ubuntu 12.4.0-2ubuntu1~24.04) 12.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server -m Qwen3-Reranker-8B-q4_k_s.gguf -c 4096 -ngl 99 --host 0.0.0.0 --port 8181 --prio 2 --no-webui -ctk q4_0 -ctv q4_0 -fa auto --rerank
Problem description & steps to reproduce
Reranking gives bad results with qwen3-reranker (0.6B, 4B, 8B), bge, and mxbai.
llama-server -m Qwen3-Reranker-8B-q4_k_s.gguf -c 4096 -ngl 99 --host 0.0.0.0 --port 8181 --prio 2 --no-webui -ctk q4_0 -ctv q4_0 -fa auto --rerank
I test with this shell script:
#!/bin/bash
# Set default URL if not provided
URL=${1:-http://127.0.0.1:8181}

curl "$URL/v1/rerank" -H "Content-Type: application/json" \
  -d '{
    "model": "M",
    "query": "What is the recipe to make bread ?",
    "return_text": false,
    "texts": true,
    "top_n": 6,
    "documents": [
      "voici la recette pour faire du pain, il faut de la farine de l eau et du levain et du sel",
      "it is a bear",
      "bread recipe : floor, water, yest, salt",
      "The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.",
      "here is the ingedients to bake bread : 500g floor, 350g water, 120g fresh refresh yest, 15g salt",
      "recipe to make cookies : floor, eggs, water, chocolat",
      "here is the recipe to make bread : 500g floor, 350g water, 120g fresh refresh yest, 15g salt",
      "il fait tres beau aujourd hui",
      "je n ai pas faim, je ne veux pas manger",
      "je suis a paris"
    ]
  }' | jq
I always get results like this:
[
{
"index": 5,
"score": 1.1353239058953662E-28
},
{
"index": 3,
"score": 3.111864641067425E-29
},
{
"index": 8,
"score": 2.3408178355156106E-29
},
{
"index": 9,
"score": 2.6804792039427738E-30
},
{
"index": 1,
"score": 6.065239931987211E-32
},
{
"index": 2,
"score": 5.335152733382172E-32
}
]
The only model that seems to work with llama-server is jina-reranker:
llama-server -m jina-reranker-v2-base-multilingual-Q8_0.gguf -c 16000 -ngl 99 --host 0.0.0.0 --port 8181 --prio 2 --no-webui -ctk q4_0 -ctv q4_0 -fa auto --rerank
The result is not too bad, but it is very different from https://jina.ai/reranker/.
Result with llama-server:
[
{
"index": 6,
"score": 0.7979143261909485
},
{
"index": 0,
"score": 0.3886369466781616
},
{
"index": 2,
"score": 0.2865810990333557
},
{
"index": 4,
"score": -0.5105927586555481
},
{
"index": 5,
"score": -1.9573085308074951
},
{
"index": 8,
"score": -3.1544036865234375
}
]
Result from https://jina.ai/reranker/ (same model, same content):
[
{
"index": 6,
"relevance_score": 0.69595832
},
{
"index": 0,
"relevance_score": 0.60346454
},
{
"index": 2,
"relevance_score": 0.5559175
},
{
"index": 4,
"relevance_score": 0.3684057
},
{
"index": 5,
"relevance_score": 0.12085322
},
{
"index": 7,
"relevance_score": 0.04146227
}
]
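One thing that may explain part of the jina-reranker gap: the llama-server scores above look like raw logits, while jina.ai seems to report normalized relevance scores. I am not sure whether the hosted API applies a sigmoid on its side; assuming it does, the raw scores can be pushed through a logistic function with jq as a rough check (the response is assumed to have the array-of-{index, score} shape shown above):

#!/bin/bash
# Rough check: apply a logistic (sigmoid) to the raw llama-server scores so
# they can be compared with jina.ai's relevance_score values.
# Assumes the response is the array of {index, score} objects shown above.
URL=${1:-http://127.0.0.1:8181}

curl -s "$URL/v1/rerank" -H "Content-Type: application/json" \
  -d '{
    "model": "M",
    "query": "What is the recipe to make bread ?",
    "top_n": 2,
    "documents": [
      "it is a bear",
      "here is the recipe to make bread : 500g floor, 350g water, 120g fresh refresh yest, 15g salt"
    ]
  }' | jq 'map(.score = (1 / (1 + exp(-.score))))'

Applied by hand to the llama-server scores above (0.798, 0.389, 0.287, -0.511, -1.957, -3.154), this gives roughly 0.69, 0.60, 0.57, 0.37, 0.12, 0.04, which is close to the jina.ai values, so for jina-reranker the difference may be mostly one of score normalization. The near-zero scores from qwen3-reranker, bge, and mxbai look like a separate problem.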
First Bad Commit
No response