
Misc. bug: server/rerank output is wrong with most models, including Qwen3-Reranker #16407

@YannFollet

Name and Version

llama-server --version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
version: 6674 (5113efd3)
built with cc (Ubuntu 12.4.0-2ubuntu1~24.04) 12.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server -m Qwen3-Reranker-8B-q4_k_s.gguf -c 4096 -ngl 99  --host 0.0.0.0 --port 8181 --prio 2 --no-webui -ctk q4_0 -ctv q4_0 -fa auto --rerank

Problem description & steps to reproduce

Reranking gives bad results with Qwen3-Reranker (0.6B, 4B, 8B), bge, and mxbai, using the command line above.

I tested with this bash script:

#!/bin/bash

# Set default URL if not provided
URL=${1:-http://127.0.0.1:8181}

curl "$URL/v1/rerank" -H "Content-Type: application/json" \
 -d '{ "model": "M", "query": "What is the recipe to make bread ?",
 "return_text" : false,
 "texts" : true,
 "top_n": 6,"documents": [
 "voici la recette pour faire du pain, il faut de la farine de l eau et du levain et du sel",
 "it is a bear",
 "bread recipe : floor, water, yest, salt",
 "The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.",
 "here is the ingedients to bake bread : 500g floor, 350g water, 120g fresh refresh yest, 15g salt",
 "recipe to make cookies : floor, eggs, water, chocolat",
 "here is the recipe to make bread : 500g floor, 350g water, 120g fresh refresh yest, 15g salt",
 "il fait tres beau aujourd hui",
 "je n ai pas faim, je ne veux pas manger",
 "je suis a paris"
 ] }' | jq

I always get results like this:

[
  {
    "index": 5,
    "score": 1.1353239058953662E-28
  },
  {
    "index": 3,
    "score": 3.111864641067425E-29
  },
  {
    "index": 8,
    "score": 2.3408178355156106E-29
  },
  {
    "index": 9,
    "score": 2.6804792039427738E-30
  },
  {
    "index": 1,
    "score": 6.065239931987211E-32
  },
  {
    "index": 2,
    "score": 5.335152733382172E-32
  }
]
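
To make the problem easier to see, the returned indices can be mapped back to the `documents` array (0-based). This is a minimal sketch with the documents copied from the request above; the `docs` and `show_ranked` helper names are hypothetical and only used for illustration.

```shell
#!/bin/bash

# Documents copied verbatim from the request, one per line, in the same order.
docs() {
cat <<'EOF'
voici la recette pour faire du pain, il faut de la farine de l eau et du levain et du sel
it is a bear
bread recipe : floor, water, yest, salt
The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.
here is the ingedients to bake bread : 500g floor, 350g water, 120g fresh refresh yest, 15g salt
recipe to make cookies : floor, eggs, water, chocolat
here is the recipe to make bread : 500g floor, 350g water, 120g fresh refresh yest, 15g salt
il fait tres beau aujourd hui
je n ai pas faim, je ne veux pas manger
je suis a paris
EOF
}

# Print "index: document text" for each 0-based index, in the order given.
show_ranked() {
  for i in "$@"; do
    printf '%s: ' "$i"
    docs | sed -n "$((i + 1))p"
  done
}

# Indices returned by Qwen3-Reranker-8B above, highest score first.
show_ranked 5 3 8 9 1 2
```

Read this way, the top hit is the cookie recipe (index 5), the short bread recipe (index 2) comes last, and the other bread recipes (indices 0, 4, and 6) are missing from the top 6 entirely.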

The only model that seems to work with llama-server is jina-reranker:
llama-server -m jina-reranker-v2-base-multilingual-Q8_0.gguf -c 16000 -ngl 99 --host 0.0.0.0 --port 8181 --prio 2 --no-webui -ctk q4_0 -ctv q4_0 -fa auto --rerank
The result is not too bad, but it is very different from https://jina.ai/reranker/.
Result with llama-server:

[
  {
    "index": 6,
    "score": 0.7979143261909485
  },
  {
    "index": 0,
    "score": 0.3886369466781616
  },
  {
    "index": 2,
    "score": 0.2865810990333557
  },
  {
    "index": 4,
    "score": -0.5105927586555481
  },
  {
    "index": 5,
    "score": -1.9573085308074951
  },
  {
    "index": 8,
    "score": -3.1544036865234375
  }
]

Result from https://jina.ai/reranker/ (same model, same content):

[
  {
    "index": 6,
    "relevance_score": 0.69595832
  },
  {
    "index": 0,
    "relevance_score": 0.60346454
  },
  {
    "index": 2,
    "relevance_score": 0.5559175
  },
  {
    "index": 4,
    "relevance_score": 0.3684057
  },
  {
    "index": 5,
    "relevance_score": 0.12085322
  },
  {
    "index": 7,
    "relevance_score": 0.04146227
  }
]
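
One possible explanation for the difference in scale, not confirmed here: the llama-server scores can be negative, which looks like a raw classifier output, while the hosted values all lie between 0 and 1, as if a logistic sigmoid had been applied. The sketch below checks that assumption against the numbers above; the `sigmoid` helper is hypothetical and only used for illustration.

```shell
#!/bin/bash

# Logistic sigmoid 1 / (1 + e^(-x)) of one raw score, printed with 4 decimals.
sigmoid() {
  awk -v x="$1" 'BEGIN { printf "%.4f\n", 1 / (1 + exp(-x)) }'
}

# Raw llama-server scores from the jina-reranker run above, in ranked order
# (indices 6, 0, 2, 4, 5, 8).
for s in 0.7979143261909485 0.3886369466781616 0.2865810990333557 \
         -0.5105927586555481 -1.9573085308074951 -3.1544036865234375; do
  sigmoid "$s"
done
```

With these inputs the transformed values land within about 0.02 of the hosted `relevance_score` values for indices 6, 0, 2, 4, and 5, so for this model the gap may be mostly a matter of score normalization. The last entry still differs (index 8 here, index 7 on the hosted API).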

First Bad Commit

No response

Relevant log output
