Eval bug: Qwen 2.5 VL-3B subpar OCR performance compared to Transformers implementation #16334

### Name and Version

./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Radeon PRO W7900, gfx1100 (0x1100), VMM: no, Wave Size: 32
  Device 1: AMD Radeon PRO W7900, gfx1100 (0x1100), VMM: no, Wave Size: 32
version: 6602 (72b24d96)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

### Operating systems

Linux

### GGML backends

HIP

### Hardware

2x Radeon Pro W7900

### Models

ggml-org/Qwen2.5-VL-3B-Instruct-GGUF, F16 for both the model and the mmproj

### Problem description & steps to reproduce

When running Qwen2.5-VL-3B-Instruct-GGUF with the F16 model and the F16 mmproj, OCR output is significantly worse than what the same model produces with the same prompt in Transformers.

For instance, with this scientific paper as input:

<img width="1224" height="1584" alt="Image" src="https://github.com/user-attachments/assets/5565ca47-dbf9-4e44-9baf-bbc3df986517" />

Qwen 2.5-VL-3B via Transformers produces the following output:
```markdown
# Algorithms for the Markov Entropy Decomposition

Andrew J. Ferris and David Poulin

Département de Physique, Université de Sherbrooke, Québec, J1K 2R1, Canada

(Dated: October 31, 2018)

The Markov entropy decomposition (MED) is a recently-proposed, cluster-based simulation method for finite temperature quantum systems with arbitrary geometry. In this paper, we detail numerical algorithms for performing the required steps of the MED, principally solving a minimization problem with a preconditioned Newton's algorithm, as well as how to extract global susceptibilities and thermal responses. We demonstrate the power of the method with the spin-1/2 XXZ model on the 2D square lattice, including the extraction of critical points and details of each phase. Although the method shares some qualitative similarities with exact-diagonalization, we show the MED is both more accurate and significantly more flexible.

PACS numbers: 05.10.-a, 02.50.Ng, 03.67.-a, 74.40.Kb

## I. INTRODUCTION

Although the equations governing quantum many-body systems are simple to write down, finding solutions for the majority of systems remains incredibly difficult. Modern physics finds itself in need of new tools to compute the emergent behavior of large, many-body systems.

There has been a great variety of tools developed to tackle many-body problems, but in general, large 2D and 3D quantum systems remain hard to deal with. Most systems are thought to be non-integrable, so exact analytic solutions are not usually expected. Direct numerical diagonalization can be performed for relatively small systems — however the emergent behavior of a system in the thermodynamic limit may be difficult to extract, especially in systems with large correlation lengths. Monte Carlo approaches are technically exact (up to sampling error), but suffer from the so-called sign problem for fermionic, frustrated, or dynamical problems. Thus we are limited to search for clever approximations to solve the majority of many-body problems.

Over the past century, hundreds of such approximations have been proposed, and we will mention just a few notable examples applicable to quantum lattice models. Mean-field theory is simple and frequently arrives at the correct qualitative description, but often fails when correlations are important. Density-matrix renormalisation group (DMRG) [1] is efficient and extremely accurate at solving 1D problems, but the computational cost grows exponentially with system size in two- or higher-dimensions [2, 3]. Related tensor-network techniques designed for 2D systems are still in their infancy [4-6]. Series-expansion methods [7] can be successful, but may diverge or otherwise converge slowly, obscuring the state in certain regimes. There exist a variety of cluster-based techniques, such as dynamical-mean-field theory [8] and density-matrix embedding [9].

Here we discuss the so-called Markov entropy decomposition (MED), recently proposed by Poulin & Hastings [10] (and analogous to a slightly earlier classical algorithm [11]). This is a self-consistent cluster method for finite temperature systems that takes advantage of an approximation of the (von Neumann) entropy. In [10], it was shown that the entropy per site can be rigorously upper bounded using only local information — a local, reduced density matrix on $N$ sites, say.

This approximation becomes exact in the case of a 1D quantum (or classical) Markov chain [10], and leads to an exponential reduction of cost for exact entropy calculations when the global density matrix is a higher-dimensional Markov network state [12, 13].

The second approximation used in the MED approach is related to the $N$-representibility problem. Given a set of local but overlapping reduced density matrices $\{\hat{\rho}_{i}\}$, it is a very challenging problem to determine if there exists a global density operator which is positive semi-definite and whose partial trace agrees with each $\hat{\rho}_{i}$. This problem is QMA-hard (the quantum analogue of NP) [14, 15], and is hopelessly difficult to enforce. Thus, the second approximation employed involves ignoring global consistency with a positive operator, while requiring local consistency on any overlapping regions between the $\hat{\rho}_{i}$. At the zero-temperature limit, the MED approach becomes analogous to the variational $n$th-order reduced density matrix approach, where positivity is enforced on all reduced density matrices of size $n$ [16-18].

The MED approach is an extremely flexible cluster method, applicable to both translationally invariant systems of any dimension in the thermodynamic limit, as well as finite systems or systems without translational invariance (e.g. disordered lattices, or harmonically trapped atoms in optical lattices). The free energy given by MED is guaranteed to lower bound the true free energy, which in turn lower-bounds the ground state energy — thus providing a natural complement to variational approaches which upper-bound the ground state energy. The ability to provide a rigorous ground-state energy window is a powerful validation tool, creating a very compelling reason to use this approach.

In this paper we present a pedagogical introduction to MED, including numerical implementation issues and applications to 2D quantum lattice models in the thermodynamic limit. In Sec. II, we give a brief derivation of the Markov entropy decomposition. Section III outlines a robust numerical strategy for optimizing the clusters that make up the decomposition. In Sec. IV we show how we can extend these algorithms to extract non-trivial information, such as specific heat and susceptibilities. We present an application of the method to the spin-1/2 XXZ model on a 2D square lattice in Sec. V, describing how to characterize the phase diagram and determine critical points, before concluding in Sec. VI.
```

Qwen 2.5-VL-3B via llama.cpp produces the following output:
```markdown
Algorithms for the Markov Entropy Decomposition

Andrew J. Ferris, and David Poulin
Département de Physique, Université de Sherbrooke, Québec, J1K 2R1, Canada

The Markov entropy decomposition (MED) is a recently-proposed, cluster-based simulation method for finite temperature quantum many-body systems with arbitrary geometry. In this paper, we detail numerical algorithms for performing required steps of the MED, principally solving a preconditioned Newton's algorithm, as well as how to extract global susceptibilities and thermal responses. We demonstrate the power of the method with the spin-1/2 XXZ model on the 2D square lattice, including the extraction of critical points and details of each phase. Although the method shares some qualitative similarities with exact diagonalization, we show the MED is both more accurate and significantly more flexible.

PACS numbers: 05.10.-a, 02.50.Ng, 03.67.-a, 74.40.Kb

The Markov entropy decomposition (MED) is a recently-proposed, cluster-based simulation method for finite temperature quantum many-body systems with arbitrary geometry. In this paper, we detail numerical algorithms for performing required steps of the MED, principally solving a preconditioned Newton's algorithm, as well as how to extract global susceptibilities and thermal responses. We demonstrate the power of the method with the spin-1/2 XXZ model on the 2D square lattice, including the extraction of critical points and details of each phase. Although the method shares some qualitative similarities with exact diagonalization, we show the MED is both more accurate and significantly more flexible.

PACS numbers: 05.10.-a, 02.50.Ng, 03.67.-a, 74.40.Kb
```

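The llama.cpp result is not correct when compared with the Transformers output:

- It stops after the PACS line, never transcribes the Introduction, and instead repeats the abstract and the PACS line a second time.
- It drops the Markdown heading markup on the title and omits the "(Dated: October 31, 2018)" line.
- The abstract wording differs: "quantum many-body systems" instead of "quantum systems", "performing required steps" instead of "performing the required steps", and "solving a preconditioned Newton's algorithm" instead of "solving a minimization problem with a preconditioned Newton's algorithm".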
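To make the comparison between the two outputs less subjective, a small script can flag repeated paragraphs and compute a word-level similarity score. The following is only a sketch using Python's standard `difflib`. The helper names (`split_paragraphs`, `repeated_paragraphs`, `similarity`) are hypothetical, and the `reference` and `candidate` strings are shortened stand-ins built from the PACS line and heading above, not the full outputs.

```python
import difflib


def split_paragraphs(text: str) -> list[str]:
    """Split OCR output into non-empty paragraphs separated by blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def repeated_paragraphs(text: str) -> list[str]:
    """Return paragraphs that occur more than once, in first-seen order."""
    seen, repeats = set(), []
    for paragraph in split_paragraphs(text):
        if paragraph in seen and paragraph not in repeats:
            repeats.append(paragraph)
        seen.add(paragraph)
    return repeats


def similarity(reference: str, candidate: str) -> float:
    """Word-level similarity ratio in [0, 1] between two outputs."""
    matcher = difflib.SequenceMatcher(
        None, reference.split(), candidate.split(), autojunk=False
    )
    return matcher.ratio()


# Shortened stand-ins (assumption): replace with the full Transformers
# and llama.cpp outputs when running a real comparison.
reference = (
    "PACS numbers: 05.10.-a, 02.50.Ng, 03.67.-a, 74.40.Kb\n\n"
    "## I. INTRODUCTION"
)
candidate = (
    "PACS numbers: 05.10.-a, 02.50.Ng, 03.67.-a, 74.40.Kb\n\n"
    "PACS numbers: 05.10.-a, 02.50.Ng, 03.67.-a, 74.40.Kb"
)

print("repeated paragraphs:", repeated_paragraphs(candidate))
print("similarity:", round(similarity(reference, candidate), 3))
```

Run on the full texts, a non-empty list of repeated paragraphs would point to the looping behaviour seen in the llama.cpp output, while the similarity score gives a rough single number to track across builds or settings.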

### First Bad Commit

_No response_

### Relevant log output

```shell
Full mtmd-cli output:
https://gist.github.com/AbdullahMPrograms/11a5ac3a1abb93fa3bf3629ef055da89

Transformers code:
https://gist.github.com/AbdullahMPrograms/28651bafb4f585b0548a98d4390eca9a
```
