OLMoE Q4_0 quant does not work

### Name and Version

commit hash: a4f011e8d02179f032627130f961eb77ee30401c

### Operating systems

Other? (Please let us know in description)

### GGML backends

CPU

### Hardware

Snapdragon 8 Gen 2

### Models

Model is here: https://huggingface.co/allenai/OLMoE-1B-7B-0125-Instruct-GGUF/tree/main

### Problem description & steps to reproduce

It fails with the following assertion on aarch64:

`llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:4013: GGML_ASSERT(params->wsize >= (GGML_PAD(nbw3, sizeof(int64_t)) + n_as * sizeof(int64_t) + n_as * ne12 * sizeof(mmid_row_mapping))) failed`


Do you know why this error happens? Does the model need to be re-quantized?
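To make the check easier to reason about, here is a minimal sketch of the right-hand side of the assertion as a standalone function. It is not the ggml source: `pad_to` and `required_wsize` are hypothetical helper names, the `mmid_row_mapping` layout (two 32-bit indices) is an assumption, and the parameters simply mirror the names `nbw3`, `n_as` and `ne12` from the error message without claiming what ggml stores in them.

```cpp
#include <cstddef>
#include <cstdint>

// Round x up to a multiple of n (n must be a power of two).
// Intended to match what GGML_PAD does; written out here so the sketch is standalone.
constexpr std::size_t pad_to(std::size_t x, std::size_t n) {
    return (x + n - 1) & ~(n - 1);
}

// ASSUMPTION: layout of the row-mapping entry (two 32-bit indices, 8 bytes total).
struct mmid_row_mapping {
    std::int32_t i1;
    std::int32_t i2;
};

// Hypothetical helper: the minimum work-buffer size the assertion demands,
// i.e. the expression on the right of `params->wsize >= ...`.
constexpr std::size_t required_wsize(std::size_t nbw3, std::size_t n_as, std::size_t ne12) {
    return pad_to(nbw3, sizeof(std::int64_t))          // padded first region
         + n_as * sizeof(std::int64_t)                 // one int64_t counter per expert
         + n_as * ne12 * sizeof(mmid_row_mapping);     // row mappings per expert
}
```

With placeholder values `nbw3 = 1000`, `n_as = 64`, `ne12 = 8`, this gives 1000 + 512 + 4096 = 5608 bytes. The assertion fires when the work buffer that was actually allocated (`params->wsize`) is smaller than this figure, which would point at the buffer-size planning rather than at the quantized file itself, though I have not confirmed that.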

### First Bad Commit

_No response_

### Relevant log output

```shell
llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:4013: GGML_ASSERT(params->wsize >= (GGML_PAD(nbw3, sizeof(int64_t)) + n_as * sizeof(int64_t) + n_as * ne12 * sizeof(mmid_row_mapping))) failed
```

OLMoE Q4_0 quant does not work #11862

Name and Version

Operating systems

GGML backends

Hardware

Models

Problem description & steps to reproduce

First Bad Commit

Relevant log output

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

OLMoE Q4_0 quant does not work #11862

Description

Name and Version

Operating systems

GGML backends

Hardware

Models

Problem description & steps to reproduce

First Bad Commit

Relevant log output

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions