OLMoE Q4_0 quant does not work #11862

@l3utterfly

Name and Version

commit hash: a4f011e

Operating systems

Other? (Please let us know in description)

GGML backends

CPU

Hardware

Snapdragon 8 Gen 2

Models

Model is here: https://huggingface.co/allenai/OLMoE-1B-7B-0125-Instruct-GGUF/tree/main

Problem description & steps to reproduce

The Q4_0 quant fails on aarch64 with the following assertion:

llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:4013: GGML_ASSERT(params->wsize >= (GGML_PAD(nbw3, sizeof(int64_t)) + n_as * sizeof(int64_t) + n_as * ne12 * sizeof(mmid_row_mapping))) failed

Do you know why this error happens? Does the model need to be re-quanted?
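
For reference, the right-hand side of the assertion can be worked out by hand. The sketch below is my reading of what the check computes, not code copied from ggml: `pad_to` stands in for `GGML_PAD`, the two-field layout of `mmid_row_mapping` is an assumption, and `required_wsize` / `workspace_ok` are hypothetical helper names. Going by the names, `n_as` looks like the number of experts and `ne12` a dimension of the activation tensor, but treat that as a guess as well.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed layout of mmid_row_mapping (two 32-bit indices).
// This is an assumption for illustration, not copied from the ggml source.
struct mmid_row_mapping {
    int32_t i1;
    int32_t i2;
};

// Round x up to a multiple of n (n must be a power of two).
// Stand-in for what GGML_PAD is assumed to do.
constexpr std::size_t pad_to(std::size_t x, std::size_t n) {
    return (x + n - 1) & ~(n - 1);
}

// Hypothetical helper: the right-hand side of the failing assertion.
constexpr std::size_t required_wsize(std::size_t nbw3, std::size_t n_as, std::size_t ne12) {
    return pad_to(nbw3, sizeof(int64_t))
         + n_as * sizeof(int64_t)
         + n_as * ne12 * sizeof(mmid_row_mapping);
}

// Hypothetical helper: the condition the assertion enforces on params->wsize.
constexpr bool workspace_ok(std::size_t wsize, std::size_t nbw3,
                            std::size_t n_as, std::size_t ne12) {
    return wsize >= required_wsize(nbw3, n_as, ne12);
}
```

With made-up values `nbw3 = 100`, `n_as = 64`, `ne12 = 4`, this gives 104 + 512 + 2048 = 2664 bytes, so any `wsize` below that would trip the check. If this reading is right, the failure is about the size of the work buffer handed to the kernel, but I have not confirmed that.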

First Bad Commit

No response

Relevant log output

llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:4013: GGML_ASSERT(params->wsize >= (GGML_PAD(nbw3, sizeof(int64_t)) + n_as * sizeof(int64_t) + n_as * ne12 * sizeof(mmid_row_mapping))) failed
