
Getting bizarre output on Metal #1862

Open
battaglia01 opened this issue Feb 13, 2024 · 15 comments

@battaglia01

I'm getting very strange output when running some of the basic samples with Metal on an M1 Max. In particular, I'm getting this for jfk.wav with the CoreML base.en model, following essentially the instructions currently in the README:

[00:00:00.000 --> 00:00:07.200]   and my first and and and and and and and.
[00:00:07.200 --> 00:00:14.200]   and then, and, and, and, and, and,

I've installed all of the pip libraries, etc. Here's the entire output log from generating the CoreML file onward:

CoreML

$ ./models/generate-coreml-model.sh base.en
ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=512, n_audio_head=8, n_audio_layer=6, n_vocab=51864, n_text_ctx=448, n_text_state=512, n_text_head=8, n_text_layer=6)
/Users/mike/Projects/LLMs/whisper.cpp-metal/models/convert-whisper-to-coreml.py:137: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1:] == self.positional_embedding.shape[::-1], "incorrect audio shape"
/Users/mike/Library/miniconda3/envs/whisper/lib/python3.10/site-packages/ane_transformers/reference/layer_norm.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert inputs.size(1) == self.num_channels
/Users/mike/Projects/LLMs/whisper.cpp-metal/models/convert-whisper-to-coreml.py:77: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  dim_per_head = dim // self.n_head
/Users/mike/Projects/LLMs/whisper.cpp-metal/models/convert-whisper-to-coreml.py:79: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  scale = float(dim_per_head)**-0.5
Converting PyTorch Frontend ==> MIL Ops: 100%|████████████████████████████████████████████████████████████████████████████████▉| 821/822 [00:00<00:00, 8861.61 ops/s]
Running MIL frontend_pytorch pipeline: 100%|█████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 337.22 passes/s]
Running MIL default pipeline: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 71/71 [00:02<00:00, 27.99 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 493.93 passes/s]
done converting
/Users/mike/Projects/LLMs/whisper.cpp-metal/models/coreml-encoder-base.en.mlmodelc/coremldata.bin
models/coreml-encoder-base.en.mlmodelc -> models/ggml-base.en-encoder.mlmodelc

Running whisper.cpp

$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/mike/Projects/LLMs/whisper.cpp-metal/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M1 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 51539.61 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   140.55 MiB, (  141.36 / 49152.00)
whisper_model_load:    Metal total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/mike/Projects/LLMs/whisper.cpp-metal/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M1 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 51539.61 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    15.75 MiB, (  157.61 / 49152.00)
whisper_init_state: kv self size  =   16.52 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    17.58 MiB, (  175.19 / 49152.00)
whisper_init_state: kv cross size =   18.43 MB
whisper_init_state: loading Core ML model from 'models/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     3.86 MiB, (  179.05 / 49152.00)
whisper_init_state: compute buffer (conv)   =    5.74 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     2.94 MiB, (  181.98 / 49152.00)
whisper_init_state: compute buffer (cross)  =    4.78 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    90.39 MiB, (  272.38 / 49152.00)
whisper_init_state: compute buffer (decode) =   96.48 MB

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 1 | OPENVINO = 0 |

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:07.200]   and my first and and and and and and and.
[00:00:07.200 --> 00:00:14.200]   and then, and, and, and, and, and,


whisper_print_timings:     load time =   139.33 ms
whisper_print_timings:     fallbacks =   3 p /   0 h
whisper_print_timings:      mel time =     6.35 ms
whisper_print_timings:   sample time =   678.18 ms /  2355 runs (    0.29 ms per run)
whisper_print_timings:   encode time =   165.31 ms /     2 runs (   82.65 ms per run)
whisper_print_timings:   decode time =    19.68 ms /     7 runs (    2.81 ms per run)
whisper_print_timings:   batchd time =  3077.60 ms /  2328 runs (    1.32 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 14592.51 ms
ggml_metal_free: deallocating
ggml_metal_free: deallocating

If I run the same thing without CoreML, I get the correct output. Has something changed since this PR was added?
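For reference, the non-CoreML comparison above is just a plain rebuild without the CoreML flag. A sketch (assuming the default repo layout; on Apple Silicon the encoder then runs via ggml's Metal/CPU path instead of Core ML):

```shell
# Rebuild without WHISPER_COREML so the encoder runs through ggml
# (Metal/CPU) instead of the Core ML model
make clean
make -j

# Same sample, same ggml model file; only the encoder path differs
./main -m models/ggml-base.en.bin -f samples/jfk.wav
```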

@ggerganov
Owner

Which PR? Are you using the latest master?

@battaglia01
Author

I meant the PR where the Metal support was added. Yes, this is the latest master, cloned right before running.

@milsun

milsun commented Feb 13, 2024

I am facing the same issue on an Apple M2 Pro.

@ggerganov
Owner

Hm strange, it works on my M1 Pro and M2 Ultra

@ggerganov
Owner

Can you try a completely fresh build:

git clone https://github.com/ggerganov/whisper.cpp whisper.cpp-coreml
cd whisper.cpp-coreml

# download base.en
./models/download-ggml-model.sh base.en

# create CoreML model
python3 -m venv .coreml
source ./.coreml/bin/activate
pip3 install -r ./models/requirements-coreml.txt 
./models/generate-coreml-model.sh base.en

# build and run
WHISPER_COREML=1 make -j && ./main -m models/ggml-base.en.bin -f samples/jfk.wav

@battaglia01
Author

Yes, same thing. I just ran all that and the output is

[00:00:00.000 --> 00:00:07.200]   and my first and and and and and and and.
[00:00:07.200 --> 00:00:14.200]   and then, and, and, and, and, and

@ggerganov
Owner

Hm strange. And can you confirm that it works if you check out v1.5.4?

@milsun

milsun commented Feb 14, 2024

Possibly something to do with the Swift version; just a wild guess. I'm using Swift 5.8, btw.

@milsun

milsun commented Feb 15, 2024

Can you try a completely fresh build:

git clone https://github.com/ggerganov/whisper.cpp whisper.cpp-coreml
cd whisper.cpp-coreml

# download base.en
./models/download-ggml-model.sh base.en

# create CoreML model
python3 -m venv .coreml
source ./.coreml/bin/activate
pip3 install -r ./models/requirements-coreml.txt 
./models/generate-coreml-model.sh base.en

# build and run
WHISPER_COREML=1 make -j && ./main -m models/ggml-base.en.bin -f samples/jfk.wav

I get the following output:

[00:00:00.000 --> 00:00:30.000] and my first

@milsun

milsun commented Feb 15, 2024

Hm strange. And can you confirm that it works if you check out v1.5.4?

Gives the following output for this too:
[00:00:00.000 --> 00:00:30.000] and my first

@ggerganov
Owner

So this means that it is not caused by any recent changes.
Did CoreML ever work for you? If yes, can you find the latest commit where it works?

@battaglia01
Author

On v1.5.4, I still get the same thing:

[00:00:00.000 --> 00:00:07.200]   and my first and and and and and and and.
[00:00:07.200 --> 00:00:14.200]   and then, and, and, and, and, and,

Which is, strangely, slightly different from what milsun is getting above.

I'll see if I can figure out some earlier commit that works.
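A systematic way to narrow this down is git bisect between a suspected-good old commit and current master. A sketch (the endpoints are assumptions; 5e47e22, the commit that added CoreML support, is used as the "good" end on the guess that it transcribes jfk.wav correctly):

```shell
# Sketch: bisect between a suspected-good commit and current master
git bisect start
git bisect bad master
git bisect good 5e47e22

# git now checks out a midpoint commit; rebuild and test it:
WHISPER_COREML=1 make -j && ./main -m models/ggml-base.en.bin -f samples/jfk.wav
# ...then mark the result and repeat until the first bad commit is found:
#   git bisect good   # transcript is correct
#   git bisect bad    # transcript degenerates into "and and and..."

# when done, return to the original branch
git bisect reset
```

One caveat: the Core ML model may need regenerating at older commits if the conversion script changed along the way.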

@battaglia01
Author

OK, it does work in commit 5e47e22, which is where CoreML was added.

[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

Note that for both this commit and v1.5.4, I used conda instead of venv, as ./models/requirements-coreml.txt didn't exist yet.

@milsun

milsun commented Feb 20, 2024

WHISPER_COREML=1 make -j && ./main -m models/ggml-base.en.bin -f samples/jfk.wav

The commit seems to work. Thanks!

@milsun

milsun commented Feb 29, 2024

Can you try a completely fresh build:

git clone https://github.com/ggerganov/whisper.cpp whisper.cpp-coreml
cd whisper.cpp-coreml

# download base.en
./models/download-ggml-model.sh base.en

# create CoreML model
python3 -m venv .coreml
source ./.coreml/bin/activate
pip3 install -r ./models/requirements-coreml.txt 
./models/generate-coreml-model.sh base.en

# build and run
WHISPER_COREML=1 make -j && ./main -m models/ggml-base.en.bin -f samples/jfk.wav

Tried the latest code on an M3; it works.
