
Getting bizarre output on Metal #1862

Open
battaglia01 opened this issue Feb 13, 2024 · 15 comments

@battaglia01

I'm getting very strange output when running some of the basic samples with Metal on an M1 Max. In particular, I'm getting this for jfk.wav with the CoreML base.en model, following essentially the instructions currently in the README:

[00:00:00.000 --> 00:00:07.200]   and my first and and and and and and and.
[00:00:07.200 --> 00:00:14.200]   and then, and, and, and, and, and,

I've installed all of the pip libraries, etc. Here's the entire output log from generating the CoreML file onward:

CoreML

$ ./models/generate-coreml-model.sh base.en
ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=512, n_audio_head=8, n_audio_layer=6, n_vocab=51864, n_text_ctx=448, n_text_state=512, n_text_head=8, n_text_layer=6)
/Users/mike/Projects/LLMs/whisper.cpp-metal/models/convert-whisper-to-coreml.py:137: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1:] == self.positional_embedding.shape[::-1], "incorrect audio shape"
/Users/mike/Library/miniconda3/envs/whisper/lib/python3.10/site-packages/ane_transformers/reference/layer_norm.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert inputs.size(1) == self.num_channels
/Users/mike/Projects/LLMs/whisper.cpp-metal/models/convert-whisper-to-coreml.py:77: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  dim_per_head = dim // self.n_head
/Users/mike/Projects/LLMs/whisper.cpp-metal/models/convert-whisper-to-coreml.py:79: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  scale = float(dim_per_head)**-0.5
Converting PyTorch Frontend ==> MIL Ops: 100%|████████████████████████████████████████████████████████████████████████████████▉| 821/822 [00:00<00:00, 8861.61 ops/s]
Running MIL frontend_pytorch pipeline: 100%|█████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 337.22 passes/s]
Running MIL default pipeline: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 71/71 [00:02<00:00, 27.99 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 493.93 passes/s]
done converting
/Users/mike/Projects/LLMs/whisper.cpp-metal/models/coreml-encoder-base.en.mlmodelc/coremldata.bin
models/coreml-encoder-base.en.mlmodelc -> models/ggml-base.en-encoder.mlmodelc

Running whisper.cpp

$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/mike/Projects/LLMs/whisper.cpp-metal/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M1 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 51539.61 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   140.55 MiB, (  141.36 / 49152.00)
whisper_model_load:    Metal total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/mike/Projects/LLMs/whisper.cpp-metal/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M1 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 51539.61 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    15.75 MiB, (  157.61 / 49152.00)
whisper_init_state: kv self size  =   16.52 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    17.58 MiB, (  175.19 / 49152.00)
whisper_init_state: kv cross size =   18.43 MB
whisper_init_state: loading Core ML model from 'models/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     3.86 MiB, (  179.05 / 49152.00)
whisper_init_state: compute buffer (conv)   =    5.74 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     2.94 MiB, (  181.98 / 49152.00)
whisper_init_state: compute buffer (cross)  =    4.78 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    90.39 MiB, (  272.38 / 49152.00)
whisper_init_state: compute buffer (decode) =   96.48 MB

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 1 | OPENVINO = 0 |

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:07.200]   and my first and and and and and and and.
[00:00:07.200 --> 00:00:14.200]   and then, and, and, and, and, and,


whisper_print_timings:     load time =   139.33 ms
whisper_print_timings:     fallbacks =   3 p /   0 h
whisper_print_timings:      mel time =     6.35 ms
whisper_print_timings:   sample time =   678.18 ms /  2355 runs (    0.29 ms per run)
whisper_print_timings:   encode time =   165.31 ms /     2 runs (   82.65 ms per run)
whisper_print_timings:   decode time =    19.68 ms /     7 runs (    2.81 ms per run)
whisper_print_timings:   batchd time =  3077.60 ms /  2328 runs (    1.32 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 14592.51 ms
ggml_metal_free: deallocating
ggml_metal_free: deallocating

If I run the same thing without CoreML, I get the correct output. Has something changed since this PR was added?
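For reference, the non-CoreML comparison above is just a plain rebuild without the CoreML flag. A sketch (assuming the default repo layout; on Apple Silicon the encoder then runs via ggml's Metal/CPU path instead of Core ML):

```shell
# Rebuild without WHISPER_COREML so the encoder runs through ggml
# (Metal/CPU) instead of the Core ML model
make clean
make -j

# Same sample, same ggml model file; only the encoder path differs
./main -m models/ggml-base.en.bin -f samples/jfk.wav
```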

@ggerganov
Owner

Which PR? Are you using the latest master?

@battaglia01
Author

I meant the PR where the Metal support was added. Yes, this is the latest master, cloned right before running.

@milsun

milsun commented Feb 13, 2024

I am facing the same issue on an Apple M2 Pro.

@ggerganov
Owner

Hm strange, it works on my M1 Pro and M2 Ultra

@ggerganov
Owner

Can you try a completely fresh build:

git clone https://github.com/ggerganov/whisper.cpp whisper.cpp-coreml
cd whisper.cpp-coreml

# download base.en
./models/download-ggml-model.sh base.en

# create CoreML model
python3 -m venv .coreml
source ./.coreml/bin/activate
pip3 install -r ./models/requirements-coreml.txt 
./models/generate-coreml-model.sh base.en

# build and run
WHISPER_COREML=1 make -j && ./main -m models/ggml-base.en.bin -f samples/jfk.wav

@battaglia01
Author

Yes, same thing. I just ran all that and the output is

[00:00:00.000 --> 00:00:07.200]   and my first and and and and and and and.
[00:00:07.200 --> 00:00:14.200]   and then, and, and, and, and, and

@ggerganov
Owner

Hm strange. And can you confirm that it works if you check out v1.5.4?

@milsun

milsun commented Feb 14, 2024

Possibly something to do with the Swift version; just a wild guess. I'm using Swift 5.8, btw.

@milsun

milsun commented Feb 15, 2024

Can you try a completely fresh build:

git clone https://github.com/ggerganov/whisper.cpp whisper.cpp-coreml
cd whisper.cpp-coreml

# download base.en
./models/download-ggml-model.sh base.en

# create CoreML model
python3 -m venv .coreml
source ./.coreml/bin/activate
pip3 install -r ./models/requirements-coreml.txt 
./models/generate-coreml-model.sh base.en

# build and run
WHISPER_COREML=1 make -j && ./main -m models/ggml-base.en.bin -f samples/jfk.wav

I get the following output:

[00:00:00.000 --> 00:00:30.000] and my first

@milsun

milsun commented Feb 15, 2024

Hm strange. And can you confirm that it works if you check out v1.5.4?

Gives the following output for this too:
[00:00:00.000 --> 00:00:30.000] and my first

@ggerganov
Owner

So this means that it is not caused by any recent changes.
Did CoreML ever work for you? If yes, can you find the latest commit where it works?

@battaglia01
Author

On v1.5.4, I still get the same thing:

[00:00:00.000 --> 00:00:07.200]   and my first and and and and and and and.
[00:00:07.200 --> 00:00:14.200]   and then, and, and, and, and, and,

Which is, strangely, slightly different from what milsun is getting above.

I'll see if I can figure out some earlier commit that works.
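A systematic way to narrow this down is git bisect between a suspected-good old commit and current master. A sketch (the endpoints are assumptions; 5e47e22, the commit that added CoreML support, is used as the "good" end on the guess that it transcribes jfk.wav correctly):

```shell
# Sketch: bisect between a suspected-good commit and current master
git bisect start
git bisect bad master
git bisect good 5e47e22

# git now checks out a midpoint commit; rebuild and test it:
WHISPER_COREML=1 make -j && ./main -m models/ggml-base.en.bin -f samples/jfk.wav
# ...then mark the result and repeat until the first bad commit is found:
#   git bisect good   # transcript is correct
#   git bisect bad    # transcript degenerates into "and and and..."

# when done, return to the original branch
git bisect reset
```

One caveat: the Core ML model may need regenerating at older commits if the conversion script changed along the way.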

@battaglia01
Author

OK, it does work in commit 5e47e22, which is where CoreML was added.

[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

Note that for both this commit and v1.5.4, I used conda instead of venv, as ./models/requirements-coreml.txt didn't exist yet.

@milsun

milsun commented Feb 20, 2024

WHISPER_COREML=1 make -j && ./main -m models/ggml-base.en.bin -f samples/jfk.wav

The commit seems to work. Thanks!

@milsun

milsun commented Feb 29, 2024

Can you try a completely fresh build:

git clone https://github.com/ggerganov/whisper.cpp whisper.cpp-coreml
cd whisper.cpp-coreml

# download base.en
./models/download-ggml-model.sh base.en

# create CoreML model
python3 -m venv .coreml
source ./.coreml/bin/activate
pip3 install -r ./models/requirements-coreml.txt 
./models/generate-coreml-model.sh base.en

# build and run
WHISPER_COREML=1 make -j && ./main -m models/ggml-base.en.bin -f samples/jfk.wav

Tried the latest code on an M3; it works.
