CoreML on M1 gives a wrong transcription #1901

Open
nbarrera opened this issue Feb 24, 2024 · 3 comments


nbarrera commented Feb 24, 2024

Hi there, I am following the instructions to get CoreML working on Apple Silicon (M1).

After getting everything set up and trying to transcribe the jfk sample, I only get a wrong transcription:

[00:00:00.000 --> 00:00:30.000]   " in "

while the correct output is:

[00:00:00.300 --> 00:00:09.180]   And so, my fellow Americans, ask not what your country can do for you, ask what you
[00:00:09.180 --> 00:00:11.000]   can do for your country.

I think I did everything as instructed (Python 3.10, Miniconda, installed the packages), but I am very new to all this AI stuff.
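(For reference, the CoreML setup steps I followed are roughly the ones from the whisper.cpp README; from memory it was something like this, so please point out if a step is off:)

# inside the Python 3.10 miniconda env: dependencies for the CoreML conversion
pip install ane_transformers openai-whisper coremltools

# generate the CoreML encoder for the model I am using
./models/generate-coreml-model.sh large-v3

# rebuild whisper.cpp with CoreML support enabled
make clean
WHISPER_COREML=1 make -j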

The regular (non-CoreML) model is working perfectly for me; I am just trying to see if I can get better performance out of my M1 chip.

Thank you in advance, Nicolas.

(here's an excerpt of the output)

./main -m models/ggml-large-v3.bin -f samples/jfk.wav
...
whisper_init_state: loading Core ML model from 'models/ggml-large-v3-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     8.80 MiB, ( 3412.97 / 10922.67)
whisper_init_state: compute buffer (conv)   =   10.92 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     7.33 MiB, ( 3420.30 / 10922.67)
whisper_init_state: compute buffer (cross)  =    9.38 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   197.95 MiB, ( 3618.25 / 10922.67)
whisper_init_state: compute buffer (decode) =  209.26 MB

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 1 | OPENVINO = 0 | 

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:30.000]   " in "


whisper_print_timings:     load time =  1128.05 ms
whisper_print_timings:     fallbacks =   1 p /   0 h
whisper_print_timings:      mel time =     7.88 ms
whisper_print_timings:   sample time =    53.50 ms /    45 runs (    1.19 ms per run)
whisper_print_timings:   encode time =  1311.38 ms /     1 runs ( 1311.38 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   batchd time =   798.65 ms /    41 runs (   19.48 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  8631.10 ms
ggml_metal_free: deallocating

@gavin1818 (Contributor)

I'm having the same issue. The regular (non-CoreML) model is working for me, but the CoreML one is giving a wrong transcription.

@gavin1818 (Contributor)

@nbarrera I solved the issue after updating to the latest macOS 14.3.1.
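(For anyone else hitting this: a rough way to confirm the OS version and re-test after updating, assuming the same large-v3 setup as above, would be something like:)

# confirm the installed macOS version
sw_vers -productVersion          # should print 14.3.1 or later

# rebuild with CoreML and re-run the jfk sample
make clean
WHISPER_COREML=1 make -j
./main -m models/ggml-large-v3.bin -f samples/jfk.wav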


nbarrera commented Mar 2, 2024

Thank you! I thought about updating at one point, but I am very reluctant to update...

But I do have a good reason to do it now, so, well.

I will find some time to update and test again, so I can close the issue next week.

Thank you!!
