
Run talk example failed #782

Closed

jhezjkp opened this issue Apr 17, 2023 · 8 comments

Labels
bug Something isn't working

Comments

jhezjkp commented Apr 17, 2023

➜ whisper.cpp git:(master) ✗ ./talk -p santa
whisper_init_from_file_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 2
whisper_model_load: mem required = 218.00 MB (+ 6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 140.60 MB
whisper_model_load: model size = 140.54 MB
whisper_init_state: kv self size = 5.25 MB
whisper_init_state: kv cross size = 17.58 MB
gpt2_model_load: loading model from 'models/ggml-gpt-2-117M.bin'
gpt2_model_load: n_vocab = 50257
gpt2_model_load: n_ctx = 1024
gpt2_model_load: n_embd = 768
gpt2_model_load: n_head = 12
gpt2_model_load: n_layer = 12
gpt2_model_load: f16 = 1
gpt2_model_load: ggml ctx size = 311.12 MB
gpt2_model_load: memory size = 72.00 MB, n_mem = 12288
gpt2_model_load: tensor 'model/h0/attn/c_attn/w' has wrong shape in model file: got [2304, 768], expected [768, 2304]
gpt2_init: failed to load model from 'models/ggml-gpt-2-117M.bin'

main: processing, 4 threads, lang = en, task = transcribe, timestamps = 0 ...

init: found 1 capture devices:
init: - Capture device #0: 'MacBook Pro麦克风'
init: attempt to open default capture device ...
init: obtained spec for input device (SDL Id = 2):
init: - sample rate: 16000
init: - format: 33056 (required: 33056)
init: - channels: 1 (required: 1)
init: - samples per frame: 1024
[1] 68264 segmentation fault ./talk -p santa

➜ whisper.cpp git:(master) ✗ shasum -a 256 ./models/ggml-gpt-2-117M.bin
b457d5fcc7f2f71e727bee74298d42d80610619e02af16beca53d44a71d5f607 ./models/ggml-gpt-2-117M.bin
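The tensor-shape mismatch above ([2304, 768] vs. [768, 2304]) usually means the talk binary and the GPT-2 model file disagree on the ggml tensor layout, e.g. after a format change on master. A minimal first step, assuming a standard Makefile build, is to re-sync and rebuild before retrying:

```sh
# Rebuild so the loader matches the current ggml format, then retry;
# if the shape mismatch persists, the GPT-2 model file itself likely
# needs to be re-downloaded or re-converted.
git pull origin master
make clean && make talk
./talk -p santa
```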

jhezjkp (Author) commented Apr 17, 2023

Apple M1 Pro
macOS 13.3

ggerganov added the bug (Something isn't working) label Apr 23, 2023
ggerganov (Owner) commented

I'll fix these in a few days

gab-luz commented Apr 29, 2023

The same is happening to me when using the q2 Whisper version. @ggerganov, I've also tried switching to the 4-bit branch and it still didn't work. Besides the usual "./main -m models/ggml-model-whisper-large-q4_0.bin -f file.mp4", is there anything else to be done to run a quantized model?
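A side note on the command above: whisper.cpp's main expects 16 kHz WAV input, not MP4, so the file likely needs converting first. A minimal sketch, assuming ffmpeg is installed and reusing the file names from the comment:

```sh
# Convert the MP4's audio track to 16 kHz mono 16-bit PCM WAV,
# the input format whisper.cpp expects
ffmpeg -i file.mp4 -ar 16000 -ac 1 -c:a pcm_s16le file.wav
./main -m models/ggml-model-whisper-large-q4_0.bin -f file.wav
```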

gab-luz commented Apr 29, 2023

I get the same error when running the quantized model: [1] 37589 segmentation fault (core dumped) ./main -m models/ggml-model-whisper-large-q4_0.bin -f

gab-luz commented Apr 30, 2023

I've tried a WAV file instead of an MP4 file and it didn't work. Running ggml-large.bin works, but it's not the quantized one, unfortunately.
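If the quantized file was produced by an older build, regenerating it against the current tree may help, since quantized models must match the loader's ggml format. A sketch, assuming the repo's quantize tool and the file names above:

```sh
# Rebuild the tools, then re-quantize the f16 large model to q4_0 so the
# model file and the loader agree on the ggml format version
make clean && make main quantize
./quantize models/ggml-large.bin models/ggml-model-whisper-large-q4_0.bin q4_0
./main -m models/ggml-model-whisper-large-q4_0.bin -f file.wav
```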

ggerganov (Owner) commented

Should be fixed now

ZechenM commented May 31, 2023

I am still getting the same error:

(py310-whisper) whisper.cpp % ./talk -p Santa
whisper_init_from_file_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2
whisper_model_load: mem required = 310.00 MB (+ 6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 140.66 MB
whisper_model_load: model size = 140.54 MB
whisper_init_state: kv self size = 5.25 MB
whisper_init_state: kv cross size = 17.58 MB
whisper_init_state: loading Core ML model from 'models/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
gpt2_model_load: loading model from 'models/ggml-gpt-2-117M.bin'
gpt2_model_load: failed to open 'models/ggml-gpt-2-117M.bin'
gpt2_init: failed to load model from 'models/ggml-gpt-2-117M.bin'

main: processing, 4 threads, lang = en, task = transcribe, timestamps = 0 ...

init: found 4 capture devices:
init: - Capture device #0: 'Zechen’s AirPods Pro #2'
init: - Capture device #1: 'Z’s iPhone Microphone'
init: - Capture device #2: 'MacBook Pro Microphone'
init: - Capture device #3: 'ZoomAudioDevice'
init: attempt to open default capture device ...
init: obtained spec for input device (SDL Id = 2):
init: - sample rate: 16000
init: - format: 33056 (required: 33056)
init: - channels: 1 (required: 1)
init: - samples per frame: 1024
zsh: segmentation fault ./talk -p Santa
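Note that this failure differs from the original report: here gpt2_model_load cannot open the file at all, which usually just means the GPT-2 model was never downloaded. A quick check, using the path from the log (see the talk example's README for where to obtain the model):

```sh
# talk needs the GPT-2 model in addition to the Whisper model;
# verify it exists where the binary looks for it
ls -l models/ggml-gpt-2-117M.bin \
  || echo "missing: fetch ggml-gpt-2-117M.bin (see the talk example's README)"
```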

fedorenko-dmitriy commented

Hi all. The same problem on Win10 when starting the Python script from the example. I tested with all Windows builds from 1.5.4 to 1.6.2:
PS C:\Users\123\CODE\test2> python -m test .\jfk.wav base
..\src\utils\whisper.cpp_win\main.exe -m ggml-base.bin -f .\jfk.wav
Error: Error processing audio: whisper_init_from_file_with_params_no_state: loading model from 'ggml-base.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_init_state: kv self size = 18.87 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 16.39 MB
whisper_init_state: compute buffer (encode) = 132.07 MB
whisper_init_state: compute buffer (cross) = 4.78 MB
whisper_init_state: compute buffer (decode) = 96.48 MB

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0

main: processing '.\jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

whisper_print_timings: load time = 324.05 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 16.54 ms
whisper_print_timings: sample time = 120.25 ms / 141 runs ( 0.85 ms per run)
whisper_print_timings: encode time = 1851.88 ms / 1 runs ( 1851.88 ms per run)
whisper_print_timings: decode time = 5.56 ms / 1 runs ( 5.56 ms per run)
whisper_print_timings: batchd time = 307.09 ms / 138 runs ( 2.23 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 2641.22 ms

But if I run the command directly, it works normally:

PS C:\Users\123\CODE\test2> ..\src\utils\whisper.cpp_win\main.exe -m ggml-base.bin -f .\jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'ggml-base.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_init_state: kv self size = 18.87 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 16.39 MB
whisper_init_state: compute buffer (encode) = 132.07 MB
whisper_init_state: compute buffer (cross) = 4.78 MB
whisper_init_state: compute buffer (decode) = 96.48 MB

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0

main: processing '.\jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:08.000] And so, my fellow Americans, ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000] ask what you can do for your country.

whisper_print_timings: load time = 366.80 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 40.23 ms
whisper_print_timings: sample time = 165.82 ms / 141 runs ( 1.18 ms per run)
whisper_print_timings: encode time = 3230.55 ms / 1 runs ( 3230.55 ms per run)
whisper_print_timings: decode time = 7.19 ms / 1 runs ( 7.19 ms per run)
whisper_print_timings: batchd time = 559.73 ms / 138 runs ( 4.06 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 4438.11 ms

Any suggestions? Thanks.
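One likely explanation, offered as a guess: whisper.cpp's main prints its init banner and timing info to stderr and only the transcript to stdout, so a wrapper that treats any stderr output as a failure will report an "error" even though the run succeeded. A quick way to see the split from PowerShell, reusing the command above:

```powershell
# stdout carries only the transcript; the whisper_init_*/whisper_model_load
# banner and the timings go to stderr, so capture the two streams separately
..\src\utils\whisper.cpp_win\main.exe -m ggml-base.bin -f .\jfk.wav 2> init.log
Get-Content init.log
```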
