Fixed the issue of OpenBLAS not being enabled on Windows. #1128

Merged
merged 1 commit into ggerganov:master on Jul 25, 2023
Conversation

bobqianic (Collaborator)
Fixed the issue of OpenBLAS not being found on the Windows platform. Even though the previously released binary was named whisper-blas-bin-x64.zip, BLAS was not actually enabled. With BLAS enabled, inference speed increases 3-4x. To compile on Windows, all you need to do is download an OpenBLAS binary package, unzip it to any directory, and set an environment variable named OPENBLAS_PATH to the package's path.
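
For reference, a minimal sketch of those build steps in a Windows command prompt. The OpenBLAS release URL, version, and install directory below are illustrative, and the WHISPER_OPENBLAS CMake option is assumed from the project's build configuration:

:: Download an OpenBLAS binary release and unzip it (URL, version, and paths are illustrative)
curl -L -o openblas.zip https://github.com/xianyi/OpenBLAS/releases/download/v0.3.23/OpenBLAS-0.3.23-x64.zip
mkdir C:\OpenBLAS
tar -xf openblas.zip -C C:\OpenBLAS

:: Point the build at the unzipped package (setx takes effect in new shells)
setx OPENBLAS_PATH C:\OpenBLAS

:: Configure and build whisper.cpp with OpenBLAS enabled
cmake -S . -B build -DWHISPER_OPENBLAS=ON
cmake --build build --config Release

After building, bench.exe should report BLAS = 1 in its system_info line, as in the "After" log below.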

Before:

C:\Users\qianp>C:\Users\qianp\Downloads\whisper-blas-bin-x64\bench.exe -m C:\Users\qianp\Downloads\ggml-model-largev2.bin -w 0 -t 4
whisper_init_from_file_no_state: loading model from 'C:\Users\qianp\Downloads\ggml-model-largev2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: type          = 5
whisper_model_load: mem required  = 3557.00 MB (+   71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     = 2950.97 MB
whisper_model_load: model size    = 2950.66 MB
whisper_init_state: kv self size  =   70.00 MB
whisper_init_state: kv cross size =  234.38 MB

system_info: n_threads = 4 / 20 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | COREML = 0 |

whisper_print_timings:     load time =  2369.90 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     0.00 ms
whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   encode time = 48837.34 ms /     1 runs (48837.34 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 51340.71 ms

If you wish, you can submit these results here:

  https://github.com/ggerganov/whisper.cpp/issues/89

Please include the following information:

  - CPU model
  - Operating system
  - Compiler

C:\Users\qianp>C:\Users\qianp\Downloads\whisper-blas-bin-x64\bench.exe -m C:\Users\qianp\Downloads\ggml-model-largev2.bin -w 2 -t 4
ggml_mul_mat:   64 x   64: Q4_0     2.9 GFLOPS (128 runs) / Q4_1     3.3 GFLOPS (128 runs) / F16     3.5 GFLOPS (128 runs) / F32     3.3 GFLOPS (128 runs)
ggml_mul_mat:  128 x  128: Q4_0    15.4 GFLOPS (128 runs) / Q4_1    20.7 GFLOPS (128 runs) / F16    23.9 GFLOPS (128 runs) / F32    15.1 GFLOPS (128 runs)
ggml_mul_mat:  256 x  256: Q4_0    33.9 GFLOPS (128 runs) / Q4_1    77.0 GFLOPS (128 runs) / F16    96.2 GFLOPS (128 runs) / F32    31.8 GFLOPS (128 runs)
ggml_mul_mat:  512 x  512: Q4_0    63.9 GFLOPS (128 runs) / Q4_1   107.9 GFLOPS (128 runs) / F16   128.5 GFLOPS (128 runs) / F32    36.8 GFLOPS (128 runs)
ggml_mul_mat: 1024 x 1024: Q4_0    80.9 GFLOPS ( 38 runs) / Q4_1   125.8 GFLOPS ( 59 runs) / F16   131.1 GFLOPS ( 62 runs) / F32    35.7 GFLOPS ( 17 runs)
ggml_mul_mat: 2048 x 2048: Q4_0    86.3 GFLOPS (  6 runs) / Q4_1   129.8 GFLOPS (  8 runs) / F16   135.5 GFLOPS (  8 runs) / F32    33.3 GFLOPS (  3 runs)
ggml_mul_mat: 4096 x 4096: Q4_0    89.2 GFLOPS (  3 runs) / Q4_1    97.2 GFLOPS (  3 runs) / F16    62.3 GFLOPS (  3 runs) / F32    20.6 GFLOPS (  3 runs)

After:

C:\Users\qianp>C:\Users\qianp\Downloads\whisper_build\bin\Release\bench.exe -m C:\Users\qianp\Downloads\ggml-model-largev2.bin -w 0 -t 4
whisper_init_from_file_no_state: loading model from 'C:\Users\qianp\Downloads\ggml-model-largev2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5
whisper_model_load: mem required  = 3557.00 MB (+   71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     = 2951.27 MB
whisper_model_load: model size    = 2950.66 MB
whisper_init_state: kv self size  =   70.00 MB
whisper_init_state: kv cross size =  234.38 MB

system_info: n_threads = 4 / 20 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

whisper_print_timings:     load time =  1739.12 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     0.00 ms
whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   encode time = 11481.76 ms /     1 runs (11481.76 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 13400.80 ms

If you wish, you can submit these results here:

  https://github.com/ggerganov/whisper.cpp/issues/89

Please include the following information:

  - CPU model
  - Operating system
  - Compiler
  
C:\Users\qianp>C:\Users\qianp\Downloads\whisper_build\bin\Release\bench.exe -m C:\Users\qianp\Downloads\ggml-model-largev2.bin -w 2 -t 4
  64 x   64: Q4_0     3.2 GFLOPS (128 runs) | Q4_1     3.4 GFLOPS (128 runs)
  64 x   64: Q5_0     3.3 GFLOPS (128 runs) | Q5_1     3.3 GFLOPS (128 runs) | Q8_0     3.3 GFLOPS (128 runs)
  64 x   64: F16      3.1 GFLOPS (128 runs) | F32      2.9 GFLOPS (128 runs)
 128 x  128: Q4_0     6.3 GFLOPS (128 runs) | Q4_1     5.7 GFLOPS (128 runs)
 128 x  128: Q5_0     5.8 GFLOPS (128 runs) | Q5_1     5.7 GFLOPS (128 runs) | Q8_0     5.3 GFLOPS (128 runs)
 128 x  128: F16      6.6 GFLOPS (128 runs) | F32      6.5 GFLOPS (128 runs)
 256 x  256: Q4_0    41.3 GFLOPS (128 runs) | Q4_1    39.1 GFLOPS (128 runs)
 256 x  256: Q5_0    38.0 GFLOPS (128 runs) | Q5_1    37.4 GFLOPS (128 runs) | Q8_0    35.9 GFLOPS (128 runs)
 256 x  256: F16     47.3 GFLOPS (128 runs) | F32     46.4 GFLOPS (128 runs)
 512 x  512: Q4_0   190.8 GFLOPS (128 runs) | Q4_1   188.3 GFLOPS (128 runs)
 512 x  512: Q5_0   187.6 GFLOPS (128 runs) | Q5_1   175.6 GFLOPS (128 runs) | Q8_0   183.0 GFLOPS (128 runs)
 512 x  512: F16    204.0 GFLOPS (128 runs) | F32    180.5 GFLOPS (128 runs)
1024 x 1024: Q4_0   349.2 GFLOPS (128 runs) | Q4_1   345.7 GFLOPS (128 runs)
1024 x 1024: Q5_0   337.9 GFLOPS (128 runs) | Q5_1   331.3 GFLOPS (128 runs) | Q8_0   342.0 GFLOPS (128 runs)
1024 x 1024: F16    350.1 GFLOPS (128 runs) | F32    245.8 GFLOPS (115 runs)
2048 x 2048: Q4_0   446.7 GFLOPS ( 27 runs) | Q4_1   430.4 GFLOPS ( 26 runs)
2048 x 2048: Q5_0   433.3 GFLOPS ( 26 runs) | Q5_1   440.8 GFLOPS ( 26 runs) | Q8_0   447.1 GFLOPS ( 27 runs)
2048 x 2048: F16    447.5 GFLOPS ( 27 runs) | F32    335.9 GFLOPS ( 20 runs)
4096 x 4096: Q4_0   534.0 GFLOPS (  4 runs) | Q4_1   493.6 GFLOPS (  4 runs)
4096 x 4096: Q5_0   498.4 GFLOPS (  4 runs) | Q5_1   502.5 GFLOPS (  4 runs) | Q8_0   513.1 GFLOPS (  4 runs)
4096 x 4096: F16    520.9 GFLOPS (  4 runs) | F32    414.5 GFLOPS (  4 runs)

ggerganov merged commit a195bf8 into ggerganov:master on Jul 25, 2023
jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request Oct 24, 2023
jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request Oct 24, 2023
landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request Dec 16, 2023