Fixed the issue of OpenBLAS not being enabled on Windows. #1128

Merged
merged 1 commit into ggerganov:master on Jul 25, 2023
Conversation

bobqianic (Collaborator)
Fixed the issue of OpenBLAS not being found on the Windows platform. Even though the previously released binary was named whisper-blas-bin-x64.zip, BLAS was not actually enabled. With BLAS enabled, inference speed increases 3-4x. To compile on Windows, all you need to do is download an OpenBLAS binary package, unzip it to any directory, and set an environment variable named OPENBLAS_PATH to the package's path.
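
For reference, a minimal sketch of those build steps in a Windows command prompt. The OpenBLAS release URL, version, and install directory below are illustrative, and the WHISPER_OPENBLAS CMake option is assumed from the project's build configuration:

:: Download an OpenBLAS binary release and unzip it (URL, version, and paths are illustrative)
curl -L -o openblas.zip https://github.com/xianyi/OpenBLAS/releases/download/v0.3.23/OpenBLAS-0.3.23-x64.zip
mkdir C:\OpenBLAS
tar -xf openblas.zip -C C:\OpenBLAS

:: Point the build at the unzipped package (setx takes effect in new shells)
setx OPENBLAS_PATH C:\OpenBLAS

:: Configure and build whisper.cpp with OpenBLAS enabled
cmake -S . -B build -DWHISPER_OPENBLAS=ON
cmake --build build --config Release

After building, bench.exe should report BLAS = 1 in its system_info line, as in the "After" log below.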

Before:

C:\Users\qianp>C:\Users\qianp\Downloads\whisper-blas-bin-x64\bench.exe -m C:\Users\qianp\Downloads\ggml-model-largev2.bin -w 0 -t 4
whisper_init_from_file_no_state: loading model from 'C:\Users\qianp\Downloads\ggml-model-largev2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: type          = 5
whisper_model_load: mem required  = 3557.00 MB (+   71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     = 2950.97 MB
whisper_model_load: model size    = 2950.66 MB
whisper_init_state: kv self size  =   70.00 MB
whisper_init_state: kv cross size =  234.38 MB

system_info: n_threads = 4 / 20 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | COREML = 0 |

whisper_print_timings:     load time =  2369.90 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     0.00 ms
whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   encode time = 48837.34 ms /     1 runs (48837.34 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 51340.71 ms

If you wish, you can submit these results here:

  https://github.com/ggerganov/whisper.cpp/issues/89

Please include the following information:

  - CPU model
  - Operating system
  - Compiler

C:\Users\qianp>C:\Users\qianp\Downloads\whisper-blas-bin-x64\bench.exe -m C:\Users\qianp\Downloads\ggml-model-largev2.bin -w 2 -t 4
ggml_mul_mat:   64 x   64: Q4_0     2.9 GFLOPS (128 runs) / Q4_1     3.3 GFLOPS (128 runs) / F16     3.5 GFLOPS (128 runs) / F32     3.3 GFLOPS (128 runs)
ggml_mul_mat:  128 x  128: Q4_0    15.4 GFLOPS (128 runs) / Q4_1    20.7 GFLOPS (128 runs) / F16    23.9 GFLOPS (128 runs) / F32    15.1 GFLOPS (128 runs)
ggml_mul_mat:  256 x  256: Q4_0    33.9 GFLOPS (128 runs) / Q4_1    77.0 GFLOPS (128 runs) / F16    96.2 GFLOPS (128 runs) / F32    31.8 GFLOPS (128 runs)
ggml_mul_mat:  512 x  512: Q4_0    63.9 GFLOPS (128 runs) / Q4_1   107.9 GFLOPS (128 runs) / F16   128.5 GFLOPS (128 runs) / F32    36.8 GFLOPS (128 runs)
ggml_mul_mat: 1024 x 1024: Q4_0    80.9 GFLOPS ( 38 runs) / Q4_1   125.8 GFLOPS ( 59 runs) / F16   131.1 GFLOPS ( 62 runs) / F32    35.7 GFLOPS ( 17 runs)
ggml_mul_mat: 2048 x 2048: Q4_0    86.3 GFLOPS (  6 runs) / Q4_1   129.8 GFLOPS (  8 runs) / F16   135.5 GFLOPS (  8 runs) / F32    33.3 GFLOPS (  3 runs)
ggml_mul_mat: 4096 x 4096: Q4_0    89.2 GFLOPS (  3 runs) / Q4_1    97.2 GFLOPS (  3 runs) / F16    62.3 GFLOPS (  3 runs) / F32    20.6 GFLOPS (  3 runs)

After:

C:\Users\qianp>C:\Users\qianp\Downloads\whisper_build\bin\Release\bench.exe -m C:\Users\qianp\Downloads\ggml-model-largev2.bin -w 0 -t 4
whisper_init_from_file_no_state: loading model from 'C:\Users\qianp\Downloads\ggml-model-largev2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5
whisper_model_load: mem required  = 3557.00 MB (+   71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     = 2951.27 MB
whisper_model_load: model size    = 2950.66 MB
whisper_init_state: kv self size  =   70.00 MB
whisper_init_state: kv cross size =  234.38 MB

system_info: n_threads = 4 / 20 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

whisper_print_timings:     load time =  1739.12 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     0.00 ms
whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   encode time = 11481.76 ms /     1 runs (11481.76 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 13400.80 ms

If you wish, you can submit these results here:

  https://github.com/ggerganov/whisper.cpp/issues/89

Please include the following information:

  - CPU model
  - Operating system
  - Compiler
  
C:\Users\qianp>C:\Users\qianp\Downloads\whisper_build\bin\Release\bench.exe -m C:\Users\qianp\Downloads\ggml-model-largev2.bin -w 2 -t 4
  64 x   64: Q4_0     3.2 GFLOPS (128 runs) | Q4_1     3.4 GFLOPS (128 runs)
  64 x   64: Q5_0     3.3 GFLOPS (128 runs) | Q5_1     3.3 GFLOPS (128 runs) | Q8_0     3.3 GFLOPS (128 runs)
  64 x   64: F16      3.1 GFLOPS (128 runs) | F32      2.9 GFLOPS (128 runs)
 128 x  128: Q4_0     6.3 GFLOPS (128 runs) | Q4_1     5.7 GFLOPS (128 runs)
 128 x  128: Q5_0     5.8 GFLOPS (128 runs) | Q5_1     5.7 GFLOPS (128 runs) | Q8_0     5.3 GFLOPS (128 runs)
 128 x  128: F16      6.6 GFLOPS (128 runs) | F32      6.5 GFLOPS (128 runs)
 256 x  256: Q4_0    41.3 GFLOPS (128 runs) | Q4_1    39.1 GFLOPS (128 runs)
 256 x  256: Q5_0    38.0 GFLOPS (128 runs) | Q5_1    37.4 GFLOPS (128 runs) | Q8_0    35.9 GFLOPS (128 runs)
 256 x  256: F16     47.3 GFLOPS (128 runs) | F32     46.4 GFLOPS (128 runs)
 512 x  512: Q4_0   190.8 GFLOPS (128 runs) | Q4_1   188.3 GFLOPS (128 runs)
 512 x  512: Q5_0   187.6 GFLOPS (128 runs) | Q5_1   175.6 GFLOPS (128 runs) | Q8_0   183.0 GFLOPS (128 runs)
 512 x  512: F16    204.0 GFLOPS (128 runs) | F32    180.5 GFLOPS (128 runs)
1024 x 1024: Q4_0   349.2 GFLOPS (128 runs) | Q4_1   345.7 GFLOPS (128 runs)
1024 x 1024: Q5_0   337.9 GFLOPS (128 runs) | Q5_1   331.3 GFLOPS (128 runs) | Q8_0   342.0 GFLOPS (128 runs)
1024 x 1024: F16    350.1 GFLOPS (128 runs) | F32    245.8 GFLOPS (115 runs)
2048 x 2048: Q4_0   446.7 GFLOPS ( 27 runs) | Q4_1   430.4 GFLOPS ( 26 runs)
2048 x 2048: Q5_0   433.3 GFLOPS ( 26 runs) | Q5_1   440.8 GFLOPS ( 26 runs) | Q8_0   447.1 GFLOPS ( 27 runs)
2048 x 2048: F16    447.5 GFLOPS ( 27 runs) | F32    335.9 GFLOPS ( 20 runs)
4096 x 4096: Q4_0   534.0 GFLOPS (  4 runs) | Q4_1   493.6 GFLOPS (  4 runs)
4096 x 4096: Q5_0   498.4 GFLOPS (  4 runs) | Q5_1   502.5 GFLOPS (  4 runs) | Q8_0   513.1 GFLOPS (  4 runs)
4096 x 4096: F16    520.9 GFLOPS (  4 runs) | F32    414.5 GFLOPS (  4 runs)

ggerganov merged commit a195bf8 into ggerganov:master on Jul 25, 2023
jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request Oct 24, 2023
jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request Oct 24, 2023
landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request Dec 16, 2023