Add "-mcpu=native" when building for aarch64 #532

Merged: 1 commit into ggerganov:master on Feb 27, 2023

Conversation

FlippFuzz (Contributor)

Performance tests were done in #89 (comment) and #89 (comment).

  1. While the test was done only on Ampere A1 on Oracle Cloud, ARM's own guidance recommends simply setting -mcpu=native, so we might as well do it for all ARM CPUs:
    https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu

  2. On Ampere A1 on Oracle Cloud, -mcpu=native enables FP16, which results in large performance gains.

  3. If it's not acceptable to do this for all ARM CPUs, I can add an ifdef WHISPER_AMPERE_A1 check before enabling -mcpu=native. (A sketch of the unconditional variant follows this list.)
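
A minimal sketch of what the change amounts to, written as shell logic rather than the actual Makefile diff (which may differ); the CFLAGS/CXXFLAGS names mirror common GNU-make conventions:

```sh
# Sketch: append -mcpu=native to the C/C++ flags only on aarch64 hosts,
# so the compiler can enable every feature the host core supports (e.g. FP16).
if [ "$(uname -m)" = "aarch64" ]; then
    CFLAGS="$CFLAGS -mcpu=native"
    CXXFLAGS="$CXXFLAGS -mcpu=native"
fi
```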

ARM CPUs aren't very good at reporting their names, so it's hard to identify the exact model from /proc/cpuinfo alone:

```
cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 50.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1
```
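
Side note (not part of the PR): recent util-linux versions of lscpu decode these implementer/part IDs, and GCC can dump the feature macros that -mcpu=native enables; both commands below are standard, though exact output depends on the toolchain:

```sh
# Decode the IDs above: implementer 0x41 is Arm Ltd and part 0xd0c is the
# Neoverse-N1 core used by Ampere A1 instances (recent util-linux required).
lscpu | grep -E 'Vendor ID|Model name'

# Dump GCC's predefined macros under -mcpu=native; an FP16-capable core
# shows __ARM_FEATURE_FP16_VECTOR_ARITHMETIC among them.
gcc -mcpu=native -dM -E - < /dev/null | grep FP16
```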

@jaybinks (Contributor) commented Feb 25, 2023

Can confirm the same on my Ampere A1 in Oracle Cloud.

Unmodified checkout:

```
ubuntu@instance-20230225-1302:~/src/whisper.cpp$ ./main -m models/ggml-medium.bin -f samples/jfk.wav
whisper_init_from_file: loading model from 'models/ggml-medium.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 4
whisper_model_load: mem required = 1720.00 MB (+ 43.00 MB per decoder)
whisper_model_load: kv self size = 42.00 MB
whisper_model_load: kv cross size = 140.62 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 1462.35 MB
whisper_model_load: model size = 1462.12 MB

system_info: n_threads = 4 / 4 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:11.000] And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: load time = 917.15 ms
whisper_print_timings: mel time = 78.78 ms
whisper_print_timings: sample time = 21.20 ms / 28 runs ( 0.76 ms per run)
whisper_print_timings: encode time = 60665.24 ms / 1 runs (60665.24 ms per run)
whisper_print_timings: decode time = 2004.89 ms / 28 runs ( 71.60 ms per run)
whisper_print_timings: total time = 63755.75 ms
```

Using -mcpu=native:

```
ubuntu@instance-20230225-1302:~/src/whisper.cpp$ ./main -m models/ggml-medium.bin -f samples/jfk.wav
whisper_init_from_file: loading model from 'models/ggml-medium.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 4
whisper_model_load: mem required = 1720.00 MB (+ 43.00 MB per decoder)
whisper_model_load: kv self size = 42.00 MB
whisper_model_load: kv cross size = 140.62 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 1462.35 MB
whisper_model_load: model size = 1462.12 MB

system_info: n_threads = 4 / 4 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:11.000] And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: load time = 836.11 ms
whisper_print_timings: mel time = 79.38 ms
whisper_print_timings: sample time = 21.25 ms / 28 runs ( 0.76 ms per run)
whisper_print_timings: encode time = 23294.16 ms / 1 runs (23294.16 ms per run)
whisper_print_timings: decode time = 1188.38 ms / 28 runs ( 42.44 ms per run)
whisper_print_timings: total time = 25482.80 ms
```
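
The only system_info difference between the two runs is FP16_VA flipping from 0 to 1: encode time drops from 60665.24 ms to 23294.16 ms (about 2.6x), and total time from 63755.75 ms to 25482.80 ms (about 2.5x).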

@ggerganov merged commit f420de1 into ggerganov:master on Feb 27, 2023