
Correct parameter for cross compile for ARM Android ? #8

Closed
ekorudi opened this issue Oct 1, 2022 · 2 comments
Labels
build Build related issues help wanted Extra attention is needed question Further information is requested

Comments


ekorudi commented Oct 1, 2022

What are the correct parameters to cross-compile for ARM Android? I'm on an Intel Ubuntu machine, using android-ndk-r25b.


ggml.c:232:16: warning: implicit declaration of function 'vfmaq_f32' is invalid in C99 [-Wimplicit-function-declaration]
        sum0 = vfmaq_f32(sum0, x0, y0);
               ^
ggml.c:232:14: error: assigning to 'float32x4_t' (vector of 4 'float32_t' values) from incompatible type 'int'
        sum0 = vfmaq_f32(sum0, x0, y0);
             ^ ~~~~~~~~~~~~~~~~~~~~~~~

./ggml.c:331:14: error: assigning to 'float16x8_t' (vector of 8 'float16_t' values) from incompatible type 'int'
        sum0 = vfmaq_f16(sum0, x0, y0);
             ^ ~~~~~~~~~~~~~~~~~~~~~~~
@ggerganov ggerganov added help wanted Extra attention is needed question Further information is requested labels Oct 1, 2022
@ekorudi ekorudi changed the title Correct parameter for cross compile for ARM Android Correct parameter for cross compile for ARM Android ? Oct 2, 2022

ekorudi commented Oct 3, 2022

Hi, I found a way to compile it with the following Makefile, using the rpi-4 branch:

gcc="../arm-compiler/bin/aarch64-linux-android28-clang"
gpp="../arm-compiler/bin/aarch64-linux-android28-clang++"

main: ggml.o main.o
	$(gpp) -pthread -o main ggml.o main.o -static-libstdc++
	adb push main /data/local/tmp/main
	echo "run with adb shell"

ggml.o: ggml.c ggml.h
	$(gcc) -pthread -O3 -c ggml.c -mcpu=cortex-a75 -mfpu=neon-fp-armv8

main.o: main.cpp ggml.h
	$(gpp) -pthread -O3 -std=c++11 -c main.cpp -static-libstdc++

clean:
	rm -f *.o main
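One note on the flags above: `-mfpu=` is a 32-bit ARM option, and the aarch64 clang from the NDK ignores it (with a warning); on 64-bit targets the vector features are selected via `-march`/`-mcpu` instead. `vfmaq_f32` needs plain ARMv8 NEON, while `vfmaq_f16` additionally needs the fp16 extension (ARMv8.2-a, or a `-mcpu` that implies it). A sketch of the same rule with adjusted flags, assuming the Cortex-A75 target from above:

```makefile
# Sketch: select vector features via -march on aarch64; +fp16 enables
# the half-precision intrinsics such as vfmaq_f16.
ggml.o: ggml.c ggml.h
	$(gcc) -pthread -O3 -c ggml.c -march=armv8.2-a+fp16
```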

top run from another terminal shows the following:

Tasks: 609 total,   2 running, 607 sleeping,   0 stopped,   0 zombie
  Mem:      5.5G total,      5.1G used,      379M free,      7.0M buffers
 Swap:      3.0G total,      548M used,      2.5G free,      2.0G cached
800%cpu 395%user   0%nice   4%sys 394%idle   0%iow   0%irq   8%sirq   0%host
  PID USER         PR  NI VIRT  RES  SHR S[%CPU] %MEM     TIME+ ARGS                                                                                                                                        
 3913 shell        20   0 1.7G 989M 2.9M R  392  17.3   8:23.97 main -l id -f najhari.wav -m models/ggml-small.bin
 8339 shell        20   0  36M 4.8M 3.5M R  6.3   0.0   0:02.33 top
26385 system       20   0 5.5G 126M 108M S  1.0   2.2   1:33.01 com.samsung.accessibility
17708 system       18  -2  10G 231M 231M S  0.6   4.0  36:52.31 system_server
  408 system       20   0  42M 2.8M 2.2M S  0.6   0.0   3:32.57 hwservicemanager
25707 root         20   0    0    0    0 I  0.3   0.0   0:00.53 [kworker/u16:9]

The results:

Hardware: Samsung Galaxy A31

recording length: 120 s

total time:

ggml-base.bin : 160 sec
ggml-small.bin : 580 sec
ggml-large.bin : phone restarts, maybe OOM?

According to https://www.gsmarena.com/samsung_galaxy_a31-10149.php this phone has a GPU.

  1. Can we use that hardware to speed up processing?
  2. What is the minimum hardware requirement to run the large model?


ggerganov commented Oct 3, 2022

Congrats! You might very well be the first person ever to run whisper on a mobile device 😄

For the large model you need about 5 GB of memory. From the top output that you posted, it looks like you are right at the limit, so it may not be possible to load it on that device.

My implementation does not use GPU - it runs fully on the CPU. I don't plan on supporting GPUs for now, so we won't be able to benefit from that.

The rpi-4 branch was a quick hack to make it run on the RPi4 that I have at home. It might be possible to improve performance by properly implementing SIMD routines that are efficient for the Arm architecture on Android; this needs some investigation.

Also, there is a bug in the FFT function on the rpi-4 branch that makes the transcription worse, especially for the smaller models. I have fixed this on master, but haven't merged it into the other branches yet:

77d929f

Anyway - glad to hear that at least it works!

P.S. Maybe try different number of threads and see how it affects the performance: -t 2, -t 4, -t 8, etc.

@ggerganov ggerganov added the build Build related issues label Oct 5, 2022
mattsta pushed a commit to mattsta/whisper.cpp that referenced this issue Apr 1, 2023
@jacob-salassi jacob-salassi mentioned this issue May 15, 2023
nanoflooder pushed a commit to nanoflooder/whisper.cpp that referenced this issue Mar 24, 2024