Android Inference is too slow #1070

saffie91 · 2023-06-30T17:19:12Z

Hi, I have been trying out the Android implementation. It works well overall, and thank you for this contribution.

However, the inference time is incredibly slow: even for a few words it takes about 30 seconds to get a response, which makes it unusable.

I am using the small model (I feel the accuracy of the tiny and base models is not up to par in different languages, which is the use case in our scenario).

Is there any way to speed up inference? Is GPU inference possible on Android, for example?

For reference, I am using these parameters:

params.print_progress = false;
params.print_special = false;
params.print_realtime = false;
params.print_timestamps = false;
params.translate = false;
params.single_segment = false;
params.max_tokens = 32;
params.language = "auto";
params.n_threads = std::max(1, std::min(8, (int32_t) std::thread::hardware_concurrency()));
params.audio_ctx = 768;
params.speed_up = false;
// params.temperature_inc = params.temperature_inc;
// params.prompt_tokens = nullptr;
params.prompt_n_tokens = 0;
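
A note on the `n_threads` line above: many Android SoCs mix fast and slow cores, so using every reported core (up to 8) is not necessarily the fastest setting, and a lower cap may be worth measuring. The following is a minimal sketch only. `pick_thread_count` and its `cap` parameter are hypothetical names introduced for illustration, and the best cap for a given device is an assumption to verify by timing.

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>

// Hypothetical helper (not part of whisper.cpp): clamp a reported core
// count into the range [1, cap]. hardware_concurrency() may return 0 when
// the count cannot be determined, so the lower bound of 1 matters.
inline int32_t pick_thread_count(uint32_t reported_cores, int32_t cap) {
    const int32_t cores = static_cast<int32_t>(reported_cores);
    return std::max<int32_t>(1, std::min<int32_t>(cap, cores));
}

// Convenience overload that queries the current device.
inline int32_t pick_thread_count(int32_t cap) {
    return pick_thread_count(std::thread::hardware_concurrency(), cap);
}
```

With this, `params.n_threads = pick_thread_count(4);` and `params.n_threads = pick_thread_count(8);` can be compared on the same recording to see which cap is faster on a given phone.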

Azeirah · 2023-07-06T20:57:38Z

GPU inference should be possible with CLBlast, no? Might try that.

gersomonline · 2023-07-13T11:04:55Z

Are you running the app in release mode? It improves the performance significantly.
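
One way to check which kind of build the native code actually ended up in is to report the compiler's own macros from inside the library. This is a sketch under assumptions: `build_flavor` is a hypothetical helper, `NDEBUG` is only conventionally defined by release configurations, and `__OPTIMIZE__` is specific to GCC and Clang.

```cpp
#include <string>

// Hypothetical helper: describe how this translation unit was compiled.
// NDEBUG is conventionally defined for release builds (it disables assert);
// __OPTIMIZE__ is defined by GCC and Clang when optimization is enabled.
inline std::string build_flavor() {
#if defined(__OPTIMIZE__)
    const bool optimized = true;
#else
    const bool optimized = false;
#endif
#if defined(NDEBUG)
    const bool ndebug = true;
#else
    const bool ndebug = false;
#endif
    if (optimized && ndebug) return "release-like (optimized, NDEBUG)";
    if (optimized)           return "optimized, asserts enabled";
    return "debug-like (unoptimized)";
}
```

Logging the returned string once at startup shows whether the slow timings come from an unoptimized native build.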

sandorkonya · 2023-07-14T11:42:17Z

@Azeirah if someone is interested:
Whisper GPU support via CLBLAST
CLBLAST on Android

sandorkonya · 2023-07-17T11:43:06Z

FYI: maybe this helps.

ningpengtao-coder · 2023-07-24T02:33:32Z

I tried compiling CLBlast as a shared object for the Android platform, used -DGGML_USE_CLBLAST to enable CLBlast support, and loaded OpenCL through System.load("/vendor/lib64/libOpenCL.so"). I then ran inference, but got incorrect results and a longer inference time.
whisper1(cpu): correct result.

whisper2(opencl gpu): incorrect result.
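
To put numbers on the "longer inference time" for the two runs above, the same call can be timed on each backend. A minimal sketch, assuming a hypothetical `time_ms` helper; it measures wall-clock time for a single invocation and says nothing about why one backend is slower.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run a callable once and return the elapsed
// wall-clock time in milliseconds, using a monotonic clock.
template <typename F>
double time_ms(F&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the inference call in a lambda for the CPU build and again for the OpenCL build, on the same audio, gives two directly comparable figures.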

ValleZ · 2024-01-25T15:59:56Z

ggerganov/llama.cpp#5123

sandorkonya mentioned this issue Jul 17, 2023

whisper.Android WhisperCppDemo very slow android specific transcibe times for 3s recording a res of 31227ms #1022

Open

ningpengtao-coder mentioned this issue Jul 28, 2023

opencl android support question #1140

Open

Macoron mentioned this issue Aug 4, 2023

Really slow transcription on Android platform. Macoron/whisper.unity#47

Open

zhouwg mentioned this issue Mar 4, 2024

PoC:clean-room implementation of real-time AI subtitle for English online-TV(OTT TV) zhouwg/kantv#64

Closed

ElishaAz mentioned this issue May 16, 2024

Consider switching to Whisper models ElishaAz/Sayboard#75

Open
