On-device Whisper inference on mobile (iPhone 13 Mini) #407
-
This is really nice. Do you have plans to run it on macOS?
-
@ggerganov I used your whisper.cpp to create an iOS SwiftUI transcription app - it's in the App Store:
-
Hey, curious how this project is going? What's the likelihood of getting something like this to provide real-time translation services on an iPhone? I work in a hospital with a large immigrant population and the translation services we use are so painful to use. An app like that could be extremely lucrative, as it would save physicians and nurses so much time each day...
-
About using GPT-3 for translation to languages other than English: I don't
agree with a paid subscription; I would rather pay for the app itself and
add my OpenAI API key to use their server directly, not some other server
in the middle.
On Fri, 11 Nov 2022 at 10:15, Georgi Gerganov wrote:
@bjnortier <https://github.com/bjnortier>
AFAIK in the medical industry, privacy of protected health information
(PHI) is of great importance, so I believe you cannot simply use an
application that uploads PHI somewhere in the Cloud. Or at least, it will
be difficult to make it comply with regulations. Therefore, I imagine it
would be useful to have a local translation solution that does not require
an internet connection.
Additionally, my experience with Whisper is that the transcription
accuracy is really high - I won't be surprised if it is actually the
highest among the generally available ASR frameworks (I have no experience
with other such software, so I could be wrong). If the translation accuracy
is at the level of the transcription, then a translation application for a
mobile device could actually be useful. I've only played with translating
Bulgarian and the quality is not that great, but maybe other languages fare
better.
@geraldmd <https://github.com/geraldmd>
I will provide a proof-of-concept similar to the above for real-time
translation on iPhone. I think it can easily run the base model and with
some extra work - also the small model. But I don't have plans to make a
full-blown app with all the necessary features. Maybe others would be
interested in turning it into a real app.
-
@ggerganov This is really cool, awesome job!! Question: do you know the improvement in inference time vs. the original Python implementation? I'm somewhat surprised that the difference is so extreme, as the PyTorch library is written in C/C++. Wouldn't we expect most of the heavy lifting to already be done by C/C++?
-
Hey again,
I recently posted here about my reimplementation of the model in C/C++, and yesterday I even got it running on mobile, so I thought that people would be interested in that as well. First, here is a short video demonstration:
whisper-iphone-13-mini-2.mp4
This demo runs the `base.en` model, without internet connection on the device. I was pretty happy with the performance and honestly - quite surprised. Running the `encoder` part of the transformer on a single 30-second audio chunk takes about 1 second for the `base` model and about 3 seconds for the `small` model. The `decoder` part of course depends on the actual audio, but it is typically faster than the `encoder` when using `Greedy` sampling.

The model implementation is in pure C/C++, wrapped in a C-style API and called from the Objective-C application. I utilize NEON instructions + the `Accelerate` framework for efficiency. The implementation [0] and this sample app [1] are open-source and available in the `whisper.cpp` repo.

I think this kind of performance allows for some real-world mobile applications using Whisper - at least if you can afford to put the model data in your app. So far I have tested this only on an iPhone 13 Mini, so it would be interesting if someone gives this a try on other iPhones and reports some results.
Also, wanted to say again that this Whisper model is very interesting to me and you guys at OpenAI have done a great job. Reimplementing this during the past few weeks was a very fun project and I learned quite a lot of stuff about transformers and linear algebra optimisations.
Thanks again!
[0] - https://github.com/ggerganov/whisper.cpp
[1] - https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.objc