Swift/iOS wrapper for TFLite libdeepspeech #3061

reuben · 2020-06-14T10:02:53Z

reuben · 2020-06-15T15:35:50Z

It looks like building a fat binary for x86_64 and arm64 is not useful because you still need two separate frameworks for device and simulator, as they use different SDKs.

I've edited the first comment to reflect this.

reuben · 2020-06-30T12:49:23Z

Some initial tests on device (iPhone Xs), averaged across 3 runs, with the 0.7.4 models:

no scorer, cold cache: RTF 0.60x
no scorer, warm cache: RTF 0.48x
with scorer, cold cache: RTF 0.55x
with scorer, warm cache: RTF 0.24x

reuben · 2020-06-30T18:41:19Z

@lissyx could you give me a quick overview of what it would take to test changes to macOS workers in an isolated environment? Like, say, adding a macos-heavy-b worker type and spawning tasks with it to test.

reuben · 2020-06-30T18:46:41Z

My probably incomplete idea:

Make PR against https://github.com/mozilla/community-tc-config/blob/master/config/projects/deepspeech.yml adding new worker instance with -b type.
Wait for it to be landed and deployed.
Make a copy of one of the existing worker images, change the worker type, make other modifications, start VM.
Spawn tasks against new worker type.

lissyx · 2020-06-30T20:30:00Z

My probably incomplete idea:

1. Make PR against https://github.com/mozilla/community-tc-config/blob/master/config/projects/deepspeech.yml adding new worker instance with -b type.

2. Wait for it to be landed and deployed.

3. Make a copy of one of the existing worker images, change the worker type, make other modifications, start VM.

4. Spawn tasks against new worker type.

You'd have to use the script that is on the worker #1 prov.sh, and you'd have to update the base image prior to that because I have not done it:

generic-worker version
taskclusterProxy
generic-worker.json config update

We don't have a nicer way to spin new workers mostly because it's not something we needed to do often and because it'd require again much more tooling. Given the current status of our macOS workers ...

Doing that in parallel of running existing infra is likely to be complicated because of ... resources (CPU / RAM). I thought disk would be an issue but that should be fine.

reuben · 2020-06-30T20:43:59Z

You'd have to use the script that is on the worker #1 prov.sh, and you'd have to update the base image prior to that because I have not done it:
* `generic-worker` version

* `taskclusterProxy`

* `generic-worker.json` config update

I assume I can find the appropriate versions and config changes by inspecting the currently running VMs, right? In that case, I just need the IP of worker 1 to get started.

We don't have a nicer way to spin new workers mostly because it's not something we needed to do often and because it'd require again much more tooling. Given the current status of our macOS workers ...

Yeah. I thought about making a VM copy of a worker to side-step these provisioning issues but I guess that's also prone to causing problems.

Doing that in parallel of running existing infra is likely to be complicated because of ... resources (CPU / RAM). I thought disk would be an issue but that should be fine.

I would probably run that worker on my personal machine while I test it, since it's not meant for general availability.

lissyx · 2020-06-30T20:47:30Z

I assume I can find the appropriate versions and config changes by inspecting the currently running VMs, right? In that case, I just need the IP of worker 1 to get started.

Indeed, you can fetch the json config. IPs on matrix.

I would probably run that worker on my personal machine while I test it, since it's not meant for general availability.

Be aware the existing workers if you copy them to your system are meant for VMWare Fusion Pro

reuben · 2020-06-30T20:52:59Z

Be aware the existing workers if you copy them to your system are meant for VMWare Fusion Pro

Ah, yes, that was also one of the complications. OK, thanks.

reuben · 2020-07-08T15:53:57Z

(Hopefully) finished wrapping the C API, now moving on to CI work. Wrapper is here if anyone wants to take a quick look and provide any suggestions: https://github.com/mozilla/DeepSpeech/blob/ios-build/native_client/swift/deepspeech_ios/DeepSpeech.swift

reuben · 2020-07-08T15:58:24Z

In particular I'd be very interested in any feedback from Swift developers on how the error handling looks and how the buffer handling around Model.speechToText and Stream.feedAudioContent looks.

reuben · 2020-07-08T16:20:41Z

PR for adding macos-heavy-b and macos-light-b worker type and instances: taskcluster/community-tc-config#308

reuben · 2020-07-11T20:22:01Z

Looks like adding Xcode to the worker brings the free space to under 10GB which stops the taskcluster client from picking up any jobs... Resizing the partition does not seem to work. My next step will be to create a worker from scratch with a larger disk image.

reuben · 2020-07-13T12:44:47Z

@dabinat tagging you because you mentioned interest in these bindings in the CoreML issue, in case you have anything to mention regarding the design of the bindings here.

erksch · 2020-07-17T13:49:56Z

@reuben super awesome that you are working on this! This is actually perfect timing as we are looking for an offline speech recognition for iOS right now. I know it's not finished yet, but could you provide a small guide on how I could try it out, is the .so library already available somewhere? Maybe then I could also help with writing an example app, if you wish.

lissyx · 2020-07-17T13:58:48Z

@reuben super awesome that you are working on this! This is actually perfect timing as we are looking for an offline speech recognition for iOS right now. I know it's not finished yet, but could you provide a small guide on how I could try it out, is the .so library already available somewhere? Maybe then I could also help with writing an example app, if you wish.

on taskcluster, if you browse to the iOS artifacts sections you should get it

lissyx · 2020-07-17T13:59:54Z

e.g., https://community-tc.services.mozilla.com/tasks/BO-_Adi2Th2DOswsw9m1Kw#artifacts

erksch · 2020-07-17T14:30:35Z

Thank you! I'll try it out!

erksch · 2020-07-17T16:53:23Z

I tried it out with the current 0.7.4 models from the release page and one of the audio files from there.

TensorFlow: v2.2.0-17-g0854bb5188
DeepSpeech: v0.9.0-alpha.2-34-gdd20d35c
2020-07-17 09:48:31.981587-0700 deepspeech_ios_test[9411:91040] Initialized TensorFlow Lite runtime.
/private/var/containers/Bundle/Application/F5F8492E-D9B8-4BC4-AF46-29CEB23FC3A6/deepspeech_ios_test.app/4507-16021-0012.wav
read 8192 samples
(lldb)

And then comes this error

Thread 5: EXC_BAD_ACCESS (code=1, address=0x177000075)

in this line of DeepSpeech.swift

178    public func feedAudioContent(buffer: UnsafeBufferPointer<Int16>) {
180        precondition(streamCtx != nil, "calling method on invalidated Stream")
181        
182        DS_FeedAudioContent(streamCtx, buffer.baseAddress, UInt32(buffer.count)) <<<<< Thread 5: EXC_BAD_ACCESS (code=1, address=0x177000075)
183   }

Just leaving this here, but I don't want to bother too much before this is even called finished :D

Update: A sorry I saw the version of DeepSpeech and I guess the models are not compatible.

reuben · 2020-07-19T17:37:29Z

The models should be compatible. I don't know what's going on there, can't reproduce it locally...

reuben · 2020-07-19T17:37:48Z

Does the log have any more details for this error?

reuben · 2020-07-19T17:38:39Z

Also, maybe double check the signing options in Xcode? At some point when writing the bindings I ran into some runtime exceptions due to incorrect signing options.

erksch · 2020-07-19T19:12:32Z

You're right. Since it crashes when communication with library happens, it's probably just included incorrectly.

So to go through what I tried:

Cloning the repo and checking out the ios-build branch
In XCode, set signing for the targets deepspeech_ios and deepspeech_ios_test to my team and adjust the bundle identifier to one of mine
Trying to run deepspeech_ios_test on my device
-> Build failed with

clang: error: no such file or directory: '[...]/DeepSpeech/native_client/swift/libdeepspeech.so'
Command Ld failed with a nonzero exit code

So downloading the ARM library from the link that you provided and moving it to the given destination
Trying to run again
-> Build succeeded but runtime error

dyld: Library not loaded: @rpath/deepspeech_ios.framework/deepspeech_ios
  Referenced from: /private/var/containers/Bundle/Application/E9B900F3-5F4C-466D-BB03-E97F5588A768/deepspeech_ios_test.app/deepspeech_ios_test
  Reason: image not found
(lldb) bt 
* thread #1, stop reason = signal SIGABRT

The error in the post before is after trying some things in the "Frameworks, Libraries and Embedded Content Section".
After doing a fresh start and just adding the library like explained above I get the following configs:

deepspeech_ios_test target

deepspeech_ios target

What do you have there? Maybe some things have to be switched to embed & sign?

erksch · 2020-07-19T19:22:42Z

After setting deepspeech_ios.framework to Embed & Sign in the deepspeech_ios_test target (which is just a random tryout), the code at least passes until the DS_FeedAudioContent, and the error occurs that I mentioned in the first post.

Here is the full log I got for that error.

* thread #2, queue = 'com.apple.avfoundation.avasset.completionsQueue', stop reason = EXC_BAD_ACCESS (code=2, address=0x130800076)
    frame #0: 0x0000000103891494 libdeepspeech.so`___lldb_unnamed_symbol5$$libdeepspeech.so + 360
  * frame #1: 0x000000010385df68 deepspeech_ios`DeepSpeechStream.feedAudioContent(buffer=(_position = 0x000000016502d200, count = 8192), self=(streamCtx = 0x0000000162f32f40)) at DeepSpeech.swift:181:9
    frame #2: 0x00000001028b20b0 deepspeech_ios_test`closure #1 in render(samples=Swift.UnsafeRawBufferPointer @ 0x000000016d5e0100, stream=(streamCtx = 0x0000000162f32f40)) at AppDelegate.swift:126:20
    frame #3: 0x00000001028b2138 deepspeech_ios_test`thunk for @callee_guaranteed (@unowned UnsafeRawBufferPointer) -> (@error @owned Error) at <compiler-generated>:0
    frame #4: 0x00000001028b2198 deepspeech_ios_test`partial apply for thunk for @callee_guaranteed (@unowned UnsafeRawBufferPointer) -> (@error @owned Error) at <compiler-generated>:0
    frame #5: 0x00000001b810c348 libswiftFoundation.dylib`Foundation.Data.withUnsafeBytes<A>((Swift.UnsafeRawBufferPointer) throws -> A) throws -> A + 504
    frame #6: 0x00000001028b097c deepspeech_ios_test`render(audioContext=0x000000016760bd10, stream=(streamCtx = 0x0000000162f32f40)) at AppDelegate.swift:124:22
    frame #7: 0x00000001028b30c8 deepspeech_ios_test`closure #1 in test(audioContext=0x000000016760bd10, stream=(streamCtx = 0x0000000162f32f40), audioPath="/private/var/containers/Bundle/Application/D6D001A2-07F7-4BD3-80E9-9DBECCA975E8/deepspeech_ios_test.app/4507-16021-0012.wav", start=2020-07-19 21:19:15 CEST, completion=0x00000001028b83c8 deepspeech_ios_test`partial apply forwarder for closure #1 () -> () in closure #1 () -> () in deepspeech_ios_test.AppDelegate.application(_: __C.UIApplication, didFinishLaunchingWithOptions: Swift.Optional<Swift.Dictionary<__C.UIApplicationLaunchOptionsKey, Any>>) -> Swift.Bool at <compiler-generated>) at AppDelegate.swift:174:9
    frame #8: 0x00000001028acfc0 deepspeech_ios_test`closure #1 in static AudioContext.load(asset=0x000000016365fd40, assetTrack=0x00000001637212f0, audioURL=Foundation.URL @ 0x0000000163674090, completionHandler=0x00000001028b375c deepspeech_ios_test`partial apply forwarder for closure #1 (Swift.Optional<deepspeech_ios_test.AudioContext>) -> () in deepspeech_ios_test.test(model: deepspeech_ios.DeepSpeechModel, audioPath: Swift.String, completion: () -> ()) -> () at <compiler-generated>) at AppDelegate.swift:59:17
    frame #9: 0x00000001028ad9b0 deepspeech_ios_test`thunk for @escaping @callee_guaranteed () -> () at <compiler-generated>:0
    frame #10: 0x0000000102c0fefc libclang_rt.asan_ios_dynamic.dylib`__wrap_dispatch_async_block_invoke + 196
    frame #11: 0x0000000104ec605c libdispatch.dylib`_dispatch_call_block_and_release + 32
    frame #12: 0x0000000104ec74d8 libdispatch.dylib`_dispatch_client_callout + 20
    frame #13: 0x0000000104ecec20 libdispatch.dylib`_dispatch_lane_serial_drain + 720
    frame #14: 0x0000000104ecf834 libdispatch.dylib`_dispatch_lane_invoke + 440
    frame #15: 0x0000000104edb270 libdispatch.dylib`_dispatch_workloop_worker_thread + 1344
    frame #16: 0x00000001816a7718 libsystem_pthread.dylib`_pthread_wqthread + 276

dabinat · 2020-07-20T07:39:53Z

I tried a simple test app where I loaded a pre-converted file of a few seconds into memory and called DeepSpeechModel.speechToText. There are no crashes or anything, but the resulting text string is empty.

It seemed from the header like all I had to do was initialize a DeepSpeechModel with the .tflite file and then call speechToText with the buffer. Did I miss a step? Do I need to setup a streaming context even if I’m not streaming?

reuben · 2020-07-20T07:41:40Z

It seemed from the header like all I had to do was initialize a DeepSpeechModel with the .tflite file and then call speechToText with the buffer. Did I miss a step? Do I need to setup a streaming context even if I’m not streaming?

That's correct, you don't need to setup a streaming context.

zaptrem · 2020-12-26T10:17:55Z

@reuben Thanks, that solved the Android issue! It also allowed me to compare memory usage between iOS and Android. On the D-Day speech, Android climbs to ~200mb usage almost immediately and stays there, whereas iOS climbs more slowly to a similar number (or crashes on longer speeches). It seems like you were right to assume there was no memory leak, though it's weird to me that memory use is >10x the size of the input file.

However, on longer files, the crash on iOS is still present. Memory usage on Android balloons to >500MB on the Reagan RNC speech and run for a long while (I'm virtualizing Android on a beefy PC), so I'll update if it finishes.

Additionally, (this might be related to the -bitexact issue), on iOS on the shorter speeches there are long periods of sparse (one word for a minute of speech) transcription in between chunks of near-perfect transcription. The audio quality doesn't noticeably decrease in those areas when I listen.

Finally, in the process of updating react-native-transcription to the latest DeepSpeech version I noticed that Android DeepSpeech 0.9.3 hadn't been published to Maven yet, only 0.9.2.

zaptrem · 2020-12-27T08:58:10Z

@reuben Another interesting bug I discovered while trying to find clues: the iOS example happily transcribed non-16khz wav files while Android example refused them. Also weird is Android and iOS give different transcriptions for the same noisy input despite using the exact same pretrained model and audio files.

fender · 2020-12-27T09:01:23Z

@zaptrem @CatalinVoss What's the installation process for testing on iOS? There isn't a cocoapod yet is there?

zaptrem · 2020-12-27T09:07:16Z

@fender You've gotta compile the framework yourself using these instructions. You can use the Swift example in the repo.

lissyx · 2020-12-27T12:03:04Z

@reuben Another interesting bug I discovered while trying to find clues: the iOS example happily transcribed non-16khz wav files while Android example refused them. Also weird is Android and iOS give different transcriptions for the same noisy input despite using the exact same pretrained model and audio files.

Its not surprising and not a bug, @reuben already explained it: android example wav reading is hardcoded to reduce dependencies required by CI, so it only handles 16khz and it's not robust to broken wav files.

zaptrem · 2021-01-12T09:48:06Z

@reuben Trying again in case I sent it too early in the morning last week.

@CatalinVoss Are you able to reproduce this with your binary?

CatalinVoss · 2021-01-13T06:22:05Z

Can you get a backtrace from inside deepspeech by typing bt on the lldb console? I do see one sporadic issue that crashes within tensorflow in my application, but it's fairly rare.

zaptrem · 2021-01-13T06:33:05Z

@CatalinVoss Here you go. I had to use a text file since I exceeded pastebin's size limit: deepSpeechCrashST.txt

CatalinVoss · 2021-01-13T06:44:12Z

Ah. The relevant part is at the bottom

deepspeech_ios`PathTrie::iterate_to_vec(std::__1::vector<PathTrie*, std::__1::allocator<PathTrie*> >&) + 64
    frame #4742: 0x000000010fee1114 deepspeech_ios`DecoderState::next(double const*, int, int) + 404
    frame #4743: 0x000000010fd4b470 deepspeech_ios`StreamingState::processBatch(std::__1::vector<float, std::__1::allocator<float> > const&, unsigned int) + 296
    frame #4744: 0x000000010fd4b300 deepspeech_ios`StreamingState::pushMfccBuffer(std::__1::vector<float, std::__1::allocator<float> > const&) + 236
    frame #4745: 0x000000010fd4ae88 deepspeech_ios`StreamingState::feedAudioContent(short const*, unsigned int) + 396
  * frame #4746: 0x0000000105f1c808 ReLearn`closure #1 in Transcription.render(samples=Swift.UnsafeRawBufferPointer @ 0x000000016b295c30, stream=(streamCtx = 0x000000011031e420)) at Transcription.swift:85:24
    frame #4747: 0x0000000105f1c830 ReLearn`thunk for @callee_guaranteed (@unowned UnsafeRawBufferPointer) -> (@error @owned Error) at <compiler-generated>:0
    frame #4748: 0x0000000105f1e3d0 ReLearn`partial apply for thunk for @callee_guaranteed (@unowned UnsafeRawBufferPointer) -> (@error @owned Error) at <compiler-generated>:0
    frame #4749: 0x00000001f031ca8c libswiftFoundation.dylib`Foundation.Data.withUnsafeBytes<A>((Swift.UnsafeRawBufferPointer) throws -> A) throws -> A + 504
    frame #4750: 0x0000000105f1bf68 ReLearn`Transcription.render(audioContext=0x0000000280435b30, stream=(streamCtx = 0x000000011031e420), self=0x00000002805e2130) at Transcription.swift:83:26
    frame #4751: 0x0000000105f1ceb8 ReLearn`closure #1 in Transcription.recognizeFile(audioContext=0x0000000280435b30, self=0x00000002805e2130, stream=(streamCtx = 0x000000011031e420), audioPath=Swift.String @ 0x000000016b2964a8, start=2021-01-13 01:27:08 EST) at Transcription.swift:107:18
    frame #4752: 0x0000000105f1a23c ReLearn`closure #1 in static AudioContext.load(asset=0x0000000280c2a500, assetTrack=0x00000002808f4420, audioURL=<unavailable; try printing with "vo" or "po">, completionHandler=0x0000000105f1e48c ReLearn`partial apply forwarder for closure #1 (Swift.Optional<react_native_transcription.AudioContext>) -> () in react_native_transcription.Transcription.(recognizeFile in _79B1BC2F893AB0135086535C16DBA135)(audioPath: Swift.String) -> () at <compiler-generated>) at AudioContext.swift:58:17
    frame #4753: 0x0000000105ef6490 ReLearn`thunk for @escaping @callee_guaranteed () -> () at <compiler-generated>:0
    frame #4754: 0x00000001101c9d10 libdispatch.dylib`_dispatch_call_block_and_release + 32
    frame #4755: 0x00000001101cb18c libdispatch.dylib`_dispatch_client_callout + 20
    frame #4756: 0x00000001101d2968 libdispatch.dylib`_dispatch_lane_serial_drain + 724
    frame #4757: 0x00000001101d3580 libdispatch.dylib`_dispatch_lane_invoke + 440
    frame #4758: 0x00000001101df0f0 libdispatch.dylib`_dispatch_workloop_worker_thread + 1344
    frame #4759: 0x00000001b96f3714 libsystem_pthread.dylib`_pthread_wqthread + 276
(lldb)

This looks like a crash in the beam search decoder. I'm not actually using the DeepSpeech decoder -- have my own -- so unfortunately I can't help :( it appears to be unrelated to the bug I saw

zaptrem · 2021-01-13T06:48:12Z

@CatalinVoss Thanks for pointing that out! I'm curious; what about your use case required building an alternate decoder? Is it open source?

@reuben What do you make of this error? Or is there a different person who worked on the beam decoder I should ping? Is this something I should move to its own, non-platform-specific bug report?

CatalinVoss · 2021-01-13T06:49:50Z

@CatalinVoss Thanks for pointing that out! I'm curious; what about your use case required building an alternate decoder? Is it open source?

Working on child literacy. Closed source I'm afraid

zaptrem · 2021-01-13T07:13:33Z

@CatalinVoss Cool stuff! I can see why it's necessary in that case, detecting specific mispronunciations is more than hotword-boosting can do. Coincidentally, I stumbled across this Google feature that might interest you about 10 minutes ago in a discussion with someone.

I ran the test again on an iPad Pro late 2018 (before it was an iPhone 11 Pro Max) and got a slightly different result. Does this still indicate Decoder issues?

frame #4739: 0x000000010bb6bcf8 deepspeech_ios`PathTrie::iterate_to_vec(std::__1::vector<PathTrie*, std::__1::allocator<PathTrie*> >&) + 64
    frame #4740: 0x000000010bb6bcf8 deepspeech_ios`PathTrie::iterate_to_vec(std::__1::vector<PathTrie*, std::__1::allocator<PathTrie*> >&) + 64
    frame #4741: 0x000000010ba99114 deepspeech_ios`DecoderState::next(double const*, int, int) + 404
    frame #4742: 0x000000010b903470 deepspeech_ios`StreamingState::processBatch(std::__1::vector<float, std::__1::allocator<float> > const&, unsigned int) + 296
    frame #4743: 0x000000010b903300 deepspeech_ios`StreamingState::pushMfccBuffer(std::__1::vector<float, std::__1::allocator<float> > const&) + 236
    frame #4744: 0x000000010b902e88 deepspeech_ios`StreamingState::feedAudioContent(short const*, unsigned int) + 396
  * frame #4745: 0x0000000101a80808 ReLearn`closure #1 in Transcription.render(samples=Swift.UnsafeRawBufferPointer @ 0x000000017032dc20, stream=(streamCtx = 0x000000010bfd2670)) at Transcription.swift:85:24
    frame #4746: 0x0000000101a80830 ReLearn`thunk for @callee_guaranteed (@unowned UnsafeRawBufferPointer) -> (@error @owned Error) at <compiler-generated>:0
    frame #4747: 0x0000000101a823d0 ReLearn`partial apply for thunk for @callee_guaranteed (@unowned UnsafeRawBufferPointer) -> (@error @owned Error) at <compiler-generated>:0
    frame #4748: 0x00000001942f5174 libswiftFoundation.dylib`Foundation.Data.withUnsafeBytes<A>((Swift.UnsafeRawBufferPointer) throws -> A) throws -> A + 392
    frame #4749: 0x0000000101a7ff68 ReLearn`Transcription.render(audioContext=0x0000000282e94930, stream=(streamCtx = 0x000000010bfd2670), self=0x0000000282f28cc0) at Transcription.swift:83:26
    frame #4750: 0x0000000101a80eb8 ReLearn`closure #1 in Transcription.recognizeFile(audioContext=0x0000000282e94930, self=0x0000000282f28cc0, stream=(streamCtx = 0x000000010bfd2670), audioPath=Swift.String @ 0x000000017032e458, start=2021-01-13 02:00:54 EST) at Transcription.swift:107:18
    frame #4751: 0x0000000101a7e23c ReLearn`closure #1 in static AudioContext.load(asset=0x000000028278c8a0, assetTrack=0x00000002822ea6d0, audioURL=<unavailable; try printing with "vo" or "po">, completionHandler=0x0000000101a8248c ReLearn`partial apply forwarder for closure #1 (Swift.Optional<react_native_transcription.AudioContext>) -> () in react_native_transcription.Transcription.(recognizeFile in _79B1BC2F893AB0135086535C16DBA135)(audioPath: Swift.String) -> () at <compiler-generated>) at AudioContext.swift:58:17
    frame #4752: 0x0000000101a5a490 ReLearn`thunk for @escaping @callee_guaranteed () -> () at <compiler-generated>:0
    frame #4753: 0x000000010bd83bcc libdispatch.dylib`_dispatch_call_block_and_release + 32
    frame #4754: 0x000000010bd856c0 libdispatch.dylib`_dispatch_client_callout + 20
    frame #4755: 0x000000010bd8d354 libdispatch.dylib`_dispatch_lane_serial_drain + 736
    frame #4756: 0x000000010bd8e0c0 libdispatch.dylib`_dispatch_lane_invoke + 448
    frame #4757: 0x000000010bd9a644 libdispatch.dylib`_dispatch_workloop_worker_thread + 1520
    frame #4758: 0x00000001db297804 libsystem_pthread.dylib`_pthread_wqthread + 276
(lldb)

CatalinVoss · 2021-01-13T07:23:37Z

Oh cool I had not seen that! Yes this still looks like decoder. :/

…

On Tue, Jan 12, 2021 at 23:13 zaptrem ***@***.***> wrote: @CatalinVoss <https://github.com/CatalinVoss> Cool stuff! I can see why it's necessary in that case, detecting specific mispronunciations is more than hotword-boosting can do. Coincidentally, I stumbled across this Google feature that might interest you <https://www.theverge.com/2019/11/14/20964401/google-search-pronunciation-guide-feedback-machine-learning-ai> about 10 minutes ago in a discussion with someone. I ran the test again on an iPad Pro late 2018 (before it was an iPhone 11 Pro Max) and got a slightly different result. Does this still indicate Decoder issues? frame #4739: 0x000000010bb6bcf8 deepspeech_ios`PathTrie::iterate_to_vec(std::__1::vector<PathTrie*, std::__1::allocator<PathTrie*> >&) + 64 frame #4740: 0x000000010bb6bcf8 deepspeech_ios`PathTrie::iterate_to_vec(std::__1::vector<PathTrie*, std::__1::allocator<PathTrie*> >&) + 64 frame #4741: 0x000000010ba99114 deepspeech_ios`DecoderState::next(double const*, int, int) + 404 frame #4742: 0x000000010b903470 deepspeech_ios`StreamingState::processBatch(std::__1::vector<float, std::__1::allocator<float> > const&, unsigned int) + 296 frame #4743: 0x000000010b903300 deepspeech_ios`StreamingState::pushMfccBuffer(std::__1::vector<float, std::__1::allocator<float> > const&) + 236 frame #4744: 0x000000010b902e88 deepspeech_ios`StreamingState::feedAudioContent(short const*, unsigned int) + 396 * frame #4745: 0x0000000101a80808 ReLearn`closure #1 in Transcription.render(samples=Swift.UnsafeRawBufferPointer @ 0x000000017032dc20, stream=(streamCtx = 0x000000010bfd2670)) at Transcription.swift:85:24 frame #4746: 0x0000000101a80830 ReLearn`thunk for @callee_guaranteed ***@***.*** UnsafeRawBufferPointer) -> ***@***.*** @owned Error) at <compiler-generated>:0 frame #4747: 0x0000000101a823d0 ReLearn`partial apply for thunk for @callee_guaranteed ***@***.*** UnsafeRawBufferPointer) -> ***@***.*** @owned Error) at <compiler-generated>:0 frame #4748: 0x00000001942f5174 libswiftFoundation.dylib`Foundation.Data.withUnsafeBytes<A>((Swift.UnsafeRawBufferPointer) throws -> A) throws -> A + 392 frame #4749: 0x0000000101a7ff68 ReLearn`Transcription.render(audioContext=0x0000000282e94930, stream=(streamCtx = 0x000000010bfd2670), self=0x0000000282f28cc0) at Transcription.swift:83:26 frame #4750: 0x0000000101a80eb8 ReLearn`closure #1 in Transcription.recognizeFile(audioContext=0x0000000282e94930, self=0x0000000282f28cc0, stream=(streamCtx = 0x000000010bfd2670), audioPath=Swift.String @ 0x000000017032e458, start=2021-01-13 02:00:54 EST) at Transcription.swift:107:18 frame #4751: 0x0000000101a7e23c ReLearn`closure #1 in static AudioContext.load(asset=0x000000028278c8a0, assetTrack=0x00000002822ea6d0, audioURL=<unavailable; try printing with "vo" or "po">, completionHandler=0x0000000101a8248c ReLearn`partial apply forwarder for closure #1 (Swift.Optional<react_native_transcription.AudioContext>) -> () in react_native_transcription.Transcription.(recognizeFile in _79B1BC2F893AB0135086535C16DBA135)(audioPath: Swift.String) -> () at <compiler-generated>) at AudioContext.swift:58:17 frame #4752: 0x0000000101a5a490 ReLearn`thunk for @escaping @callee_guaranteed () -> () at <compiler-generated>:0 frame #4753: 0x000000010bd83bcc libdispatch.dylib`_dispatch_call_block_and_release + 32 frame #4754: 0x000000010bd856c0 libdispatch.dylib`_dispatch_client_callout + 20 frame #4755: 0x000000010bd8d354 libdispatch.dylib`_dispatch_lane_serial_drain + 736 frame #4756: 0x000000010bd8e0c0 libdispatch.dylib`_dispatch_lane_invoke + 448 frame #4757: 0x000000010bd9a644 libdispatch.dylib`_dispatch_workloop_worker_thread + 1520 frame #4758: 0x00000001db297804 libsystem_pthread.dylib`_pthread_wqthread + 276 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#3061 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AACRFK6SXCOM2VR4MLOUKEDSZVB2ZANCNFSM4N5MDJDQ> .

zaptrem · 2021-01-22T08:57:03Z

Yes, I'm already doing that because the Metadata and Token arrays were (erroneously?) marked as private(set) instead of public private(set).

PRs are welcome.

@reuben It's a bit late, but I just proposed those quick changes here: #3510

I also think there's some more debug info I can provide about the decoder issue, I'll test it tomorrow.

zaptrem · 2021-01-23T08:57:57Z

@reuben Okay, poking at Xcode's debugging options/info is like trying to reverse engineer an alien spaceship at my current knowledge level, so I haven't learned much from a few hours trying that.

However, I did get some interesting results from messing around with the scorer/models. Disabling the external scorer allowed the transcription process to run for much longer before eventually crashing (similar to the length it took my Android build to complete the same transcription task) with the same error. Additionally, using the Chinese model and scorer lasted the same extended amount of time as disabling the external scorer. Also, sometimes across all tests the "reading samples" message varied exactly once near the beginning of the job like so:

...
reading 8192 samples
reading 7425 (or some other nearby number) samples
reading 8192 samples
...

With No/Chinese scorer there is a drop/spike in thread activity right before the crash.
This isn't present on the English model
).

Is the transcription job being run on the AVAudioSession Notify Thread instead of another thread when using the English model? Alternatives have constant utilization on Thread 4 with a spike on the AV Notify Thread before the crash, whereas English has constant utilization on the AV Notify Thread.

Also, the English model resulted in much higher CPU usage during transcription than Chinese/no scorer.

English runtime: 1 minute
None/Chinese runtime: 5 minutes

Unlikely theory: Maybe it's an ARM-specific issue? I've been testing Android with an x86 emulator and have no way to verify this one way or another (my ARM emulator isn't working).

zaptrem · 2021-01-25T18:44:08Z

@CatalinVoss Any idea why it would still be failing when the external scorer is disabled?

zaptrem · 2021-02-01T09:03:36Z

@reuben Does any of this help? Is there potentially an FFMPEG setting that is causing this, since it doesn't happen with non-converted WAV files? It's been over a month since your last response, so is there any more experimentation/testing I can do to help this along?

zaptrem · 2021-02-06T21:22:15Z

@reuben I tried converting from MKV files encoded with vorbis instead of the .ogg I was using before. Didn't work. I tried converting with Audacity manually. Didn't work.

I wonder if it has nothing to do with the conversion and it's actually just the length of the recording. Do you know of any long conversations recorded in PCM 16 natively we can test with?

I'm lead to this conclusion not just from my testing, but because @CatalinVoss suggested it's a decoder issue (implying to me the audio file had already been successfully read/inference had completed).

EDIT: Recording the WAV natively with Audacity results in no crash. Exporting the exact same recording as OGG and converting with FFMPEG crashes.

zaptrem · 2021-02-09T05:25:53Z

@CatalinVoss (and @reuben if you're still here) I continued testing and I think the issue might(????) have something to do a bug in render() caused by however FFMPEG/SoX converts 48000mhz to 16000mhz.

I rewrote the render function in iOS to be much simpler and (I would think) more robust due to the utilization of higher-level Apple APIs that are file-format independent. The crash disappeared (hooray!) but I'm now getting jibberish short transcriptions (whereas the same files on Android give me acceptable results). Any idea where I went wrong here?:

private func newRender(url: URL, stream: DeepSpeechStream) {
        let file = try! AVAudioFile(forReading: url)
        guard let format = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: file.fileFormat.sampleRate, channels: 1, interleaved: false) else { fatalError("Couldn't read the audio file") }
        print("reading file")
        var done = false;
        while(!done){
            let buf = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: 8096)!
            do {
                try file.read(into: buf)
            } catch {
                print("end of file");
                done = true
                return;
            }
            print("read 8096 frames")
            let floatBufPointer = UnsafeBufferPointer(start: buf.int16ChannelData?[0], count: Int(buf.frameLength))
            stream.feedAudioContent(buffer: floatBufPointer);
        }
    }

CatalinVoss · 2021-02-09T19:45:26Z

Sorry for the delay.

I never used the audio file recognition path that goes through render() so an error there could explain why you're facing issues and I'm not. I'm just capturing mic output.

As a hackky workaround, are you just calling finishStream() on the stream once at the very end? Is there any way you can split your monsterous audio file into chunks, passing a few chunks at at a time and "finalize" the stream a few times in between?

Another way to isolate the issue may be to see if you encounter the issue with long-running mic detection (which, again, I don't see), but I'm not doing hours and hours of it either.

As for your newRender(), hard to debug without playing with it. This kinda stuff is tricky to get right, but you only have to do it once. I am not sure why you're calling this guy a floatBufPointer, since the stream takes 16-bit int samples. With what you'e doing now, it certainly looks like you need to be 100% sure that your input is PCM single channel at 16 kHz.

zaptrem · 2021-02-09T19:52:06Z

@CatalinVoss Thanks for getting back! I could split it into chunks, but it would likely mess with transcriptions mid-sentence and paragraph, as well as reset the time part of tokens (I'm using intermediateDecodeWithMetadata as finishStreamWithMetadata also has an unknown crash... one problem at a time).

It's called floatBufPointer because originally I was passing floats instead of 16bit ints and I forgot to change it. As you can see here:

 guard let format = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: file.fileFormat.sampleRate, channels: 1, interleaved: false) else { fatalError("Couldn't read the audio file") }

...

let floatBufPointer = UnsafeBufferPointer(start: buf.int16ChannelData?[0], count: Int(buf.frameLength))

I am passing the int16ChannelData. TypeOf tells me this is a pointer to an int16 array. I can also wrap it in a swift Array() and typeOf is int16[]. Weirdly enough, when I pass the Swift Array() I get a different (but still jibberish) result.

Would it be helpful if I opened a pull request with these changes so they're easy for others to look at?

zaptrem · 2021-02-11T09:30:28Z

@reuben Any idea what I’m doing wrong in the proposed Swift code above?

zaptrem · 2021-02-13T10:36:15Z

@reuben Can you respond with an estimate on when you can look at this? It's been over a month and a half. I've given it my best shot but have run out of ideas for now.

zaptrem · 2021-02-26T02:18:09Z

@reuben @lissyx I gave up on fixing this and just took the accuracy hit from splitting the audio file into 5-minute segments. I've published the app using this, ReLearn, to the Android/iOS App Stores. It uses DeepSpeech to transcribe long video recordings of lectures for free on-device in the background (it also transcribes audio in-person recordings on Android, but uses Apple's solution for that on iOS for now).

martin642 · 2021-02-26T16:47:41Z

Fantastic job

lustig-bakkt · 2021-09-05T18:41:29Z

@reuben @lissyx I gave up on fixing this and just took the accuracy hit from splitting the audio file into 5-minute segments. I've published the app using this, ReLearn, to the Android/iOS App Stores. It uses DeepSpeech to transcribe long video recordings of lectures for free on-device in the background (it also transcribes audio in-person recordings on Android, but uses Apple's solution for that on iOS for now).

@zaptrem I've been wanting to build something very similar to this but just ran up against Apple's one minute live transcription limit with SFSpeechRecognizer. How are you getting around this in ReLearn? Any chance you'd like to collaborate on some live speech / mind mapping type applications?

reuben self-assigned this Jun 14, 2020

kdavis-mozilla added this to To do in Deep Speech 0.8.0 via automation Jun 15, 2020

kdavis-mozilla moved this from To do to In progress in Deep Speech 0.8.0 Jun 18, 2020

reuben moved this from In progress to Review in progress in Deep Speech 0.8.0 Jun 30, 2020

reuben moved this from Review in progress to In progress in Deep Speech 0.8.0 Jun 30, 2020

mozilla deleted a comment from zaptrem Jan 12, 2021

zaptrem mentioned this issue Feb 11, 2021

DeepSpeech iOS Compile Error in XCode 12.4 #3525

Closed

weipin mentioned this issue Jun 27, 2021

Swift Client: Fix double deallocation #3668

Open

Swift/iOS wrapper for TFLite libdeepspeech #3061

Swift/iOS wrapper for TFLite libdeepspeech #3061

Comments

reuben commented Jun 14, 2020 • edited by stepkillah Loading

reuben commented Jun 15, 2020 • edited Loading

reuben commented Jun 30, 2020 • edited Loading

reuben commented Jun 30, 2020

reuben commented Jun 30, 2020 • edited Loading

lissyx commented Jun 30, 2020

reuben commented Jun 30, 2020

lissyx commented Jun 30, 2020

reuben commented Jun 30, 2020

reuben commented Jul 8, 2020

reuben commented Jul 8, 2020

reuben commented Jul 8, 2020

reuben commented Jul 11, 2020

reuben commented Jul 13, 2020

erksch commented Jul 17, 2020 • edited Loading

lissyx commented Jul 17, 2020

lissyx commented Jul 17, 2020

erksch commented Jul 17, 2020

erksch commented Jul 17, 2020 • edited Loading

reuben commented Jul 19, 2020

reuben commented Jul 19, 2020

reuben commented Jul 19, 2020

erksch commented Jul 19, 2020 • edited Loading

erksch commented Jul 19, 2020

dabinat commented Jul 20, 2020

reuben commented Jul 20, 2020

zaptrem commented Dec 26, 2020 • edited Loading

zaptrem commented Dec 27, 2020

fender commented Dec 27, 2020

zaptrem commented Dec 27, 2020

lissyx commented Dec 27, 2020

zaptrem commented Jan 12, 2021

CatalinVoss commented Jan 13, 2021

zaptrem commented Jan 13, 2021

CatalinVoss commented Jan 13, 2021

zaptrem commented Jan 13, 2021

CatalinVoss commented Jan 13, 2021

zaptrem commented Jan 13, 2021 • edited Loading

CatalinVoss commented Jan 13, 2021 via email

zaptrem commented Jan 22, 2021

zaptrem commented Jan 23, 2021 • edited Loading

zaptrem commented Jan 25, 2021

zaptrem commented Feb 1, 2021 • edited Loading

zaptrem commented Feb 6, 2021 • edited Loading

zaptrem commented Feb 9, 2021 • edited Loading

CatalinVoss commented Feb 9, 2021

zaptrem commented Feb 9, 2021 • edited Loading

zaptrem commented Feb 11, 2021

zaptrem commented Feb 13, 2021

zaptrem commented Feb 26, 2021

martin642 commented Feb 26, 2021

lustig-bakkt commented Sep 5, 2021

reuben commented Jun 14, 2020 •

edited by stepkillah

Loading

reuben commented Jun 15, 2020 •

edited

Loading

reuben commented Jun 30, 2020 •

edited

Loading

reuben commented Jun 30, 2020 •

edited

Loading

erksch commented Jul 17, 2020 •

edited

Loading

erksch commented Jul 17, 2020 •

edited

Loading

erksch commented Jul 19, 2020 •

edited

Loading

zaptrem commented Dec 26, 2020 •

edited

Loading

zaptrem commented Jan 13, 2021 •

edited

Loading

zaptrem commented Jan 23, 2021 •

edited

Loading

zaptrem commented Feb 1, 2021 •

edited

Loading

zaptrem commented Feb 6, 2021 •

edited

Loading

zaptrem commented Feb 9, 2021 •

edited

Loading

zaptrem commented Feb 9, 2021 •

edited

Loading