👂 RxJS operator for speech-to-text using DeepSpeech

rxtoolkit/stt-deepspeech

@rxtk/toDeepSpeech

👂 An RxJS operator for real-time speech-to-text (STT/S2T) streaming using the open-source DeepSpeech library.

npm i @rxtk/stt-deepspeech
yarn add @rxtk/stt-deepspeech

⚠️ To run the DeepSpeech pipeline, you must download the corresponding DeepSpeech model, unzip it, and pass the model directory to the toDeepSpeech operator, like this: toDeepSpeech({modelDir: 'path/to/deepspeech-models-0.7.0'}).

⚠️ Node.js only. This has not been tested in browsers, but it might be possible to make it work. If you get it working, please open a PR!

API

toDeepSpeech

Stream audio speech data to DeepSpeech and get transcripts back:

import {map} from 'rxjs/operators';
import {toDeepSpeech} from '@rxtk/stt-deepspeech';

// The pipeline takes a stream of audio chunks encoded as LINEAR16
// (PCM encoded as 16-bit integers). Chunks may be a Buffer, String, Blob or Typed Array.
// Here each chunk is assumed to arrive as a base64-encoded string, so it is decoded first.
const transcript$ = pcmChunkEncodedAs16BitIntegers$.pipe(
  map(chunk => Buffer.from(chunk, 'base64')),
  toDeepSpeech({modelDir: '/path/to/deepspeech-models-0.7.0'})
);
transcript$.subscribe(console.log); // log transcript output
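
The map step in the example decodes each chunk from base64 before it reaches the operator. The sketch below shows what that decoding does to a single chunk, using only Node.js built-ins. The sample values and the variable names (raw, encoded, decoded, decodedSamples) are made up for illustration, and little-endian byte order is an assumption rather than something this README states.

```javascript
// Hypothetical input: four 16-bit samples written as little-endian PCM.
const raw = Buffer.alloc(8);
[0, 1000, -1000, 32767].forEach((sample, i) => raw.writeInt16LE(sample, i * 2));

// Encode to base64, as a chunk might look when it arrives over the wire.
const encoded = raw.toString('base64');

// The same decoding call used in the map() step of the example.
const decoded = Buffer.from(encoded, 'base64');

// Two bytes per sample, so a valid chunk has an even byte length.
const decodedSamples = [];
for (let i = 0; i < decoded.length / 2; i++) {
  decodedSamples.push(decoded.readInt16LE(i * 2));
}
console.log(decoded.length, decodedSamples);
```

If your chunks are already Buffers, the base64 decoding step is presumably unnecessary and can be dropped from the pipe.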

⚠️ Pay attention to the encoding of the audio data. The operator only accepts PCM data encoded as 16-bit integers. For example, LINEAR16 encoding usually works.
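
Some audio sources produce floating-point samples in the range [-1, 1] rather than 16-bit integers. The following is a minimal sketch of one common way to convert such samples to 16-bit PCM before streaming them. The helper floatToLinear16 is hypothetical and not part of @rxtk/stt-deepspeech, and the little-endian byte order is an assumption.

```javascript
// Hypothetical helper (not part of @rxtk/stt-deepspeech): converts float
// samples in [-1, 1] to signed 16-bit little-endian PCM.
function floatToLinear16(samples) {
  const out = Buffer.alloc(samples.length * 2); // two bytes per sample
  for (let i = 0; i < samples.length; i++) {
    // Clamp so out-of-range input cannot overflow the 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale by 32768, positive values by 32767.
    const v = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    out.writeInt16LE(v, i * 2);
  }
  return out;
}

// Example input, including one out-of-range value (2) that gets clamped.
const pcm = floatToLinear16(Float32Array.from([0, 0.5, -0.5, 1, -1, 2]));
console.log(pcm.length);
```

The resulting Buffer can be emitted as a chunk in the input stream. Whether the sample rate and channel count also need to match the model is not covered here, so check the DeepSpeech model documentation.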

Guides