Add basic speech-to-text functionality #33

hauptdigital · 2020-04-21T07:21:45Z

This pull request adds basic speech-to-text functionality to the website. Users can now start and stop recording notes and speak English text, and the app outputs the transcribed text.

Test it here:
https://dev.deepspeech-notes.haupt.digital/

You have to speak very clearly, and a good microphone also improves the accuracy of the speech recognition.

To build this feature, I combined the following technologies:

Web Audio API (https://developer.mozilla.org/de/docs/Web/API/Web_Audio_API)
Voice activity detection (node-vad npm module)
deepspeech (deepspeech npm module)
socket.io (socket.io and socket.io-client npm modules)
downsampler code from web-voice-processor npm module

To document the functionality, I added this table:

Side	Files	Order	Task
Back end	server.js, model.js	1	Create deepspeech language model on Express server start
Back end	server.js, socket.js	2	Start socket on Express server start
Front end	Notes.js, audio.js	3	Create media stream source from browser microphone and start socket when the user activates the microphone
Front end	audio.js, voice-processor.js	4	Create mono audio buffer from microphone media stream
Front end	audio.js, downsampler.js	5	Resample audio buffer to a 16,000 Hz sample rate
Front end	audio.js	6	Send buffer to back end via socket
Back end	audio.js	7	Identify voice segments in received audio with voice activity detection module
Back end	audio.js, model.js	8	Transcribe identified voice segments into text
Back end	audio.js, socket.js	9	Emit transcribed text to front end via socket
Front end	Notes.js, audio.js	10	Display transcribed text on page
Front end	Notes.js, audio.js	11	Close media stream, audio processing and socket when the user deactivates the microphone
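Step 5 in the table can be illustrated with a minimal sketch. This is not the downsampler taken from web-voice-processor; `downsample` is a hypothetical helper that assumes the input is a mono Float32Array with samples in [-1, 1] and that the model expects 16-bit PCM at 16,000 Hz. It averages each group of input samples that falls into one output sample.

```javascript
// Hypothetical helper for illustration only, not the web-voice-processor code.
// Averages groups of input samples and converts the result to 16-bit PCM.
function downsample(input, inputRate, outputRate) {
  if (outputRate > inputRate) {
    throw new RangeError('outputRate must not exceed inputRate');
  }
  // Multiply before dividing so that integer rates give exact lengths.
  const outputLength = Math.floor((input.length * outputRate) / inputRate);
  const output = new Int16Array(outputLength);

  for (let i = 0; i < outputLength; i++) {
    const start = Math.floor((i * inputRate) / outputRate);
    const end = Math.min(
      Math.floor(((i + 1) * inputRate) / outputRate),
      input.length
    );

    // Mean of all input samples that map to this output sample.
    let sum = 0;
    for (let j = start; j < end; j++) {
      sum += input[j];
    }
    const mean = end > start ? sum / (end - start) : 0;

    // Clamp to [-1, 1] and scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, mean));
    output[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  }
  return output;
}

// One second of audio at 48 kHz becomes 16,000 samples.
const oneSecond = new Float32Array(48000).fill(0.5);
const resampled = downsample(oneSecond, 48000, 16000);
```

Plain averaging is a crude low-pass filter. A production resampler would normally use a proper filter, but the sketch shows the buffer shapes that travel over the socket.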


lmachens

Awesome 🎉
Recording works for me too.

I reviewed most of the files and have to say that I am very impressed 👍. Keep up the good work.
Some of my comments are about code style. Whether you apply them is up to you :).

client/src/components/RecordButton.js

client/src/pages/Notes.js

lmachens · 2020-04-22T17:36:09Z

client/src/pages/Notes.js

+    if (!isRecording) {
+      setIsRecording(startRecording());
+      const socket = getSocket();
+      socket.on('recognize', (results) => {
+        updateNoteContent(results.text);
+      });
+    } else {
+      setIsRecording(stopRecording().isRecording);
+    }


Looks like you have a memory leak here.
This event listener is never destroyed.
Every time you start a new recording, it creates a new listener on the recognize event. Each listener (or callback) stays in memory and is still called whenever a recognize event arrives.

It's important to remove that listener when you stop recording:
https://socket.io/docs/server-api/#socket-removeListener-eventName-listener
This issue happens in multiple files.

    function handleRecognize(recognized) {
      updateNoteContent(recognized.text);
    }

    function handleRecordButtonClick() {
      const socket = getSocket();
      if (!isRecording) {
        setIsRecording(startRecording());
        socket.on('recognize', handleRecognize);
      } else {
        socket.removeListener('recognize', handleRecognize);
        setIsRecording(stopRecording().isRecording);
      }
    }
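The leak and the fix above can be reproduced outside the app with a small sketch. It uses Node's built-in EventEmitter as a stand-in for the socket.io client socket (an assumption: only `on`, `removeListener` and `emit` matter here). The names `startLeaky`, `start` and `stop` are hypothetical.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the socket.io client socket (assumption for this sketch).
const socket = new EventEmitter();
let updates = 0;

// Leaky variant: every start registers a new anonymous callback
// that can never be removed because no reference to it is kept.
function startLeaky() {
  socket.on('recognize', () => {
    updates += 1;
  });
}

// Fixed variant: a named handler that stop() can remove again.
function handleRecognize() {
  updates += 1;
}
function start() {
  socket.on('recognize', handleRecognize);
}
function stop() {
  socket.removeListener('recognize', handleRecognize);
}

// Three recordings with the leaky variant leave three listeners behind.
startLeaky();
startLeaky();
startLeaky();
const leakedListeners = socket.listenerCount('recognize'); // 3
socket.emit('recognize');
const leakedUpdates = updates; // 3: one event updates the note three times

// Reset, then run three recordings with the fixed variant.
socket.removeAllListeners('recognize');
updates = 0;
start();
stop();
start();
stop();
start();
const fixedListeners = socket.listenerCount('recognize'); // 1
socket.emit('recognize');
const fixedUpdates = updates; // 1
```

The fix only works when `removeListener` receives the same function reference that was passed to `on`.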

This would be a good case for a useEffect if you want to make it more React-like (I recommend this solution):

    React.useEffect(() => {
      if (!isRecording) {
        return;
      }
      function handleRecognize(recognized) {
        updateNoteContent(recognized.text);
      }
      const socket = getSocket();
      socket.on('recognize', handleRecognize);
      return () => {
        socket.removeListener('recognize', handleRecognize);
      };
    }, [isRecording]);

    function handleRecordButtonClick() {
      if (!isRecording) {
        setIsRecording(startRecording());
      } else {
        setIsRecording(stopRecording().isRecording);
      }
    }

hauptdigital · 2020-04-22

I applied your solution with useEffect and it works. Thanks! But I don't understand why it works 😅

What is the meaning of this:

return () => { socket.removeListener('recognize', handleRecognize); }

To me it looks like this would remove the socket.on event listener right away. But when I run my application, everything works as intended.

lmachens · 2020-04-23

Take a look at the useEffect documentation to understand what the returned function is used for :)
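As a rough illustration of the question above: the function returned from an effect is not called immediately. React stores it and calls it before the effect runs again after a dependency change, and when the component unmounts. The sketch below imitates that ordering without React. `createEffectRunner` is a hypothetical mini-runner written for this example, and the socket is again Node's EventEmitter standing in for the socket.io client.

```javascript
const { EventEmitter } = require('events');

// Hypothetical mini-runner that imitates the ordering of effect and cleanup
// calls for a single dependency. It is not React's implementation.
function createEffectRunner(effect) {
  let cleanup;
  let lastDep;
  let hasRun = false;
  return {
    render(dep) {
      // Dependency unchanged: the effect is skipped, as with [isRecording].
      if (hasRun && Object.is(dep, lastDep)) return;
      // The cleanup from the previous run is called first ...
      if (typeof cleanup === 'function') cleanup();
      // ... then the effect runs and its return value is stored for later.
      cleanup = effect(dep);
      lastDep = dep;
      hasRun = true;
    },
    unmount() {
      if (typeof cleanup === 'function') cleanup();
      cleanup = undefined;
    },
  };
}

const socket = new EventEmitter(); // stand-in for getSocket()
const received = [];

const runner = createEffectRunner((isRecording) => {
  if (!isRecording) return undefined;
  const handleRecognize = (recognized) => received.push(recognized.text);
  socket.on('recognize', handleRecognize);
  // Returned, not called: the runner decides when this executes.
  return () => socket.removeListener('recognize', handleRecognize);
});

runner.render(false); // not recording: nothing is subscribed
runner.render(true); // recording starts: listener is added
socket.emit('recognize', { text: 'hello' });
const listenersWhileRecording = socket.listenerCount('recognize');

runner.render(false); // recording stops: stored cleanup removes the listener
socket.emit('recognize', { text: 'ignored' });
const listenersAfterStop = socket.listenerCount('recognize');
```

The listener stays registered for as long as isRecording is true and is removed only when the value changes, which is why the app keeps working with the cleanup in place.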

client/src/utils/audio.js

src/audio.js


hauptdigital added 2 commits April 19, 2020 19:34

Install web-voice-processor

c9b565d

Add preliminary speech-to-text functionality

3006249

hauptdigital self-assigned this Apr 21, 2020

hauptdigital added the backend, frontend and react labels Apr 21, 2020

hauptdigital added this to In progress in Release 1 (MVP) via automation Apr 21, 2020

hauptdigital added this to the Sprint 3 milestone Apr 21, 2020

hauptdigital changed the title from "Recordaudio" to "Add basic speech-to-text functionality" Apr 21, 2020

hauptdigital added 6 commits April 21, 2020 09:25

Merge with master

a2f4fa3

Refactor frontend audio processing functionality

532f369

Add backend audio processing, socket interaction and text-to-speech language model processing

4cd4846

Add stop recording functionality and adjust components to work with it

08aa8c7

Complete stream reset functionality

2eecc00

Output transcribed text on notes site

9795f06

hauptdigital marked this pull request as ready for review April 22, 2020 16:28

lmachens reviewed Apr 22, 2020


hauptdigital and others added 13 commits April 22, 2020 21:13

Remove console.log from downsampling worker

88b5f6e

Rename property

0f2a30b

Fix memory leaks

5bed21d

Fix socket port access in react app

bf312b6

Rename create model function

fa6f550

Update src/audio.js

fa0b6d8

Co-Authored-By: Leon Machens <leon.machens@googlemail.com>

Merge branch 'recordaudio' of github.com:hauptdigital/deepspeech-notes into recordaudio

a3f9066

Update get socket function to use terniary operator

6cc902e

Update src/audio.js

3a22db0

Co-Authored-By: Leon Machens <leon.machens@googlemail.com>

Update src/audio.js

4db7d73

Co-Authored-By: Leon Machens <leon.machens@googlemail.com>

Update src/audio.js

b9b6218

Co-Authored-By: Leon Machens <leon.machens@googlemail.com>

Merge branch 'recordaudio' of github.com:hauptdigital/deepspeech-notes into recordaudio

3977491

Fix deepscan error

8d00d03

hauptdigital merged commit 44503f9 into master Apr 22, 2020

Release 1 (MVP) automation moved this from In progress to Done Apr 22, 2020

hauptdigital deleted the recordaudio branch April 22, 2020 20:27

hauptdigital mentioned this pull request Apr 23, 2020

Add microphone component with active / inactive state #23

Closed
