Merge branch 'vosk'
PeterBowman committed Jul 2, 2023
2 parents 1a3fd37 + c89fbc4 commit 4114c56
Showing 3 changed files with 261 additions and 203 deletions.
11 changes: 0 additions & 11 deletions doc/speech-install.md
@@ -7,14 +7,12 @@
- [Install YARP 3.7+](https://github.com/roboticslab-uc3m/installation-guides/blob/master/install-yarp.md)
with [Python bindings](https://github.com/roboticslab-uc3m/installation-guides/blob/master/install-yarp.md#install-python-bindings), the latter for `speechRecognition.py` (ASR)
- [Install eSpeak with MBROLA Voices](https://github.com/roboticslab-uc3m/installation-guides/blob/master/install-espeak-mbrola.md) for `Espeak` (TTS)
- [Install gstreamer with pocketsphinx](https://github.com/roboticslab-uc3m/installation-guides/blob/master/install-gstreamer-pocketsphinx.md) for `speechRecognition.py` (ASR)

## Install the Software on Ubuntu (working on all tested versions)

Our software builds on the dependencies above. Note that you will be prompted for your password when using `sudo` a couple of times:

```bash
pip install --user pyalsaaudio # For `speechRecognition.py` (ASR)
cd # go home
mkdir -p repos; cd repos # create $HOME/repos if it doesn't exist; then, enter it
git clone https://github.com/roboticslab-uc3m/speech.git # Download speech software from the repository
```
@@ -31,15 +29,6 @@ echo "export ROBOTICSLAB_SPEECH_DIR=`pwd`" >> ~/.bashrc

For additional SPEECH options use `ccmake` instead of `cmake`.
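
For instance, assuming an out-of-source `build` directory was created during the steps above (the path is illustrative):

```bash
cd ~/repos/speech/build   # hypothetical build directory
ccmake ..                 # press [c] to configure, toggle SPEECH options, then [g] to generate
```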

## Troubleshooting installation

For `pip install --user pyalsaaudio`, some users have had to:

```bash
sudo apt install python-gi # requirement on some systems for pyalsaaudio
sudo apt install libasound2-dev # requirement on some systems for pyalsaaudio
```

## Troubleshooting selecting default soundcard

This is a way to set the default sound output card using PulseAudio (not ALSA).
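
For instance, with PulseAudio's `pactl` tool (the sink name below is only an example; list yours first):

```bash
pactl list short sinks   # show available output devices (sinks) and their names
pactl set-default-sink alsa_output.pci-0000_00_1f.3.analog-stereo   # example sink name
```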
50 changes: 35 additions & 15 deletions programs/speechRecognition/README.md
@@ -1,25 +1,45 @@
# Speech recognition

## Installation and usage

1. First, follow the steps described in the [installation instructions](/doc/speech-install.md).
2. Make sure you have a microphone connected to your computer.
3. Configure the input device by selecting the microphone. You can use the default `sound settings` of Ubuntu and pick the input device there; this requires the Ubuntu desktop interface. To configure it remotely (over ssh), use `alsamixer` instead: run it in a terminal, press `F6`, select your sound card (e.g. HDA Intel PCH), press `F4` and select your `Input Source (Front Mic)`. You can also adjust the input level there.
4. Run `speechRecognition.py`. By default it uses the `follow-me-en-us.dic` dictionary. To view, change or add dictionary words, see the `speech/share/speechRecognition/dictionary/` directory.
5. Try saying some commands of the `follow-me` demo into the microphone and check whether `speechRecognition` detects the words.
6. The final result, in lower case, comes out through a YARP port. You can read it with `yarp read ... /speechRecognition:o`.
This is a Python 3 application that requires the `sounddevice` package to grab live frames from a mic. Install it with:

```bash
pip install sounddevice
```

Depending on the selected backend, additional dependencies might be required (see below).

Launch the program with `--help` to see available options. You can display and select the preferred input device with `--list-devices` and `--device`, respectively (otherwise the system default will be chosen).

This application opens two YARP ports: a `<prefix>/rpc:s` port that accepts requests to change the dictionary/model and to mute/unmute the microphone, and a `<prefix>/result:o` port that broadcasts the transcribed text. The default prefix is `/speechRecognition`, but it can be changed with the `--prefix` option.
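
The capture pipeline described above (a microphone callback feeding frames to a recognition backend) is commonly structured as a producer/consumer queue. Below is a minimal, self-contained sketch of that pattern with a stand-in recognizer — the class and method names are illustrative, not this application's actual API:

```python
import queue

class FakeRecognizer:
    """Stand-in for a real backend (PocketSphinx/Vosk expose a similar
    feed-audio-get-text interface); it just reports how much it received."""
    def accept_waveform(self, frames):
        return f"got {len(frames)} samples"

frame_queue = queue.Queue()

def audio_callback(indata):
    # In the real application, the sounddevice input stream invokes a
    # callback like this for every captured block of microphone frames.
    frame_queue.put(bytes(indata))

def transcribe(recognizer, n_blocks):
    # Consumer loop: pull captured frames, hand them to the backend and
    # collect the transcribed text (which a YARP port would broadcast).
    results = []
    for _ in range(n_blocks):
        results.append(recognizer.accept_waveform(frame_queue.get()))
    return results

# Simulate two captured blocks instead of opening a real input stream.
audio_callback(b"\x00\x01" * 160)
audio_callback(b"\x00\x01" * 160)
print(transcribe(FakeRecognizer(), 2))  # → ['got 320 samples', 'got 320 samples']
```

In the real program, a `sounddevice` input stream would typically own the callback and the recognizer would be a PocketSphinx or Vosk decoder.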

## PocketSphinx backend

Install the `pocketsphinx` package with:

```bash
pip install pocketsphinx
```

Then, launch the program with the `--backend pocketsphinx --dictionary xxx --language xxx` options. The dictionary and language combination relies on the adequate dictionary and model files being installed (check [share/speechRecognition](/share/speechRecognition/)). For example, to use the waiter Spanish orders dictionary, run:

```bash
speechRecognition --backend pocketsphinx --dictionary waiter --language es
```

Once `speechRecognition.py` has started, you can also change the dictionary and language at runtime through the RPC port. For example, to switch to the waiter Spanish orders:

```bash
yarp rpc /speechRecognition/rpc:s
setDictionary waiter es
```

## Vosk (Kaldi) backend

Install the `vosk` package with:

```bash
pip install vosk
```

Then, launch the program with the `--backend vosk --model xxx` options. Model files are downloaded on demand from the [Vosk website](https://alphacephei.com/vosk/models). For example, to use the ~50 MB Spanish model, run:

```bash
speechRecognition --backend vosk --model small-es-0.42
```

Some pointers on muting/unmuting the microphone have been collected in [#13](https://github.com/roboticslab-uc3m/speech/issues/13).
To list and download the desired models offline and test the Vosk engine, you can use the `vosk-transcriber` application.
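
As a sketch (the flag names are assumed from the `vosk-api` package's CLI; verify them with `vosk-transcriber --help` on your system):

```bash
vosk-transcriber --list-models                    # show models available for download
vosk-transcriber -l es -i input.wav -o out.txt    # transcribe a WAV file with a Spanish model
```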
