
# Speech Recognition in Python

There are several speech recognition systems with Python bindings or libraries. The [SpeechRecognition](https://pypi.org/project/SpeechRecognition/) package provides a nice abstraction over them. In this notebook we explore CMU Sphinx (an offline model, i.e. running locally) and Google (an online model, i.e. over the network/cloud), both through the SpeechRecognition package APIs.

## Setup

We need to install the SpeechRecognition and pocketsphinx Python packages, and download some audio files to test these APIs.

1. **Install the SpeechRecognition Python package**

In [0]:
!pip3 install SpeechRecognition

2. **Install Pocketsphinx**
[Pocketsphinx](https://pypi.org/project/pocketsphinx/) provides Python bindings for [CMU Sphinx](https://cmusphinx.github.io/) and is one of the recognizers supported by SpeechRecognition. First, install its build dependencies.

On macOS (using Homebrew):

In [0]:
!brew install swig
!swig -version

On Linux (Debian/Ubuntu, using apt):

In [0]:
!apt-get install -y swig libpulse-dev
!swig -version

Now pip install pocketsphinx:

In [0]:
!pip3 install pocketsphinx
!pip3 list | grep pocketsphinx
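To confirm from Python (rather than the shell) that both packages are visible to the running kernel, here is a minimal sketch using the standard-library `importlib.metadata` module (Python 3.8+). `installed_version` is a hypothetical helper name, not part of either package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages this notebook depends on.
for pkg in ("SpeechRecognition", "pocketsphinx"):
    print(pkg, "->", installed_version(pkg) or "not installed")
```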

3. **Download audio samples**

In [0]:
!curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/audio-0.6.0.tar.gz
!tar -xvzf audio-0.6.0.tar.gz

In [0]:
!ls -l ./audio/
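`recognize_sphinx()` converts audio to 16 kHz, 16-bit samples before decoding, so it is worth knowing what the downloaded files actually contain. A minimal sketch using only the standard-library `wave` module; `describe_wav` is a hypothetical helper, and it assumes uncompressed PCM WAV files such as those in `./audio/`.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic header facts for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


# Inspect every WAV file extracted from the archive (skipped if ./audio is absent).
audio_dir = Path("audio")
if audio_dir.is_dir():
    for wav_path in sorted(audio_dir.glob("*.wav")):
        print(wav_path.name, describe_wav(wav_path))
```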

# SpeechRecognition API

SpeechRecognition offers only a batch API. The first step is to create an audio recording, either from a file or from a microphone; the second step is to call a `recognize_<speech engine name>()` function on it. It currently has APIs for [CMU Sphinx, Google, Microsoft, IBM, Houndify, and Wit](https://github.com/Uberi/speech_recognition).

1. **Import the `speech_recognition` package and check its version**

In [0]:
import speech_recognition as sr
print(sr.__version__)

2. **Speech-To-Text steps**

In [0]:
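from enum import Enum, unique
from typing import Any

import speech_recognition as sr


@unique
class ASREngine(Enum):
    sphinx = 0  # CMU Sphinx via pocketsphinx: runs locally
    google = 1  # Google Speech Recognition: needs network access


def speech_to_text(filename: str, engine: ASREngine, language: str, show_all: bool = False) -> Any:
    """Transcribe an audio file (WAV, AIFF or FLAC) with the chosen engine.

    Returns the best transcript as a str, or the engine's raw result when show_all=True.
    """
    r = sr.Recognizer()

    # Step 1: read the whole file into an AudioData object.
    with sr.AudioFile(filename) as source:
        audio = r.record(source)

    # Step 2: dispatch to the matching recognize_<engine>() method.
    asr_functions = {
        ASREngine.sphinx: r.recognize_sphinx,
        ASREngine.google: r.recognize_google,
    }
    return asr_functions[engine](audio, language=language, show_all=show_all)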
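With `show_all=True` the return value is no longer a plain string. As far as we know, for Google SpeechRecognition returns the raw response as a dict with an `"alternative"` list of hypotheses (each with a `"transcript"` and sometimes a `"confidence"`), or an empty list when nothing was recognized; Sphinx instead returns a pocketsphinx decoder object, which this sketch does not handle. The hypothetical helper below picks the most confident transcript from that assumed shape; the `sample` dict is hand-written for illustration, not captured engine output.

```python
from typing import Any, Optional


def best_google_transcript(result: Any) -> Optional[str]:
    """Pick the most confident transcript from a Google show_all=True result.

    Assumes the shape {'alternative': [{'transcript': str, 'confidence': float}, ...]}
    and returns None for an empty result such as [].
    """
    if not isinstance(result, dict):
        return None
    alternatives = result.get("alternative", [])
    if not alternatives:
        return None
    # Only some alternatives carry a confidence score; treat a missing one as 0.0.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best["transcript"]


# Hand-written example in the assumed shape (not real engine output).
sample = {
    "alternative": [
        {"transcript": "experience proves this", "confidence": 0.93},
        {"transcript": "experience proved this"},
    ],
    "final": True,
}
print(best_google_transcript(sample))  # experience proves this
print(best_google_transcript([]))      # None
```

Choosing by highest confidence is our own choice here; falling back to the first alternative would be an equally reasonable design.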

3. **Run various speech recognition engines**

In [0]:
testcases = [
  {
    'filename': 'audio/2830-3980-0043.wav',
    'text': 'experience proves this',
    'lang': 'en-US'
  },
  {
    'filename': 'audio/4507-16021-0012.wav',
    'text': 'why should one halt on the way',
    'lang': 'en-US'
  },
  {
    'filename': 'audio/8455-210777-0068.wav',
    'text': 'your power is sufficient i said',
    'lang': 'en-US'
  }
]

for t in testcases:
  filename = t['filename']
  text = t['text']
  lang = t['lang']

  print('audio file="{0}"    expected text="{1}"'.format(filename, text))
  for asr_engine in ASREngine:
    try:
      response = speech_to_text(filename, asr_engine, language=lang)
      print('{0}: "{1}"'.format(asr_engine.name, response))
    except sr.UnknownValueError:
      print('{0} could not understand audio'.format(asr_engine.name))
    except sr.RequestError as e:
      print('{0} error: {1}'.format(asr_engine.name, e))

Notice the difference in results between Sphinx and Google.
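To put a number on that difference, one common metric (not part of SpeechRecognition) is the word error rate: the word-level edit distance between the expected and recognized text, divided by the number of expected words. A minimal dynamic-programming sketch, assuming simple lower-casing and whitespace tokenization (punctuation is not stripped):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference text must contain at least one word")

    # prev[j] = edits needed to turn the previous reference prefix into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr[j] = min(prev[j] + 1,      # deletion
                          curr[j - 1] + 1,  # insertion
                          substitution)
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("experience proves this", "experience proved this"))  # 0.3333333333333333
```

Inside the test loop, `word_error_rate(text, response)` would give one score per engine and file.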

# Further Reading

For the other speech recognition engines, you need to create API credentials and pass them to `recognize_<speech engine name>()`; see [this example](https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py).
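One way to keep such credentials out of the notebook is to read them from environment variables. The sketch below assumes hypothetical variable names (`WIT_AI_KEY`, `HOUNDIFY_CLIENT_ID`, `HOUNDIFY_CLIENT_KEY`) and maps them onto the keyword arguments that, as far as we know, `recognize_wit()` (`key`) and `recognize_houndify()` (`client_id`, `client_key`) accept; check the signatures in your installed version before relying on it.

```python
import os
from typing import Dict

# Hypothetical env-var names, keyed by the recognize_<engine>() keyword they fill.
CREDENTIAL_ENV_VARS: Dict[str, Dict[str, str]] = {
    "wit": {"key": "WIT_AI_KEY"},
    "houndify": {"client_id": "HOUNDIFY_CLIENT_ID", "client_key": "HOUNDIFY_CLIENT_KEY"},
}


def credentials_from_env(engine: str) -> Dict[str, str]:
    """Build keyword arguments for recognize_<engine>() from environment variables."""
    kwargs = {}
    missing = []
    for param, env_var in CREDENTIAL_ENV_VARS[engine].items():
        value = os.environ.get(env_var)
        if value is None:
            missing.append(env_var)
        else:
            kwargs[param] = value
    if missing:
        raise RuntimeError("missing environment variables for {0}: {1}".format(engine, ", ".join(missing)))
    return kwargs
```

Usage would then look like `r.recognize_wit(audio, **credentials_from_env("wit"))`, so the notebook itself never contains a secret.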

SpeechRecognition also provides a nice `Microphone` abstraction, implemented over [PyAudio](https://people.csail.mit.edu/hubert/pyaudio/), which in turn wraps [PortAudio](http://www.portaudio.com/). There are examples of capturing microphone input in [batch](https://github.com/Uberi/speech_recognition/blob/master/examples/microphone_recognition.py) and continuously in the [background](https://github.com/Uberi/speech_recognition/blob/master/examples/background_listening.py).