<a href="https://colab.research.google.com/github/leodenale/ColabExamples/blob/master/SpeechRecognitionAPI_AudioFile2Text.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Speech Recognition in Python using Google Speech API
## (WAV file to text test)

[Original article from Geeksforgeeks.org](https://www.geeksforgeeks.org/speech-recognition-in-python-using-google-speech-api/)

Speech recognition is an important feature in applications such as home automation and artificial intelligence. This article introduces the SpeechRecognition library for Python. The library is useful because it also runs on single-board computers such as the Raspberry Pi with the help of an external microphone.

## Required Installations

The following must be installed:

1) **Python Speech Recognition module**:
```
sudo pip install SpeechRecognition
```

2) **PyAudio**: Linux users can install it with the following command
```
sudo apt-get install python-pyaudio python3-pyaudio
```

If the versions in the repositories are too old, install PyAudio with the following command
```
sudo apt-get install portaudio19-dev python-all-dev python3-all-dev &&
sudo pip install pyaudio
```
Use pip3 instead of pip for Python 3.

Windows users can install PyAudio by running the following command in a terminal
```
pip install pyaudio
```
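
After installing, it can be worth confirming that both packages are importable before running anything else. The snippet below is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and `speech_recognition` and `pyaudio` are the import names of the two packages installed above.

```python
import importlib.util

# Import names (not pip package names) of the two packages installed above.
REQUIRED_MODULES = ["speech_recognition", "pyaudio"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed: " + ", ".join(missing))
else:
    print("All required modules are available.")
```

If a name is reported as missing, rerun the matching install command for the Python interpreter that is actually in use.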

## Speech Input Using a Microphone and Translation of Speech to Text

1) **Configure Microphone (For external microphones)**: It is advisable to specify the microphone explicitly in the program to avoid any glitches.
Type **lsusb** in the terminal to confirm that the microphone is connected. The device name that the program sees (for example, in the list returned by `speech_recognition.Microphone.list_microphone_names()`) would look like this
```
USB Device 0x46d:0x825: Audio (hw:1, 0)
```
Make a note of this name, as it will be used in the program.

2) **Set Chunk Size**: This specifies how much audio data is read at once (the library's `chunk_size` parameter counts samples). The value is typically a power of 2 such as 1024 or 2048.

3) **Set Sampling Rate**: The sampling rate defines how many samples are recorded per second, measured in Hz.

4) **Set Device ID to the selected microphone**: In this step, we specify the device ID of the microphone we want to use, which avoids ambiguity when several microphones are connected. It also helps with debugging, because running the program shows whether the specified microphone is recognized. The program takes a parameter device_id and reports that device_id could not be found if the microphone is not recognized.

5) **Allow Adjusting for Ambient Noise**: Since the surrounding noise varies, we must give the program a second or two to adjust the recording energy threshold to the ambient noise level.

6) **Speech to text translation**: This is done with the help of Google Speech Recognition, which requires an active internet connection. Offline recognition systems such as PocketSphinx exist, but they have a demanding installation process with several dependencies. Google Speech Recognition is one of the easiest to use.
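
The six steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original article's program: the helper names `find_device_index`, `chunk_duration`, and `transcribe_from_microphone` are hypothetical, and the default sampling rate and chunk size are example values only. The library calls used (`Microphone.list_microphone_names`, `Microphone(device_index=..., sample_rate=..., chunk_size=...)`, `adjust_for_ambient_noise`, `listen`, `recognize_google`) belong to SpeechRecognition; the microphone part needs PyAudio and real audio hardware.

```python
def find_device_index(device_names, keyword):
    """Return the index of the first device whose name contains `keyword`."""
    for index, name in enumerate(device_names):
        if keyword.lower() in name.lower():
            return index
    raise ValueError("No audio device matching %r was found" % keyword)


def chunk_duration(chunk_size, sample_rate):
    """Seconds of audio covered by one chunk of `chunk_size` samples."""
    return chunk_size / float(sample_rate)


def transcribe_from_microphone(keyword, sample_rate=16000, chunk_size=1024):
    """Record one phrase from the matching microphone and return its text.

    Returns None if the speech could not be understood. A failed request
    (for example, no internet connection) raises sr.RequestError.
    """
    # Imported here so the helpers above work without the library installed.
    import speech_recognition as sr

    # Steps 1 and 4: pick the microphone by (part of) its device name.
    names = sr.Microphone.list_microphone_names()
    device_index = find_device_index(names, keyword)

    recognizer = sr.Recognizer()
    # Steps 2 and 3: chunk size and sampling rate (example values).
    with sr.Microphone(device_index=device_index,
                       sample_rate=sample_rate,
                       chunk_size=chunk_size) as source:
        # Step 5: sample the background noise for about one second.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        print("Say something")
        audio = recognizer.listen(source)

    # Step 6: send the audio to Google Speech Recognition.
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return None
```

With a device named as in step 1, a call such as `transcribe_from_microphone("USB Device")` would select that microphone. At 16000 Hz, one chunk of 1024 samples covers `chunk_duration(1024, 16000)` seconds of audio, which is 0.064 s.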

The notebook cells below install the dependencies and then transcribe a WAV file:

In [1]:
!pip3 install SpeechRecognition

Collecting SpeechRecognition
  Downloading https://files.pythonhosted.org/packages/26/e1/7f5678cd94ec1234269d23756dbdaa4c8cfaed973412f88ae8adf7893a50/SpeechRecognition-3.8.1-py2.py3-none-any.whl (32.8MB)
Installing collected packages: SpeechRecognition
Successfully installed SpeechRecognition-3.8.1


In [2]:
!apt install python-pyaudio python3-pyaudio

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-410
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  libportaudio2
Suggested packages:
  python-pyaudio-doc
The following NEW packages will be installed:
  libportaudio2 python-pyaudio python3-pyaudio
0 upgraded, 3 newly installed, 0 to remove and 16 not upgraded.
Need to get 113 kB of archives.
After this operation, 432 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libportaudio2 amd64 19.6.0-1 [64.6 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 python-pyaudio amd64 0.2.11-1build2 [24.1 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic/universe amd64 python3-pyaudio amd64 0.2.11-1build2 [24.2 kB]
Fetched 113 kB in 1s (125 kB/s)
Selecting previously unselected package libportaudio2:a

In [0]:
# Mounting Google Drive in Google Colab
from google.colab import drive
drive.mount('/mydrive')

In [11]:
# Navigating to project directory
cd ..

/


In [12]:
ls

[0m[01;34mbin[0m/      [01;34mdev[0m/   [01;34mlib32[0m/  [01;34mmydrive[0m/  [01;34mrun[0m/    [01;34msys[0m/                 [01;34musr[0m/
[01;34mboot[0m/     [01;34metc[0m/   [01;34mlib64[0m/  [01;34mopt[0m/      [01;34msbin[0m/   [01;34mtensorflow-2.0.0b1[0m/  [01;34mvar[0m/
[01;34mcontent[0m/  [01;34mhome[0m/  [01;34mmedia[0m/  [01;34mproc[0m/     [01;34msrv[0m/    [30;42mtmp[0m/
[01;34mdatalab[0m/  [01;34mlib[0m/   [01;34mmnt[0m/    [01;34mroot[0m/     [01;34mswift[0m/  [01;34mtools[0m/


In [13]:
cd mydrive/My Drive/Colab Notebooks/data

/mydrive/My Drive/Colab Notebooks/data


In [14]:
ls

speech1.wav  speech2.wav
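
Before transcribing, it can help to check that a file really is a readable PCM WAV file and to see how long it is. The following is a minimal sketch using Python's standard `wave` module; the helper name `describe_wav` and the keys of the returned dictionary are hypothetical.

```python
import wave


def describe_wav(path):
    """Return basic header information for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / float(rate),
        }
```

For example, `describe_wav("speech2.wav")` would report the channel count, sample width, sampling rate, and duration of that file. `wave.open` raises `wave.Error` for files it cannot parse, which is a quick way to spot a file that is not plain PCM WAV.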


In [16]:
# Transcribe an audio file to text
#
# To transcribe an audio file instead of microphone input, replace the
# microphone source with the audio file.
# Place the audio file and the program in the same folder for convenience.
# This works for WAV, AIFF, or FLAC files.

# Python program to transcribe an audio file
import speech_recognition as sr

AUDIO_FILE = "speech2.wav"

r = sr.Recognizer()

# Use the audio file as the audio source
with sr.AudioFile(AUDIO_FILE) as source:
    # record() reads the entire file; listen() would stop after the first phrase
    audio = r.record(source)

try:
    print("The audio file contains: " + r.recognize_google(audio))
except sr.UnknownValueError:
    # The service returned no usable transcription
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    # The request failed, for example because there is no internet connection
    print("Could not request results from Google Speech Recognition service; {0}".format(e))



The audio file contains: the Birch canoe slid on the smooth planks glue the seat to the dark blue background it is easy to tell the depth of a well these days a chicken leg is a verb dish rice is often served in round Bowls the juice of lemons makes fine punch the box was the one beside the pump truck the Hogs are such hot corn and garbage 4 hours of study works
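
For longer recordings, one option is to transcribe the file in fixed-length segments using the `offset` and `duration` arguments of `Recognizer.record`. The sketch below is one possible approach, not part of the original notebook: the helper names `segment_offsets` and `transcribe_in_segments` and the 30-second default are assumptions, and it handles WAV files only because the length is read with the standard `wave` module.

```python
import wave


def segment_offsets(total_seconds, segment_seconds):
    """Return (offset, duration) pairs that cover `total_seconds` of audio."""
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    segments = []
    offset = 0.0
    while offset < total_seconds:
        # The last segment may be shorter than segment_seconds.
        segments.append((offset, min(segment_seconds, total_seconds - offset)))
        offset += segment_seconds
    return segments


def transcribe_in_segments(path, segment_seconds=30):
    """Transcribe a WAV file segment by segment and join the results."""
    # Imported here so segment_offsets works without the library installed.
    import speech_recognition as sr

    # Read the total length of the recording from the WAV header.
    with wave.open(path, "rb") as wav_file:
        total_seconds = wav_file.getnframes() / float(wav_file.getframerate())

    recognizer = sr.Recognizer()
    pieces = []
    for offset, duration in segment_offsets(total_seconds, segment_seconds):
        # Reopen the file for each segment and skip `offset` seconds into it.
        with sr.AudioFile(path) as source:
            audio = recognizer.record(source, duration=duration, offset=offset)
        try:
            pieces.append(recognizer.recognize_google(audio))
        except sr.UnknownValueError:
            # Nothing intelligible in this segment; skip it.
            continue
    return " ".join(pieces)
```

A call such as `transcribe_in_segments("speech2.wav")` would send one request per segment. Fixed boundaries can cut a word in half, so the joined text may contain small errors at the seams.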
