# SPEECH RECOGNITION TUTORIAL

## OVERVIEW

### HOW SPEECH RECOGNITION WORKS

##### Speech recognition systems have come a long way since their early days in the 1950s. Modern systems can recognize speech from many speakers and handle large vocabularies. The process starts by converting speech into digital data; techniques such as Hidden Markov Models (HMMs) are then used to transcribe the audio into text. Neural networks and voice activity detectors help simplify and optimize the recognition process.
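To make the idea of a voice activity detector concrete, here is a minimal sketch of one simple approach: frame-by-frame energy thresholding. It is an illustration only, not how the SpeechRecognition library or any particular engine works internally. The function names, the 8 kHz sample rate, the frame size, and the threshold are all assumptions chosen for the example.

```python
import math


def frame_energies(samples, frame_size):
    """Return the RMS energy of each consecutive, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return energies


def detect_voice(samples, frame_size=160, threshold=0.1):
    """Mark a frame True when its RMS energy exceeds the threshold."""
    return [energy > threshold for energy in frame_energies(samples, frame_size)]


# Synthetic signal at an assumed 8 kHz sample rate:
# 0.1 s of silence, 0.1 s of a 440 Hz tone, then 0.1 s of silence.
rate = 8000
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]
signal = silence + tone + silence

flags = detect_voice(signal)
print(flags)  # True only for the frames that contain the tone
```

Real detectors are usually more elaborate (adaptive thresholds, spectral features, or trained models), but the principle of discarding low-energy frames before recognition is the same.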

## Installing Speech Recognition

In [None]:
%pip install SpeechRecognition


## Verifying the installation

In [None]:
import speech_recognition as sr
sr.__version__


'3.10.0'

## WORKING WITH THE AUDIO FILES

### Supported File Types

##### - WAV: must be in PCM/LPCM format
##### - AIFF
##### - AIFF-C
##### - FLAC: must be native FLAC format; OGG-FLAC is not supported
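Before passing a WAV file to the recognizer, it can help to confirm that it really is uncompressed PCM. The sketch below uses Python's standard-library `wave` module, which reads PCM WAV files and raises `wave.Error` for files it cannot parse. The helper name `describe_wav` and the generated demo file are assumptions made for this example; they are not part of the SpeechRecognition library.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic parameters of a PCM WAV file.

    wave.open raises wave.Error if the file is not a WAV it can read.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


# Build a one-second, 16-bit mono file of silence so the example is runnable.
path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

info = describe_wav(path)
print(info)
```

The same helper can be pointed at any downloaded WAV file to check its duration and sample rate before transcription.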

#### First, we downloaded an audio file and saved it in the working directory as "Example audio file.wav". The examples below use the Recognizer class.


In [10]:
#Using record() to Capture Data From a File

r = sr.Recognizer()

Example_audio_file = sr.AudioFile('Example audio file.wav')
with Example_audio_file as source:
    audio = r.record(source)

#Checking the type of audio

type(audio)

speech_recognition.audio.AudioData

In [11]:
#Recognizing speech in the audio

r.recognize_google(audio)

'does smallpox not a hole in the sock the fish twisted and turned on the bent hook press the pants and sew a button on the vest the Swan Dive was Far short of perfect the beauty of the view stunned the young boy two blue fish swim in the tank her purse was full of useless trash the cult reared and through the tall rider is snowed rain and hail the same morning read verse out loud for pleasure'

In [12]:
#Capturing Segments With duration

with Example_audio_file as source:
    audio1 = r.record(source, duration=4)
    audio2 = r.record(source, duration=4)

#Only the value of the last expression in a cell is displayed, so the output below is the transcription of audio2
r.recognize_google(audio1)
r.recognize_google(audio2)

#NOTE - Inside a with block, record() always moves ahead in the file stream. If you record four seconds and then record four seconds again, the second call returns the four seconds of audio that follow the first four.


'twisted and turned on the bent hook'

#### The offset keyword argument of record() sets the number of seconds to skip from the current position in the file before recording starts.

In [13]:
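#To capture only the second phrase in the file, you could start with an offset of four seconds and record for, say, three seconds.

with Example_audio_file as source:
    #adjust_for_ambient_noise() calibrates the recognizer to the noise level in the audio.
    #By default it reads (and consumes) the first second of the stream, so the offset below is counted from that point.
    r.adjust_for_ambient_noise(source)
    audio = r.record(source, offset=4, duration=3)

r.recognize_google(audio)

#NOTE - Choose offset and duration carefully: a segment that cuts a phrase mid-word can be transcribed inaccurately, as with 'bed' instead of 'bent hook' in the output below.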

'twisted and turned on the bed'
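The way consecutive reads advance through the stream, and how offset and duration translate into positions in the file, can be mimicked with the standard-library `wave` module. This is a stand-in written for illustration, not the library's implementation of `record()`; the helper names `make_wav` and `read_segment` and the 8 kHz rate are assumptions.

```python
import io
import wave


def make_wav(seconds, rate=8000):
    """Return the bytes of a silent 16-bit mono WAV of the given length."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (seconds * rate))
    return buffer.getvalue()


def read_segment(wav, duration=None, offset=0.0):
    """Skip `offset` seconds from the current position, then read `duration`
    seconds (or the rest of the file). Returns (start, end) in seconds."""
    rate = wav.getframerate()
    wav.readframes(int(offset * rate))          # skipped frames are consumed
    start = wav.tell()
    wanted = wav.getnframes() - start if duration is None else int(duration * rate)
    wav.readframes(wanted)                      # the segment itself
    return start / rate, wav.tell() / rate


data = make_wav(10)

# Two consecutive four-second reads: the second continues where the first ended.
with wave.open(io.BytesIO(data), "rb") as wav:
    first = read_segment(wav, duration=4)
    second = read_segment(wav, duration=4)

# A fresh stream with an offset of four seconds and a duration of three.
with wave.open(io.BytesIO(data), "rb") as wav:
    third = read_segment(wav, offset=4, duration=3)

print(first, second, third)
```

Because every read consumes part of the stream, anything read before the segment (including a calibration step) shifts where the segment starts.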

## WORKING WITH MICROPHONES

##### The previous sections used only the Recognizer class. This section also uses the Microphone class.

### INSTALLING PyAudio

In [14]:
%pip install pyaudio


In [None]:
#On systems without a default microphone (or with several), list the available devices by calling

sr.Microphone.list_microphone_names()

#You can choose which one to use by supplying a device index, e.g. mic = sr.Microphone(device_index=3)

['Microsoft Sound Mapper - Input',
 'Microphone Array (Realtek(R) Au',
 'Microsoft Sound Mapper - Output',
 'Speakers (Realtek(R) Audio)',
 'Primary Sound Capture Driver',
 'Microphone Array (Realtek(R) Audio)',
 'Primary Sound Driver',
 'Speakers (Realtek(R) Audio)',
 'Speakers (Realtek(R) Audio)',
 'Microphone Array (Realtek(R) Audio)',
 'Microphone Array (Realtek HD Audio Mic Array input)',
 'Headphones 1 (Realtek HD Audio 2nd output with HAP)',
 'Headphones 2 (Realtek HD Audio 2nd output with HAP)',
 'PC Speaker (Realtek HD Audio 2nd output with HAP)',
 'Stereo Mix (Realtek HD Audio Stereo input)',
 'Speakers 1 (Realtek HD Audio output with HAP)',
 'Speakers 2 (Realtek HD Audio output with HAP)',
 'PC Speaker (Realtek HD Audio output with HAP)',
 'Microphone (Realtek HD Audio Mic input)']

In [None]:
#Using listen() to Capture Microphone Input

import speech_recognition as sr
r = sr.Recognizer()
mic = sr.Microphone()
with mic as source:
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)
    
r.recognize_google(audio)

'hello my name is Harsh'

In [None]:
#If the speech cannot be understood, recognize_google() raises UnknownValueError.

import speech_recognition as sr
r = sr.Recognizer()
mic = sr.Microphone()
with mic as source:
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)
    
r.recognize_google(audio)

UnknownValueError: 
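One way to cope with unintelligible input is to capture audio again and retry a few times. The sketch below shows that pattern with plain callables so it runs without a microphone or network access. The helper `transcribe_with_retries`, its parameters, and the fake capture and recognize functions are all hypothetical names invented for this example; only the exception they stand in for, `sr.UnknownValueError`, comes from the library.

```python
def transcribe_with_retries(capture, recognize, retry_on, attempts=3):
    """Call capture() and then recognize(audio) until one attempt succeeds.

    capture:   zero-argument callable returning audio data
    recognize: callable turning audio data into text
    retry_on:  exception type(s) meaning "could not understand, try again"
    Returns the text, or None when every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        audio = capture()
        try:
            return recognize(audio)
        except retry_on:
            print(f"Attempt {attempt} of {attempts}: speech not understood")
    return None


class FakeUnknownValueError(Exception):
    """Stand-in for sr.UnknownValueError so the sketch runs offline."""


# Simulated input: two unintelligible clips followed by a clear one.
clips = iter(["<noise>", "<noise>", "hello"])


def fake_capture():
    return next(clips)


def fake_recognize(audio):
    if audio == "<noise>":
        raise FakeUnknownValueError()
    return audio


result = transcribe_with_retries(fake_capture, fake_recognize, FakeUnknownValueError)
print(result)
```

With the real library, `capture` could wrap the `with mic as source:` block that calls `r.listen(source)`, `recognize` could be `r.recognize_google`, and `retry_on` could be `sr.UnknownValueError`.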

# Speech Recognition Model

## Speech Recognition in English

In [None]:
import speech_recognition as sr

# Initialize the recognizer
r = sr.Recognizer()

# Record audio from the microphone
mic = sr.Microphone()
with mic as source:
    r.adjust_for_ambient_noise(source)
    print("Listening...")
    audio = r.listen(source)

# Perform speech recognition    
try:
    text = r.recognize_google(audio)
    print("You said:", text)
except sr.UnknownValueError:
    print("Speech recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from speech recognition service; {0}".format(e))

Listening...
You said: my name is Harsh and I am a student in D Y Patil International University of BTech 3rd year


In [None]:
#To export the result to a text file:
with open('my_speech.txt', mode='w') as file:
    file.write(text)
    
print("It has stored speech into text in my_speech.txt file")        

It has stored speech into text in my_speech.txt file


## Speech Recognition in Hindi

In [None]:
import speech_recognition as sr

# Initialize the recognizer
r = sr.Recognizer()

# Record audio from the microphone
mic = sr.Microphone()
with mic as source:
    r.adjust_for_ambient_noise(source)
    print("Listening...")
    audio = r.listen(source)

# Perform speech recognition    
try:
    text = r.recognize_google(audio, language='hi-IN')
    print("You said:", text)
except sr.UnknownValueError:
    print("Speech recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from speech recognition service; {0}".format(e))

# Export the result
# The encoding parameter is set to 'utf-8' to handle the Unicode (Devanagari) characters.
with open('my_speech_Hindi.txt', mode='w', encoding='utf-8') as file:
    file.write(text)

print("It has stored speech into text in my_speech_Hindi.txt file")


Listening...
You said: मेरा नाम हर्ष है और मैं डीवाई पाटिल अंतरराष्ट्रीय यूनिवर्सिटी में पढ़ता हूं
It has stored speech into text in my_speech_Hindi.txt file
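The effect of the `encoding='utf-8'` argument can be checked with a small round trip: write a Hindi string, read it back, and compare. The sketch below writes to a temporary directory (an assumption made so the example does not touch the working directory) and uses a short phrase from the transcript above as sample text.

```python
import tempfile
from pathlib import Path

# Sample text: the start of the Hindi transcript shown above.
text = "मेरा नाम हर्ष है"

path = Path(tempfile.mkdtemp()) / "my_speech_Hindi.txt"
path.write_text(text, encoding="utf-8")

# Reading with the same encoding restores the original string.
restored = path.read_text(encoding="utf-8")
size_bytes = path.stat().st_size

print(restored == text)
print(len(text), "characters stored in", size_bytes, "bytes")
```

Devanagari characters occupy several bytes each in UTF-8, which is why the file size is larger than the character count. Stating the encoding explicitly on both write and read avoids depending on the platform default.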


## Speech Recognition in both languages (Hindi and English)

In [None]:
import speech_recognition as sr

# Initialize the recognizer
r = sr.Recognizer()

# Record audio from the microphone
mic = sr.Microphone()
with mic as source:
    r.adjust_for_ambient_noise(source)
    print("Listening...")
    audio = r.listen(source)

# Perform speech recognition    
try:
    text_hi = r.recognize_google(audio, language='hi-IN')
    print("You said:", text_hi)
    
    text_en = r.recognize_google(audio, language='en-US')
    print("You said:", text_en)
    
except sr.UnknownValueError:
    print("Speech recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from speech recognition service; {0}".format(e))

# export the result
with open('my_speech_hindi_english_1.txt',mode = 'w', encoding='utf-8') as file:
    file.write(text_hi)
    
print("It has stored speech into text in my_speech_hindi_english_1.txt file")

with open('my_speech_hindi_english_2.txt', mode='w', encoding='utf-8') as file:
    file.write(text_en)
    
print("It has stored speech into text in my_speech_hindi_english_2.txt file")

Listening...
You said: आई एम ए स्टूडेंट इन डी पाटील इंटरनेशनल यूनिवर्सिटी
You said: I am a student in D Y Patil International University
It has stored speech into text in my_speech_hindi_english_1.txt file
It has stored speech into text in my_speech_hindi_english_2.txt file


## Link used to download audio
http://www.voiptroubleshooter.com/open_speech/american.html