# Speech Understanding 
# Lecture 12: Text-to-Speech Synthesis

### Mark Hasegawa-Johnson, KCGI

1. <a href="#section1">Installing gTTS, the Google speech synthesizer</a>
1. <a href="#section2">Using gTTS</a>
1. <a href="#section3">Use SpeechRecognition to check the output</a>
1. <a href="#section4">Create an electronic parrot</a>
1. <a href="#homework">Homework</a>


<a id='section1'></a>

## 1. Installing gTTS, the Google speech synthesizer

For speech synthesis, we will use gTTS, a Python library that sends text to Google Translate's text-to-speech service and returns the audio.  You need to be connected to the internet in order to use it. Documentation for gTTS is here: https://gtts.readthedocs.io/en/latest/ 

Install gTTS like this (either in the cell below, or in a terminal):

In [1]:
!pip install gTTs

Collecting gTTs
  Obtaining dependency information for gTTs from https://files.pythonhosted.org/packages/59/a8/e3434904445eacf03b857ac001755d8ffac49b4f3339d63592b4eda009dc/gTTS-2.5.1-py3-none-any.whl.metadata
  Downloading gTTS-2.5.1-py3-none-any.whl.metadata (4.1 kB)
Downloading gTTS-2.5.1-py3-none-any.whl (29 kB)
Installing collected packages: gTTs
Successfully installed gTTs-2.5.1


<a id="section2"></a>

## 2. Using gTTS

gTTS can't play the audio directly.  We need to create the audio output, save it to a file, and then play back the file.

In [2]:
import gtts, librosa, IPython

desired_text = "これは合成音声です"
tts = gtts.gTTS(text=desired_text, lang="ja")
tts.save("speech.mp3")
    
speech_wave, speech_rate = librosa.load("speech.mp3")
IPython.display.Audio(data=speech_wave, rate=speech_rate)

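`tts.save` opens and writes the output file for you.  If you instead write the audio yourself with `tts.write_to_fp`, the `wb` modifier in `open` is important.  It specifies that the file is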
* binary (`b`)
* writable (`w`)
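
As a minimal sketch of where the `wb` mode comes in, the helper below writes the synthesized audio through an explicitly opened binary file instead of calling `tts.save`. It assumes `tts` is a `gtts.gTTS` object, which provides a `write_to_fp` method; the helper name `save_speech` is hypothetical and is not part of gTTS.

```python
def save_speech(tts, filename):
    """Write synthesized audio to `filename` using an explicit binary file.

    `tts` is assumed to be a gtts.gTTS object (or anything else that
    offers a write_to_fp(fp) method).
    """
    # "w" = open for writing, "b" = binary: mp3 data is bytes, not text
    with open(filename, "wb") as f:
        tts.write_to_fp(f)


# Example use (needs gTTS and an internet connection):
#   import gtts
#   save_speech(gtts.gTTS(text="これは合成音声です", lang="ja"), "speech.mp3")
```

Opening the file in text mode (`"w"`) would fail here, because the synthesizer produces bytes rather than a string.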

<a id="section3"></a>

## 3. Use SpeechRecognition to check the output

Often, in the real world, you need to generate many synthetic speech prompts for a customer, but you don't have time to listen to all of them to make sure they're OK.

When that happens, you can use the `speech_recognition` package to transcribe each file automatically and compare the transcript with the text you intended.  If the transcript differs, listen to that file yourself to decide whether it's OK.

First, we need to convert the file from `mp3` format to a format that `speech_recognition` can handle.  Its `AudioFile` class reads WAV, AIFF, and FLAC files.  We can use librosa to read in the mp3, then use soundfile to write it out, as shown [here](https://librosa.org/doc/main/ioformats.html#write-out-audio-files).

In [13]:
!pip install soundfile
!pip install SpeechRecognition
!pip install pyaudio

Collecting pyaudio
  Downloading PyAudio-0.2.14.tar.gz (47 kB)
     47.1/47.1 kB 3.5 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: pyaudio
  Building wheel for pyaudio (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for pyaudio (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-11.1-arm64-cpython-311
      creating build/lib.macosx-11.1-arm64-cpython-311/pyaudio

In [8]:
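import librosa, soundfile

# librosa.load decodes the mp3 and, by default, resamples it to 22050 Hz mono
data, samplerate = librosa.load('speech.mp3')
# soundfile.write chooses the WAV format from the '.wav' file extension
soundfile.write('speech.wav', data, samplerate)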

Now we can call `speech_recognition` to check the file content:

In [14]:
import speech_recognition

r = speech_recognition.Recognizer()

with speech_recognition.AudioFile("speech.wav") as source:
    audio = r.record(source)
    text = r.recognize_google(audio, language="ja")
    print('The person in this audio file said "%s"'%(text))
    if text == desired_text:
        print('This matches the desired text')
    else:
        print('This does not match the desired text, which was "%s"'%(desired_text))

The person in this audio file said "これは 合成音声です"
This does not match the desired text, which was "これは合成音声です"
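
The mismatch above comes only from the space that the recognizer inserted between words, not from a synthesis error. One possible way to avoid such false alarms is to normalize both strings before comparing them. The sketch below is an assumption about what counts as "the same text", not part of `speech_recognition`; the helper names `normalize` and `texts_match` are hypothetical.

```python
import unicodedata


def normalize(text):
    """Apply Unicode NFKC normalization, then remove all whitespace."""
    text = unicodedata.normalize("NFKC", text)
    # str.split() with no arguments splits on any Unicode whitespace
    return "".join(text.split())


def texts_match(recognized, desired):
    """Return True if the two strings are equal after normalization."""
    return normalize(recognized) == normalize(desired)


print(texts_match("これは 合成音声です", "これは合成音声です"))  # True
```

Removing every space suits Japanese, where spaces carry no meaning. For a language such as English, you would more likely collapse repeated spaces and ignore case and punctuation instead.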


<a id='section4'></a>

## 4. Create an electronic parrot

Now that we have both speech input and speech output, let's create our first speech app!

This app will just be an electronic parrot.  The electronic parrot will listen to what you say, and try to repeat it.

In [11]:
import speech_recognition, gtts, IPython, librosa

def electronic_parrot(L='en'):
    print("Hello!  I'm an electronic parrot 🦜.")
    print("Please say something, and I will try to repeat it.")
    
    r = speech_recognition.Recognizer()

    while True:
        with speech_recognition.Microphone() as source:
            print("Listening...")
            r.adjust_for_ambient_noise(source)
            try:
                audio = r.listen(source)
                text = r.recognize_google(audio, language=L)
            except speech_recognition.UnknownValueError:
                print('I did not understand that, I will try again')
                continue
            except speech_recognition.RequestError:
                print('Sorry, I could not reach the internet, I will try again')
                continue
            except speech_recognition.WaitTimeoutError:
                continue
            break
            
    print('I heard you say "%s", and now I will try to repeat it'%(text))
    gtts.gTTS(text='You said "%s"'%(text), lang="en").save("parrot.mp3")
    speech_wave, speech_rate = librosa.load("parrot.mp3")
    return IPython.display.Audio(data=speech_wave, rate=speech_rate)


In [15]:
electronic_parrot('en')

Hello!  I'm an electronic parrot 🦜.
Please say something, and I will try to repeat it.


AttributeError: Could not find PyAudio; check installation
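
The `AttributeError` above is raised because `speech_recognition.Microphone` depends on PyAudio, and the `pip install pyaudio` step earlier in this lecture failed while building its wheel. PyAudio is a wrapper around the PortAudio library, so a failed build often means that PortAudio itself is missing (on macOS it is commonly installed with Homebrew before retrying `pip install pyaudio`). Treat that as a likely cause, not a diagnosis. As a small sketch, you can check whether PyAudio is importable before starting the parrot; the helper name `pyaudio_available` is hypothetical.

```python
import importlib.util


def pyaudio_available():
    """Return True if the `pyaudio` module can be found in this environment."""
    return importlib.util.find_spec("pyaudio") is not None


if pyaudio_available():
    print("PyAudio found: speech_recognition.Microphone should be usable.")
else:
    print("PyAudio is missing: install it before calling electronic_parrot().")
```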

<a id='homework'></a>

## Homework

Edit the text file called `homework12.py`.

This file should define a function called `synthesize`, with the signature shown here:

In [3]:
import homework12, importlib
importlib.reload(homework12)
help(homework12.synthesize)

Help on function synthesize in module homework12:

synthesize(text, lang, filename)
    Use gtts.gTTS(text=text, lang=lang) to synthesize speech, then write it to filename.
    
    @params:
    text (str) - the text you want to synthesize
    lang (str) - the language in which you want to synthesize it
    filename (str) - the filename in which it should be saved



Test whether your code works by running the following block:

In [4]:
import homework12, librosa, IPython, importlib
importlib.reload(homework12)

homework12.synthesize("This is speech synthesis!","en","english.mp3")
y, sr = librosa.load("english.mp3")
IPython.display.Audio(data=y, rate=sr)

### Receiving your grade

In order to receive a grade for your homework, you need to:

1. Run the following code block on your machine.  The result may list some errors, and then in the very last line, it will show a score.  That score (between 0% and 100%) is the grade you have earned so far.  If you want to earn a higher grade, please continue editing `homework12.py`, and then run this code block again.
1. When you are happy with your score (e.g., when it reaches 100%), choose `File` $\Rightarrow$ `Save and Checkpoint`.  Then use `GitHub Desktop` to commit and push your changes.
1. Make sure that the 100% shows on your GitHub repo on github.com.  If it doesn't, you will not receive credit.

In [10]:
import importlib, grade
importlib.reload(grade)

EEEE
ERROR: test_method_creates_correct_synthesis_english (grade.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jhasegaw/Dropbox/mark/teaching/kcgi/intro_speech_understanding/2023_fall/lec12/grade.py", line 44, in test_method_creates_correct_synthesis_english
    self.synthesize("This is speech synthesis!","en","english.mp3")
  File "/Users/jhasegaw/Dropbox/mark/teaching/kcgi/intro_speech_understanding/2023_fall/lec12/grade.py", line 24, in synthesize
    self.homework12.synthesize(text, lang, filename)
  File "/Users/jhasegaw/Dropbox/mark/teaching/kcgi/intro_speech_understanding/2023_fall/lec12/homework12.py", line 12, in synthesize
    raise RuntimeError("FAIL! You need to change this function so that it works!")
RuntimeError: FAIL! You need to change this function so that it works!

ERROR: test_method_creates_correct_synthesis_spanish (grade.Test)
---------------------------------------------------------

0 successes out of 4 tests run
Score: 0%
0 successes out of 4 tests run
Score: 0%


<module 'grade' from '/Users/jhasegaw/Dropbox/mark/teaching/kcgi/intro_speech_understanding/2023_fall/lec12/grade.py'>