<a href="https://colab.research.google.com/github/satuelisa/NLPF/blob/main/NLPF_10_P.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!apt-get install -y espeak
!apt-get install -y python3-pyaudio # on Windows, pip install pyaudio

Reading package lists... Done
Building dependency tree       
Reading state information... Done
espeak is already the newest version (1.48.04+dfsg-5).
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 19 not upgraded.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
python3-pyaudio is already the newest version (0.2.11-1build2).
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 19 not upgraded.


In [2]:
!pip install pyttsx3  # install all three once per machine
!pip install speechrecognition
!pip install ffmpeg-python

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Let's try text-to-speech first, although that is not our focus today.

In [3]:
import speech_recognition as sr
import pyttsx3
r = sr.Recognizer()
eng = pyttsx3.init()
message = 'Hello, world.'
target = 'hello.mp3' # a .wav would also work
eng.save_to_file(message, target)
eng.say(message) # not audible on Colab
eng.runAndWait()

Since `say` produces no audible output on Colab, we also saved the speech to a file, which we can play back inside the notebook with `IPython.display`.

In [4]:
from IPython.display import Audio
from IPython.display import display
play = Audio(target, autoplay=True)
display(play)
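
To check that the `display` workaround plays sound at all, independently of espeak, one could generate a test tone with Python's standard `wave` module. This is a minimal sketch; the function name `sine_wav_bytes`, the file name `tone.wav`, and the 440 Hz / 16 kHz parameters are arbitrary choices for illustration, not part of the original notebook.

```python
import io
import math
import struct
import wave


def sine_wav_bytes(freq_hz=440.0, seconds=1.0, rate=16000, amplitude=0.5):
    """Return a complete mono 16-bit PCM WAV file, as bytes, holding a sine tone."""
    n_samples = int(seconds * rate)
    peak = int(amplitude * 32767)  # scale into the signed 16-bit range
    frames = b"".join(
        struct.pack("<h", int(peak * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n_samples)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 2 bytes = 16 bits per sample
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()


data = sine_wav_bytes()  # one second of 440 Hz
with open("tone.wav", "wb") as f:  # hypothetical file name
    f.write(data)
```

Afterwards, `display(Audio('tone.wav', autoplay=True))` should play a one-second beep in the same way as above; if it does not, the problem lies in the browser or in playback rather than in espeak.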

Now, let's try out the Google backend for speech-to-text. Again, on Colab, `sr.Microphone()` cannot reach your microphone, because the code runs on a remote machine. So we need another workaround: record in the browser with JavaScript and convert the result with `ffmpeg`, following an [odd hack by Ricardo](https://ricardodeazambuja.com/deep_learning/2019/03/09/audio_and_video_google_colab/).

In [5]:
from IPython.display import HTML, Audio
from google.colab.output import eval_js
from base64 import b64decode
import numpy as np
from scipy.io.wavfile import read as wav_read # pip install scipy (colab has it by default)
import io
import ffmpeg

In [6]:
AUDIO_HTML = """
<script>
var my_div = document.createElement("DIV");
var my_p = document.createElement("P");
var my_btn = document.createElement("BUTTON");
var t = document.createTextNode("Press to start recording");

my_btn.appendChild(t);
//my_p.appendChild(my_btn);
my_div.appendChild(my_btn);
document.body.appendChild(my_div);

var base64data = 0;
var reader;
var recorder, gumStream;
var recordButton = my_btn;

var handleSuccess = function(stream) {
  gumStream = stream;
  var options = {
    //bitsPerSecond: 8000, //chrome seems to ignore, always 48k
    mimeType : 'audio/webm;codecs=opus'
    //mimeType : 'audio/webm;codecs=pcm'
  };            
  //recorder = new MediaRecorder(stream, options);
  recorder = new MediaRecorder(stream);
  recorder.ondataavailable = function(e) {            
    var url = URL.createObjectURL(e.data);
    var preview = document.createElement('audio');
    preview.controls = true;
    preview.src = url;
    document.body.appendChild(preview);

    reader = new FileReader();
    reader.readAsDataURL(e.data); 
    reader.onloadend = function() {
      base64data = reader.result;
      //console.log("Inside FileReader:" + base64data);
    }
  };
  recorder.start();
  };

recordButton.innerText = "Recording... press to stop";

navigator.mediaDevices.getUserMedia({audio: true}).then(handleSuccess);


function toggleRecording() {
  if (recorder && recorder.state == "recording") {
      recorder.stop();
      gumStream.getAudioTracks()[0].stop();
      recordButton.innerText = "Saving the recording... please wait!";
  }
}

// https://stackoverflow.com/a/951057
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

var data = new Promise(resolve=>{
//recordButton.addEventListener("click", toggleRecording);
recordButton.onclick = ()=>{
toggleRecording()

sleep(3000).then(() => {
  // wait three seconds for the data to be available...
  console.log("Inside data:" + base64data)
  resolve(base64data.toString())

});

}
});
      
</script>
"""

Remember to **allow the browser to use your microphone** or this will *not* work.
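
On the Python side, `eval_js("data")` returns whatever the promise resolved to: the `reader.result` produced by `readAsDataURL`, that is, a data URL of the form `data:<mime type>;base64,<payload>`. `get_audio` keeps only the part after the comma. A slightly more defensive sketch of that step follows; the function name `decode_data_url` and the sample payload are hypothetical.

```python
import base64


def decode_data_url(url):
    """Split a base64 data URL into (mime type, decoded bytes)."""
    header, sep, payload = url.partition(",")
    # header looks like "data:audio/webm;codecs=opus;base64"
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL: %r" % url[:40])
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


# Hypothetical example; the four payload bytes are the WebM/Matroska (EBML) magic number.
example = "data:audio/webm;codecs=opus;base64," + base64.b64encode(b"\x1a\x45\xdf\xa3").decode()
mime, raw = decode_data_url(example)
print(mime)  # audio/webm;codecs=opus
print(raw)   # b'\x1aE\xdf\xa3'
```

One failure mode this makes explicit: `base64data` starts out as the number `0`, so if the recording is not ready when the three-second wait ends, the promise resolves to the string `"0"`, and `data.split(',')[1]` fails with a less obvious `IndexError`.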

In [7]:
def get_audio():
  display(HTML(AUDIO_HTML))
  data = eval_js("data")
  binary = b64decode(data.split(',')[1])
  
  process = (ffmpeg
    .input('pipe:0')
    .output('pipe:1', format='wav')
    .run_async(pipe_stdin=True, pipe_stdout=True, pipe_stderr=True, quiet=True, overwrite_output=True)
  )
  output, err = process.communicate(input=binary)
  
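  # ffmpeg writes to a pipe and cannot seek back to fill in the RIFF chunk size,
  # so compute it ourselves: total length minus the 8-byte "RIFF" + size prefix.
  riff_chunk_size = len(output) - 8
  # Break up the chunk size into four little-endian bytes, held in b.
  q = riff_chunk_size
  b = []
  for _ in range(4):
      q, rem = divmod(q, 256)
      b.append(rem)

  # Replace bytes 4:8 of output with the actual size of the RIFF chunk.
  riff = output[:4] + bytes(b) + output[8:]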

  rate, audio = wav_read(io.BytesIO(riff))

  return audio, rate
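
The header patch in `get_audio` can be checked without a microphone. The sketch below uses only the standard library (`patch_riff_size` and `loop_size_bytes` are names made up for illustration). It builds a valid WAV in memory, overwrites its RIFF size field with `0xFFFFFFFF` as a stand-in for a placeholder a streaming writer might leave there, and confirms that writing `len - 8` back as a little-endian 32-bit integer restores the original header. It also shows that the byte-by-byte divmod loop produces exactly the bytes of `struct.pack('<I', n)`.

```python
import io
import struct
import wave


def patch_riff_size(wav):
    """Overwrite bytes 4:8 (the RIFF chunk size) with len(wav) - 8, little-endian."""
    return wav[:4] + struct.pack("<I", len(wav) - 8) + wav[8:]


def loop_size_bytes(n):
    """Same idea as the divmod loop in get_audio: four little-endian bytes of n."""
    out = []
    for _ in range(4):
        n, rem = divmod(n, 256)
        out.append(rem)
    return bytes(out)


# A small valid WAV built in memory (hypothetical stand-in for ffmpeg's output).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 100)
good = buf.getvalue()

# Simulate a header whose size field was left as a placeholder.
broken = good[:4] + b"\xff\xff\xff\xff" + good[8:]
fixed = patch_riff_size(broken)

print(fixed == good)                                # True
print(loop_size_bytes(len(good) - 8) == good[4:8])  # True
```

In the notebook itself, `(len(output) - 8).to_bytes(4, 'little')` would be an equivalent one-liner for the loop.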

In [11]:
audio, rate = get_audio()  # start talking right after running this cell; press the button to stop
from scipy.io.wavfile import write
output = 'output.wav'
write(output, rate, audio)

Did the file turn out fine?

In [12]:
from IPython.display import Audio
from IPython.display import display
play = Audio(output, autoplay=True)
display(play)
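
Besides listening, a quick numeric check can tell whether the recording captured anything. This is a minimal sketch using the standard `wave` and `array` modules; the helper name `describe_wav` is hypothetical, and it assumes 16-bit PCM, which is ffmpeg's default WAV codec and what `scipy.io.wavfile.write` writes back for an `int16` array.

```python
import array
import io
import struct
import sys
import wave


def describe_wav(f):
    """Summarize a 16-bit PCM WAV file given as a path or a binary file object."""
    with wave.open(f, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        channels = w.getnchannels()
        rate = w.getframerate()
        n_frames = w.getnframes()
        samples = array.array("h", w.readframes(n_frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    return {"channels": channels, "rate": rate,
            "seconds": n_frames / rate, "peak": peak}


# Demo on a synthetic half second at 48 kHz (hypothetical data, not a real recording).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(48000)
    w.writeframes(struct.pack("<3h", 0, 1000, -2000) * 8000)
buf.seek(0)
print(describe_wav(buf))  # {'channels': 1, 'rate': 48000, 'seconds': 0.5, 'peak': 2000}
```

On Colab, `describe_wav('output.wav')` should report a duration roughly matching how long you spoke; a `peak` close to zero suggests the browser recorded silence.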

Can we convert this to text?

In [13]:
with sr.AudioFile(output) as src:
  rec = r.record(src)  # read the whole file into an AudioData object
try:
  print(r.recognize_google(rec))  # you could also try the sphinx backend (recognize_sphinx)
except sr.UnknownValueError:
  print('It did not understand you')
except sr.RequestError as e:
  print('Nope, the request failed:', e)

testing now saying something's


If that is what you said, then it's a small win.
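
To make "if that is what you said" a bit more quantitative, speech-to-text output is commonly scored with the word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn what was actually said into the transcript, divided by the number of words actually said. This metric is not part of the original notebook; the sketch below is a plain dynamic-programming version, and the reference sentence in the example is hypothetical.

```python
import re


def words(text):
    """Lowercase and keep only letters and apostrophes, so punctuation is ignored."""
    return re.findall(r"[a-z']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h_word in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,                       # deletion of r_word
                cur[j - 1] + 1,                    # insertion of h_word
                prev[j - 1] + (r_word != h_word),  # substitution, or free match
            )
        prev = cur
    return prev[-1] / max(len(ref), 1)


# Hypothetical reference sentence; the transcript is the one printed above.
print(word_error_rate("Testing now, saying something.", "testing now saying something's"))  # 0.25
```

A WER of 0 means an exact match; one wrong word out of four gives 0.25. WER can exceed 1 when the recognizer inserts many extra words.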

So that is how we can **use** text-to-speech and speech-to-text. The final weekly reflection forces us into the territory of *how* it all *works*.