# Recording Studio

This notebook is intended to record, preprocess and save the audios that will be later used by pyramidman assistant. It will make use of the speech recognizing as well for practical purposes, but theses will not be explained in this notebook, but rather in the third one.

This notebook focuses on making a proper listener in another thread that writes the audio data into a Queue that is later consumed by the main thread and in having a simple Recording Studio for making the audios for pyramidman

In [2]:
%load_ext autoreload
%autoreload 2

from pyramidman.audio_parameters import AudioParameters
from pyramidman.basic_audio_IO import play_audio, record_audio
from pyramidman.audio_utils import get_available_microphones, get_sysdefault_microphone_index, get_all_devices_str
from pyramidman.queue_utils import record_with_queue
from pyramidman.unwrapper import unwrap
from pyramidman.speech_recognizing import recognize_speech_from_mic
from pyramidman.hieroglyph import plot_timeseries_range_slider, create_tabs, plot_spectrogram
from pyramidman.hieroglyph import add_word_annotations

from pyramidman.Ihy import get_audio_menu_wav_file
from pyramidman.signal_processing import get_spectrogram

from pyramidman.queue_utils import put_data_in_queue_closure, listen_in_a_thread
from pyramidman.audio_utils import calibrate_microphone, sample_noise

from pyramidman.utils import get_folder_files
import speech_recognition as sr
from pyramidman.deepspeech_tools import transcribe, DeepSpeechArgs

import plotly
import time

%matplotlib qt
import numpy as np
import matplotlib.pyplot as plt

import plotly.graph_objs as go
from IPython.display import display

import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile
import ipywidgets as widgets

from queue import Queue
import noisereduce as nr
import librosa

from pyramidman.noisereduce_optimized import reduce_noise_optimized
from pyramidman.noisereduce_optimized import noise_STFT_and_statistics
from pyramidman.noisereduce_optimized import reduce_noise_optimized_closure
from pyramidman.signal_processing import butter_highpass_filter


Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)



# 1. Instantiate and calibrate microphone

Ideally, we would like a background process in a thread that whenever a sentence is finished, it is translated and plotted. This is the following code.

In [3]:
audio_params = AudioParameters()
audio_params.set_sysdefault_microphone_index()
audio_params.set_default_input_parameters()

In [4]:
audio_params.input_device_index = 6
audio_params.sample_rate = 48000

In [5]:
mic = audio_params.get_microphone()
r = sr.Recognizer()

calibrate_microphone(mic, r, duration = 1, warmup_duration = 3)

Calibrating microphone for 1 seconds.
Calibrated energy threshold:  5182.9281012967


# 2. Reduce noise and trim signal

In this section we show some of the functions used in order to reduce the noise of a recorded signal. The goal is mainly didactic, later in the Recording studio, we can play more with these functions to find the optimal preprocessing of the signal.

## 2.1 Record sample audio

In [6]:
recorded_filepath = "../audios/temp/recording.wav"
record_audio(audio_params, seconds = 5, filename = recorded_filepath)

Recording
Finished recording


In [7]:
play_audio(audio_params, recorded_filepath)

## 2.2 Reduce noise

We can apply several filters and advanced techniques. In this case we mainly used an advanced library.

In [8]:
noise_data = sample_noise(audio_params,r, mic, duration = 2, warmup = 2)

Load the audio data

In [9]:
rate, data = wavfile.read(recorded_filepath)
data = data.astype(float)

### Perform the noise reduction

In [10]:
reduced_noise = reduce_noise_optimized(audio_clip=data, noise_clip=noise_data)
reduced_noise = reduced_noise.astype(np.int16)

In [11]:
recorded_filepath_noise_reduced = "../audios/temp/recording_reduced.wav"
wavfile.write(recorded_filepath_noise_reduced, audio_params.sample_rate, reduced_noise)

### Closurized version

So that it is callable with one the data.

In [12]:
noise_params_list = noise_STFT_and_statistics(noise_data)
noise_stft, noise_stft_db, mean_freq_noise, std_freq_noise, noise_thresh = noise_params_list

reduce_noise_optimized_closurized = reduce_noise_optimized_closure(noise_data)

reduce_noise_optimized_closurized(data)



## 2.3 Trim signal

Mainly at the beggining and end of the signal in order to reduce the length of the signal given later to the transcriber. It is mainly based on detecting when the power of the signal is below a given threshold. More advanced approaches based on the power at given frequencies could be used.

In [13]:
audio_noise_reduced_and_trimmed, index = librosa.effects.trim(reduced_noise.astype(float),top_db=20, ref=np.max, frame_length=512*4, hop_length=256*4)

In [14]:
recorded_filepath_noise_reduced_and_trimmed = "../audios/temp/recording_reduced_and_trimmed.wav"
wavfile.write(recorded_filepath_noise_reduced_and_trimmed, audio_params.sample_rate, audio_noise_reduced_and_trimmed.astype(np.int16))

### We also just trim the original signal for comparison

In [15]:
audio_trimmed, index = librosa.effects.trim(data.astype(float),top_db=20, ref=np.max, frame_length=512*4, hop_length=256*4)

recorded_filepath_trimmed = "../audios/temp/recording_trimmed.wav"
wavfile.write(recorded_filepath_trimmed, audio_params.sample_rate, audio_trimmed.astype(np.int16))


In [16]:
tabs_1 = get_audio_menu_wav_file(recorded_filepath)
tabs_2 = get_audio_menu_wav_file(recorded_filepath_noise_reduced)
tabs_3 = get_audio_menu_wav_file(recorded_filepath_trimmed)
tabs_4 = get_audio_menu_wav_file(recorded_filepath_noise_reduced_and_trimmed)

tabs = create_tabs([tabs_1, tabs_2, tabs_3, tabs_4], ["Original", "Filtered","Trimmed", "Filtered and Trimmered"])
display(tabs)

Tab(children=(Tab(children=(FigureWidget({
    'data': [{'line': {'color': 'deepskyblue'},
              'name…

## 2.4 Transcribe to know which approach is better

In [17]:
args = DeepSpeechArgs()

In [18]:
play_audio(audio_params, recorded_filepath)
transcribe(args, recorded_filepath)["sentence"]

'i'

In [19]:
play_audio(audio_params, recorded_filepath_noise_reduced)
transcribe(args, recorded_filepath_noise_reduced)["sentence"]

'i'

In [20]:
play_audio(audio_params, recorded_filepath_trimmed)
transcribe(args, recorded_filepath_trimmed)["sentence"]

'i'

In [21]:
play_audio(audio_params, recorded_filepath_noise_reduced_and_trimmed)
transcribe(args, recorded_filepath_noise_reduced_and_trimmed)["sentence"]

'i'

## Low pass filter to remove the fucking low frequency components noise

In [38]:
recorded_filepath = "../audios/temp/recording_filter_noise.wav"
record_audio(audio_params, seconds = 3, filename = recorded_filepath)


Recording
Finished recording


In [40]:
rate, data = wavfile.read(recorded_filepath)
data = data.astype(float)
reduced_noise = butter_highpass_filter(data, cutoff = 100, fs = 48000, order=5)
reduced_noise = reduced_noise.astype(np.int16)

recorded_filepath_noise_reduced = "../audios/temp/recording_filter_noise_reduced.wav"
wavfile.write(recorded_filepath_noise_reduced, audio_params.sample_rate, reduced_noise)

In [41]:
tabs_1 = get_audio_menu_wav_file(recorded_filepath)
tabs_2 = get_audio_menu_wav_file(recorded_filepath_noise_reduced)

tabs = create_tabs([tabs_1, tabs_2], ["Original", "Filtered"])
display(tabs)

Tab(children=(Tab(children=(FigureWidget({
    'data': [{'line': {'color': 'deepskyblue'},
              'name…

In [42]:

play_audio(audio_params, recorded_filepath)
play_audio(audio_params, recorded_filepath_noise_reduced)

# 3. Recording studio.

We have created a simple plotly UI to record and save the audios for the pyramidman assistant. This can be reused in the future for extension of capabilities.

In [26]:
## Global variables
mic = audio_params.get_microphone()
r = sr.Recognizer()

# Box with the recordings to show.
figure_box = widgets.Box([go.FigureWidget()], layout = {"width":"70%", "height":"600px"})

######### Panel widgets 

## Recording menu
recording_title_output = widgets.Output()
with recording_title_output:
    print("Recording options:")
duration_input = widgets.FloatText(value=4, description='Duration:', disabled=False)
offset_input = widgets.FloatText(value=1, description='Offset:', disabled=False)
button_record = widgets.Button(value=False, description='Start', button_style='', icon='check')
saving_file_name = widgets.Text(value="example.wav", description='Save file:', disabled=False)

## Files menu
file_title_output = widgets.Output()
with file_title_output:
    print("File options:")
folder_input = widgets.Text(value="../audios/temp/", description='Folder:', disabled=False, layout={"width":"200px","!padding-left":"0px"})
files_in_folder = get_folder_files(folder_input.value) 
files_dropdown = widgets.Dropdown(options= files_in_folder ,value=files_in_folder[0],description='',disabled=False, layout = {"width":"200px", "text-align":"center"})

button_play = widgets.Button(value=False, description='Play', button_style='', icon='check', layout = {"width":"50%"})
button_plot = widgets.Button(value=False, description='Plot', button_style='', icon='check', layout = {"width":"50%"})

## Preprocessing menu
processing_output = widgets.Output()
with processing_output:
    print("Processing options:")
noisy_audio_input = widgets.Text(value="noise.wav", description='Noise:', disabled=False)
reduce_noise_button = widgets.Button(value=False, description='Reduce noise', button_style='', icon='check')

ngrad_freq_input = widgets.IntText(value=2, description='ngrad_freq:', disabled=False, layout = {"width":"150px"})
ngrad_time_input = widgets.IntText(value=4, description='ngrad_time:', disabled=False, layout = {"width":"150px"})

fft_length_input = widgets.IntText(value=2048, description='fft_length:', disabled=False, layout = {"width":"150px"})
hop_length_input = widgets.IntText(value=512, description='hop_length:', disabled=False, layout = {"width":"150px"})

n_std_thresh_input = widgets.FloatText(value=1.0, description='n_std_thresh:', disabled=False,layout = {"width":"150px"})
prop_decrease_input = widgets.FloatText(value=0.8, description='prop_decrease:', disabled=False,layout = {"width":"150px"})

ngrad_filter_box = widgets.HBox([ngrad_freq_input, ngrad_time_input])
windows_length_box =  widgets.HBox([fft_length_input, hop_length_input])
threshold_prop_box = widgets.HBox([n_std_thresh_input,prop_decrease_input ])
# Create main Box 
play_plot_buttons_box = widgets.HBox([button_play, button_plot])
recording_box = widgets.VBox([recording_title_output,saving_file_name,duration_input,offset_input, button_record], layout={'border': '1px solid black'})
folder_box = widgets.VBox([file_title_output, folder_input, files_dropdown, play_plot_buttons_box],  layout={'border': '1px solid black'})
processing_box = widgets.VBox([processing_output,ngrad_filter_box,windows_length_box,threshold_prop_box, noisy_audio_input,reduce_noise_button],  layout={'border': '1px solid black'})

panel_box = widgets.VBox([folder_box, recording_box,processing_box])
recorder_box = widgets.HBox([panel_box, figure_box])

# Callback functions
def selected_filename():
    return folder_input.value + files_dropdown.value

def selected_noisy_filename():
    return folder_input.value + files_dropdown.value

def reduce_noise_callback(button):
    rate, audio_data = wavfile.read( selected_filename())
    rate, noisy_data = wavfile.read(selected_noisy_filename())
    audio_data = audio_data.astype(float)
    noisy_data = noisy_data.astype(float)

    reduced_noise = nr.reduce_noise(audio_clip = audio_data, noise_clip = noisy_data,
                                    n_grad_freq=ngrad_freq_input.value, n_grad_time=ngrad_time_input.value,
                                    n_fft=fft_length_input.value, win_length=fft_length_input.value, hop_length=hop_length_input.value,
                                    n_std_thresh=n_std_thresh_input.value, prop_decrease= prop_decrease_input.value,
                                    pad_clipping=True, verbose = False)
    
    reduced_noise = reduced_noise.astype(np.int16)
    
    reduced_noise_filename =  selected_filename().split(".wav")[0] + "_rn.wav"
    wavfile.write(reduced_noise_filename, audio_params.sample_rate, reduced_noise)
    
    files_in_folder = get_folder_files(folder_input.value) 
    files_dropdown.options= files_in_folder
    files_dropdown.value= reduced_noise_filename.split("/")[-1]
        
def play_button_callback(button):
    play_audio(audio_params, selected_filename())

def plot_file_callback(button):
    figure_box.children = [get_audio_menu_wav_file( selected_filename())]

def folder_input_submit_callback(folder_input):
    files_in_folder = get_folder_files(folder_input.value) 
    files_dropdown.options= files_in_folder
    if len(files_in_folder)>0:
        files_dropdown.value=files_in_folder[0]
    else:
        files_dropdown.value = None
    
def record_button_callback(button):
    if button.description == "Start":
        button.description = "Recording"
        with mic as source:
            audio = r.record(source,duration = duration_input.value, offset = offset_input.value)
            
        with open(folder_input.value + saving_file_name.value, "wb") as f:
            f.write(audio.get_wav_data())
        
        button.description = "Start"
        files_in_folder = get_folder_files(folder_input.value) 
        files_dropdown.options = files_in_folder
        files_dropdown.value= saving_file_name.value
    

# Assign callback functions
button_record.on_click(record_button_callback)
button_play.on_click(play_button_callback)
button_plot.on_click(plot_file_callback)
folder_input.on_submit(folder_input_submit_callback)
reduce_noise_button.on_click(reduce_noise_callback)

# Display recorder.
plot_file_callback(None)
display(recorder_box)

HBox(children=(VBox(children=(VBox(children=(Output(), Text(value='../audios/temp/', description='Folder:', la…

FileNotFoundError: [Errno 2] No such file or directory: '../audios/meeting_facilitator/'

TraitError: Invalid selection: value not found

TypeError: can only concatenate str (not "NoneType") to str

# 4. Listen in background

Create a thread that records in the background and puts the sentences read into queue that has as input the 

## 4.1 Tuning the parameters of the listening.

### Initializing data

We initialize a queue where the recorded sentences will be added from the recordings taken in another thread.
We give that queue to a closure that will return a function that stores the recordings in such queue.

In [22]:
recordings_queue = Queue()

put_data_in_queue_callback = put_data_in_queue_closure(recordings_queue)

## We start listening

The listen in thread function will call the listen() function in another thread and then apply the put_audio_data_in_queue_callback() to every sentence. So the sentences will be stored in the queue, to be processed later.

In [23]:
# Stop listening will stop the thread
stop_listening = listen_in_a_thread(r, mic, put_data_in_queue_callback, phrase_time_limit = 10)

### Consume the audios put in the queue

In [24]:
args = DeepSpeechArgs()
transcriber = lambda x:  transcribe(args, x)

def transcribe_queue(q, transcriber, folder_recordings = '../audios/temp/'):
    while(True):
        i = 0
        audio = q.get()
        filename_audio = f'{folder_recordings}{i}.wav'
        with open(filename_audio, "wb") as f:
            f.write(audio.get_wav_data())

        print("Transcribing...: ", end="")
        metadata = transcribe(args, filename_audio)
        sentence = metadata["sentence"]
        print(sentence)
        i+=1 

transcribe_queue(recordings_queue, transcriber)

Transcribing...: 

Loading model from file ../models/deepspeech/deepspeech-0.6.0-models/output_graph.pbmm
Loaded model in 0.0233s.
Loading language model from files ../models/deepspeech/deepspeech-0.6.0-models/lm.binary ../models/deepspeech/deepspeech-0.6.0-models/trie
Loaded language model in 0.000511s.
Running inference.
Inference took 4.725s for 30.464s audio file.


but peritonitis in hell we are evil jester and one
Transcribing...: 

Loading model from file ../models/deepspeech/deepspeech-0.6.0-models/output_graph.pbmm
Loaded model in 0.0183s.
Loading language model from files ../models/deepspeech/deepspeech-0.6.0-models/lm.binary ../models/deepspeech/deepspeech-0.6.0-models/trie
Loaded language model in 0.000966s.
Running inference.
Inference took 4.909s for 30.464s audio file.


as a being that the way i see waterskin then for he is there i guess i lassiter as rose
Transcribing...: 

Loading model from file ../models/deepspeech/deepspeech-0.6.0-models/output_graph.pbmm
Loaded model in 0.0201s.
Loading language model from files ../models/deepspeech/deepspeech-0.6.0-models/lm.binary ../models/deepspeech/deepspeech-0.6.0-models/trie
Loaded language model in 0.00023s.
Running inference.
Inference took 4.699s for 30.464s audio file.


the spectre geistlicher my fine his banking eleanor there theayter
Transcribing...: 

Loading model from file ../models/deepspeech/deepspeech-0.6.0-models/output_graph.pbmm
Loaded model in 0.0186s.
Loading language model from files ../models/deepspeech/deepspeech-0.6.0-models/lm.binary ../models/deepspeech/deepspeech-0.6.0-models/trie
Loaded language model in 0.000752s.
Running inference.
Inference took 4.868s for 30.464s audio file.


the utes pedometer is to latimer active my gratitude by
Transcribing...: 

Loading model from file ../models/deepspeech/deepspeech-0.6.0-models/output_graph.pbmm
Loaded model in 0.0181s.
Loading language model from files ../models/deepspeech/deepspeech-0.6.0-models/lm.binary ../models/deepspeech/deepspeech-0.6.0-models/trie
Loaded language model in 0.000522s.
Running inference.
Inference took 4.599s for 30.464s audio file.


as the aristote ordainer this coat carelesness
Transcribing...: 

Loading model from file ../models/deepspeech/deepspeech-0.6.0-models/output_graph.pbmm
Loaded model in 0.0239s.
Loading language model from files ../models/deepspeech/deepspeech-0.6.0-models/lm.binary ../models/deepspeech/deepspeech-0.6.0-models/trie
Loaded language model in 0.000992s.
Running inference.
Inference took 4.545s for 30.464s audio file.


as no
Transcribing...: 

Loading model from file ../models/deepspeech/deepspeech-0.6.0-models/output_graph.pbmm
Loaded model in 0.0065s.
Loading language model from files ../models/deepspeech/deepspeech-0.6.0-models/lm.binary ../models/deepspeech/deepspeech-0.6.0-models/trie
Loaded language model in 0.000155s.
Running inference.
Inference took 4.683s for 30.464s audio file.


were the fashion of the estate it is it
Transcribing...: 

Loading model from file ../models/deepspeech/deepspeech-0.6.0-models/output_graph.pbmm
Loaded model in 0.0182s.
Loading language model from files ../models/deepspeech/deepspeech-0.6.0-models/lm.binary ../models/deepspeech/deepspeech-0.6.0-models/trie
Loaded language model in 0.000563s.
Running inference.
Inference took 4.640s for 30.464s audio file.


but the ever theater amaravati beloeil iblees
Transcribing...: 

Loading model from file ../models/deepspeech/deepspeech-0.6.0-models/output_graph.pbmm
Loaded model in 0.0215s.
Loading language model from files ../models/deepspeech/deepspeech-0.6.0-models/lm.binary ../models/deepspeech/deepspeech-0.6.0-models/trie
Loaded language model in 0.00101s.
Running inference.
Inference took 4.683s for 30.464s audio file.


then there was like over the guests i see ye there the wonderstone tones
Transcribing...: 

Loading model from file ../models/deepspeech/deepspeech-0.6.0-models/output_graph.pbmm
Loaded model in 0.0196s.
Loading language model from files ../models/deepspeech/deepspeech-0.6.0-models/lm.binary ../models/deepspeech/deepspeech-0.6.0-models/trie
Loaded language model in 0.000797s.
Running inference.
Inference took 4.715s for 30.464s audio file.


o god to let lily only clear maple
Transcribing...: 

Loading model from file ../models/deepspeech/deepspeech-0.6.0-models/output_graph.pbmm
Loaded model in 0.0213s.
Loading language model from files ../models/deepspeech/deepspeech-0.6.0-models/lm.binary ../models/deepspeech/deepspeech-0.6.0-models/trie
Loaded language model in 0.000853s.
Running inference.
Inference took 4.677s for 30.464s audio file.


desperate the onely that i not but live greatly a male nurse nation neologist ariege in his 
Transcribing...: 

Loading model from file ../models/deepspeech/deepspeech-0.6.0-models/output_graph.pbmm
Loaded model in 0.0189s.
Loading language model from files ../models/deepspeech/deepspeech-0.6.0-models/lm.binary ../models/deepspeech/deepspeech-0.6.0-models/trie
Loaded language model in 0.000902s.
Running inference.


KeyboardInterrupt: 

### Stop listening

In [25]:
# calling this function requests that the background listener stop listening
stop_listening(wait_for_stop=False)
recordings_queue.empty()

True