## TinyML keyword spotting helper functions
Some functions useful for handling audio data, used in TinyML keyword spotting exercises.

### Load audio files

As I was having some problems with the HTML audio recording code, I recorded my keyword samples with Audacity 
and then loaded them with this little snippet.

How to use: 

```python
audio, sr = get_audio('my_file.wav')
```

In [None]:
from scipy.io.wavfile import read as wav_read
def get_audio(file):
    '''Return the audio data and the sample rate of a WAV file, in that order.'''
    sr, audio = wav_read(file)
    return audio, sr
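
Before feeding a recording to the model, it can help to check what was actually loaded. The sketch below assumes the array layout used by `scipy.io.wavfile.read` (one row per sample, one column per channel for multi-channel files); `describe_audio` is a hypothetical helper, not part of the original notebook, and the zero-filled array only stands in for a real recording.

```python
import numpy as np

def describe_audio(audio, sr):
    """Summarise an array shaped like the output of scipy.io.wavfile.read (hypothetical helper)."""
    n_samples = audio.shape[0]
    # Mono data has shape (n_samples,), multi-channel data (n_samples, n_channels)
    n_channels = 1 if audio.ndim == 1 else audio.shape[1]
    return {
        "sample_rate": sr,
        "channels": n_channels,
        "dtype": str(audio.dtype),
        "duration_s": n_samples / sr,
    }

# Synthetic stand-in for a recording: 1 second of 16-bit stereo silence at 44.1 kHz
fake_audio = np.zeros((44100, 2), dtype=np.int16)
info = describe_audio(fake_audio, 44100)
print(info)
```

With a real file, the call would be `describe_audio(*get_audio('my_file.wav'))`.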

### WAV file converter

With Audacity or similar software, the default settings are often CD quality (44.1 kHz stereo), or in some cases even 48 kHz stereo at 24 or 32 bits.  
This snippet converts a file to a lower (or higher) sample rate `resampled_sr` and/or mixes it down to a single (mono) channel when `mix2mono` is set, keeping the original sample data type (bit depth).

How to use:

```python
wave_converter('my_file1.wav', mix2mono=True)
# output file "resampled_my_file1.wav", mono, 16000 Hz sample rate

wave_converter('my_file2.wav', resampled_sr=8000, prefix="small_")
# output file "small_my_file2.wav", 8000 Hz sample rate, number of channels
# unchanged (e.g. a stereo file will remain stereo)
```


In [None]:
import scipy.io.wavfile as wavfile
import librosa

def wave_converter(filename, resampled_sr=16000, prefix="resampled_", mix2mono = False):

    '''
    Resample WAV soundfile to a different sample rate.
    
        Input: original sound file
        Output: resampled sound file
        Parameters: 
            - name of the file to be converted, 
            - destination sample rate, default = 16kHz
            - prefix to identify resampled files
            - mix to mono channel, default = False (leave as-is)
            
        Notes: for simplicity it needs to be run in the folder with the files we are converting
    '''    
    
    resampled_file = prefix + filename 
    origin_sr, origin_data = wavfile.read(filename)
    origin_type = origin_data.dtype
    
    # recent librosa releases accept the sample rates as keyword arguments only
    resampled_data = librosa.resample(origin_data.T.astype('float'), orig_sr=origin_sr, target_sr=resampled_sr) # transpose array to librosa shape
    if mix2mono:
        resampled_data = librosa.to_mono(resampled_data)        
    resampled_data = resampled_data.T.astype(origin_type) # transpose back to scipy.io.wavfile shape
    
    wavfile.write(resampled_file, resampled_sr, resampled_data)
    
    # print('Resampled wavefile saved to {}'.format(resampled_file))
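
One detail worth keeping in mind when casting back to the original data type: a resampling filter can produce values slightly outside the original range, and a plain `astype` to an integer type does not saturate such values. A minimal sketch of a safer cast, assuming integer PCM input, is shown below; `cast_like_original` is a hypothetical helper and not part of the converter above.

```python
import numpy as np

def cast_like_original(resampled, origin_dtype):
    """Convert float samples back to origin_dtype, clipping integer types to their valid range."""
    origin_dtype = np.dtype(origin_dtype)
    if np.issubdtype(origin_dtype, np.integer):
        limits = np.iinfo(origin_dtype)
        # Round to the nearest integer first, then saturate instead of overflowing
        resampled = np.clip(np.rint(resampled), limits.min, limits.max)
    return resampled.astype(origin_dtype)

# Values beyond the int16 range stand in for filter overshoot
overshoot = np.array([-40000.0, -1.4, 0.6, 32767.0, 40000.0])
safe = cast_like_original(overshoot, np.int16)
print(safe)
```

Float WAV data passes through unchanged apart from the type conversion, since the clipping only applies to integer types.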

### Batch file conversion

In case you want to try training the model with a lower sample rate (`DESTINATION_SR`), this script processes the entire dataset folder (`ORIGINAL_DIR`) and recreates its folder structure in another folder (`DESTINATION_DIR`) at the same level. 
The `wave_converter` function was modified to make batch processing easier.

When doing this in JupyterLab, it helps to raise the IOPub message rate limit (default 1000 msg/s) by setting the config option at launch:

```bash
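jupyter lab --NotebookApp.iopub_msg_rate_limit=100000
# on installs based on Jupyter Server (e.g. JupyterLab 3 and later) the option is --ServerApp.iopub_msg_rate_limit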
```



In [4]:
# set up the parameters

ORIGINAL_DIR = 'dataset' # this is where we store our subfolders with WAV files
PREFIX = '8k_' # prefix to distinguish the folder and also the processed files
DESTINATION_DIR = PREFIX+ORIGINAL_DIR
DESTINATION_SR = 8000 # desired sample rate in Hertz
FILE_TYPE = '.wav' # do not change this
MIX2MONO = False # if set to False, will leave files in their original format, either stereo or mono. Set to True will convert stereo files to mono

In [5]:
def wave_converter_v2(file_path, resampled_sr=16000, prefix="resampled_", mix2mono = False):

    '''
    Modified version of the wave_converter function, more suitable for batch processing.
    
    Resample WAV soundfile to a different sample rate.
    
        Input: original sound file path
        Output: resampled file name, sample rate and resampled sound data (in that order)
        Parameters: 
            - path of the file to be converted, 
            - destination sample rate, default = 16kHz
            - prefix to identify resampled files
            - mix to mono channel, default = False (leave as-is)        
    '''    
       
    origin_sr, origin_data = wavfile.read(file_path)
    origin_type = origin_data.dtype
    
    filename = os.path.split(file_path)[1] # get the actual file name
    resampled_file = prefix + filename 
    
    # recent librosa releases accept the sample rates as keyword arguments only
    resampled_data = librosa.resample(origin_data.T.astype('float'), orig_sr=origin_sr, target_sr=resampled_sr) # transpose array to librosa shape
    if mix2mono:
        resampled_data = librosa.to_mono(resampled_data)        
    resampled_data = resampled_data.T.astype(origin_type) # transpose back to scipy.io.wavfile shape
    
    return resampled_file, resampled_sr, resampled_data
    

In [6]:
import scipy.io.wavfile as wavfile
import librosa
import os

file_counter = 0
folder_counter = 0

if not os.path.exists(DESTINATION_DIR): # check if destination folder exists
    os.mkdir(DESTINATION_DIR)
    print('created {} folder'.format(DESTINATION_DIR))

for folder in os.listdir(ORIGINAL_DIR):
        
    if os.path.isdir(os.path.join(ORIGINAL_DIR,folder)): # process only folders
        
        folder_counter += 1
        
        for file in os.listdir(os.path.join(ORIGINAL_DIR,folder)):
            
            file_path = os.path.join(ORIGINAL_DIR, folder, file)
            
            if os.path.isfile(file_path) and file.endswith(FILE_TYPE): # check if it's a wav file

                resampled_file, resampled_sr, resampled_data = wave_converter_v2(
                                                                file_path, 
                                                                resampled_sr=DESTINATION_SR, 
                                                                prefix=PREFIX, 
                                                                mix2mono = MIX2MONO)

                if not os.path.exists(os.path.join(DESTINATION_DIR, folder)): # create a subfolder if necessary
                    os.mkdir(os.path.join(DESTINATION_DIR, folder))

                resampled_file_path = os.path.join(DESTINATION_DIR, folder, resampled_file)

                wavfile.write(resampled_file_path, resampled_sr, resampled_data)

                file_counter += 1
                
                # print("Saved converted file {}".format(resampled_file_path))
    else:
        pass

print("Finished processing {} files in {} folders.".format(file_counter, folder_counter))

created 8k_dataset folder




Finished processing 105835 files in 38 folders.
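
With a dataset of this size, it may be worth checking the destination paths before converting anything. The sketch below is a dry run that only builds the list of source and destination paths, using the same one-level folder layout and prefix convention as the loop above. `plan_conversion` is a hypothetical helper, and the demo runs on a throwaway folder with empty placeholder files rather than real audio.

```python
import tempfile
from pathlib import Path

def plan_conversion(original_dir, destination_dir, prefix, file_type=".wav"):
    """Yield (source, destination) path pairs, mirroring one level of subfolders."""
    original_dir, destination_dir = Path(original_dir), Path(destination_dir)
    for source in sorted(original_dir.glob("*/*" + file_type)):
        if source.is_file():
            yield source, destination_dir / source.parent.name / (prefix + source.name)

# Demo on a temporary folder tree; the files are empty placeholders
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset"
    for rel in ("yes/a.wav", "no/b.wav", "no/readme.txt"):
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).touch()
    plan = [(src.relative_to(tmp).as_posix(), dst.relative_to(tmp).as_posix())
            for src, dst in plan_conversion(root, Path(tmp) / "8k_dataset", "8k_")]

print(plan)
```

Nothing is written to the destination folder here, so a wrong prefix or folder name can be spotted before the long conversion run starts.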


### Batch inference testing

Helper script for batch-testing your own audio samples with the pretrained model from this notebook:

https://github.com/tinyMLx/colabs/blob/master/3-5-13-PretrainedModel.ipynb


__Assumptions:__

* your samples are saved in a folder (`SAMPLES_DIR`), and the file names follow the pattern `custom_a_cat01.wav` (the `custom_` prefix is what the script below matches), where:
    * `a` is the speaker identifier, 
    * `cat` is the actual keyword, 
    * `01` is the sample number, assuming we have multiple samples for each word.

* `run_tflite_inference_singleFile` function, `WANTED_WORDS` and `MODEL_TFLITE` variables are already defined.

* `run_tflite_inference_singleFile` function is modified to return the top_prediction_str and model_type variables:

```python
def run_tflite_inference_singleFile(tflite_model_path, custom_audio, sr_custom_audio, model_type="Float"):
  
  # (function code here) 
  #  
  # print('%s model guessed the value to be %s' % (model_type, top_prediction_str))

  return model_type, top_prediction_str # used in the batch script later
```
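
The file-name convention from the assumptions above can be checked on its own before running any inference. This is a minimal sketch: the pattern accepts any alphabetic prefix, which is an assumption made for illustration, and `parse_sample_name` is a hypothetical helper.

```python
import re

# Hypothetical pattern: alphabetic prefix, then speaker id, keyword and sample number
SAMPLE_NAME = re.compile(r'^[A-Za-z]+_(\w)_([a-z]+)(\d+)\.wav$')

def parse_sample_name(file_name):
    """Return (speaker, word, number), or None if the name does not follow the convention."""
    match = SAMPLE_NAME.match(file_name)
    return match.groups() if match else None

print(parse_sample_name('audio_a_cat01.wav'))
print(parse_sample_name('notes.txt'))
```

Returning `None` for names that do not match makes it easy to skip stray files instead of failing halfway through a batch.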

In [None]:
MODEL_TYPE = "Quantized"

SPEAKERS = {'a': 'Alice', 'b': 'Bob'}

SAMPLES_DIR = 'speech_custom'

import os
import re

for file in os.listdir(SAMPLES_DIR):
    file_path = os.path.join(SAMPLES_DIR, file)
    if os.path.isfile(file_path): # skip folders, process files only
        match = re.search(r'custom_(\w)_([a-z]+)(\d+)\.wav$', file) # extract info
        if match is None:
            print("File name {} is not in a correct format".format(file))
            continue # nothing to test for this file
        speaker, word, number = match.groups()
        audio, sample_rate = get_audio(file_path) # load the sample

        if word in WANTED_WORDS.split(','): # WANTED_WORDS and MODEL_TFLITE defined earlier

            model_type, top_prediction_str = run_tflite_inference_singleFile(MODEL_TFLITE, audio, sample_rate, model_type=MODEL_TYPE)

            if top_prediction_str.upper() == word.upper():
                result = "CORRECTLY"
            else:
                result = "INCORRECTLY"

            # fall back to the raw speaker id if it is not listed in SPEAKERS
            print('\nWord: {},\nSpeaker: {},\nSample number: {},\nFile: {}'.format(word.upper(), SPEAKERS.get(speaker, speaker), number, file))
        
            print("\n{} model guessed the value to be {}.".format(model_type, top_prediction_str.upper()))
        
            print('\nWord identified {}\n'.format(result))
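
When there are many samples, a summary per speaker is easier to read than the individual messages. The sketch below assumes the outcomes are collected as `(speaker, word, predicted)` tuples, which the loop above does not do as written; `summarise` is a hypothetical helper and the example data is made up for illustration, not real model output.

```python
from collections import Counter

def summarise(results):
    """results: iterable of (speaker, word, predicted) tuples; returns accuracy per speaker."""
    totals, hits = Counter(), Counter()
    for speaker, word, predicted in results:
        totals[speaker] += 1
        # Same case-insensitive comparison as in the batch loop
        hits[speaker] += int(predicted.upper() == word.upper())
    return {speaker: hits[speaker] / totals[speaker] for speaker in totals}

# Made-up outcomes for illustration only
example_results = [
    ('Alice', 'yes', 'yes'),
    ('Alice', 'no', 'yes'),
    ('Bob', 'no', 'NO'),
    ('Bob', 'yes', 'yes'),
]
accuracy = summarise(example_results)
print(accuracy)
```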