# Extracting Features

---

## Imports and Functions

In [1]:
import numpy as np
import pandas as pd
import librosa

In [2]:
SAMPLE_RATE = 22_050
DURATION = 6
SIGNAL_SHAPE = SAMPLE_RATE * DURATION

def collect_signal(file_name):
    file_dir = '../audiofiles/'
    file_path = file_dir + file_name
    
    try:
        signal = librosa.load(file_path, sr = SAMPLE_RATE, duration = DURATION)[0]
    except Exception:
        # unreadable file: use a length-1 placeholder so it can be filtered out later
        signal = [0]
        
    padded_signal = pad_signal(signal)
    
    return padded_signal
    
    
def pad_signal(signal):
    if len(signal) != 1:
        num_missing = SIGNAL_SHAPE - len(signal)
        signal = np.pad(array = signal, pad_width = (0, num_missing), mode = 'constant')
    
    return signal


def to_mfcc(signal):
    mfcc = librosa.feature.mfcc(y = signal, n_mfcc = 20, n_fft = 2048, hop_length = 512)
    
    return mfcc


def to_melspec(signal):
    melspec = librosa.feature.melspectrogram(y = signal, n_mels = 128, n_fft = 2048, hop_length = 512)
    log_melspec = librosa.power_to_db(melspec)
    
    return log_melspec

---

## Generate Padded Signal

In [3]:
wav = pd.read_csv('../data/files_and_labels.csv')
wav.head()

Unnamed: 0,file_name,id,sound
0,f0003_0_cough.wav,f0003,0
1,f0003_0_laughter.wav,f0003,1
2,f0003_0_sigh.wav,f0003,2
3,f0003_0_sneeze.wav,f0003,3
4,f0003_0_sniff.wav,f0003,4


Our goal is to extract the Mel Frequency Cepstral Coefficients (MFCCs) and Mel Spectrograms from the audio files to feed into different convolutional neural networks. Both MFCCs and Mel Spectrograms are generated from the signal data (the waveform represented as a list of $y$-values), so we must first extract the signal from each audio file. We must also trim or pad the signals so they all have the same length, because the neural networks expect inputs of the same shape.

Since six seconds seems like a sufficient amount of time for the audio to play the labeled sound, we will trim all audio data to six seconds. Any audio data that is shorter than six seconds will be padded with silence until it is six seconds long. The function `collect_signal` takes in the file name and does exactly this: extract the signal and trim or pad it accordingly.
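The trim-or-pad step can be illustrated on a toy array without loading any audio. This is only a minimal sketch: `fit_to_length` and `TARGET_LEN` are hypothetical names used for illustration, and the slice stands in for the trimming that `librosa.load` performs through its `duration` argument in the notebook.

```python
import numpy as np

TARGET_LEN = 10  # hypothetical small target; the notebook uses SAMPLE_RATE * DURATION


def fit_to_length(signal, target_len=TARGET_LEN):
    """Trim a 1-D signal to target_len samples, or right-pad it with zeros (silence)."""
    signal = np.asarray(signal, dtype=float)[:target_len]  # trim if too long
    num_missing = target_len - len(signal)                 # 0 if already long enough
    return np.pad(signal, pad_width=(0, num_missing), mode='constant')


short_signal = fit_to_length(np.ones(4))   # padded with 6 zeros
long_signal = fit_to_length(np.ones(25))   # trimmed to 10 samples

print(len(short_signal), len(long_signal))
```

Both results have the same length, which is the property the networks rely on.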

In [4]:
# extract signal from each file
# then trim or pad to six seconds
wav['signal'] = wav['file_name'].apply(collect_signal)

Below we see that there are 39 audio files that `librosa` failed to read. Attempting to open these files in a generic media player also fails. The files may be corrupted or in an unsupported format. Since they make up such a small portion of our data, it feels safe to simply drop them from our dataset.

In [6]:
# files that librosa failed to read
failed_to_read = wav[wav['signal'].str.len() == 1]
failed_to_read

Unnamed: 0,file_name,id,sound,signal
108,f0066_0_cough.wav,f0066,0,[0]
189,f0098_0_sneeze.wav,f0098,3,[0]
1524,f0593_1_cough.wav,f0593,0,[0]
1526,f0593_1_sigh.wav,f0593,2,[0]
1529,f0593_1_throatclearing.wav,f0593,5,[0]
6606,f2298_0_cough.wav,f2298,0,[0]
6607,f2298_0_laughter.wav,f2298,1,[0]
6608,f2298_0_sigh.wav,f2298,2,[0]
6609,f2298_0_sneeze.wav,f2298,3,[0]
6610,f2298_0_sniff.wav,f2298,4,[0]


In [7]:
failed_to_read.shape

(39, 4)

In [8]:
# drop audio files that were unreadable
# (their signal is the placeholder [0], which has length 1)
wav = wav[wav['signal'].str.len() != 1].copy()
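The length-based filter can be checked on a toy DataFrame. This is a sketch under assumptions: the file names and signals below are made up for illustration, and `.map(len)` is used here in place of the notebook's `.str.len()` to compute each element's length.

```python
import numpy as np
import pandas as pd

# hypothetical toy data: two readable signals and one [0] placeholder
toy = pd.DataFrame({
    'file_name': ['a.wav', 'b.wav', 'c.wav'],
    'signal': [np.zeros(8), [0], np.ones(8)],
})

lengths = toy['signal'].map(len)      # length of each stored signal
readable = toy[lengths != 1].copy()   # keep only rows with a real signal

print(readable['file_name'].tolist())
```

Only the rows whose signal is longer than one sample survive the filter.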

## Generate MFCCs and Mel Spectrograms

In [20]:
mfccs = wav['signal'].apply(to_mfcc)

In [21]:
mfccs = np.array(mfccs.to_list())
np.save(file = '../data/mfccs.npy', arr = mfccs)

In [10]:
melspecs = wav['signal'].apply(to_melspec)

In [15]:
melspecs = np.array(melspecs.to_list())
np.save(file = '../data/melspecs.npy', arr = melspecs)
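As a sanity check on the saved arrays, the expected per-file feature shape can be worked out by hand. This sketch assumes `librosa`'s default centered framing, under which a signal of `n_samples` samples yields `1 + n_samples // hop_length` frames; the variable names are illustrative only.

```python
SAMPLE_RATE = 22_050
DURATION = 6
HOP_LENGTH = 512   # hop_length used in to_mfcc and to_melspec
N_MFCC = 20        # n_mfcc used in to_mfcc
N_MELS = 128       # n_mels used in to_melspec

n_samples = SAMPLE_RATE * DURATION       # samples per padded signal
n_frames = 1 + n_samples // HOP_LENGTH   # frames per signal, assuming centered framing

mfcc_shape = (N_MFCC, n_frames)
melspec_shape = (N_MELS, n_frames)

print(n_samples, n_frames, mfcc_shape, melspec_shape)
```

Under that assumption, stacking the per-file features with `np.array` would give arrays of shape `(number of files, 20, 259)` and `(number of files, 128, 259)`.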

In [19]:
wav[['sound']].to_csv('../data/target.csv', index = False)