# Step 1: Generating spectrograms from audio files (and some cleaning)

This script creates spectrograms from a list of input audio files.
#### The following minimal structure and files are required in the project directory:

    ├── audio
    │   ├── call_1.wav     <- call files (Mono, not stereo sound!)
    │   ├── call_2.wav     <- 
    │   ├── call_3.wav     <- 
    │   └── ...            <- 
    └── data               <- directory for the metadata file and all output files
        └── info_file.csv  <- a ";"-separated CSV file containing metadata about the calls

#### The following structure is required for info_file.csv:

The file must contain at least two columns, "filename" and "label". If labels are completely unknown, it should still contain a label column filled with an NA identifier (e.g. "unknown").

    | filename   | label   | ...    |  .... 
    -----------------------------------------
    | call_1.wav | alarm   |  ...   |  ....   
    | call_2.wav | contact |  ...   |  ....  
    | ...        |  ...    |  ...   |  ....   
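
A quick way to confirm that a metadata file meets these requirements is to load it with pandas and check for both columns. The sketch below uses a small in-memory example instead of a file on disk; the file contents and the helper name `check_info_file` are illustrative assumptions, not part of this project. The notebook itself reads the file with `index_col=[0]`, which assumes that an index column precedes `filename`.

```python
import io
import pandas as pd

REQUIRED_COLUMNS = {"filename", "label"}

def check_info_file(csv_source):
    """Load a ';'-separated metadata file and verify the required columns exist."""
    info = pd.read_csv(csv_source, sep=";")
    missing = REQUIRED_COLUMNS - set(info.columns)
    if missing:
        raise ValueError(f"info_file.csv is missing columns: {sorted(missing)}")
    return info

# Illustrative content only; a real file would list your own recordings.
example_csv = io.StringIO(
    "filename;label;indv\n"
    "call_1.wav;alarm;A\n"
    "call_2.wav;contact;B\n"
    "call_3.wav;unknown;A\n"
)
info = check_info_file(example_csv)
print(info.shape)
```

Additional columns (here the hypothetical `indv`) are carried along unchanged.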

#### The following files are generated in this script:

    ├── data               
    │   ├── df.pkl <- pickled pandas dataframe with metadata, raw_audio and spectrograms

## Import statements, constants and functions

In [5]:
import pandas as pd
import numpy as np
import pickle
import os
from pathlib import Path
import umap
import sys 
sys.path.insert(0, '..')

from functions.audio_functions import generate_mel_spectrogram, read_wavfile
from functions.preprocessing_functions import calc_zscore, pad_spectro

In [2]:
BANDPASS_FILTER = False  # should bandpass-filtered spectrograms be generated?
MEDIAN_SUB = False  # should median-subtracted spectrograms be generated (reduce impulse noise)?
STRETCH = False    # should time-stretched spectrograms be generated (all stretched to max. duration in dataset)?

LABEL_COL = "label"     # name of column that contains labels
NA_DESCRIPTORS = [0, np.nan, "NA", "na", "not available", # values that indicate "no label"
                  "None", "Unknown", "unknown", None, ""]     # add your NA descriptor if not yet in list
                                                     
NEW_NA_INDICATOR = "unknown" # all vocalizations without label will be relabelled as "unknown"

In [6]:
P_DIR = str(Path(os.getcwd()).parents[0])            # project directory
AUDIO_IN = os.path.join(os.path.sep, P_DIR, 'audio') # --> audio directory, contains audio (.wav) files
DATA = os.path.join(os.path.sep, P_DIR, 'data')      # --> data directory, contains info_file.csv; output files will be put here


# Check if directories are present

if not os.path.isdir(AUDIO_IN):
    print("No audio directory found")

if not os.path.isdir(DATA):
    os.mkdir(DATA)

## 1. Read in files

In [7]:
df = pd.read_csv(os.path.join(os.path.sep, DATA, 'info_file.csv'), sep=";", index_col=[0])

In [8]:
df

Unnamed: 0,filename,label,indv,ori_label
26,HM_HMB_R11_AUDIO_file_4_(2017_08_23-06_44_59)_...,al,HMB,ALARM
27,HM_HMB_R11_AUDIO_file_4_(2017_08_23-06_44_59)_...,al,HMB,ALARM
28,HM_HMB_R11_AUDIO_file_4_(2017_08_23-06_44_59)_...,al,HMB,ALARM
29,HM_HMB_R11_AUDIO_file_4_(2017_08_23-06_44_59)_...,al,HMB,ALARM
30,HM_HMB_R11_AUDIO_file_4_(2017_08_23-06_44_59)_...,al,HMB,ALARM
...,...,...,...,...
18941,HM_VHMF001_HTB_R20_20190707-20190719_file_7_(2...,cc,VHMF001,cc $
13340,HM_RT_R12_file_5_(2017_08_24-06_44_59)_ASWMUX2...,cc,RT,CC
32998,HM_VHMM023_MBLS_R02_20190707-20190719_file_10_...,cc,VHMM023,cc
29707,HM_VHMM014_LSTB_R19_20190707-20190719_file_8_(...,cc,VHMM014,cc


#### Check if all audio files are in AUDIO_IN directory

In [38]:
audiofiles = df['filename'].values
files_in_audio_directory = os.listdir(AUDIO_IN)

In [39]:
missing_files = list(set(audiofiles) - set(files_in_audio_directory))
if len(missing_files)>0:
    print(len(missing_files), "files with no matching audio in audio folder: ", missing_files)
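
The check above only reports missing files. If you would rather exclude those rows before loading any audio, something like the following sketch may help. It uses a temporary directory with empty placeholder files, and the names `meta`, `has_audio` and `meta_ok` are made up for this illustration.

```python
import os
import tempfile
import pandas as pd

with tempfile.TemporaryDirectory() as audio_dir:
    # Create two placeholder files; call_3.wav is deliberately absent.
    for name in ("call_1.wav", "call_2.wav"):
        open(os.path.join(audio_dir, name), "wb").close()

    meta = pd.DataFrame({"filename": ["call_1.wav", "call_2.wav", "call_3.wav"],
                         "label": ["alarm", "contact", "alarm"]})

    present = set(os.listdir(audio_dir))
    has_audio = meta["filename"].isin(present)       # True where a file exists
    missing = sorted(meta.loc[~has_audio, "filename"])
    meta_ok = meta.loc[has_audio].reset_index(drop=True)

print(missing)
```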

In [40]:
audio_filepaths = [os.path.join(os.path.sep, AUDIO_IN,x) for x in audiofiles]

## 2. Adding audio (and samplerate) to dataframe

In [41]:
raw_audio,samplerate_hz = map(list,zip(*[read_wavfile(x) for x in audio_filepaths]))

df['raw_audio'] = raw_audio
df['samplerate_hz'] = samplerate_hz

In [42]:
# Removing NA rows

nrows = df.shape[0]
df.dropna(subset=['raw_audio'], inplace=True)
print("Dropped ", nrows-df.shape[0], " rows due to missing/failed audio")

Dropped  0  rows due to missing/failed audio


## 3. Removing very long calls

It's advisable to remove very long calls, as all calls will be zero-padded to the maximum duration in the dataset.

In [43]:
# Extract duration of calls
df['duration_s'] = [x.shape[0] for x in df['raw_audio']]/df['samplerate_hz']

[Plotting the distribution of call durations can help to find a good cutoff.]

In [44]:
#%matplotlib inline
#import matplotlib.pyplot as plt
#n, bins, patches = plt.hist(df['duration_s'])

In our case, the dataset was already cleaned and all calls were between 0 and 0.5 s long, so there was no need to remove long calls. Set MIN_DUR and MAX_DUR to values that make sense for your dataset.
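
To get a feeling for how much a few long calls can inflate the zero-padding, here is a minimal sketch on synthetic durations. The 95th-percentile cutoff is only one possible rule of thumb and an assumption of this example, not a recommendation from this pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic durations in seconds: mostly short calls plus a few long outliers.
durations = np.concatenate([rng.uniform(0.05, 0.5, size=200), [2.0, 3.5, 5.0]])

max_dur = float(np.quantile(durations, 0.95))  # hypothetical cutoff rule
kept = durations[durations <= max_dur]

# Average fraction of each padded call that consists of zero-padding,
# before and after removing the long calls.
pad_before = 1 - durations.mean() / durations.max()
pad_after = 1 - kept.mean() / kept.max()
print(f"cutoff={max_dur:.2f}s, padding {pad_before:.0%} -> {pad_after:.0%}")
```

The padding fraction is computed as `1 - mean / max` because every call is padded up to the longest remaining call.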

In [45]:
MIN_DUR = 0  # --> minimum duration of calls in seconds
MAX_DUR = 0.5 # --> maximum duration of calls in seconds

In [46]:
print("Dropped ", df.loc[df['duration_s']<MIN_DUR,:].shape[0], "rows below ", MIN_DUR, "s (min_dur)")
df = df.loc[df['duration_s']>=MIN_DUR,:]
print("Dropped ", df.loc[df['duration_s']>MAX_DUR,:].shape[0], "rows above ", MAX_DUR, "s (max_dur)")
df = df.loc[df['duration_s']<=MAX_DUR,:]

Dropped  0 rows below  0 s (min_dur)
Dropped  0 rows above  0.5 s (max_dur)


## 4. Generate mel-spectrograms

In this step, spectrograms are generated from the audio files via the short-time Fourier transform (STFT). A spectrogram captures the frequency components of a signal over time: it is a 2D matrix in which each value represents the signal intensity in a specific frequency bin (row) and time bin (column). Here, the frequency axis is additionally mapped to the mel scale (a perceptually motivated scale that is approximately logarithmic at higher frequencies), and signal intensity is expressed in decibels.

The following parameters define how the spectrograms are computed. You can keep the defaults or choose your own values.
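
The idea can be illustrated with plain NumPy. The sketch below is not the implementation behind `generate_mel_spectrogram`; it computes an ordinary (linear-frequency) Hann-windowed STFT in decibels for a synthetic 1 kHz tone and separately shows one common Hz-to-mel formula (the HTK variant). The sample rate and all function names are assumptions for this example.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert Hz to mel (HTK formula; other variants exist)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def power_spectrogram_db(signal, n_fft, hop_length):
    """Hann-windowed STFT power in dB, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack([signal[i * hop_length : i * hop_length + n_fft] * window
                       for i in range(n_frames)], axis=1)
    power = np.abs(np.fft.rfft(frames, axis=0)) ** 2
    return 10.0 * np.log10(np.maximum(power, 1e-10))  # floor avoids log(0)

sr = 8000                                   # assumed sample rate in Hz
t = np.arange(int(0.5 * sr)) / sr           # 0.5 s of audio
tone = np.sin(2 * np.pi * 1000 * t)         # 1 kHz test tone

spec = power_spectrogram_db(tone, n_fft=240, hop_length=30)
freqs = np.fft.rfftfreq(240, d=1 / sr)      # centre frequency of each row in Hz
peak_hz = freqs[spec.mean(axis=1).argmax()]
print(spec.shape, peak_hz, hz_to_mel([500, 1000, 2000]))
```

Rows correspond to frequency bins and columns to time frames; the row with the highest average level sits at the frequency of the test tone.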

### 4.1. Set spectrogramming parameters

In [47]:
N_MELS = 40 # --> number of mel bins (usually 20-40)
            # The frequency bins are transformed to this number of logarithmically spaced mel bins.

FFT_WIN = 0.03 # --> length of audio chunk when applying STFT in seconds
               # FFT_WIN * samplerate = number of audio datapoints that go in one fft (=n_fft)

FFT_HOP = FFT_WIN/8 # --> hop_length in seconds
                    # FFT_HOP * samplerate = n of audio datapoints between successive ffts (=hop_length)

WINDOW = 'hann' # --> name of window function
                # each frame of audio is multiplied by a window function before the FFT; here the Hann window is used

FMIN = 0 # --> lower bound for frequency (in Hz) when generating Mel filterbank
FMAX = int(np.min(df['samplerate_hz'])/2) #--> upper bound for frequency (in Hz) when generating Mel filterbank
                                                 # this is set to 0.5 times the samplerate (the Nyquist frequency)
                                                 # If input files have different samplerates, the lowest samplerate is used
                                                 # so that the mel bins of all spectrograms cover the same frequency range.
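
Because FFT_WIN and FFT_HOP are given in seconds, it can help to see what they mean in samples. The following is a small worked example for an assumed sample rate of 8000 Hz. The rounding and the frame-count formula (centred framing) are assumptions of this sketch and may differ from what `generate_mel_spectrogram` does internally.

```python
FFT_WIN = 0.03          # window length in seconds (same value as in the cell above)
FFT_HOP = FFT_WIN / 8   # hop length in seconds
sr = 8000               # assumed sample rate in Hz; use your own

n_fft = round(FFT_WIN * sr)        # samples per FFT window
hop_length = round(FFT_HOP * sr)   # samples between successive windows
overlap = 1 - hop_length / n_fft   # fraction shared by neighbouring windows

n_samples = round(0.5 * sr)                # a 0.5 s call
n_frames = 1 + n_samples // hop_length     # assumes centred (padded) framing
freq_resolution_hz = sr / n_fft            # spacing of the linear FFT bins

print(n_fft, hop_length, overlap, n_frames, freq_resolution_hz)
```

A longer FFT_WIN gives finer frequency resolution but coarser time resolution, and a smaller FFT_HOP gives more (and more strongly overlapping) time frames per call.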

Save these parameters, as they will be needed by all subsequent scripts.

In [48]:
lines = ['N_MELS = '+str(N_MELS),
         'FFT_WIN = '+str(FFT_WIN),
         'FFT_HOP = '+str(FFT_HOP),
         'WINDOW = "'+str(WINDOW)+'"',
         'FMIN = '+str(FMIN),
         'FMAX = '+str(FMAX)]

with open(os.path.join(os.path.sep, P_DIR, 'spec_params.py'), 'w') as f:
    for line in lines:
        f.write(line)
        f.write('\n')
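
Since the saved file is plain Python, a later script can load it like a module. The sketch below writes an equivalent file to a temporary directory (standing in for P_DIR) and loads it with `importlib`; the parameter values, in particular `FMAX = 4000`, are example values only.

```python
import importlib.util
import os
import tempfile

lines = ["N_MELS = 40", "FFT_WIN = 0.03", "FFT_HOP = 0.00375",
         'WINDOW = "hann"', "FMIN = 0", "FMAX = 4000"]

with tempfile.TemporaryDirectory() as project_dir:   # stand-in for P_DIR
    params_path = os.path.join(project_dir, "spec_params.py")
    with open(params_path, "w") as f:
        f.write("\n".join(lines) + "\n")

    # Load the file as a module without modifying sys.path
    spec = importlib.util.spec_from_file_location("spec_params", params_path)
    spec_params = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(spec_params)

print(spec_params.N_MELS, spec_params.WINDOW, spec_params.FMAX)
```

If the project directory is already on `sys.path`, a plain `import spec_params` should work as well.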

### 4.2. Generate spectrograms

In [49]:
spectrograms = df.apply(lambda row: generate_mel_spectrogram(row['raw_audio'],
                                                                    row['samplerate_hz'],
                                                                    N_MELS,
                                                                    WINDOW,
                                                                    FFT_WIN,
                                                                    FFT_HOP,
                                                                    FMAX), 
                               axis=1)


df['spectrograms'] = spectrograms

In [50]:
# Removing NA rows

nrows = df.shape[0]
df.dropna(subset=['spectrograms'], inplace=True)
print("Dropped ", nrows-df.shape[0], " rows due to failed spectrogram generation")

Dropped  0  rows due to failed spectrogram generation


## [Optional: 5. Generate denoised spectrograms (median subtraction)]

In [51]:
if MEDIAN_SUB:
    df['denoised_spectrograms'] = [(spectrogram - np.median(spectrogram, axis=0)) for spectrogram in df['spectrograms']]
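
To see what this median subtraction does, consider a synthetic spectrogram with a narrow-band "call" and a broadband click that affects a single time frame. The array sizes and dB values below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_mels, n_frames = 40, 60
spectrogram = rng.normal(-60.0, 1.0, size=(n_mels, n_frames))  # background in dB
spectrogram[10:15, 20:40] += 30.0   # a narrow-band "call"
spectrogram[:, 50] += 25.0          # a broadband click in a single time frame

# axis=0 takes the median over frequency, giving one value per time frame
frame_median = np.median(spectrogram, axis=0)
denoised = spectrogram - frame_median   # broadcasts across all frequency rows

# Level of the click frame relative to a quiet frame, before and after
click_before = spectrogram[:, 50].mean() - spectrogram[:, 45].mean()
click_after = denoised[:, 50].mean() - denoised[:, 45].mean()
call_after = denoised[10:15, 20:40].mean()
print(round(click_before, 1), round(click_after, 1), round(call_after, 1))
```

The click raises every frequency bin of its frame and is therefore removed together with the frame's median, whereas the call occupies only a few bins per frame and remains clearly above the background.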

## [Optional: 6. Generate denoised spectrograms (bandpass filter)]

In [52]:
if BANDPASS_FILTER:
    from scipy.signal import butter, lfilter

    # Cutoff frequencies of the bandpass filter in Hz
    # (HIGHCUT must be below half the samplerate of every file)
    LOWCUT = 300.0
    HIGHCUT = 3000.0

    # Butter bandpass filter implementation:
    # from https://scipy-cookbook.readthedocs.io/items/ButterworthBandpass.html

    def butter_bandpass(lowcut, highcut, fs, order=5):
        nyq = 0.5 * fs
        low = lowcut / nyq
        high = highcut / nyq
        b, a = butter(order, [low, high], btype='band')
        return b, a

    def butter_bandpass_filter(data, lowcut, highcut, fs, order=5):
        b, a = butter_bandpass(lowcut, highcut, fs, order=order)
        y = lfilter(b, a, data)
        return y
    
    df['filtered_audio'] = df.apply(lambda row: butter_bandpass_filter(row['raw_audio'], 
                                                                   LOWCUT, 
                                                                   HIGHCUT, 
                                                                   row['samplerate_hz'],
                                                                   order=6),
                               axis=1)

    filtered_spectrograms = df.apply(lambda row: generate_mel_spectrogram(row['filtered_audio'],
                                                                        row['samplerate_hz'],
                                                                        N_MELS,
                                                                        WINDOW,
                                                                        FFT_WIN,
                                                                        FFT_HOP,
                                                                        FMAX), 
                                   axis=1)


    df['filtered_spectrograms'] = filtered_spectrograms
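
As a possible alternative to the `(b, a)` form used above, SciPy can also return the filter as second-order sections, which its documentation recommends for numerical stability at higher filter orders. The sketch below is a different technique from the cell above: it applies the filter forwards and backwards with `sosfiltfilt` (zero phase), whereas `lfilter` is a single causal pass. The test signal and sample rate are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_sos(data, lowcut, highcut, fs, order=6):
    """Zero-phase Butterworth bandpass using second-order sections."""
    sos = butter(order, [lowcut, highcut], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, data)

fs = 8000                                   # assumed sample rate in Hz
t = np.arange(fs) / fs                      # 1 s of audio
in_band = np.sin(2 * np.pi * 1000 * t)      # inside 300-3000 Hz
out_band = np.sin(2 * np.pi * 50 * t)       # below the 300 Hz cutoff
filtered = bandpass_sos(in_band + out_band, 300.0, 3000.0, fs)

# Compare the amplitude remaining at each frequency after filtering
spectrum = np.abs(np.fft.rfft(filtered))
freqs = np.fft.rfftfreq(len(filtered), d=1 / fs)
amp_1000 = spectrum[np.argmin(np.abs(freqs - 1000))]
amp_50 = spectrum[np.argmin(np.abs(freqs - 50))]
print(amp_1000 > amp_50)
```

The 1 kHz component passes almost unchanged, while the 50 Hz component is strongly attenuated.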

## [Optional: 7. Generate stretched spectrograms (all stretched to equal length)]

In [53]:
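if STRETCH:
    # NOTE: generate_stretched_mel_spectrogram is not imported at the top of this
    # notebook; import it from the module that defines it before enabling STRETCH.
    MAX_DURATION = np.max(df['duration_s'])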

    df['stretched_spectrograms'] = df.apply(lambda row: generate_stretched_mel_spectrogram(row['raw_audio'],
                                                                                           row['samplerate_hz'],
                                                                                           row['duration_s'],
                                                                                           N_MELS,
                                                                                           WINDOW,
                                                                                           FFT_WIN,
                                                                                           FFT_HOP,
                                                                                           MAX_DURATION),
                                            axis=1)

## 8. Clean labels

Convert all labels to strings and relabel all calls without a label as "unknown".

In [54]:
df['original_label'] = df[LABEL_COL] # original labels are saved in "original_label" column

df['label'] = [NEW_NA_INDICATOR if x in NA_DESCRIPTORS else x for x in df[LABEL_COL]]  # relabel all NA labels
labels = df['label'].fillna(NEW_NA_INDICATOR) # catches NaN values that the membership test above can miss
df['label'] = labels.astype(str) # transform to strings and save in df as "label" column
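
The same cleaning can also be written with vectorized pandas operations. This is a minimal sketch on a made-up label column. It uses a shortened, string-only list of NA descriptors and handles `None`/`NaN` through `isna()`; the numeric descriptor `0` from the list above is left out here.

```python
import numpy as np
import pandas as pd

NA_STRINGS = ["NA", "na", "not available", "None", "Unknown", "unknown", ""]
NEW_NA_INDICATOR = "unknown"

labels = pd.Series(["alarm", "NA", None, np.nan, "", "contact"])

# A label counts as missing if it is None/NaN or one of the NA strings
is_missing = labels.isna() | labels.isin(NA_STRINGS)
clean = labels.mask(is_missing, NEW_NA_INDICATOR).astype(str)

print(clean.tolist())
```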

## 9. Save dataframe

In [55]:
df.to_pickle(os.path.join(os.path.sep, DATA, 'df.pkl'))