### INTRODUCTION
* Feature extraction is an essential step in analyzing audio and finding relationships between recordings. Models cannot work with raw audio directly, so feature extraction converts it into a compact numerical representation that preserves most of the relevant information. These features are the input to classification, prediction and recommendation algorithms.

### PACKAGES TO BE USED
* We’ll use librosa to analyze audio signals and extract features from them. To play audio directly in Jupyter, we’ll use IPython.display.

In [None]:
import os
import random

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.image as mpimg
plt.style.use('fivethirtyeight')
sns.set_style('whitegrid')

import librosa
import librosa.display
import IPython.display as ipd

import sklearn
from sklearn.preprocessing import minmax_scale

import warnings
warnings.filterwarnings('ignore')

In [None]:
train_folder = '../input/birdsong-recognition/train_audio'
# os.listdir returns entries in arbitrary order, so sort for a reproducible selection
birds = sorted(os.listdir(train_folder))[-5:]

In [None]:
sample_sound = {}

for bird in birds:
    folder = os.path.join(train_folder, bird)
    for path in os.listdir(folder):
        #get 1 sample sound per bird
        sample_sound[bird] = os.path.join(folder, path)
        break

In [None]:
sample_sound

### LET'S HEAR SOME BIRDS

In [None]:
print(birds[0], ' sample sound.')
ipd.Audio(sample_sound[birds[0]])

In [None]:
print(birds[1], ' sample sound.')
ipd.Audio(sample_sound[birds[1]])

In [None]:
print(birds[2], ' sample sound.')
ipd.Audio(sample_sound[birds[2]])

In [None]:
print(birds[3], ' sample sound.')
ipd.Audio(sample_sound[birds[3]])

In [None]:
print(birds[4], ' sample sound.')
ipd.Audio(sample_sound[birds[4]])

### LOAD THE SAMPLES AND CHECK INFO.

In [None]:
sound_data = {}

for bird, path in sample_sound.items():
    # librosa.load resamples to 22050 Hz and converts to mono by default
    y, sr = librosa.load(path)
    sound_data[bird] = {'y': y, 'sr': sr}

In [None]:
sound_data

In [None]:
for i in range(len(birds)):
    print('y shape: ', sound_data[birds[i]]['y'].shape)
    print('Sampling Rate (Hz): ', sound_data[birds[i]]['sr'])
    print('='*50)

### FEATURE EXTRACTION AND VISUALIZATION
---

### SOUNDWAVES
* A waveplot shows the amplitude of the signal over time, which gives a rough picture of how loud the audio is at any given moment.
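
A rough way to put a number on that loudness is the root-mean-square (RMS) amplitude of short frames. The sketch below is plain Python on a synthetic signal; `frame_rms`, the frame length and the test tones are illustrative assumptions, not part of this notebook's pipeline.

```python
import math

def frame_rms(samples, frame_len, hop):
    """Return the RMS amplitude of each frame (hypothetical helper)."""
    values = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        values.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return values

# Synthetic signal: a quiet tone followed by a loud tone (assumed amplitudes)
quiet = [0.1 * math.sin(2 * math.pi * t / 20) for t in range(200)]
loud = [0.8 * math.sin(2 * math.pi * t / 20) for t in range(200)]

rms = frame_rms(quiet + loud, frame_len=100, hop=100)
print(rms)  # small values for the quiet half, larger values for the loud half
```

The taller parts of a waveplot correspond to frames with a higher RMS value.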

In [None]:
#function to generate random color
def gen_color():
    color = "%06x" % random.randint(0, 0xFFFFFF)
    color = '#'+ color
    return color

In [None]:
fig, ax = plt.subplots(len(birds),1,figsize=(14,10))
plt.tight_layout(pad=3)

for i in range(len(birds)):
    # waveshow requires librosa >= 0.9 (waveplot was removed in librosa 0.10)
    librosa.display.waveshow(y = sound_data[birds[i]]['y'],
                             sr = sound_data[birds[i]]['sr'],
                             ax = ax[i], color = gen_color())
    ax[i].set_title(birds[i].capitalize())


### SPECTROGRAM
* A spectrogram is a visual representation of how the frequency content of a sound or other signal varies over time.

In [None]:
for i in range(len(birds)):
    # compute the short-time Fourier transform (STFT) of the signal
    sound_data[birds[i]]['stft'] = librosa.stft(sound_data[birds[i]]['y'])
    # convert the STFT magnitudes to decibels
    sound_data[birds[i]]['ydb'] = librosa.amplitude_to_db(np.abs(sound_data[birds[i]]['stft']))

*  The STFT (short-time Fourier transform) splits the signal into short overlapping frames and applies a Fourier transform to each one. This lets us determine the amplitude of each frequency at a given time in the audio signal.
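
To make the idea concrete, here is a minimal sketch of an STFT in plain Python: slice the signal into overlapping frames, window each frame, and take a DFT. The helper names, the 800 Hz sampling rate and the frame sizes are assumptions chosen for a small demo; `librosa.stft` uses much larger frames (2048 samples with a hop of 512 by default) and also pads the signal, which this sketch does not replicate.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def naive_stft(signal, frame_len, hop):
    """Hann-window each overlapping frame and take its DFT (hypothetical helper)."""
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len))
                    for i, x in enumerate(frame)]
        spectrogram.append(dft_magnitudes(windowed))
    return spectrogram

sr = 800         # assumed sampling rate for the demo
frame_len = 64   # gives a frequency resolution of sr / frame_len = 12.5 Hz
# Synthetic signal: 100 Hz for the first half, 200 Hz for the second half
signal = ([math.sin(2 * math.pi * 100 * t / sr) for t in range(400)] +
          [math.sin(2 * math.pi * 200 * t / sr) for t in range(400)])

spec = naive_stft(signal, frame_len=frame_len, hop=32)
# Strongest frequency in each frame, converted from bin index to Hz
peak_hz = [max(range(len(mags)), key=mags.__getitem__) * sr / frame_len for mags in spec]
print(peak_hz[0], peak_hz[-1])
```

The dominant frequency moves from the first tone to the second as the frames advance, which is exactly the time-frequency picture a spectrogram displays.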

In [None]:
#show data
sound_data[birds[1]]

In [None]:
fig, ax = plt.subplots(len(birds),1,figsize=(20,15))
plt.tight_layout(pad=3)

for i in range(len(birds)):
    librosa.display.specshow(sound_data[birds[i]]['ydb'],
                             sr = sound_data[birds[i]]['sr'],
                             x_axis='time', y_axis='hz',
                             ax = ax[i])
    ax[i].set_title(birds[i].capitalize())

### ZERO CROSSING RATE
* The zero crossing rate is the rate of sign-changes along a signal, i.e., the rate at which the signal changes from positive to negative or back. This feature has been used heavily in both speech recognition and music information retrieval.
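
As a small illustration of the definition, the sketch below counts sign changes in a synthetic tone and divides by the number of samples to get a rate. The function names and the 5 Hz test tone are assumptions made for this example.

```python
import math

def zero_crossing_count(samples):
    """Count sign changes between consecutive samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:])
               if (prev >= 0) != (cur >= 0))

def zero_crossing_rate(samples):
    """Fraction of samples at which the signal changes sign."""
    return zero_crossing_count(samples) / len(samples)

sr = 1000  # assumed sampling rate
# One second of a 5 Hz sine; the small phase offset avoids samples that are exactly zero
tone = [math.sin(2 * math.pi * 5 * t / sr + 0.1) for t in range(sr)]

print(zero_crossing_count(tone))  # a 5 Hz tone crosses zero twice per cycle
print(zero_crossing_rate(tone))
```

Higher-pitched or noisier sounds change sign more often, so they produce a higher rate.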

In [None]:
for i in range(len(birds)):
    # total number of zero crossings over the whole clip (a count, not a per-frame rate)
    sound_data[birds[i]]['zcr'] = librosa.zero_crossings(sound_data[birds[i]]['y'], pad=False).sum()

In [None]:
for i in range(len(birds)):
    print(birds[i].capitalize(), 'Zero Crossings (total count): ', sound_data[birds[i]]['zcr'])

### SPECTRAL CENTROID
*  The spectral centroid indicates where the "centre of mass" of the spectrum lies. It is the magnitude-weighted mean of the frequencies present in each frame, so a sound dominated by high frequencies has a higher centroid than one dominated by low frequencies.
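
As a rough illustration, the spectral centroid of a single frame can be computed as the magnitude-weighted mean of the bin frequencies. The function name and the two toy spectra below are assumptions made for this example.

```python
def spectral_centroid(freqs, mags):
    """Magnitude-weighted mean frequency of one spectral frame."""
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    return sum(f * m for f, m in zip(freqs, mags)) / total

# Hypothetical bin frequencies (Hz) and two toy magnitude spectra
freqs = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
low_heavy = [0.0, 4.0, 1.0, 0.0, 0.0]
high_heavy = [0.0, 0.0, 1.0, 4.0, 0.0]

print(spectral_centroid(freqs, low_heavy))   # 1200.0
print(spectral_centroid(freqs, high_heavy))  # 2800.0
```

Shifting the energy toward higher bins pulls the centroid upward.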

In [None]:
for i in range(len(birds)):
    sound_data[birds[i]]['spec_c'] = librosa.feature.spectral_centroid(y=sound_data[birds[i]]['y'], sr= sound_data[birds[i]]['sr'])[0]
    frames = range(len(sound_data[birds[i]]['spec_c']))
    # convert frame indices to timestamps using the clip's own sampling rate
    sound_data[birds[i]]['t_frame'] = librosa.frames_to_time(frames, sr=sound_data[birds[i]]['sr'])

In [None]:
fig, ax = plt.subplots(len(birds),1,figsize=(14,15))
plt.tight_layout(pad=3)

for i in range(len(birds)):
    # waveshow requires librosa >= 0.9 (waveplot was removed in librosa 0.10)
    librosa.display.waveshow(y = sound_data[birds[i]]['y'],
                             sr = sound_data[birds[i]]['sr'],
                             ax = ax[i], color = gen_color(),
                             label = 'SoundWave')
    # Normalising the spectral centroid for visualisation
    ax[i].plot(sound_data[birds[i]]['t_frame'], minmax_scale(sound_data[birds[i]]['spec_c'], axis=0), lw=1,
               label = 'Spectral Centroid')
    ax[i].set_title(birds[i].capitalize())
    ax[i].legend(loc ='upper left');

### SPECTRAL ROLLOFF
* Spectral rolloff is the frequency below which a specified percentage of the total spectral energy, e.g. 85%, lies.
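
The definition can be sketched for a single frame by accumulating energy from the lowest bin upward until the chosen percentage is reached. The function name, bin frequencies and energy values below are assumptions made for this example.

```python
def spectral_rolloff(freqs, energy, roll_percent=0.85):
    """Lowest bin frequency at which the cumulative energy reaches roll_percent of the total."""
    threshold = roll_percent * sum(energy)
    cumulative = 0.0
    for f, e in zip(freqs, energy):
        cumulative += e
        if cumulative >= threshold:
            return f
    return freqs[-1]

# Hypothetical bin frequencies (Hz) and per-bin energy of one frame
freqs = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
energy = [1.0, 5.0, 3.0, 0.5, 0.5]

print(spectral_rolloff(freqs, energy))                    # 2000.0
print(spectral_rolloff(freqs, energy, roll_percent=0.5))  # 1000.0
```

Lowering the percentage moves the rolloff point toward the bins where most of the energy sits.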

In [None]:
for i in range(len(birds)):
    sound_data[birds[i]]['spec_r'] = librosa.feature.spectral_rolloff(y=sound_data[birds[i]]['y'], sr= sound_data[birds[i]]['sr'])[0]
    frames = range(len(sound_data[birds[i]]['spec_r']))
    # convert frame indices to timestamps using the clip's own sampling rate
    sound_data[birds[i]]['tr_frame'] = librosa.frames_to_time(frames, sr=sound_data[birds[i]]['sr'])

In [None]:
fig, ax = plt.subplots(len(birds),1,figsize=(14,15))
plt.tight_layout(pad=3)

for i in range(len(birds)):
    # waveshow requires librosa >= 0.9 (waveplot was removed in librosa 0.10)
    librosa.display.waveshow(y = sound_data[birds[i]]['y'],
                             sr = sound_data[birds[i]]['sr'],
                             ax = ax[i], color = gen_color(),
                             label = 'SoundWave')
    # Normalising the spectral rolloff for visualisation
    ax[i].plot(sound_data[birds[i]]['tr_frame'], minmax_scale(sound_data[birds[i]]['spec_r'], axis=0), lw=1,
               label = 'Spectral Roll-off')
    ax[i].set_title(birds[i].capitalize())
    ax[i].legend(loc ='upper left');

### BPM

In [None]:
for i in range(len(birds)):
    # beat_track returns (tempo, beat_frames); keep only the estimated tempo
    sound_data[birds[i]]['bpm'] = librosa.beat.beat_track(y=sound_data[birds[i]]['y'], sr=sound_data[birds[i]]['sr'])[0]
    print(birds[i],' BPM: ',sound_data[birds[i]]['bpm'])

### HARMONIC AND PERCUSSIVE COMPONENTS

*  Harmonics - Partial tones that are whole multiples of the fundamental frequency.
*  Percussive component - The short, transient part of the signal, which represents the rhythm of the sound.

In [None]:
for i in range(len(birds)):
    sound_data[birds[i]]['y_harm'], sound_data[birds[i]]['y_perc'] = librosa.effects.hpss(sound_data[birds[i]]['y'])

In [None]:
fig, ax = plt.subplots(len(birds),1, figsize=(10,15))
plt.tight_layout(pad=3)
for i in range(len(birds)):
    ax[i].set_title(birds[i].capitalize())
    ax[i].plot(sound_data[birds[i]]['y_perc'], color= 'steelblue', lw=1);
    ax[i].plot(sound_data[birds[i]]['y_harm'], color= 'salmon', lw=1);


### CHROMA FREQUENCIES
*  Chroma-based features are a powerful tool for analyzing music whose pitches can be meaningfully categorized (often into twelve categories) and whose tuning approximates to the equal-tempered scale. One main property of chroma features is that they capture harmonic and melodic characteristics of music, while being robust to changes in timbre and instrumentation.
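
The idea of folding pitches into twelve categories can be sketched by mapping a frequency to its nearest equal-tempered note and discarding the octave. The function name and the A4 = 440 Hz tuning reference are assumptions made for this example; a chromagram additionally sums spectral energy per pitch class for every frame.

```python
import math

PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to one of the twelve pitch classes (octave is discarded)."""
    # Nearest MIDI note number in equal temperament, where A4 is note 69
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return PITCH_CLASSES[midi % 12]

print(pitch_class(440.0))   # A
print(pitch_class(880.0))   # A, one octave higher but the same chroma
print(pitch_class(261.63))  # C
```

Because octaves collapse onto the same class, chroma features stay stable when the same notes are played in a different register or by a different instrument.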

In [None]:
for i in range(len(birds)):
    base =  sound_data[birds[i]]
    base['ch_fr'] = librosa.feature.chroma_stft(y=base['y'], sr = base['sr'])
    print(birds[i], ' Chromagram Shape: ', base['ch_fr'].shape)

In [None]:
fig, ax = plt.subplots(len(birds), 1 , figsize=(10,15))
plt.tight_layout(pad=3)

for i in range(len(birds)):
    ax[i].set_title(birds[i].capitalize())
    librosa.display.specshow(sound_data[birds[i]]['ch_fr'], x_axis='time', y_axis='chroma', cmap='cividis', ax = ax[i])

### MFCC
* The mel frequency cepstral coefficients (MFCCs) of a signal are a small set of features (usually about 10–20) which concisely describe the overall shape of a spectral envelope.
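
Two ingredients of MFCCs can be sketched in plain Python: the mel scale, which compresses high frequencies, and a discrete cosine transform (DCT) that summarises log mel-band energies in a few coefficients. The sketch assumes the HTK-style mel formula and made-up band energies; `librosa.feature.mfcc` uses a different mel variant by default and its own filterbank and normalisation, so the numbers will not match.

```python
import math

def hz_to_mel(freq_hz):
    """HTK-style mel scale (an assumption; other mel variants exist)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II, keeping only the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]

print(hz_to_mel(1000.0))  # close to 1000 mel by construction of the scale

# Hypothetical log energies of 8 mel bands for one frame
flat_envelope = [2.0] * 8
tilted_envelope = [4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5]

print(dct_ii(flat_envelope, 4))    # only the first coefficient is non-zero
print(dct_ii(tilted_envelope, 4))  # the spectral tilt shows up in the second coefficient
```

The first coefficient tracks overall energy, while the following ones describe progressively finer detail in the shape of the spectral envelope.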

In [None]:
for i in range(len(birds)):
    base =  sound_data[birds[i]]
    base['mfcc'] = librosa.feature.mfcc(y=base['y'], sr = base['sr'])
    print(birds[i], ' MFCC Shape: ', base['mfcc'].shape)

In [None]:
fig, ax = plt.subplots(len(birds), 1 , figsize=(10,15))
plt.tight_layout(pad=3)

for i in range(len(birds)):
    ax[i].set_title(birds[i].capitalize())
    # the rows are MFCC coefficient indices, not frequencies, so no frequency axis is applied
    librosa.display.specshow(sound_data[birds[i]]['mfcc'], x_axis='time', cmap='viridis', ax = ax[i])

### REFERENCE
* https://towardsdatascience.com/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis-615e286fcbbc
* https://en.wikipedia.org