<a href="https://colab.research.google.com/github/mtedder/AudioAI-Project/blob/master/notebooks/AudioAI_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Vinyl Audio QC AI/ML Model Pipeline

This notebook prepares the data, extracts features, and builds an ML model to detect quality control defects in vinyl records. The goal is to detect the following audio QC defects:

*   Skips
*   Jumps
*   Sticks
*   Intrusive background noise

Reference Links:

* [Podcast talk with Librosa creator Brian McFee](https://twimlai.com/twiml-talk-263-librosa-audio-and-music-processing-in-python-with-brian-mcfee/)
* [audio-analysis](https://www.ntirawen.com/2018/12/audio-analysis-using-deep-learning.html)
* [tensorflow-sound-classification](https://www.iotforall.com/tensorflow-sound-classification-machine-learning-applications/)
* [Audio-content-representations](https://www.researchgate.net/figure/Audio-content-representations-On-the-top-a-digital-audio-signal-is-illustrated-with-its_fig2_319700841)
* [Music Feature Extraction in Python](https://towardsdatascience.com/extract-features-of-music-75a3f9bc265d)
* [Spectrogram, Cepstrum and Mel-Frequency](https://archive.org/details/SpectrogramCepstrumAndMel-frequency_636522)
* [Audio Voice Processing Deep Learning](https://www.analyticsvidhya.com/blog/2017/08/audio-voice-processing-deep-learning/)
* [Music Classification](https://medium.com/@sdoshi579/classification-of-music-into-different-genres-using-keras-82ab5339efe0)

Audio Software Libraries
* [Librosa](https://pypi.org/project/librosa/)
* [PyAudio](https://pypi.org/project/PyAudio/)

In [0]:
#Uncomment this code if you need the Colaboratory TensorFlow version to match the versions available in Google Cloud ML
#!pip install tensorflow==1.12.0

##1. Imports

In [0]:
#Constants
ANNOTATIONS_DIR = "annotations"
DATA_DIR = "data"
RAW_DATA_DIR = "raw_sound_files"
DATA_CSV_FILE = "data.csv"

In [0]:
#Create required directories
!mkdir annotations
!mkdir data
!mkdir raw_sound_files

In [0]:
#Get annotations json files from repo storage
!wget -P annotations/ https://raw.githubusercontent.com/mtedder/AudioAI-Project/master/annotations/01Label.json
!wget -P annotations/ https://raw.githubusercontent.com/mtedder/AudioAI-Project/master/annotations/02Label.json
!wget -P annotations/ https://raw.githubusercontent.com/mtedder/AudioAI-Project/master/annotations/03Label.json
!wget -P annotations/ https://raw.githubusercontent.com/mtedder/AudioAI-Project/master/annotations/04Label.json
!wget -P annotations/ https://raw.githubusercontent.com/mtedder/AudioAI-Project/master/annotations/05Label.json
!wget -P annotations/ https://raw.githubusercontent.com/mtedder/AudioAI-Project/master/annotations/06Label.json

In [0]:
#Get raw sound file clips from repo storage
!wget -P raw_sound_files/ https://raw.githubusercontent.com/mtedder/AudioAI-Project/master/raw_sound_files/01Label.wav
!wget -P raw_sound_files/ https://raw.githubusercontent.com/mtedder/AudioAI-Project/master/raw_sound_files/02Label.wav
!wget -P raw_sound_files/ https://raw.githubusercontent.com/mtedder/AudioAI-Project/master/raw_sound_files/03Label.wav
!wget -P raw_sound_files/ https://raw.githubusercontent.com/mtedder/AudioAI-Project/master/raw_sound_files/04Label.wav
!wget -P raw_sound_files/ https://raw.githubusercontent.com/mtedder/AudioAI-Project/master/raw_sound_files/05Label.wav
!wget -P raw_sound_files/ https://raw.githubusercontent.com/mtedder/AudioAI-Project/master/raw_sound_files/06Label.wav

In [0]:
# Ref: https://librosa.github.io/librosa/index.html
import librosa
import librosa.display
import numpy as np
import pandas as pd # data processing, json file I/O (e.g. pd.read_json)
import os
import csv

# import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Conv1D, MaxPooling1D, LSTM, Dense, Input, SimpleRNN, TimeDistributed, Flatten, Dropout
from sklearn.preprocessing import LabelEncoder
# Import the `pyplot` module
import matplotlib.pyplot as plt

In [0]:
#!python --version
# print("Tensorflow version" + tf.__version__)
# import keras;
# print(keras.__version__)


##2. Data Preparation

In [0]:
# %cd annotations
# !rm -rf annotations/.ipynb_checkpoints
# !ls -al
# %cd ../

###Read annotation JSON files & Extract features and Labels

In [0]:
dataset = []
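for filename in os.listdir(ANNOTATIONS_DIR):
  # Each annotation file lists audio clip paths and their labels
  dataframe = pd.read_json(os.path.join(ANNOTATIONS_DIR, filename))
  for index, row in dataframe.iterrows():
      label = row['labels']
      # 2.97 s at 22050 Hz yields 128 frames at librosa's default hop length of 512
      y, sr = librosa.load(row['path'], mono=True, sr=22050, duration=2.97)
      # Keyword arguments are required by recent librosa releases
      S = librosa.feature.melspectrogram(y=y, sr=sr)
      x = S.reshape(128, 128, 1)  # (n_mels, frames, channels)
      dataset.append((x, label))

# Keep dataset as a plain list: the (array, label) pairs are ragged and
# cannot be converted into a single NumPy array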
#split into input and labels arrays
X, Y = zip(*dataset)
x_test = np.array(X)
Y_test = np.array(Y)

In [0]:
# Replace this code when more data is available
# Create batch of sequences for training input - repeated dummy input for testing pipeline
# Stack six copies of x_test along a new leading (batch) axis
X_test = np.array([x_test] * 6)

In [0]:
# Normalize data?
##
##

In [0]:
# Create test and training datasets
#Return a random sample of items from an axis of object - http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sample.html#pandas.DataFrame.sample
# train_dataset = dataset.sample(frac=0.5,random_state=0)

# train_labels = train_dataset.pop('labels')

#Drop specified labels from rows or columns with the given indices (index) - http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html#pandas.DataFrame.drop
# test_dataset = dataset.drop(train_dataset.index)

#get labels from the features test dataset
# test_labels = test_dataset.pop('labels')

**One-Hot Encode the labels**
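
As a rough illustration of what the following cell does, integer encoding followed by one-hot encoding can be sketched in plain NumPy. The label strings here are hypothetical stand-ins for the values stored in the annotation files. `np.unique` returns the classes in sorted order, which is the same ordering `LabelEncoder` uses.

```python
import numpy as np

# Hypothetical labels; the real values come from the annotation JSON files
labels = np.array(["skip", "jump", "stick", "noise", "skip"])

# Integer encode: classes are sorted, codes index into that sorted list
classes, integer_encoded = np.unique(labels, return_inverse=True)

# One-hot encode: pick one row of the identity matrix per integer code
one_hot = np.eye(len(classes), dtype="float32")[integer_encoded]

print(classes)          # sorted class names
print(integer_encoded)  # one integer code per label
print(one_hot.shape)    # (number of labels, number of classes)
```

Each row of `one_hot` contains a single 1 in the column of its class, which is the target format `categorical_crossentropy` expects.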

In [0]:
# integer encode
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(Y_test)
print(integer_encoded)

Y_test = np.array(keras.utils.to_categorical(integer_encoded, 4))

##Feature Engineering
[librosa ref functions](https://github.com/mtedder/AudioAI-Project/blob/master/notebooks/audioai_playground.ipynb)

[audio ai isolating vocals](https://towardsdatascience.com/audio-ai-isolating-vocals-from-stereo-music-using-convolutional-neural-networks-210532383785)

###Load File

In [0]:
# Load a 2.97 second sample of the desired audio file
FILE_NAME = "01Label.wav"
FILE_PATH = "raw_sound_files/"

y, sr = librosa.load(FILE_PATH+FILE_NAME, mono=True, sr=22050, duration=2.97)#default sample rate 22050Hz & duration affects t axis bin size in spectrograms
# y, sr = librosa.load(FILE_PATH, mono=True, sr=16384, duration=2.97)#down sample to 16384Hz & duration  affects t axis bin size in spectrograms
# y, sr = librosa.load(FILE_PATH, mono=True, sr=256, duration=2.97)#down sample to 256Hz & duration  affects t axis bin size in spectrograms


print(y.shape)
print(sr)

###STFT Plot

Short-time Fourier transform (STFT)
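
The shape of the STFT output follows from the clip length and the transform settings. The sketch below works through that arithmetic, assuming librosa's defaults (`n_fft=2048`, `hop_length=512`, centered frames) and assuming the loaded clip holds exactly `int(2.97 * 22050)` samples. It shows why a 2.97 second clip produces 128 time frames, which is what allows the mel spectrogram to be reshaped to 128 x 128.

```python
sr = 22050          # sample rate used in librosa.load
duration = 2.97     # clip length in seconds
n_fft = 2048        # librosa.stft default window size (assumed)
hop_length = 512    # librosa.stft default hop, n_fft // 4 (assumed)

n_samples = int(duration * sr)          # samples in the clip
n_frames = 1 + n_samples // hop_length  # frames with centered windows
n_freq_bins = 1 + n_fft // 2            # rows of the STFT matrix

print(n_samples, n_frames, n_freq_bins)
```

A longer clip would produce more than 128 frames, so the fixed `reshape(128, 128, 1)` used during data preparation only works for this duration and hop length.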

In [0]:
D = np.abs(librosa.stft(y))
db = librosa.amplitude_to_db(D,ref=np.max)
# Display a spectrogram
# Make a new figure
plt.figure(figsize=(24,8))

librosa.display.specshow(db,y_axis='log', x_axis='time')
plt.title('Power spectrogram')
plt.colorbar(format='%+2.0f dB')
plt.tight_layout()

###Mel spectrogram Plot
[Librosa demo](https://nbviewer.jupyter.org/github/librosa/librosa/blob/master/examples/LibROSA%20demo.ipynb)

In [0]:
###Mel spectrogram
# Ref: http://librosa.github.io/librosa/generated/librosa.feature.melspectrogram.html#librosa-feature-melspectrogram

# Let's make and display a mel-scaled power (energy-squared) spectrogram
# S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
# Pass the signal by keyword; recent librosa releases no longer accept it positionally
S = librosa.feature.melspectrogram(y=y, sr=sr)

# Convert to log scale (dB). We'll use the peak power (max) as reference.
log_S = librosa.power_to_db(S, ref=np.max)

# Make a new figure
plt.figure(figsize=(24,8))

# Display the spectrogram on a mel scale
# sample rate and hop length parameters are used to render the time axis
# librosa.display.specshow(log_S, sr=sr, x_axis='time', y_axis='mel')
librosa.display.specshow(log_S, sr=sr, x_axis='time', y_axis='mel')

# Put a descriptive title on the plot
plt.title('mel power spectrogram')

# draw a color bar
plt.colorbar(format='%+02.0f dB')

# Make the figure layout compact
plt.tight_layout()

###Extract chroma_stft, rmse, spec_cent, spec_bw, rolloff & mfcc and store in csv file
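
The CSV row is written without a header, so the meaning of each column is implicit in the write order. A minimal sketch of a matching header row is shown here. The column names are hypothetical, and the count of 20 MFCC columns assumes the default `n_mfcc` of `librosa.feature.mfcc`.

```python
import csv
import io

N_MFCC = 20  # librosa.feature.mfcc returns 20 coefficients by default

# Hypothetical column names, in the same order the feature row is written
header = ["filename", "chroma_stft", "rmse", "spectral_centroid",
          "spectral_bandwidth", "rolloff", "zero_crossing_rate"]
header += [f"mfcc{i}" for i in range(1, N_MFCC + 1)]
header.append("label")

# Write to an in-memory buffer here; the notebook would write to data/data.csv
buffer = io.StringIO()
csv.writer(buffer).writerow(header)

print(len(header))
print(buffer.getvalue())
```

Writing this row once, before any feature rows are appended, would let `pd.read_csv` recover named columns later.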

In [0]:
#Extract audio features from the loaded sound file and append them as one row to the data.csv file
# Ref: https://medium.com/@sdoshi579/classification-of-music-into-different-genres-using-keras-82ab5339efe0
# Processes only the file loaded above (FILE_NAME), not every row of the annotation dataframe
# Strip file extension from file name
json_file = os.path.splitext(FILE_NAME)[0]
chroma_stft = librosa.feature.chroma_stft(y=y, sr=sr)
# librosa.feature.rmse was renamed to librosa.feature.rms in librosa 0.7
rmse = librosa.feature.rms(y=y)
spec_cent = librosa.feature.spectral_centroid(y=y, sr=sr)
spec_bw = librosa.feature.spectral_bandwidth(y=y, sr=sr)
rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
zcr = librosa.feature.zero_crossing_rate(y)
mfcc = librosa.feature.mfcc(y=y, sr=sr)
# Row layout: file name, mean of each feature over time, mean of each MFCC coefficient, label
features = [json_file, np.mean(chroma_stft), np.mean(rmse), np.mean(spec_cent),
            np.mean(spec_bw), np.mean(rolloff), np.mean(zcr)]
features += [np.mean(e) for e in mfcc]
# NOTE: label still holds the value of the last annotation row read during data preparation
features.append(label)
#Save features and labels to csv file
with open(os.path.join(DATA_DIR, DATA_CSV_FILE), 'a', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(features)

##3. Feature Extraction

##4. Build Model

[Based on this paper](https://www.researchgate.net/publication/319700841_A_Tutorial_on_Deep_Learning_for_Music_Information_Retrieval)

Choi, Keunwoo & Fazekas, György & Cho, Kyunghyun & Sandler, Mark. (2017). A Tutorial on Deep Learning for Music Information Retrieval. 

Model architecture from section 5.7.2 of the reference: CRNN: c2 -p2 -c2 -p2 -r1 -r2 -d1
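
To see how the convolution and pooling stages shrink each 128 x 128 spectrogram before it reaches the recurrent layer, the output sizes can be worked out by hand. The sketch below assumes the layer settings used in the model cell of this notebook (5 x 5 kernels with stride 1, `same` padding for the first convolution and `valid` for the second, 4 x 2 pooling). The helper names are hypothetical.

```python
def conv_out(size, kernel, padding):
    """Output length of a stride-1 convolution along one axis."""
    return size if padding == "same" else size - kernel + 1

def pool_out(size, pool):
    """Output length of non-overlapping max pooling along one axis."""
    return size // pool

h, w = 128, 128                                          # one spectrogram
h, w = conv_out(h, 5, "same"), conv_out(w, 5, "same")    # conv1, 24 filters
h, w = pool_out(h, 4), pool_out(w, 2)                    # pool1
h, w = conv_out(h, 5, "valid"), conv_out(w, 5, "valid")  # conv2, 48 filters
h, w = pool_out(h, 4), pool_out(w, 2)                    # pool2

# Length of the flattened vector the LSTM receives at every time step
features_per_step = h * w * 48
print(h, w, features_per_step)
```

The flattened size gives a sense of how wide the LSTM input is, which matters when choosing the number of recurrent units.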

**LSTM**

The input to every LSTM layer must be three-dimensional.

The three dimensions of this input are:

* Samples. One sequence is one sample. A batch is comprised of one or more samples.
* Time Steps. One time step is one point of observation in the sample.
* Features. One feature is one observation at a time step.
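
The three dimensions listed above can be illustrated with a small NumPy reshape. The sizes used here (6 sequences, 10 time steps, 3 features) are hypothetical and chosen only to make the layout easy to read.

```python
import numpy as np

# Hypothetical data: 6 sequences, each with 10 time steps of 3 features
flat = np.arange(6 * 10 * 3, dtype="float32")

# Reshape to (samples, time steps, features) as an LSTM layer expects
lstm_input = flat.reshape(6, 10, 3)

print(lstm_input.shape)
print(lstm_input[0, 0])  # the 3 features observed at the first time step
```

In this notebook the convolutional layers are wrapped in `TimeDistributed`, so the model input carries two extra image dimensions and the LSTM only sees a three-dimensional tensor after the per-step `Flatten`.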


[Sequence Classification with LSTM Recurrent Neural Networks in Python with Keras](https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/)

[CNN LSTM Model](https://machinelearningmastery.com/cnn-long-short-term-memory-networks/)

[CNN+LSTM ](https://towardsdatascience.com/get-started-with-using-cnn-lstm-for-forecasting-6f0f4dde5826)

[Time Series Analysis with LSTM using Python's Keras Library](https://stackabuse.com/time-series-analysis-with-lstm-using-pythons-keras-library/)

[How to Reshape Input Data for Long Short-Term Memory Networks in Keras](https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/)

[Introduction to 1D Convolutional Neural Networks in Keras for Time Sequences](https://blog.goodaudience.com/introduction-to-1d-convolutional-neural-networks-in-keras-for-time-sequences-3a7ff801a2cf)

[understanding lstm](https://towardsdatascience.com/understanding-lstm-and-its-quick-implementation-in-keras-for-sentiment-analysis-af410fd85b47)

[audio ai isolating vocals](https://towardsdatascience.com/audio-ai-isolating-vocals-from-stereo-music-using-convolutional-neural-networks-210532383785)

[audio-classifier-convNet notebook](https://github.com/ajhalthor/audio-classifier-convNet/blob/master/env_sound_discrimination.ipynb)

**Videos**

[Sound play with Convolution Neural Networks](https://youtu.be/GNza2ncnMfA)

[Convolution Neural Networks - EXPLAINED](https://youtu.be/m8pOnJxOcqY)

[A friendly introduction to Convolutional Neural Networks and Image Recognition](https://youtu.be/2-Ol7ZB0MmU)

In [0]:
#Create Model
#Model - groups layers into an object with training and inference features.


input_shape = (None, 128,128,1)#(None, 128,128,1)#x.shape#x_test.shape#X_test.shape
print(input_shape)

#Sequential - Linear stack of layers.
model = Sequential()

# Layer 1 - c2 2D convolution NN & 2D pooling layer
# Wrap in a TimeDistributed layer to feed the LSTM
# Ref:https://keras.io/layers/convolutional/
# TimeDistributed - https://keras.io/layers/wrappers/
model.add(TimeDistributed(Conv2D(24, (5, 5), strides=(1, 1), padding='same', name='conv1', activation='relu'), name='input_layer', input_shape=input_shape))

# p1 2D pooling layer
# Ref:https://keras.io/layers/pooling/
model.add(TimeDistributed(MaxPooling2D(pool_size = (4,2), strides=(4,2),name='pool1'), name='layer2'))

# Layer 2 - c2 2D convolution NN & 2D pooling layer
# Ref:https://keras.io/layers/convolutional/
model.add(TimeDistributed(Conv2D(48, (5, 5), strides=(1, 1), padding='valid', name='conv2', activation='relu'), name='layer3'))

# p2 2D pooling layer
# Ref:https://keras.io/layers/pooling/
model.add(TimeDistributed(MaxPooling2D(pool_size = (4,2), strides=(4,2),name='pool2'), name='layer4'))

# Prepare output from previous MaxPooling2D to input into lstm
model.add(TimeDistributed(Flatten(), name='layer5'))

# Layer 3 - r1 1D recurrent NN
# Ref: https://keras.io/layers/recurrent/
model.add(LSTM(50, name='layer6', activation='relu'))

# Layer 4 - r2 2D recurrent NN
# Ref: https://keras.io/layers/recurrent/
# model.add(LSTM(50, name='lstm2', activation='relu'))

# Layer 5 - begin vanilla feed forward NN
model.add(Dense(64, name='layer7', activation='relu'))#dense hidden layer
model.add(Dropout(rate=0.5))

# Layer 6 - output class layer
model.add(Dense(4, name='output_layer', activation='softmax'))#output layer

#Compile Configures the model for training - https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#compile
model.compile(
	optimizer="Adam",
	loss="categorical_crossentropy",
	metrics=['accuracy','categorical_crossentropy'])

#print a simple description of the model - https://www.tensorflow.org/api_docs/python/tf/keras/models/Model#summary
model.summary()  # summary() prints the table itself and returns None

**Model Sanity Check**

In [0]:
#Run an inference on the untrained model using dummy data to test model plumbing
#Input needs to be 5 dimensional: (samples, time steps, height, width, channels)
# result = model.predict(X_test.reshape([-1, 6,128,128,1]))#-1 is used to infer one missing length from the other
result = model.predict(X_test)
print(result)

##5. Train Model

In [0]:
#Call the model.fit function
#Train using fit which Trains the model for a fixed number of epochs (iterations on a dataset) - https://www.tensorflow.org/api_docs/python/tf/keras/models/Model#fit
epoch = 10  # more epochs usually lower the training loss but take longer and can overfit; 10000 takes a while to train
history = model.fit(X_test, Y_test, batch_size=1, epochs=epoch)  # Keras 2 renamed nb_epoch to epochs

##6. Evaluate Model
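
The two metrics tracked during training can be reproduced by hand, which helps when reading the plots in this section. The sketch below computes categorical cross-entropy and accuracy in NumPy for two hypothetical predictions. The numbers are made up for illustration and are not results from this model.

```python
import numpy as np

# Hypothetical one-hot targets and softmax outputs for two clips
y_true = np.array([[0, 0, 1, 0],
                   [1, 0, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.2, 0.6, 0.1],
                   [0.3, 0.4, 0.2, 0.1]])

# Categorical cross-entropy: mean over samples of -sum(y_true * log(y_pred))
loss = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Accuracy: fraction of samples whose most probable class is the true class
accuracy = np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1))

print(loss, accuracy)
```

The second clip is misclassified, so the accuracy is one half while the loss also reflects how confident each prediction was.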

In [0]:
#Visualize the model's training progress
#History.history attribute is a record of training loss values and metrics values at successive epochs, as well as validation loss values and validation metrics values (if applicable)
#https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/History
hist = pd.DataFrame(history.history)
# Add epoch column to hist dataframe
hist['epoch'] = history.epoch

hist.tail()

In [0]:
#Is this model good? Visualize model performance,
import matplotlib.pyplot as plt

# Plot categorical crossentropy (loss) vs. Epochs
plt.figure()
plt.xlabel('Epoch')
plt.ylabel('categorical crossentropy')
plt.plot(hist['epoch'], hist['categorical_crossentropy'], label='categorical crossentropy')
plt.legend()
# plt.ylim([0,1.25])

# Plot accuracy vs. Epochs
plt.figure()
plt.xlabel('Epoch')
plt.ylabel('accuracy')
# The accuracy key is 'acc' in older Keras releases and 'accuracy' in newer ones
acc_key = 'acc' if 'acc' in hist.columns else 'accuracy'
plt.plot(hist['epoch'], hist[acc_key], label = 'accuracy')
plt.legend()

#Test model using the test data and evaluate
#Returns the loss value & metrics values for the model in test mode - https://www.tensorflow.org/api_docs/python/tf/keras/models/Model#evaluate
print(model.metrics_names)

# evaluate - Returns the loss value & metrics values for the model in test mode.
# Named eval_result to avoid shadowing the built-in eval()
eval_result = model.evaluate(x=X_test, y=Y_test, verbose=0)
print(eval_result)
# print("Testing Loss/Error: {:S} Out".format(loss))

##7. Inference – Make Predictions
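
`model.predict` returns one row of softmax probabilities per input, so a further step is needed to turn each row into a label. A minimal sketch of that step is shown here, using hypothetical class names and hypothetical probabilities. In the notebook, the fitted `label_encoder` could perform the index-to-name mapping through `inverse_transform`.

```python
import numpy as np

# Hypothetical class names, in the sorted order a label encoder would assign
class_names = np.array(["jump", "noise", "skip", "stick"])

# Hypothetical softmax output for three clips (each row sums to 1)
probabilities = np.array([
    [0.10, 0.20, 0.60, 0.10],
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])

predicted_index = probabilities.argmax(axis=1)   # most likely class per clip
predicted_label = class_names[predicted_index]   # map indices back to names
confidence = probabilities.max(axis=1)           # probability of that class

for name, p in zip(predicted_label, confidence):
    print(f"{name}: {p:.2f}")
```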

In [0]:
#Run an inference on the trained model using the same dummy data
# X_test = X_test.reshape([-1, 6,128,128,1])#-1 is used to infer one missing length from the other
result = model.predict(X_test)
print(result)

#Deploy to ML Cloud Engine

##8. Prepare Model for Saving

##9. Upload model to existing GCS bucket

##Upload GOOGLE_APPLICATION_CREDENTIALS (Optional)

In [0]:
#Upload GOOGLE_APPLICATION_CREDENTIALS json file from local computer and save to this notebook
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

##Set GOOGLE_APPLICATION_CREDENTIALS environment variable (Optional)

In [0]:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "YOUR_CREDENTIALS_FILE.json"  # placeholder: INSERT YOUR CREDENTIALS FILENAME HERE!!

##10. Request online prediction from deployed model