# Base Model

Summary: This notebook examines the base model developed by vijay033, whose code is available in the Git repo https://github.com/vijay033/Noise-Suppression-Auto-Encoder.

The original code was disorganized, with repeated library imports and unclear descriptions of what each step does. This notebook cleans up the original and adds clearer descriptions of how the author built the model, which will serve as the baseline for comparing our group's models.

Note: the author originally included two models, titled Architecture 1 and Architecture 2. He discarded the first, advising a smaller network with lower latency instead. Architecture 1 has therefore been removed here, and only Architecture 2 is used as the baseline.

## Dependencies 

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

import math
from PIL import Image
import time
import random
from scipy.stats import norm

%matplotlib inline
import IPython.display
from ipywidgets import interact, interactive, fixed

from scipy.io import wavfile
from scipy.signal import butter, lfilter
import scipy.ndimage

import gzip
import copy
import os
import glob
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

import keras
from keras.layers import Input,Conv2D,MaxPooling2D,UpSampling2D,concatenate
from keras.layers import BatchNormalization
from keras.layers import Dropout
from keras.models import Model
from keras.optimizers import SGD, Adam, RMSprop 
import keras.layers as layers
import keras.models as models
from keras.initializers import orthogonal
import tensorflow as tf # used to display GPU count
from tensorflow import keras
from tensorflow.python.keras.saving import saving_utils as _saving_utils
from tensorflow.python.framework import convert_to_constants as convert_variables_to_constants_v2

print(tf.version.VERSION)

2024-04-26 18:51:52.191615: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-26 18:51:52.860214: I tensorflow/core/platform/cpu_feature_guard.cc:183] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX, in other operations, rebuild TensorFlow with the appropriate compiler flags.


2.12.0


In [2]:
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

Num GPUs Available:  2


## Data

Data comes from Kaggle: the Libri Speech Noise Dataset (https://www.kaggle.com/datasets/7e537768e8abf483cb224e60d10b73c0e9b8620556c5797556724a27c3f508c4/data).

Paths to the processed data (.wav files converted to .npy)

In [3]:
# paths to processed data 
path_mat_train = '/data/csc6621/24-team-c/dataset/LibriNoise_Train_Test_NPY/mat_train/'
path_mat_test = '/data/csc6621/24-team-c/dataset/LibriNoise_Train_Test_NPY/mat_test/'
path_mat_ytrain = '/data/csc6621/24-team-c/dataset/LibriNoise_Train_Test_NPY/mat_ytrain/'
path_mat_ytest = '/data/csc6621/24-team-c/dataset/LibriNoise_Train_Test_NPY/mat_ytest/'

Collecting the file paths of the training and test data (inputs and targets) for model input

In [4]:
train_image = []
path = path_mat_train
for filename in tqdm(glob.glob(os.path.join(path, '*.npy'))):
    train_image.append(os.path.join(filename))
    
ytrain_image = []
path = path_mat_ytrain
for filename in tqdm(glob.glob(os.path.join(path, '*.npy'))):
    ytrain_image.append(os.path.join(filename))
    
test_image = []
path = path_mat_test
for filename in tqdm(glob.glob(os.path.join(path, '*.npy'))):
    test_image.append(os.path.join(filename))
    
ytest_image = []
path = path_mat_ytest
for filename in tqdm(glob.glob(os.path.join(path, '*.npy'))):
    ytest_image.append(os.path.join(filename))

100%|██████████| 7000/7000 [00:00<00:00, 1396903.99it/s]
100%|██████████| 7000/7000 [00:00<00:00, 511955.36it/s]
100%|██████████| 105/105 [00:00<00:00, 782241.42it/s]
100%|██████████| 105/105 [00:00<00:00, 850196.76it/s]
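The generators further down pair inputs and targets by list index, so both lists need to be in the same order. `glob.glob` makes no ordering guarantee, so a minimal sketch of a safer listing step is shown here. It assumes each input file and its target share the same file name in their respective folders, which this notebook does not confirm; `list_npy` and `names_match` are hypothetical helper names.

```python
import glob
import os
import tempfile

import numpy as np


def list_npy(directory):
    """Return the .npy paths in `directory`, sorted for a reproducible order."""
    return sorted(glob.glob(os.path.join(directory, '*.npy')))


def names_match(x_paths, y_paths):
    """True when both lists contain the same file names in the same order."""
    x_names = [os.path.basename(p) for p in x_paths]
    y_names = [os.path.basename(p) for p in y_paths]
    return x_names == y_names


# Demo on temporary folders (assumption: inputs and targets share file names).
with tempfile.TemporaryDirectory() as root:
    x_dir = os.path.join(root, 'x')
    y_dir = os.path.join(root, 'y')
    os.makedirs(x_dir)
    os.makedirs(y_dir)
    for name in ('b.npy', 'a.npy', 'c.npy'):
        np.save(os.path.join(x_dir, name), np.zeros(2))
        np.save(os.path.join(y_dir, name), np.zeros(2))
    demo_x = list_npy(x_dir)
    demo_y = list_npy(y_dir)
    demo_ok = names_match(demo_x, demo_y)
```

If the check fails on the real folders, the pairing would need to be built from whatever naming scheme the conversion step actually used.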


Dimensions of the input data

In [5]:
ROW = 257
COL = 62

INPUT_DIM = (ROW,COL,3)
INPUT_DIM[:2]

(257, 62)

## Functions

Global variables needed by the functions that follow

In [6]:
FFT_LENGTH = 512
WINDOW_LENGTH = 512
WINDOW_STEP = int(WINDOW_LENGTH / 2)
# fixed bounds used to scale magnitude and phase to and from pixel values
phaseMax = 3.141592653589793 
phaseMin = -3.141592653589793
magnitudeMax = 2211683.973249525
magnitudeMin = 0.0
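The constants above tie directly to the input dimensions. The short calculation below is a sketch using only values that appear in this notebook (`COL = 62` from the dimensions cell and the 16000 Hz rate used in the callback); the variable names `n_bins`, `n_samples`, and `duration_s` are introduced here for illustration.

```python
FFT_LENGTH = 512
WINDOW_LENGTH = 512
WINDOW_STEP = WINDOW_LENGTH // 2
COL = 62        # number of spectrogram columns (time frames)
RATE = 16000    # sampling rate used when writing WAV files

# A real FFT of length N yields N // 2 + 1 frequency bins, which matches ROW.
n_bins = FFT_LENGTH // 2 + 1

# Length of the signal rebuilt from COL half-overlapping frames
# (same expression as in recoverSignalFromSpectrogram).
n_samples = WINDOW_LENGTH * COL // 2 + WINDOW_STEP

# Duration of one spectrogram in seconds.
duration_s = n_samples / RATE

print(n_bins, n_samples, duration_s)
```

So each 257 x 62 image corresponds to roughly one second of audio.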

In [7]:
def amplifyMagnitudeByLog(d):
    """
    Function takes a magnitude value and returns a log-amplified version of it.
    
    Parameters:
    - d: magnitude value
    
    Returns:
    - amplified magnitude value
    
    """
    return 188.301 * math.log10(d + 1)

def weakenAmplifiedMagnitude(d):
    """
    Function inverts amplifyMagnitudeByLog, recovering the magnitude from its log-amplified value. 
    
    Parameters:
    - d: amplified magnitude value
    
    Returns:
    - recovered magnitude value
    """
    
    return math.pow(10, d/188.301)-1

def generateLinearScale(magnitudePixels, phasePixels, magnitudeMin, magnitudeMax, phaseMin, phaseMax):
    """
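    Function encodes magnitude and phase arrays as an RGB spectrogram image that is suitable for visualization.
    Magnitude is scaled to [0, 510], log-amplified, and split across the red channel (up to 255) and the green
    channel (the remainder). Phase is scaled to [0, 255] and stored in the blue channel.
    Note that magnitudePixels and phasePixels are modified in place.
    
    Returns:
    - spectrogram image as a uint8 array of shape (height, width, 3)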
    
    """
    
    height = magnitudePixels.shape[0]
    width = magnitudePixels.shape[1]
    magnitudeRange = magnitudeMax - magnitudeMin
    phaseRange = phaseMax - phaseMin
    rgbArray = np.zeros((height, width, 3), 'uint8')
    
    for w in range(width):
        for h in range(height):
            magnitudePixels[h,w] = (magnitudePixels[h,w] - magnitudeMin) / (magnitudeRange) * 255 * 2
            magnitudePixels[h,w] = amplifyMagnitudeByLog(magnitudePixels[h,w])
            phasePixels[h,w] = (phasePixels[h,w] - phaseMin) / (phaseRange) * 255
            red = 255 if magnitudePixels[h,w] > 255 else magnitudePixels[h,w]
            green = (magnitudePixels[h,w] - 255) if magnitudePixels[h,w] > 255 else 0
            blue = phasePixels[h,w]
            rgbArray[h,w,0] = int(red)
            rgbArray[h,w,1] = int(green)
            rgbArray[h,w,2] = int(blue)
    return rgbArray

def recoverLinearScale(rgbArray, magnitudeMin, magnitudeMax, phaseMin, phaseMax):
    """
    Function is the inverse of generateLinearScale. Takes an RGB spectrogram array and
    reconstructs the original magnitude and phase values. 
    
    Returns:
    - reconstructed magnitude and phase arrays 
    
    """
    width = rgbArray.shape[1]
    height = rgbArray.shape[0]
    magnitudeVals = rgbArray[:,:,0].astype(float) + rgbArray[:,:,1].astype(float)
    phaseVals = rgbArray[:,:,2].astype(float)
    phaseRange = phaseMax - phaseMin
    magnitudeRange = magnitudeMax - magnitudeMin

    
    for w in range(width):
        for h in range(height):
            phaseVals[h,w] = (phaseVals[h,w] / 255 * phaseRange) + phaseMin
            magnitudeVals[h,w] = weakenAmplifiedMagnitude(magnitudeVals[h,w])
            magnitudeVals[h,w] = (magnitudeVals[h,w] / (255*2) * magnitudeRange) + magnitudeMin
    return magnitudeVals, phaseVals
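The constant 188.301 is not explained in the original code. A quick check suggests it was chosen so that the log curve maps the scaled range [0, 510] back onto [0, 510], since 188.301 * log10(511) is approximately 510. The sketch below re-implements the two scalar helpers under shorter names (`amplify`, `weaken`) and adds a hypothetical `split_red_green` helper that mirrors how the amplified value is divided between the red and green channels.

```python
import math

SCALE = 188.301


def amplify(d):
    """Same formula as amplifyMagnitudeByLog."""
    return SCALE * math.log10(d + 1)


def weaken(d):
    """Same formula as weakenAmplifiedMagnitude."""
    return math.pow(10, d / SCALE) - 1


def split_red_green(value):
    """Mirror of the channel split: red holds up to 255, green holds the excess."""
    red = 255 if value > 255 else value
    green = (value - 255) if value > 255 else 0
    return int(red), int(green)


top = amplify(510)                  # close to 510
round_trip = weaken(amplify(123.0))  # close to 123.0
channels = split_red_green(300.7)    # truncation to integers loses the fraction
print(top, round_trip, channels)
```

Because pixel values are truncated to integers, the round trip through the image is lossy even though the two log functions are exact inverses of each other.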

In [8]:
def recoverSignalFromSpectrogram(numpyarray):
    """
    Function recovers the time-domain signal from an RGB spectrogram.
    
    Parameters:
    - numpyarray: spectrogram data stored as a NumPy array of shape (height, width, 3). 
    
    Returns: 
    - recovered: recovered signal as an int16 array.
    """

    data = np.array(numpyarray, dtype='uint8')
    
    # get spectrogram width and height 
    width = data.shape[1]
    height = data.shape[0]
    
    # Calling recoverLinearScale to recover magnitude and phase values from normalized spectrogram data
    magnitudeVals, phaseVals = recoverLinearScale(data, magnitudeMin, magnitudeMax, phaseMin, phaseMax)
    

    # output buffer for `width` half-overlapping frames of WINDOW_LENGTH samples
    recovered = np.zeros(WINDOW_LENGTH * width // 2 + WINDOW_STEP, dtype=np.int16)
    
    # iterate over each column (time frame) of the spectrogram
    # for each frame, rebuild the complex spectrum from magnitude & phase (rows are stored in reverse frequency order)
    # use the inverse FFT to convert the frame back to the time domain and overlap-add it into the output
    for w in range(width):
        toInverse = np.zeros(height, dtype=np.complex128)
        for h in range(height):
            magnitude = magnitudeVals[height-h-1,w]
            phase = phaseVals[height-h-1,w]
            toInverse[h] = magnitude * math.cos(phase) + (1j * magnitude * math.sin(phase))
        signal = np.fft.irfft(toInverse)
        recovered[w*WINDOW_STEP:w*WINDOW_STEP + WINDOW_LENGTH] += signal[:WINDOW_LENGTH].astype(np.int16)
    return recovered
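The recovery function above is one half of a short-time Fourier transform round trip. The sketch below shows both halves in vectorized NumPy, working on the complex spectrum directly instead of the RGB encoding. It assumes a periodic Hann window for the forward step, which this notebook does not show, and it does not flip the frequency axis. `stft_frames` and `overlap_add` are hypothetical names.

```python
import numpy as np

WINDOW_LENGTH = 512
WINDOW_STEP = WINDOW_LENGTH // 2


def stft_frames(signal):
    """Complex spectrum of shape (n_bins, n_frames) from half-overlapping frames."""
    # Periodic Hann window (assumption): copies shifted by half a window sum to 1.
    n = np.arange(WINDOW_LENGTH)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / WINDOW_LENGTH)
    n_frames = (len(signal) - WINDOW_LENGTH) // WINDOW_STEP + 1
    frames = np.stack(
        [signal[i * WINDOW_STEP:i * WINDOW_STEP + WINDOW_LENGTH] * window
         for i in range(n_frames)],
        axis=1,
    )
    return np.fft.rfft(frames, axis=0)


def overlap_add(spectrum):
    """Inverse FFT of every column, summed into one signal at WINDOW_STEP offsets."""
    n_frames = spectrum.shape[1]
    frames = np.fft.irfft(spectrum, n=WINDOW_LENGTH, axis=0)
    out = np.zeros(WINDOW_STEP * (n_frames - 1) + WINDOW_LENGTH)
    for w in range(n_frames):
        out[w * WINDOW_STEP:w * WINDOW_STEP + WINDOW_LENGTH] += frames[:, w]
    return out


rng = np.random.default_rng(0)
signal = rng.standard_normal(16128)   # same length as one recovered clip
spectrum = stft_frames(signal)
reconstructed = overlap_add(spectrum)
```

Away from the first and last half window, where only one frame contributes, the reconstruction matches the input up to floating-point error.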

In [9]:
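def data_gen_train(train_batch_size):
    """
    Generator that loops over the training files forever, yielding (inputs, targets) batches scaled to [0,1].
    Relies on the globals train_image, ytrain_image, and nb_train_samples.
    """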
    while True:
        for start in range(0,nb_train_samples,train_batch_size):
            x_batch = []
            y_batch = []
            end = min(start + train_batch_size, nb_train_samples)
            for img_path in range(start, end):
                img_train = np.load(train_image[img_path])
                img_train = img_train/255
                x_batch.append(img_train)
                img_ytrain = np.load(ytrain_image[img_path])
                img_ytrain = img_ytrain/255
                y_batch.append(img_ytrain)
            yield (np.array(x_batch), np.array(y_batch)) 
            
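def data_gen_test(test_batch_size):
    """
    Generator that loops over the test files forever, yielding (inputs, targets) batches scaled to [0,1].
    Relies on the globals test_image, ytest_image, and nb_test_samples.
    """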
    while True:
        for start in range(0,nb_test_samples,test_batch_size):
            x_batch = []
            y_batch = []
            end = min(start + test_batch_size, nb_test_samples)
            for img_path in range(start, end):
                img_test = np.load(test_image[img_path])
                img_test = img_test/255
                x_batch.append(img_test)
                img_ytest= np.load(ytest_image[img_path])
                img_ytest = img_ytest/255
                y_batch.append(img_ytest)
            yield (np.array(x_batch), np.array(y_batch)) 
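The two generators differ only in which path lists they read. One way to remove the duplication is a single generator that takes the lists as arguments, sketched below with a hypothetical name `paired_batches` and a small demo on temporary files so it runs without the real dataset.

```python
import itertools
import os
import tempfile

import numpy as np


def paired_batches(x_paths, y_paths, batch_size, scale=255.0):
    """Yield (inputs, targets) batches forever, scaling pixel values to [0, 1]."""
    n_samples = len(x_paths)
    while True:
        for start in range(0, n_samples, batch_size):
            end = min(start + batch_size, n_samples)
            x_batch = np.stack([np.load(p) / scale for p in x_paths[start:end]])
            y_batch = np.stack([np.load(p) / scale for p in y_paths[start:end]])
            yield x_batch, y_batch


# Demo: five tiny fake "images" per side, batch size 2.
with tempfile.TemporaryDirectory() as root:
    x_paths, y_paths = [], []
    for i in range(5):
        for prefix, paths in (('x', x_paths), ('y', y_paths)):
            path = os.path.join(root, f'{prefix}_{i}.npy')
            np.save(path, np.full((4, 3, 3), 255, dtype=np.uint8))
            paths.append(path)
    first_four = list(itertools.islice(paired_batches(x_paths, y_paths, 2), 4))

batch_sizes = [x.shape[0] for x, _ in first_four]
print(batch_sizes)
```

The last batch of each pass is smaller when the sample count is not a multiple of the batch size. With 7000 samples and a batch size of 3, one pass yields 2334 batches while `7000 // 3` gives 2333 steps per epoch, so epoch boundaries drift by one batch per pass.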

In [10]:
class CustomCallback(keras.callbacks.Callback):
    """
    Custom callback class that inherits from 'keras.callbacks.Callback'. At the end of each epoch it generates
    predictions for five random test images and saves the current model. 
    """
    def on_epoch_end(self, epoch:int, logs=None)->None:
        """
        Function is called by Keras at the end of each epoch. Five times over, it randomly selects a test image, 
        normalizes it to [0,1] by dividing by 255, and reshapes it into a batch of one. The model then predicts the 
        output, which is scaled back to [0,255] and converted into audio data 
        (uses recoverSignalFromSpectrogram for that). The audio is saved as a WAV file, overwriting the file from the
        previous epoch. Finally, the current model is saved.
        
        Parameters:
        - epoch: index of the epoch that just finished.
        - logs: dict containing training metrics for the current epoch; optional.
        
        """
        
        rate = 16000 # sampling rate 
        
        for j in range(5):
            r_num = random.randint(0, nb_test_samples-1)
            test_file = test_image[r_num] # randomly selecting test image
            img_test = np.load(test_file) 
            
            img_test = img_test/255 # normalizing 
            img_test = img_test.reshape(-1, ROW,COL,3) # reshape
            
            decoded_imgs = self.model.predict(img_test) # predict with the model being trained
            
            decoded_imgs = decoded_imgs.reshape(ROW,COL,3) # reshape
            decoded_imgs = decoded_imgs*255 # denormalization
            decoded_imgs = decoded_imgs.astype(np.int16) 
            
            data = recoverSignalFromSpectrogram(decoded_imgs)
            
            wavfile.write("./predict_{}.wav".format(j), rate, data) # saving the predicted spectrogram as audio
        
        self.model.save('model.h5') # save current model iteration

In [11]:
def Conv2DLayer(x, filters, kernel, strides, padding, block_id, kernel_init=orthogonal()):
    """
    Function applies convolution, activation, dropout, and batch normalization sequentially and returns an output tensor.
    
    Parameters:
    - x: input tensor
    - filters: number of filters
    - kernel: size of kernel
    - strides: stride of convolution
    - padding: padding mode
    - block_id: identifier of the block, used to name its layers
    - kernel_init: kernel initializer (default orthogonal)
    
    """
    prefix = f'block_{block_id}_'
    x = layers.Conv2D(filters, kernel_size=kernel, strides=strides, padding=padding,
                      kernel_initializer=kernel_init, name=prefix+'conv')(x)
    x = layers.LeakyReLU(name=prefix+'lrelu')(x)
    x = layers.Dropout(0.2, name=prefix+'drop')(x)
    x = layers.BatchNormalization(name=prefix+'conv_bn')(x)
    return x

def Transpose_Conv2D(x, filters, kernel, strides, padding, block_id, kernel_init=orthogonal()):
    """
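    Function is the decoder counterpart of Conv2DLayer: it applies a transposed convolution, followed by
    activation, dropout, and batch normalization. Parameters are the same as for Conv2DLayer.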
    
    """
    prefix = f'block_{block_id}_'
    x = layers.Conv2DTranspose(filters, kernel_size=kernel, strides=strides, padding=padding,
                               kernel_initializer=kernel_init, name=prefix+'de-conv')(x)
    x = layers.LeakyReLU(name=prefix+'lrelu')(x)
    x = layers.Dropout(0.2, name=prefix+'drop')(x)
    x = layers.BatchNormalization(name=prefix+'conv_bn')(x)
    return x


def AutoEncdoer(input_shape):
    """
    Function defines the AE architecture: six convolution blocks, followed by transposed-convolution and
    convolution blocks that are joined to earlier layers through skip connections (concatenation).
    
    Every layer uses stride 1 with 'same' padding, so the spatial size stays at input_shape[:2]
    throughout; only the number of channels changes.
    
    """
    inputs = layers.Input(shape=input_shape)
    
    # encoder, 64 filters
    conv1 = Conv2DLayer(inputs, 64, 3, strides=1, padding='same', block_id=1)
    conv2 = Conv2DLayer(conv1, 64, 3, strides=1, padding='same', block_id=2)
    
    # encoder, 128 filters
    conv3 = Conv2DLayer(conv2, 128, 5, strides=1, padding='same', block_id=3)
    conv4 = Conv2DLayer(conv3, 128, 3, strides=1, padding='same', block_id=4)
    
    # encoder, 256 and 512 filters
    conv5 = Conv2DLayer(conv4, 256, 5, strides=1, padding='same', block_id=5)
    conv6 = Conv2DLayer(conv5, 512, 3, strides=1, padding='same', block_id=6)
    
    # decoder, 512 filters
    deconv1 = Transpose_Conv2D(conv6, 512, 3, strides=1, padding='same', block_id=7)
    
    # skip connection to conv5
    skip1 = layers.concatenate([deconv1, conv5], name='skip1')
    conv7 = Conv2DLayer(skip1, 256, 3, strides=1, padding='same', block_id=8)
    deconv2 = Transpose_Conv2D(conv7, 128, 3, strides=1, padding='same', block_id=9)
    
    # skip connection to conv3
    skip2 = layers.concatenate([deconv2, conv3], name='skip2')
    conv8 = Conv2DLayer(skip2, 128, 5, strides=1, padding='same', block_id=10)
    deconv3 = Transpose_Conv2D(conv8, 64, 3, strides=1, padding='same', block_id=11)
    
    # skip connection to conv2
    skip3 = layers.concatenate([deconv3, conv2], name='skip3')
    conv9 = Conv2DLayer(skip3, 64, 5, strides=1, padding='same', block_id=12)
    deconv4 = Transpose_Conv2D(conv9, 64, 3, strides=1, padding='same', block_id=13)
    
    # skip connection to conv1, then a 3-channel sigmoid output in [0,1]
    skip4 = layers.concatenate([deconv4, conv1])
    conv10 = layers.Conv2D(3, 3, strides=1, padding='same', activation='sigmoid',
                       kernel_initializer=orthogonal(), name='final_conv')(skip4)

    
    return models.Model(inputs=inputs, outputs=conv10)
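Since the skip connections concatenate tensors along the channel axis, the spatial sizes on both sides must match. The sketch below works through the shape arithmetic for 'same' padding in plain Python: a convolution with stride s produces ceil(size / s) outputs, and a transposed convolution produces size * s. The helper names are hypothetical, and the stride-2 case is included only as a comparison, not as something the notebook does.

```python
import math

ROW, COL = 257, 62


def conv_same(size, stride):
    """Output length of a 'same'-padded convolution."""
    return math.ceil(size / stride)


def deconv_same(size, stride):
    """Output length of a 'same'-padded transposed convolution."""
    return size * stride


# Stride 1, as used by every layer in this model: size is preserved.
stride1 = (deconv_same(conv_same(ROW, 1), 1), deconv_same(conv_same(COL, 1), 1))

# Stride 2 down and back up: the odd dimension no longer matches the input.
stride2 = (deconv_same(conv_same(ROW, 2), 2), deconv_same(conv_same(COL, 2), 2))

print(stride1, stride2)
```

With an odd height of 257, a stride-2 encoder and decoder pair would return 258 rows, so any variant of this model that downsamples would need cropping or padding before each concatenation.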

# Base Model

In [12]:
nb_train_samples = 7000 # number of training samples (converted .wav files)
nb_test_samples = 105

train_batch_size = [5,4,3,2]
test_batch_size = [5,4,3,2]

epochs = 20
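The training loop calls `fit` once per batch size on the same model, so these settings determine both the number of steps in each epoch and the total number of epochs. A quick calculation with the values above (variable names `schedule` and `total_epochs` are introduced here):

```python
nb_train_samples = 7000
nb_test_samples = 105
train_batch_size = [5, 4, 3, 2]
test_batch_size = [5, 4, 3, 2]
epochs = 20

# (steps_per_epoch, validation_steps) for each stage, using the same
# floor division as the fit() call.
schedule = {
    train_bs: (nb_train_samples // train_bs, nb_test_samples // test_bs)
    for train_bs, test_bs in zip(train_batch_size, test_batch_size)
}

# Each stage runs `epochs` epochs, one stage after another.
total_epochs = epochs * len(train_batch_size)

print(schedule, total_epochs)
```

Floor division drops the final partial batch from the step count whenever the sample count is not a multiple of the batch size.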

In [13]:
opt = Adam(learning_rate=0.001) # `lr` is a deprecated alias for `learning_rate`
autoencoder = AutoEncdoer((ROW,COL, 3))
autoencoder.compile(optimizer=opt, loss=['mae','mse'], metrics=['mse','accuracy'])

2024-04-26 18:51:57.310762: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-04-26 18:51:57.310810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1638] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13797 MB memory:  -> device: 0, name: Tesla T4, pci bus id: 0000:61:00.0, compute capability: 7.5
2024-04-26 18:51:57.313762: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-04-26 18:51:57.313791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1638] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 13797 MB memory:  -> device: 1, name: Tesla T4, pci bus id: 0000:da:00.0, compute capability: 7.5


In [14]:
autoencoder.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 257, 62, 3)  0           []                               
                                ]                                                                 
                                                                                                  
 block_1_conv (Conv2D)          (None, 257, 62, 64)  1792        ['input_1[0][0]']                
                                                                                                  
 block_1_lrelu (LeakyReLU)      (None, 257, 62, 64)  0           ['block_1_conv[0][0]']           
                                                                                                  
 block_1_drop (Dropout)         (None, 257, 62, 64)  0           ['block_1_lrelu[0][0]']      

In [15]:
PERIOD = 5

# Include the epoch in the file name (uses `str.format`)
checkpoint_path = "modelcheckpoints/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Callback that would save the model's weights every PERIOD epochs
# (disabled: it is wrapped in a string literal, so it is never created)
"""
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path, 
    verbose=1, 
    save_weights_only=True,
    period=PERIOD
"""

'\ncp_callback = tf.keras.callbacks.ModelCheckpoint(\n    filepath=checkpoint_path, \n    verbose=1, \n    save_weights_only=True,\n    period=PERIOD\n'

In [16]:
# Save the weights using the `checkpoint_path` format
autoencoder.save_weights(checkpoint_path.format(epoch=0))

In [17]:
latest = tf.train.latest_checkpoint(checkpoint_dir)
latest

'modelcheckpoints/cp-0000.ckpt'

In [18]:
checkpoint_dir

'modelcheckpoints'
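The checkpoint path is an ordinary `str.format` template, so the file name for any epoch can be previewed without touching the model. A small sketch using the values from this notebook (`saved_at` and `names` are names introduced here, and the list assumes one checkpoint every `PERIOD` epochs over a 20-epoch run):

```python
import os

PERIOD = 5
epochs = 20
checkpoint_path = "modelcheckpoints/cp-{epoch:04d}.ckpt"

# Directory that holds all checkpoints.
checkpoint_dir = os.path.dirname(checkpoint_path)

# Epochs at which a checkpoint would be written if saving every PERIOD epochs.
saved_at = list(range(PERIOD, epochs + 1, PERIOD))
names = [checkpoint_path.format(epoch=e) for e in saved_at]

print(checkpoint_dir)
print(names)
```

The `:04d` format pads the epoch number to four digits, which keeps the files in chronological order when sorted by name.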

In [19]:
# Load the previously saved weights
autoencoder.load_weights(latest)

<tensorflow.python.checkpoint.checkpoint.CheckpointLoadStatus at 0x7fce2c11f460>

In [None]:
for i in range(len(train_batch_size)):
    autoencoder_train = autoencoder.fit(data_gen_train(train_batch_size[i]),
                                    epochs= epochs,
                                    verbose=1,
                                    steps_per_epoch= nb_train_samples // train_batch_size[i],
                                    validation_data= data_gen_test(test_batch_size[i]),
                                    validation_steps=nb_test_samples // test_batch_size[i],
                                    callbacks=[CustomCallback()])
    plt.subplot(211)
    plt.title('Loss')
    plt.plot(autoencoder_train.history['loss'], label='train')
    plt.plot(autoencoder_train.history['val_loss'], label='test')
    plt.legend()
    plt.show()
    plt.subplot(212)
    plt.title('Mean Squared Error')
    # the metric was compiled as 'mse', so the history keys are 'mse' and 'val_mse'
    plt.plot(autoencoder_train.history['mse'], label='train')
    plt.plot(autoencoder_train.history['val_mse'], label='test')
    plt.legend()
    plt.show()
    # list all data in history
    print(autoencoder_train.history.keys())
    # summarize history for accuracy
    plt.plot(autoencoder_train.history['accuracy'])
    plt.plot(autoencoder_train.history['val_accuracy'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()
#     # summarize history for loss
#     plt.plot(autoencoder_train.history['loss'])
#     plt.plot(autoencoder_train.history['val_loss'])
#     plt.title('model loss')
#     plt.ylabel('loss')
#     plt.xlabel('epoch')
#     plt.legend(['train', 'test'], loc='upper left')
#     plt.show()

# evaluate the model
#     _, train_acc = autoencoder.evaluate(data_gen_train(train_batch_size[i]), verbose=0)
#     _, test_acc = autoencoder.evaluate(data_gen_test(test_batch_size[i]), verbose=0)
#     print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot training history


Epoch 1/20


2024-04-26 18:51:59.871828: I tensorflow/core/common_runtime/executor.cc:1209] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype int32
	 [[{{node Placeholder/_0}}]]
2024-04-26 18:52:01.343678: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:1014] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape inmodel/block_1_drop/dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer
2024-04-26 18:52:01.701007: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8901
