# Basic ML: Phonocardiograms

Author: Jake Dumbauld <br>
Contact: jacobmilodumbauld@gmail.com<br>
Date: 3.15.22

## Intro

Initial modelling started here with a copy of the same model from above. <br>
It was not overfitting at all, so I began to trim some regularization and removed the callbacks. <br>
I also removed the topmost input layer of 1024 units. <br>
The model quickly fell into overfitting, with validation loss increasing after only 2 epochs, so I increased regularization by adding dropout of 0.4 back to each layer. <br>
It began underfitting at that point, so I trimmed dropout to 0.2 and removed the learning rate schedule. <br>
It quickly fell into an overfitting regime again. <br>


Out of curiosity, I wanted to try SGD to see if I could get more robust performance on my validation & test sets.


It was around this point that I felt I needed a more scientific approach to model evaluation and tuning, so I am starting from a fresh notebook.

## Imports

In [1]:
#imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import time
import re

from sklearn.model_selection import train_test_split

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras import regularizers
import keras_tuner as kt

From the Keras FAQ: https://keras.io/getting_started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development
- Trying to obtain reproducible results. As far as I can tell, scikit-learn also falls back on NumPy's global random state (the one seeded by np.random.seed) when no random_state is passed.

In [2]:
import random as python_random

np.random.seed(42)

# The below is necessary for starting core Python generated random numbers in a well-defined state.
python_random.seed(42)

# The below set_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/random/set_seed
tf.random.set_seed(42)

#not sure if the two settings below are necessary - leaving them in for now.
%env PYTHONHASHSEED=0
%env CUDA_VISIBLE_DEVICES=""

env: PYTHONHASHSEED=0
env: CUDA_VISIBLE_DEVICES=""
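
One caveat on `PYTHONHASHSEED`: Python reads it once, when the interpreter starts, so setting it with `%env` in an already running kernel does not change string hashing in the current process (it is still inherited by child processes). A minimal sketch of that behaviour using only the standard library; the helper name `hash_in_fresh_interpreter` and the sample string are illustrative and not part of this notebook:

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, hash_seed):
    """Return hash(text) as computed by a new Python process started with PYTHONHASHSEED set."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# With the seed fixed before startup, two separate processes agree on the hash.
first = hash_in_fresh_interpreter("murmur", 0)
second = hash_in_fresh_interpreter("murmur", 0)
print(first == second)
```

For a notebook, the practical equivalent would be exporting the variable in the shell before launching Jupyter.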


## Helper Functions

Defining a Helper Function for Plotting Model Loss

In [3]:
def graph_model_loss(title, history):
    """
    Description:
    Graphs training vs. validation loss over epochs for a given model.

    title: str, title shown above the plot
    history: tensorflow.python.keras.callbacks.History object
    """
    plt.figure(figsize=(12,8))
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title(title,size=24)
    plt.ylabel('Loss',size=16)
    plt.xlabel('Epoch',size=16)
    plt.legend(['Train', 'Validation'])
    plt.show()
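
When reading the plot, it can help to know exactly where validation loss bottoms out. The sketch below finds that epoch from a plain dictionary shaped like `history.history`; the function name `best_epoch` and the numbers are made up for illustration and are not results from this notebook:

```python
def best_epoch(history_dict, key="val_loss"):
    """Return (1-based epoch, value) where the monitored quantity is lowest."""
    values = history_dict[key]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Made-up numbers standing in for history.history.
example_history = {
    "loss": [0.70, 0.62, 0.55, 0.49, 0.44],
    "val_loss": [0.71, 0.66, 0.64, 0.67, 0.72],
}
epoch, value = best_epoch(example_history)
print(epoch, value)
```

In this toy history, training loss keeps falling while validation loss turns upward after epoch 3, which is the overfitting pattern described in the intro.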

Defining a helper function to evaluate train & test accuracies

In [4]:
def evaluate_model(model, history):
    """
    Description:
    Prints model train & test accuracies. Relies on the globally defined X_test and y_test.

    model: compiled and fitted tensorflow model
    history: tensorflow.python.keras.callbacks.History object
    """
    # Train accuracy is the value logged for the last training epoch.
    train_accuracy = history.history["binary_accuracy"][-1]
    # For a model compiled with a single metric, evaluate returns [loss, metric].
    result = model.evaluate(X_test, y_test, verbose=1)

    # Format directly as a percentage to avoid floating-point noise from round()*100.
    print(f"Train Accuracy: {train_accuracy * 100:.4f}%")
    print(f"Test Accuracy: {result[1] * 100:.4f}%")
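
For reference, the `binary_accuracy` value read above is, to my understanding, the fraction of samples whose predicted probability falls on the correct side of a 0.5 threshold. A plain-Python sketch of that calculation, with made-up labels and probabilities; this is an illustration, not the Keras implementation:

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions that match the labels after thresholding the probabilities."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    correct = sum(
        int(prob > threshold) == int(label)
        for label, prob in zip(y_true, y_prob)
    )
    return correct / len(y_true)


# Made-up example: the third and fifth predictions land on the wrong side of 0.5.
labels = [1, 0, 1, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.7, 0.6]
print(f"{binary_accuracy(labels, probabilities):.2%}")
```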

## Defining A Search Space

In [5]:
def build_model(hp):
    model = keras.Sequential()
    #flattening input
    model.add(Flatten())
    
    for i in range(hp.Int('layers', 2, 4)):
        model.add(
            Dense(
            #Tuning the number of units in each hidden layer.
            units=hp.Int("units" + str(i), min_value=32, max_value=1024, step=64),
            kernel_regularizer=regularizers.l2(0.001),
            activation="relu"
            )
        )
        #Tuning whether or not to use dropout.
        if hp.Boolean("dropout" + str(i)):
            model.add(Dropout(rate=0.25))

        #Adding batch normalization
        if hp.Boolean("normalization" + str(i)):
            model.add(BatchNormalization())

    #output layer
    model.add(Dense(1, activation="sigmoid"))
    
    #defining learning rate
    lr_schedule = keras.optimizers.schedules.InverseTimeDecay(
                      #tuning initial learning rate
                      initial_learning_rate=hp.Float("starting_learning_rate", min_value=1e-4, max_value=1e-2, sampling="log"),
                      decay_steps=1.0,
                      decay_rate=0.1
                  )
    model.compile(
        #Optimizer
        optimizer = keras.optimizers.Adam(learning_rate=lr_schedule),
        #Loss
        loss=keras.losses.BinaryCrossentropy(),
        #Metrics
        metrics=[keras.metrics.BinaryAccuracy()]
    )
    return model

build_model(kt.HyperParameters())



<tensorflow.python.keras.engine.sequential.Sequential at 0x7fd1deb15c40>
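
To get a feel for the learning rate schedule in `build_model`, the decay can be worked out by hand. `InverseTimeDecay` (without staircase) divides the initial rate by `1 + decay_rate * step / decay_steps`, where `step` counts optimizer updates (batches), not epochs. The sketch below reimplements that formula in plain Python; the starting rate of `1e-3` is an arbitrary value from inside the tuned range:

```python
def inverse_time_decay(initial_learning_rate, step, decay_steps=1.0, decay_rate=0.1):
    """Learning rate after `step` optimizer updates under inverse time decay."""
    return initial_learning_rate / (1.0 + decay_rate * step / decay_steps)


# Same decay_steps and decay_rate as in build_model; 1e-3 is an illustrative start.
for step in (0, 10, 100, 1000):
    print(step, inverse_time_decay(1e-3, step))
```

With `decay_steps=1.0` and `decay_rate=0.1`, the rate is already halved after 10 batches and falls to roughly 1% of its starting value after 1000, so most of training runs at a much smaller rate than the tuned starting value.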

### Raw Signal Data

In [6]:
raw = np.load('/Users/jmd/Documents/BOOTCAMP/Capstone/arrays/signal_murmur_presimple_4k.npy', allow_pickle=True)

In [7]:
y = raw[:,0] #murmur labels are the first column; the remaining columns are the signal
X = raw[:,1:]

In [8]:
#train test split: hold out 30% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size = 0.3)

In [9]:
#splitting a validation set (30% of the remaining data) off the training set
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, stratify=y_train, test_size=0.3)
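
Because the second split is taken from what remains after the first, the final proportions are not 40/30/30. A small worked calculation, ignoring the rounding scikit-learn applies to actual sample counts; the function name `split_fractions` is just for illustration:

```python
def split_fractions(test_size=0.3, val_size=0.3):
    """Overall train/val/test fractions when val is split off after the test set."""
    remaining = 1.0 - test_size   # share left after the test split
    val = remaining * val_size    # validation is a share of what remains
    train = remaining - val
    return {"train": train, "val": val, "test": test_size}


fractions = split_fractions()
for name, share in fractions.items():
    print(f"{name}: {share:.0%}")
```

So the two 30% splits above leave roughly 49% of the data for training, 21% for validation, and 30% for testing.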

In [10]:
es_callback = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
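
The `patience` argument controls how many consecutive epochs without improvement in `val_loss` are tolerated before training stops. A simplified plain-Python sketch of that rule; it is an approximation for illustration rather than the Keras implementation, and the loss values are made up:

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never triggers."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # New best value: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement this epoch.
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value is reached at epoch 3.
example_losses = [0.90, 0.80, 0.70, 0.75, 0.72, 0.71, 0.74, 0.73, 0.76]
print(early_stopping_epoch(example_losses, patience=5))
```

Here the best loss occurs at epoch 3, and with `patience=5` the run would stop five epochs later, at epoch 8.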

In [11]:
tuner = kt.BayesianOptimization(
    hypermodel=build_model,
    objective="val_loss",
    max_trials=50,
    seed=42,
    overwrite=True,
    directory='/Users/jmd/Documents/BOOTCAMP/Capstone/kerastune_searches',
    project_name='sequential_4k_signal_no_patient'
)

tuner.search(X_train, y_train, epochs=100, validation_data=(X_val,y_val), callbacks=[es_callback])

Trial 5 Complete [00h 00m 47s]
val_loss: 0.7672173976898193

Best val_loss So Far: 0.6558860540390015
Total elapsed time: 00h 03m 59s

Search: Running Trial #6

Hyperparameter    |Value             |Best Value So Far 
layers            |2                 |2                 
units0            |992               |32                
dropout0          |True              |True              
normalization0    |False             |False             
units1            |608               |992               
dropout1          |False             |False             
normalization1    |False             |False             
starting_learni...|0.001036          |0.00082523        
units2            |32                |32                
dropout2          |False             |False             
normalization2    |True              |True              

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 1

KeyboardInterrupt: 
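
For a sense of scale, the discrete part of the search space defined in `build_model` can be counted by hand. The sketch assumes `hp.Int` samples `min_value + k * step` up to `max_value`, which matches the unit values seen in the trial output above; the continuous learning rate is left out of the count:

```python
# Unit counts reachable with min_value=32, max_value=1024, step=64.
unit_choices = list(range(32, 1024 + 1, 64))

# Each hidden layer: a unit count, dropout on/off, batch normalization on/off.
per_layer = len(unit_choices) * 2 * 2

# The number of hidden layers is itself tuned between 2 and 4.
total_architectures = sum(per_layer ** n_layers for n_layers in (2, 3, 4))

print(len(unit_choices), max(unit_choices))
print(per_layer)
print(total_architectures)
```

Under that assumption there are 16 unit choices per layer (the largest being 992, so 1024 itself is never tried) and about 17 million distinct architectures, of which `max_trials=50` samples only a tiny fraction.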

In [None]:
tuner.results_summary(num_trials=1)

In [None]:
es_callback = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

In [None]:
# Get the best hyperparameters.
best_hps = tuner.get_best_hyperparameters()
# Build the model with the best hp.
model = build_model(best_hps[0])

history = model.fit(X_train, y_train, epochs=100, validation_data=(X_val,y_val), callbacks=[es_callback])

In [None]:
model.summary()

In [None]:
evaluate_model(model, history)

In [None]:
graph_model_loss('Raw Signal Data, No Patient Information', history)

In [None]:
#saving model
model.save('/Users/jmd/Documents/BOOTCAMP/Capstone/neural_nets/sequential_4k_signal_no_patient', overwrite=True)