# HSMA Exercise

The data loaded in this exercise covers seven acute stroke units, and records whether each patient received clot-busting treatment for stroke.  There are many features, and a description of each one can be found in the file stroke_data_feature_descriptions.csv.

Train a Neural Network model to try to predict whether or not a stroke patient receives clot-busting treatment.  Use the prompts below to write each section of code.

How accurate can you get your model on the test set?

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# sklearn for pre-processing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# TensorFlow sequential model
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.optimizers import Adam

# Download data 
# (not required if running locally and have previously downloaded data)

download_required = True

if download_required:
    
    # Download processed data:
    address = 'https://raw.githubusercontent.com/MichaelAllen1966/' + \
                '2004_titanic/master/jupyter_notebooks/data/hsma_stroke.csv'        
    data = pd.read_csv(address)

    # Create a data subfolder if one does not already exist
    import os
    data_directory ='./data/'
    if not os.path.exists(data_directory):
        os.makedirs(data_directory)

    # Save data to data subfolder
    data.to_csv(data_directory + 'hsma_stroke.csv', index=False)

Look at an overview of the data using the `describe()` method of Pandas.

Convert all of the data in the dataframe to type `float`, pull out the X (feature) and y (label) data, and put X and y into Numpy arrays.
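One possible approach is sketched below on a toy dataframe rather than the real stroke data. The column names (`feature_a`, `feature_b`, `label`) and the helper name `split_features_and_label` are made up for illustration; check the feature descriptions file for the actual name of the label column before applying this to `data`.

```python
import numpy as np
import pandas as pd


def split_features_and_label(data, label_column):
    """Return X (features) and y (label) as float NumPy arrays."""
    data = data.astype(float)
    X = data.drop(columns=label_column).values
    y = data[label_column].values
    return X, y


# Toy stand-in for the stroke data (hypothetical column names)
toy = pd.DataFrame({
    'feature_a': [1, 2, 3, 4],
    'feature_b': [0, 1, 0, 1],
    'label': [0, 1, 1, 0]})

# Overview of the data
print(toy.describe())

X, y = split_features_and_label(toy, 'label')
print(X.shape, y.shape)
```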

Define a function that will MinMax Normalise training and test feature data passed into it.
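As a rough sketch, such a function could wrap `MinMaxScaler` from scikit-learn. The function name `scale_data` and the small arrays are illustrative assumptions. The scaler is fitted on the training data only, so test values can fall outside the 0-1 range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def scale_data(X_train, X_test):
    """MinMax-normalise features, fitting the scaler on training data only."""
    scaler = MinMaxScaler()
    train_sc = scaler.fit_transform(X_train)
    # Apply the training set min and max to the test set
    test_sc = scaler.transform(X_test)
    return train_sc, test_sc


# Small made-up arrays to show the behaviour
X_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
X_test = np.array([[2.5, 15.0], [12.0, 30.0]])

X_train_sc, X_test_sc = scale_data(X_train, X_test)
print(X_train_sc)
print(X_test_sc)
```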

Define a function that will build a sequential neural network, given a number of features, a number of hidden layers (with a default of 5), a number of neurons per hidden layer (with a default of 64), a dropout rate (with a default of 0), and a learning rate (with a default of 0.003).  The function should also create a single neuron output layer with a Sigmoid activation function, use an Adam optimiser, and a Binary Crossentropy loss function, with accuracy as the performance metric.
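A minimal sketch of one way to write this function is given below. The function and argument names are assumptions, and so is the ReLU activation in the hidden layers, which the prompt does not specify.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.optimizers import Adam


def make_net(number_features, hidden_layers=5, hidden_layer_neurones=64,
             dropout=0.0, learning_rate=0.003):
    """Build and compile a sequential network for binary classification."""
    net = Sequential()
    net.add(Input(shape=(number_features,)))

    # Hidden layers (ReLU is an assumed choice), each followed by dropout
    for _ in range(hidden_layers):
        net.add(Dense(hidden_layer_neurones, activation='relu'))
        net.add(Dropout(dropout))

    # Single neuron output layer with Sigmoid activation
    net.add(Dense(1, activation='sigmoid'))

    net.compile(loss='binary_crossentropy',
                optimizer=Adam(learning_rate=learning_rate),
                metrics=['accuracy'])
    return net


model = make_net(number_features=10)
model.summary()
```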

Split your data into training and test sets.  Decide on an appropriate test data size.  Then scale the feature data using MinMax Normalisation.
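The split and scaling steps might look something like the sketch below. It uses randomly generated stand-in data, and the 25% test size, the `stratify` argument and the random seed are illustrative choices rather than recommended values.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Randomly generated stand-in for the real X and y
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200).astype(float)

# Hold back 25% of the data for testing (an assumed choice);
# stratify keeps the label balance similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Fit the scaler on the training set only, then apply it to both sets
scaler = MinMaxScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc = scaler.transform(X_test)

print(X_train_sc.shape, X_test_sc.shape)
```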

Write a function to calculate accuracy of the model on both training and test sets.
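One way to sketch this is a function that works with any object offering a Keras-style `predict` method that returns probabilities. The function name, the 0.5 classification threshold and the `DummyModel` class used to demonstrate it are all assumptions for illustration.

```python
import numpy as np


def calculate_accuracy(model, X_train_sc, X_test_sc, y_train, y_test):
    """Return (training accuracy, test accuracy) using a 0.5 threshold."""
    # Training set: convert predicted probabilities to True/False classes
    probability = np.asarray(model.predict(X_train_sc)).flatten()
    accuracy_train = np.mean((probability >= 0.5) == y_train)

    # Test set
    probability = np.asarray(model.predict(X_test_sc)).flatten()
    accuracy_test = np.mean((probability >= 0.5) == y_test)

    print(f'Training accuracy {accuracy_train:0.3f}')
    print(f'Test accuracy {accuracy_test:0.3f}')
    return accuracy_train, accuracy_test


class DummyModel:
    """Hypothetical stand-in: returns the first feature as the probability."""

    def predict(self, X):
        return X[:, :1]


X_train_sc = np.array([[0.9], [0.2], [0.7], [0.4]])
y_train = np.array([1.0, 0.0, 0.0, 0.0])
X_test_sc = np.array([[0.1], [0.8]])
y_test = np.array([0.0, 1.0])

accuracy_train, accuracy_test = calculate_accuracy(
    DummyModel(), X_train_sc, X_test_sc, y_train, y_test)
```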

Write a function to plot training and test set accuracy over time during model fitting.
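A possible sketch is shown below. It assumes the function is given the `history` dictionary of the object returned by Keras `fit`, that the model was compiled with `metrics=['accuracy']`, and that the test set was passed as `validation_data`, so that the keys `accuracy` and `val_accuracy` are present. The numbers in the example call are made up.

```python
import matplotlib.pyplot as plt


def plot_training(history_dict):
    """Plot training and test accuracy for each epoch."""
    acc = history_dict['accuracy']
    val_acc = history_dict['val_accuracy']
    epochs = range(1, len(acc) + 1)

    fig, ax = plt.subplots()
    ax.plot(epochs, acc, label='Training accuracy')
    ax.plot(epochs, val_acc, label='Test accuracy')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Accuracy')
    ax.legend()
    return fig


# Made-up values in the same layout as Keras history.history
fig = plot_training({'accuracy': [0.70, 0.78, 0.82],
                     'val_accuracy': [0.69, 0.74, 0.75]})
```

With a fitted model this would typically be called as `plot_training(history.history)`.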

Create a neural network with a number of hidden layers, neurons, dropout rate and learning rate of your choosing.  Run the model for a number of epochs and with a batch size of your choosing (be careful about using large batch sizes unless you've got a CUDA-enabled GPU and TensorFlow is set up to use it).
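The sketch below shows the general shape of building and fitting a network, using random stand-in data so that it runs on its own. The layer sizes, dropout rate, learning rate, epochs and batch size are arbitrary illustrative values, not suggested settings. Passing the test set as `validation_data` records test accuracy after every epoch.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.optimizers import Adam

# Random stand-in for scaled training and test data
rng = np.random.default_rng(42)
X_train_sc = rng.random((160, 8))
y_train = rng.integers(0, 2, size=160).astype(float)
X_test_sc = rng.random((40, 8))
y_test = rng.integers(0, 2, size=40).astype(float)

# Two hidden layers of 32 neurons with 20% dropout (arbitrary choices)
model = Sequential([
    Input(shape=(8,)),
    Dense(32, activation='relu'),
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dropout(0.2),
    Dense(1, activation='sigmoid')])

model.compile(loss='binary_crossentropy',
              optimizer=Adam(learning_rate=0.003),
              metrics=['accuracy'])

# Fit the model, tracking test set accuracy after each epoch
history = model.fit(X_train_sc, y_train,
                    epochs=5,
                    batch_size=32,
                    validation_data=(X_test_sc, y_test),
                    verbose=0)

print(history.history.keys())
```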

Calculate training and test set accuracy of your model.

Plot training and test set accuracy over time during fitting.

Save this baseline version of your model, so you can access it again later if you need it.
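Saving and reloading could be sketched as follows, using a tiny untrained network as a stand-in for the baseline model. The file name `baseline_model.keras` is an assumption; any path can be used.

```python
import numpy as np
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Input

# Tiny stand-in for the trained baseline model
model = Sequential([
    Input(shape=(4,)),
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid')])
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Save the model (hypothetical file name)
model.save('baseline_model.keras')

# Reload it later when needed
restored = load_model('baseline_model.keras')

# The restored model should give the same predictions as the original
sample = np.random.default_rng(0).random((3, 4))
print(model.predict(sample, verbose=0))
print(restored.predict(sample, verbose=0))
```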

Now try different things in your model to improve test accuracy.  You might consider:
- Reducing overfitting, if overfitting is a problem
- Changing the number of hidden layers
- Changing the number of hidden neurons
- Changing batch size
- Changing dropout rate
- Changing the learning rate
- Changing the train / test split
- Trying stratified k-fold validation
- Dropping features

or more!
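For the stratified k-fold idea in the list above, a rough sketch of the splitting loop using scikit-learn's `StratifiedKFold` is shown below. The data is a random stand-in, the choice of five folds is arbitrary, and the model building and fitting step is left as a comment. A fresh network would be built and trained inside each fold, with accuracy then averaged across folds.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MinMaxScaler

# Random stand-in data with 30% positive labels
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = np.array([0.0] * 70 + [1.0] * 30)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

positive_rates = []
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Scale within each fold, fitting on that fold's training data only
    scaler = MinMaxScaler()
    X_train_sc = scaler.fit_transform(X_train)
    X_test_sc = scaler.transform(X_test)

    # Build and fit a fresh network here, then record its test accuracy

    # Each test fold keeps roughly the same label balance as the full data
    positive_rates.append(y_test.mean())
    print(f'Fold {fold}: test size {len(test_idx)}, '
          f'positive rate {y_test.mean():0.2f}')
```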

Tip: keep your analysis above as your base case.  Then below, just use the functions you've built to rebuild and retrain models with different parameters (or run altered versions of other cells below).  Don't forget that you need to build and train the model again before you get new outputs.

Add comments to your code to explain what you've changed, and change things a little at a time (don't change everything at once!).