## Artificial Neural Networks

#### Table of Contents <a name='toc'></a>

- [Load Modules and Set Notebook Properties](#modules)
- [Define Path and Load Data](#load)
- [Inspect Data](#inspect)
- [Prepare Data](#dataprep)
- [Scale Values](#scale)
- [Create Different ANN Models](#create)
- [Find the Best Model](#find)
- [Evaluate and Choose Models](#evaluate)
- [Predict](#predict)
- [Prepare Submission](#submit)

[go to end](#bottom)

#### Load Modules and Set Notebook Properties <a name='modules'></a>

In [25]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import sys
import os

In [26]:
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Normalizer

In [27]:
from keras.models import Sequential
from keras.layers import Dense

In [28]:
warnings.filterwarnings('ignore')
pd.options.display.max_columns = None
sns.set_style("darkgrid")

#### Define Path and Load Data  <a name='load'></a> 

In [29]:
INPUT_PATH = 'raw_data_source'
OUTPUT_PATH = 'outputs'

In [30]:
training_set_features = pd.read_csv(os.path.join(INPUT_PATH, 'covid_training_set_features.csv'))
training_set_labels = pd.read_csv(os.path.join(INPUT_PATH, 'covid_training_set_labels.csv'))
test_set_features = pd.read_csv(os.path.join(INPUT_PATH, 'covid_test_set_features.csv'))

#### Inspect Data <a name='inspect'></a> 

##### Background <a name="databackground"></a>

In this exercise, we will take a look at vaccination, a key public health measure used to fight infectious diseases. Vaccines provide immunization for individuals, and enough immunization in a community can further reduce the spread of diseases through "herd immunity".

A phone survey asked respondents whether they had received the H1N1 and seasonal flu vaccines, in conjunction with questions about themselves. These additional questions covered their social, economic, and demographic background, opinions on risks of illness and vaccine effectiveness, and behaviors towards mitigating transmission. A better understanding of how these characteristics are associated with personal vaccination patterns can provide guidance for future public health efforts.

The goal is to predict how likely individuals are to receive their H1N1 and seasonal flu vaccines. Specifically, we will be predicting two probabilities: one for h1n1_vaccine and one for seasonal_vaccine. Each row in the dataset represents one person who responded to the National 2009 H1N1 Flu Survey.

The dataset is taken from the competition page on [DrivenData](https://www.drivendata.org/competitions/66/flu-shot-learning/page/210/).

#### Prepare Data <a name="dataprep"></a>

[back to top](#toc)

In [31]:
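def process_features(df):
    """One-hot encode the categorical and ordinal survey columns."""
    
    cols_to_process =  ['h1n1_concern', 'h1n1_knowledge',
                        'opinion_h1n1_vacc_effective', 'opinion_h1n1_risk',
                        'opinion_h1n1_sick_from_vacc', 'opinion_seas_vacc_effective',
                        'opinion_seas_risk', 'opinion_seas_sick_from_vacc', 'age_group',
                        'education', 'race', 'sex', 'income_poverty', 'marital_status',
                        'rent_or_own', 'employment_status', 'hhs_geo_region', 'census_msa',
                        'household_adults', 'household_children', 'employment_industry',
                        'employment_occupation']
    
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    
    # Prefix every value with its column name so the dummy column names are unique.
    # Missing values become the string '<column>_nan' and get their own dummy column.
    for col in cols_to_process:
        df[col] = [f'{col}_' + str(x) for x in df[col]]
        
    # One-hot encode each prefixed column and place the results side by side
    one_hot_encoded = pd.concat([pd.get_dummies(df[col]) for col in cols_to_process], axis=1)
    
    # Swap the original columns for their one-hot encoded versions
    df = df.drop(columns=cols_to_process)
    df_concatenated = pd.concat([df, one_hot_encoded], axis=1)
        
    return df_concatenated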
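
To make the prefix-then-encode step concrete, here is a minimal pure-Python sketch for a single column. The helper name `one_hot_with_prefix` and the toy `sex` values are illustrative assumptions, not part of the notebook or the dataset. It shows that, because `str(float('nan'))` is `'nan'`, a missing value ends up as its own `sex_nan` category instead of being dropped.

```python
def one_hot_with_prefix(values, column_name):
    """Hypothetical helper: prefix each value with the column name, then one-hot encode."""
    labels = [f"{column_name}_{value}" for value in values]
    categories = sorted(set(labels))  # one output column per distinct label
    rows = [[int(label == category) for category in categories] for label in labels]
    return categories, rows


# Toy column with one missing value (illustrative data only)
toy_values = ["Male", "Female", float("nan"), "Male"]
categories, rows = one_hot_with_prefix(toy_values, "sex")

print(categories)
for row in rows:
    print(row)
```

Each row contains exactly one `1`, and the missing entry is represented by the `sex_nan` column.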

In [32]:
X = process_features(training_set_features).iloc[:,1:].fillna(0)
X_test = process_features(test_set_features).iloc[:,1:].fillna(0)
y_h1n1 = training_set_labels['h1n1_vaccine']
y_seasonal = training_set_labels['seasonal_vaccine']

In [33]:
X.shape, X_test.shape

((26707, 157), (26708, 157))

#### Scale Values <a name='scale'></a> 


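The scaler is fitted on the training set only, and the same fitted parameters are then applied to the test set. Fitting on the test set (or on the combined data) would let test-set statistics such as the mean and standard deviation influence preprocessing. That leaks information from the test set into training and makes the evaluation score overly optimistic.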

In [34]:
def scale_values(X_train, X_test, scaler='standard'):
    
    scaler_dict = {'standard': StandardScaler(), 
                    'minmax': MinMaxScaler(), 
                    'normal': Normalizer()}
    if scaler is None:
        return X_train, X_test
    elif scaler not in scaler_dict.keys():
        raise ValueError("Enter a valid value for scaler! Choose between 'standard', 'minmax', 'normal' or None.")
    else:
        scl = scaler_dict[scaler]
        X_train = scl.fit_transform(X_train)
        X_test = scl.transform(X_test) 
        return X_train, X_test
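
The fit-on-train, transform-both pattern used in `scale_values` can be sketched without scikit-learn. The snippet below is an illustration under stated assumptions: the helper names `fit_standardizer` and `standardize` and the toy numbers are hypothetical. It uses the population standard deviation, which is what standard scaling conventionally uses.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Learn the mean and population standard deviation from training data only."""
    return mean(train_values), pstdev(train_values)


def standardize(values, center, spread):
    """Apply previously learned statistics to any set of values."""
    return [(value - center) / spread for value in values]


# Toy data (illustrative only)
train_values = [1.0, 2.0, 3.0, 4.0, 5.0]
test_values = [3.0, 10.0]

center, spread = fit_standardizer(train_values)            # statistics come from train only
train_scaled = standardize(train_values, center, spread)
test_scaled = standardize(test_values, center, spread)     # test reuses the train statistics

print(center, spread)
print(test_scaled)
```

The scaled training values have mean zero by construction, while the scaled test values generally do not. That is expected: the test set is treated as unseen data.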

#### Create Different ANN Models <a name='create'></a> 

[back to top](#toc)

In [35]:
def simple_ann(X_train, y_train, epochs, batch_size, verbose):
    
    input_dim = X_train.shape[1]
    model = Sequential()
    model.add(Dense(25, input_dim=input_dim, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=verbose)
    
    return model
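
The final `Dense(1, activation='sigmoid')` layer and the `binary_crossentropy` loss can be illustrated in plain Python. This is a sketch of the underlying formulas, not the Keras implementation. The clipping constant `eps` is an assumption added for numerical safety, and the logits and labels are toy values.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps (an assumed value) keeps log() away from 0."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Toy pre-activation outputs and labels (illustrative only)
logits = [-2.0, 0.0, 3.0]
labels = [0, 1, 1]

probabilities = [sigmoid(z) for z in logits]
loss = binary_crossentropy(labels, probabilities)

print(probabilities)
print(loss)
```

A confident correct prediction contributes little to the loss, while a prediction of 0.5 for a positive label contributes `log(2)`.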

In [36]:
def deeper_ann(X_train, y_train, epochs, batch_size, verbose):
    
    input_dim = X_train.shape[1]
    model = Sequential()
    model.add(Dense(25, input_dim=input_dim, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=verbose)
    
    return model

In [37]:
def wider_ann(X_train, y_train, epochs, batch_size, verbose):
    
    input_dim = X_train.shape[1]
    model = Sequential()
    model.add(Dense(100, input_dim=input_dim, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=verbose)
    
    return model

In [38]:
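# NOTE: as written, this architecture is identical to wider_ann (100 -> 50 -> 1),
# so any difference between their scores comes from random initialization and training.
def wider_and_deeper_ann(X_train, y_train, epochs, batch_size, verbose):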
    
    input_dim = X_train.shape[1]
    model = Sequential()
    model.add(Dense(100, input_dim=input_dim, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=verbose)
    
    return model
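
To compare the capacity of the four architectures, the number of trainable parameters in a stack of dense layers can be computed by hand: each layer has `inputs * units` weights plus `units` biases. The helper `dense_parameter_count` below is a hypothetical illustration; the input dimension of 157 is taken from `X.shape` above, and the layer sizes are copied from the functions as written.

```python
def dense_parameter_count(input_dim, layer_units):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    previous = input_dim
    for units in layer_units:
        total += previous * units + units  # weight matrix plus bias vector
        previous = units
    return total


INPUT_DIM = 157  # number of feature columns reported by X.shape

# Layer sizes as defined in the functions above
architectures = {
    "simple_ann": [25, 10, 1],
    "deeper_ann": [25, 10, 10, 10, 10, 1],
    "wider_ann": [100, 50, 1],
    "wider_and_deeper_ann": [100, 50, 1],
}

parameter_counts = {
    name: dense_parameter_count(INPUT_DIM, units)
    for name, units in architectures.items()
}

for name, count in parameter_counts.items():
    print(f"{name}: {count}")
```

The wider networks have roughly five times as many parameters as the narrow ones, and the two wide variants come out identical because their layer definitions are the same.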

In [39]:
def model_wrapper(X, y, model, test_size=0.3, scaler=None, epochs=5, batch_size=10, verbose=1):
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=109) # split
    X_train_, X_test_ = scale_values(X_train, X_test, scaler=scaler)  # scale
    trained_model = model(X_train_, y_train, epochs, batch_size, verbose)  # train
    probability = trained_model.predict(X_test_)  # sigmoid output is the predicted probability
    
    return roc_auc_score(y_test, probability.flatten())
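
The wrapper scores each model with ROC AUC, which can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting as one half. The function `roc_auc` below is a small pure-Python sketch of that pairwise definition. It is quadratic in the number of samples and is meant only as an illustration, not as a replacement for `roc_auc_score`; the labels and scores are toy values.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    positives = [score for label, score in zip(y_true, y_score) if label == 1]
    negatives = [score for label, score in zip(y_true, y_score) if label == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label.")

    wins = 0.0
    for positive_score in positives:
        for negative_score in negatives:
            if positive_score > negative_score:
                wins += 1.0   # correctly ranked pair
            elif positive_score == negative_score:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy labels and predicted probabilities (illustrative only)
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)

print(auc)
```

Because the metric depends only on how the scores are ranked, it does not require choosing a classification threshold.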

#### Find the Best Model <a name='find'></a> 

[back to top](#toc)

In [40]:
models = [simple_ann, deeper_ann, wider_ann, wider_and_deeper_ann]
y_values = [y_h1n1, y_seasonal]
results_dict = {}

for y in y_values:
    for model in models:
        auc_score = model_wrapper(X, y, model, test_size=0.3, scaler='standard', epochs=10)
        results_dict[f'{y.name} {model.__name__}'] = auc_score

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


#### Evaluate and Choose Models <a name='evaluate'></a> 

In [41]:
result_df = pd.DataFrame.from_dict(results_dict, orient='index').reset_index()
result_df.columns = ['index', 'auc_score']
result_df['variable'] = [x.split(' ')[0] for x in result_df['index']]
result_df['model'] = [x.split(' ')[1] for x in result_df['index']]
result_df[['variable', 'model', 'auc_score']].pivot(index='variable', columns='model', values='auc_score')

| variable | deeper_ann | simple_ann | wider_and_deeper_ann | wider_ann |
|---|---|---|---|---|
| h1n1_vaccine | 0.851871 | 0.846405 | 0.817694 | 0.807288 |
| seasonal_vaccine | 0.845254 | 0.847928 | 0.818551 | 0.810964 |


- For `h1n1_vaccine` use `deeper_ann`
- For `seasonal_vaccine` use `simple_ann`
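
The same choice can be made programmatically by taking the highest AUC per target. The sketch below hard-codes the scores reported above into a nested dictionary; the variable names `auc_scores` and `best_models` are illustrative and do not appear in the notebook.

```python
# AUC scores copied from the results reported above
auc_scores = {
    "h1n1_vaccine": {
        "deeper_ann": 0.851871,
        "simple_ann": 0.846405,
        "wider_and_deeper_ann": 0.817694,
        "wider_ann": 0.807288,
    },
    "seasonal_vaccine": {
        "deeper_ann": 0.845254,
        "simple_ann": 0.847928,
        "wider_and_deeper_ann": 0.818551,
        "wider_ann": 0.810964,
    },
}

# For each target, keep the model name with the highest AUC
best_models = {
    target: max(scores, key=scores.get)
    for target, scores in auc_scores.items()
}

for target, model_name in best_models.items():
    print(f"{target}: {model_name} ({auc_scores[target][model_name]:.6f})")
```

Note that the gaps between the top two models are small (about 0.005 for `h1n1_vaccine` and 0.003 for `seasonal_vaccine`), and the scores come from a single train/test split, so the ranking may not be stable across runs.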

#### Predict <a name='predict'></a> 

In [54]:
X_train_, X_test_ = scale_values(X, X_test, scaler='standard')  # same scaling as during model selection
seasonal_model = simple_ann(X_train_, y_seasonal, epochs=10, batch_size=10, verbose=1)  # model chosen for seasonal_vaccine

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [55]:
probability_seasonal = seasonal_model.predict(X_test_).flatten()  # predict on the scaled test features

In [56]:
X_train_, X_test_ = scale_values(X, X_test, scaler='standard')  # same scaling as during model selection
h1n1_model = deeper_ann(X_train_, y_h1n1, epochs=10, batch_size=10, verbose=1)  # model chosen for h1n1_vaccine

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [57]:
probability_h1n1 = h1n1_model.predict(X_test_).flatten()  # predict on the scaled test features

In [60]:
probability_h1n1

array([0.08688167, 0.04430285, 0.06147888, ..., 0.09231937, 0.01936588,
       0.73324114], dtype=float32)

#### Prepare Submission <a name='submit'></a> 

In [58]:
submission = pd.DataFrame()
submission['respondent_id'] = test_set_features['respondent_id']
submission['h1n1_vaccine'] = probability_h1n1
submission['seasonal_vaccine'] = probability_seasonal

In [59]:
outpath = os.path.join(OUTPUT_PATH, 'sub3.csv')
submission.to_csv(outpath, index=False)
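
Before uploading, it can be worth checking that the file has the three expected columns and that every prediction is a probability in [0, 1]. The sketch below does this with the standard library only. The function `validate_submission` is a hypothetical helper, and the CSV text is toy data rather than real predictions; in practice the contents of the written file would be passed in instead.

```python
import csv
import io

# Column names used when building the submission above
EXPECTED_COLUMNS = ["respondent_id", "h1n1_vaccine", "seasonal_vaccine"]


def validate_submission(csv_text):
    """Hypothetical check: verify the header and that probabilities lie in [0, 1]."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"Unexpected columns: {reader.fieldnames}")

    row_count = 0
    for row in reader:
        for column in EXPECTED_COLUMNS[1:]:
            value = float(row[column])
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{column} out of range in row {row_count + 1}: {value}")
        row_count += 1
    return row_count


# Toy submission content (illustrative values only)
toy_submission = (
    "respondent_id,h1n1_vaccine,seasonal_vaccine\n"
    "1,0.10,0.55\n"
    "2,0.73,0.20\n"
)

checked_rows = validate_submission(toy_submission)
print(f"{checked_rows} rows passed validation")
```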

--end--
<a name="bottom"></a>