# Data Programming in Python | BAIS:6040 
# Advanced Data Analytics: Deep Learning

Instructor: Jeff Hendricks 

Topics to be covered:
- Classification with Multi-Layer Perceptron in Scikit-Learn
- Classification using Keras & TensorFlow

References: 
- Documentation scikit-learn (http://scikit-learn.org/stable/documentation.html)
- Introduction to Machine Learning with Python (http://shop.oreilly.com/product/0636920030515.do)
- Python Data Science Handbook by Jake VanderPlas (http://shop.oreilly.com/product/0636920034919.do)
- Python for Data Analysis by Wes McKinney (https://www.oreilly.com/library/view/python-for-data/9781491957653/)
- Documentation for TensorFlow (https://www.tensorflow.org/)

## Prerequisites

In [None]:
#!pip install keras

In [None]:
#!pip install tensorflow

## Importing Modules

In [None]:
import pandas as pd                                       # dataframes
from seaborn import load_dataset                          # Titanic dataset 
from sklearn.model_selection import train_test_split      # train/test data
from sklearn.neural_network import MLPClassifier          # neural networks  
import warnings

warnings.filterwarnings('ignore')

## Loading the Dataset into a Pandas Dataframe

In [None]:
df = load_dataset("titanic")

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.head()

## Filtering Out Unnecessary Data

In [None]:
df = df[["survived", "pclass", "sex", "age", "sibsp", "parch", "fare"]].copy()  # copy so later column assignments modify an independent dataframe

In [None]:
df.info()

## Converting Categorical Columns into Numerical Columns

As most machine learning libraries will only accept numbers as input, every categorical column in a dataset must be replaced with a numerical column. 

In [None]:
df.sex.head()

In [None]:
df.sex = pd.Categorical(df.sex)   # Step 1: declare the column is categorical 

pandas.Categorical: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.html

In [None]:
df.sex = df.sex.cat.codes         # Step 2: convert each category to its corresponding code

pandas.Series.cat.codes: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.cat.codes.html

In [None]:
df.sex.head()

In [None]:
df.info()
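To see which number each category received, it can help to inspect the mapping between codes and labels. The following is a minimal sketch on a hypothetical toy column (the `sex` Series below is made up for illustration and is not the Titanic data); it assumes the categories are inferred from the values, in which case pandas sorts them.

```python
import pandas as pd

# Hypothetical toy column standing in for df.sex
sex = pd.Series(["male", "female", "female", "male"])

cat = pd.Categorical(sex)        # categories are inferred and sorted: ['female', 'male']
codes = pd.Series(cat.codes)     # integer code assigned to each row

# Lookup table from integer code back to the original label
code_to_label = dict(enumerate(cat.categories))

print(code_to_label)    # {0: 'female', 1: 'male'}
print(codes.tolist())   # [1, 0, 0, 1]
```

With this mapping, `sex = 1` in later cells corresponds to `male` and `sex = 0` to `female`.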

## Handling Missing Data

As with categorical variables, most machine learning libraries will not accept null values as input. Every null value in a dataset must be removed or replaced with a numerical value. 

In [None]:
df[df.isnull().any(axis=1)]

In [None]:
df = df.dropna()        # Drop all rows with any missing values
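Dropping rows is the "remove" option. The "replace" option could look like the sketch below, which fills missing ages with the column median. The `toy` dataframe and its values are hypothetical, and the median is only one possible replacement value, chosen here as an assumption.

```python
import pandas as pd

# Hypothetical toy data with two missing ages
toy = pd.DataFrame({"age":  [22.0, None, 30.0, None, 40.0],
                    "fare": [7.25, 71.28, 8.05, 53.10, 8.46]})

dropped = toy.dropna()                               # option 1: remove rows with nulls
filled = toy.fillna({"age": toy["age"].median()})    # option 2: replace nulls with the median age

print(len(dropped))             # 3
print(filled["age"].tolist())   # [22.0, 30.0, 30.0, 30.0, 40.0]
```

Dropping keeps only fully observed rows but shrinks the dataset; filling keeps every row at the cost of introducing estimated values.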

## Preparing Data for Modeling

In [None]:
features = list(df.columns)
features.remove('survived')

target = "survived"

X = df[features]
y = df[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

## Modeling with Neural Networks

In [None]:
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(100, ), random_state=0) # Build a new neural network model 

class sklearn.neural_network.MLPClassifier(`hidden_layer_sizes`=(100, ), `activation`='relu', `solver`='adam', `alpha`=0.0001, `batch_size`='auto', `learning_rate`='constant', `learning_rate_init`=0.001, `power_t`=0.5, `max_iter`=200, `shuffle`=True, `random_state`=None, `tol`=0.0001, `verbose`=False, `warm_start`=False, `momentum`=0.9, `nesterovs_momentum`=True, `early_stopping`=False, `validation_fraction`=0.1, `beta_1`=0.9, `beta_2`=0.999, `epsilon`=1e-08, `n_iter_no_change`=10)

sklearn.neural_network.MLPClassifier: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

In [None]:
mlp.fit(X_train, y_train)

In [None]:
mlp.score(X_train, y_train)

In [None]:
mlp.score(X_test, y_test)

In [None]:
person1 = {"pclass": 3, 
           "sex": 1,
           "age": 25,
           "sibsp": 0,
           "parch": 0,
           "fare": 7}

person2 = {"pclass": 1,
           "sex": 0,
           "age": 8,
           "sibsp": 1,
           "parch": 2,
           "fare": 40}

person3 = {"pclass": 2,
           "sex": 0,
           "age": 20,
           "sibsp": 0,
           "parch": 0,
           "fare": 15}

In [None]:
X_new = []                                    # X_new contains new data items 
for person in [person1, person2, person3]:
    new_person = [person["pclass"], person["sex"], person["age"], person["sibsp"], person["parch"], person["fare"]]
    X_new.append(new_person)

In [None]:
mlp.predict(X_new)

# Deep Learning with Keras & TensorFlow

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition by Aurélien Géron

https://learning.oreilly.com/library/view/hands-on-machine-learning/9781492032632/

### Backpropagation Basics

Training with backpropagation handles one mini-batch at a time (for example, 32 instances per batch) and goes through the full training set multiple times. Each full pass is called an epoch.

- An epoch is one pass through the entire training set.
- batch_size is the number of observations (instances) in a batch.
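The relationship between epochs and batch size can be made concrete with a little arithmetic. This is a small sketch; the helper name `updates_per_epoch` and the training-set size of 1000 are hypothetical values chosen for illustration.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to see every sample once.

    The last batch may be smaller than batch_size, hence the ceiling.
    """
    return math.ceil(n_samples / batch_size)

n_samples = 1000   # hypothetical training-set size
epochs = 5

print(updates_per_epoch(n_samples, 32))   # 32 batches per epoch
print(updates_per_epoch(n_samples, 1))    # 1000 batches per epoch

total_updates = epochs * updates_per_epoch(n_samples, 32)
print(total_updates)                      # 160 batches over 5 epochs
```

A smaller batch size means more weight updates per epoch, which usually makes each epoch slower to run.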

![image.png](attachment:image.png)

Let's keep the same goal as in the earlier classification example: build a classification model on the Titanic dataset that predicts whether an imaginary passenger with a given class, sex, age, and fare would have survived the accident. This is a binary classification problem.

In [None]:
import pandas as pd
import numpy as np
from seaborn import load_dataset 
from sklearn.model_selection import train_test_split

df = load_dataset("titanic")
df = df[["survived", "pclass", "sex", "age", "sibsp", "parch", "fare"]].copy()

df.sex = pd.Categorical(df.sex)
df.sex = df.sex.cat.codes

df = df.dropna()

df.head()

In [None]:
features = ["pclass", "sex", "age", "sibsp", "parch", "fare"]
target = "survived"

X = df[features]
y = np.ravel(df[target])

Randomly split the dataset into 80% training data and 20% test data. Then split the training portion again into 80% training data and 20% validation data.

In [None]:
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# split train+validation set into training and validation sets
X_train, X_valid, y_train, y_valid = train_test_split(X_trainval, y_trainval, test_size=0.2, random_state=1)
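Because the second split is applied to the 80% that remains after the first, the final proportions are not 80/20/20. The sketch below works out the resulting shares of the original dataset; it is plain arithmetic and does not call scikit-learn.

```python
from fractions import Fraction

test_fraction = Fraction(20, 100)          # first split: 20% held out for testing
trainval_fraction = 1 - test_fraction      # 80% left for training + validation

valid_fraction = trainval_fraction * Fraction(20, 100)   # 20% of the remaining 80%
train_fraction = trainval_fraction - valid_fraction      # what is left for training

for name, frac in [("train", train_fraction),
                   ("valid", valid_fraction),
                   ("test", test_fraction)]:
    print(f"{name}: {float(frac):.0%}")
# train: 64%
# valid: 16%
# test: 20%
```

The actual row counts are approximate, since each split has to produce whole rows.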

### Using Keras with a TensorFlow backend

- units: Positive integer, dimensionality of the output space.
- activation: Activation function to use. If you don't specify anything, no activation is applied

https://keras.io/guides/sequential_model/

https://keras.io/api/layers/core_layers/dense/
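To illustrate what `units` and `activation` control, here is a minimal pure-Python sketch of the computation a dense layer performs: a weighted sum plus a bias for each unit, followed by an optional activation. The function `dense` and the weights below are hypothetical and hand-picked for illustration; this is not how Keras implements the layer internally.

```python
def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation=None):
    """weights[j] holds the input weights of unit j.

    The output has one value per unit, so len(weights) plays the role of `units`.
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

x = [1.0, -2.0, 0.5]                 # 3 input features
W = [[0.5, 0.5, 0.0],                # weights of unit 0
     [-1.0, 0.0, 2.0]]               # weights of unit 1
b = [0.0, 0.5]

linear_out = dense(x, W, b)                  # no activation applied
relu_out = dense(x, W, b, activation=relu)   # negative values clipped to 0

print(linear_out)   # [-0.5, 0.5]
print(relu_out)     # [0.0, 0.5]
```

Three inputs go in and two values come out, one per unit; without an activation the layer is a purely linear transformation.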

In [None]:
import tensorflow as tf
# from keras.models import Sequential
# from keras.layers import Dense

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# using Sequential and Dense from the keras API library

model = Sequential()

model.add(Dense(units=8, activation='relu', input_shape=(6,)))

model.add(Dense(units=8, activation='relu'))

model.add(Dense(units=8, activation='relu'))

model.add(Dense(units=8, activation='relu'))

model.add(Dense(units=1, activation='sigmoid'))


model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

# train on the training dataframe and monitor performance on the validation dataframe
model.fit(X_train, y_train, epochs=5, batch_size=1, verbose=1, validation_data=(X_valid, y_valid))

In [None]:
model.evaluate(X_test, y_test, verbose=1)

In [None]:
# turn X_new into a dataframe
X_new = pd.DataFrame(data=X_new, columns = X_train.columns)

In [None]:
model.predict(X_new)

In [None]:
(model.predict(X_test[:3]) > 0.5).astype("int32")
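The sigmoid output layer returns one probability per row, and the comparison with 0.5 turns each probability into a class label. The sketch below shows the thresholding step on its own, using made-up probabilities in place of real model output.

```python
import numpy as np

# Hypothetical sigmoid outputs, shaped (n_samples, 1) like model.predict(...)
probs = np.array([[0.12],
                  [0.87],
                  [0.50]])

labels = (probs > 0.5).astype("int32")   # True/False -> 1/0

print(labels.ravel().tolist())   # [0, 1, 0]
```

Because the comparison is strict, a probability of exactly 0.5 is labeled 0 (did not survive).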

### Create TensorFlow Dataset from Pandas DF

`tf.data.Dataset.from_tensor_slices` creates a Dataset whose elements are slices of the given tensors.

The given tensors are sliced along their first dimension. This operation preserves the structure of the input tensors, removing the first dimension of each tensor and using it as the dataset dimension. All input tensors must have the same size in their first dimensions.

https://www.tensorflow.org/api_docs/python/tf/data/Dataset
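The slicing behavior can be pictured with a NumPy analogy. This sketch does not use TensorFlow; it only mimics the idea that a feature array and a label array with the same first dimension are cut into per-row (features, label) pairs. The arrays are hypothetical.

```python
import numpy as np

features = np.arange(12).reshape(4, 3)   # 4 rows, 3 features each
labels = np.array([0, 1, 1, 0])          # 4 labels, same first dimension

# Iterating over both arrays along their first axis yields one element per row
elements = list(zip(features, labels))

print(len(elements))           # 4
print(elements[1][0].shape)    # (3,)
print(elements[1][0].tolist(), int(elements[1][1]))   # [3, 4, 5] 1
```

The first dimension (4) becomes the number of dataset elements, and each element keeps the remaining shape: 3 features plus a scalar label.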

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
from seaborn import load_dataset
from sklearn.model_selection import train_test_split

df = load_dataset("titanic")
df = df[["survived", "pclass", "sex", "age", "sibsp", "parch", "fare"]].copy()

df.sex = pd.Categorical(df.sex)
df.sex = df.sex.cat.codes

df = df.dropna()

features = ["pclass", "sex", "age", "sibsp", "parch", "fare"]
target = "survived"

X = df[features]
y = df[target]

X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# split train+validation set into training and validation sets
X_train, X_valid, y_train, y_valid = train_test_split(X_trainval, y_trainval, random_state=1)

In [None]:
person1 = {"pclass": 3, 
           "sex": 1,
           "age": 25,
           "sibsp": 0,
           "parch": 0,
           "fare": 7}

person2 = {"pclass": 1,
           "sex": 0,
           "age": 8,
           "sibsp": 1,
           "parch": 2,
           "fare": 40}

person3 = {"pclass": 2,
           "sex": 0,
           "age": 20,
           "sibsp": 0,
           "parch": 0,
           "fare": 15}


X_new = [] 
for person in [person1, person2, person3]:
    new_person = [person["pclass"], person["sex"], person["age"], person["sibsp"], person["parch"], person["fare"]]
    X_new.append(new_person)
    
# turn X_new into a dataframe
X_new = pd.DataFrame(data=X_new, columns = X_train.columns)

In [None]:
train_dataset = tf.data.Dataset.from_tensor_slices((X_train.values, y_train))
valid_dataset = tf.data.Dataset.from_tensor_slices((X_valid.values, y_valid))
test_dataset = tf.data.Dataset.from_tensor_slices((X_test.values, y_test))

x_new_dataset = tf.data.Dataset.from_tensor_slices((X_new.values, [0,0,0]))

In [None]:
for feat, targ in test_dataset.take(5):
    print(f"Features: {feat}, Target: {targ}")

In [None]:
# Shuffle and batch the datasets

batch_size = 1

train_dataset = train_dataset.shuffle(len(X_train)).batch(batch_size)
valid_dataset = valid_dataset.shuffle(len(X_valid)).batch(batch_size)
test_dataset = test_dataset.shuffle(len(X_test)).batch(batch_size)

x_new_dataset = x_new_dataset.batch(batch_size)
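Conceptually, shuffling with a buffer as large as the dataset and then batching amounts to randomizing the row order and cutting it into fixed-size chunks. The pure-Python sketch below imitates that idea; `shuffled_batches` is a hypothetical helper, not part of TensorFlow, and it ignores details such as reshuffling between epochs.

```python
import random

def shuffled_batches(items, batch_size, seed=None):
    """Yield the items in random order, grouped into lists of batch_size.

    The final batch is shorter when len(items) is not a multiple of batch_size.
    """
    rng = random.Random(seed)
    order = list(items)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

batches = list(shuffled_batches(range(10), batch_size=4, seed=0))

print([len(batch) for batch in batches])   # [4, 4, 2]
```

Every item appears in exactly one batch; only the order changes. With `batch_size = 1`, as in the cell above, each batch holds a single row.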

### Tensorflow Keras

`tf.keras` is TensorFlow's implementation of the Keras API, a high-level API for building and training models.

https://www.tensorflow.org/api_docs/python/tf/keras

### Training a tensorflow.keras Sequential Model

- units: dimensionality of the output space.
- activation: activation function to use. If you don't specify anything, no activation is applied.

https://www.tensorflow.org/api_docs/python/tf/keras/Sequential

https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense

The Sequential API groups a linear stack of layers into a tf.keras.Model.

A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

https://www.tensorflow.org/guide/keras/sequential_model
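Since each layer in a plain stack feeds the next, the number of trainable parameters follows directly from the layer widths. The sketch below counts them for the layer widths used by the `Sequential` model built in this section, assuming 6 input features (the length of the `features` list). The helper `dense_param_count` is hypothetical.

```python
def dense_param_count(n_inputs, units):
    """Each unit has one weight per input plus one bias."""
    return (n_inputs + 1) * units

# Input width followed by the `units` of each Dense layer
layer_sizes = [6, 8, 10, 10, 10, 1]

per_layer = [dense_param_count(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)   # [56, 90, 110, 110, 11]
print(total)       # 377
```

Once the model has been built (for example after fitting), `model.summary()` should report the same per-layer counts.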

In [None]:
import tensorflow as tf

'''
# you could set up imports like this if you didn't already have Keras Sequential and Dense being imported

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([Dense(8, activation='relu')
                   ,Dense(8, activation='relu')
                   ,Dense(1, activation='sigmoid')])'''

model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=8, activation='relu'),
    tf.keras.layers.Dense(units=10, activation='relu'),
    tf.keras.layers.Dense(units=10, activation='relu'),
    tf.keras.layers.Dense(units=10, activation='relu'),
    tf.keras.layers.Dense(units=1, activation='sigmoid')
])


model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              #optimizer='adam',
              metrics=['accuracy'])
                   
model.fit(train_dataset, epochs=5, validation_data=valid_dataset)

In [None]:
model.evaluate(x=test_dataset)

In [None]:
model.predict(x_new_dataset)