## Artificial Neural Network in Python 

This week we introduced deep learning and Artificial Neural Networks. Here is a summary:

Artificial Neural Networks (ANNs) are computational models inspired by the structure and function of the human brain. ANNs consist of interconnected nodes called neurons or units, organized into layers. The three primary types of layers in ANNs are input, hidden, and output layers.

Feedforward Neural Networks (FNNs): FNNs are the simplest type of ANN: data flows in one direction, from input to output. In a fully connected FNN, each neuron in a layer is connected to every neuron in the subsequent layer.
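
To make the one-directional flow concrete, here is a minimal NumPy sketch of a single forward pass through a small fully connected network. The layer sizes (4 inputs, 3 hidden units, 2 outputs), the random weights, and the function name `forward` are illustrative assumptions, not part of this week's material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features, 3 hidden units, 2 output units
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)

def forward(x):
    """Propagate a batch of inputs from the input layer to the output layer."""
    hidden = np.maximum(0, x @ W1 + b1)   # hidden layer with ReLU activation
    return hidden @ W2 + b2               # output layer (raw scores)

x = rng.normal(size=(5, 4))               # batch of 5 examples
out = forward(x)
print(out.shape)                          # (5, 2)
```

Each matrix product connects every unit in one layer to every unit in the next, which is what "fully connected" means here.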

Activation Functions: Neurons apply activation functions to their input, introducing non-linearity into the network. 
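
As a rough illustration, the common activation functions can be written in a few lines of NumPy. This is a sketch of the math only, not how Keras implements them; the helper names `relu`, `sigmoid`, and `softmax` are chosen here for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(z):
    # Subtract the row maximum for numerical stability
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

z = np.array([[-2.0, 0.0, 3.0]])
print(relu(z))      # negative inputs are clipped to 0
print(sigmoid(z))   # values squashed into (0, 1)
print(np.tanh(z))   # values squashed into (-1, 1)
print(softmax(z))   # non-negative values that sum to 1 per row
```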

Training ANNs: ANNs are trained using optimization algorithms like gradient descent to minimize a loss or cost function. Backpropagation is a key technique for calculating gradients and updating neuron weights during training. Techniques like dropout, batch normalization, and weight regularization help prevent overfitting in ANNs.
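
The idea of gradient descent can be sketched on the smallest possible "network": a single linear neuron fitted with a mean squared error loss. The synthetic data, the learning rate, and the number of iterations below are assumptions made for the example; backpropagation applies the same chain-rule gradient computation layer by layer in deeper networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): y = 2*x + 1 plus small noise
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1 + rng.normal(scale=0.05, size=100)

w, b = 0.0, 0.0          # parameters to learn
learning_rate = 0.1      # illustrative value

for _ in range(500):
    pred = w * x + b
    error = pred - y
    # Gradients of the mean squared error with respect to w and b
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Gradient descent update: step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))   # should land near the true values 2 and 1
```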

Hyperparameter Tuning: Adjusting hyperparameters like the number of layers, neurons, learning rate, and batch size is crucial for optimizing ANN performance.

### MNIST data in TensorFlow

In [27]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical

# Load the MNIST dataset and split it into training and testing sets
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [28]:
train_images = train_images.astype('float32')/255
test_images = test_images.astype('float32')/255

# Add a channel dimension so that each image has shape (28, 28, 1)
train_images = train_images.reshape(60000, 28, 28, 1)
test_images = test_images.reshape(10000, 28, 28, 1)

In [5]:
train_images[0].shape

(28, 28, 1)

In [29]:
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
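
`to_categorical` turns integer class labels into one-hot vectors. The NumPy sketch below shows the same idea on three example labels; it is an illustration of the encoding, not the Keras implementation.

```python
import numpy as np

labels = np.array([5, 0, 4])            # example digit labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot.shape)                    # (3, 10)
print(one_hot[0])                       # 1.0 at index 5, 0.0 elsewhere
print(one_hot.argmax(axis=1))           # recovers the labels 5, 0, 4
```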

In [14]:
model = Sequential([
    Flatten(input_shape = (28, 28, 1)),
    Dense(64, activation = 'relu'),
    Dense(10, activation = 'softmax')
])

The Flatten layer transforms the 2D input images into a 1D array.
The Dense layers are fully connected layers responsible for learning patterns and making predictions.

The activation functions (ReLU and softmax) introduce non-linearity into the network, allowing it to learn complex relationships in the data.

Other options for the activation function: https://www.tensorflow.org/api_docs/python/tf/keras/activations
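
It can help to count the trainable parameters of the model above by hand. The helper `dense_params` is a hypothetical name introduced for this sketch; the totals should agree with what `model.summary()` reports for this architecture.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

flattened = 28 * 28 * 1                 # Flatten output size: 784
hidden = dense_params(flattened, 64)    # 784*64 + 64 = 50240
output = dense_params(64, 10)           # 64*10 + 10 = 650

print(hidden, output, hidden + output)  # 50240 650 50890
```

The Flatten layer itself has no parameters; it only reshapes the input.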

In [15]:
# model.compile(optimizer ='adam',
#              loss = 'categorical_crossentropy',
#              metrics = 'accuracy')

opt = keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer = opt,
             loss = 'categorical_crossentropy',
             metrics = 'accuracy')

Other optimization algorithm options: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers
For a regression problem, both the loss and the metric can be "mean_squared_error". Other options: 
https://www.tensorflow.org/api_docs/python/tf/keras/losses
https://www.tensorflow.org/api_docs/python/tf/keras/metrics
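
For intuition about the `categorical_crossentropy` loss used above, here is a NumPy sketch of the formula on two made-up batches of predictions. The function below is an illustration written for this notebook, not the Keras implementation, and the `eps` clipping value is an assumption.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(y_true * log(y_pred)) over the batch."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0, 0, 1], [1, 0, 0]])                    # one-hot labels
confident = np.array([[0.05, 0.05, 0.9], [0.8, 0.1, 0.1]])   # mostly right
unsure = np.array([[0.3, 0.3, 0.4], [0.4, 0.3, 0.3]])        # barely right

print(categorical_crossentropy(y_true, confident))  # smaller loss
print(categorical_crossentropy(y_true, unsure))     # larger loss
```

Only the probability assigned to the true class enters the loss, so confident correct predictions are rewarded with a lower value.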

In [16]:
model.fit(train_images, train_labels, epochs=5,
         batch_size=64, validation_split=0.2)  # verbose=0 to not show details of each epoch

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x24a0c758460>

In [17]:
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print(f"The test accuracy: {test_accuracy*100: .2f}%")

The test accuracy:  93.61%
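
The `batch_size=64` and `validation_split=0.2` arguments passed to `model.fit` above determine how many weight updates happen in each epoch. A small arithmetic sketch, assuming the 60,000 MNIST training images used in this notebook:

```python
import math

n_samples = 60000          # MNIST training images
validation_split = 0.2
batch_size = 64

n_val = int(n_samples * validation_split)      # held out for validation
n_train = n_samples - n_val                    # used for weight updates
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)         # 48000 12000 750
```

A smaller batch size means more, noisier updates per epoch; a larger one means fewer, smoother updates.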


In [30]:
# Try by yourself: Fit another neural network with two hidden layers: one with 64 nodes
# and relu as the activation function, the other with 32 nodes and tanh as the
# activation function. Choose another optimizer, train the model,
# then print the accuracy.

model = Sequential([
    Flatten(input_shape = (28, 28, 1)),
    Dense(64, activation = 'relu'), # Dense layers are fully connected: each node receives
    Dense(32, activation = 'tanh'), # input from every node in the previous layer
    Dense(10, activation = 'softmax')
])

model.compile(optimizer ='adagrad',
             loss = 'categorical_crossentropy',
             metrics = 'accuracy')

model.fit(train_images, train_labels, epochs=5,
         batch_size=32, validation_split=0.2)

test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print(f"The test accuracy: {test_accuracy*100: .2f}%")

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
The test accuracy:  89.03%


### Other options in the model

In [10]:
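# Dropout randomly zeroes a fraction of its inputs during training to reduce overfitting;
# BatchNormalization is a layer that normalizes its inputs

from tensorflow.keras.layers import BatchNormalization, Dropout

model = Sequential([
    Flatten(input_shape=(28,28,1)), # 3D input: height, width, and one grayscale channel
    Dense(64, activation='relu'),
    Dropout(0.2), 
    Dense(10, activation='softmax')
])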
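
To see what a layer like `Dropout(0.2)` does during training, here is a NumPy sketch of "inverted" dropout, where the surviving activations are rescaled so their expected value is unchanged. The function name `dropout` and the array sizes are assumptions for illustration; this is not the Keras implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training:
        return activations                      # no-op at inference time
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

a = np.ones((1000, 64))
dropped = dropout(a, rate=0.2)

print((dropped == 0).mean())   # roughly 0.2 of the units are zeroed
print(dropped.mean())          # close to 1.0, so the expected scale is kept
```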

In [31]:
# L2 (ridge-style) weight penalty on the hidden layer; a lasso-style penalty would use l1 instead

from tensorflow.keras.regularizers import l2

model = Sequential([
    Flatten(input_shape=(28,28,1)),
    Dense(64, activation='relu', kernel_regularizer=l2(0.01)),
    Dense(10, activation='softmax')
])
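
`kernel_regularizer=l2(0.01)` adds a penalty on the layer's weights to the loss. The sketch below computes an L2 penalty next to an L1 (lasso-style) penalty on a small made-up weight matrix; the helper names are hypothetical and the weights are chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def l2_penalty(weights, lam):
    """Ridge-style penalty: lam * sum of squared weights."""
    return lam * np.sum(weights ** 2)

def l1_penalty(weights, lam):
    """Lasso-style penalty: lam * sum of absolute weights."""
    return lam * np.sum(np.abs(weights))

W = np.array([[0.5, -1.0], [2.0, 0.0]])
print(l2_penalty(W, 0.01))   # 0.01 * (0.25 + 1 + 4 + 0), about 0.0525
print(l1_penalty(W, 0.01))   # 0.01 * (0.5 + 1 + 2 + 0), about 0.035
```

The L2 penalty shrinks large weights strongly, while the L1 penalty tends to push small weights all the way to zero, which is why it is associated with feature selection.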

### Tune the hyperparameters of the neural network

In [33]:
import numpy as np
from sklearn.model_selection import GridSearchCV
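# Note: this wrapper is deprecated and has been removed from newer TensorFlow releases;
# the separate scikeras package provides a replacement KerasClassifier
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier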

In [34]:
# tune the number of layers and nodes

def create_model(layers, nodes):
    model = Sequential()
    model.add(Flatten(input_shape=(28,28,1)))
    
    for _ in range(layers):
        model.add(Dense(nodes, activation='relu'))
    
    model.add(Dense(10, activation='softmax'))
    
    model.compile(optimizer='adam',
                 loss='categorical_crossentropy',
                 metrics='accuracy')
    return model

In [37]:
model = KerasClassifier(build_fn=create_model)
param_grid = {
    'layers': [1,2,3],
    'nodes': [32, 64, 128]
}

grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, n_jobs=-1)
grid_result = grid_search.fit(train_images, train_labels)
print(f"Best accuracy: {grid_result.best_score_: .4f} using {grid_result.best_params_}")


Best accuracy:  0.9499 using {'layers': 3, 'nodes': 128}
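
Grid search trains one model per parameter combination and per cross-validation fold, so the cost grows quickly. A pure-Python sketch of the counting for the grid above (the variable names are chosen for illustration):

```python
from itertools import product

param_grid = {
    'layers': [1, 2, 3],
    'nodes': [32, 64, 128],
}
cv = 3

# Every combination of the listed values is one candidate model
candidates = [dict(zip(param_grid, values))
              for values in product(*param_grid.values())]
n_fits = len(candidates) * cv     # each candidate is trained once per fold

print(len(candidates), n_fits)    # 9 27
print(candidates[0])              # {'layers': 1, 'nodes': 32}
```

With the default `refit=True`, `GridSearchCV` also trains the best candidate once more on the full training set after the search.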


In [39]:
from tensorflow.keras.optimizers import Adam, SGD, RMSprop

def create_model(opt, batch_size=32, epochs=5):
    model = Sequential()
    model.add(Flatten(input_shape=(28,28,1)))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(10, activation='softmax'))

    model.compile(optimizer=opt,
                 loss='categorical_crossentropy',
                 metrics='accuracy')
    return model

In [40]:
model = KerasClassifier(build_fn=create_model)
param_grid = {
    'opt': ['adam', 'sgd', 'rmsprop'],
    'batch_size': [32, 64],
    'epochs': [5, 10]
}

grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, n_jobs=-1)
grid_result = grid_search.fit(train_images, train_labels)
print(f"Best accuracy: {grid_result.best_score_: .4f} using {grid_result.best_params_}")


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Best accuracy:  0.9718 using {'batch_size': 32, 'epochs': 10, 'opt': 'rmsprop'}


#### Try another dataset: email data

In [59]:
# Load the email data. Try to fit a neural network to classify spam. You can
# start with a neural network with one hidden layer, and then tune the parameters.

import pandas as pd

df = pd.read_csv("email.csv")

X = df.drop(["spam", "time"], axis = 1)
y = df['spam']

X = pd.get_dummies(X, drop_first=True)
print(X.shape)

(3921, 20)


In [43]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4400)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse the mean and std learned from the training data

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
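
Standardization subtracts each column's mean and divides by its standard deviation. The NumPy sketch below follows the usual convention of learning those statistics from the training data only and reusing them on the test data; the synthetic arrays are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=10.0, scale=3.0, size=(200, 2))   # synthetic data
X_test = rng.normal(loc=10.0, scale=3.0, size=(50, 2))

# Learn the mean and standard deviation from the training set only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std    # reuse the training statistics

print(X_train_scaled.mean(axis=0))       # approximately 0 per column
print(X_test_scaled.mean(axis=0))        # near 0, but not exactly
```

Scaling the test set with its own statistics would let information about the test data leak into preprocessing and would put the two sets on slightly different scales.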

In [46]:
model = Sequential([
    Flatten(input_shape= (20, 1)),
    Dense(16, activation='relu'),
    Dense(8, activation='relu'),
    Dense(2, activation='sigmoid')
])

model.compile(optimizer='adam',
             loss='binary_crossentropy',
             metrics='accuracy')

model.fit(X_train_scaled, y_train, epochs=5, batch_size=32, validation_split=0.2)
test_loss, test_accuracy = model.evaluate(X_test_scaled, y_test)
print(test_accuracy)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
0.9070063829421997


In [49]:
# Define a function to create the model and tune # of layers, nodes, and optimizers

def create_model(layers, nodes, opt):
    model = Sequential()
    model.add(Flatten(input_shape=(20,1)))
    
    for _ in range(layers):
        model.add(Dense(nodes, activation='relu'))
        
    model.add(Dense(2, activation='sigmoid'))
    model.compile(optimizer=opt,
                 loss='binary_crossentropy',
                 metrics='accuracy')
    return model

In [50]:
model = KerasClassifier(build_fn=create_model)
param_grid = {
    'layers': [1,2,3],
    'nodes': [32, 64, 128],
    'opt': ['adam', 'sgd', 'rmsprop']
}

grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, n_jobs=-1)
grid_result = grid_search.fit(X_train_scaled, y_train)
print(f"Best accuracy: {grid_result.best_score_: .4f} using {grid_result.best_params_}")


Best accuracy:  0.9069 using {'layers': 1, 'nodes': 64, 'opt': 'rmsprop'}


### Simulation

In [86]:
# Build a simulation to show the relationship between data size and the MSE of a neural network
# Use make_regression to make your data
# Set up a simple NN with one hidden layer, use 'linear' for activation function
# Try to set up a large number of features, like 50
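
One thing worth noticing about this exercise: with 'linear' activations, stacking Dense layers still gives a linear function of the input, so the network behaves like a linear regression model. A NumPy sketch with hypothetical random weights (50 inputs, 32 hidden units, 1 output) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights: 50 inputs -> 32 hidden units -> 1 output
W1, b1 = rng.normal(size=(50, 32)), rng.normal(size=32)
W2, b2 = rng.normal(size=(32, 1)), rng.normal(size=1)

x = rng.normal(size=(10, 50))

# Two layers with linear (identity) activation
two_layer = (x @ W1 + b1) @ W2 + b2

# The same function written as a single linear layer
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layer, one_layer))   # True
```

This is why data generated by `make_regression`, which is linear in the features, is a natural fit for such a network.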



In [58]:
from sklearn.datasets import make_regression

np.random.seed(0)

def generate(data_size):
    X, y = make_regression(n_samples=data_size, n_features=50, noise=0.1)
    return X, y


In [57]:
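def train_and_evaluate(X, y):
    # split into the training and test data, and pre-process the data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4400)
    scaler = StandardScaler()
    
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)  # reuse the training mean and std
    
    # create a simple NN with one hidden layer; X has shape (n_samples, 50),
    # so each sample is a vector of length X.shape[1]
    # activation = 'linear', output layer has one node, also with 'linear' activation
    
    model = Sequential([
        Flatten(input_shape=(X.shape[1],)),
        Dense(32, activation='linear'),
        Dense(1, activation='linear')
    ])
    # model compile, model fit, model evaluate
    # (the optimizer, epochs, and batch size below are assumed choices, not prescribed values)
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(X_train_scaled, y_train, epochs=20, batch_size=32, verbose=0)
    test_mse = model.evaluate(X_test_scaled, y_test, verbose=0)
    
    return test_mse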

In [None]:
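import matplotlib.pyplot as plt

data_sizes = [500, 1000, 2000, 4000, 8000, 16000, 32000, 50000]
mse = []

for data_size in data_sizes:
    X, y = generate(data_size)             # generate data
    mse.append(train_and_evaluate(X, y))   # train, evaluate, and record the mse

# make the plot of data_size vs. mse
plt.plot(data_sizes, mse, marker='o')
plt.xlabel('Data size')
plt.ylabel('Test MSE')
plt.show()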