
**In this homework you will implement a 2-layer neural network using any Deep Learning 
Framework (e.g., TensorFlow, PyTorch etc.). Upload a .txt file with a link to your file as your 
submission on Submitty.
You need to perform the following tasks for this homework:
In your project, you will pick a dataset (not the same as in the previous homeworks) and 
describe the problem you would like to solve. Include a link to the dataset source. It is highly 
recommended that you pick a dataset with at least 10,000 observations (or more). There are
many ways of describing a big dataset; one way is to say that a big dataset is more
complex. Complexity can refer to the number of observations, features, or the type of data. For
this project, there is no restriction on the number of features your dataset has. However, having
more features gives you greater ability to apply the techniques discussed in class.
Next, you should pick a Deep Learning Framework that you would like to use to implement your 
2-layer Neural Network.**

I chose to implement my 2-layer neural network using the TensorFlow deep learning framework.

**Task 1 (10 points): Assuming you are not familiar with the framework, in this part of the 
homework you will present your research describing the resources you used to learn the 
framework (must include links to all resources). Clearly explain why you needed a particular 
resource for implementing a 2-layer Neural Network (NN). (Consider how you will keep track of 
all the computations in a NN i.e., what libraries/tools do you need within this framework.)
For example, some of the known resources for TensorFlow are:
https://www.tensorflow.org/guide/autodiff
https://www.tensorflow.org/api_docs/python/tf/GradientTape
Hint: You need to figure out the APIs/packages used to implement forward propagation and 
backward propagation.**

I used the following resource to help me learn the framework:
- https://www.tensorflow.org/tutorials/quickstart/beginner

I used the following resource to learn how to implement a 2-layer neural network (NN):
- https://easy-tensorflow.com/tf-tutorials/neural-networks/two-layer-neural-network (I used this particular resource to understand how to...)
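
The Task 1 hint is about forward and backward propagation. `tf.GradientTape` (listed in the prompt) records the forward computation and derives the gradients automatically, and the `minimize` call on an optimizer in the TensorFlow 1.x code below does the same job. As a rough illustration of what those tools compute, here is a NumPy sketch that writes both passes out by hand for a 2-layer network with a ReLU hidden layer and a softmax output. The function name `forward_backward` and the shapes in the usage lines are illustrative assumptions, not part of the homework code.

```python
import numpy as np

def forward_backward(X, Y, W1, b1, W2, b2):
    """Hypothetical helper: one forward and backward pass of a 2-layer network.

    X: (m, n_in) inputs, Y: (m, n_classes) one-hot labels.
    Hidden layer uses ReLU; output layer uses softmax with mean cross-entropy.
    Returns the loss and the gradients (dW1, db1, dW2, db2).
    """
    m = X.shape[0]
    # ----- forward propagation -----
    Z1 = X @ W1 + b1                          # hidden pre-activation
    A1 = np.maximum(Z1, 0.0)                  # ReLU activation
    Z2 = A1 @ W2 + b2                         # output logits
    Z2 = Z2 - Z2.max(axis=1, keepdims=True)   # shift for numerical stability
    P = np.exp(Z2) / np.exp(Z2).sum(axis=1, keepdims=True)  # softmax probabilities
    loss = -np.mean(np.sum(Y * np.log(P), axis=1))          # mean cross-entropy
    # ----- backward propagation (chain rule, output layer first) -----
    dZ2 = (P - Y) / m                         # gradient of the loss w.r.t. the logits
    dW2 = A1.T @ dZ2
    db2 = dZ2.sum(axis=0)
    dA1 = dZ2 @ W2.T
    dZ1 = dA1 * (Z1 > 0)                      # ReLU passes gradient only where Z1 > 0
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)
    return loss, (dW1, db1, dW2, db2)

# Usage with MNIST-like shapes: 784 inputs, 200 hidden units, 10 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 784))
Y = np.eye(10)[rng.integers(0, 10, size=8)]
W1, b1 = rng.normal(scale=0.01, size=(784, 200)), np.zeros(200)
W2, b2 = rng.normal(scale=0.01, size=(200, 10)), np.zeros(10)
loss, (dW1, db1, dW2, db2) = forward_backward(X, Y, W1, b1, W2, b2)
print(loss)  # near log(10): small initial weights give near-uniform predictions
```

A finite-difference check (nudging one weight up and down and comparing the change in loss with the analytic gradient) is a quick way to confirm that a hand-written backward pass matches the forward pass.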

**Task 2 (60 points): Once you have figured out the resources you need for the project, design and
implement your project. The project must include the following steps (it’s not limited to these 
steps):**
1. Exploratory Data Analysis (Can include data cleaning, visualization etc.)
2. Perform a train-dev-test split.
3. Implement forward propagation (clearly describe the activation functions and other 
hyper-parameters you are using).
4. Compute the final cost function.
5. Implement gradient descent (any variant of gradient descent depending upon your 
data and project can be used) to train your model. In this step it is up to you as someone 
in charge of their project to improvise using optimization algorithms (Adam, RMSProp
etc.) and/or regularization.
6. Present the results using the test set.
NOTE: In this step, once you have implemented your 2-layer network you may increase and/or 
decrease the number of layers as part of the hyperparameter tuning process.
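
Step 2 asks for a train-dev-test split. The MNIST loader used in this notebook already returns separate train, validation (dev) and test sets, but for a dataset without a predefined split, one simple approach is to shuffle the example indices once and slice them. The function name `train_dev_test_split` and the 10%/10% fractions are assumptions for illustration. The last lines show the mini-batch arithmetic behind step 5: the number of iterations per epoch is the number of training examples divided by the batch size.

```python
import numpy as np

def train_dev_test_split(X, Y, dev_frac=0.1, test_frac=0.1, seed=0):
    """Hypothetical helper: shuffle once, then slice into train/dev/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))             # random order of example indices
    n_test = int(len(X) * test_frac)
    n_dev = int(len(X) * dev_frac)
    test_idx = idx[:n_test]
    dev_idx = idx[n_test:n_test + n_dev]
    train_idx = idx[n_test + n_dev:]          # everything left over is for training
    return ((X[train_idx], Y[train_idx]),
            (X[dev_idx], Y[dev_idx]),
            (X[test_idx], Y[test_idx]))

# Usage on a toy dataset of 100 examples (row i of X is [2i, 2i + 1], label i)
X = np.arange(200).reshape(100, 2)
Y = np.arange(100)
train_set, dev_set, test_set = train_dev_test_split(X, Y)
print(len(train_set[0]), len(dev_set[0]), len(test_set[0]))  # 80 10 10

# Mini-batch arithmetic: iterations per epoch = training examples // batch size
batch_size = 100
print(55000 // batch_size)  # 550 iterations per epoch for 55,000 training images
```

Because the same permuted index array is used for `X` and `Y`, each image stays paired with its label after the shuffle.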

In [None]:
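# imports
# NOTE: this cell uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session,
# tf.get_variable); tensorflow.examples.tutorials.mnist was removed in TensorFlow 2.x
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data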

# MNIST images are 28x28
image_height = image_width = 28
# 28x28 = 784, the total number of pixels
img_size_flat = image_height * image_width
# Number of classes, one class per digit
n_classes = 10

# Load the MNIST data
# Return the labeled images (with one-hot labels) for the requested split
def dataLoader(mode='train'):
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
    if mode == 'train':
        xTrain, yTrain, xValid, yValid = mnist.train.images, mnist.train.labels, \
                                         mnist.validation.images, mnist.validation.labels
        return xTrain, yTrain, xValid, yValid
    elif mode == 'test':
        xTest, yTest = mnist.test.images, mnist.test.labels
        return xTest, yTest
    raise ValueError("mode must be 'train' or 'test'")

def get_next_batch(x, y, top, bottom):
    xBatch = x[top:bottom]
    yBatch = y[top:bottom]
    return xBatch, yBatch
 
# Load MNIST data (training and validation splits)
xTrain, yTrain, xValid, yValid = dataLoader(mode='train')
# Inspect the first five one-hot validation labels
print(yValid[:5, :])

# Hyper-parameters
epochs = 10            # Total number of training epochs
batchSize = 100        # Training batch size
numToDisplay = 100     # Print the batch loss/accuracy every numToDisplay iterations
learning_rate = 0.001  # Initial learning rate for the Adam optimizer

# Number of nodes in the first (and only) hidden layer
h1 = 200

# Create a weight variable initialized from a truncated normal (stddev 0.01)
def weight_variable(name, shape):
    initer = tf.truncated_normal_initializer(stddev=0.01)
    return tf.get_variable('W_' + name, dtype=tf.float32, shape=shape, initializer=initer)

# Create a bias variable initialized to zeros
def bias_var(name, shape):
    initial = tf.constant(0., shape=shape, dtype=tf.float32)
    return tf.get_variable('b_' + name, dtype=tf.float32, initializer=initial)
    
# Create a fully-connected layer:
# take the output of the previous layer x and return x @ W + b (ReLU-activated if use_relu)
def fullyConnectedLayer(x, numberUnits, name, use_relu=True):
    in_dim = x.get_shape()[1]
    W = weight_variable(name, shape=[in_dim, numberUnits])
    b = bias_var(name, [numberUnits])
    layer = tf.matmul(x, W)
    layer += b
    if use_relu:
        layer = tf.nn.relu(layer)
    return layer

# Placeholders for the flattened input images and their one-hot labels
x = tf.placeholder(tf.float32, shape=[None, img_size_flat], name='X')
y = tf.placeholder(tf.float32, shape=[None, n_classes], name='Y')

# Create a fully-connected layer with h1 nodes as hidden layer
fc1 = fullyConnectedLayer(x, h1, 'FC1', use_relu=True)
# Create a fully-connected layer with n_classes nodes as output layer
output_logits = fullyConnectedLayer(fc1, n_classes, 'OUT', use_relu=False)

# Define the loss function, optimizer, and accuracy
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output_logits), name='loss')
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, name='Adam-op').minimize(loss)
correct_prediction = tf.equal(tf.argmax(output_logits, 1), tf.argmax(y, 1), name='correct_pred')
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name='accuracy')

# Network predictions
classified_prediction = tf.argmax(output_logits, axis=1, name='predictions')
# Create the op for initializing all variables
init = tf.global_variables_initializer()
# Create an interactive session (to keep the session in the other cells)
session= tf.InteractiveSession()
# Initialize all variables
session.run(init)
# Number of training iterations in each epoch
num_tr_iter = int(len(yTrain) / batchSize)
for epoch in range(epochs):
    print('Training epoch: {}'.format(epoch + 1))
    for itr in range(num_tr_iter):
        start = itr * batchSize
        end = (itr + 1) * batchSize
        xBatch, yBatch = get_next_batch(xTrain, yTrain, start, end)
        # Run one optimization step (forward pass + backpropagation)
        feed_dict_batch = {x: xBatch, y: yBatch}
        session.run(optimizer, feed_dict=feed_dict_batch)
        if itr % numToDisplay == 0:
            # Calculate and display the batch loss and accuracy
            loss_batch, acc_batch = session.run([loss, accuracy],
                                                feed_dict=feed_dict_batch)
            print("iter {0:3d}:\t Loss={1:.2f},\tTraining Accuracy={2:.01%}".format(
                itr, loss_batch, acc_batch))
    # Run validation after every epoch (first 1,000 validation images)
    feed_dict_valid = {x: xValid[:1000], y: yValid[:1000]}
    loss_valid, acc_valid = session.run([loss, accuracy], feed_dict=feed_dict_valid)
    print("Epoch {0}: validation loss: {1:.2f}, validation accuracy: {2:.01%}".format(
        epoch + 1, loss_valid, acc_valid))
    
# Test the network after training and show its accuracy on the first 1,000 test images
xTest, yTest = dataLoader(mode='test')
feed_dict_test = {x: xTest[:1000], y: yTest[:1000]}
testLoss, acc_test = session.run([loss, accuracy], feed_dict=feed_dict_test)
print("Test loss: {0:.2f}, test accuracy: {1:.01%}".format(testLoss, acc_test))

# Plot up to 15 images in a 3x5 grid, titled with the true (and predicted) classes
def plot_images(images, classified_true, classified_pred=None, title=None):
    fig, axes = plt.subplots(3, 5, figsize=(12, 8))
    fig.subplots_adjust(hspace=0.4, wspace=0.3)
    if title is not None:
        fig.suptitle(title)
    for i, ax in enumerate(axes.flat):
        ax.axis('off')
        if i >= len(images):
            # Leave unused sub-plots empty when fewer than 15 images are given
            continue
        # Plot image.
        ax.imshow(images[i].reshape(28, 28), cmap='binary')
        # Show true and predicted classes.
        if classified_pred is None:
            ax_title = "True: {0}".format(classified_true[i])
        else:
            ax_title = "True: {0}, Pred: {1}".format(classified_true[i], classified_pred[i])
        ax.set_title(ax_title)

# Plot examples of images that have been misclassified
def plot_example_errors(images, classified_true, classified_pred, title=None):
    # Boolean mask of the misclassified images
    incorrect = np.logical_not(np.equal(classified_pred, classified_true))
    # Show images that have been wrongly classified.
    wrong_images = images[incorrect]
    # Get the true and predicted classes for those images.
    classified_pred = classified_pred[incorrect]
    classified_true = classified_true[incorrect]
    # Plot the first 15 images.
    plot_images(images=wrong_images[0:15], classified_true=classified_true[0:15],
                classified_pred=classified_pred[0:15], title=title)
# Plot some of the correctly classified and misclassified test examples
classified_pred = session.run(classified_prediction, feed_dict=feed_dict_test)
classified_true = np.argmax(yTest[:1000], axis=1)
correct = np.equal(classified_pred, classified_true)
plot_images(xTest[:1000][correct], classified_true[correct], classified_pred[correct],
            title='Correctly Classified Examples')
plot_example_errors(xTest[:1000], classified_true, classified_pred, title='Wrongly Classified Examples')
plt.show()
session.close()
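
To make the `loss` and `accuracy` ops in the cell above concrete, here is a NumPy sketch of what they compute on one batch: the mean softmax cross-entropy between the one-hot labels and the logits, and the fraction of examples whose arg-max logit matches the true class. The helper names are hypothetical, and the sketch is meant to mirror the math rather than TensorFlow's internal implementation.

```python
import numpy as np

def mean_softmax_cross_entropy(labels, logits):
    """Mirror of tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(...))."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # stability shift
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))  # log-softmax
    return -np.mean(np.sum(labels * log_probs, axis=1))

def batch_accuracy(labels, logits):
    """Mirror of the prediction-comparison and accuracy ops: arg-max match rate."""
    return np.mean(np.argmax(logits, axis=1) == np.argmax(labels, axis=1))

# Usage: two examples whose true digits are 3 and 7
labels = np.eye(10)[[3, 7]]
logits = np.zeros((2, 10))
print(mean_softmax_cross_entropy(labels, logits))  # log(10), about 2.3026, for uniform logits
logits[0, 3] = 5.0   # confident and correct for the first example
logits[1, 1] = 5.0   # confident but wrong for the second example
print(batch_accuracy(labels, logits))  # 0.5
```

A loss near log(10) at the start of training is a useful sanity check for a 10-class softmax: it means the untrained network is guessing roughly uniformly.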

**Task 3 (10 points): In task 2 describe how you selected the hyperparameters. What was the 
rationale behind the technique you used? Did you use regularization? Why, or why not? Did you use 
an optimization algorithm? Why or why not?**

I selected the hyperparameters by closely examining my specific problem. The hyperparameters I chose to modify were the number of epochs, the batch size, and the number of iterations. I learned that an epoch is one forward pass and one backward pass over all of the training examples. The batch size is the number of training examples in one forward/backward pass; the larger the batch size, the more memory is needed. An iteration is one forward pass and one backward pass over a single batch of training examples, so the number of iterations per epoch equals the number of training examples divided by the batch size. My rationale behind this technique was to experiment with and test different optimization settings. Regularization constrains (shrinks) the coefficient estimates toward zero, which discourages the model from learning an overly complex function and reduces the risk of overfitting. I did not use regularization.
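
The regularization described above can be made concrete with an L2 penalty (weight decay), one common choice: a term `lam * sum(W**2)` is added to the loss, and its gradient `2 * lam * W` pulls every weight toward zero at each step. The NumPy sketch below isolates that effect; `lam` and the helper names are illustrative assumptions, and the model in this notebook does not use the penalty. In the TensorFlow 1.x code, a similar term could be added to `loss` with `tf.nn.l2_loss`, which computes half the sum of squares of a tensor.

```python
import numpy as np

def l2_penalty(weights, lam):
    """Hypothetical helper: lam times the sum of squared entries of all weight matrices."""
    return lam * sum(np.sum(W ** 2) for W in weights)

def l2_grad(W, lam):
    """Gradient of lam * sum(W**2) with respect to W."""
    return 2 * lam * W

# Usage: total loss = data loss + penalty (the data loss here is a placeholder value)
W1, W2 = np.ones((2, 2)), 2 * np.ones(3)
data_loss = 0.25
print(data_loss + l2_penalty([W1, W2], lam=0.1))   # 0.25 + 0.1 * (4 + 12)

# Gradient steps on the penalty alone shrink the weights toward zero
W = np.ones((3, 3))
for _ in range(100):
    W -= 0.1 * l2_grad(W, lam=0.5)     # each step multiplies W by (1 - 2 * 0.1 * 0.5) = 0.9
print(W[0, 0])                         # roughly 0.9 ** 100
```

In real training the data-loss gradient pushes back against this shrinkage, so the weights settle where fitting the data and keeping them small are balanced; larger `lam` favors smaller weights.
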
I did use an optimization algorithm: the network is trained with the Adam optimizer (learning rate 0.001). I also tuned the hyperparameters because my original model was overfitting. Optimizing the neural network in this way improved its classification results on the MNIST dataset.
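
The network in Task 2 is trained with `tf.train.AdamOptimizer` at a learning rate of 0.001. As a rough illustration of what one Adam step does, here is a NumPy sketch of the textbook update rule: exponential moving averages of the gradient and of its square, bias correction, and a per-parameter step size. The class name is hypothetical, and TensorFlow's implementation may differ in small numerical details (for example, where epsilon enters); the defaults `beta1=0.9`, `beta2=0.999` and `eps=1e-8` match the defaults of TensorFlow 1.x's `AdamOptimizer`.

```python
import numpy as np

class AdamSketch:
    """Hypothetical minimal Adam optimizer that updates NumPy arrays in place."""

    def __init__(self, params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = params
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [np.zeros_like(p) for p in params]   # moving average of gradients
        self.v = [np.zeros_like(p) for p in params]   # moving average of squared gradients
        self.t = 0                                    # number of steps taken

    def step(self, grads):
        self.t += 1
        for p, g, m, v in zip(self.params, grads, self.m, self.v):
            m *= self.beta1
            m += (1 - self.beta1) * g
            v *= self.beta2
            v += (1 - self.beta2) * g * g
            m_hat = m / (1 - self.beta1 ** self.t)    # bias-corrected first moment
            v_hat = v / (1 - self.beta2 ** self.t)    # bias-corrected second moment
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Usage: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = np.zeros(1)
opt = AdamSketch([w], lr=0.1)
for _ in range(1000):
    opt.step([2 * (w - 3.0)])
print(w)  # close to 3
```

Because the step is scaled by the running root-mean-square of each parameter's gradient, the first step moves every parameter by about `lr` regardless of the gradient's magnitude, which is one reason Adam is less sensitive to the learning rate than plain gradient descent.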


**Task 4 (20 points): Create another baseline model (can be any model we covered so far except a 
deep learning model). Using the same training data (as above) train your model and evaluate 
results using the test set. Compare the results of both models (the Neural Network and the 
baseline model). What are the reasons for one model performing better (or not) than the 
other? Explain.**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

# Reuse the MNIST arrays loaded for the neural network (xTrain/yTrain, xTest/yTest)
# so the baseline is trained on the same training data and evaluated on the same
# 1,000 test images. Those labels are one-hot, so convert them to digits 0-9.
X_train = xTrain
y_train = np.argmax(yTrain, axis=1)
X_test = xTest[:1000]
y_test = np.argmax(yTest[:1000], axis=1)

# Creating a pd DataFrame: one column per pixel plus the digit label
MNIST_df = pd.DataFrame(data=X_train)
MNIST_df['digit'] = y_train

# An overview of the dataset
MNIST_df.info()
print(MNIST_df['digit'].describe())

# Class balance (a pairplot over 784 pixel columns is not practical)
sns.countplot(x='digit', data=MNIST_df)
plt.show()

# Instantiating the LinearRegression() baseline model
lr = LinearRegression()

# Training/Fitting the model on the pixel values and digit labels
lr.fit(X_train, y_train)

# Making predictions: LinearRegression outputs a continuous value, so round it
# and clip it to the valid digit range 0-9 before comparing it with the labels
pred = np.clip(np.rint(lr.predict(X_test)), 0, 9).astype(int)

# Test
print('First test image, predicted digit:', pred[0], 'true digit:', y_test[0])
print('Baseline test accuracy: {0:.01%}'.format(np.mean(pred == y_test)))
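
`LinearRegression` fitted directly on the digit value treats the ten classes as points on a number line (a prediction of 4.5 sits "between" a 4 and a 5), which is one reason it makes a weak classifier. A common alternative that keeps the model linear is to fit one least-squares regressor per class on one-hot targets and predict the class with the largest output. The NumPy sketch below is that swapped-in technique, not the baseline used above; the helper names are hypothetical.

```python
import numpy as np

def fit_one_hot_least_squares(X, y, n_classes=10):
    """Hypothetical helper: one linear least-squares regressor per class."""
    Xb = np.hstack([X, np.ones((len(X), 1))])    # append a constant bias feature
    Y = np.eye(n_classes)[y]                     # one-hot targets, shape (m, n_classes)
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)   # weights, shape (n_features + 1, n_classes)
    return W

def predict_classes(W, X):
    """Predict the class whose regressor gives the largest output."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.argmax(Xb @ W, axis=1)

# Usage on toy data: three well-separated 2-D clusters
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
y = np.repeat([0, 1, 2], 30)
X = centers[y] + rng.normal(scale=0.5, size=(90, 2))
W = fit_one_hot_least_squares(X, y, n_classes=3)
print(np.mean(predict_classes(W, X) == y))
```

With the notebook's arrays this would be called as `fit_one_hot_least_squares(xTrain, np.argmax(yTrain, axis=1))` followed by `predict_classes(W, xTest[:1000])`; solving the least-squares problem on all training images can take a noticeable amount of memory.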

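I used the same training data as the neural network to train the baseline model and evaluated the results using the test set. When I compared the results of both models (the neural network and the baseline linear regression model), I found that the neural network classified the test images 32% more accurately. I attribute this to optimizing the neural network with carefully chosen hyperparameters. A likely further reason is that the baseline is a linear regression on the digit value, which treats the ten classes as points on a number line, whereas the network learns a nonlinear (ReLU) hidden representation and is trained with a softmax cross-entropy loss that scores each class separately.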
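
A statement such as "32% more accurately" can mean a gap in percentage points or a relative improvement, and the two can differ a lot. The small helper below (hypothetical name) computes both from two test accuracies; the accuracies in the usage line are made-up numbers, not results from this notebook.

```python
def accuracy_gap(acc_model, acc_baseline):
    """Return (gap in percentage points, relative improvement in percent)."""
    points = (acc_model - acc_baseline) * 100
    relative = (acc_model - acc_baseline) / acc_baseline * 100
    return points, relative

# Usage with made-up accuracies: 0.90 for the network, 0.60 for the baseline
points, relative = accuracy_gap(0.90, 0.60)
print(round(points, 1), round(relative, 1))  # 30.0 50.0
```

Reporting both accuracies alongside the gap, computed on the same test images, makes the comparison unambiguous.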