# Lesson 9 Assignment - Wine Neural Network

## Author - Mike Pearson

## Instructions
For this assignment you will start from the perceptron neural network notebook (Simple Perceptron Neural Network.ipynb) and modify the Python code to turn it into a multi-layer neural network. To test your system, use the RedWhiteWine.csv file with the goal of building a red or white wine classifier. Use all the features in the dataset, allowing the network to decide how to build the internal weighting system.

## Tasks
1. Use the provided RedWhiteWine.csv file. Include ALL the features, with “Class” as your output vector
2. Use the provided Simple Perceptron Neural Network notebook (copied below) to develop a multi-layer feed-forward/backpropagation neural network
3. Be able to adjust the following between experiments:
    - Learning rate
    - Number of epochs
    - Depth of architecture (number of hidden layers between the input and output layers)
    - Number of nodes in a hidden layer (width of the hidden layers)
    - (optional) Momentum
4. Determine which neural network structure and hyperparameter settings result in the best predictive capability

```python
# Data Set
URL = "https://library.startlearninglabs.uw.edu/DATASCI420/Datasets/RedWhiteWine.csv"
```

## Import the data and set up the usual libraries

```python
import numpy as np
import pandas as pd

wine_data = pd.read_csv(URL)
# print(wine_data.head())
# print(wine_data.dtypes)
# print(wine_data.describe())
```
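Later cells take the features from the scaled columns and the label from `Class`. A small helper that separates the two and checks the class balance is sketched below. It uses a tiny made-up DataFrame (`toy`) in place of `wine_data` and assumes only that the label column is named `Class` and coded 0/1; `split_features_label` is an illustrative name, not part of the assignment code.

```python
import numpy as np
import pandas as pd

def split_features_label(df, label="Class"):
    """Return (X, y): features as a float array and the label as a column vector."""
    X = df.drop(columns=[label]).to_numpy(dtype=float)
    y = df[label].to_numpy(dtype=float).reshape(-1, 1)
    return X, y

# Hypothetical stand-in for wine_data: two feature columns plus a 0/1 Class column.
toy = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 12.0],
    "pH": [3.51, 3.20, 3.26, 3.16],
    "Class": [1, 0, 0, 0],
})

X_toy, y_toy = split_features_label(toy)
print(X_toy.shape, y_toy.shape)                    # (4, 2) (4, 1)
print(toy["Class"].value_counts(normalize=True))   # share of each class
```

Knowing the class shares up front gives a baseline to compare the training error against.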



## Set up the sigmoid function and the derivative function

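```python
# Sigmoid activation. With deriv=True, x is assumed to already be a sigmoid
# output s, so s * (1 - s) is the slope of the sigmoid at that point.
def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
```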
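`nonlin(x, deriv=True)` returns `x*(1-x)`, which is the sigmoid's slope only when `x` is already a sigmoid output. That is why the training loops pass layer activations such as `l_two` into it rather than the weighted sums. A quick finite-difference check illustrates this; the test points `z` and step `h` are arbitrary choices for the sketch.

```python
import numpy as np

def nonlin(x, deriv=False):
    # Same as the notebook: with deriv=True, x must be a sigmoid output s = sigmoid(z).
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, 0.0, 1.5])
h = 1e-6

# Central finite-difference estimate of d/dz sigmoid(z).
numeric = (nonlin(z + h) - nonlin(z - h)) / (2 * h)

# Correct use: pass the activation s = sigmoid(z) to the derivative form.
analytic = nonlin(nonlin(z), deriv=True)

# Incorrect use: passing the raw input z gives z * (1 - z), which is not the slope.
wrong = nonlin(z, deriv=True)

print(np.allclose(numeric, analytic))   # True
print(np.allclose(numeric, wrong))      # False
```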


### Scale the Data
Let's scale the data and see whether that helps the network learn.

```python
num_features = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides',
                'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH',
                'sulphates', 'alcohol', 'quality']

# Mean normalization: subtract each column's mean and divide by its range (max - min).
scaled_wine_data = pd.DataFrame()
for each in num_features:
    mean = wine_data[each].mean()
    rng = wine_data[each].max() - wine_data[each].min()
    scaled_wine_data[each] = (wine_data[each] - mean) / rng

print(scaled_wine_data.describe())
```

```
       fixed acidity  volatile acidity   citric acid  residual sugar  \
count   6.497000e+03      6.497000e+03  6.497000e+03    6.497000e+03   
mean    5.608211e-16     -3.067277e-15  4.212037e-15   -1.136774e-16   
std     1.071433e-01      1.097576e-01  8.754088e-02    7.297245e-02   
min    -2.822568e-01     -1.731107e-01 -1.919477e-01   -7.428275e-02   
25%    -6.738075e-02     -7.311067e-02 -4.134531e-02   -5.587784e-02   
50%    -1.779397e-02     -3.311067e-02 -5.200732e-03   -3.747293e-02   
75%     4.005727e-02      4.022267e-02  4.299204e-02    4.074792e-02   
max     7.177432e-01      8.268893e-01  8.080523e-01    9.257172e-01   

          chlorides  free sulfur dioxide  total sulfur dioxide       density  \
count  6.497000e+03         6.497000e+03          6.497000e+03  6.497000e+03   
mean   1.114006e-15        -8.894056e-17         -1.392275e-16  1.260695e-13   
std    5.819535e-02         6.162986e-02          1.302347e-01  5.781132e-02   
min   -7.812934e-02        -1.0
```
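The scaling loop performs mean normalization: each column has its mean subtracted and is divided by its range, so every column ends up with mean 0 and a max-minus-min spread of 1, which matches the near-zero means in the `describe()` output. The same transform can be written as one vectorized pandas expression; `mean_normalize` and the toy data are illustrative.

```python
import pandas as pd

def mean_normalize(df):
    """Subtract each column's mean and divide by its range (max - min)."""
    return (df - df.mean()) / (df.max() - df.min())

toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 6.0], "b": [10.0, 10.0, 20.0, 40.0]})
scaled = mean_normalize(toy)
print(scaled.mean().round(12))       # each column mean is 0
print(scaled.max() - scaled.min())   # each column range is 1
```

Pandas aligns the per-column `mean()`, `max()` and `min()` Series with the DataFrame's columns, so no explicit loop is needed.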

### Perform a simple two-layer neural network (one hidden layer)

I based this code on part one of I Am Trask's tutorial (http://iamtrask.github.io/2015/07/12/basic-python-network/)
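In matrix form, the training loop computes the following, where $X$ is the $6497 \times 12$ feature matrix, $W_0$ (`syn0`) and $W_1$ (`syn1`) are the weight matrices, $\ell_1$ (`l_one`) and $\ell_2$ (`l_two`) are the layer activations, $\sigma$ is the sigmoid, $\odot$ is element-wise multiplication, and $\alpha$ is a learning rate. The first loop effectively uses $\alpha = 1$; the later loop multiplies each update by a selectable `alpha`.

$$
\begin{aligned}
\ell_1 &= \sigma(X W_0), \qquad \ell_2 = \sigma(\ell_1 W_1) \\
\delta_2 &= (y - \ell_2) \odot \ell_2 \odot (1 - \ell_2) \\
\delta_1 &= (\delta_2 W_1^{\top}) \odot \ell_1 \odot (1 - \ell_1) \\
W_1 &\leftarrow W_1 + \alpha\, \ell_1^{\top} \delta_2 \\
W_0 &\leftarrow W_0 + \alpha\, X^{\top} \delta_1
\end{aligned}
$$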


```python
# Input dataset: split into features (X) and labels (y)
X = scaled_wine_data.to_numpy()
print("X shape is ", X.shape)

y = wine_data['Class'].to_numpy().reshape(-1, 1)
print("y shape is ", y.shape)

np.random.seed(1)

# Initialize weights randomly with mean 0
n = 2  # number of nodes in the single hidden layer
syn0 = 2*np.random.random((X.shape[1], n)) - 1
syn1 = 2*np.random.random((n, 1)) - 1

for i in range(20000):

    # Forward propagation
    l_zero = X
    l_one = nonlin(np.dot(l_zero, syn0))
    l_two = nonlin(np.dot(l_one, syn1))

    # How much did we miss?
    l_two_error = y - l_two

    # Multiply how much we missed by the slope of the sigmoid at the values in l_two
    l_two_delta = l_two_error * nonlin(l_two, deriv=True)

    # Backpropagate the error to the hidden layer and scale by its sigmoid slope
    l_one_error = l_two_delta.dot(syn1.T)
    l_one_delta = l_one_error * nonlin(l_one, deriv=True)

    if (i % 1000) == 0:
        print("Error after " + str(i) + " iterations:" + str(np.mean(np.abs(l_two_error))))

    # Update weights (no learning rate here, i.e. an implicit step size of 1)
    syn0 = syn0 + np.dot(l_zero.T, l_one_delta)
    syn1 = syn1 + np.dot(l_one.T, l_two_delta)

print("Output After Training:")
print(l_two[6492:6497])
print("initial is ", y[6492:6497])
print("l_one shape is", l_one.shape)
print("l_two shape is", l_two.shape)
print("l_one_error shape is", l_one_error.shape)
print("l_one_delta shape is", l_one_delta.shape)
```

```
X shape is  (6497, 12)
y shape is  (6497, 1)
Error after 0 iterations:0.5982139671823632
Error after 1000 iterations:0.24613004612259184
Error after 2000 iterations:0.246063978595473
Error after 3000 iterations:0.24604667661406743
Error after 4000 iterations:0.2460337236563484
Error after 5000 iterations:0.24602300830377394
Error after 6000 iterations:0.24601637124324072
Error after 7000 iterations:0.246012152740657
Error after 8000 iterations:0.24600913718284148
Error after 9000 iterations:0.24600682221864983
Error after 10000 iterations:0.24600500075079648
Error after 11000 iterations:0.24600353315477785
Error after 12000 iterations:0.24600231278641635
Error after 13000 iterations:0.24600126569192265
Error after 14000 iterations:0.24600034403938015
Error after 15000 iterations:0.24599951709304807
Error after 16000 iterations:0.24599876428171102
Error after 17000 iterations:0.24599807109359673
Error after 18000 iterations:0.24599742690812385
Error after 19000 iterations:0.245996823840
```
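The error in this run flattens at about 0.246 almost immediately. One plausible reading (an interpretation, not something the notebook tests) is that the network has saturated and predicts nearly the same value for every wine: for a constant prediction of 0, the mean absolute error equals the fraction of labels that are 1. Assuming this file matches the UCI wine quality data (1,599 red and 4,898 white wines, 6,497 in total), that fraction is 1599/6497, about 0.246. Because this loop has no learning rate, each update adds a gradient summed over all 6,497 rows, which may be large enough to push the sigmoids into saturation. The sketch computes this constant-prediction baseline; `baseline_mae` is an illustrative name and the labels are synthetic with the assumed split.

```python
import numpy as np

def baseline_mae(y, constant):
    """Mean absolute error of predicting `constant` for every row of y."""
    return float(np.mean(np.abs(y - constant)))

# Synthetic labels with an assumed 1,599 / 4,898 split (1 = minority class).
y_demo = np.concatenate([np.ones(1599), np.zeros(4898)]).reshape(-1, 1)

print(round(baseline_mae(y_demo, 0.0), 4))   # 0.2461
print(round(baseline_mae(y_demo, 1.0), 4))   # 0.7539
```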

### Now with a selectable learning rate (alpha) and hidden-layer width

For this section I adapted the code from part two of the I Am Trask tutorial (http://iamtrask.github.io/2015/07/27/python-network-part2/), which I found easier to understand than the provided code.

```python
# Selectable learning rates (alpha) and hidden-layer widths.
# The network has one hidden layer; n_lay lists the number of nodes tried in it.

n_lay = [6, 8, 9]                      # hidden-layer widths (nodes) to try
alphaz = [0.01, 0.012, 0.014, 0.016]   # learning rates to try
n_epochs = 30000
width = X.shape[1]                     # number of input features (12)

print("\nThe number of epochs is ", n_epochs)

for layers in n_lay:
    print("\nThe number of layers is", layers)
    for alpha in alphaz:
        np.random.seed(11)
        # Report which alpha this run uses
        print("\nTraining With Alpha:", str(alpha))
        # Initialize the weights: input -> hidden and hidden -> output
        init_synapse = 2*np.random.random((width, layers)) - 1
        next_synapse = 2*np.random.random((layers, 1)) - 1

        for i in range(n_epochs + 1):
            # Forward pass
            lyr_zed = X
            lyr_one = nonlin(np.dot(lyr_zed, init_synapse))
            lyr_two = nonlin(np.dot(lyr_one, next_synapse))
            # How far off?
            lyr_two_error = y - lyr_two
            lyr_two_delta = lyr_two_error * nonlin(lyr_two, True)
            lyr_one_error = lyr_two_delta.dot(next_synapse.T)
            lyr_one_delta = lyr_one_error * nonlin(lyr_one, True)
            if (i % 5000) == 0:
                print("Error after " + str(i) + " iterations:" + str(np.mean(np.abs(lyr_two_error))))
            # Weight updates scaled by the learning rate alpha
            next_synapse = next_synapse + (alpha * (lyr_one.T.dot(lyr_two_delta)))
            init_synapse = init_synapse + (alpha * (lyr_zed.T.dot(lyr_one_delta)))
```


```
The number of epochs is  30000

The number of layers is 6

Training With Alpha: 0.01
Error after 0 iterations:0.48030280948255866
Error after 5000 iterations:0.0075764317578820095
Error after 10000 iterations:0.005480400384825654
Error after 15000 iterations:0.0047283704821205665
Error after 20000 iterations:0.004250255368075119
Error after 25000 iterations:0.003912926735376989
Error after 30000 iterations:0.0036626100662651337

Training With Alpha: 0.012
Error after 0 iterations:0.48030280948255866
Error after 5000 iterations:0.00681293574294505
Error after 10000 iterations:0.0051384288262064075
Error after 15000 iterations:0.004440341295377566
Error after 20000 iterations:0.003984285344494108
Error after 25000 iterations:0.0036706772554025314
Error after 30000 iterations:0.003451855465338075

Training With Alpha: 0.014
Error after 0 iterations:0.48030280948255866
Error after 5000 iterations:0.00633089023195261
Error after 10000 iterations:0.004902811687083391
Error after 15000 itera
```
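The loop above varies the learning rate and the width of a single hidden layer, while the assignment also asks for an adjustable depth and, optionally, momentum. Below is a minimal sketch of one way to generalize the same sigmoid and backpropagation scheme. It is an illustration under stated assumptions, not the notebook's method: the function names (`init_weights`, `forward`, `train`, `predict`), the bias terms, and the classical momentum update (a velocity term added to each weight step) are additions, and the data is a small synthetic, linearly separable problem rather than RedWhiteWine.csv. `hidden=[6]` gives the single-hidden-layer shape used above; `hidden=[8, 6]` adds a second hidden layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_weights(n_in, hidden, n_out, rng):
    """One (weights, bias) pair per layer; weights uniform in [-1, 1) as in the notebook."""
    sizes = [n_in] + list(hidden) + [n_out]
    return [(2 * rng.random((a, b)) - 1, np.zeros((1, b)))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(X, params):
    """Return the activations of every layer, input first."""
    acts = [X]
    for W, b in params:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts

def train(X, y, hidden=(6,), alpha=0.01, epochs=5000, momentum=0.0, seed=11):
    """Full-batch backpropagation with selectable depth, width, learning rate and momentum."""
    rng = np.random.default_rng(seed)
    params = init_weights(X.shape[1], hidden, y.shape[1], rng)
    velocity = [(np.zeros_like(W), np.zeros_like(b)) for W, b in params]
    for _ in range(epochs):
        acts = forward(X, params)
        # Output delta has the same (y - out) * s * (1 - s) form as the notebook.
        delta = (y - acts[-1]) * acts[-1] * (1 - acts[-1])
        for k in range(len(params) - 1, -1, -1):
            W, b = params[k]
            step_W = acts[k].T @ delta                 # same sign convention as the notebook
            step_b = delta.sum(axis=0, keepdims=True)
            if k > 0:
                # Propagate the delta back through the current weights before updating them.
                delta = (delta @ W.T) * acts[k] * (1 - acts[k])
            vW, vb = velocity[k]
            vW = momentum * vW + alpha * step_W
            vb = momentum * vb + alpha * step_b
            velocity[k] = (vW, vb)
            params[k] = (W + vW, b + vb)
    return params

def predict(X, params):
    return forward(X, params)[-1]

# Synthetic, linearly separable data (an assumption; not the wine data).
data_rng = np.random.default_rng(0)
X_syn = data_rng.normal(size=(200, 4))
y_syn = (X_syn[:, 0] + X_syn[:, 1] > 0).astype(float).reshape(-1, 1)

for hidden in ([6], [8, 6]):
    params = train(X_syn, y_syn, hidden=hidden, alpha=0.01, epochs=2000, momentum=0.5)
    out = predict(X_syn, params)
    err = np.mean(np.abs(y_syn - out))
    acc = np.mean((out > 0.5) == y_syn)
    print(f"hidden={hidden}: mean abs error {err:.4f}, training accuracy {acc:.3f}")
```

With `momentum=0.0` and no biases this reduces to the notebook's update rule; the list passed as `hidden` controls depth (its length) and width (its entries) independently.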

### Results

The best results (lowest final training error) appear to come from a learning rate (alpha) of 0.016 combined with a single hidden layer of 6 nodes, fed by the 12 input features.
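The errors reported in this notebook are measured on the same rows the network was trained on, so they describe how well it fits rather than how well it predicts unseen wines. A common way to compare settings is to hold out a test split and report accuracy on it. A minimal NumPy sketch follows; `train_test_indices` and `accuracy` are illustrative helper names, and the 20% test fraction and 0.5 threshold are assumptions.

```python
import numpy as np

def train_test_indices(n_rows, test_frac=0.2, seed=0):
    """Shuffle row indices and split them into train and test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)
    n_test = int(round(n_rows * test_frac))
    return idx[n_test:], idx[:n_test]

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of rows where the thresholded probability matches the 0/1 label."""
    return float(np.mean((y_prob > threshold).astype(float) == y_true))

train_idx, test_idx = train_test_indices(6497, test_frac=0.2)
print(len(train_idx), len(test_idx))   # 5198 1299

# Example with made-up probabilities for four rows:
print(accuracy(np.array([[1.0], [0.0], [1.0], [0.0]]),
               np.array([[0.9], [0.2], [0.4], [0.1]])))   # 0.75
```

Training on `X[train_idx]` and `y[train_idx]`, then scoring each (alpha, width) pair on `X[test_idx]` and `y[test_idx]`, would give a fairer basis for picking the setting with the best predictive capability.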

