# Lesson 08
# Peter Lorenz

## 0. Preparation
Import required libraries:

In [11]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

Set global options:

In [3]:
# Display plots inline
%matplotlib inline

# Display multiple cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Suppress scientific notation and show three decimal places
np.set_printoptions(suppress=True, precision=3)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

Declare utility functions:

## 1. Use the provided RedWhiteWine.csv file
In this section, we load and prepare the provided RedWhiteWine.csv file. We include all features and use “Class” as the output vector. First, we read the data and inspect its shape, first rows, and column types:

In [8]:
# Internet location of the data set
url = "https://library.startlearninglabs.uw.edu/DATASCI420/2019/Datasets/RedWhiteWine.csv"

# Download the data into a dataframe object
wine_data = pd.read_csv(url)

# Display shape and initial data
wine_data.shape
wine_data.head()

# Examine column types
wine_data.info()

(6497, 13)

|   | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.998 | 3.51 | 0.56 | 9.4 | 5 | 1 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.997 | 3.2 | 0.68 | 9.8 | 5 | 1 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 | 1 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 | 1 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.998 | 3.51 | 0.56 | 9.4 | 5 | 1 |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6497 entries, 0 to 6496
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         6497 non-null   float64
 1   volatile acidity      6497 non-null   float64
 2   citric acid           6497 non-null   float64
 3   residual sugar        6497 non-null   float64
 4   chlorides             6497 non-null   float64
 5   free sulfur dioxide   6497 non-null   float64
 6   total sulfur dioxide  6497 non-null   float64
 7   density               6497 non-null   float64
 8   pH                    6497 non-null   float64
 9   sulphates             6497 non-null   float64
 10  alcohol               6497 non-null   float64
 11  quality               6497 non-null   int64  
 12  Class                 6497 non-null   int64  
dtypes: float64(11), int64(2)
memory usage: 660.0 KB


Next we set aside the output variable:

In [9]:
# Set aside the output variable
output = wine_data['Class']

# Drop output from main data frame
wine_data = wine_data.drop('Class', axis=1)
wine_data.shape

(6497, 12)

Now we standardize the features so that each column has zero mean and unit variance:

In [10]:
# Scale data
scaler = StandardScaler()
wine_data = pd.DataFrame(scaler.fit_transform(wine_data), 
                         columns=wine_data.columns)

# Display scaled data set
wine_data.head()

|   | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.142 | 2.189 | -2.193 | -0.745 | 0.57 | -1.1 | -1.446 | 1.035 | 1.813 | 0.193 | -0.915 | -0.937 |
| 1 | 0.451 | 3.282 | -2.193 | -0.598 | 1.198 | -0.311 | -0.862 | 0.701 | -0.115 | 1.0 | -0.58 | -0.937 |
| 2 | 0.451 | 2.553 | -1.918 | -0.661 | 1.027 | -0.875 | -1.092 | 0.768 | 0.258 | 0.798 | -0.58 | -0.937 |
| 3 | 3.074 | -0.362 | 1.661 | -0.745 | 0.541 | -0.762 | -0.986 | 1.102 | -0.364 | 0.328 | -0.58 | 0.208 |
| 4 | 0.142 | 2.189 | -2.193 | -0.745 | 0.57 | -1.1 | -1.446 | 1.035 | 1.813 | 0.193 | -0.915 | -0.937 |
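To make the scaling step concrete, the sketch below checks on a small, made-up feature matrix (`X_toy` is hypothetical, not the wine data) that `StandardScaler` matches a manual z-score: subtract each column's mean and divide by its population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy matrix: two columns on very different scales
X_toy = np.array([[7.4, 34.0],
                  [7.8, 67.0],
                  [11.2, 60.0],
                  [6.9, 45.0]])

# Manual z-score: (x - column mean) / column standard deviation (ddof=0)
manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

# StandardScaler applies the same per-column transformation
scaled = StandardScaler().fit_transform(X_toy)

print(np.allclose(manual, scaled))  # True
```

After scaling, both columns contribute on a comparable scale, which generally helps gradient-based training of a neural network.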


To prepare for building our network, we partition the data set into training and test data, reserving 10% of the data for testing:

In [12]:
# Split data into training and test
X_train, X_test, y_train, y_test = train_test_split(wine_data, 
                                                    output, 
                                                    test_size = 0.1,
                                                    random_state = 0)

# Describe training and test data sets
print("Training data has {} rows.".format(X_train.shape[0]))
print("Test data has {} rows.".format(X_test.shape[0]))

Training data has 5847 rows.
Test data has 650 rows.
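Note that the scaling step above fits the scaler on the full data set before the split, so the test rows influence the means and standard deviations. A possible alternative, sketched here on synthetic data (`X_demo`, `y_demo`, and the other names are hypothetical), is to split first and fit the scaler on the training partition only. The `stratify` argument is an optional addition that keeps the class proportions similar in both partitions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and the binary output
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y_demo = rng.integers(0, 2, size=200)

# Split first, keeping class proportions similar in both partitions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=0, stratify=y_demo)

# Fit the scaler on the training rows only, then apply it to both partitions
scaler_demo = StandardScaler().fit(X_tr)
X_tr_scaled = scaler_demo.transform(X_tr)
X_te_scaled = scaler_demo.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
```

With this ordering, the test partition is transformed using statistics it did not contribute to, so its columns will not have exactly zero mean.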


## 2. Develop multi-layer feed-forward/backpropagation neural network
In this section, we use the provided Simple Perceptron Neural Network notebook to develop a multi-layer feed-forward/backpropagation neural network.
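The provided notebook is not reproduced here, so the following is only a minimal NumPy sketch of a multi-layer feed-forward network trained by backpropagation, not the notebook's implementation. It assumes sigmoid activations, a cross-entropy loss, and full-batch gradient descent. All function names and the two-cluster toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(layer_sizes, rng):
    """Create one (weights, biases) pair per pair of adjacent layers."""
    return [(rng.normal(0.0, 0.5, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(X, params):
    """Feed-forward pass; returns the activations of every layer, input included."""
    activations = [X]
    for W, b in params:
        activations.append(sigmoid(activations[-1] @ W + b))
    return activations

def train(X, y, layer_sizes, learning_rate=0.5, epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    params = init_params(layer_sizes, rng)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        acts = forward(X, params)
        # Output-layer error for a sigmoid output with cross-entropy loss
        delta = acts[-1] - y
        for i in reversed(range(len(params))):
            W, b = params[i]
            grad_W = acts[i].T @ delta / len(X)
            grad_b = delta.mean(axis=0)
            if i > 0:
                # Propagate the error backward through the sigmoid of layer i
                delta = (delta @ W.T) * acts[i] * (1.0 - acts[i])
            params[i] = (W - learning_rate * grad_W, b - learning_rate * grad_b)
    return params

def predict(X, params):
    return (forward(X, params)[-1] >= 0.5).astype(int).ravel()

# Toy data: two well-separated Gaussian clusters, one per class
rng = np.random.default_rng(0)
X_blob = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
                    rng.normal(2.0, 1.0, size=(100, 2))])
y_blob = np.array([0] * 100 + [1] * 100)

# One hidden layer with four nodes: 2 inputs -> 4 hidden -> 1 output
params = train(X_blob, y_blob, layer_sizes=[2, 4, 1])
accuracy = (predict(X_blob, params) == y_blob).mean()
print("Training accuracy: {:.3f}".format(accuracy))
```

The `layer_sizes`, `learning_rate`, and `epochs` arguments correspond to the settings varied in the experiments of Section 3.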

## 3. Adjust the following between experiments:
In this section, we adjust various settings and run experiments.

### Learning Rate
In this section, we adjust the learning rate.
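As a toy illustration of why the learning rate matters (not the network itself), the sketch below runs gradient descent on the hypothetical function f(w) = w², whose gradient is 2w. A small rate converges slowly, a moderate rate converges quickly, and a rate that is too large overshoots and diverges.

```python
def gradient_descent(learning_rate, steps=20, w0=1.0):
    """Minimize f(w) = w**2 (gradient 2*w), starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
    return w

for lr in (0.01, 0.1, 1.1):
    print("learning rate {:>4}: final w = {:.4f}".format(lr, gradient_descent(lr)))
```

The same trade-off is what a learning-rate experiment on the network probes: the minimum is at w = 0, and only rates small enough to avoid overshooting move toward it.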

### Number of epochs
In this section, we adjust the number of epochs.
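One way to see the effect of the number of epochs is to track the training loss per epoch. The sketch below swaps in scikit-learn's `MLPClassifier` for the provided notebook's network and uses synthetic data from `make_classification` (both are assumptions). For the stochastic solvers, `max_iter` sets the maximum number of epochs, and `loss_curve_` records the loss after each one.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data with 12 features
X_ep, y_ep = make_classification(n_samples=500, n_features=12, random_state=0)

# max_iter caps the number of epochs; training may stop earlier if the loss plateaus
epoch_model = MLPClassifier(hidden_layer_sizes=(8,), solver="sgd",
                            learning_rate_init=0.01, max_iter=100,
                            random_state=0)
with warnings.catch_warnings():
    # Hitting the epoch cap raises a ConvergenceWarning, which is expected here
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    epoch_model.fit(X_ep, y_ep)

losses = epoch_model.loss_curve_
print("Epochs run: {}".format(epoch_model.n_iter_))
print("Loss after first epoch: {:.3f}".format(losses[0]))
print("Loss after last epoch:  {:.3f}".format(losses[-1]))
```

Plotting `losses` against the epoch index shows where additional epochs stop paying off.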

### Depth of architecture
In this section, we adjust the depth of the architecture, that is, the number of hidden layers between the input and output layers.
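A depth experiment can be sketched as a loop over the number of hidden layers. The example below again substitutes scikit-learn's `MLPClassifier` and synthetic data for the provided notebook and the wine data. The fixed width of 8 nodes per layer is an arbitrary assumption.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data with 12 features
X_d, y_d = make_classification(n_samples=500, n_features=12, random_state=0)
X_d_train, X_d_test, y_d_train, y_d_test = train_test_split(
    X_d, y_d, test_size=0.1, random_state=0)

depth_results = {}
for depth in (1, 2, 3):
    # One tuple entry per hidden layer, each with 8 nodes
    hidden_layers = (8,) * depth
    depth_model = MLPClassifier(hidden_layer_sizes=hidden_layers,
                                max_iter=300, random_state=0)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        depth_model.fit(X_d_train, y_d_train)
    depth_results[depth] = depth_model.score(X_d_test, y_d_test)
    print("{} hidden layer(s): test accuracy {:.3f}".format(
        depth, depth_results[depth]))
```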

### Number of nodes per hidden layer
In this section, we adjust the number of nodes per hidden layer, that is, the width of the hidden layers.
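Width directly controls how many weights and biases the network must learn. The sketch below counts the trainable parameters of a fully connected network, assuming the 12 input features from Section 1, a single hidden layer, and one output node. The helper `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

n_inputs, n_outputs = 12, 1
for width in (4, 16, 64):
    sizes = [n_inputs, width, n_outputs]
    print("width {:>2}: {} parameters".format(width, count_parameters(sizes)))
# width  4: 57 parameters
# width 16: 225 parameters
# width 64: 897 parameters
```

Wider layers add capacity but also more parameters to fit, which can increase training time and the risk of overfitting.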

## 4. Determine best neural network structure and settings
In this section, we determine the neural network structure and hyperparameter settings that yield the best predictive capability.
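One possible way to compare settings systematically is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` with `MLPClassifier` on synthetic data, not the provided notebook's network. The grid values are arbitrary assumptions, kept small so the search runs quickly.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the training partition
X_gs, y_gs = make_classification(n_samples=300, n_features=12, random_state=0)

# Candidate structures (depth and width) and learning rates
param_grid = {
    "hidden_layer_sizes": [(8,), (8, 8)],
    "learning_rate_init": [0.01, 0.1],
}

search = GridSearchCV(
    MLPClassifier(solver="sgd", max_iter=300, random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    search.fit(X_gs, y_gs)

print("Best settings:", search.best_params_)
print("Mean cross-validated accuracy: {:.3f}".format(search.best_score_))
```

Selecting settings by cross-validation on the training data keeps the held-out test set available for a single final evaluation of the chosen network.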

## Summary
In this assignment, we ...