<a href="https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/assignments/assignment_yourname_class8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-558: Applications of Deep Neural Networks
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), School of Engineering and Applied Science, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

**Module 8 Assignment: Feature Engineering**

**Student Name: Hao-Tien Kuo**

# Assignment Instructions

This assignment is similar to assignment 5, except that you must use feature engineering to solve it.  I provide you with a dataset that contains the dimensions and quality of items of specific shapes.  Using the values of 'height', 'width', 'depth', 'shape', and 'quality', you should try to predict the cost of these items.  You should be able to match the solution file very closely if you feature engineer correctly.  To get full credit, your average cost should not be more than 50 off from the solution.  The autocorrector will let you know if you are in this range.

You can find all of the needed CSV files here:

* [Shapes - Training](https://data.heatonresearch.com/data/t81-558/datasets/shapes-train.csv)
* [Shapes - Submit](https://data.heatonresearch.com/data/t81-558/datasets/shapes-test.csv)

Use the training file to train your neural network and submit results for the data contained in the test/submit file.
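
As a rough illustration of what feature engineering could look like here, the sketch below adds a derived column to a tiny hand-made frame that uses the same column names as the dataset. The product of the three dimensions (named `bounding_volume` here) and the helper `add_engineered_features` are assumptions made for illustration; the assignment does not say which derived features actually relate to cost.

```python
import pandas as pd

def add_engineered_features(df):
    """Return a copy of df with a hypothetical derived feature added."""
    out = df.copy()
    # Assumption: the product of the three dimensions may carry more signal
    # about cost than each dimension does on its own.
    out["bounding_volume"] = out["height"] * out["width"] * out["depth"]
    return out

# Tiny hand-made example using the dataset's column names (not real rows).
toy = pd.DataFrame({
    "height": [2, 3],
    "width": [4, 5],
    "depth": [6, 7],
    "shape": ["box", "cylinder"],
    "quality": [10, 20],
})
toy_fe = add_engineered_features(toy)
print(toy_fe[["shape", "bounding_volume"]])
```

The same function would need to be applied to both the training and the submit frames so that the network sees identical columns in both.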

In [1]:
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    COLAB = True
    print("Note: using Google CoLab")
    %tensorflow_version 2.x
except:
    print("Note: not using Google CoLab")
    COLAB = False

Mounted at /content/drive
Note: using Google CoLab
Colab only includes TensorFlow 2.x; %tensorflow_version has no effect.


# Assignment Submit Function

You will submit the 10 programming assignments electronically.  The following submit function can be used to do this.  My server will perform a basic check of each assignment and let you know if it sees any basic problems. 

**It is unlikely that you will need to modify this function.**

In [2]:
import base64
import os
import numpy as np
import pandas as pd
import requests
import PIL
import PIL.Image
import io

# This function submits an assignment.  You can submit an assignment as many times as you like; only the final
# submission counts.  The parameters are as follows:
# data - List of pandas dataframes or images.
# key - Your student key that was emailed to you.
# no - The assignment class number, should be 1 through 10.
# source_file - The full path to your Python or IPYNB file.  This must have "_class1" as part of its name.
#               The number must match your assignment number.  For example "_class2" for class assignment #2.
def submit(data,key,no,source_file=None):
    if source_file is None and '__file__' not in globals(): raise Exception('Must specify a filename when running in a Jupyter notebook.')
    if source_file is None: source_file = __file__
    suffix = '_class{}'.format(no)
    if suffix not in source_file: raise Exception('{} must be part of the filename.'.format(suffix))
    with open(source_file, "rb") as image_file:
        encoded_python = base64.b64encode(image_file.read()).decode('ascii')
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in ['.ipynb','.py']: raise Exception("Source file is {} must be .py or .ipynb".format(ext))
    payload = []
    for item in data:
        if type(item) is PIL.Image.Image:
            buffered = io.BytesIO()  # BytesIO lives in the io module imported above
            item.save(buffered, format="PNG")
            payload.append({'PNG':base64.b64encode(buffered.getvalue()).decode('ascii')})
        elif type(item) is pd.core.frame.DataFrame:
            payload.append({'CSV':base64.b64encode(item.to_csv(index=False).encode('ascii')).decode("ascii")})
    r= requests.post("https://api.heatonresearch.com/assignment-submit",
        headers={'x-api-key':key}, json={ 'payload': payload,'assignment': no, 'ext':ext, 'py':encoded_python})
    if r.status_code==200:
        print("Success: {}".format(r.text))
    else: print("Failure: {}".format(r.text))
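
To make the payload format more concrete, the sketch below reproduces the DataFrame branch of `submit` on a small made-up frame and then decodes the result again. The helper names `encode_csv` and `decode_csv` are hypothetical and exist only for this illustration; how the server actually decodes the payload is not shown in this notebook.

```python
import base64
import io
import pandas as pd

def encode_csv(df):
    """Encode a DataFrame the same way submit() does for its 'CSV' payload entries."""
    return base64.b64encode(df.to_csv(index=False).encode("ascii")).decode("ascii")

def decode_csv(text):
    """Reverse encode_csv (hypothetical helper, used only to check the round trip)."""
    return pd.read_csv(io.StringIO(base64.b64decode(text).decode("ascii")))

# Made-up data with the two columns the submission frame contains.
toy = pd.DataFrame({"id": [1, 2], "cost": [10.5, 20.25]})
encoded = encode_csv(toy)
restored = decode_csv(encoded)
print(restored.equals(toy))
```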

# Assignment #8 Sample Code

The following code provides a starting point for this assignment.

In [27]:
import os
import io
import requests
import numpy as np
import pandas as pd
import tensorflow as tf
from scipy.stats import zscore
from sklearn import metrics
from sklearn.model_selection import KFold, train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# This is your student key that I emailed to you at the beginning of the semester.
key = "lhkxWaPDTw1k2isuaX0Gk4X0XN0RcGFX3lEQFj0l" 

# You must also identify your source file.  (modify for your local setup)
file='/content/drive/MyDrive/Colab Notebooks/Deep Neural Networks/assignment_Hao-Tien Kuo_class8.ipynb'  # Google CoLab

# Begin assignment
df_train = pd.read_csv("https://data.heatonresearch.com/data/t81-558/datasets/shapes-train.csv")
df_submit = pd.read_csv("https://data.heatonresearch.com/data/t81-558/datasets/shapes-test.csv")

In [28]:
df_train.head()

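|   | id | height | width | depth | shape | quality | cost |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 40 | 53 | 89 | cylinder | 89 | 200.49 |
| 1 | 2 | 85 | 35 | 51 | box | 37 | 1175.71 |
| 2 | 3 | 12 | 19 | 61 | box | 23 | 131.72 |
| 3 | 4 | 29 | 37 | 20 | box | 94 | 15.83 |
| 4 | 5 | 87 | 40 | 22 | cylinder | 54 | 340.21 |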


In [29]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 7 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   id       10000 non-null  int64  
 1   height   10000 non-null  int64  
 2   width    10000 non-null  int64  
 3   depth    10000 non-null  int64  
 4   shape    10000 non-null  object 
 5   quality  10000 non-null  int64  
 6   cost     10000 non-null  float64
dtypes: float64(1), int64(5), object(1)
memory usage: 547.0+ KB


In [30]:
df_train.describe()

|   | id | height | width | depth | quality | cost |
|---|---|---|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| mean | 5000.5 | 49.781 | 50.5664 | 49.8254 | 50.6615 | 554.410241 |
| std | 2886.89568 | 28.687285 | 28.633456 | 28.568968 | 28.923111 | 885.937479 |
| min | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 25% | 2500.75 | 25.0 | 26.0 | 25.0 | 26.0 | 51.54 |
| 50% | 5000.5 | 50.0 | 51.0 | 49.0 | 51.0 | 205.195 |
| 75% | 7500.25 | 74.25 | 76.0 | 74.0 | 76.0 | 652.015 |
| max | 10000.0 | 99.0 | 99.0 | 99.0 | 100.0 | 8215.95 |


In [31]:
# Generate dummies
df_train = pd.concat([df_train, pd.get_dummies(df_train['shape'])],axis=1)

# Standardize ranges
df_train['height'] = zscore(df_train['height'])
df_train['width'] = zscore(df_train['width'])
df_train['depth'] = zscore(df_train['depth'])
df_train['quality'] = zscore(df_train['quality'])

# Convert to numpy
x_columns = df_train.columns.drop('id').drop('shape').drop('cost')
x = df_train[x_columns].values
y = df_train['cost'].values

# Create train/validation
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=10)
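
One thing to watch with `pd.get_dummies` is that it only creates columns for the categories present in the frame it is given. The sketch below, on made-up data, shows one way to align a second frame to the training columns with `reindex`; the shape names and the `train_cols` variable are illustrative assumptions, not part of the assignment code.

```python
import pandas as pd

# Made-up frames: the second one is missing two of the categories.
train = pd.DataFrame({"shape": ["box", "cylinder", "sphere"]})
test = pd.DataFrame({"shape": ["box", "box"]})

train_dummies = pd.get_dummies(train["shape"], dtype=int)
train_cols = train_dummies.columns

# Without reindex the test frame would contain only a 'box' column.
test_dummies = pd.get_dummies(test["shape"], dtype=int).reindex(
    columns=train_cols, fill_value=0
)

print(list(test_dummies.columns))
```

If both files contain every shape, the alignment changes nothing, so it is cheap to keep as a safeguard.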

In [34]:
# Build the neural network
model = Sequential()
model.add(Dense(25, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(10, activation='relu')) # Hidden 2
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', metrics=[tf.keras.metrics.RootMeanSquaredError()], optimizer='adam')
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, 
                        patience=5, verbose=1, mode='auto', 
                        restore_best_weights=True)
model.fit(x_train,y_train,validation_data=(x_val,y_val),
          callbacks=[monitor],verbose=2,epochs=1000)

Epoch 1/1000
250/250 - 3s - loss: 1075066.6250 - root_mean_squared_error: 1036.8542 - val_loss: 1068362.6250 - val_root_mean_squared_error: 1033.6162 - 3s/epoch - 13ms/step
Epoch 2/1000
250/250 - 1s - loss: 912521.9375 - root_mean_squared_error: 955.2601 - val_loss: 752170.4375 - val_root_mean_squared_error: 867.2776 - 554ms/epoch - 2ms/step
Epoch 3/1000
250/250 - 1s - loss: 545400.8125 - root_mean_squared_error: 738.5126 - val_loss: 403682.6875 - val_root_mean_squared_error: 635.3603 - 556ms/epoch - 2ms/step
Epoch 4/1000
250/250 - 1s - loss: 326732.2500 - root_mean_squared_error: 571.6050 - val_loss: 287164.1562 - val_root_mean_squared_error: 535.8770 - 506ms/epoch - 2ms/step
Epoch 5/1000
250/250 - 0s - loss: 260773.2812 - root_mean_squared_error: 510.6596 - val_loss: 244296.8125 - val_root_mean_squared_error: 494.2639 - 499ms/epoch - 2ms/step
Epoch 6/1000
250/250 - 1s - loss: 226682.7031 - root_mean_squared_error: 476.1121 - val_loss: 214591.8594 - val_root_mean_squared_error: 463.24

<keras.callbacks.History at 0x7fb94043ed50>

In [35]:
# Predict
pred = model.predict(x_val)

# Measure MSE error (scikit-learn expects y_true first, then y_pred)
score = metrics.mean_squared_error(y_val, pred)
print("Final score (MSE): {}".format(score))

# Measure RMSE error
score = np.sqrt(metrics.mean_squared_error(y_val, pred))
print("Final score (RMSE): {}".format(score))

Final score (MSE): 35216.95025566725
Final score (RMSE): 187.66179753926278
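
For reference, RMSE is the square root of the mean squared difference between predictions and targets. The NumPy-only sketch below uses made-up numbers and a hypothetical helper `rmse`; it flattens the predictions first because `model.predict` returns an array of shape `(n, 1)`, and subtracting that from a 1-D target array of shape `(n,)` would broadcast to `(n, n)`.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error; both inputs are flattened to 1-D first."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([[1.0], [2.0], [5.0]])  # column vector, like model.predict output

print(rmse(y_true, y_pred))
# The unflattened subtraction silently produces a 3x3 matrix instead of 3 errors.
print((y_true - y_pred).shape)
```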


In [38]:
# Generate dummies
df_submit = pd.concat([df_submit, pd.get_dummies(df_submit['shape'])],axis=1)

# Standardize ranges
df_submit['height'] = zscore(df_submit['height'])
df_submit['width'] = zscore(df_submit['width'])
df_submit['depth'] = zscore(df_submit['depth'])
df_submit['quality'] = zscore(df_submit['quality'])

# Convert to numpy
x_submit = df_submit[x_columns].values
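
Note that `zscore` above standardizes the submit data with its own mean and standard deviation rather than those of the training data. A possible alternative, sketched here on made-up numbers, is to compute the statistics once on the training frame and reuse them. The helpers `fit_standardizer` and `apply_standardizer` are hypothetical; `ddof=0` is used so the result matches the default of `scipy.stats.zscore`.

```python
import pandas as pd

def fit_standardizer(df, columns):
    """Record the mean and population standard deviation of each column."""
    return {c: (df[c].mean(), df[c].std(ddof=0)) for c in columns}

def apply_standardizer(df, stats):
    """Return a copy of df with each column scaled by the recorded statistics."""
    out = df.copy()
    for c, (mean, std) in stats.items():
        out[c] = (out[c] - mean) / std
    return out

# Made-up values standing in for the training and submit frames.
train = pd.DataFrame({"height": [10.0, 20.0, 30.0]})
new = pd.DataFrame({"height": [20.0, 40.0]})

stats = fit_standardizer(train, ["height"])
train_std = apply_standardizer(train, stats)
new_std = apply_standardizer(new, stats)
print(new_std)
```

In this notebook the statistics would have to be recorded before the training columns are overwritten by their z-scores.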

In [41]:
# Predict
pred_submit = pd.DataFrame(model.predict(x_submit), columns=['cost'])
df_submit = pd.concat([df_submit['id'], pred_submit], axis=1)



In [42]:
submit(source_file=file,data=[df_submit],key=key,no=8)

Success: Submitted Assignment 8 for h.kuo:
You have submitted this assignment 2 times. (this is fine)
Note: The mean difference 7.241759098499983 for column 'cost' is acceptable and is less than the maximum allowed value of '50.0' for this assignment.
