<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

# Neural Network Framework (Keras)

## *Data Science Unit 4 Sprint 2 Assignment 3*

## Use the Keras Library to build a Multi-Layer Perceptron Model on the Boston Housing dataset

- The Boston Housing dataset ships with the Keras library, so use Keras to import it into your notebook.
- Normalize the data (all features should have roughly the same scale).
- Import the type of model and the layers that you will need from Keras.
- Instantiate a model object and use `model.add()` to add layers to your model.
- Since this is a regression model, the final layer should have a single output node.
- Use activation functions that are appropriate for this task.
- Compile your model.
- Fit your model and report its performance in terms of Mean Squared Error.
- Use the history object returned by `model.fit()` to graph the model's loss or train/validation metrics by epoch.
- Run the same data through a linear regression model. Which achieves the lower error?
- Do a little feature engineering and see how it affects your neural network model. (You will need to change your model to accept more inputs.)
- After feature engineering, which model improves more because of the new features?

In [1]:
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
!pip install category_encoders
import category_encoders as ce
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.utils import to_categorical

Collecting category_encoders
  Downloading https://files.pythonhosted.org/packages/a0/52/c54191ad3782de633ea3d6ee3bb2837bda0cf3bc97644bb6375cf14150a0/category_encoders-2.1.0-py2.py3-none-any.whl (100kB)
Installing collected packages: category-encoders
Successfully installed category-encoders-2.1.0
You are using pip version 10.0.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [2]:
"""importing data"""
df = pd.read_csv('amesHousePrice.csv')
print(df.shape)
df.head()

(1460, 81)


Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000


In [3]:
"""Viewing the columns with the most NaN values"""
df.isnull().sum().sort_values(ascending=False)[:13]

PoolQC          1453
MiscFeature     1406
Alley           1369
Fence           1179
FireplaceQu      690
LotFrontage      259
GarageCond        81
GarageType        81
GarageYrBlt       81
GarageFinish      81
GarageQual        81
BsmtExposure      38
BsmtFinType2      38
dtype: int64

In [4]:
"""Drop the columns with over 500 NaN values"""
# Select by threshold so the code matches the description (the same five columns as listed above)
bad_columns = df.columns[df.isnull().sum() > 500].tolist()
# And remove the Id feature
bad_columns.append('Id')
data = df.drop(bad_columns, axis=1).copy()
data.head(12)

Unnamed: 0,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,60,RL,65.0,8450,Pave,Reg,Lvl,AllPub,Inside,Gtl,...,0,0,0,0,0,2,2008,WD,Normal,208500
1,20,RL,80.0,9600,Pave,Reg,Lvl,AllPub,FR2,Gtl,...,0,0,0,0,0,5,2007,WD,Normal,181500
2,60,RL,68.0,11250,Pave,IR1,Lvl,AllPub,Inside,Gtl,...,0,0,0,0,0,9,2008,WD,Normal,223500
3,70,RL,60.0,9550,Pave,IR1,Lvl,AllPub,Corner,Gtl,...,272,0,0,0,0,2,2006,WD,Abnorml,140000
4,60,RL,84.0,14260,Pave,IR1,Lvl,AllPub,FR2,Gtl,...,0,0,0,0,0,12,2008,WD,Normal,250000
5,50,RL,85.0,14115,Pave,IR1,Lvl,AllPub,Inside,Gtl,...,0,320,0,0,700,10,2009,WD,Normal,143000
6,20,RL,75.0,10084,Pave,Reg,Lvl,AllPub,Inside,Gtl,...,0,0,0,0,0,8,2007,WD,Normal,307000
7,60,RL,,10382,Pave,IR1,Lvl,AllPub,Corner,Gtl,...,228,0,0,0,350,11,2009,WD,Normal,200000
8,50,RM,51.0,6120,Pave,Reg,Lvl,AllPub,Inside,Gtl,...,205,0,0,0,0,4,2008,WD,Abnorml,129900
9,190,RL,50.0,7420,Pave,Reg,Lvl,AllPub,Corner,Gtl,...,0,0,0,0,0,1,2008,WD,Normal,118000


In [5]:
"""One-hot encoding and cleaning our data some more."""
encoder = ce.OneHotEncoder()
encoded = encoder.fit_transform(data)
drops1 = encoded.isnull().sum().sort_values(ascending=False)[:3].index.tolist()
drops1

['LotFrontage', 'GarageYrBlt', 'MasVnrArea']

In [6]:
data2 = encoded.drop(drops1, axis=1).copy()
data2.head(12)

Unnamed: 0,MSSubClass,MSZoning_1,MSZoning_2,MSZoning_3,MSZoning_4,MSZoning_5,LotArea,Street_1,Street_2,LotShape_1,...,SaleType_7,SaleType_8,SaleType_9,SaleCondition_1,SaleCondition_2,SaleCondition_3,SaleCondition_4,SaleCondition_5,SaleCondition_6,SalePrice
0,60,1,0,0,0,0,8450,1,0,1,...,0,0,0,1,0,0,0,0,0,208500
1,20,1,0,0,0,0,9600,1,0,1,...,0,0,0,1,0,0,0,0,0,181500
2,60,1,0,0,0,0,11250,1,0,0,...,0,0,0,1,0,0,0,0,0,223500
3,70,1,0,0,0,0,9550,1,0,0,...,0,0,0,0,1,0,0,0,0,140000
4,60,1,0,0,0,0,14260,1,0,0,...,0,0,0,1,0,0,0,0,0,250000
5,50,1,0,0,0,0,14115,1,0,0,...,0,0,0,1,0,0,0,0,0,143000
6,20,1,0,0,0,0,10084,1,0,1,...,0,0,0,1,0,0,0,0,0,307000
7,60,1,0,0,0,0,10382,1,0,0,...,0,0,0,1,0,0,0,0,0,200000
8,50,0,1,0,0,0,6120,1,0,1,...,0,0,0,0,1,0,0,0,0,129900
9,190,1,0,0,0,0,7420,1,0,1,...,0,0,0,1,0,0,0,0,0,118000


In [8]:
data2_st = data2.describe()
data2_st.pop('SalePrice')
data2_st = data2_st.transpose()
data2_st

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
MSSubClass,1460.0,56.897260,42.300571,20.0,20.0,50.0,70.0,190.0
MSZoning_1,1460.0,0.788356,0.408614,0.0,1.0,1.0,1.0,1.0
MSZoning_2,1460.0,0.149315,0.356521,0.0,0.0,0.0,0.0,1.0
MSZoning_3,1460.0,0.006849,0.082505,0.0,0.0,0.0,0.0,1.0
MSZoning_4,1460.0,0.044521,0.206319,0.0,0.0,0.0,0.0,1.0
MSZoning_5,1460.0,0.010959,0.104145,0.0,0.0,0.0,0.0,1.0
LotArea,1460.0,10516.828082,9981.264932,1300.0,7553.5,9478.5,11601.5,215245.0
Street_1,1460.0,0.995890,0.063996,0.0,1.0,1.0,1.0,1.0
Street_2,1460.0,0.004110,0.063996,0.0,0.0,0.0,0.0,1.0
LotShape_1,1460.0,0.633562,0.481996,0.0,0.0,1.0,1.0,1.0


In [9]:
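def norm(x):
    # Z-score each feature using the column means and standard deviations from describe()
    return (x - data2_st['mean']) / data2_st['std']

# Feature matrix (target column removed) and target vector
X = norm(data2.drop('SalePrice', axis=1)).values
y = data['SalePrice'].values
X.shape, y.shape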

((1460, 278), (1460,))
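
The notebook stops once `X` and `y` are prepared. The remaining steps from the task list (build, compile, fit, report MSE, compare against linear regression) could look roughly like the sketch below. It is not the author's solution: it runs on small synthetic data so that it is self-contained, and the layer sizes, epoch count, names such as `X_demo` and `mlp_mse`, and the least-squares baseline via `np.linalg.lstsq` are all illustrative assumptions.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# Synthetic stand-in for the normalized features and target (assumption:
# in the notebook these would be the X and y arrays built above).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 10)).astype("float32")
true_w = rng.normal(size=10)
y_demo = (X_demo @ true_w + rng.normal(scale=0.1, size=400)).astype("float32")

# Simple hold-out split: first 300 rows for training, last 100 for testing.
X_train, X_test = X_demo[:300], X_demo[300:]
y_train, y_test = y_demo[:300], y_demo[300:]

# Multi-layer perceptron for regression: ReLU hidden layers and a single
# linear output node. Layer widths are arbitrary choices for illustration.
model = Sequential()
model.add(Input(shape=(X_train.shape[1],)))
model.add(Dense(32, activation="relu"))
model.add(Dense(16, activation="relu"))
model.add(Dense(1))
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# fit() returns a History object; history.history maps metric names to
# per-epoch lists (here: loss, mae, val_loss, val_mae).
history = model.fit(
    X_train, y_train,
    validation_split=0.2,
    epochs=20,
    batch_size=32,
    verbose=0,
)

# Test-set MSE for the network.
mlp_pred = model.predict(X_test, verbose=0).ravel()
mlp_mse = float(np.mean((mlp_pred - y_test) ** 2))

# Linear regression baseline: ordinary least squares with an intercept column.
A_train = np.column_stack([X_train, np.ones(len(X_train))])
A_test = np.column_stack([X_test, np.ones(len(X_test))])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
lin_mse = float(np.mean((A_test @ coef - y_test) ** 2))

print(f"MLP test MSE:    {mlp_mse:.4f}")
print(f"Linear test MSE: {lin_mse:.4f}")
```

To graph training progress, `history.history["loss"]` and `history.history["val_loss"]` can be passed to `plt.plot`. On house prices the raw target is in the hundreds of thousands, so the MSE will be very large in absolute terms; comparing the two models on the same held-out rows is what matters.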

## Use the Keras Library to build an image recognition network using the Fashion-MNIST dataset (also comes with Keras)

- Load and preprocess the image data in the same way we preprocessed the MNIST data in class.
- Make sure to one-hot encode your category labels.
- Make sure your final layer has as many nodes as the number of classes that you want to predict.
- Try different hyperparameters. What is the highest accuracy that you are able to achieve?
- Use the history object returned by `model.fit()` to graph the model's loss or train/validation accuracies by epoch.
- Remember that neural networks are sensitive to randomness, so you may need to run your model multiple times (or use cross-validation) to tell whether a hyperparameter change is truly producing better results.

In [None]:
##### Your Code Here #####
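
As a starting point for the cell above, the preprocessing steps from the list (scale the pixels, flatten the images, one-hot encode the labels) can be sketched with NumPy alone. This is a hedged illustration rather than a full solution: the `images` and `labels` arrays are random stand-ins with the same shape and dtype that `fashion_mnist.load_data()` returns (28x28 `uint8` images, integer labels 0 to 9), and the variable names are assumptions.

```python
import numpy as np

# Random stand-in for a small batch of Fashion-MNIST images and labels
# (assumption: the real arrays would come from fashion_mnist.load_data()).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
labels = np.array([0, 3, 9, 3, 7, 1])
num_classes = 10  # Fashion-MNIST has ten clothing categories

# Flatten each 28x28 image to a 784-long vector and scale pixels to [0, 1].
X_img = images.reshape(len(images), -1).astype("float32") / 255.0

# One-hot encode the labels by indexing rows of an identity matrix; this
# produces the same kind of array as keras.utils.to_categorical.
y_onehot = np.eye(num_classes, dtype="float32")[labels]

print(X_img.shape, y_onehot.shape)
```

With the data in this form, the final `Dense` layer would need `num_classes` nodes, typically with a `softmax` activation and `categorical_crossentropy` loss.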

## Stretch Goals:

- Use hyperparameter tuning to make the accuracy of your models as high as possible (or the error as low as possible).
- Use cross-validation techniques to get more consistent results with your model.
- Use GridSearchCV to try different combinations of hyperparameters.
- Start looking into other types of Keras layers for CNNs and RNNs. For example, try building a CNN model for Fashion-MNIST to see how the results compare.
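
The idea behind combining a hyperparameter grid with cross-validation can be sketched without any framework. The example below is an assumption-laden illustration, not `GridSearchCV` itself: it swaps in a closed-form ridge regression as a cheap stand-in for the neural network, uses synthetic data, and the helper names `ridge_fit` and `kfold_mse` are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; the intercept is left unpenalized by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def kfold_mse(X, y, alpha, k=5, seed=0):
    """Average validation MSE over k shuffled folds for one hyperparameter value."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w, b = ridge_fit(X[train], y[train], alpha)
        scores.append(np.mean((X[val] @ w + b - y[val]) ** 2))
    return float(np.mean(scores))

# Synthetic regression data (stand-in for the housing features and prices).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=200)

# Grid search: score every candidate with the same folds, keep the best one.
grid = [0.01, 1.0, 100.0, 10000.0]
results = {alpha: kfold_mse(X_demo, y_demo, alpha) for alpha in grid}
best_alpha = min(results, key=results.get)

for alpha, mse in results.items():
    print(f"alpha={alpha:>8}: mean CV MSE = {mse:.3f}")
print("best alpha:", best_alpha)
```

For a Keras model, the same loop applies if `ridge_fit` is replaced by a function that builds and fits a fresh model for each fold. Reusing the same folds for every candidate keeps the comparison fair, and averaging over folds dampens the run-to-run randomness mentioned earlier.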