## Multilayer Perceptrons for Regression

In [1]:
import pandas as pd
import pickle
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from math import sqrt
from mxnet import gluon,init,autograd,np,nd,npx,gpu
from mxnet.metric import RMSE
from mxnet.gluon import nn
#import d2l
npx.set_np()

Load data

In [2]:
with open('msd_full.pickle', 'rb') as file:
    data = pickle.load(file)

print('x_train',data['X_train'].shape)
print('y_train',data['Y_train'].shape)
print('x_test',data['X_test'].shape)
print('y_test',data['Y_test'].shape)

x_train (463715, 90)
y_train (463715,)
x_test (51630, 90)
y_test (51630,)


Standardize all numerical features using the mean and standard deviation computed on the full training set (before it is split into sub-training and validation sets), then apply those same training-set statistics to standardize the test set.

In [3]:
# standardize all features with statistics fitted on the training set only
xscaler = StandardScaler().fit(data['X_train'])
x_train = xscaler.transform(data['X_train'])
x_test = xscaler.transform(data['X_test'])
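To make the "fit on train, apply to test" rule concrete, here is a minimal stdlib-only sketch of what `StandardScaler` does per column. It assumes population standard deviation (ddof = 0) and maps zero-variance columns to a scale of 1; the function names `fit_scaler` and `transform` are illustrative, not part of scikit-learn.

```python
from statistics import fmean, pstdev

def fit_scaler(rows):
    """Per-column mean and population std of the training rows."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero for constant columns
    return means, stds

def transform(rows, means, stds):
    """Standardize rows with previously fitted (training-set) statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

train = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
test = [[7.0, 20.0]]

means, stds = fit_scaler(train)          # statistics come from the training rows only
train_std = transform(train, means, stds)
test_std = transform(test, means, stds)  # the test rows reuse the training statistics
print(test_std)
```

Because the test rows are scaled with training statistics, their standardized columns are generally not zero-mean; that is expected and avoids leaking test information into preprocessing.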

Reserve the last 10% of the training set as the validation set and use the remaining 90% as the sub-training set. The targets are also demeaned by subtracting the mean of the full training targets.

In [4]:
# split point: first 90% -> sub-training, last 10% -> validation
sub_ind = int(x_train.shape[0] * 0.9)
print(sub_ind)

y_train = data['Y_train']
y_test = data['Y_test']
mean_val = y_train.mean()
y_train_demean = y_train - mean_val
y_test_demean = y_test - mean_val
x_subtrain = x_train[0:sub_ind]
x_val = x_train[sub_ind:]
y_subtrain = y_train[0:sub_ind]
y_val = y_train[sub_ind:]
y_subtrain_demean = y_train_demean[0:sub_ind]
y_val_demean = y_train_demean[sub_ind:]
print('x_subtrain: ',x_subtrain.shape)
print('Y_subtrain: ',y_subtrain.shape)
print('x_val:',x_val.shape)
print('y_val:',y_val.shape)

417343
x_subtrain:  (417343, 90)
Y_subtrain:  (417343,)
x_val: (46372, 90)
y_val: (46372,)
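The split index and the demeaning step can be checked in isolation. This is a small sketch on plain Python lists (the helper names `split_point` and `demean` are hypothetical); the row count is the one printed above.

```python
def split_point(n_rows, frac=0.9):
    """Index where the sub-training set ends; rows from here on are validation."""
    return int(n_rows * frac)

def demean(y_train, *others):
    """Subtract the training-target mean from the training targets and any other target lists."""
    mean_val = sum(y_train) / len(y_train)
    shifted = [[v - mean_val for v in ys] for ys in (y_train, *others)]
    return mean_val, shifted

n = 463715                      # number of training rows printed above
sub_ind = split_point(n)
print(sub_ind, n - sub_ind)     # 417343 46372, matching the shapes printed above

# toy targets: training mean is 3.0, and the test targets are shifted by the same value
mean_val, (y_tr_dm, y_te_dm) = demean([1.0, 2.0, 6.0], [4.0, 5.0])
print(mean_val, y_tr_dm, y_te_dm)
```

Using the training mean for the test targets as well mirrors `y_test_demean = y_test - mean_val` in the cell above.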


### Q1 (90%):
Train and tune the models listed above. Report test RMSE for each model setting.

#### OLS
Standard OLS has no hyperparameters to tune, so we fit it on the first 10,000 sub-training samples (the same subset used for the MLPs below) and compute its RMSE on the validation and test sets.

In [5]:
m1 = LinearRegression()
m1.fit(x_subtrain[:10000],y_subtrain[:10000])
y_pred1 = m1.predict(x_val)
y_pred2 = m1.predict(x_test)
rms1 = sqrt(mean_squared_error(y_val,y_pred1))
rms2 = sqrt(mean_squared_error(y_test,y_pred2))
print('RMSE for validation data: ',rms1)
print('RMSE for test data: ',rms2)

RMSE for validation data:  9.574683375081593
RMSE for test data:  9.550724957519298
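For reference, RMSE is $\sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$. A minimal pure-Python sketch (the `rmse` helper is illustrative, not the scikit-learn or MXNet implementation) also shows why the MLP sections can report RMSE on demeaned targets: shifting both targets and predictions by the same constant leaves RMSE unchanged.

```python
def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return mse ** 0.5

y = [3.0, 5.0, 7.0]
y_hat = [2.0, 5.0, 9.0]
print(rmse(y, y_hat))  # sqrt((1 + 0 + 4) / 3)

# subtracting the same constant from targets and predictions does not change RMSE
c = 2000.0
print(rmse([v - c for v in y], [p - c for p in y_hat]))
```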


#### MLP_0_dm

In [5]:
def load_array(data_arrays, batch_size, is_train=True):
    """Construct a Gluon data loader (shuffled when is_train=True)."""
    dataset = gluon.data.ArrayDataset(*data_arrays)
    return gluon.data.DataLoader(dataset, batch_size, shuffle=is_train)

def train_model(net,x_train,y_train,x_val,y_val,loss, trainer,num_epochs=3000,batch_size=32,early_stop=True):
    """Train `net` with minibatch SGD, print progress every 10 epochs and,
    if early_stop is True, stop when the validation RMSE at a 50-epoch
    checkpoint is no better than at the previous checkpoint."""
    # copy training and validation data to the GPU as float32
    features = np.array(x_train.astype('float32'),ctx=gpu())
    target = np.array(y_train.astype('float32'),ctx=gpu())
    train_iter = load_array((features, target), batch_size)

    val_features = np.array(x_val.astype('float32'),ctx=gpu())
    val_target = np.array(y_val.astype('float32'),ctx=gpu())

    pre_rmse = 0.0  # validation RMSE at the previous checkpoint

    for epoch in range(1, num_epochs + 1):
        for X, y in train_iter:
            X = X.as_in_context(gpu())
            y = y.as_in_context(gpu())
            with autograd.record():
                l = loss(net(X), y)
            l.backward()
            trainer.step(batch_size)  # divide the summed gradient by the batch size

        # end of epoch: full training loss and validation RMSE
        l = loss(net(features), target)
        val_rmse = RMSE()
        val_rmse.update(val_target,net(val_features))

        if epoch == 1:
            pre_rmse = val_rmse.get()[1]

        if epoch % 10 == 0:
            print('epoch %d, training loss: %f , validation RMSE: %f' % (epoch, l.mean().asnumpy(),val_rmse.get()[1]))

        # checkpoint every 50 epochs: stop if validation RMSE has not improved
        if epoch % 50 == 0:
            if val_rmse.get()[1] - pre_rmse >= 0 and early_stop:
                print('------Early Stop------')
                break
            else:
                pre_rmse = val_rmse.get()[1]

    print('finish training... End at Epoch ',epoch)


def evaluate_RMSE(model,x_test,y_test):
    """Print the RMSE of `model` on (x_test, y_test)."""
    test_target = np.array(y_test.astype('float32'),ctx=gpu())
    test_features = np.array(x_test.astype('float32'),ctx=gpu())
    test_rmse = RMSE()
    test_rmse.update(test_target,model(test_features))
    print('Test RMSE:',test_rmse.get()[1])
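The early-stopping rule inside `train_model` only compares validation RMSE at checkpoints (epoch 1, then every 50 epochs) and does not restore the best weights, so the returned model is the one at the stopping epoch. The sketch below replays that rule on a precomputed list of per-epoch validation RMSEs; `early_stop_epoch` is a hypothetical helper written to mirror the logic above, not part of MXNet.

```python
def early_stop_epoch(val_rmse_per_epoch, check_every=50):
    """Return the (1-indexed) epoch at which train_model's rule would stop."""
    pre_rmse = None
    for epoch, val in enumerate(val_rmse_per_epoch, start=1):
        if epoch == 1:
            pre_rmse = val               # first reference point
        if epoch % check_every == 0:
            if val - pre_rmse >= 0:      # no improvement since the last checkpoint
                return epoch
            pre_rmse = val
    return len(val_rmse_per_epoch)       # ran all epochs without stopping

# improves for 100 epochs, then gets worse: stops at the epoch-150 checkpoint
curve = [10 - 0.01 * e for e in range(1, 101)] + [9.0 + 0.01 * e for e in range(1, 101)]
print(early_stop_epoch(curve))
```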

**Tune Hyperparameter - batch size** <br>
Try batch sizes of 16, 32, and 64 and select the one with the lowest validation RMSE.

In [7]:
#train model - batch size = 16
# model structure
model1 = nn.Sequential()
model1.add(nn.Dense(1))
model1.initialize(init.Normal(sigma=0.01),ctx=gpu())
model1.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model1.collect_params(), 'sgd', {'learning_rate': 0.001})

num_epochs = 3000
train_model(model1,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,
            loss,trainer,num_epochs,batch_size=16)

epoch 10, training loss: 46.175613 , validation RMSE: 9.598289
epoch 20, training loss: 45.868320 , validation RMSE: 9.575791
epoch 30, training loss: 45.875324 , validation RMSE: 9.585664
epoch 40, training loss: 45.808174 , validation RMSE: 9.572514
epoch 50, training loss: 45.862011 , validation RMSE: 9.585183
epoch 60, training loss: 45.799294 , validation RMSE: 9.578393
epoch 70, training loss: 45.800877 , validation RMSE: 9.571258
epoch 80, training loss: 45.801388 , validation RMSE: 9.580872
epoch 90, training loss: 45.808609 , validation RMSE: 9.580564
epoch 100, training loss: 45.811005 , validation RMSE: 9.583174
epoch 110, training loss: 45.788792 , validation RMSE: 9.580637
epoch 120, training loss: 45.875565 , validation RMSE: 9.580915
epoch 130, training loss: 45.820789 , validation RMSE: 9.577175
epoch 140, training loss: 45.830688 , validation RMSE: 9.582801
epoch 150, training loss: 45.812447 , validation RMSE: 9.585162
------Early Stop------
finish training... End at 

In [8]:
#train model - batch size = 32
# model structure
model2 = nn.Sequential()
model2.add(nn.Dense(1))
model2.initialize(init.Normal(sigma=0.01),ctx=gpu())
model2.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model2.collect_params(), 'sgd', {'learning_rate': 0.001})


num_epochs = 3000
train_model(model2,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,
            loss,trainer,num_epochs,batch_size=32)

epoch 10, training loss: 47.034111 , validation RMSE: 9.686528
epoch 20, training loss: 46.155087 , validation RMSE: 9.599195
epoch 30, training loss: 45.954365 , validation RMSE: 9.588474
epoch 40, training loss: 45.864407 , validation RMSE: 9.572903
epoch 50, training loss: 45.833149 , validation RMSE: 9.572663
epoch 60, training loss: 45.821266 , validation RMSE: 9.568368
epoch 70, training loss: 45.797195 , validation RMSE: 9.571445
epoch 80, training loss: 45.807320 , validation RMSE: 9.570119
epoch 90, training loss: 45.792061 , validation RMSE: 9.574532
epoch 100, training loss: 45.793892 , validation RMSE: 9.569628
epoch 110, training loss: 45.784645 , validation RMSE: 9.571787
epoch 120, training loss: 45.797333 , validation RMSE: 9.577531
epoch 130, training loss: 45.798561 , validation RMSE: 9.575035
epoch 140, training loss: 45.780235 , validation RMSE: 9.573931
epoch 150, training loss: 45.776791 , validation RMSE: 9.573990
------Early Stop------
finish training... End at 

In [37]:
#train model - batch size = 64
# model structure
model3 = nn.Sequential()
model3.add(nn.Dense(1))
model3.initialize(init.Normal(sigma=0.01),ctx=gpu())
model3.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model3.collect_params(), 'sgd', {'learning_rate': 0.001})

num_epochs = 3000
train_model(model3,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,
            loss,trainer,num_epochs,batch_size=64)

epoch 10, training loss: 48.632748 , validation RMSE: 9.833032
epoch 20, training loss: 47.030479 , validation RMSE: 9.684284
epoch 30, training loss: 46.418140 , validation RMSE: 9.626363
epoch 40, training loss: 46.143650 , validation RMSE: 9.600238
epoch 50, training loss: 46.007103 , validation RMSE: 9.587128
epoch 60, training loss: 45.931751 , validation RMSE: 9.580962
epoch 70, training loss: 45.887302 , validation RMSE: 9.576270
epoch 80, training loss: 45.861057 , validation RMSE: 9.575994
epoch 90, training loss: 45.838451 , validation RMSE: 9.572468
epoch 100, training loss: 45.827129 , validation RMSE: 9.574351
epoch 110, training loss: 45.814171 , validation RMSE: 9.571141
epoch 120, training loss: 45.806564 , validation RMSE: 9.572041
epoch 130, training loss: 45.800705 , validation RMSE: 9.571461
epoch 140, training loss: 45.796036 , validation RMSE: 9.570953
epoch 150, training loss: 45.792992 , validation RMSE: 9.571774
epoch 160, training loss: 45.789230 , validation 

|batch size|validation RMSE|
|----|----|
|16|9.5852|
|32|9.5740|
|64|9.5725|

The results show that batch size = 64 gives the lowest validation RMSE, so we select batch size = 64 as the best model and compute its RMSE on the test data.
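The selection step is an argmin over validation RMSEs; the test set is used only once, for the chosen setting. A minimal sketch using the values from the MLP_0_dm table (the helper name `select_best` is illustrative):

```python
def select_best(val_rmse_by_setting):
    """Return the (setting, rmse) pair with the lowest validation RMSE."""
    best = min(val_rmse_by_setting, key=val_rmse_by_setting.get)
    return best, val_rmse_by_setting[best]

# validation RMSEs copied from the MLP_0_dm batch-size table
mlp0_results = {16: 9.5852, 32: 9.5740, 64: 9.5725}
print(select_best(mlp0_results))
```

The same helper applies unchanged to the other tables in this notebook, for example the weight-decay table.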

In [38]:
evaluate_RMSE(model3,x_test,y_test_demean)

Test RMSE: 9.548772811889648


#### MLP_1_dm
**Tune Hyperparameter - batch size** <br>
Try batch sizes of 16, 32, and 64 and select the one with the lowest validation RMSE.

In [9]:
# model structure
model1 = nn.Sequential()
model1.add(nn.Dense(45,activation='relu'),nn.Dense(1))
model1.initialize(init.Normal(sigma=0.01),ctx=gpu())
model1.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model1.collect_params(), 'sgd', {'learning_rate': 0.0001})

#train model - batch size = 16
num_epochs = 3000
train_model(model1,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,loss,trainer,num_epochs,batch_size=16)

epoch 10, training loss: 59.216850 , validation RMSE: 10.864101
epoch 20, training loss: 49.422592 , validation RMSE: 9.912361
epoch 30, training loss: 43.689594 , validation RMSE: 9.377723
epoch 40, training loss: 42.316460 , validation RMSE: 9.275238
epoch 50, training loss: 41.638336 , validation RMSE: 9.239089
epoch 60, training loss: 41.078587 , validation RMSE: 9.222880
epoch 70, training loss: 40.589249 , validation RMSE: 9.206519
epoch 80, training loss: 40.115952 , validation RMSE: 9.193476
epoch 90, training loss: 39.637615 , validation RMSE: 9.184632
epoch 100, training loss: 39.111950 , validation RMSE: 9.181135
epoch 110, training loss: 38.572758 , validation RMSE: 9.179420
epoch 120, training loss: 38.023556 , validation RMSE: 9.175034
epoch 130, training loss: 37.504105 , validation RMSE: 9.187937
epoch 140, training loss: 36.876846 , validation RMSE: 9.194909
epoch 150, training loss: 36.310162 , validation RMSE: 9.203779
------Early Stop------
finish training... End at

In [34]:
# model structure
model2 = nn.Sequential()
model2.add(nn.Dense(45,activation='relu'),nn.Dense(1))
model2.initialize(init.Normal(sigma=0.01),ctx=gpu())
model2.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model2.collect_params(), 'sgd', {'learning_rate': 0.0001})

#train model - batch size = 32
num_epochs = 3000
train_model(model2,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,loss,trainer,num_epochs,batch_size=32)

epoch 10, training loss: 59.776989 , validation RMSE: 10.917001
epoch 20, training loss: 59.443592 , validation RMSE: 10.885474
epoch 30, training loss: 57.310349 , validation RMSE: 10.682651
epoch 40, training loss: 51.828030 , validation RMSE: 10.150523
epoch 50, training loss: 46.791069 , validation RMSE: 9.659284
epoch 60, training loss: 44.252895 , validation RMSE: 9.426146
epoch 70, training loss: 43.039055 , validation RMSE: 9.323101
epoch 80, training loss: 42.404709 , validation RMSE: 9.276715
epoch 90, training loss: 41.989639 , validation RMSE: 9.252031
epoch 100, training loss: 41.637730 , validation RMSE: 9.233479
epoch 110, training loss: 41.327396 , validation RMSE: 9.220832
epoch 120, training loss: 41.040115 , validation RMSE: 9.210591
epoch 130, training loss: 40.761883 , validation RMSE: 9.202125
epoch 140, training loss: 40.475327 , validation RMSE: 9.194084
epoch 150, training loss: 40.189880 , validation RMSE: 9.190995
epoch 160, training loss: 39.890320 , validat

In [10]:
# model structure
model3 = nn.Sequential()
model3.add(nn.Dense(45,activation='relu'),nn.Dense(1))
model3.initialize(init.Normal(sigma=0.01),ctx=gpu())
model3.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model3.collect_params(), 'sgd', {'learning_rate': 0.0001})

#train model - batch size = 64
num_epochs = 3000
train_model(model3,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,loss,trainer,num_epochs,batch_size=64)

epoch 10, training loss: 59.808731 , validation RMSE: 10.919972
epoch 20, training loss: 59.772148 , validation RMSE: 10.916531
epoch 30, training loss: 59.669601 , validation RMSE: 10.906811
epoch 40, training loss: 59.370594 , validation RMSE: 10.878314
epoch 50, training loss: 58.544044 , validation RMSE: 10.799281
epoch 60, training loss: 56.645508 , validation RMSE: 10.616733
epoch 70, training loss: 53.587521 , validation RMSE: 10.319614
epoch 80, training loss: 50.322334 , validation RMSE: 9.999773
epoch 90, training loss: 47.722084 , validation RMSE: 9.748066
epoch 100, training loss: 45.914349 , validation RMSE: 9.578751
epoch 110, training loss: 44.693661 , validation RMSE: 9.469372
epoch 120, training loss: 43.860275 , validation RMSE: 9.398204
epoch 130, training loss: 43.282150 , validation RMSE: 9.351095
epoch 140, training loss: 42.873138 , validation RMSE: 9.319342
epoch 150, training loss: 42.568127 , validation RMSE: 9.296987
epoch 160, training loss: 42.319092 , vali

|batch size|validation RMSE|
|----|----|
|16|9.2038|
|32|9.1781|
|64|9.1696|

The results show that batch size = 64 gives the lowest validation RMSE, so we select batch size = 64 as the best model and compute its RMSE on the test data.

In [11]:
evaluate_RMSE(model3,x_test,y_test_demean)

Test RMSE: 9.221061706542969


#### MLP_2_dm
**Tune Hyperparameter - batch size** <br>
Try batch sizes of 16, 32, and 64 and select the one with the lowest validation RMSE.

In [12]:
# model structure
model1 = nn.Sequential()
model1.add(nn.Dense(45,activation='relu'),nn.Dense(45,activation='relu'),nn.Dense(1))
model1.initialize(init.Normal(sigma=0.01))
model1.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model1.collect_params(), 'sgd', {'learning_rate': 1e-4})

#train model
num_epochs = 3000
train_model(model1,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,loss,trainer,num_epochs,batch_size=16)

epoch 10, training loss: 59.824268 , validation RMSE: 10.921429
epoch 20, training loss: 59.824032 , validation RMSE: 10.921406
epoch 30, training loss: 59.823673 , validation RMSE: 10.921372
epoch 40, training loss: 59.823101 , validation RMSE: 10.921320
epoch 50, training loss: 59.822037 , validation RMSE: 10.921222
epoch 60, training loss: 59.819805 , validation RMSE: 10.921016
epoch 70, training loss: 59.814243 , validation RMSE: 10.920498
epoch 80, training loss: 59.796143 , validation RMSE: 10.918808
epoch 90, training loss: 59.701157 , validation RMSE: 10.909915
epoch 100, training loss: 58.197857 , validation RMSE: 10.767823
epoch 110, training loss: 43.362961 , validation RMSE: 9.344172
epoch 120, training loss: 40.638092 , validation RMSE: 9.162130
epoch 130, training loss: 38.922829 , validation RMSE: 9.112578
epoch 140, training loss: 37.089420 , validation RMSE: 9.107737
epoch 150, training loss: 34.848827 , validation RMSE: 9.134316
epoch 160, training loss: 32.489918 , v

In [47]:
# model structure
model2 = nn.Sequential()
model2.add(nn.Dense(45,activation='relu'),nn.Dense(45,activation='relu'),nn.Dense(1))
model2.initialize(init.Normal(sigma=0.01))
model2.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model2.collect_params(), 'sgd', {'learning_rate': 1e-4})


#train model
num_epochs = 3000
train_model(model2,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,loss,trainer,num_epochs,batch_size=32)

epoch 10, training loss: 59.824219 , validation RMSE: 10.921427
epoch 20, training loss: 59.824001 , validation RMSE: 10.921406
epoch 30, training loss: 59.823761 , validation RMSE: 10.921386
epoch 40, training loss: 59.823483 , validation RMSE: 10.921359
epoch 50, training loss: 59.823112 , validation RMSE: 10.921327
epoch 60, training loss: 59.822620 , validation RMSE: 10.921282
epoch 70, training loss: 59.821945 , validation RMSE: 10.921221
epoch 80, training loss: 59.820969 , validation RMSE: 10.921131
epoch 90, training loss: 59.819481 , validation RMSE: 10.920996
epoch 100, training loss: 59.817123 , validation RMSE: 10.920780
epoch 110, training loss: 59.813148 , validation RMSE: 10.920414
epoch 120, training loss: 59.805912 , validation RMSE: 10.919744
epoch 130, training loss: 59.791374 , validation RMSE: 10.918396
epoch 140, training loss: 59.757931 , validation RMSE: 10.915284
epoch 150, training loss: 59.664101 , validation RMSE: 10.906515
epoch 160, training loss: 59.30135

In [13]:
# model structure
model3 = nn.Sequential()
model3.add(nn.Dense(45,activation='relu'),nn.Dense(45,activation='relu'),nn.Dense(1))
model3.initialize(init.Normal(sigma=0.01))
model3.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model3.collect_params(), 'sgd', {'learning_rate': 1e-4})


#train model
num_epochs = 3000
train_model(model3,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,loss,trainer,num_epochs,batch_size=64)

epoch 10, training loss: 59.824131 , validation RMSE: 10.921416
epoch 20, training loss: 59.824005 , validation RMSE: 10.921405
epoch 30, training loss: 59.823883 , validation RMSE: 10.921392
epoch 40, training loss: 59.823757 , validation RMSE: 10.921380
epoch 50, training loss: 59.823620 , validation RMSE: 10.921367
epoch 60, training loss: 59.823475 , validation RMSE: 10.921353
epoch 70, training loss: 59.823311 , validation RMSE: 10.921337
epoch 80, training loss: 59.823124 , validation RMSE: 10.921320
epoch 90, training loss: 59.822918 , validation RMSE: 10.921299
epoch 100, training loss: 59.822674 , validation RMSE: 10.921276
epoch 110, training loss: 59.822388 , validation RMSE: 10.921249
epoch 120, training loss: 59.822056 , validation RMSE: 10.921217
epoch 130, training loss: 59.821651 , validation RMSE: 10.921179
epoch 140, training loss: 59.821167 , validation RMSE: 10.921133
epoch 150, training loss: 59.820576 , validation RMSE: 10.921077
epoch 160, training loss: 59.81985

|batch size|validation RMSE|
|----|----|
|16|9.9207|
|32|9.3392|
|64|9.1078|

The results show that batch size = 64 gives the lowest validation RMSE, so we select batch size = 64 as the best model and compute its RMSE on the test data.

In [14]:
evaluate_RMSE(model3,x_test,y_test_demean)

Test RMSE: 9.177852630615234


#### MLP_2_dm_L2

**Tune Hyperparameter - weight decay** <br>
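Try weight decay (wd) values of 1, 0.1, and 0.01 (batch size 64) and select the one with the lowest validation RMSE. Weight decay is applied to the weights only; the biases are excluded by setting their `wd_mult` to 0.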
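As I understand MXNet's plain SGD optimizer, the `wd` option adds `wd * w` to each parameter's gradient, i.e. the update is `w <- w - lr * (grad + wd * w)`, which corresponds to an L2 penalty of `(wd / 2) * ||w||^2`; `wd_mult` scales `wd` per parameter. The sketch below reproduces that update on plain lists; the parameter names (`dense0_weight`, `dense0_bias`) and the `sgd_step` helper are hypothetical, and momentum and gradient rescaling are omitted.

```python
def sgd_step(params, grads, lr, wd, wd_mult):
    """One SGD step with L2 weight decay: w <- w - lr * (g + wd * wd_mult * w)."""
    new_params = {}
    for name, w in params.items():
        eff_wd = wd * wd_mult.get(name, 1.0)  # wd_mult defaults to 1 for every parameter
        new_params[name] = [wi - lr * (gi + eff_wd * wi) for wi, gi in zip(w, grads[name])]
    return new_params

params = {"dense0_weight": [1.0, -2.0], "dense0_bias": [0.5]}
grads = {"dense0_weight": [0.0, 0.0], "dense0_bias": [0.0]}

# with zero gradients, decay alone shrinks the weights; the bias (wd_mult = 0) is untouched
out = sgd_step(params, grads, lr=0.1, wd=1.0, wd_mult={"dense0_bias": 0.0})
print(out)
```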

In [18]:
# model structure
model1 = nn.Sequential()
model1.add(nn.Dense(45,activation='relu'),nn.Dense(45,activation='relu'),nn.Dense(1))
model1.initialize(init.Normal(sigma=0.01))
model1.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
wd = 1
trainer = gluon.Trainer(model1.collect_params(), 'sgd', {'learning_rate': 2e-4, 'wd': wd})
model1.collect_params('.*bias').setattr('wd_mult', 0)

#train model
num_epochs = 3000
train_model(model1,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,loss,trainer,num_epochs,batch_size=64)

epoch 10, training loss: 59.824444 , validation RMSE: 10.921448
epoch 20, training loss: 59.824413 , validation RMSE: 10.921443
epoch 30, training loss: 59.824406 , validation RMSE: 10.921442
epoch 40, training loss: 59.824413 , validation RMSE: 10.921442
epoch 50, training loss: 59.824413 , validation RMSE: 10.921441
epoch 60, training loss: 59.824413 , validation RMSE: 10.921441
epoch 70, training loss: 59.824413 , validation RMSE: 10.921441
epoch 80, training loss: 59.824413 , validation RMSE: 10.921440
epoch 90, training loss: 59.824413 , validation RMSE: 10.921441
epoch 100, training loss: 59.824406 , validation RMSE: 10.921440
epoch 110, training loss: 59.824413 , validation RMSE: 10.921440
epoch 120, training loss: 59.824413 , validation RMSE: 10.921440
epoch 130, training loss: 59.824413 , validation RMSE: 10.921440
epoch 140, training loss: 59.824406 , validation RMSE: 10.921440
epoch 150, training loss: 59.824413 , validation RMSE: 10.921440
------Early Stop------
finish trai

In [19]:
model2 = nn.Sequential()
model2.add(nn.Dense(45,activation='relu'),nn.Dense(45,activation='relu'),nn.Dense(1))
model2.initialize(init.Normal(sigma=0.01))
model2.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
wd = 0.1
trainer = gluon.Trainer(model2.collect_params(), 'sgd', {'learning_rate': 2e-4, 'wd': wd})
model2.collect_params('.*bias').setattr('wd_mult', 0)

#train model
num_epochs = 3000
train_model(model2,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,loss,trainer,num_epochs,batch_size=64)

epoch 10, training loss: 59.824329 , validation RMSE: 10.921434
epoch 20, training loss: 59.824139 , validation RMSE: 10.921415
epoch 30, training loss: 59.823982 , validation RMSE: 10.921401
epoch 40, training loss: 59.823837 , validation RMSE: 10.921388
epoch 50, training loss: 59.823700 , validation RMSE: 10.921374
epoch 60, training loss: 59.823555 , validation RMSE: 10.921360
epoch 70, training loss: 59.823395 , validation RMSE: 10.921345
epoch 80, training loss: 59.823212 , validation RMSE: 10.921327
epoch 90, training loss: 59.822994 , validation RMSE: 10.921307
epoch 100, training loss: 59.822731 , validation RMSE: 10.921282
epoch 110, training loss: 59.822399 , validation RMSE: 10.921249
epoch 120, training loss: 59.821976 , validation RMSE: 10.921209
epoch 130, training loss: 59.821419 , validation RMSE: 10.921157
epoch 140, training loss: 59.820675 , validation RMSE: 10.921086
epoch 150, training loss: 59.819645 , validation RMSE: 10.920988
epoch 160, training loss: 59.81816

In [20]:
model3 = nn.Sequential()
model3.add(nn.Dense(45,activation='relu'),nn.Dense(45,activation='relu'),nn.Dense(1))
model3.initialize(init.Normal(sigma=0.01))
model3.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
wd = 0.01
trainer = gluon.Trainer(model3.collect_params(), 'sgd', {'learning_rate': 2e-4, 'wd': wd})
model3.collect_params('.*bias').setattr('wd_mult', 0)

#train model
num_epochs = 3000
train_model(model3,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,loss,trainer,num_epochs,batch_size=64)

epoch 10, training loss: 59.824451 , validation RMSE: 10.921448
epoch 20, training loss: 59.824326 , validation RMSE: 10.921435
epoch 30, training loss: 59.824211 , validation RMSE: 10.921423
epoch 40, training loss: 59.824100 , validation RMSE: 10.921412
epoch 50, training loss: 59.823975 , validation RMSE: 10.921400
epoch 60, training loss: 59.823818 , validation RMSE: 10.921385
epoch 70, training loss: 59.823620 , validation RMSE: 10.921366
epoch 80, training loss: 59.823364 , validation RMSE: 10.921340
epoch 90, training loss: 59.823032 , validation RMSE: 10.921309
epoch 100, training loss: 59.822586 , validation RMSE: 10.921266
epoch 110, training loss: 59.821968 , validation RMSE: 10.921206
epoch 120, training loss: 59.821075 , validation RMSE: 10.921121
epoch 130, training loss: 59.819733 , validation RMSE: 10.920992
epoch 140, training loss: 59.817612 , validation RMSE: 10.920791
epoch 150, training loss: 59.814068 , validation RMSE: 10.920451
epoch 160, training loss: 59.80767

|weight decay|validation RMSE|
|----|----|
|1|10.9214|
|0.1|9.1890|
|0.01|9.2794|

The results show that weight decay = 0.1 gives the lowest validation RMSE, so we select weight decay = 0.1 as the best model and compute its RMSE on the test data.

In [21]:
evaluate_RMSE(model2,x_test,y_test_demean)

Test RMSE: 9.311225891113281


#### MLP_2_dm_dropout
**Tune Hyperparameter - batch size** <br>
With a dropout rate of 0.5 after each hidden layer, try batch sizes of 32, 64, and 128 and select the one with the lowest validation RMSE.
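Gluon's `nn.Dropout` is, to my understanding, only active in training mode (inside `autograd.record()`), so the validation and test forward passes in `train_model` and `evaluate_RMSE` run without dropout. The sketch below shows inverted dropout, the common formulation where surviving units are scaled by `1 / (1 - p)` during training so the expected activation matches inference; the `dropout` function is illustrative, not Gluon's implementation.

```python
import random

def dropout(x, p, training, rng):
    """Inverted dropout: in training, zero each unit with probability p and scale
    the survivors by 1 / (1 - p); at inference, return the input unchanged."""
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]

rng = random.Random(0)
h = [1.0, 2.0, 3.0, 4.0]
print(dropout(h, 0.5, training=True, rng=rng))   # some units zeroed, the rest doubled
print(dropout(h, 0.5, training=False, rng=rng))  # identical to h
```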

In [22]:
# model structure
model2 = nn.Sequential()
model2.add(nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(1))
model2.initialize(init.Normal(sigma=0.01))
model2.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model2.collect_params(), 'sgd', {'learning_rate': 0.0005})

#train model
num_epochs = 3000
train_model(model2,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,loss,trainer,num_epochs,batch_size=32)

epoch 10, training loss: 59.823814 , validation RMSE: 10.921388
epoch 20, training loss: 59.822037 , validation RMSE: 10.921223
epoch 30, training loss: 59.808613 , validation RMSE: 10.919962
epoch 40, training loss: 58.588261 , validation RMSE: 10.804729
epoch 50, training loss: 43.063652 , validation RMSE: 9.346679
epoch 60, training loss: 39.862694 , validation RMSE: 9.152026
epoch 70, training loss: 37.823639 , validation RMSE: 9.086305
epoch 80, training loss: 35.823441 , validation RMSE: 9.076674
epoch 90, training loss: 35.060814 , validation RMSE: 9.110268
epoch 100, training loss: 33.458622 , validation RMSE: 9.121112
epoch 110, training loss: 32.680523 , validation RMSE: 9.160365
epoch 120, training loss: 31.453310 , validation RMSE: 9.150271
epoch 130, training loss: 30.698637 , validation RMSE: 9.177656
epoch 140, training loss: 30.231415 , validation RMSE: 9.224083
epoch 150, training loss: 29.239084 , validation RMSE: 9.239199
------Early Stop------
finish training... End

In [77]:
# model structure
model3 = nn.Sequential()
model3.add(nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(1))
model3.initialize(init.Normal(sigma=0.01))
model3.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model3.collect_params(), 'sgd', {'learning_rate': 0.0005})

#train model
num_epochs = 3000
train_model(model3,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,loss,trainer,num_epochs,batch_size=64)

epoch 10, training loss: 59.824188 , validation RMSE: 10.921421
epoch 20, training loss: 59.823689 , validation RMSE: 10.921371
epoch 30, training loss: 59.822861 , validation RMSE: 10.921292
epoch 40, training loss: 59.821087 , validation RMSE: 10.921125
epoch 50, training loss: 59.816257 , validation RMSE: 10.920669
epoch 60, training loss: 59.797539 , validation RMSE: 10.918904
epoch 70, training loss: 59.639805 , validation RMSE: 10.904027
epoch 80, training loss: 52.514893 , validation RMSE: 10.216157
epoch 90, training loss: 42.772877 , validation RMSE: 9.295037
epoch 100, training loss: 41.333805 , validation RMSE: 9.216601
epoch 110, training loss: 40.133137 , validation RMSE: 9.174685
epoch 120, training loss: 39.071373 , validation RMSE: 9.131564
epoch 130, training loss: 37.756714 , validation RMSE: 9.108133
epoch 140, training loss: 36.945789 , validation RMSE: 9.102376
epoch 150, training loss: 35.811268 , validation RMSE: 9.077980
epoch 160, training loss: 35.075748 , val

In [23]:
# model structure
model4 = nn.Sequential()
model4.add(nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(1))
model4.initialize(init.Normal(sigma=0.01))
model4.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model4.collect_params(), 'sgd', {'learning_rate': 0.0005})

#train model
num_epochs = 3000
train_model(model4,x_subtrain[:10000],y_subtrain_demean[:10000],x_val,y_val_demean,loss,trainer,num_epochs,batch_size=128)

epoch 10, training loss: 59.824444 , validation RMSE: 10.921449
epoch 20, training loss: 59.824295 , validation RMSE: 10.921433
epoch 30, training loss: 59.824169 , validation RMSE: 10.921421
epoch 40, training loss: 59.824032 , validation RMSE: 10.921408
epoch 50, training loss: 59.823864 , validation RMSE: 10.921391
epoch 60, training loss: 59.823643 , validation RMSE: 10.921371
epoch 70, training loss: 59.823364 , validation RMSE: 10.921343
epoch 80, training loss: 59.822948 , validation RMSE: 10.921305
epoch 90, training loss: 59.822350 , validation RMSE: 10.921248
epoch 100, training loss: 59.821449 , validation RMSE: 10.921164
epoch 110, training loss: 59.819923 , validation RMSE: 10.921020
epoch 120, training loss: 59.817261 , validation RMSE: 10.920769
epoch 130, training loss: 59.812164 , validation RMSE: 10.920287
epoch 140, training loss: 59.801361 , validation RMSE: 10.919272
epoch 150, training loss: 59.773632 , validation RMSE: 10.916666
epoch 160, training loss: 59.68448

|batch size|validation RMSE|
|----|----|
|32|9.2392|
|64|9.1633|
|128|9.1066|

The results show that batch size = 128 gives the lowest validation RMSE, so we select batch size = 128 as the best model and compute its RMSE on the test data.

In [24]:
evaluate_RMSE(model4,x_test,y_test_demean)

Test RMSE: 9.20824146270752


#### MLP_2_ykeep
**Tune Hyperparameter - batch size** <br>
Here the targets are kept on their original scale (not demeaned). Try batch sizes of 16, 32, and 64 and select the one with the lowest validation RMSE.

In [27]:
# model structure
model1 = nn.Sequential()
model1.add(nn.Dense(45,activation='relu'),
          nn.Dense(45,activation='relu'),
          nn.Dense(1))
model1.initialize(init.Normal(sigma=0.0005))
model1.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model1.collect_params(), 'sgd', {'learning_rate': 0.005})

#train model
num_epochs = 3000
train_model(model1,x_subtrain[:10000],y_subtrain[:10000],x_val,y_val,loss,trainer,num_epochs,batch_size=16)

epoch 10, training loss: 59.825306 , validation RMSE: 10.921534
epoch 20, training loss: 59.824612 , validation RMSE: 10.921453
epoch 30, training loss: 59.828289 , validation RMSE: 10.921822
epoch 40, training loss: 59.825462 , validation RMSE: 10.921523
epoch 50, training loss: 59.826370 , validation RMSE: 10.921638
epoch 60, training loss: 59.824806 , validation RMSE: 10.921469
epoch 70, training loss: 59.824413 , validation RMSE: 10.921440
epoch 80, training loss: 59.827911 , validation RMSE: 10.921786
epoch 90, training loss: 59.829426 , validation RMSE: 10.921870
epoch 100, training loss: 59.830700 , validation RMSE: 10.921983
------Early Stop------
finish training... End at Epoch  100


In [91]:
# model structure
model2 = nn.Sequential()
model2.add(nn.Dense(45,activation='relu'),
          nn.Dense(45,activation='relu'),
          nn.Dense(1))
model2.initialize(init.Normal(sigma=0.0005))
model2.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model2.collect_params(), 'sgd', {'learning_rate': 0.005})

#train model
num_epochs = 3000
train_model(model2,x_subtrain[:10000],y_subtrain[:10000],x_val,y_val,loss,trainer,num_epochs,batch_size=32)

epoch 10, training loss: 721879367680.000000 , validation RMSE: 1201565.125000
epoch 20, training loss: 59.848381 , validation RMSE: 10.923569
epoch 30, training loss: 59.826138 , validation RMSE: 10.921581
epoch 40, training loss: 59.824623 , validation RMSE: 10.921454
epoch 50, training loss: 59.825649 , validation RMSE: 10.921569
epoch 60, training loss: 59.825108 , validation RMSE: 10.921515
epoch 70, training loss: 59.825092 , validation RMSE: 10.921514
epoch 80, training loss: 59.824875 , validation RMSE: 10.921492
epoch 90, training loss: 59.825207 , validation RMSE: 10.921524
epoch 100, training loss: 59.824913 , validation RMSE: 10.921495
epoch 110, training loss: 59.824417 , validation RMSE: 10.921440
epoch 120, training loss: 59.825256 , validation RMSE: 10.921506
epoch 130, training loss: 59.827732 , validation RMSE: 10.921720
epoch 140, training loss: 59.824425 , validation RMSE: 10.921444
epoch 150, training loss: 59.824905 , validation RMSE: 10.921476
epoch 160, training

In [25]:
# model structure
model3 = nn.Sequential()
model3.add(nn.Dense(45,activation='relu'),
          nn.Dense(45,activation='relu'),
          nn.Dense(1))
model3.initialize(init.Normal(sigma=0.0005))
model3.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model3.collect_params(), 'sgd', {'learning_rate': 0.005})

#train model
num_epochs = 3000
train_model(model3,x_subtrain[:10000],y_subtrain[:10000],x_val,y_val,loss,trainer,num_epochs,batch_size=64)

epoch 10, training loss: 7744604.500000 , validation RMSE: 3935.636475
epoch 20, training loss: 61.090042 , validation RMSE: 11.037183
epoch 30, training loss: 59.824463 , validation RMSE: 10.921441
epoch 40, training loss: 59.824520 , validation RMSE: 10.921454
epoch 50, training loss: 59.824905 , validation RMSE: 10.921494
epoch 60, training loss: 59.824417 , validation RMSE: 10.921443
epoch 70, training loss: 59.824520 , validation RMSE: 10.921454
epoch 80, training loss: 59.824417 , validation RMSE: 10.921440
epoch 90, training loss: 59.824612 , validation RMSE: 10.921453
epoch 100, training loss: 59.824638 , validation RMSE: 10.921467
epoch 110, training loss: 59.824501 , validation RMSE: 10.921453
epoch 120, training loss: 59.824471 , validation RMSE: 10.921449
epoch 130, training loss: 59.824417 , validation RMSE: 10.921439
epoch 140, training loss: 59.824642 , validation RMSE: 10.921454
epoch 150, training loss: 59.824505 , validation RMSE: 10.921444
epoch 160, training loss: 5

|batch size|validation RMSE|
|----|----|
|16|10.9220|
|32|10.9214|
|64|10.9215|

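From the results, batch size = 32 gives the lowest validation RMSE, although the three settings differ by less than 0.001. We therefore select batch size = 32 as the best model and compute its RMSE on the test data.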
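The three batch sizes land within 0.001 of each other, so it can help to report how far the winner is from the runner-up alongside the choice. Below is a minimal sketch in plain Python that uses the validation RMSEs from the table above. The helper `select_best` is hypothetical and is not part of the notebook.

```python
def select_best(results):
    """Return (best_setting, margin): the setting with the lowest validation RMSE
    and its gap to the runner-up (inf if there is only one setting)."""
    ranked = sorted(results.items(), key=lambda kv: kv[1])
    best, best_rmse = ranked[0]
    margin = ranked[1][1] - best_rmse if len(ranked) > 1 else float("inf")
    return best, margin

# Validation RMSE per batch size, copied from the table above.
best, margin = select_best({16: 10.9220, 32: 10.9214, 64: 10.9215})
print(best, round(margin, 4))  # 32 0.0001
```

A margin this small is within run-to-run noise, so the choice between 32 and 64 is essentially a tie.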

In [92]:
evaluate_RMSE(model2,x_test,y_test)

Test RMSE: 10.852547645568848


#### MLP_2_ykeep_L2
**Tune Hyperparameter - weight decay** <br>
Try weight decay = 0.1 and 0.05 and select the value with the lowest validation RMSE.
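For reference, here is roughly what the `wd` setting does during a plain SGD step (no momentum), following MXNet's SGD update rule: w ← w − lr·(grad + wd·wd_mult·w), where `wd_mult` is a per-parameter multiplier. The cells below set `wd_mult` to 0 for every bias, which exempts biases from decay. This is a NumPy illustration with hypothetical parameter names, not MXNet's actual optimizer code.

```python
import numpy as np

def sgd_step(params, grads, lr, wd, wd_mult):
    """One SGD update with L2 weight decay:
    w <- w - lr * (grad + wd * wd_mult[name] * w); wd_mult defaults to 1."""
    return {name: w - lr * (grads[name] + wd * wd_mult.get(name, 1.0) * w)
            for name, w in params.items()}

# Hypothetical parameters of a single Dense layer, with zero gradients so
# that only the decay term moves the weights.
params = {"dense0_weight": np.array([1.0, -2.0]), "dense0_bias": np.array([0.5])}
grads = {"dense0_weight": np.zeros(2), "dense0_bias": np.zeros(1)}
# Mirrors collect_params('.*bias').setattr('wd_mult', 0): biases are not decayed.
wd_mult = {"dense0_bias": 0.0}

new = sgd_step(params, grads, lr=0.005, wd=0.1, wd_mult=wd_mult)
print(new["dense0_weight"], new["dense0_bias"])  # [ 0.9995 -1.999 ] [0.5]
```

Each step shrinks the weights toward zero by a factor of (1 − lr·wd), while the biases stay free to absorb the mean of the target.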

In [30]:
# model structure
model1 = nn.Sequential()
model1.add(nn.Dense(45,activation='relu'),
          nn.Dense(45,activation='relu'),
          nn.Dense(1))
model1.initialize(init.Normal(sigma=0.0002))
model1.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
wd = 0.1
trainer = gluon.Trainer(model1.collect_params(), 'sgd', {'learning_rate': 0.005,'wd':wd})
model1.collect_params('.*bias').setattr('wd_mult', 0)

num_epochs = 3000
train_model(model1,x_subtrain[:10000],y_subtrain[:10000],x_val,y_val,loss,trainer,num_epochs,batch_size=64)

epoch 10, training loss: 1882805248.000000 , validation RMSE: 61364.574219
epoch 20, training loss: 356.574707 , validation RMSE: 26.700857
epoch 30, training loss: 59.824482 , validation RMSE: 10.921451
epoch 40, training loss: 59.824486 , validation RMSE: 10.921444
epoch 50, training loss: 59.824413 , validation RMSE: 10.921440
epoch 60, training loss: 59.824413 , validation RMSE: 10.921441
epoch 70, training loss: 59.824417 , validation RMSE: 10.921443
epoch 80, training loss: 59.824638 , validation RMSE: 10.921456
epoch 90, training loss: 59.824417 , validation RMSE: 10.921442
epoch 100, training loss: 59.824432 , validation RMSE: 10.921444
------Early Stop------
finish training... End at Epoch  100


In [98]:
# model structure
model2 = nn.Sequential()
model2.add(nn.Dense(45,activation='relu'),
          nn.Dense(45,activation='relu'),
          nn.Dense(1))
model2.initialize(init.Normal(sigma=0.0002))
model2.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
wd = 0.05
trainer = gluon.Trainer(model2.collect_params(), 'sgd', {'learning_rate': 0.005,'wd':wd})
model2.collect_params('.*bias').setattr('wd_mult', 0)

num_epochs = 3000
train_model(model2,x_subtrain[:10000],y_subtrain[:10000],x_val,y_val,loss,trainer,num_epochs,batch_size=64)

epoch 10, training loss: 828402496.000000 , validation RMSE: 40703.875000
epoch 20, training loss: 190.613693 , validation RMSE: 19.518221
epoch 30, training loss: 59.824444 , validation RMSE: 10.921440
epoch 40, training loss: 59.824608 , validation RMSE: 10.921463
epoch 50, training loss: 59.824417 , validation RMSE: 10.921439
epoch 60, training loss: 59.824436 , validation RMSE: 10.921445
epoch 70, training loss: 59.824661 , validation RMSE: 10.921457
epoch 80, training loss: 59.824471 , validation RMSE: 10.921449
epoch 90, training loss: 59.824593 , validation RMSE: 10.921462
epoch 100, training loss: 59.824455 , validation RMSE: 10.921447
------Early Stop------
finish training... End at Epoch  100


|weight decay|validation RMSE|
|----|----|
|0.05|10.921447|
|0.1|10.921444|

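From the results, weight decay = 0.1 gives the lowest validation RMSE, although the two settings differ only in the sixth decimal place. We therefore select weight decay = 0.1 as the best model and compute its RMSE on the test data. Note that the evaluation cell below scores `model2`, which was trained with weight decay = 0.05.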

In [99]:
evaluate_RMSE(model2,x_test,y_test)

Test RMSE: 10.852538108825684


#### MLP_2_ykeep_dropout
**Tune Hyperparameter - batch size** <br>
Try batch size = 64 and 128 and select the value with the lowest validation RMSE.
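Dropout behaves differently during training and evaluation. Gluon's `nn.Dropout` is active only in training mode (inside `autograd.record()`). In that mode it zeroes each activation with probability p and rescales the survivors by 1/(1 − p). At prediction time it passes its input through unchanged. The NumPy sketch below illustrates this inverted-dropout scheme. `inverted_dropout` is a hypothetical helper, not the Gluon implementation.

```python
import numpy as np

def inverted_dropout(x, p, training, rng):
    """Inverted dropout: during training, zero each unit with probability p and
    scale the survivors by 1/(1-p) so the expected activation is unchanged;
    act as the identity at inference time."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = np.ones((1000, 45))               # e.g. activations of a 45-unit hidden layer
out = inverted_dropout(h, p=0.5, training=True, rng=rng)
print(out.mean())                     # close to 1.0: the expectation is preserved
print(np.array_equal(inverted_dropout(h, 0.5, False, rng), h))  # True
```

Because of the rescaling, no correction is needed when `evaluate_RMSE` runs the trained network outside `autograd.record()`.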

In [106]:
model3 = nn.Sequential()
model3.add(nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(1))
model3.initialize(init.Normal(sigma=0.005))
model3.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model3.collect_params(), 'sgd', {'learning_rate': 0.005})

num_epochs = 3000
train_model(model3,x_subtrain[:10000],y_subtrain[:10000],x_val,y_val,loss,trainer,num_epochs,batch_size=64)

epoch 10, training loss: 952378898757189632.000000 , validation RMSE: 1380129792.000000
epoch 20, training loss: 149953789952.000000 , validation RMSE: 547638.187500
epoch 30, training loss: 23667.128906 , validation RMSE: 217.566757
epoch 40, training loss: 59.827457 , validation RMSE: 10.921741
epoch 50, training loss: 59.824520 , validation RMSE: 10.921453
epoch 60, training loss: 59.824413 , validation RMSE: 10.921440
epoch 70, training loss: 59.824413 , validation RMSE: 10.921441
epoch 80, training loss: 59.824413 , validation RMSE: 10.921441
epoch 90, training loss: 59.824512 , validation RMSE: 10.921446
epoch 100, training loss: 59.824413 , validation RMSE: 10.921440
epoch 110, training loss: 59.824486 , validation RMSE: 10.921444
epoch 120, training loss: 59.824474 , validation RMSE: 10.921450
epoch 130, training loss: 59.824455 , validation RMSE: 10.921440
epoch 140, training loss: 59.824413 , validation RMSE: 10.921440
epoch 150, training loss: 59.824493 , validation RMSE: 10

In [38]:
model4 = nn.Sequential()
model4.add(nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(1))
model4.initialize(init.Normal(sigma=0.005))
model4.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model4.collect_params(), 'sgd', {'learning_rate': 0.005})

num_epochs = 3000
train_model(model4,x_subtrain[:10000],y_subtrain[:10000],x_val,y_val,loss,trainer,num_epochs,batch_size=128)

epoch 10, training loss: 1663326413063255490560.000000 , validation RMSE: 57677144064.000000
epoch 20, training loss: 660012640898121728.000000 , validation RMSE: 1148923520.000000
epoch 30, training loss: 261894791233536.000000 , validation RMSE: 22886450.000000
epoch 40, training loss: 103920574464.000000 , validation RMSE: 455895.937500
epoch 50, training loss: 41236160.000000 , validation RMSE: 9081.430664
epoch 60, training loss: 16422.419922 , validation RMSE: 181.233658
epoch 70, training loss: 66.318138 , validation RMSE: 11.501677
epoch 80, training loss: 59.826588 , validation RMSE: 10.921659
epoch 90, training loss: 59.824425 , validation RMSE: 10.921440
epoch 100, training loss: 59.824417 , validation RMSE: 10.921439
epoch 110, training loss: 59.824413 , validation RMSE: 10.921441
epoch 120, training loss: 59.824413 , validation RMSE: 10.921439
epoch 130, training loss: 59.824413 , validation RMSE: 10.921440
epoch 140, training loss: 59.824425 , validation RMSE: 10.921443
e

|batch size|validation RMSE|
|----|----|
|64|10.9215|
|128|10.9214|

From the results, batch size = 128 gives the lowest validation RMSE (by only 0.0001), so we select batch size = 128 as the best model and compute its RMSE on the test data.

In [39]:
evaluate_RMSE(model4,x_test,y_test)

Test RMSE: 10.852678298950195


#### MLP_2_dm_dropout_full
**Tune Hyperparameter - batch size** <br>
Try batch size = 256 and 512 and select the value with the lowest validation RMSE.
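Both runs below use a fixed budget of 50 epochs at the same learning rate, so a larger batch also means fewer parameter updates. Here is a quick count, assuming `train_model` keeps the final partial batch of each epoch. Its code is not shown in this section, so that is an assumption.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Minibatch SGD updates per epoch, assuming the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)

n_subtrain = 417343   # x_subtrain.shape[0], printed earlier in the notebook
num_epochs = 50
for bs in (256, 512):
    per_epoch = updates_per_epoch(n_subtrain, bs)
    print(bs, per_epoch, num_epochs * per_epoch)
# 256 1631 81550
# 512 816 40800
```

Under that assumption, batch size 256 performs about twice as many updates as 512. This may partly explain why it ends slightly ahead, since validation RMSE is still falling at epoch 50 in both logs.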

In [110]:
# model structure
model1 = nn.Sequential()
model1.add(nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(1))
model1.initialize(init.Normal(sigma=0.01))
model1.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model1.collect_params(), 'sgd', {'learning_rate': 0.01})

num_epochs = 50
train_model(model1,x_subtrain,y_subtrain_demean,x_val,y_val_demean,loss,trainer,num_epochs,batch_size=256,early_stop=False)

epoch 10, training loss: 38.769062 , validation RMSE: 8.782681
epoch 20, training loss: 38.647022 , validation RMSE: 8.780467
epoch 30, training loss: 38.447243 , validation RMSE: 8.764668
epoch 40, training loss: 38.209610 , validation RMSE: 8.746068
epoch 50, training loss: 37.873760 , validation RMSE: 8.707487
finish training... End at Epoch  50


In [40]:
# model structure
model2 = nn.Sequential()
model2.add(nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(45,activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(1))
model2.initialize(init.Normal(sigma=0.01))
model2.collect_params().reset_ctx(gpu())

# Loss
loss = gluon.loss.L2Loss()

#trainer
trainer = gluon.Trainer(model2.collect_params(), 'sgd', {'learning_rate': 0.01})

#train model
num_epochs = 50
train_model(model2,x_subtrain,y_subtrain_demean,x_val,y_val_demean,loss,trainer,num_epochs,batch_size=512,early_stop=False)

epoch 10, training loss: 38.663067 , validation RMSE: 8.753753
epoch 20, training loss: 38.327370 , validation RMSE: 8.732924
epoch 30, training loss: 38.236294 , validation RMSE: 8.737166
epoch 40, training loss: 38.118820 , validation RMSE: 8.723217
epoch 50, training loss: 37.967419 , validation RMSE: 8.715676
finish training... End at Epoch  50


|batch size|validation RMSE|
|----|----|
|256|8.7075|
|512|8.7157|

From the results, batch size = 256 gives the lowest validation RMSE, so we select batch size = 256 as the best model and compute its RMSE on the test data.

In [112]:
evaluate_RMSE(model1,x_test,y_test_demean)

Test RMSE: 8.868842124938965
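The test RMSE above is computed against `y_test_demean`, but it is directly comparable with the RMSEs of the ykeep models. Adding the same constant `mean_val` to both targets and predictions leaves every residual, and therefore the RMSE, unchanged. The NumPy check below uses synthetic data; all values are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

rng = np.random.default_rng(0)
y_test = rng.normal(50.0, 10.0, size=1000)      # synthetic targets
mean_val = 48.0                                 # stand-in for the training mean of y
y_test_demean = y_test - mean_val
pred_demean = y_test_demean + rng.normal(0.0, 9.0, size=1000)  # synthetic predictions

rmse_demean = rmse(y_test_demean, pred_demean)  # what evaluate_RMSE reports here
rmse_orig = rmse(y_test, pred_demean + mean_val)  # after shifting back to the original scale
print(rmse_demean, rmse_orig)                   # equal up to floating-point rounding
```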


### Q2
Summarize test RMSE in one table. Discuss your findings.

|Case|Test RMSE|
|----|----|
|OLS|9.5507|
|MLP_0_dm|9.5488|
|MLP_1_dm|9.2210|
|MLP_2_dm|9.1779|
|MLP_2_dm_L2|9.3112|
|MLP_2_dm_dropout|9.2082|
|MLP_2_ykeep|10.8525|
|MLP_2_ykeep_L2|10.8525|
|MLP_2_ykeep_dropout|10.8527|
|MLP_2_dm_dropout_full|8.8688|

We use test RMSE to evaluate each model; a lower RMSE means better performance.
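For reference, RMSE is sqrt(mean((y − ŷ)²)). Gluon's `L2Loss` is 0.5·(y − ŷ)². If `train_model` reports the average `L2Loss` over the training samples, then the training RMSE is sqrt(2 × loss). That reporting behaviour is an assumption, because the function's code is not shown. Applied to the plateau value of 59.824 in the ykeep logs, this gives about 10.94, in line with the validation RMSE of 10.92. Here is a NumPy sketch:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt(mean((y - y_hat)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mean_l2_loss(y_true, y_pred):
    """Average of the Gluon-style L2 loss 0.5 * (y - y_hat)^2."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(0.5 * diff ** 2))

y, y_hat = [3.0, -1.0, 2.0], [2.0, 1.0, 2.0]
print(rmse(y, y_hat))                       # sqrt(5/3), about 1.291
print(np.sqrt(2 * mean_l2_loss(y, y_hat)))  # same value
print(np.sqrt(2 * 59.824413))               # about 10.94, the ykeep training plateau
```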

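OLS and MLP_0_dm are both linear regression models. They differ in how the solution is found: MLP_0_dm uses gradient descent, whereas OLS solves the least-squares problem directly. Their performance is nearly identical, with MLP_0_dm slightly better (9.5488 vs. 9.5507).<br>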
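To illustrate why the two agree, the sketch below fits the same linear model twice on synthetic, standardized data. The first fit uses NumPy's closed-form least-squares solver; the second uses full-batch gradient descent on the mean squared error. The data, learning rate, and iteration count are illustrative choices. MLP_0_dm itself is trained with minibatch SGD, so it only approaches the OLS solution approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))                 # standardized-like features
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)   # zero-mean target, so no intercept is needed

# Closed-form least squares, as an OLS solver computes it.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Full-batch gradient descent on the mean squared error.
w_gd = np.zeros(d)
lr = 0.1
for _ in range(2000):
    grad = 2.0 / n * X.T @ (X @ w_gd - y)   # gradient of mean((Xw - y)^2)
    w_gd -= lr * grad

print(np.max(np.abs(w_ols - w_gd)))         # essentially zero
```

Because the squared-error loss of a linear model is convex, gradient descent converges to the same minimizer that OLS computes in one step, given a suitable learning rate.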

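Looking at the number of layers, adding one hidden layer to MLP_0_dm (giving MLP_1_dm) improves performance (9.5488 → 9.2210), so the extra layer helps. Adding a second hidden layer (giving MLP_2_dm) improves performance further (9.1779), but the deeper model is more prone to overfitting.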
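One concrete way to see the growth in capacity is to count trainable parameters for the 90 input features. MLP_2 uses two hidden layers of 45 units, as in the cells above. The 45-unit width for MLP_1 is an assumption, since that model is defined outside this section.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def mlp_params(n_in, hidden, n_out=1):
    """Total parameters of an MLP with the given hidden-layer widths."""
    sizes = [n_in, *hidden, n_out]
    return sum(dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:]))

for name, hidden in [("MLP_0", []), ("MLP_1", [45]), ("MLP_2", [45, 45])]:
    print(name, mlp_params(90, hidden))
# MLP_0 91
# MLP_1 4141
# MLP_2 6211
```

The count grows from 91 to 4,141 to 6,211 parameters. Each added layer gives the network more freedom to fit noise in the training set, which is consistent with MLP_2_dm being more prone to overfitting.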

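Because MLP_2_dm is prone to overfitting, I tried two remedies: L2 regularization and dropout. MLP_2_dm_dropout (9.2082) performs better than MLP_2_dm_L2 (9.3112), although neither beats the unregularized MLP_2_dm (9.1779) on the test set. The gap between the two may be because tuning did not find a better weight decay for L2, or because dropout simply works better than L2 regularization on this dataset.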

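Next, consider how de-meaning y in advance affects the MLP models. Models trained without de-meaning start with a much larger training loss, so the initial weights must be set carefully. If they are too large, the loss becomes so large that it fails to come down. In addition, without de-meaning, the training loss tends to get stuck in a local minimum (about 59.82 training loss and 10.92 validation RMSE in every ykeep run above). The de-meaned models, by contrast, can keep lowering both the training loss and the validation RMSE. For the models without de-meaning, adding L2 regularization or dropout has little effect on performance.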
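One way to probe the plateau is to compare it with the RMSE of a model that always predicts the training mean of y. The helper below is hypothetical. In the notebook it could be called as `constant_baseline_rmse(y_subtrain[:10000], y_val)`. A result close to 10.92 would suggest that the non-de-meaned networks have collapsed to an (almost) constant output rather than learning from the features.

```python
import numpy as np

def constant_baseline_rmse(y_train, y_eval):
    """RMSE obtained by always predicting the training-set mean of y."""
    y_train = np.asarray(y_train, dtype=float)
    y_eval = np.asarray(y_eval, dtype=float)
    return float(np.sqrt(np.mean((y_eval - y_train.mean()) ** 2)))

# Toy check: the training mean is 2.0, and predicting 2.0 for targets [1, 3] gives RMSE 1.0.
print(constant_baseline_rmse([1.0, 2.0, 3.0], [1.0, 3.0]))  # 1.0
```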

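Finally, MLP_2_dm_dropout_full shows what happens with more training data. An MLP with y de-meaned in advance, two hidden layers (MLP_2), and dropout outperforms every previous case (lowest test RMSE, 8.8688). This suggests that on this dataset the MLP benefits substantially from more training data.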