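## [Assignment Focus]
Make sure you understand what each hyperparameter of the random forest model means, and observe how tuning the hyperparameters affects the results.

## Assignment

1. Try adjusting the parameters of RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of regression models and decision trees.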
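The comparison asked for in item 2 could be sketched as follows for the wine dataset. This is a minimal sketch under assumptions not taken from the notebook: logistic regression (with feature scaling) stands in for the "regression model" on this classification task, and all hyperparameters and `random_state` values are illustrative choices.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=4
)

# Illustrative model choices; none of these settings come from the notebook.
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(
        n_estimators=20, max_depth=4, random_state=0
    ),
}

# Fit every model on the same split and record its test accuracy.
test_accuracy = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    test_accuracy[name] = model.score(x_test, y_test)
    print(f"{name}:\t{test_accuracy[name]:.2f}")
```

Using one shared split keeps the comparison fair, although with a test set this small the differences between models may not be meaningful.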

In [1]:
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

In [2]:
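def train_regr_from_data(regr, skdataset):
    """Fit `regr` on an 80/20 split of `skdataset` and print train/test scores.

    Works for classifiers and regressors alike: `score` is mean accuracy
    for the former and R^2 for the latter.
    """
    data = skdataset.data
    target = skdataset.target

    # A fixed random_state keeps the split identical across experiments.
    x_train, x_test, y_train, y_test = train_test_split(data, target, test_size=0.2, random_state=4)

    regr.fit(x_train, y_train)

    y_pred = regr.predict(x_test)

    print(f'train score:\t{regr.score(x_train, y_train):.2f}')
    print(f'test score:\t{regr.score(x_test, y_test):.2f}')

    # Return the predictions, the ground truth, and the fitted estimator.
    return y_pred, y_test, regr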
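The helper prints `score` for both kinds of estimator, so it may help to check what that number is in each case. The sketch below fits small forests on the full datasets purely for illustration (the `n_estimators` and `random_state` values are assumptions) and compares `score` with the explicit metric functions.

```python
import math

from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classifier: score() is the mean accuracy.
wine = datasets.load_wine()
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(wine.data, wine.target)
clf_score = clf.score(wine.data, wine.target)
clf_accuracy = metrics.accuracy_score(wine.target, clf.predict(wine.data))
print(f"classifier score={clf_score:.4f}, accuracy={clf_accuracy:.4f}")

# Regressor: score() is the coefficient of determination R^2.
diabetes = datasets.load_diabetes()
reg = RandomForestRegressor(n_estimators=10, random_state=0)
reg.fit(diabetes.data, diabetes.target)
reg_score = reg.score(diabetes.data, diabetes.target)
reg_r2 = metrics.r2_score(diabetes.target, reg.predict(diabetes.data))
print(f"regressor score={reg_score:.4f}, R^2={reg_r2:.4f}")
```

This is why the "test score" and "accuracy score" lines agree for the classifier runs, while the regressor runs report R^2 next to a separately computed mean squared error.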

In [3]:
wine = datasets.load_wine()

In [4]:
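# 20 trees, each limited to depth 4. No random_state is set, so results vary between runs.
rfc_20_4 = RandomForestClassifier(n_estimators=20, max_depth=4)

rfc_20_4_pred, rfc_20_4_test, rfc_20_4_train_rr = train_regr_from_data(rfc_20_4, wine)

# accuracy_score expects (y_true, y_pred).
print(f'accuracy score: {metrics.accuracy_score(rfc_20_4_test, rfc_20_4_pred)}')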

train score:	0.99
test score:	1.00
accuracy score: 1.0
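Because the forests in this notebook are built without a `random_state`, rerunning a cell can give slightly different scores. A minimal sketch of how a fixed seed makes a run repeatable is shown below; the helper `fit_and_predict` and the seed values are hypothetical and not part of the notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=4
)


def fit_and_predict(seed):
    """Train a 20-tree, depth-4 forest with the given seed and predict the test split."""
    model = RandomForestClassifier(n_estimators=20, max_depth=4, random_state=seed)
    return model.fit(x_train, y_train).predict(x_test)


# Two fits with the same seed build the same forest.
pred_a = fit_and_predict(seed=0)
pred_b = fit_and_predict(seed=0)
same_seed_identical = bool(np.array_equal(pred_a, pred_b))
print(f"same seed, identical predictions: {same_seed_identical}")

# Different seeds may or may not change the test accuracy.
for seed in (1, 2, 3):
    accuracy = (fit_and_predict(seed) == y_test).mean()
    print(f"seed={seed}: test accuracy {accuracy:.2f}")
```

Fixing the seed is useful when comparing hyperparameters, since any change in the score then comes from the setting being varied rather than from the random tree construction.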


In [5]:
# Halve the forest to 10 trees at the same depth.
rfc_10_4 = RandomForestClassifier(n_estimators=10, max_depth=4)

rfc_10_4_pred, rfc_10_4_test, rfc_10_4_train_rr = train_regr_from_data(rfc_10_4, wine)

# accuracy_score expects (y_true, y_pred).
print(f'accuracy score: {metrics.accuracy_score(rfc_10_4_test, rfc_10_4_pred)}')

train score:	1.00
test score:	1.00
accuracy score: 1.0


In [6]:
# Shrink the forest further to 5 trees.
rfc_5_4 = RandomForestClassifier(n_estimators=5, max_depth=4)

rfc_5_4_pred, rfc_5_4_test, rfc_5_4_train_rr = train_regr_from_data(rfc_5_4, wine)

# accuracy_score expects (y_true, y_pred).
print(f'accuracy score: {metrics.accuracy_score(rfc_5_4_test, rfc_5_4_pred)}')

train score:	0.99
test score:	0.92
accuracy score: 0.9166666666666666
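The wine dataset has 178 samples, so the 20% test split holds only 36 of them, and the 0.9167 above corresponds to 33 of 36 correct. With so few test samples, a single split is a noisy basis for comparing `n_estimators`. One way to get a steadier estimate is k-fold cross-validation; the sketch below assumes 5 folds and a fixed `random_state`, neither of which comes from the notebook.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

wine = datasets.load_wine()

# Mean 5-fold accuracy for each forest size tried in the cells above.
cv_mean = {}
for n_estimators in (5, 10, 20):
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=4, random_state=0
    )
    scores = cross_val_score(model, wine.data, wine.target, cv=5)
    cv_mean[n_estimators] = scores.mean()
    print(f"n_estimators={n_estimators}:\t{scores.mean():.3f} +/- {scores.std():.3f}")
```

The spread across folds gives a rough sense of whether a gap between two settings is larger than the noise.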


In [7]:
from sklearn.ensemble import RandomForestRegressor

In [8]:
diabetes = datasets.load_diabetes()

In [9]:
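# Baseline regressor: 40 trees, depth 4. For regressors, `score` reports R^2.
rfr_40_4 = RandomForestRegressor(n_estimators=40, max_depth=4)

rfr_40_4_pred, rfr_40_4_test, rfr_40_4_train_rr = train_regr_from_data(rfr_40_4, diabetes)

# mean_squared_error expects (y_true, y_pred).
print(f'mean squared error: {metrics.mean_squared_error(rfr_40_4_test, rfr_40_4_pred)}')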

train score:	0.66
test score:	0.38
mean squared error: 3396.294831255442


In [10]:
# Fewer trees (20) at the same depth.
rfr_20_4 = RandomForestRegressor(n_estimators=20, max_depth=4)

rfr_20_4_pred, rfr_20_4_test, rfr_20_4_train_rr = train_regr_from_data(rfr_20_4, diabetes)

# mean_squared_error expects (y_true, y_pred).
print(f'mean squared error: {metrics.mean_squared_error(rfr_20_4_test, rfr_20_4_pred)}')

train score:	0.66
test score:	0.39
mean squared error: 3313.189913582593


In [11]:
# 40 trees again, but shallower (depth 2).
rfr_40_2 = RandomForestRegressor(n_estimators=40, max_depth=2)

rfr_40_2_pred, rfr_40_2_test, rfr_40_2_train_rr = train_regr_from_data(rfr_40_2, diabetes)

# mean_squared_error expects (y_true, y_pred).
print(f'mean squared error: {metrics.mean_squared_error(rfr_40_2_test, rfr_40_2_pred)}')

train score:	0.50
test score:	0.42
mean squared error: 3152.0380179211347


In [12]:
# 10 trees at depth 4.
rfr_10_4 = RandomForestRegressor(n_estimators=10, max_depth=4)

rfr_10_4_pred, rfr_10_4_test, rfr_10_4_train_rr = train_regr_from_data(rfr_10_4, diabetes)

# mean_squared_error expects (y_true, y_pred).
print(f'mean squared error: {metrics.mean_squared_error(rfr_10_4_test, rfr_10_4_pred)}')

train score:	0.65
test score:	0.40
mean squared error: 3272.061800972709


In [13]:
# 10 trees with deeper trees (depth 6).
rfr_10_6 = RandomForestRegressor(n_estimators=10, max_depth=6)

rfr_10_6_pred, rfr_10_6_test, rfr_10_6_train_rr = train_regr_from_data(rfr_10_6, diabetes)

# mean_squared_error expects (y_true, y_pred).
print(f'mean squared error: {metrics.mean_squared_error(rfr_10_6_test, rfr_10_6_pred)}')

train score:	0.78
test score:	0.32
mean squared error: 3728.6175769542383
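In the runs above, the train score rises from 0.50 to 0.78 as `max_depth` goes from 2 to 6 while the test score falls from 0.42 to 0.32, which suggests deeper trees overfit this dataset. Those runs also differ in `n_estimators` and are unseeded, so the effect of depth is not isolated. A minimal sketch that varies only `max_depth` might look like this; the fixed `n_estimators=40`, the `random_state`, and the extra unlimited-depth setting are assumptions for illustration.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=4
)

# Hold everything fixed except max_depth (None lets the trees grow fully).
results = []
for depth in (2, 4, 6, None):
    model = RandomForestRegressor(n_estimators=40, max_depth=depth, random_state=0)
    model.fit(x_train, y_train)
    train_r2 = model.score(x_train, y_train)
    test_r2 = model.score(x_test, y_test)
    results.append((depth, train_r2, test_r2))
    print(f"max_depth={depth}:\ttrain R^2={train_r2:.2f}\ttest R^2={test_r2:.2f}")
```

A widening gap between the train and test R^2 as depth grows is the usual sign of overfitting; the depth with the best test score is a reasonable starting point for further tuning.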
