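## [Assignment focus]
Make sure you understand what each hyperparameter of the random forest model means, and observe how tuning them affects the results.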
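
As a starting point, the hyperparameters an estimator exposes can be listed with `get_params()`. The sketch below prints a hand-picked subset together with the defaults of the installed scikit-learn version; which names to show is an assumption made for illustration, not something taken from the notebook.

```python
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters picked for illustration (an assumption, not an exhaustive list).
NAMES = ["n_estimators", "max_depth", "max_features",
         "min_samples_split", "min_samples_leaf", "bootstrap"]

# get_params() returns a dict mapping every constructor argument to its value.
params = RandomForestClassifier().get_params()
selected = {name: params[name] for name in NAMES}

for name, value in selected.items():
    print("{:<18} default = {}".format(name, value))
```

Roughly, `n_estimators` is the number of trees, `max_depth` caps how deep each tree may grow, `max_features` limits how many features are considered at each split, `min_samples_split` and `min_samples_leaf` control when a node may be split, and `bootstrap` decides whether each tree is trained on a bootstrap sample of the training data.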

## Assignment

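1. Try adjusting the parameters of RandomForestClassifier(...) and observe whether the results change.
2. Switch to another dataset (boston, wine) and compare the results with those of a regression model and a decision tree.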

In [1]:
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn import svm
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

In [2]:
iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)

In [3]:
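# Grid over the number of trees and the maximum depth of each tree.
# No random_state is set on the classifier, so results can vary between runs.
for num_est in [100, 200, 300, 400]:
    for max_depth in [5, 6, 7, 8]:
        clf = RandomForestClassifier(
            n_estimators=num_est,   # number of trees in the forest
            max_depth=max_depth,    # cap on the depth of each tree
            bootstrap=True)         # each tree trains on a bootstrap sample
        clf.fit(x_train, y_train)
        y_pred = clf.predict(x_test)
        acc = metrics.accuracy_score(y_test, y_pred)
        print("# of estimators: {} | Max depth: {} | Accuracy: {}".format(num_est, max_depth, acc))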

# of estimators: 100 | Max depth: 5 | Accuracy: 0.9736842105263158
# of estimators: 100 | Max depth: 6 | Accuracy: 0.9736842105263158
# of estimators: 100 | Max depth: 7 | Accuracy: 0.9736842105263158
# of estimators: 100 | Max depth: 8 | Accuracy: 0.9736842105263158
# of estimators: 200 | Max depth: 5 | Accuracy: 0.9736842105263158
# of estimators: 200 | Max depth: 6 | Accuracy: 0.9736842105263158
# of estimators: 200 | Max depth: 7 | Accuracy: 0.9736842105263158
# of estimators: 200 | Max depth: 8 | Accuracy: 0.9736842105263158
# of estimators: 300 | Max depth: 5 | Accuracy: 0.9736842105263158
# of estimators: 300 | Max depth: 6 | Accuracy: 0.9736842105263158
# of estimators: 300 | Max depth: 7 | Accuracy: 0.9736842105263158
# of estimators: 300 | Max depth: 8 | Accuracy: 0.9736842105263158
# of estimators: 400 | Max depth: 5 | Accuracy: 0.9736842105263158
# of estimators: 400 | Max depth: 6 | Accuracy: 0.9736842105263158
# of estimators: 400 | Max depth: 7 | Accuracy: 0.973684210526
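
Every configuration listed above scores about 0.9737, which corresponds to 37 of the 38 test samples being classified correctly, so a single small split may be too coarse to separate these settings. One possible way to get a finer signal is k-fold cross-validation on the whole dataset. The sketch below is a variant under stated assumptions, not the notebook's method: it uses a smaller, hypothetical grid, 5 folds, and a fixed `random_state=0` so that repeated runs give the same numbers.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

# Hypothetical grid, kept small so the example runs quickly.
results = {}
for num_est in [10, 100]:
    for max_depth in [1, 3, None]:  # None lets each tree grow fully
        clf = RandomForestClassifier(
            n_estimators=num_est,
            max_depth=max_depth,
            random_state=0)  # fixed seed: an assumption, for repeatability
        # 5-fold cross-validation; returns one accuracy value per fold.
        scores = cross_val_score(clf, iris.data, iris.target, cv=5)
        results[(num_est, max_depth)] = scores.mean()
        print("# of estimators: {} | Max depth: {} | Mean CV accuracy: {:.4f}".format(
            num_est, max_depth, results[(num_est, max_depth)]))
```

Including very shallow trees (`max_depth=1`) and very few trees (`n_estimators=10`) in the grid makes it more likely that a difference shows up, since iris is easy enough that most reasonable settings perform about the same.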

In [4]:
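# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires an older scikit-learn version.
boston = datasets.load_boston()
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.25, random_state=4)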

In [5]:
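# Sweep the SVR regularization parameter C; a larger C means weaker regularization.
for c in [1, 2, 3, 4, 5, 6]:
    # Standardize the features first, since SVR is sensitive to feature scale.
    reg = make_pipeline(StandardScaler(), svm.SVR(C=c))
    reg.fit(x_train, y_train)
    y_pred = reg.predict(x_test)
    mse = metrics.mean_squared_error(y_test, y_pred)
    print("Penalties: {} | MSE: {}".format(c, mse))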

Penalties: 1 | MSE: 42.670029732855795
Penalties: 2 | MSE: 31.870269694362477
Penalties: 3 | MSE: 27.355687749608045
Penalties: 4 | MSE: 25.013394160606925
Penalties: 5 | MSE: 23.04952632025864
Penalties: 6 | MSE: 21.757800048310777
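
For task 2, a comparison on the wine dataset might look like the following sketch. It reuses the notebook's split settings (`test_size=0.25, random_state=4`); the model choices, `max_iter=1000`, and `random_state=0` are assumptions made for illustration. Because wine is a classification task, logistic regression stands in for the "regression model" here.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=4)

# Three models to compare; all settings here are illustrative assumptions.
models = {
    # Logistic regression benefits from standardized features.
    "Logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    accuracies[name] = metrics.accuracy_score(y_test, model.predict(x_test))
    print("{} | Accuracy: {}".format(name, accuracies[name]))
```

With only 45 test samples, a gap of one or two misclassifications between models is within the noise of a single split, so the ranking should be read with caution.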
