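## [Assignment Focus]
Make sure you understand what each hyperparameter of the random forest model means, and observe how adjusting those hyperparameters affects the results.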
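To explore the main `RandomForestClassifier` hyperparameters systematically instead of one run at a time, you can run a cross-validated grid search. The sketch below does this on iris with scikit-learn's `GridSearchCV`. The grid values are illustrative assumptions, not recommendations, and `random_state=0` is an assumed fixed seed. The comments summarize what each hyperparameter controls.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Illustrative grid (assumed values, chosen only to show each hyperparameter)
param_grid = {
    "n_estimators": [5, 15, 50],     # number of trees whose votes are combined
    "max_depth": [2, 4, None],       # maximum depth of each tree (None = grow until leaves are pure)
    "max_features": ["sqrt", None],  # features considered at each split (None = all features)
    "min_samples_leaf": [1, 5],      # minimum number of training samples allowed in a leaf
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),  # fixed seed so the search is repeatable
    param_grid,
    cv=5,                # 5-fold (stratified, for classifiers) cross-validation
    scoring="accuracy",
)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV accuracy: %.3f" % search.best_score_)
```

Cross-validation averages over five different test folds. It is therefore less sensitive to one lucky or unlucky split than a single `train_test_split`.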

## Assignment

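1. Try adjusting the parameters of `RandomForestClassifier(...)` and check whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of the regression model and the decision tree.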
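The following is a minimal comparison sketch for item 2. It assumes that "the regression model" means logistic regression for the classification dataset (wine) and ordinary linear regression for the regression task. `load_boston` was removed from scikit-learn in version 1.2, so the sketch substitutes `load_diabetes` as an assumed stand-in regression dataset. The regression half uses `RandomForestRegressor` and `DecisionTreeRegressor` and scores with R² instead of accuracy. The split settings mirror `train_and_evaluate` below. The `compare` helper, the hyperparameter values, and the seeds are illustrative assumptions.

```python
from sklearn.datasets import load_diabetes, load_wine
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor


def compare(models, X, y, score_fn):
    """Hypothetical helper: fit every model on the same 75/25 split, return {name: test score}."""
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=4)
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        scores[name] = score_fn(y_test, model.predict(x_test))
        print("  %-20s %.4f" % (name, scores[name]))
    return scores


X_wine, y_wine = load_wine(return_X_y=True)
print("wine (accuracy)")
wine_scores = compare({
    # Logistic regression is sensitive to feature scale, so standardize first
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=15, max_depth=4, random_state=0),
}, X_wine, y_wine, accuracy_score)

X_diab, y_diab = load_diabetes(return_X_y=True)
print("diabetes (R^2), stand-in for the removed boston dataset")
diabetes_scores = compare({
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "random forest": RandomForestRegressor(n_estimators=15, max_depth=4, random_state=0),
}, X_diab, y_diab, r2_score)
```

Tree-based models split on thresholds, so they do not need the `StandardScaler` step. It is included only for the logistic regression pipeline.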

```python
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
```

```python
def train_and_evaluate(classifier, dataset, name):
    """Fit `classifier` on a fixed 75/25 split of `dataset`, then print the
    test accuracy and the impurity-based feature importances."""
    print("\n[%s]---------------" % name)
    x_train, x_test, y_train, y_test = train_test_split(
        dataset.data, dataset.target, test_size=0.25, random_state=4)
    classifier.fit(x_train, y_train)
    y_pred = classifier.predict(x_test)
    acc = metrics.accuracy_score(y_test, y_pred)
    print("Accuracy: ", acc)
    print("Feature importance:")
    # Pair each feature name with its importance (same order as the columns)
    for feature, importance in zip(dataset.feature_names, classifier.feature_importances_):
        print("%s = %f" % (feature, importance))
```
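The feature importances printed by `train_and_evaluate` are impurity-based. In recent scikit-learn versions, a forest's `feature_importances_` is the mean of its trees' importances, skipping single-node trees, renormalized to sum to 1. The sketch below recomputes this by hand as a check. It also helps explain why the `n_estimators=1` runs concentrate almost all importance on one feature: a single tree's importances are not averaged with any others. The dataset, tree count, and seed here are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
forest = RandomForestClassifier(n_estimators=15, max_depth=4, random_state=0)
forest.fit(iris.data, iris.target)

# Impurity-based importances of each individual tree (each row sums to 1)
per_tree = np.array([
    tree.feature_importances_
    for tree in forest.estimators_
    if tree.tree_.node_count > 1  # skip single-node trees, which contain no splits
])

# Average over trees, then renormalize so the importances sum to 1
manual = per_tree.mean(axis=0)
manual /= manual.sum()

for name, imp in zip(iris.feature_names, manual):
    print("%s = %f" % (name, imp))

print("matches forest.feature_importances_:", np.allclose(manual, forest.feature_importances_))
```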

```python
iris = datasets.load_iris()
print("using iris dataset")

# Keep max_depth fixed and shrink the number of trees in the forest
for n in [15, 10, 5, 1]:
    clf = RandomForestClassifier(n_estimators=n, max_depth=4)
    train_and_evaluate(clf, iris, name="n_estimators=%d, max_depth=4" % n)
```

using iris dataset

[n_estimators=15, max_depth=4]---------------
Accuracy:  0.9736842105263158
Feature importance:
sepal length (cm) = 0.136336
sepal width (cm) = 0.015468
petal length (cm) = 0.576846
petal width (cm) = 0.271349

[n_estimators=10, max_depth=4]---------------
Accuracy:  0.9473684210526315
Feature importance:
sepal length (cm) = 0.115788
sepal width (cm) = 0.012937
petal length (cm) = 0.443959
petal width (cm) = 0.427315

[n_estimators=5, max_depth=4]---------------
Accuracy:  0.9736842105263158
Feature importance:
sepal length (cm) = 0.087989
sepal width (cm) = 0.027836
petal length (cm) = 0.572799
petal width (cm) = 0.311375

[n_estimators=1, max_depth=4]---------------
Accuracy:  0.9736842105263158
Feature importance:
sepal length (cm) = 0.000000
sepal width (cm) = 0.003863
petal length (cm) = 0.973472
petal width (cm) = 0.022665
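None of the runs above fixes `random_state` on the forest. Part of the accuracy difference between settings can therefore come from the random bootstrap samples and feature subsets, not from `n_estimators` itself. The test split is also small. With `test_size=0.25`, iris yields 38 test samples, so 0.9737 = 37/38 and 0.9474 = 36/38 are a single prediction apart. Wine yields 45 test samples, so its 1.0, 0.9778 and 0.9556 are 45, 44 and 43 correct.

The sketch below keeps the split fixed and averages accuracy over several forest seeds per setting. It assumes 20 seeds are enough for a rough picture.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
# Same fixed split as train_and_evaluate (random_state=4)
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4)
print("test samples:", len(y_test))  # 38, so one wrong prediction costs about 0.026 accuracy

summary = {}
for n in [1, 5, 10, 15]:
    accs = []
    for seed in range(20):  # 20 different forest seeds per setting (assumed count)
        clf = RandomForestClassifier(n_estimators=n, max_depth=4, random_state=seed)
        clf.fit(x_train, y_train)
        accs.append(accuracy_score(y_test, clf.predict(x_test)))
    summary[n] = (np.mean(accs), np.std(accs))
    print("n_estimators=%2d  mean acc = %.4f  std = %.4f" % (n, *summary[n]))
```

If the spread across seeds is comparable to the gap between settings, a single unseeded run per setting cannot separate the two effects.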


```python
wine = datasets.load_wine()
print("using wine dataset")

# Same settings as the iris runs, applied to the wine dataset
for n in [15, 10, 5, 1]:
    clf = RandomForestClassifier(n_estimators=n, max_depth=4)
    train_and_evaluate(clf, wine, name="n_estimators=%d, max_depth=4" % n)
```

using wine dataset

[n_estimators=15, max_depth=4]---------------
Accuracy:  0.9777777777777777
Feature importance:
alcohol = 0.154988
malic_acid = 0.005692
ash = 0.018635
alcalinity_of_ash = 0.014832
magnesium = 0.045986
total_phenols = 0.019336
flavanoids = 0.140830
nonflavanoid_phenols = 0.014005
proanthocyanins = 0.015462
color_intensity = 0.174138
hue = 0.083785
od280/od315_of_diluted_wines = 0.142421
proline = 0.169889

[n_estimators=10, max_depth=4]---------------
Accuracy:  1.0
Feature importance:
alcohol = 0.190064
malic_acid = 0.017011
ash = 0.007177
alcalinity_of_ash = 0.039475
magnesium = 0.062697
total_phenols = 0.011840
flavanoids = 0.260605
nonflavanoid_phenols = 0.006219
proanthocyanins = 0.011712
color_intensity = 0.122785
hue = 0.040443
od280/od315_of_diluted_wines = 0.099970
proline = 0.130002

[n_estimators=5, max_depth=4]---------------
Accuracy:  0.9555555555555556
Feature importance:
alcohol = 0.079581
malic_acid = 0.089065
ash = 0.011574
alcalinity_of_ash = 0.01