## [Assignment Focus]
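Make sure you understand what each hyperparameter of the random forest model means, and observe how adjusting the hyperparameters affects the results.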
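One quick way to see which hyperparameters a random forest exposes is to ask an unfitted estimator for its defaults with `get_params()`. This is a minimal sketch: the short list of names printed below is an illustrative selection, and the default values themselves depend on the installed scikit-learn version.

```python
from sklearn.ensemble import RandomForestClassifier

# Default hyperparameters of an unfitted RandomForestClassifier
params = RandomForestClassifier().get_params()

# An illustrative selection of the hyperparameters that most directly shape the forest:
# number of trees, tree depth, features considered per split, leaf size,
# whether bootstrap samples are drawn, and the split-quality criterion
for name in ["n_estimators", "max_depth", "max_features",
             "min_samples_leaf", "bootstrap", "criterion"]:
    print(f"{name}: {params[name]}")
```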

## Assignment

1. Try adjusting the parameters of RandomForestClassifier(...) and observe whether the results change.


In [16]:
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

In [4]:
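# Load the iris dataset
iris = datasets.load_iris()

# Split into training/test sets
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)

# Build the model (10 trees, each with a maximum depth of 4)
# No random_state is set here, so the forest (and its accuracy) can differ between runs
clf = RandomForestClassifier(n_estimators=10, max_depth=4)

# Train the model
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)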

In [5]:
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

Accuracy:  0.9736842105263158
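To check whether changing the parameters changes the result, one option is to sweep a small grid of `n_estimators` and `max_depth` values on the same train/test split. This is a sketch under an added assumption: `random_state=0` is fixed on every forest (the original cell sets none), so that differences come from the hyperparameters rather than from the randomness of bootstrapping and feature sampling. The grid values are arbitrary choices.

```python
from itertools import product

from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# Same split as the cell above
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4)

results = {}
for n_estimators, max_depth in product([1, 10, 100], [1, 2, 4, None]):
    # random_state=0 is an added assumption to make runs reproducible
    clf = RandomForestClassifier(n_estimators=n_estimators,
                                 max_depth=max_depth, random_state=0)
    clf.fit(x_train, y_train)
    acc = metrics.accuracy_score(y_test, clf.predict(x_test))
    results[(n_estimators, max_depth)] = acc
    print(f"n_estimators={n_estimators:>3}, max_depth={str(max_depth):>4}: "
          f"accuracy={acc:.3f}")
```

The test set has only 38 flowers, so a single misclassification moves accuracy by about 0.026 (the 0.974 above is 37/38). Small differences between settings on this split should not be over-read.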


In [6]:
print(iris.feature_names)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


In [7]:
print("Feature importance: ", clf.feature_importances_)

Feature importance:  [0.11634473 0.02520534 0.46093499 0.39751495]
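The `feature_importances_` array follows the column order of `iris.feature_names`, so pairing the two makes the output easier to read; in the run above, petal length and petal width together account for about 86% of the (impurity-based) importance. A minimal sketch that ranks the features, assuming a fixed `random_state=0` that the original cell does not set, so the exact numbers will differ from those shown above:

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4)

# Same hyperparameters as above, plus a fixed random_state (an added assumption)
clf = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0)
clf.fit(x_train, y_train)

# Pair each impurity-based importance with its feature name, most important first
ranking = sorted(
    zip(iris.feature_names, clf.feature_importances_),
    key=lambda item: item[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name:<18} {importance:.3f}")
```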


2. Switch to other datasets (boston, wine) and compare the results with those of a regression model and a decision tree.

In [8]:
# Load the wine dataset
wine = datasets.load_wine()

# Split into training/test sets
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.3, random_state=0)

# Build the model (10 trees, each with a maximum depth of 4)
clf = RandomForestClassifier(n_estimators=10, max_depth=4)

# Train the model
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)

print(y_pred)

[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 1 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2
 1 1 2 0 0 1 1 1 0 2 1 2 0 2 2 0 2]


In [9]:
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

Accuracy:  0.9814814814814815


In [10]:
print(wine.feature_names)

['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']


In [11]:
print("Feature importance: ", clf.feature_importances_)

Feature importance:  [0.12432039 0.02302595 0.01340534 0.00242216 0.01018271 0.06375447
 0.07978231 0.00263245 0.02227647 0.24185297 0.04400726 0.16403825
 0.20829927]
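The assignment asks for a comparison with a regression model and a decision tree. Since wine is a classification task, this sketch assumes "regression model" means logistic regression (scaled with `StandardScaler` so it converges), and fits it alongside a `DecisionTreeClassifier` and the same random forest on the split used above. The fixed `random_state=0` on the tree and forest, and `max_iter=1000`, are added assumptions.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
# Same split as the wine cells above
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)

models = {
    # Logistic regression is sensitive to feature scale, so standardize first
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=10, max_depth=4,
                                            random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = metrics.accuracy_score(y_test, model.predict(x_test))
    print(f"{name:<20} accuracy={scores[name]:.3f}")
```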


In [23]:
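# Load the boston dataset
# (load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs an older version)
boston = datasets.load_boston()

# Split into training/test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.3, random_state=0)

# Build the model (10 trees, each with a maximum depth of 4)
# RandomForestClassifier expects discrete class labels in y_train; boston's target (house price) is continuous,
# so fitting it raises ValueError: Unknown label type: 'continuous'
#clf = RandomForestClassifier(n_estimators=10, max_depth=4)

# Use RandomForestRegressor instead
clf = RandomForestRegressor(n_estimators=10, max_depth=4)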

# 訓練模型
clf.fit(x_train, y_train)

# 預測測試集
y_pred = clf.predict(x_test)

print(y_pred)

[23.58993617 33.7097883  21.04190687 12.03894733 20.3667156  22.47592783
 20.3667156  22.47592783 23.54814569 20.3667156   8.8308286  13.22442531
 14.33660581 11.35019608 45.66983333 31.10532353 20.3667156  32.4921235
 24.55795516 22.47592783 23.58993617 21.04190687 20.3667156  23.58993617
 21.29701267 22.76411819 20.3667156  15.94183049 43.56525758 17.29724019
 13.64113011 18.872028   20.9386771  21.04190687 22.9296066  18.80399241
 11.35019608 33.7097883  13.02398099 14.24815785 23.0150755  21.29701267
 22.9296066  14.54836208 24.81608171 22.66275475 18.99498486 16.54977684
 14.7171162  26.74738797 18.42383181 17.96346664 20.78299093 41.8952381
 15.33756698 18.99498486 20.78299093 20.3667156  31.02989061 18.99739746
 23.25629708 20.55354252 30.42484734 26.64557708 17.52933779 26.64557708
 18.50228072 17.67847829 13.7586396  20.55354252 20.3667156  22.47592783
 24.32632153 30.42484734 32.03194991 11.35019608 42.70507143 20.55354252
 22.9296066  20.3667156  26.54104446 18.12056898 27.6
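The failure described in the cell above can be reproduced without the boston data. This sketch uses made-up synthetic data (the seed, coefficient, and noise level are arbitrary) with a continuous target: the classifier rejects it, while the regressor fits it. Only the "Unknown label type" prefix of the error is relied on, since the full message varies between scikit-learn versions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Synthetic regression data: the target is a continuous number, like a house price
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 2.5 * X[:, 0] + rng.normal(scale=0.1, size=50)

error_message = None
try:
    # A classifier needs discrete class labels, so this raises ValueError
    RandomForestClassifier(n_estimators=10).fit(X, y)
except ValueError as err:
    error_message = str(err)
    print("Classifier failed:", error_message)

# A regressor accepts the continuous target
reg = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
print("Regressor predictions:", reg.predict(X[:3]))
```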

In [24]:
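# accuracy_score only works with discrete labels; on a continuous target it raises
# ValueError: continuous is not supported

#acc = metrics.accuracy_score(y_test, y_pred)
#print("Accuracy: ", acc)

# Measure the gap between predicted and actual values with the mean squared error (MSE)
print("Mean squared error: %.2f"
      % metrics.mean_squared_error(y_test, y_pred))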

Mean squared error: 16.79
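MSE is the average squared difference between target and prediction, MSE = (1/n) Σ (y_i - ŷ_i)^2. Because the errors are squared, it is in squared target units; its square root, about 4.1 here, is back in the units of boston's target (median home value in thousands of dollars). The small hand-made arrays below are only for illustration: the sketch checks the formula against `metrics.mean_squared_error` and shows why `accuracy_score` cannot be used on such values.

```python
import numpy as np
from sklearn import metrics

# Illustrative continuous targets and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Squared errors are 0.25, 0.25, 0.0 and 1.0; their mean is 1.5 / 4
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = metrics.mean_squared_error(y_true, y_pred)
print(mse_manual, mse_sklearn)  # 0.375 0.375

# accuracy_score counts exact label matches, so continuous values are rejected
accuracy_error = None
try:
    metrics.accuracy_score(y_true, y_pred)
except ValueError as err:
    accuracy_error = str(err)
print(accuracy_error)
```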


In [25]:
print(boston.feature_names)


['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']


In [20]:
print("Feature importance: ", clf.feature_importances_)

Feature importance:  [3.13885508e-02 1.69789908e-04 9.92781807e-03 1.87268743e-03
 1.61487498e-02 5.36957301e-01 6.04204143e-03 2.71054288e-02
 8.10284543e-04 5.45231977e-03 1.54274141e-02 5.94949305e-03
 3.42748122e-01]
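For the regression side of the comparison, the sketch below fits a linear regression, a decision tree, and the same random forest on one split and reports each test MSE. `compare_regressors` is a hypothetical helper written for this illustration, and `random_state=0` is an added assumption. Because `load_boston` is gone from scikit-learn 1.2 onward, the demo runs on `load_diabetes` as a stand-in; with an older version, `boston.data` and `boston.target` can be passed instead.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_regressors(X, y, random_state=0):
    """Hypothetical helper: fit three regressors on one split, return test MSE."""
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state)
    models = {
        "linear regression": LinearRegression(),
        "decision tree": DecisionTreeRegressor(max_depth=4,
                                               random_state=random_state),
        "random forest": RandomForestRegressor(n_estimators=10, max_depth=4,
                                               random_state=random_state),
    }
    mse = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        mse[name] = metrics.mean_squared_error(y_test, model.predict(x_test))
    return mse


# Stand-in dataset (an assumption): load_diabetes ships with current scikit-learn
diabetes = datasets.load_diabetes()
mse = compare_regressors(diabetes.data, diabetes.target)
for name, value in mse.items():
    print(f"{name:<18} MSE={value:.2f}")
```

Lower MSE is better, and since MSE is in squared target units, it can only be compared across models on the same dataset and split.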
