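## [Assignment Focus]
Make sure you understand what each hyperparameter of the random forest model means, and observe how tuning the hyperparameters affects the results.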

## Assignment

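1. Try adjusting the parameters of RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of a regression model and a decision tree.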
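
Before tuning anything, it can help to see which hyperparameters the estimator exposes. The snippet below is a minimal sketch that prints the constructor defaults through `get_params()`; the choice of which four parameters to highlight is an assumption made for illustration, not part of the assignment.

```python
from sklearn.ensemble import RandomForestClassifier

# Collect every constructor parameter and its default value.
params = RandomForestClassifier().get_params()

# A few parameters that are commonly tuned (this selection is illustrative).
for key in ("n_estimators", "max_depth", "min_samples_split", "max_features"):
    print(f"{key} = {params[key]!r}")
```

The actual default values depend on the installed scikit-learn version, so they are printed here rather than stated.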

In [1]:
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

In [2]:
# =============================================================================
# Boston data (a regression problem), using RandomForestRegressor(n_estimators=20, max_depth=4)
# =============================================================================
# Load the Boston data
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version.
boston = datasets.load_boston()

# Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)

# Build the model (no random_state is set, so the results vary from run to run)
rgr = RandomForestRegressor(n_estimators=20, max_depth=4)

# Train the model
rgr.fit(x_train, y_train)

# Predict on the test set
y_pred = rgr.predict(x_test)

mse = metrics.mean_squared_error(y_test, y_pred)
print('MSE of RandomForestRegressor(n_estimators=20, max_depth=4):', mse)
print(boston.feature_names)
print('Feature importance:', rgr.feature_importances_)

MSE of RandomForestRegressor(n_estimators=20, max_depth=4): 22.02719587770832
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
Feature importance: [2.87583426e-02 4.09257113e-04 1.43625804e-03 1.99619943e-03
 1.46145701e-02 5.74477040e-01 4.01253801e-03 5.80112004e-02
 3.75938381e-03 5.95842971e-03 1.67162772e-02 2.97482484e-03
 2.86875678e-01]


In [3]:
# =============================================================================
# Boston data (a regression problem), using RandomForestRegressor(n_estimators=40, max_depth=4)
# =============================================================================
# Build the model (no random_state is set, so the results vary from run to run)
rgr = RandomForestRegressor(n_estimators=40, max_depth=4)

# Train the model
rgr.fit(x_train, y_train)

# Predict on the test set
y_pred = rgr.predict(x_test)

mse = metrics.mean_squared_error(y_test, y_pred)
print('MSE of RandomForestRegressor(n_estimators=40, max_depth=4):', mse)
print(boston.feature_names)
print('Feature importance:', rgr.feature_importances_)

MSE of RandomForestRegressor(n_estimators=40, max_depth=4): 20.094999373704596
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
Feature importance: [4.39841078e-02 6.46155199e-06 2.85478762e-03 2.74611203e-04
 1.56360802e-02 5.60721434e-01 4.53729996e-03 4.80687359e-02
 4.10808667e-03 7.73768424e-03 1.11978235e-02 3.11763792e-03
 2.97755250e-01]
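
The two runs above report an MSE of about 22.03 with 20 trees and about 20.09 with 40 trees, but neither forest sets `random_state`, so part of that gap may be seed-to-seed noise rather than an effect of `n_estimators`. One way to check is to repeat each setting over several seeds and compare the mean and spread. The sketch below assumes `load_diabetes` as a stand-in regression dataset (the Boston loader is no longer available in recent scikit-learn releases); the helper name `mse_over_seeds` and the choice of five seeds are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Stand-in regression data (assumption: not the Boston data used above).
X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)


def mse_over_seeds(n_estimators, seeds=range(5)):
    """Return the test MSE for each forest seed at a fixed n_estimators."""
    scores = []
    for seed in seeds:
        rgr = RandomForestRegressor(
            n_estimators=n_estimators, max_depth=4, random_state=seed
        )
        rgr.fit(x_train, y_train)
        scores.append(mean_squared_error(y_test, rgr.predict(x_test)))
    return np.array(scores)


for n in (20, 40):
    scores = mse_over_seeds(n)
    print(f"n_estimators={n}: mean MSE={scores.mean():.2f}, std={scores.std():.2f}")
```

If the standard deviation across seeds is comparable to the difference between the two means, a single pair of runs is not enough to conclude that more trees helped.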


In [4]:
# =============================================================================
# Wine data (a classification problem), using RandomForestClassifier(n_estimators=20, max_depth=4)
# =============================================================================
# Load the data
wine = datasets.load_wine()

# Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.2, random_state=4)

# Build the model (no random_state is set, so the results vary from run to run)
clf = RandomForestClassifier(n_estimators=20, max_depth=4)

# Train the model
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)

acc = metrics.accuracy_score(y_test, y_pred)
print('Accuracy of RandomForestClassifier(n_estimators=20, max_depth=4):', acc)
print(wine.feature_names)
print('Feature importance:', clf.feature_importances_)

Accuracy of RandomForestClassifier(n_estimators=20, max_depth=4): 1.0
['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
Feature importance: [0.14435352 0.02145673 0.01510859 0.02846715 0.01578033 0.01448016
 0.11286491 0.01600415 0.01974931 0.1328703  0.06291675 0.14106433
 0.27488377]
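
The raw importance array is hard to read next to a separate list of names. A minimal sketch of pairing each feature name with its importance and sorting in descending order follows; it refits a forest with the same split and settings as the cell above, and `random_state=0` on the classifier is an assumption added so the ranking is repeatable.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=4
)

# random_state=0 is an assumption; the cell above leaves it unset.
clf = RandomForestClassifier(n_estimators=20, max_depth=4, random_state=0)
clf.fit(x_train, y_train)

# Pair each feature name with its importance, most important first.
ranked = sorted(
    zip(wine.feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:<30s} {importance:.4f}")
```

The importances are normalized to sum to 1, so each value can be read as that feature's share of the total impurity reduction across the forest.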


In [5]:
# =============================================================================
# Wine data (a classification problem), using RandomForestClassifier(n_estimators=40, max_depth=4)
# =============================================================================
# Build the model (no random_state is set, so the results vary from run to run)
clf = RandomForestClassifier(n_estimators=40, max_depth=4)

# Train the model
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)

acc = metrics.accuracy_score(y_test, y_pred)
print('Accuracy of RandomForestClassifier(n_estimators=40, max_depth=4):', acc)
print(wine.feature_names)
print('Feature importance:', clf.feature_importances_)

Accuracy of RandomForestClassifier(n_estimators=40, max_depth=4): 1.0
['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
Feature importance: [0.13273352 0.01798671 0.02229471 0.03794458 0.01778143 0.04040778
 0.16060548 0.00429761 0.03282099 0.18449913 0.06503431 0.11256021
 0.17103355]
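
Both wine runs reach an accuracy of 1.0 on a test split of only 20% of the data, which leaves little room to tell models apart. For the second assignment item, a rough sketch of comparing the random forest with a decision tree and a regression-style baseline under 5-fold cross-validation is given below. Treating logistic regression (with standardized inputs) as the "regression model" for this classification task is an assumption, as are `random_state=0`, `max_iter=1000`, and the dictionary names.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Candidate models; the settings mirror the notebook where possible.
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(
        n_estimators=20, max_depth=4, random_state=0
    ),
}

# Score every model on the same 5 stratified folds (accuracy by default).
cv_scores = {}
for name, model in models.items():
    cv_scores[name] = cross_val_score(model, X, y, cv=5)
    print(
        f"{name}: mean accuracy={cv_scores[name].mean():.3f}, "
        f"std={cv_scores[name].std():.3f}"
    )
```

Cross-validation uses every sample for testing exactly once, so the averaged scores are less sensitive to one lucky split than a single hold-out accuracy.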
