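## [Key Points]
Make sure you understand what each hyperparameter of the random forest model means, and observe how changing the hyperparameters affects the results.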

## Assignment

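1. Try changing the parameters of RandomForestClassifier(...) and see whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of a regression model and a decision tree.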
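
One way to approach the first task is a small sweep over two hyperparameters. The sketch below is only an illustration: the grid values and the fixed `random_state=0` on the classifier are assumptions added for reproducibility and are not part of the original notebook.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4)

# Hypothetical grid: the values are illustrative, not tuned.
results = {}
for n_estimators in (5, 20, 100):
    for max_depth in (1, 4, None):  # None lets each tree grow until its leaves are pure
        clf = RandomForestClassifier(n_estimators=n_estimators,
                                     max_depth=max_depth,
                                     random_state=0)  # assumed seed, for repeatable runs
        clf.fit(x_train, y_train)
        results[(n_estimators, max_depth)] = accuracy_score(y_test, clf.predict(x_test))

for (n_estimators, max_depth), acc in results.items():
    print(f"n_estimators={n_estimators:>3}, max_depth={max_depth}: accuracy={acc:.3f}")
```

On a dataset as small as iris the differences between settings may be slight, so a single train/test split should be read with caution.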

In [1]:
from sklearn import datasets, metrics, linear_model
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

In [2]:
iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)
clf = RandomForestClassifier(n_estimators=20, max_depth=4)
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)
print("Feature names: ", iris.feature_names)
print("Feature importance: ", clf.feature_importances_)

Accuracy:  0.9736842105263158
Feature names:  ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Feature importance:  [0.10029432 0.02828771 0.52187895 0.34953901]


In [3]:
iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)
#clf = RandomForestClassifier(n_estimators=20, max_depth=4)
# max_features="auto" was removed in scikit-learn 1.3; for classifiers it meant "sqrt"
clf = RandomForestClassifier(n_estimators=10, criterion="gini",
                             max_features="sqrt", max_depth=10,
                             min_samples_split=2, min_samples_leaf=1)
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)
print("Feature names: ", iris.feature_names)
print("Feature importance: ", clf.feature_importances_)

Accuracy:  0.9736842105263158
Feature names:  ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Feature importance:  [0.07165895 0.0271701  0.5227767  0.37839425]


In [4]:
wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.25, random_state=4)
#clf = RandomForestClassifier(n_estimators=20, max_depth=4)
# max_features="auto" was removed in scikit-learn 1.3; for classifiers it meant "sqrt"
clf = RandomForestClassifier(n_estimators=10, criterion="gini",
                             max_features="sqrt", max_depth=10,
                             min_samples_split=2, min_samples_leaf=1)
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)
print("Feature names: ", wine.feature_names)
print("Feature importance: ", clf.feature_importances_)

Accuracy:  1.0
Feature names:  ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
Feature importance:  [0.16091354 0.04121847 0.02074133 0.09129131 0.00345728 0.0823122
 0.13970795 0.01080208 0.01408674 0.18853046 0.05113456 0.03465684
 0.16114724]
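
The assignment also asks for a comparison with a decision tree, which the cells above do not include. A minimal sketch of that comparison on the wine data follows; the `DecisionTreeClassifier` settings and the `random_state=0` seeds are assumptions chosen to mirror the forest's `max_depth`, not values from the original notebook.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=4)

# Same depth limit for both models so that the ensemble is the main difference.
models = {
    "decision tree": DecisionTreeClassifier(max_depth=10, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=10, max_depth=10,
                                            random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(x_test))
    print(f"{name}: accuracy={scores[name]:.3f}")
```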


In [5]:
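### Random Forest Regression
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version
boston = datasets.load_boston()
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.25, random_state=4)
# max_features="auto" was removed in scikit-learn 1.3; for regressors it meant all features (1.0)
rfr = RandomForestRegressor(n_estimators=10,
                            max_features=1.0, max_depth=10,
                            min_samples_split=2, min_samples_leaf=1)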
rfr.fit(x_train, y_train)

# Predict on the test set
y_pred = rfr.predict(x_test)
#acc = metrics.accuracy_score(y_test, y_pred)
#print("Accuracy: ", acc)
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
print("Feature names: ", boston.feature_names)
print("Feature importance: ", rfr.feature_importances_)

Mean squared error: 14.23
Feature names:  ['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
Feature importance:  [0.06020368 0.0008854  0.01150609 0.00402928 0.01794549 0.42058793
 0.01196016 0.05715538 0.00513309 0.01193442 0.01465552 0.01025309
 0.37375048]


In [6]:
### Linear Regression
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.25, random_state=4)
regr = linear_model.LinearRegression()
regr.fit(x_train, y_train)
y_pred = regr.predict(x_test)
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

Mean squared error: 26.95
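
Because `load_boston` is no longer available in scikit-learn 1.2 and later, the same three-way regression comparison can be sketched on a different built-in dataset. The example below swaps in `load_diabetes` purely as a stand-in, so its numbers are not comparable to the Boston results above. The `DecisionTreeRegressor` settings and the `random_state=0` seeds are likewise assumptions.

```python
from sklearn import datasets, linear_model
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Stand-in dataset (assumption): diabetes replaces boston, which was removed.
diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.25, random_state=4)

models = {
    "linear regression": linear_model.LinearRegression(),
    "decision tree": DecisionTreeRegressor(max_depth=10, random_state=0),
    "random forest": RandomForestRegressor(n_estimators=10, max_depth=10,
                                           random_state=0),
}

report = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    # Store (MSE, R^2) so the models can be compared on both metrics.
    report[name] = (mean_squared_error(y_test, y_pred), r2_score(y_test, y_pred))
    print(f"{name}: MSE={report[name][0]:.2f}, R^2={report[name][1]:.3f}")
```

Reporting R² next to the mean squared error makes the comparison easier to read, since the MSE depends on the scale of the target.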
