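## [Assignment focus]
Make sure you understand what each hyperparameter of the random forest model means, and observe how changing the hyperparameters affects the results.

## Assignment

1. Try adjusting the parameters of RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results against a regression model and a decision tree.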

In [1]:
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier as RFC
from sklearn.ensemble import RandomForestRegressor as RFR
from sklearn.model_selection import train_test_split

## Q1. iris: adjusting the parameters

In [2]:
iris = datasets.load_iris()
x = iris.data
y = iris.target

In [3]:
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.2, random_state = 77)

In [5]:
# Build the model (20 trees, each with a maximum depth of 4)
model = RFC(n_estimators=20, max_depth=4)
model.fit(x_train,y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=4, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=20,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

In [6]:
y_pred = model.predict(x_test)
y_pred

array([1, 1, 2, 1, 0, 2, 2, 1, 0, 1, 0, 1, 0, 0, 0, 2, 2, 2, 0, 1, 0, 2,
       2, 1, 1, 1, 2, 0, 1, 1])

In [7]:
acc = metrics.accuracy_score(y_test,y_pred)
print("Accuracy:", acc)

Accuracy: 0.8666666666666667


In [8]:
print(iris.feature_names)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


In [10]:
print("Feature importance:", model.feature_importances_)

Feature importance: [0.09619215 0.00993961 0.44313898 0.45072925]
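The importance array is easier to read when each value is paired with its feature name. A minimal sketch of that pairing, reusing the same split as above; `random_state=0` on the forest is an assumption added here so the ranking is repeatable, and it is not part of the original cell.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=77
)

# random_state=0 is an assumption for repeatability, not in the original cell
model = RandomForestClassifier(n_estimators=20, max_depth=4, random_state=0)
model.fit(x_train, y_train)

# Pair each feature name with its importance, most important first
ranked = sorted(
    zip(iris.feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name:<20} {score:.3f}")
```

The importances sum to 1, so each value can be read as that feature's share of the total impurity reduction across the forest.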


### Adjusting the model's parameters

In [12]:
model = RFC(n_estimators = 10, max_depth = 10)
model.fit(x_train,y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=10, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=10,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

In [13]:
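# Re-predict with the retrained model; otherwise y_pred still holds the
# first model's predictions (the accuracy recorded below came from that stale y_pred)
y_pred = model.predict(x_test)
acc = metrics.accuracy_score(y_test,y_pred)
print("Accuracy:", acc)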

Accuracy: 0.8666666666666667


In [14]:
print("Feature importance:", model.feature_importances_)

Feature importance: [0.05223112 0.01780361 0.48922431 0.44074096]
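Comparing two hand-picked settings says little about how the hyperparameters behave in general. One way to observe their effect more systematically is a small sweep, sketched below. The grid values and `random_state=0` are assumptions chosen for illustration; fixing the seed means differences between rows come from the hyperparameters rather than from the forest's own randomness.

```python
from itertools import product

from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=77
)

results = {}
# Hypothetical grid: three forest sizes crossed with three depth limits
for n_estimators, max_depth in product([5, 20, 100], [2, 4, None]):
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=0
    )
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)  # predict again for every refit model
    results[(n_estimators, max_depth)] = metrics.accuracy_score(y_test, y_pred)

for (n_estimators, max_depth), acc in results.items():
    print(f"n_estimators={n_estimators:<4} max_depth={max_depth!s:<5} accuracy={acc:.3f}")
```

With only 30 test samples, each misclassified flower moves the accuracy by about 0.033, so small differences between rows should be read with caution.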


## Q2. Switch to other datasets (boston, wine) and compare against a regression model and a decision tree

### random forest

In [15]:
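# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell only runs on an older release
boston = datasets.load_boston()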
x = boston.data
y = boston.target

In [16]:
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.2, random_state = 777)

In [17]:
model = RFR()
model.fit(x_train,y_train)



RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                      max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=10,
                      n_jobs=None, oob_score=False, random_state=None,
                      verbose=0, warm_start=False)

In [18]:
y_pred = model.predict(x_test)

In [21]:
from sklearn.metrics import mean_squared_error as MSE
mse = MSE(y_test,y_pred)
print("Mean squared error:", mse)

Mean squared error: 10.78061176470588
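Because `RFR()` is created with `random_state=None`, the MSE above changes from run to run. The sketch below shows how much a 10-tree forest can vary across seeds. It uses `load_diabetes` as a stand-in dataset, which is an assumption made because `load_boston` is unavailable in current scikit-learn releases; the helper `forest_mse` is hypothetical and the numbers are not comparable to the boston results.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Stand-in dataset (assumption): diabetes instead of boston
x, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=777
)


def forest_mse(seed):
    """Fit a 10-tree forest with the given seed and return its test MSE."""
    model = RandomForestRegressor(n_estimators=10, random_state=seed)
    model.fit(x_train, y_train)
    return mean_squared_error(y_test, model.predict(x_test))


mse_by_seed = {seed: forest_mse(seed) for seed in range(5)}
for seed, mse in mse_by_seed.items():
    print(f"seed={seed} MSE={mse:.2f}")

spread = max(mse_by_seed.values()) - min(mse_by_seed.values())
print(f"spread across seeds: {spread:.2f}")
```

Fixing `random_state` makes a single result repeatable; reporting the spread over several seeds gives a fairer basis for comparing the forest with other models.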


### linear regression

In [22]:
from sklearn.linear_model import LinearRegression as LR
model_reg = LR()
model_reg.fit(x_train,y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [23]:
y_pred = model_reg.predict(x_test)

In [25]:
mse_reg = MSE(y_test,y_pred)
print("MSE_REG:", mse_reg)

MSE_REG: 24.368039930992108


### decision tree

In [27]:
from sklearn.tree import DecisionTreeRegressor as DTR
model_dec = DTR()
model_dec.fit(x_train,y_train)

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
                      max_leaf_nodes=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      presort=False, random_state=None, splitter='best')

In [29]:
y_pred = model_dec.predict(x_test)

In [30]:
mse_dec = MSE(y_test,y_pred)
print("MSE_DEC:", mse_dec)

MSE_DEC: 13.487058823529408


## Random forest gives the lowest MSE (10.78, versus 13.49 for the decision tree and 24.37 for linear regression)
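The assignment also names the wine dataset, which the cells above do not cover. Since wine is a classification task, a comparable experiment could look like the sketch below. Using logistic regression (with feature scaling) as the regression-style baseline, the seeds, and `n_estimators=100` are all assumptions made for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

x, y = load_wine(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=777
)

# All three models share the same split so their scores are comparable
models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(x_test))
    print(f"{name:<20} accuracy={scores[name]:.3f}")
```

Accuracy replaces MSE here because the target is a class label; a single 36-sample test split is small, so cross-validation would give a steadier comparison.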