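## [Assignment Focus]
Make sure you understand what each hyperparameter of the random forest model means, and observe how adjusting the hyperparameters affects the results.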

### RandomForest implementation tutorial
https://www.youtube.com/watch?v=QHOazyP-YlM

## Assignment

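1. Try adjusting the parameters of RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of regression models and decision trees.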
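For the first task, one possible way to see the effect of the hyperparameters is to cross-validate a few settings side by side. The sketch below is only an illustration: the parameter combinations in `param_grid`, the 5-fold split, and the fixed `random_state` are assumptions chosen for the example, not values prescribed by the assignment.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

# Hypothetical settings, picked only to contrast small/large forests
# and shallow/unrestricted trees.
param_grid = [
    {"n_estimators": 10, "max_depth": 2},
    {"n_estimators": 100, "max_depth": 2},
    {"n_estimators": 100, "max_depth": None},
    {"n_estimators": 100, "max_depth": None, "max_features": 1},
]

results = {}
for params in param_grid:
    # random_state is fixed so that differences come from the parameters,
    # not from the random seed.
    clf = RandomForestClassifier(random_state=0, **params)
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    results[str(params)] = (scores.mean(), scores.std())
    print(f"{params}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Differences between settings that are smaller than the reported standard deviation should probably not be read as a real improvement.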

In [1]:
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LassoCV, RidgeCV, LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error, make_scorer, accuracy_score

# Dataset: boston

In [2]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
boston = datasets.load_boston()

estimators = [
    LinearRegression(),
    RandomForestRegressor(),
]

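for etr in estimators:
    # 10-fold cross-validation. greater_is_better=False makes the scorer
    # return negated MSE, so values closer to zero are better.
    scores = cross_val_score(
        etr,
        boston.data,
        boston.target,
        cv=10,
        scoring=make_scorer(mean_squared_error, greater_is_better=False),
        n_jobs=-1,
    )

    print(f"{str(etr).split('(')[0]:<25}: {scores.mean()} +/- {scores.std()}")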

LinearRegression         : -34.70525594452492 +/- 45.57399920030865
RandomForestRegressor    : -23.308536152941173 +/- 22.95802660384197
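The cell above does not include a decision tree, which the assignment also asks for. A minimal sketch of that three-way comparison follows. Because `load_boston` is no longer available in recent scikit-learn releases, this example substitutes the bundled `load_diabetes` dataset, which is an assumption made only so that the code runs; its numbers are not comparable to the boston output above. The scores are negated back so that the printed values are ordinary (positive) MSE.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Stand-in dataset (assumption): load_diabetes replaces load_boston here.
X, y = load_diabetes(return_X_y=True)

regressors = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(random_state=0),
}

mse_by_model = {}
for name, model in regressors.items():
    # "neg_mean_squared_error" returns negated MSE; flip the sign to report MSE.
    neg_mse = cross_val_score(model, X, y, cv=10, scoring="neg_mean_squared_error")
    mse_by_model[name] = -neg_mse
    print(f"{name:<25}: {mse_by_model[name].mean():.2f} +/- {mse_by_model[name].std():.2f}")
```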


# Dataset: wine

In [3]:
wine = datasets.load_wine()

estimators = [
    LogisticRegression(),
    RandomForestClassifier(),
]

for etr in estimators:
    # 10-fold cross-validation scored by classification accuracy.
    scores = cross_val_score(
        etr,
        wine.data,
        wine.target,
        cv=10,
        scoring=make_scorer(accuracy_score, greater_is_better=True),
        n_jobs=-1,
    )

    print(f"{str(etr).split('(')[0]:<25}: {scores.mean()} +/- {scores.std()}")

LogisticRegression       : 0.9564327485380117 +/- 0.052492506317112934
RandomForestClassifier   : 0.9722222222222221 +/- 0.03726779962499651
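The wine comparison above covers logistic regression and a random forest but not a single decision tree. A sketch of how the tree could be added is shown below. The fixed `random_state=0` is an assumption added for reproducibility, so the forest's accuracy may differ slightly from the output above.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

classifiers = {
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
}

wine_scores = {}
for name, model in classifiers.items():
    # Same 10-fold accuracy protocol as the cell above.
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    wine_scores[name] = scores
    print(f"{name:<25}: {scores.mean():.4f} +/- {scores.std():.4f}")
```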
