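## [Assignment Focus]
Make sure you understand what each hyperparameter of the random forest model means, and observe how adjusting the hyperparameters affects the results.

## Assignment

1. Try adjusting the parameters in RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of the regression models and the decision tree.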
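
Before tuning anything, it can help to see which hyperparameters exist and what their defaults are. The following is a minimal sketch, assuming a reasonably recent scikit-learn; the selection of parameter names printed here is only an illustrative subset, not a recommendation of which ones matter most.

```python
from sklearn.ensemble import RandomForestClassifier

# get_params() returns every constructor argument with its current value,
# so calling it on a fresh estimator lists the defaults.
params = RandomForestClassifier().get_params()

# Illustrative subset of hyperparameters commonly adjusted for a random forest
for name in ("n_estimators", "criterion", "max_depth", "max_features",
             "min_samples_split", "min_samples_leaf", "bootstrap"):
    print(f"{name:>18} = {params[name]}")
```

Default values can differ between scikit-learn versions, so the printed values are worth checking against the installed release rather than assumed.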

In [18]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score
from sklearn.preprocessing import MinMaxScaler, LabelEncoder
from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

from io import StringIO  # sklearn.externals.six was removed in scikit-learn 0.23
from sklearn.tree import export_graphviz
import pydotplus

import warnings
warnings.filterwarnings(action='ignore')

In [33]:
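def evaluate_clf(model, dataset):
    """Fit `model` on a 75/25 split of `dataset`; return the fitted model, feature importances, and test accuracy."""
    # Fixed random_state keeps the train/test split identical across calls
    train_X, test_X, train_y, test_y = train_test_split(dataset['data'], dataset['target'], test_size=0.25, random_state=1234)
    model.fit(train_X, train_y)
    pred_y = model.predict(test_X)

    acc = accuracy_score(test_y, pred_y)
    # Impurity-based importances, labelled with the dataset's feature names
    importance = pd.Series(model.feature_importances_, index=dataset['feature_names'])

    return model, importance, acc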
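
For the second assignment item, a regression counterpart of `evaluate_clf` is needed, because accuracy does not apply to continuous targets. The sketch below is one possible shape for it. The helper name `evaluate_reg` is hypothetical, and `load_diabetes` is used as a stand-in because `load_boston` was removed in scikit-learn 1.2; both are assumptions, not part of the original notebook.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def evaluate_reg(model, dataset):
    """Hypothetical regression counterpart of evaluate_clf (not in the original)."""
    train_X, test_X, train_y, test_y = train_test_split(
        dataset['data'], dataset['target'], test_size=0.25, random_state=1234)
    model.fit(train_X, train_y)
    pred_y = model.predict(test_X)
    return model, mean_squared_error(test_y, pred_y), r2_score(test_y, pred_y)


# Assumption: load_diabetes stands in for boston (removed in scikit-learn 1.2)
reg_dataset = datasets.load_diabetes()

candidates = [
    ("LinearRegression", LinearRegression()),
    ("DecisionTreeRegressor", DecisionTreeRegressor(max_depth=8, random_state=1234)),
    ("RandomForestRegressor", RandomForestRegressor(n_estimators=50, max_depth=8, random_state=1234)),
]

results = {}
for name, model in candidates:
    _, mse, r2 = evaluate_reg(model, reg_dataset)
    results[name] = (mse, r2)
    print(f"{name:<22} MSE={mse:.1f}  R2={r2:.3f}")
```

Using the same split seed as `evaluate_clf` keeps the three models comparable on identical test rows; the hyperparameter values shown are arbitrary starting points.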

In [37]:
dataset = datasets.load_iris()
tree, importance, acc = evaluate_clf(RandomForestClassifier(), dataset)
print("acc_score={}".format(acc))
print(importance)

# Build the model (20 trees, each with a maximum depth of 4)
tree, importance, acc = evaluate_clf(RandomForestClassifier(n_estimators=20, max_depth=4), dataset)
print("acc_score={}".format(acc))
print(importance)

# Build the model (100 trees, each with a maximum depth of 8, criterion set to entropy)
tree, importance, acc = evaluate_clf(RandomForestClassifier(criterion='entropy', n_estimators=100, max_depth=8), dataset)
print("acc_score={}".format(acc))
print(importance)



acc_score=0.9473684210526315
sepal length (cm)    0.131373
sepal width (cm)     0.070450
petal length (cm)    0.337830
petal width (cm)     0.460347
dtype: float64
acc_score=0.9736842105263158
sepal length (cm)    0.116747
sepal width (cm)     0.044924
petal length (cm)    0.474878
petal width (cm)     0.363450
dtype: float64
acc_score=0.9473684210526315
sepal length (cm)    0.096222
sepal width (cm)     0.023495
petal length (cm)    0.418672
petal width (cm)     0.461611
dtype: float64
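
The three accuracies above come from a single 38-sample test split, so 0.947 versus 0.974 is a difference of one sample (36/38 versus 37/38), and the forests themselves are unseeded, so repeated runs can reorder them. A hedged way to compare the configurations more steadily is cross-validation with a fixed seed. The sketch below assumes 5 folds and `random_state=1234` for the models; neither choice is in the original.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

# The same three configurations as above
configs = {
    "default": {},
    "20 trees, depth 4": {"n_estimators": 20, "max_depth": 4},
    "100 trees, depth 8, entropy": {"criterion": "entropy", "n_estimators": 100, "max_depth": 8},
}

cv_scores = {}
for label, kwargs in configs.items():
    # Assumption: a fixed random_state makes each forest reproducible
    model = RandomForestClassifier(random_state=1234, **kwargs)
    # 5-fold cross-validated accuracy (stratified folds for classifiers)
    scores = cross_val_score(model, iris["data"], iris["target"], cv=5)
    cv_scores[label] = scores
    print(f"{label:<30} mean={scores.mean():.3f}  std={scores.std():.3f}")
```

If the means differ by less than the fold-to-fold standard deviation, the configurations are probably indistinguishable on a dataset this small.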


In [40]:
# Wine dataset
dataset = datasets.load_wine()

tree, importance, acc = evaluate_clf(DecisionTreeClassifier(criterion='entropy', max_depth=8), dataset)
print("acc_score={}".format(acc))
print(importance)

tree, importance, acc = evaluate_clf(RandomForestClassifier(criterion='entropy', n_estimators=50, max_depth=8), dataset)
print("acc_score={}".format(acc))
print(importance)

acc_score=0.8888888888888888
alcohol                         0.000000
malic_acid                      0.000000
ash                             0.054169
alcalinity_of_ash               0.000000
magnesium                       0.000000
total_phenols                   0.000000
flavanoids                      0.427651
nonflavanoid_phenols            0.000000
proanthocyanins                 0.000000
color_intensity                 0.338127
hue                             0.000000
od280/od315_of_diluted_wines    0.000000
proline                         0.180053
dtype: float64
acc_score=0.9555555555555556
alcohol                         0.127823
malic_acid                      0.026184
ash                             0.005208
alcalinity_of_ash               0.026613
magnesium                       0.019640
total_phenols                   0.048274
flavanoids                      0.148484
nonflavanoid_phenols            0.005506
proanthocyanins                 0.022899
color_intensity          