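## [Assignment Focus]
Make sure you understand what each hyperparameter of the random forest model means, and observe how adjusting the hyperparameters affects the results.

## Assignment

1. Try adjusting the parameters in RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of a regression model and a decision tree.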
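
As a starting point for the first task, the snippet below lists a few hyperparameters that are commonly tuned in a random forest and prints the defaults of the installed scikit-learn version. The choice of which parameters to highlight and the short descriptions are this sketch's own summary, not part of the original notebook.

```python
from sklearn.ensemble import RandomForestClassifier

# Commonly tuned hyperparameters with a one-line description each.
# The selection is illustrative; get_params() returns the full list.
key_params = {
    "n_estimators": "number of trees in the forest",
    "max_depth": "maximum depth of each tree (None lets trees grow fully)",
    "min_samples_split": "minimum samples needed to split an internal node",
    "min_samples_leaf": "minimum samples required at a leaf node",
    "max_features": "number of features considered at each split",
    "bootstrap": "whether each tree is trained on a bootstrap sample",
}

# Defaults depend on the installed scikit-learn version, so print them
# instead of hard-coding values.
defaults = RandomForestClassifier().get_params()
for name, meaning in key_params.items():
    print(f"{name} = {defaults[name]!r}: {meaning}")
```

Printing the defaults first makes it easier to see which values a later experiment actually changes.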

In [1]:
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

## boston

In [2]:
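# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell only runs on an older scikit-learn release.
boston = datasets.load_boston()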
print(f'boston dir: {dir(boston)}')
print(f'boston feature_names: {boston.feature_names}')
print(f'boston data shape: {boston.data.shape}')
print(f'boston DESCR: {boston.DESCR}')

boston dir: ['DESCR', 'data', 'feature_names', 'filename', 'target']
boston feature_names: ['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
boston data shape: (506, 13)
boston DESCR: .. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior t

In [3]:
# Hold out 20% of the data for testing; random_state fixes the split.
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)
# Default hyperparameters; no random_state is set on the forest, so results vary slightly between runs.
rf_regr = RandomForestRegressor()
rf_regr.fit(x_train, y_train)
y_pred = rf_regr.predict(x_test)
print(f'Mean squared error: {metrics.mean_squared_error(y_test, y_pred)}')
print("==feature_importances==")
# List features from most to least important.
for feature, importance in sorted(zip(boston.feature_names, rf_regr.feature_importances_), key=lambda k: k[1], reverse=True):
    print(f'{feature}: {importance}')

Mean squared error: 18.198653921568628
==feature_importances==
RM: 0.4439174345904214
LSTAT: 0.34927546799041326
DIS: 0.055044810457369704
CRIM: 0.04525828659575232
TAX: 0.019129703208438164
AGE: 0.01857408746282384
PTRATIO: 0.01847303078832145
NOX: 0.018441678620994537
B: 0.01516616507413998
RAD: 0.009146033042627845
INDUS: 0.004897450851238195
ZN: 0.0018398101297910578
CHAS: 0.0008360411876683621
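
The second task asks for a comparison with a regression model and a decision tree. The sketch below is one way to do that: it fits the three models on the same split and reports test MSE. The helper `compare_regressors` is a hypothetical name introduced here. Because `load_boston` is unavailable in recent scikit-learn releases, the demo call uses `load_diabetes` purely as a stand-in, so its numbers are not comparable to the boston output above.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_regressors(X, y, seed=4):
    """Fit three regressors on one train/test split and return test MSE per model."""
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        "decision_tree": DecisionTreeRegressor(random_state=seed),
        "random_forest": RandomForestRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        scores[name] = mean_squared_error(y_test, model.predict(x_test))
    return scores


# Stand-in dataset (assumption): swap in boston.data / boston.target
# where load_boston is still available.
diabetes = datasets.load_diabetes()
mse_by_model = compare_regressors(diabetes.data, diabetes.target)
for name, mse in sorted(mse_by_model.items(), key=lambda kv: kv[1]):
    print(f"{name}: MSE = {mse:.2f}")
```

Using one shared split and a fixed seed keeps the comparison between models fair and repeatable.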




## wine

In [4]:
wine = datasets.load_wine()
print(f'wine dir: {dir(wine)}')
print(f'wine feature_names: {wine.feature_names}')
print(f'wine data shape: {wine.data.shape}')
print(f'wine DESCR: {wine.DESCR}')

wine dir: ['DESCR', 'data', 'feature_names', 'target', 'target_names']
wine feature_names: ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
wine data shape: (178, 13)
wine DESCR: .. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- Alcohol
 		- Malic acid
 		- Ash
		- Alcalinity of ash  
 		- Magnesium
		- Total phenols
 		- Flavanoids
 		- Nonflavanoid phenols
 		- Proanthocyanins
		- Color intensity
 		- Hue
 		- OD280/OD315 of diluted wines
 		- Proline

    - class:
            - class_0
            - class_1
            - class_2
		
    :Summary Statistics:
    
                                   Min   Max   Me

In [5]:
# Hold out 20% of the data for testing; random_state fixes the split.
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.2, random_state=4)
# Default hyperparameters; no random_state is set on the forest, so results vary slightly between runs.
rf_class = RandomForestClassifier()
rf_class.fit(x_train, y_train)
y_pred = rf_class.predict(x_test)
print(f'Accuracy: {metrics.accuracy_score(y_test, y_pred)}')
print("==feature_importances==")
# List features from most to least important.
for feature, importance in sorted(zip(wine.feature_names, rf_class.feature_importances_), key=lambda k: k[1], reverse=True):
    print(f'{feature}: {importance}')

Accuracy: 1.0
==feature_importances==
proline: 0.23755060237954626
flavanoids: 0.20219505908948412
color_intensity: 0.129249397363865
od280/od315_of_diluted_wines: 0.10071641310075241
alcohol: 0.08225955705235583
hue: 0.056145177506116424
malic_acid: 0.04903617708392796
magnesium: 0.045478821872191544
total_phenols: 0.0439169079340685
ash: 0.022409301858684717
proanthocyanins: 0.016837393597571967
alcalinity_of_ash: 0.012065025373944547
nonflavanoid_phenols: 0.00214016578749058
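
A perfect score on a single 20% hold-out of 178 samples makes it hard to see any effect from changing hyperparameters. The sketch below covers both assignment tasks on the wine data using 5-fold cross-validation instead. The specific hyperparameter values, the use of cross-validation, and the choice of a scaled logistic regression as the "regression model" baseline are all assumptions of this example, not part of the original notebook.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
X, y = wine.data, wine.target

# A few random forest settings plus two baselines (illustrative values).
candidates = {
    "rf: n_estimators=5, max_depth=1": RandomForestClassifier(
        n_estimators=5, max_depth=1, random_state=4
    ),
    "rf: n_estimators=20, max_depth=3": RandomForestClassifier(
        n_estimators=20, max_depth=3, random_state=4
    ),
    "rf: defaults": RandomForestClassifier(random_state=4),
    "decision tree": DecisionTreeClassifier(random_state=4),
    # Scaling first helps logistic regression converge on these features.
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
}

cv_accuracy = {}
for label, model in candidates.items():
    # For classifiers, an integer cv uses stratified folds.
    scores = cross_val_score(model, X, y, cv=5)
    cv_accuracy[label] = scores.mean()
    print(f"{label}: mean accuracy = {scores.mean():.3f} (std {scores.std():.3f})")
```

Averaging over folds gives a steadier estimate than one split, which makes differences between settings easier to interpret.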


