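## [Assignment Focus]
By now you should have a clear picture of what the data in a dataset looks like, including the features and the labels. Remember that in any future project, the data must be cleaned into this same format before it can be fed into a model for training.
Today's assignment introduces the decision tree, a very important model. Make sure you understand what each of its hyperparameters means, and try adjusting them to see how they affect the final predictions.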
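
As a quick illustration of that format, the sketch below checks that the features form a 2-D array of shape `(n_samples, n_features)` and that the labels form a 1-D array with one entry per sample. It uses `load_wine` as the example dataset; the helper name `check_xy_format` is hypothetical and not part of scikit-learn.

```python
import numpy as np
from sklearn import datasets


def check_xy_format(X, y):
    """Verify the (features, labels) layout and return (n_samples, n_features).

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim != 2:
        raise ValueError(f"features must be 2-D, got {X.ndim}-D")
    if y.ndim != 1 or y.shape[0] != X.shape[0]:
        raise ValueError("labels must be 1-D with one entry per sample")
    return X.shape


wine = datasets.load_wine()
n_samples, n_features = check_xy_format(wine.data, wine.target)
print(f"samples: {n_samples}, features: {n_features}")
```

Any dataset that passes a check like this can be handed to `fit(X, y)` on the models used below.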

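## Assignment

1. Try adjusting the parameters of DecisionTreeClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of a regression model.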
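
One possible way to approach the first task is to loop over a few hyperparameter settings and record the tree depth and test accuracy for each. This is a minimal sketch, not a prescribed solution: the specific parameter values are illustrative assumptions, and `random_state=0` is added only so that repeated runs give the same tree.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=4
)

# Candidate settings; these particular values are illustrative assumptions.
param_sets = [
    {"criterion": "gini", "max_depth": None},
    {"criterion": "entropy", "max_depth": None},
    {"criterion": "gini", "max_depth": 2},
    {"criterion": "gini", "max_depth": 3, "min_samples_leaf": 5},
]

results = []
for params in param_sets:
    # random_state fixes the tie-breaking between equally good splits
    clf = DecisionTreeClassifier(random_state=0, **params)
    clf.fit(x_train, y_train)
    acc = metrics.accuracy_score(y_test, clf.predict(x_test))
    results.append((params, clf.get_depth(), acc))
    print(f"{params} -> depth={clf.get_depth()}, accuracy={acc:.3f}")
```

Limiting `max_depth` or raising `min_samples_leaf` constrains how far the tree can grow, which usually trades some training fit for less overfitting.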

In [1]:
from sklearn import datasets, metrics
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split



## boston

In [8]:
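# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires an older scikit-learn version.
boston = datasets.load_boston()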
print(f'boston dir: {dir(boston)}')
print(f'boston feature_names: {boston.feature_names}')
print(f'boston data shape: {boston.data.shape}')
print(f'boston DESCR: {boston.DESCR}')


boston dir: ['DESCR', 'data', 'feature_names', 'filename', 'target']
boston feature_names: ['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
boston data shape: (506, 13)
boston DESCR: .. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior t

In [22]:
# Hold out 20% of the data for testing; random_state fixes the split
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)

# Build the regressor with default hyperparameters
dt_regr = DecisionTreeRegressor()

# Train the model on the training set
dt_regr.fit(x_train, y_train)

# Predict on the test set
y_pred = dt_regr.predict(x_test)

print(f'Mean squared error: {metrics.mean_squared_error(y_test, y_pred)}')

# List the features from most to least important
print("==feature_importances==")
for feature, importance in sorted(zip(boston.feature_names, dt_regr.feature_importances_), key=lambda k: k[1], reverse=True):
    print(f'{feature}: {importance}')


Mean squared error: 25.46892156862745
==feature_importances==
RM: 0.5994589293054029
LSTAT: 0.21013993116474672
DIS: 0.052456819956077566
CRIM: 0.04461495690250715
NOX: 0.03259389793323203
PTRATIO: 0.020795293668248394
TAX: 0.0175119203754523
AGE: 0.009150914350995206
B: 0.007111100855331431
INDUS: 0.00408916283223124
RAD: 0.0011875287306957005
ZN: 0.0006468231497093362
CHAS: 0.0002427207753700124
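
To compare the tree with a regression model, as the second task asks, one option is to fit both models on the same split and report the test MSE of each. The sketch below is a hedged illustration: the helper `compare_regressors` is hypothetical, and because `load_boston` is no longer available in recent scikit-learn releases, it is demonstrated on `load_diabetes` as a stand-in dataset. With an older scikit-learn, `boston.data` and `boston.target` could be passed in instead.

```python
from sklearn import datasets, metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_regressors(X, y, test_size=0.2, random_state=4):
    """Fit a linear model and a decision tree on one split; return test MSE for each.

    Hypothetical helper for illustration only.
    """
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    models = {
        "linear_regression": LinearRegression(),
        "decision_tree": DecisionTreeRegressor(random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        scores[name] = metrics.mean_squared_error(y_test, model.predict(x_test))
    return scores


# Stand-in dataset (assumption): load_diabetes ships with scikit-learn.
diabetes = datasets.load_diabetes()
scores = compare_regressors(diabetes.data, diabetes.target)
for name, mse in scores.items():
    print(f"{name}: MSE = {mse:.2f}")
```

Using the same `random_state` for the split keeps the comparison fair, since both models are evaluated on identical test rows.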


## wine

In [23]:
wine = datasets.load_wine()
print(f'wine dir: {dir(wine)}')
print(f'wine feature_names: {wine.feature_names}')
print(f'wine data shape: {wine.data.shape}')
print(f'wine DESCR: {wine.DESCR}')


wine dir: ['DESCR', 'data', 'feature_names', 'target', 'target_names']
wine feature_names: ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
wine data shape: (178, 13)
wine DESCR: .. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- Alcohol
 		- Malic acid
 		- Ash
		- Alcalinity of ash  
 		- Magnesium
		- Total phenols
 		- Flavanoids
 		- Nonflavanoid phenols
 		- Proanthocyanins
		- Color intensity
 		- Hue
 		- OD280/OD315 of diluted wines
 		- Proline

    - class:
            - class_0
            - class_1
            - class_2
		
    :Summary Statistics:
    
                                   Min   Max   Me

In [30]:
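# Hold out 20% of the data for testing; random_state fixes the split
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.2, random_state=4)

# Build the classifier with default hyperparameters
dt_class = DecisionTreeClassifier()

# Train the model on the training set
dt_class.fit(x_train, y_train)

# Predict on the test set
y_pred = dt_class.predict(x_test)

print(f'Accuracy: {metrics.accuracy_score(y_test, y_pred)}')

# List the features from most to least important
print("==feature_importances==")
for feature, importance in sorted(zip(wine.feature_names, dt_class.feature_importances_), key=lambda k: k[1], reverse=True):
    print(f'{feature}: {importance}')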

Accuracy: 0.9166666666666666
==feature_importances==
proline: 0.40405356638087775
color_intensity: 0.3609487914436515
od280/od315_of_diluted_wines: 0.11762342303451415
hue: 0.05968764232579323
alcohol: 0.04151836670433906
flavanoids: 0.016168210110824352
malic_acid: 0.0
ash: 0.0
alcalinity_of_ash: 0.0
magnesium: 0.0
total_phenols: 0.0
nonflavanoid_phenols: 0.0
proanthocyanins: 0.0
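
For the classification side of the comparison, a similar sketch can put the decision tree next to logistic regression on the same wine split. The choice of logistic regression, the `StandardScaler` step, and `max_iter=1000` are assumptions made here for illustration; the original notebook does not specify which model to compare against.

```python
from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=4
)

models = {
    # Scaling helps logistic regression converge; trees do not need it.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    accuracies[name] = metrics.accuracy_score(y_test, model.predict(x_test))
    print(f"{name}: accuracy = {accuracies[name]:.3f}")
```

With only 178 samples, a single 80/20 split leaves a small test set, so differences of one or two misclassified samples should be read with caution.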
