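## [Key points]
Make sure you understand what each hyperparameter of the random forest model means, and observe how tuning the hyperparameters affects the results.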

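## Homework

1. Try adjusting the parameters in RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of the regression model and the decision tree.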
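For the regression half of item 2, note that `load_boston` was removed in scikit-learn 1.2, so the cell below is only a sketch that assumes `load_diabetes` as a stand-in regression dataset. The dataset choice, the `regressors` dictionary, and the use of mean squared error are illustrative assumptions, not part of the original homework.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: load_diabetes stands in for boston, which newer
# scikit-learn releases no longer ship.
diabetes = datasets.load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.25, random_state=0)

# Hypothetical set of models to compare: linear regression, a single tree, a forest.
regressors = {
    'LinearRegression': LinearRegression(),
    'DecisionTreeRegressor': DecisionTreeRegressor(random_state=0),
    'RandomForestRegressor': RandomForestRegressor(n_estimators=100, random_state=0),
}

mse = {}
for name, reg in regressors.items():
    reg.fit(X_train, y_train)
    # Lower mean squared error on the held-out set is better.
    mse[name] = mean_squared_error(y_test, reg.predict(X_test))
    print(name, "MSE:", mse[name])
```

Comparing the three errors side by side mirrors the classifier comparison done for wine in Homework 2.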

# Homework 1

### Importing the libraries

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

### Importing the dataset

In [2]:
from sklearn import datasets

iris = datasets.load_iris()
feature_names = iris.feature_names
df = pd.DataFrame(iris.data, columns=feature_names)
df['Target'] = iris.target
df.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | Target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [3]:
X = df[feature_names]
y = df['Target']

### Splitting the dataset into the Training set and Test set

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=4)

### Fitting Random Forest to the Training set

In [19]:
from sklearn.ensemble import RandomForestClassifier
forest_clf = RandomForestClassifier(n_estimators=100, criterion='entropy')
forest_clf.fit(X_train, y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
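The repr above lists every hyperparameter the forest accepts. As one small illustration of what two of them do together, the sketch below turns on `oob_score`, which relies on `bootstrap=True` (the default) to score each tree on the samples it did not see during training. The fixed `random_state=0` and the variable name `oob_clf` are assumptions added for reproducibility; they are not in the original cell.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4)

# oob_score=True asks the forest to estimate accuracy from out-of-bag samples,
# i.e. training rows left out of each tree's bootstrap sample.
oob_clf = RandomForestClassifier(n_estimators=100, criterion='entropy',
                                 oob_score=True, random_state=0)
oob_clf.fit(X_train, y_train)

print("OOB accuracy: ", oob_clf.oob_score_)
print("Test accuracy:", oob_clf.score(X_test, y_test))
```

The out-of-bag estimate needs no separate validation split, which can be handy on a dataset as small as iris.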

### Predicting the Test set results

In [20]:
y_pred = forest_clf.predict(X_test)

In [21]:
from sklearn import metrics

acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

Accuracy:  0.9736842105263158
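A single accuracy number does not show which classes get confused with each other. The following is a minimal sketch, assuming the same train/test split as above and a fixed `random_state=0` for the forest (an addition for reproducibility), that prints a confusion matrix and a per-class report.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4)

clf = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
labels = [0, 1, 2]
cm = metrics.confusion_matrix(y_test, y_pred, labels=labels)
print(cm)

# Precision, recall and F1 for each iris species.
print(metrics.classification_report(
    y_test, y_pred, labels=labels, target_names=iris.target_names))
```

Off-diagonal entries in the matrix correspond to the misclassified test samples.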


In [22]:
pd.DataFrame(forest_clf.feature_importances_, index=feature_names, columns=['feature_importances'])

| | feature_importances |
|---|---|
| sepal length (cm) | 0.099906 |
| sepal width (cm) | 0.021217 |
| petal length (cm) | 0.473117 |
| petal width (cm) | 0.40576 |
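Item 1 of the homework asks how the results change when the parameters change, while the cells above fit only one configuration. One possible way to explore this is sketched below: a small grid over `n_estimators` and `max_depth`, recording train and test accuracy for each combination. The grid values, the fixed `random_state=0`, and the `results` / `summary` names are illustrative assumptions.

```python
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4)

results = []
for n_estimators in (1, 10, 100):      # number of trees in the forest
    for max_depth in (1, 3, None):     # None lets each tree grow until its leaves are pure
        clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth,
                                     criterion='entropy', random_state=0)
        clf.fit(X_train, y_train)
        results.append({
            'n_estimators': n_estimators,
            'max_depth': str(max_depth),  # stored as text so None stays readable
            'train_acc': clf.score(X_train, y_train),
            'test_acc': clf.score(X_test, y_test),
        })

summary = pd.DataFrame(results)
print(summary)
```

A large gap between `train_acc` and `test_acc` for a given row is a hint that the configuration overfits; on a dataset as small as iris the differences may be slight.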


# Homework 2

## Wine dataset comparison: LogisticRegression

In [23]:
# Importing the dataset
wine = datasets.load_wine()
feature_names = wine.feature_names
df = pd.DataFrame(wine.data, columns=feature_names)
df['Target'] = wine.target

X = df[feature_names]
y = df['Target']

In [24]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting LogisticRegression to the Training set
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression(solver='liblinear')
logreg.fit(X_train, y_train)

# Predicting the Test set results
from sklearn import metrics
y_pred = logreg.predict(X_test)
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

Accuracy:  1.0


## Wine dataset comparison: Decision Tree

In [25]:
# Importing the dataset
wine = datasets.load_wine()
feature_names = wine.feature_names
df = pd.DataFrame(wine.data, columns=feature_names)
df['Target'] = wine.target

X = df[feature_names]
y = df['Target']

In [26]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting Decision Tree to the Training set
from sklearn.tree import DecisionTreeClassifier
tree_clf = DecisionTreeClassifier(criterion='entropy', random_state=0)
tree_clf.fit(X_train, y_train)

# Predicting the Test set results
from sklearn import metrics
y_pred = tree_clf.predict(X_test)
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

pd.DataFrame(tree_clf.feature_importances_, index=feature_names, columns=['feature_importances'])

Accuracy:  0.9333333333333333


| | feature_importances |
|---|---|
| alcohol | 0.0 |
| malic_acid | 0.018835 |
| ash | 0.0 |
| alcalinity_of_ash | 0.0 |
| magnesium | 0.0 |
| total_phenols | 0.0 |
| flavanoids | 0.439494 |
| nonflavanoid_phenols | 0.0 |
| proanthocyanins | 0.0 |
| color_intensity | 0.236317 |


## Wine dataset comparison: Random Forest

In [28]:
# Importing the dataset
wine = datasets.load_wine()
feature_names = wine.feature_names
df = pd.DataFrame(wine.data, columns=feature_names)
df['Target'] = wine.target

X = df[feature_names]
y = df['Target']

In [29]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting Random Forest to the Training set
from sklearn.ensemble import RandomForestClassifier
forest_clf = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=0)
forest_clf.fit(X_train, y_train)

# Predicting the Test set results
from sklearn import metrics
y_pred = forest_clf.predict(X_test)
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

pd.DataFrame(forest_clf.feature_importances_, index=feature_names, columns=['feature_importances'])

Accuracy:  0.9777777777777777


| | feature_importances |
|---|---|
| alcohol | 0.096439 |
| malic_acid | 0.022573 |
| ash | 0.01096 |
| alcalinity_of_ash | 0.036753 |
| magnesium | 0.022013 |
| total_phenols | 0.057803 |
| flavanoids | 0.200652 |
| nonflavanoid_phenols | 0.013617 |
| proanthocyanins | 0.02072 |
| color_intensity | 0.169951 |
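The three wine accuracies above each come from one 45-sample test split, so the ranking between models could shift with a different `random_state`. As a rough cross-check, the sketch below scores the same three model families with 5-fold cross-validation. It assumes the default `LogisticRegression` solver rather than `liblinear`, scales inside a pipeline so that each fold fits its own scaler, and leaves the tree models unscaled because tree splits do not depend on feature scale. The `models` and `cv_scores` names are illustrative.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
X, y = wine.data, wine.target

models = {
    # The scaler is part of the pipeline, so it is refit on each training fold.
    'LogisticRegression': make_pipeline(StandardScaler(), LogisticRegression()),
    'DecisionTree': DecisionTreeClassifier(criterion='entropy', random_state=0),
    'RandomForest': RandomForestClassifier(n_estimators=100, criterion='entropy',
                                           random_state=0),
}

cv_scores = {}
for name, model in models.items():
    # For classifiers, cv=5 uses stratified folds, keeping class proportions similar.
    scores = cross_val_score(model, X, y, cv=5)
    cv_scores[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the mean together with the spread across folds gives a steadier basis for comparing the models than a single split.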
