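## [Key Points]
Make sure you understand what each hyperparameter of the random forest model means, and observe how adjusting the hyperparameters affects the results.

## Assignment

1. Try adjusting the parameters of RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results against a regression model and a decision tree.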

In [1]:
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

In [2]:
# Load the iris dataset
iris = datasets.load_iris()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)

# Build the model (20 trees, each with a maximum depth of 4)
clf = RandomForestClassifier(n_estimators=20, max_depth=4)

# Train the model
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)

# Accuracy
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

Accuracy:  0.9736842105263158
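
To confirm that `n_estimators=20` and `max_depth=4` take effect as the comment describes, the fitted forest can be inspected directly. The following is a minimal sketch; `random_state=0` on the classifier is an assumption added here for reproducibility and is not part of the original cell.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4
)

# random_state=0 is an assumption, added so repeated runs build the same forest
clf = RandomForestClassifier(n_estimators=20, max_depth=4, random_state=0)
clf.fit(x_train, y_train)

# estimators_ holds one fitted decision tree per requested tree
tree_depths = [tree.get_depth() for tree in clf.estimators_]
print("Number of trees:", len(clf.estimators_))
print("Deepest tree:", max(tree_depths))

# Impurity-based feature importances, averaged over all trees
importances = dict(zip(iris.feature_names, clf.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

Changing `max_depth` or `n_estimators` and re-running this sketch shows the effect of each hyperparameter on the structure of the forest, not only on the final accuracy.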


In [5]:
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

In [6]:
# n_estimators: the number of decision trees in the forest

n_values = [3, 5, 10, 15, 35, 60]
accuracy = {}
for n in n_values:
    rf = RandomForestClassifier(n_estimators=n)
    rf.fit(x_train, y_train)
    y_pred = rf.predict(x_test)
    accuracy[n] = metrics.accuracy_score(y_test, y_pred)
    print(f'when n={n} ACC={accuracy[n]:.3f}')
accuracy_df = pd.DataFrame(data=accuracy, index=[0])
accuracy_df

when n=3 ACC=0.974
when n=5 ACC=0.974
when n=10 ACC=0.974
when n=15 ACC=0.974
when n=35 ACC=0.974
when n=60 ACC=0.974


| | 3 | 5 | 10 | 15 | 35 | 60 |
|---|---|---|---|---|---|---|
| 0 | 0.973684 | 0.973684 | 0.973684 | 0.973684 | 0.973684 | 0.973684 |
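
Every setting scores the same on this split because the test set holds only 38 samples, and 0.974 means 37 of them are classified correctly. A cross-validated sweep may separate the settings more clearly. The sketch below assumes 5-fold cross-validation and `random_state=0`, neither of which appears in the original notebook.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

cv_accuracy = {}
for n_trees in [3, 5, 10, 15, 35, 60]:
    # random_state=0 is an assumption so that only n_estimators varies between runs
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    # cv=5 is an assumption: each sample is used for testing exactly once
    scores = cross_val_score(rf, iris.data, iris.target, cv=5)
    cv_accuracy[n_trees] = scores.mean()
    print(f"n_estimators={n_trees:>2}  mean CV accuracy={cv_accuracy[n_trees]:.3f}")
```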


In [7]:
# criterion (default="gini"): the impurity measure used to choose splits
criteria = ['gini', 'entropy']
accuracy = {}
for criterion in criteria:
    rf = RandomForestClassifier(n_estimators=5, criterion=criterion)
    rf.fit(x_train, y_train)
    y_pred = rf.predict(x_test)
    accuracy[criterion] = metrics.accuracy_score(y_test, y_pred)
    print(f'when criterion={criterion} ACC={accuracy[criterion]:.3f}')
accuracy_df = pd.DataFrame(data=accuracy, index=[0])
accuracy_df

when criterion=gini ACC=0.974
when criterion=entropy ACC=0.974


| | gini | entropy |
|---|---|---|
| 0 | 0.973684 | 0.973684 |
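
The two criteria are different ways of measuring how mixed the classes in a node are. As a rough illustration, both can be computed by hand from the class counts of a node. The helper functions `gini` and `entropy` below are hypothetical names written for this sketch (they are not scikit-learn functions), and a base-2 logarithm is assumed for the entropy.

```python
import numpy as np

def gini(counts):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

def entropy(counts):
    """Entropy: sum(p_k * log2(1 / p_k)), skipping empty classes."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]
    return np.sum(p * np.log2(1.0 / p))

# Class counts for three hypothetical nodes: fully mixed, mostly pure, pure
for counts in ([50, 50, 50], [45, 5, 0], [50, 0, 0]):
    print(counts, f"gini={gini(counts):.3f}", f"entropy={entropy(counts):.3f}")
```

Both measures are zero for a pure node and largest for an evenly mixed one, so they tend to rank candidate splits similarly, which is consistent with the identical accuracies recorded above.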


In [8]:
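# bootstrap (default=True)
# bootstrap expects a bool; the strings 'True' and 'False' are both truthy and do not switch it off
bootstrap_options = [True, False]
accuracy = {}
for use_bootstrap in bootstrap_options:
    rf = RandomForestClassifier(n_estimators=5, bootstrap=use_bootstrap)
    rf.fit(x_train, y_train)
    y_pred = rf.predict(x_test)
    accuracy[use_bootstrap] = metrics.accuracy_score(y_test, y_pred)
    print(f'when bootstrap={use_bootstrap} ACC={accuracy[use_bootstrap]:.3f}')
accuracy_df = pd.DataFrame(data=accuracy, index=[0])
accuracy_df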

when bootstrap=True ACC=0.947
when bootstrap=False ACC=0.974


| | True | False |
|---|---|---|
| 0 | 0.947368 | 0.973684 |
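
With `bootstrap=True`, each tree is trained on a sample drawn with replacement from the training set, so some training rows are left out of each tree's sample. Those out-of-bag rows can serve as a built-in validation set through `oob_score=True`. The sketch below assumes `n_estimators=50` (more trees than the cell above, so that each training row is very likely to be out-of-bag for at least one tree) and `random_state=0`. Both values are choices made for this illustration.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4
)

test_accuracy = {}
oob_accuracy = None
for use_bootstrap in [True, False]:
    rf = RandomForestClassifier(
        n_estimators=50,          # assumption: enough trees for a stable OOB estimate
        bootstrap=use_bootstrap,
        oob_score=use_bootstrap,  # OOB scoring is only available with bootstrap=True
        random_state=0,           # assumption: fixed seed for reproducibility
    )
    rf.fit(x_train, y_train)
    test_accuracy[use_bootstrap] = rf.score(x_test, y_test)
    if use_bootstrap:
        oob_accuracy = rf.oob_score_
    print(f"bootstrap={use_bootstrap}  test accuracy={test_accuracy[use_bootstrap]:.3f}")

print(f"Out-of-bag accuracy (bootstrap=True): {oob_accuracy:.3f}")
```

Because the seed is fixed, any difference between the two settings in this sketch comes from the `bootstrap` flag alone and not from run-to-run randomness.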


In [9]:
# Switch to other datasets (boston, wine)

In [10]:
wine = datasets.load_wine()
df = pd.DataFrame(wine.data , columns=wine.feature_names)
df.head(2)

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050.0 |


In [11]:
X = wine.data  # the wine features live in wine.data
Y = wine.target
X_train,X_test,Y_train,Y_test = train_test_split(X ,Y, test_size=0.25 ,random_state=4)

In [12]:
# RandomForestClassifier
rf = RandomForestClassifier()
rf.fit(X_train, Y_train)
Y_pred = rf.predict(X_test)
print(f'Accuracy = {metrics.accuracy_score(Y_test, Y_pred):.3f}')
# MSE treats the class labels as numbers; it is printed only as a rough companion to accuracy
print(f'MSE = {mean_squared_error(Y_test, Y_pred):.3f}')

Accuracy = 0.978
MSE = 0.022




In [14]:
# DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier()
dt.fit(X_train, Y_train)
Y_pred = dt.predict(X_test)
print(f'Accuracy = {metrics.accuracy_score(Y_test, Y_pred):.3f}')
print(f'MSE = {mean_squared_error(Y_test, Y_pred):.3f}')

Accuracy = 0.889
MSE = 0.111
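
The assignment also asks for a comparison with a regression model, which the cells above do not include. One possible sketch follows, assuming that logistic regression stands in for the "regression model" and that the features are standardized first. The 5-fold cross-validation and `random_state=0` are further assumptions made for this illustration.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()

models = {
    # Assumption: logistic regression (with scaling) plays the role of the regression model
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}

mean_accuracy = {}
for name, model in models.items():
    # Same folds for every model, so the mean accuracies are comparable
    scores = cross_val_score(model, wine.data, wine.target, cv=5)
    mean_accuracy[name] = scores.mean()
    print(f"{name:<20} mean CV accuracy={mean_accuracy[name]:.3f}")
```

Averaging over folds reduces the influence of any single 45-sample test split, which makes the comparison between the three models less sensitive to which rows happen to land in the test set.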
