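## Homework

1. Try adjusting the parameters of RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the random forest results against a regression model and a decision tree.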

In [139]:
from sklearn import datasets, metrics, linear_model
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

In [133]:
# Load the iris dataset
iris = datasets.load_iris()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)

# Build the model
clf = RandomForestClassifier(n_estimators=100, max_depth=3)

# Train the model
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)

In [134]:
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

Accuracy:  0.9736842105263158
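
For task 1, one way to see whether the parameters matter is to loop over a few settings on the same split. The sketch below is only an illustration: the grid values and the `random_state=0` passed to the forest are assumptions added for repeatability, not settings from the cells above.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4
)

# Hypothetical grid, chosen only for illustration
results = {}
for n_estimators in (10, 100):
    for max_depth in (1, 3, None):
        clf = RandomForestClassifier(
            n_estimators=n_estimators,
            max_depth=max_depth,
            random_state=0,  # assumption: fixed seed so reruns give the same forest
        )
        clf.fit(x_train, y_train)
        results[(n_estimators, max_depth)] = accuracy_score(y_test, clf.predict(x_test))

# One line per (n_estimators, max_depth) setting
for params, acc in results.items():
    print(params, round(acc, 4))
```

With only 38 test samples, a single misclassification moves the accuracy by about 0.026, so small differences between settings should be read with caution.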


In [135]:
print(iris.feature_names)
print("Feature importance: ", clf.feature_importances_)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Feature importance:  [0.11549992 0.02248538 0.39401855 0.46799615]
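
The importance array is easier to read when each value is paired with its feature name and sorted. A minimal sketch, assuming the same split and forest settings as above plus a fixed `random_state=0` (an addition for repeatability, so the exact numbers can differ from the output shown):

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4
)

clf = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0)
clf.fit(x_train, y_train)

# Pair each feature name with its importance, largest first
ranked = sorted(
    zip(iris.feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:20s} {importance:.3f}")
```

The importances are normalized to sum to 1, so each value can be read as a share of the total impurity reduction across the forest.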


In [136]:
wine = datasets.load_wine()
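# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this line only runs on older versions.
boston = datasets.load_boston()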

## Wine_LogisticRegression

In [140]:
data = wine.data
target = wine.target
print(data.shape)
print(data, '\n')
print(target.shape)
print(target, '\n')
print(wine.feature_names, '\n')

(178, 13)
[[1.423e+01 1.710e+00 2.430e+00 ... 1.040e+00 3.920e+00 1.065e+03]
 [1.320e+01 1.780e+00 2.140e+00 ... 1.050e+00 3.400e+00 1.050e+03]
 [1.316e+01 2.360e+00 2.670e+00 ... 1.030e+00 3.170e+00 1.185e+03]
 ...
 [1.327e+01 4.280e+00 2.260e+00 ... 5.900e-01 1.560e+00 8.350e+02]
 [1.317e+01 2.590e+00 2.370e+00 ... 6.000e-01 1.620e+00 8.400e+02]
 [1.413e+01 4.100e+00 2.740e+00 ... 6.100e-01 1.600e+00 5.600e+02]] 

(178,)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2] 

['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']

In [141]:
X_train, X_Valid, y_train, y_Valid = train_test_split(data, target, test_size=0.25)
print(X_train.shape)
print(X_Valid.shape)

clf = linear_model.LogisticRegression()

clf.fit(X_train, y_train)

y_pred = clf.predict(X_Valid)

(133, 13)
(45, 13)


In [142]:
acc = accuracy_score(y_Valid, y_pred)
print("Accuracy: ", acc)

Accuracy:  0.9777777777777777


## Wine_DecisionTree

In [143]:
X_train, X_Valid, y_train, y_Valid = train_test_split(data, target, test_size=0.25)
print(X_train.shape)
print(X_Valid.shape)

clf = DecisionTreeClassifier(min_samples_split=2, min_samples_leaf=1)

clf.fit(X_train, y_train)

y_pred = clf.predict(X_Valid)

(133, 13)
(45, 13)


In [144]:
acc = accuracy_score(y_Valid, y_pred)
print("Accuracy: ", acc)

Accuracy:  0.8888888888888888


## Wine_RandomForest

In [149]:
X_train, X_Valid, y_train, y_Valid = train_test_split(data, target, test_size=0.25)
print(X_train.shape)
print(X_Valid.shape)

clf = RandomForestClassifier(n_estimators=100, max_depth=3)

clf.fit(X_train, y_train)

y_pred = clf.predict(X_Valid)

(133, 13)
(45, 13)


In [150]:
acc = accuracy_score(y_Valid, y_pred)
print("Accuracy: ", acc)

Accuracy:  0.9777777777777777
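
The three wine runs above each draw a different random 25% split, so their accuracies are not directly comparable. One possible fix is to score every model with the same cross-validation folds, as sketched below. The `StandardScaler` pipeline, `max_iter=1000`, the `random_state=0` values, and 5 folds are assumptions added here and are not part of the original cells.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()

models = {
    # Assumption: scale features first so logistic regression converges cleanly
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(
        n_estimators=100, max_depth=3, random_state=0
    ),
}

scores = {}
for name, model in models.items():
    # For classifiers, cv=5 uses stratified folds, identical for every model
    fold_acc = cross_val_score(model, wine.data, wine.target, cv=5, scoring="accuracy")
    scores[name] = fold_acc.mean()
    print(f"{name:20s} mean accuracy = {scores[name]:.4f}")
```

Averaging over folds also reduces the noise that comes from evaluating on only 45 samples.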


## Boston_OLS

In [151]:
data = boston.data
target = boston.target
print(data.shape)
print(target.shape)

(506, 13)
(506,)


In [204]:
X_train, X_Valid, y_train, y_Valid = train_test_split(data, target, test_size=0.25)
print(X_train.shape)
print(X_Valid.shape)

reg = linear_model.LinearRegression()

reg.fit(X_train, y_train)

y_pred = reg.predict(X_Valid)

(379, 13)
(127, 13)


In [205]:
print('Coefficients: ', '\n', reg.coef_, '\n')
print('Intercepts: ', reg.intercept_, '\n')

print("Mean squared error: %.2f"
      % mean_squared_error(y_Valid, y_pred))

Coefficients:  
 [-1.23755850e-01  3.60330995e-02  5.62464160e-02  1.90821859e+00
 -1.98852455e+01  3.79061927e+00  6.30087754e-03 -1.32793565e+00
  3.18458980e-01 -1.30136955e-02 -1.01084563e+00  7.52657052e-03
 -5.46349825e-01] 

Intercepts:  38.6777757585665 

Mean squared error: 24.80
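
The cell above reports only the mean squared error, although `r2_score` is also imported. As a small illustration of what the two metrics compute, the sketch below evaluates both by hand and with scikit-learn on a made-up four-point example (the numbers are toy values, not Boston data).

```python
from sklearn.metrics import mean_squared_error, r2_score

# Toy values for illustration only
y_true = [3.0, -0.5, 2.0, 7.0]
y_hat = [2.5, 0.0, 2.0, 8.0]

n = len(y_true)
mean_y = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))  # residual sum of squares
ss_tot = sum((t - mean_y) ** 2 for t in y_true)            # total sum of squares

mse_manual = ss_res / n          # MSE = SS_res / n
r2_manual = 1 - ss_res / ss_tot  # R^2 = 1 - SS_res / SS_tot

mse = mean_squared_error(y_true, y_hat)
r2 = r2_score(y_true, y_hat)
print("MSE: %.3f" % mse)  # MSE: 0.375
print("R^2: %.3f" % r2)   # R^2: 0.949
```

MSE is in squared target units, while R^2 is unitless, which makes it easier to compare across datasets.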


## Boston_DecisionTree

In [210]:
X_train, X_Valid, y_train, y_Valid = train_test_split(data, target, test_size=0.25)
print(X_train.shape)
print(X_Valid.shape)

reg = DecisionTreeRegressor()

reg.fit(X_train, y_train)

y_pred = reg.predict(X_Valid)

(379, 13)
(127, 13)


In [211]:
print("Mean squared error: %.2f"
      % mean_squared_error(y_Valid, y_pred))

Mean squared error: 15.58


## Boston_RandomForest

In [196]:
X_train, X_Valid, y_train, y_Valid = train_test_split(data, target, test_size=0.25)
print(X_train.shape)
print(X_Valid.shape)

reg = RandomForestRegressor(n_estimators=200)

reg.fit(X_train, y_train)

y_pred = reg.predict(X_Valid)

(379, 13)
(127, 13)


In [197]:
print("Mean squared error: %.2f"
      % mean_squared_error(y_Valid, y_pred))

Mean squared error: 7.12
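
As with wine, the three Boston errors come from three different random splits, so part of the gap between them may be split noise. The sketch below compares the same three regressors on shared folds. Because `load_boston` is no longer available in recent scikit-learn releases, it swaps in the bundled diabetes dataset, so the numbers it prints are not comparable to the Boston results above. The fold count and `random_state=0` values are assumptions.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Substitute dataset: diabetes, not Boston
X, y = datasets.load_diabetes(return_X_y=True)

# Same shuffled folds for every model
cv = KFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

mse = {}
for name, model in models.items():
    # scikit-learn returns negated MSE so that larger is always better
    neg_mse = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    mse[name] = -neg_mse.mean()
    print(f"{name:20s} mean MSE = {mse[name]:.2f}")
```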
