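## [Assignment Focus]
Use the linear models in Sklearn to train on several datasets. Make sure you understand the **data type** of what is passed to the model for training, and the meaning of each model parameter.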
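
As a quick way to check what is being passed to a model, the sketch below prints the type, dtype, and shape of the features and labels of one bundled dataset. It assumes the breast cancer dataset used later in this notebook; the variable names `one_col_1d` and `one_col_2d` are illustrative only.

```python
import numpy as np
from sklearn import datasets

breast_cancer = datasets.load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target

# Features: a 2-D array of shape (n_samples, n_features)
print(type(X), X.dtype, X.shape)
# Labels: a 1-D array with one entry per sample
print(type(y), y.dtype, y.shape)

# Plain integer indexing on a column drops a dimension ...
one_col_1d = X[:, 2]
# ... while np.newaxis keeps the 2-D shape that fit() expects for features.
one_col_2d = X[:, np.newaxis, 2]
print(one_col_1d.shape, one_col_2d.shape)
```

Estimators in scikit-learn expect a 2-D feature array even when only one feature is used, which is why the `np.newaxis` form is the one to pass to `fit`.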

## Assignment
Try the other sklearn datasets (wine, boston, ...) and train your own linear model on them.

### HINT: Check the type of the label to determine whether the dataset's target is classification or regression, then train with the correct model!
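
One rough way to follow this hint is to look at the dtype of the target and at how many distinct values it takes. The sketch below does that for three bundled datasets. `summarize_target` is a hypothetical helper written for this example, and "few distinct integer values suggests classification" is a heuristic, not a rule.

```python
import numpy as np
from sklearn import datasets

def summarize_target(y):
    """Return the dtype, sample count, and number of distinct values of a label array."""
    y = np.asarray(y)
    return {
        "dtype": str(y.dtype),
        "n_samples": int(y.size),
        "n_unique": int(np.unique(y).size),
    }

wine_info = summarize_target(datasets.load_wine().target)
cancer_info = summarize_target(datasets.load_breast_cancer().target)
diabetes_info = summarize_target(datasets.load_diabetes().target)

# A handful of distinct values points to class labels (classification);
# many distinct values points to a continuous target (regression).
for name, info in [("wine", wine_info),
                   ("breast_cancer", cancer_info),
                   ("diabetes", diabetes_info)]:
    print(name, info)
```

The dataset description (`DESCR`) remains the authoritative source for what the target means; the counts only give a quick first impression.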

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

In [2]:
# Load the breast cancer dataset
breast_cancer = datasets.load_breast_cancer()
print(breast_cancer['feature_names'])
print(breast_cancer.data.shape)

#X = breast_cancer.data[:, np.newaxis, 2]
#print("Data shape: ", X.shape) # shows the 569 samples and the single feature we extracted
#print(breast_cancer.target)
#print(breast_cancer.data.shape)
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(breast_cancer.data, breast_cancer.target, test_size=0.4, random_state=4)

print(x_test.shape)
print(y_test.shape)
print(x_train.shape)
print(y_train.shape)

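# Build a logistic regression model (the target is a binary class label, so this is classification)
regr = linear_model.LogisticRegression()

# Fit the model on the training data
regr.fit(x_train, y_train)

# Run the test data through the model to get predictions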
y_pred = regr.predict(x_test)
print(y_test)
print(y_pred)

['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
(569, 30)
(228, 30)
(228,)
(341, 30)
(341,)
[1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1
 0 0 0 0 1 0 1 1 1 1 0 1 0 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0
 1 0 0 1 1 1 1 0 0 1 1 1 1 1 0 1 1 0 1 0 0 1 0 1 1 1 1 0 0 1 1 1 1 1 1 0 1
 0 0 1 1 0 1 0 0 1 1 0 1 1 1 1 1 0 0 1 1 1 0 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0
 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 1 0 1 1
 1 0 1 1 1 0 0 1 1 1 0 0 1



In [3]:
# Inspect the fitted model's coefficients
print('Coefficients: ', regr.coef_)

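# Gap between predicted and true values, measured with MSE
# (for 0/1 class labels this equals the misclassification rate)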
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)

Coefficients:  [[ 1.16543541e+00  3.37342246e-01  1.78128646e-01 -3.88421365e-03
  -5.21509400e-02 -1.89343449e-01 -3.76615255e-01 -1.69149825e-01
  -9.80924703e-02 -1.40959949e-02  4.79022924e-02  2.97213124e-01
   2.68677234e-01 -1.42725235e-01 -6.01431477e-03 -1.38042392e-02
  -6.17242646e-02 -1.95940438e-02  6.68394386e-03 -9.42974551e-04
   1.11736705e+00 -4.24753452e-01 -1.22320157e-01 -3.41356402e-02
  -1.18127719e-01 -5.65641704e-01 -1.05695190e+00 -3.03613142e-01
  -1.47449864e-01 -6.04965854e-02]]
Mean squared error: 0.09
Accuracy: 0.9122807017543859
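
The two numbers above are related: with 0/1 labels every wrong prediction contributes a squared error of exactly 1, so the MSE is the misclassification rate, that is, 1 minus the accuracy (here 1 - 0.912 ≈ 0.09). The sketch below checks this on a small set of toy labels made up for illustration; they are not taken from the notebook output.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# Toy labels for illustration only: 10 samples, 2 of them misclassified
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 1, 0, 1])
y_hat = np.array([1, 0, 0, 0, 1, 1, 1, 1, 0, 1])

mse = mean_squared_error(y_true, y_hat)
acc = accuracy_score(y_true, y_hat)

# Both values describe the same 2-out-of-10 error rate
print("MSE:", mse)
print("1 - accuracy:", 1 - acc)
```

For a classifier, accuracy is therefore the more natural metric to report; MSE adds no extra information on hard 0/1 predictions.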


In [4]:
# Plot the fitted model against the actual data
print(x_test.shape)
print(y_test.shape)
print(x_train.shape)
print(y_train.shape)
#plt.scatter(x_test, y_test,  color='black')
#plt.plot(y_train, y_pred, color='blue', linewidth=3)
#plt.show()

(228, 30)
(228,)
(341, 30)
(341,)
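
The commented-out plot cannot work as written because `x_test` has 30 columns, and a 2-D scatter plot needs a single x-axis. One possible workaround, sketched below, is to refit on one feature only and compute a predicted-probability curve over that feature. The choice of column 2 ('mean perimeter'), the `StandardScaler` step, and the 200-point grid are assumptions made for this example, not part of the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

breast_cancer = datasets.load_breast_cancer()
# Keep a single feature, indexed with a list so the array stays 2-D: (569, 1)
X_one = breast_cancer.data[:, [2]]

x_train, x_test, y_train, y_test = train_test_split(
    X_one, breast_cancer.target, test_size=0.4, random_state=4)

# Scaling is added here only to help the solver converge
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(x_train, y_train)

# Evenly spaced feature values and the predicted probability of class 1
grid = np.linspace(X_one.min(), X_one.max(), 200).reshape(-1, 1)
proba = clf.predict_proba(grid)[:, 1]

# With matplotlib available, these arrays can be drawn with, for example:
# plt.scatter(x_test[:, 0], y_test, color='black')
# plt.plot(grid[:, 0], proba, color='blue', linewidth=3)
# plt.show()
```

A model fitted on one feature is not the same model as the 30-feature one above, so the curve only illustrates the idea of the plot.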


In [5]:
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older version
from sklearn.datasets import load_boston, load_wine
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

In [6]:
# Load the Boston data
boston = load_boston()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.1, random_state=4)

# Build a linear regression model
regr = LinearRegression()

# Fit the model on the training data
regr.fit(x_train, y_train)

# Run the test data through the model to get predictions
y_pred = regr.predict(x_test)

In [7]:
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

Mean squared error: 17.04
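
Because `load_boston` is no longer available in recent scikit-learn releases, the same regression workflow can be tried on another bundled dataset. The sketch below swaps in `load_diabetes`, which is a substitution made for this example and not the dataset used above, so its scores are not comparable to the Boston result. It also reports R², which the notebook imports but does not use.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load a regression dataset that ships with scikit-learn
diabetes = load_diabetes()

# Same split settings as the Boston cell above
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.1, random_state=4)

# Fit ordinary least squares and predict on the held-out data
regr = LinearRegression()
regr.fit(x_train, y_train)
y_pred = regr.predict(x_test)

print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
print("R^2 score: %.2f" % r2_score(y_test, y_pred))
```

MSE is expressed in squared target units, so it is only meaningful relative to the scale of the target; R² gives a scale-free complement.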


In [8]:
# Load the wine data
wine = load_wine()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.1, random_state=4)

# Build a logistic regression model
regr = LogisticRegression()

# Fit the model on the training data
regr.fit(x_train, y_train)

# Run the test data through the model to get predictions
y_pred = regr.predict(x_test)



In [9]:
acc = accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)
print(y_test)
print(y_pred)


Accuracy:  0.9444444444444444
[2 2 0 0 1 2 0 1 0 1 1 0 2 2 0 1 0 1]
[2 2 0 0 1 2 0 0 0 1 1 0 2 2 0 1 0 1]
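
To connect this result to the meaning of the model's parameters, the sketch below refits a logistic regression on the wine data and prints the shapes of its fitted attributes. The `StandardScaler` step and `max_iter=1000` are assumptions added to help the solver converge; they are not part of the cell above, so the accuracy may differ from the value shown.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.1, random_state=4)

# Standardize the features, then fit a logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(x_train, y_train)

# make_pipeline names each step after its lowercased class name
clf = model.named_steps["logisticregression"]

print("classes_:", clf.classes_)                 # the distinct labels seen in training
print("coef_ shape:", clf.coef_.shape)           # one row of weights per class
print("intercept_ shape:", clf.intercept_.shape) # one bias term per class
print("Accuracy:", model.score(x_test, y_test))
```

With three classes and 13 features, `coef_` has one row per class, whereas the binary breast cancer model earlier has a single row of 30 weights.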
