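## [Assignment Focus]
Use the Lasso and Ridge models in Sklearn to train on a variety of datasets. Make sure you understand the **data type** of what is passed to the model for training, and the meaning of each model parameter.

There are many kinds of machine learning models, but training data mostly follows a fixed format. Make sure you understand that format, so that you can start training quickly whenever you apply a new model.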
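
As a quick check of that format, the sketch below prints the type, dtype, and shape of the arrays a scikit-learn estimator expects. `load_diabetes` is chosen here only as an illustrative bundled dataset; it is an assumption of this example, not something the assignment prescribes.

```python
from sklearn import datasets

# Assumption: load_diabetes is used only as an example dataset.
diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Features: a 2-D NumPy array of shape (n_samples, n_features).
print(type(X), X.dtype, X.shape)

# Target: a 1-D NumPy array of shape (n_samples,).
print(type(y), y.dtype, y.shape)
```

Any estimator's `fit(X, y)` takes the features and the target in this layout, with one row of `X` per entry of `y`.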

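## Practice Time
Try other datasets from sklearn datasets (boston, ...) to train your own linear regression model, then add suitable regularization and observe how training behaves.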
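
For reference, the `alpha` argument of `Lasso` and `Ridge` scales the penalty term in each objective. The formulas below follow the scikit-learn documentation; the symbols $n$ (number of training samples), $X$ (feature matrix), $y$ (targets), and $w$ (coefficients) are introduced here for illustration.

$$
\text{Lasso:}\quad \min_{w}\ \frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1
\qquad\qquad
\text{Ridge:}\quad \min_{w}\ \lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_2^2
$$

With $\alpha = 0$ both reduce to ordinary least squares. A larger $\alpha$ shrinks the coefficients more strongly, and the L1 penalty can drive some of them to exactly zero.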

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, accuracy_score

# Set the alpha values (regularization strengths) to try
alpha_list = [0.25, 0.5, 0.75, 1]

In [2]:
# =============================================================================
# data : boston 
# =============================================================================
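# Load the data
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version.
boston = datasets.load_boston()  # a regression problem, so use linear regression

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)

# Build a linear regression model
regr = linear_model.LinearRegression()

# Fit the model on the training data
regr.fit(x_train, y_train)

# Predict on the test set
y_pred = regr.predict(x_test)

# Inspect the coefficients
print('Parameters of LinearRegression: ', regr.coef_)

# Inspect the MSE
print('Mean squared error of LinearRegression: %.2f' % mean_squared_error(y_test, y_pred))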

Parameters of LinearRegression:  [-1.15966452e-01  4.71249231e-02  8.25980146e-03  3.23404531e+00
 -1.66865890e+01  3.88410651e+00 -1.08974442e-02 -1.54129540e+00
  2.93208309e-01 -1.34059383e-02 -9.06296429e-01  8.80823439e-03
 -4.57723846e-01]
Mean squared error of LinearRegression: 25.42
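
On scikit-learn 1.2 and later, `datasets.load_boston` is no longer available. A minimal sketch of the same workflow on another bundled regression dataset might look like the following. The choice of `load_diabetes` is a substitution made for this example, so its coefficients and MSE are not comparable to the Boston numbers above.

```python
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: load_diabetes replaces load_boston for illustration only.
diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=4)

# Fit ordinary least squares and evaluate on the held-out split.
regr = linear_model.LinearRegression()
regr.fit(x_train, y_train)
y_pred = regr.predict(x_test)
mse = mean_squared_error(y_test, y_pred)

print('Parameters of LinearRegression: ', regr.coef_)
print('Mean squared error of LinearRegression: %.2f' % mse)
```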


In [3]:
# =============================================================================
# LASSO 
# =============================================================================
for alpha_value in alpha_list:
    lasso = linear_model.Lasso(alpha=alpha_value)
    lasso.fit(x_train, y_train)
    y_pred = lasso.predict(x_test)
    
    print('Lasso(alpha=%.2f):' % alpha_value)
    print('Parameters of Lasso: ', lasso.coef_)
    print('Mean squared error of Lasso: %.2f' % mean_squared_error(y_test, y_pred))
    print("-"*30)


Lasso(alpha=0.25):
Parameters of Lasso:  [-0.10104019  0.04880253 -0.02613895  0.         -0.          3.4145279
 -0.01364913 -1.14216044  0.26486613 -0.01577282 -0.75187568  0.0094572
 -0.53589797]
Mean squared error of Lasso: 26.62
------------------------------
Lasso(alpha=0.50):
Parameters of Lasso:  [-0.08860117  0.04829133 -0.01107435  0.         -0.          2.66101769
 -0.00307949 -0.98440282  0.25664031 -0.01593271 -0.73252329  0.00884426
 -0.59210164]
Mean squared error of Lasso: 26.94
------------------------------
Lasso(alpha=0.75):
Parameters of Lasso:  [-0.07676773  0.04701493 -0.          0.         -0.          1.9352261
  0.00205618 -0.8683967   0.24633848 -0.01586606 -0.71302737  0.00827
 -0.64015726]
Mean squared error of Lasso: 27.78
------------------------------
Lasso(alpha=1.00):
Parameters of Lasso:  [-0.06494981  0.04581458 -0.          0.         -0.          1.18140024
  0.01109101 -0.73695809  0.23350042 -0.01551065 -0.69270805  0.00763157
 -0.6927848 ]
Mean

In [4]:
# =============================================================================
# Ridge
# =============================================================================
for alpha_value in alpha_list:
    ridge = linear_model.Ridge(alpha=alpha_value)
    ridge.fit(x_train, y_train)
    y_pred = ridge.predict(x_test)
    
    print('Ridge(alpha=%.2f):' % alpha_value)
    print('Parameters of Ridge: ', ridge.coef_)
    print('Mean squared error of Ridge: %.2f' % mean_squared_error(y_test, y_pred))
    print("-"*30)

Ridge(alpha=0.25):
Parameters of Ridge:  [-1.14651940e-01  4.74169041e-02 -3.86509860e-03  3.14636886e+00
 -1.39471613e+01  3.90538343e+00 -1.33609910e-02 -1.50112557e+00
  2.86315374e-01 -1.35785155e-02 -8.77924960e-01  8.93838361e-03
 -4.60633067e-01]
Mean squared error of Ridge: 25.51
------------------------------
Ridge(alpha=0.50):
Parameters of Ridge:  [-1.13720313e-01  4.76370805e-02 -1.25294762e-02  3.07531514e+00
 -1.19789169e+01  3.91845004e+00 -1.51046851e-02 -1.47224633e+00
  2.81475089e-01 -1.37075697e-02 -8.57737917e-01  9.03172343e-03
 -4.62924119e-01]
Mean squared error of Ridge: 25.60
------------------------------
Ridge(alpha=0.75):
Parameters of Ridge:  [-1.13029384e-01  4.78119108e-02 -1.90160032e-02  3.01496621e+00
 -1.04964311e+01  3.92634998e+00 -1.63952178e-02 -1.45047798e+00
  2.77925486e-01 -1.38090969e-02 -8.42702118e-01  9.10184688e-03
 -4.64824144e-01]
Mean squared error of Ridge: 25.68
------------------------------
Ridge(alpha=1.00):
Parameters of Ridge: 
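
The outputs above suggest a qualitative difference: Lasso sets some coefficients to exactly zero, while Ridge only shrinks them. The sketch below counts exact zeros for both models. It uses `load_diabetes` and `alpha=1.0` as stand-ins chosen for this example, so the counts will differ from the Boston run.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

# Assumption: load_diabetes stands in for the dataset used above.
diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=4)

zero_counts = {}
for name, model in [('Lasso', linear_model.Lasso(alpha=1.0)),
                    ('Ridge', linear_model.Ridge(alpha=1.0))]:
    model.fit(x_train, y_train)
    # Count coefficients that were driven to exactly zero.
    zero_counts[name] = int(np.sum(model.coef_ == 0))
    print('%s: %d of %d coefficients are exactly zero'
          % (name, zero_counts[name], model.coef_.size))
```

This is why L1 regularization is often used as a rough form of feature selection, whereas L2 regularization keeps every feature but with smaller weights.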

In [5]:
# =============================================================================
# data : wine
# =============================================================================
# Load the data
wine = datasets.load_wine()  # a classification problem, so use logistic regression

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.2, random_state=4)

# Build a logistic regression model
logreg = linear_model.LogisticRegression()

# Fit the model on the training data
logreg.fit(x_train, y_train)

# Predict on the test set
y_pred = logreg.predict(x_test)

# Inspect the coefficients
print('Parameters of LogisticRegression: ', logreg.coef_)

# Compare predictions with the true labels using accuracy
acc = accuracy_score(y_test, y_pred)
print('Accuracy: ', acc)

Parameters of LogisticRegression:  [[-6.82864779e-01  7.19709566e-01  9.78123238e-01 -5.71326897e-01
  -3.15688084e-02  3.00522775e-01  1.11716506e+00 -3.43549778e-02
  -4.90150215e-01 -1.05374113e-02 -1.54185796e-01  9.61331414e-01
   1.81479366e-02]
 [ 9.32405991e-01 -1.02836307e+00 -7.03687526e-01  2.35034368e-01
   8.51406104e-03  7.62359762e-02  4.71638459e-01  5.60638803e-01
   6.15085511e-01 -1.81947987e+00  9.33098198e-01  7.36442197e-02
  -1.40242413e-02]
 [-4.72180741e-01  6.31034394e-01 -6.36847579e-02  1.56380289e-01
   3.13408128e-02 -7.52374558e-01 -1.62587954e+00 -1.31786834e-01
  -7.01391158e-01  1.03384290e+00 -4.87953685e-01 -1.15357424e+00
   1.40302540e-04]]
Accuracy:  0.9722222222222222
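
The coefficient matrix printed above has one row per class and one column per feature. The sketch below checks those shapes on the same wine split and also looks at the per-class probabilities. `max_iter=10000` is an assumption added here to give the solver room to converge on unscaled features, so the fitted values may differ from the output above.

```python
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=4)

# Assumption: max_iter is raised; it is not set in the original cell.
logreg = linear_model.LogisticRegression(max_iter=10000)
logreg.fit(x_train, y_train)

print('classes_:', logreg.classes_)
print('coef_ shape:', logreg.coef_.shape)            # (n_classes, n_features)
print('intercept_ shape:', logreg.intercept_.shape)  # (n_classes,)

# One probability per class for every test sample; each row sums to 1.
proba = logreg.predict_proba(x_test)
print('predict_proba shape:', proba.shape)
```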




In [9]:
# =============================================================================
# LogisticRegression(L1)
# =============================================================================
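# Note: C is the inverse of the regularization strength, so a larger C means
# weaker regularization (the opposite of alpha in Lasso/Ridge). The value printed
# as "alpha" below is really C, and the "Lasso" label refers to the L1 penalty.
# solver='liblinear' is set explicitly because the default lbfgs solver does not
# support the L1 penalty.
for alpha_value in alpha_list:
    logreg = linear_model.LogisticRegression(penalty='l1', C=alpha_value, solver='liblinear')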
    logreg.fit(x_train, y_train)
    y_pred = logreg.predict(x_test)
    
    print('LogisticRegression(L1)(alpha=%.2f):' % alpha_value)
    print('Parameters of Lasso: ', logreg.coef_)
    acc = accuracy_score(y_test, y_pred)
    print('Accuracy: ', acc)
    print("-"*30)

LogisticRegression(L1)(alpha=0.25):
Parameters of Lasso:  [[ 0.          0.          0.         -0.46380957 -0.04938031  0.
   0.6577548   0.          0.          0.          0.          0.
   0.01548334]
 [ 0.53567011 -0.57230653  0.          0.15811602  0.02459615  0.
   0.32969191  0.          0.         -1.30220391  0.          0.
  -0.00942791]
 [ 0.          0.08349246  0.          0.01187655  0.          0.
  -1.96317136  0.          0.          0.79753565  0.         -0.35175169
  -0.00223921]]
Accuracy:  0.9444444444444444
------------------------------
LogisticRegression(L1)(alpha=0.50):
Parameters of Lasso:  [[-2.36600044e-01  2.91567552e-01  0.00000000e+00 -4.70806377e-01
  -4.66345872e-02  0.00000000e+00  1.12173517e+00  0.00000000e+00
   0.00000000e+00  0.00000000e+00  0.00000000e+00  1.33019285e-01
   1.66809903e-02]
 [ 8.53446304e-01 -8.53946080e-01  0.00000000e+00  1.75667633e-01
   1.91650684e-02  0.00000000e+00  5.56353300e-01  0.00000000e+00
   4.77403748e-03 -1.741



In [10]:
# =============================================================================
# LogisticRegression(L2)
# =============================================================================
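# Note: C is the inverse of the regularization strength, so a larger C means
# weaker regularization. The value printed as "alpha" below is really C, and the
# "Ridge" label refers to the L2 penalty.
for alpha_value in alpha_list:
    logreg = linear_model.LogisticRegression(penalty='l2', C=alpha_value)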
    logreg.fit(x_train, y_train)
    y_pred = logreg.predict(x_test)
    
    print('LogisticRegression(L2)(alpha=%.2f):' % alpha_value)
    print('Parameters of Ridge: ', logreg.coef_)
    acc = accuracy_score(y_test, y_pred)
    print('Accuracy: ', acc)
    print("-"*30)

LogisticRegression(L2)(alpha=0.25):
Parameters of Ridge:  [[-0.33662807  0.3612442   0.31358185 -0.48990432 -0.04392272  0.26399046
   0.64652145 -0.04462011 -0.03123821 -0.04008826 -0.04974098  0.50943412
   0.01698277]
 [ 0.52463624 -0.69686805 -0.24757697  0.17894567  0.01714422  0.15194874
   0.29514996  0.16088707  0.29967388 -1.21858598  0.37515763  0.25163519
  -0.01078697]
 [-0.29411123  0.46529182 -0.0343909   0.10938236  0.01455958 -0.50012567
  -1.01472716 -0.0237717  -0.40527816  0.82218175 -0.25667654 -0.75951823
  -0.00135498]]
Accuracy:  0.9722222222222222
------------------------------
LogisticRegression(L2)(alpha=0.50):
Parameters of Ridge:  [[-4.92361444e-01  5.32574611e-01  5.72083024e-01 -5.26213503e-01
  -3.93875874e-02  3.02786091e-01  8.64059553e-01 -4.46498662e-02
  -1.99163468e-01 -2.73304769e-02 -9.30081211e-02  7.12834501e-01
   1.75579152e-02]
 [ 7.19862561e-01 -8.64354574e-01 -4.21498219e-01  2.02823453e-01
   1.34333484e-02  1.34446512e-01  3.62524420e-01 
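
Because `C` in `LogisticRegression` is the inverse of the regularization strength, a smaller `C` should produce sparser L1 solutions. The sketch below illustrates this on the wine split. The `StandardScaler` step, the `saga` solver, `max_iter=5000`, and the particular `C` values are all assumptions made for this example (they are not part of the original cells), so the numbers are not comparable to the outputs above.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=4)

zero_counts = {}
for C in [0.01, 0.1, 1.0]:
    # Assumption: features are standardized so that saga converges reliably.
    clf = make_pipeline(
        StandardScaler(),
        linear_model.LogisticRegression(penalty='l1', C=C, solver='saga',
                                        max_iter=5000, random_state=0))
    clf.fit(x_train, y_train)
    coef = clf[-1].coef_  # coefficients of the final LogisticRegression step
    zero_counts[C] = int(np.sum(coef == 0))
    print('C=%.2f: %d of %d coefficients are exactly zero, accuracy=%.3f'
          % (C, zero_counts[C], coef.size, clf.score(x_test, y_test)))
```

Reading the printed counts from small `C` to large `C` shows the penalty relaxing, which is the reverse of what increasing `alpha` does for Lasso and Ridge.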

