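## [Assignment Focus]
Use the Lasso and Ridge models in Sklearn to train on various datasets. Make sure you understand the **data type** of what is fed into the model for training, and what each model parameter means.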

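There are many kinds of machine learning models, but training data mostly follows a fixed format. Make sure you understand that format so that you can get started training quickly whenever you apply a new model.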
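
As a rough illustration of that fixed format, the sketch below inspects one of the bundled sklearn datasets. It uses `load_diabetes` purely as an assumed stand-in (it is not part of the original exercise); the point is that estimators expect a 2-D feature array of shape `(n_samples, n_features)` and a 1-D target array of length `n_samples`.

```python
import numpy as np
from sklearn.datasets import load_diabetes

# load_diabetes is an assumed example dataset, chosen because it ships with sklearn.
data = load_diabetes()
X, y = data.data, data.target

# Features: a 2-D NumPy array, one row per sample and one column per feature.
print(type(X), X.dtype, X.shape)
# Target: a 1-D NumPy array with one value per sample.
print(type(y), y.dtype, y.shape)
# Column names, in the same order as the columns of X.
print(data.feature_names)
```

Any dataset arranged this way can be passed to `fit(X, y)` on the models used in the rest of this notebook.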

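## Practice Time
Try using other datasets from sklearn datasets (boston, ...) to train your own linear regression model, and add suitable regularization to observe how training behaves.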

In [1]:
import warnings
# Silence all warnings; note that this also hides deprecation and convergence warnings.
warnings.filterwarnings('ignore')

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Lasso, RidgeClassifier, Ridge
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.datasets import load_boston, load_wine
from sklearn.metrics import mean_squared_error

# Boston

In [3]:
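# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs an older version.
boston = load_boston()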

In [4]:
X = boston.data
y = boston.target

In [5]:
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=1)

In [6]:
LR = LinearRegression().fit(xtrain, ytrain)
pred_y = LR.predict(xtest)

In [7]:
print(f'score: {LR.score(xtest, ytest)}')
print(f'mean square error: {mean_squared_error(ytest, pred_y)}')
print(LR.coef_)

score: 0.7836295385076281
mean square error: 19.831323672063235
[-9.85424717e-02  6.07841138e-02  5.91715401e-02  2.43955988e+00
 -2.14699650e+01  2.79581385e+00  3.57459778e-03 -1.51627218e+00
  3.07541745e-01 -1.12800166e-02 -1.00546640e+00  6.45018446e-03
 -5.68834539e-01]
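
For a regressor, `score` returns the coefficient of determination (R²), not an accuracy. The following sketch checks this against `r2_score` and also derives an RMSE from the MSE so the error is in the units of the target. It assumes `load_diabetes` as a stand-in dataset, so the numbers will not match the Boston output above.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Assumed stand-in dataset; same split settings as the notebook.
X, y = load_diabetes(return_X_y=True)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=1)

model = LinearRegression().fit(xtrain, ytrain)
pred = model.predict(xtest)

# Both values are R^2 computed on the test split.
r2_from_score = model.score(xtest, ytest)
r2_from_metric = r2_score(ytest, pred)

# RMSE is the square root of the MSE, in the same units as y.
rmse = mean_squared_error(ytest, pred) ** 0.5

print(f'R^2 via score(): {r2_from_score}')
print(f'R^2 via r2_score(): {r2_from_metric}')
print(f'RMSE: {rmse}')
```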


# Lasso Regression

In [8]:
LS = Lasso().fit(xtrain, ytrain)
pred_y = LS.predict(xtest)

In [9]:
print(f'score: {LS.score(xtest, ytest)}')
print(f'mean square error: {mean_squared_error(ytest, pred_y)}')
print(LS.coef_)

score: 0.6694782854622287
mean square error: 30.29379822196717
[-0.05256765  0.05904289 -0.          0.         -0.          0.
  0.01964989 -0.5767539   0.23300934 -0.01230686 -0.72109227  0.00600289
 -0.79711475]
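
Several coefficients in the output above are exactly zero, which is the feature-selection effect of the L1 penalty. A minimal sketch of how that sparsity grows with `alpha` follows; it assumes `load_diabetes` as a stand-in dataset and an arbitrary set of alpha values picked only for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Assumed stand-in dataset; same split settings as the notebook.
X, y = load_diabetes(return_X_y=True)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=1)

# Illustrative alpha values (an assumption, not taken from the notebook).
n_nonzero = {}
for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha).fit(xtrain, ytrain)
    # Count the coefficients the L1 penalty has not driven to zero.
    n_nonzero[alpha] = int(np.count_nonzero(model.coef_))
    print(f'alpha = {alpha}: {n_nonzero[alpha]} of {X.shape[1]} coefficients are nonzero')
```

With a strong enough penalty every coefficient is zero and the model predicts only the intercept, which is why an overly large `alpha` lowers the score.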


In [10]:
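# Sweep alpha over [0, 1] and record the test-set R^2 for each value.
# Note: alpha=0 is plain least squares, which the Lasso solver handles poorly,
# and choosing alpha on the test set makes the reported score optimistic.
alphas = np.linspace(0, 1, 50)
score = []
for alpha in alphas:
    LS = Lasso(alpha=alpha).fit(xtrain, ytrain)
    score.append(LS.score(xtest, ytest))
print(f'max score: {max(score)}, alpha = {alphas[score.index(max(score))]}')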

max score: 0.786724355547493, alpha = 0.04081632653061224
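
The sweep above selects `alpha` by scoring on the test split, so the maximum it reports is likely to be optimistic. One alternative, sketched here under assumptions, is to choose `alpha` by cross-validation on the training split with `LassoCV` and touch the test split only once at the end. The dataset (`load_diabetes`) is a stand-in, and the grid starts at 0.01 rather than 0 because `alpha=0` is not a real Lasso fit.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Assumed stand-in dataset; same split settings as the notebook.
X, y = load_diabetes(return_X_y=True)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=1)

# Candidate alphas; the lower bound 0.01 is an assumption made for this sketch.
alphas = np.linspace(0.01, 1, 50)

# 5-fold cross-validation on the training split only.
search = LassoCV(alphas=alphas, cv=5).fit(xtrain, ytrain)
best_alpha = search.alpha_

# The test split is used once, after alpha has been fixed.
test_r2 = search.score(xtest, ytest)
print(f'alpha chosen by CV: {best_alpha}')
print(f'test R^2: {test_r2}')
```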


In [11]:
# Refit Lasso with the alpha that scored best in the sweep above.
best_alpha = np.linspace(0, 1, 50)[score.index(max(score))]
LS = Lasso(alpha=best_alpha).fit(xtrain, ytrain)
pred_y = LS.predict(xtest)
print(f'score: {LS.score(xtest, ytest)}')
print(f'mean square error: {mean_squared_error(ytest, pred_y)}')
print(LS.coef_)

score: 0.786724355547493
mean square error: 19.547669803600503
[-8.70083640e-02  6.24036754e-02 -0.00000000e+00  1.75661864e+00
 -8.10230570e+00  2.84882082e+00 -5.77990148e-03 -1.30647817e+00
  2.78354513e-01 -1.27532597e-02 -8.47389998e-01  7.28673663e-03
 -5.90728464e-01]


# Ridge Regression

In [12]:
RR = Ridge().fit(xtrain, ytrain)
pred_y = RR.predict(xtest)

In [13]:
print(f'score: {RR.score(xtest, ytest)}')
print(f'mean square error: {mean_squared_error(ytest, pred_y)}')
print(RR.coef_)

score: 0.7890510666829773
mean square error: 19.334416287843627
[-8.99352520e-02  6.20345865e-02  1.21404325e-02  2.23426149e+00
 -1.12838152e+01  2.89618901e+00 -4.81458007e-03 -1.36998976e+00
  2.83653073e-01 -1.22828776e-02 -8.84229846e-01  7.09753443e-03
 -5.80033848e-01]


In [14]:
# Sweep alpha over [0, 1] for Ridge and record the test-set R^2 for each value.
alphas = np.linspace(0, 1, 50)
score = []
for alpha in alphas:
    RR = Ridge(alpha=alpha).fit(xtrain, ytrain)
    score.append(RR.score(xtest, ytest))
print(f'max score: {max(score)}, alpha = {alphas[score.index(max(score))]}')

max score: 0.7890510666829773, alpha = 1.0
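
The best value found here, `alpha = 1.0`, sits at the upper edge of the searched range, which suggests the grid may be too narrow to contain the optimum. A possible follow-up, sketched under assumptions, is to search a wider logarithmic grid with `RidgeCV`, selecting `alpha` by cross-validation on the training split. The dataset (`load_diabetes`) and the grid bounds are illustrative choices, not taken from the notebook.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Assumed stand-in dataset; same split settings as the notebook.
X, y = load_diabetes(return_X_y=True)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=1)

# Logarithmic grid from 1e-3 to 1e3; the bounds are an assumption for this sketch.
alphas = np.logspace(-3, 3, 13)

# 5-fold cross-validation on the training split only.
search = RidgeCV(alphas=alphas, cv=5).fit(xtrain, ytrain)
best_alpha = search.alpha_
test_r2 = search.score(xtest, ytest)

print(f'alpha chosen by CV: {best_alpha}')
print(f'test R^2: {test_r2}')
```

If the chosen value again lands on a boundary of the grid, widening the grid further is a reasonable next step.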


--------------------------------------------------------------------------------------------------------------------------

# Ridge Classification (Wine)

In [15]:
wine = load_wine()

In [16]:
X = wine.data
y = wine.target

In [17]:
# Mean accuracy over 5 folds with the default alpha=1.0.
score = cross_val_score(RidgeClassifier(), X, y, cv=5).mean()
score

0.9891891891891891

In [18]:
# Sweep alpha over [0, 1] and record the mean 5-fold cross-validated accuracy.
alphas = np.linspace(0, 1, 50)
score = []
for alpha in alphas:
    score.append(cross_val_score(RidgeClassifier(alpha=alpha), X, y, cv=5).mean())
print(f'max score: {max(score)}, alpha = {alphas[score.index(max(score))]}')

max score: 0.9945945945945945, alpha = 0.7346938775510203
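
The L2 penalty acts on the coefficient magnitudes, so it is sensitive to the scale of each feature, and the wine features span very different ranges. A common variation, sketched here as an assumption rather than as part of the original exercise, is to standardize the features inside a pipeline so the scaler is refit on each training fold.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Scaling inside the pipeline keeps each validation fold out of the scaler's fit.
model = make_pipeline(StandardScaler(), RidgeClassifier(alpha=1.0))
scores = cross_val_score(model, X, y, cv=5)

print(f'fold accuracies: {scores}')
print(f'mean accuracy: {scores.mean()}')
```

Comparing this mean accuracy with the unscaled result above shows whether scaling matters for this dataset.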
