## [Assignment Key Points]
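Use the Lasso and Ridge models in scikit-learn to train on various datasets. Make sure you understand the **data types** that are fed into the model for training, and learn what each of the model's parameters means.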
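The sketch below shows the data format that scikit-learn estimators expect and the main `Lasso`/`Ridge` parameters. It assumes synthetic data from `make_regression`, which is not part of the original exercise: `X` is a 2-D array of shape `(n_samples, n_features)` and `y` is a 1-D array of shape `(n_samples,)`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (an assumption for illustration): 100 samples, 5 features
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
print(X.shape, y.shape)  # (100, 5) (100,)

# A single feature stored as a 1-D array must be reshaped to 2-D (n_samples, 1)
single_feature = X[:, 0]
X_single = single_feature.reshape(-1, 1)
print(X_single.shape)  # (100, 1)

# alpha: regularization strength (larger means a stronger penalty)
# fit_intercept: whether to learn a bias term
# max_iter: iteration cap for Lasso's coordinate-descent solver
lasso = Lasso(alpha=0.1, fit_intercept=True, max_iter=1000).fit(X, y)
ridge = Ridge(alpha=1.0, fit_intercept=True).fit(X, y)
print(lasso.coef_.shape, ridge.coef_.shape)  # (5,) (5,): one weight per feature
```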

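There are many kinds of machine learning models, but the training data they consume usually follows a fixed format: a 2-D feature matrix `X` of shape `(n_samples, n_features)` and a 1-D target `y` of shape `(n_samples,)`. Make sure you understand this format, so that when you try a new model you can start training it right away!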
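The `datasets.load_*` helpers return a dict-like `Bunch` whose `.data` and `.target` attributes are exactly this `X` and `y`, which is how the cells below use `boston.data` and `boston.target`. A minimal sketch, assuming `load_diabetes` (bundled with scikit-learn, no download needed) as a stand-in dataset:

```python
from sklearn.datasets import load_diabetes

bunch = load_diabetes()  # a Bunch: dict-like object with attribute access
print(type(bunch.data), bunch.data.shape)      # feature matrix X
print(type(bunch.target), bunch.target.shape)  # target vector y
print(bunch.feature_names)                     # column names of X

# Shortcut that returns (X, y) directly
X, y = load_diabetes(return_X_y=True)
```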

## Practice Time
Try training your own linear regression model on other datasets from `sklearn.datasets` (boston, ...), and add suitable regularization to observe how it affects training.
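One way to approach the exercise is to sweep the regularization strength `alpha` and compare held-out MSE against the unregularized baseline. This is a sketch, assuming `load_diabetes` as the dataset, a fixed `random_state=42`, and an illustrative helper `holdout_mse` that is not part of scikit-learn:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def holdout_mse(model):
    """Hypothetical helper: fit on the training split, return MSE on the test split."""
    model.fit(X_train, y_train)
    return mean_squared_error(y_test, model.predict(X_test))

results = {("linear", None): holdout_mse(LinearRegression())}
for alpha in [0.01, 0.1, 1.0]:
    results[("lasso", alpha)] = holdout_mse(Lasso(alpha=alpha))
    results[("ridge", alpha)] = holdout_mse(Ridge(alpha=alpha))

for (name, alpha), mse in results.items():
    print(f"{name:<7} alpha={alpha}\tMSE={mse:.1f}")
```

Fixing `random_state` keeps the comparison fair: every model is scored on the same test samples.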

In [24]:
```python
# Imports used throughout this notebook
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
```

In [25]:
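```python
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2 (newer versions can use e.g. load_diabetes)
boston = datasets.load_boston()
train_X = boston.data      # feature matrix, shape (506, 13)
train_y = boston.target    # target (median house value), shape (506,)
# Hold out 20% of the samples as the test set
train_X, test_X, train_y, test_y = train_test_split(train_X, train_y, test_size=0.2)
print(train_X.shape, test_X.shape, train_y.shape, test_y.shape, sep='\n')
```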

```
(404, 13)
(102, 13)
(404,)
(102,)
```
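The split above passes no `random_state`, so each run draws a different test set and the MSE values below will change from run to run. A small sketch of how an integer seed makes the split reproducible, assuming a toy 10-sample array in place of the real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # toy feature matrix: 10 samples, 2 features
y = np.arange(10)                 # toy target

# Same random_state gives an identical split on every call
split_a = train_test_split(X, y, test_size=0.2, random_state=0)
split_b = train_test_split(X, y, test_size=0.2, random_state=0)
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))

X_train, X_test, y_train, y_test = split_a
print(X_train.shape, X_test.shape, same)  # (8, 2) (2, 2) True
```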


In [26]:
```python
# Baseline: ordinary least squares, no regularization
reg = linear_model.LinearRegression()
reg.fit(train_X, train_y)
prediction = reg.predict(test_X)
mse = mean_squared_error(test_y, prediction)
print('linear regression:\t', mse)

# Lasso: least squares plus an L1 penalty of strength alpha
reg = linear_model.Lasso(alpha=0.1)
reg.fit(train_X, train_y)
prediction = reg.predict(test_X)
mse = mean_squared_error(test_y, prediction)
print('lasso:\t', mse)
```

```
linear regression:	 21.282388116070177
lasso:	 22.78953658315812
```
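The numbers printed above are mean squared errors, i.e. the average of the squared residuals $\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$, so lower is better. A quick check on hand-picked toy values that `mean_squared_error` computes exactly this:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MSE = mean of the squared residuals
manual = np.mean((y_true - y_pred) ** 2)
print(manual, mean_squared_error(y_true, y_pred))  # 0.375 0.375
```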


In [27]:
```python
# reg now holds the Lasso model fitted above
reg.coef_
```

```
array([-0.02524835,  0.06361187, -0.05030965,  1.48593296, -0.        ,
        3.48155861, -0.00881344, -1.24280463,  0.28481561, -0.01454066,
       -0.62765398,  0.01238876, -0.59955758])
```
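The `-0.` entry (index 4, `NOX` in the Boston feature order) is a coefficient that the L1 penalty drove exactly to zero, effectively dropping that feature. The sketch below counts such zeros as `alpha` grows; it assumes `load_diabetes` in place of Boston, since `load_boston` is unavailable in current scikit-learn:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

data = load_diabetes()
X, y = data.data, data.target
names = np.array(data.feature_names)

zero_counts = {}
for alpha in [0.1, 1.0, 10.0]:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    # -0.0 == 0 is True, so signed zeros are counted too
    is_zero = coef == 0
    zero_counts[alpha] = int(is_zero.sum())
    print(f"alpha={alpha}: {zero_counts[alpha]} zero coefficients -> {list(names[is_zero])}")
```

With a large enough `alpha`, every coefficient is zeroed and the model predicts only the intercept.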

In [28]:
```python
# Wine: 13 chemical features; the target is a class label (0, 1 or 2)
wine = datasets.load_wine()
train_X = wine.data
train_y = wine.target
train_X, test_X, train_y, test_y = train_test_split(train_X, train_y, test_size=0.2)
print(train_X.shape, test_X.shape, train_y.shape, test_y.shape, sep='\n')
```

```
(142, 13)
(36, 13)
(142,)
(36,)
```
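Unlike Boston, the wine dataset is a classification dataset: `y` holds integer class labels rather than a continuous quantity. A quick way to inspect the target before choosing a model:

```python
import numpy as np
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)
classes, counts = np.unique(y, return_counts=True)
print(X.shape)                                       # (178, 13)
print(dict(zip(classes.tolist(), counts.tolist())))  # {0: 59, 1: 71, 2: 48}
```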


In [29]:
```python
# Baseline: ordinary least squares, no regularization
reg = linear_model.LinearRegression()
reg.fit(train_X, train_y)
prediction = reg.predict(test_X)
mse = mean_squared_error(test_y, prediction)
print('linear regression:\t', mse)

# Ridge: least squares plus an L2 penalty of strength alpha
reg = linear_model.Ridge(alpha=0.1)
reg.fit(train_X, train_y)
prediction = reg.predict(test_X)
mse = mean_squared_error(test_y, prediction)
print('ridge:\t', mse)
```

```
linear regression:	 0.060471872398158766
ridge:	 0.060438468121856305
```
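The two MSE values above are nearly identical, which suggests that `alpha=0.1` is too weak a penalty to change this model much. A sketch of how the L2 penalty shrinks the coefficient vector as `alpha` grows, assuming the same wine data without a train/test split and an illustrative grid of `alpha` values:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import Ridge

X, y = load_wine(return_X_y=True)

norms = []
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(float(np.linalg.norm(coef)))
    # L2 shrinkage: coefficients get smaller but are generally not exactly zero
    print(f"alpha={alpha:<6} ||w||={norms[-1]:.4f}  exact zeros={int(np.sum(coef == 0))}")
```

The L2 norm decreases as `alpha` increases, unlike Lasso, which zeroes coefficients outright.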


In [30]:
```python
# Ridge predictions are continuous values, not class labels
prediction = reg.predict(test_X)
print(prediction)
```

```
[ 1.13669214  0.75204296  0.65919283  1.77218216  1.41896253  0.89298065
 -0.06499556  2.10953006 -0.45886127  0.00772966  2.00587016  0.08096961
  1.2316556   1.82278164  1.94248882  0.2337511   0.67505842  1.82785803
 -0.00432057 -0.53319974  1.04195955  2.02912069  2.06290213 -0.34033351
  0.94749544  1.19930519  0.91667149  1.3372233   1.125705   -0.23952273
  2.05900312  1.05241046 -0.24251327  1.64392533 -0.0484721   2.27079112]
```
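Because regression treats the labels 0, 1 and 2 as numbers, the predictions above include values such as -0.53 and 2.27 that are not valid classes. A rough sketch of mapping them back to labels by rounding and clipping, assuming a fixed `random_state=0` split (so the numbers will differ from the cell above):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

reg = Ridge(alpha=0.1).fit(X_train, y_train)
raw = reg.predict(X_test)  # continuous values, may fall below 0 or above 2

# Map each prediction to the nearest valid class label (0, 1 or 2)
labels = np.clip(np.rint(raw), 0, 2).astype(int)
accuracy = np.mean(labels == y_test)
print(f"accuracy after rounding: {accuracy:.3f}")
```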


In [31]:
```python
# Coefficients of the Ridge model, one per wine feature
reg.coef_
```

```
array([-0.13430206,  0.0259775 , -0.18902802,  0.03666031,  0.00054601,
        0.16227167, -0.36401331, -0.19277003,  0.00857269,  0.07415035,
       -0.21189228, -0.26829114, -0.00077736])
```
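The tiny coefficients for `magnesium` (index 4) and `proline` (index 12) mostly reflect those features' large raw scales rather than low importance, and the penalty acts unevenly on unscaled features. A sketch that standardizes the features first so the coefficient magnitudes become comparable, assuming a `StandardScaler` + `Ridge` pipeline fitted on the full dataset:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_wine()
X, y = data.data, data.target

# Standardize each feature to zero mean and unit variance before the penalty applies
pipe = make_pipeline(StandardScaler(), Ridge(alpha=0.1)).fit(X, y)
coef = pipe.named_steps["ridge"].coef_

# Rank features by absolute coefficient size (meaningful only after scaling)
order = np.argsort(-np.abs(coef))
ranking = [(data.feature_names[i], round(float(coef[i]), 3)) for i in order]
for name, value in ranking:
    print(f"{name:<30}{value}")
```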