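## [Assignment Focus]
Use the Lasso and Ridge models in Sklearn to train on various datasets. Make sure you understand the **data type** of what is passed to the model for training, and the meaning of each model parameter.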

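There are many kinds of machine learning models, but the training data usually follows a fixed format. Make sure you understand that format, so that you can start training quickly whenever you apply a new model.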
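As a quick check of that format, the sketch below inspects the arrays that this notebook later passes to `fit`. It assumes the diabetes dataset used in the cells that follow; the variable names `X` and `y` are chosen here for illustration.

```python
import numpy as np
from sklearn import datasets

# Load the same dataset the notebook trains on
diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# scikit-learn estimators expect X as a 2-D array (n_samples, n_features)
# and y as a 1-D array (n_samples,) for single-target regression
print(type(X), X.dtype, X.shape)
print(type(y), y.dtype, y.shape)
```

If a dataset has a single feature stored as a 1-D array, `X.reshape(-1, 1)` turns it into the 2-D layout that `fit` expects.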

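## Practice Time
Try using other datasets from sklearn datasets (boston, ...) to train your own linear regression model, and add suitable regularization to observe how training behaves.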

In [11]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

In [78]:
# Load the diabetes dataset
diabetes = datasets.load_diabetes()

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target, test_size=0.2, random_state=4)

# Build the linear regression model
lr = linear_model.LinearRegression()

# Train the regression model
lr.fit(X_train, y_train)

# Predict on the test set
y_pred = lr.predict(X_test)
y_pred

array([ 74.43391835,  93.01109527, 174.89737143,  52.70087943,
       180.63167083, 145.29216855, 112.87572919, 121.31665469,
        86.80032197,  72.09378757, 106.45223905, 193.85629433,
       182.28244231, 125.44954596, 155.88807338, 139.16274427,
       176.97750299, 119.391544  , 110.71450134, 183.79586685,
       215.37201852, 181.15778329,  58.74605273, 228.82344172,
        54.2078642 , 107.86621435, 157.1596501 , 180.13320036,
        62.20538852,  67.18498305, 190.81938489, 118.09494234,
       260.89115016, 183.19864659, 105.60921861, 175.91776536,
       176.89214476, 156.04517274, 146.9267604 , 157.34264891,
       198.17580795, 168.02789586, 237.98697647,  71.33866228,
       237.45207957, 108.07352281, 152.11732859,  50.30628893,
       199.55743787, 139.52692004, 110.45619532, 101.52014142,
       154.91606348, 228.90426313,  54.52750793, 187.29002271,
       106.15579971,  93.61499106, 190.00654826, 226.40047522,
       124.53876748,  97.0908313 , 168.71526129, 249.76

In [71]:
print(lr.coef_)

[  33.40877011 -292.24672884  481.07153405  369.06269614 -966.37849405
  589.81383056  232.61924401  288.3263166   802.72704593   37.81285219]


In [72]:
# Gap between predicted and actual values, measured with MSE
print('Mean Square error %.2f' % mean_squared_error(y_test, y_pred))

Mean Square error 2939.42
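To make the metric concrete, here is a minimal sketch that recomputes the MSE by hand and compares it with `mean_squared_error`. It rebuilds the same split and model so that it runs on its own; the RMSE and R² lines are additions for context (the notebook imports `r2_score` but does not use it), and names such as `mse_manual` are illustrative.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Recreate the split and the unregularized model from the cells above
X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
lr = linear_model.LinearRegression().fit(X_train, y_train)
y_pred = lr.predict(X_test)

# MSE is the mean of the squared residuals
mse_manual = np.mean((y_test - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_test, y_pred)

# RMSE is in the same units as the target; R^2 is unitless
rmse = np.sqrt(mse_sklearn)
r2 = r2_score(y_test, y_pred)

print('manual MSE: %.2f, sklearn MSE: %.2f' % (mse_manual, mse_sklearn))
print('RMSE: %.2f, R^2: %.3f' % (rmse, r2))
```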


In [116]:
alphalist = [0.001, 0.01, 0.1, 1, 10]
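
These `alpha` values set the strength of the penalty term. For reference, the objectives that scikit-learn documents for `Lasso` and `Ridge` can be written as below, where $n$ is the number of training samples, $w$ is the coefficient vector, and the intercept is not penalized. The symbols are notation chosen here, not taken from the notebook.

$$
\text{Lasso:}\quad \min_{w}\ \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 + \alpha\,\lVert w \rVert_1
$$

$$
\text{Ridge:}\quad \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha\,\lVert w \rVert_2^2
$$

The Lasso loss is divided by $2n$ while the Ridge loss is not, so the same numeric `alpha` does not represent the same penalty strength in the two models.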

### LASSO 

In [120]:
def lassoModel(alpha):
    # Build a linear regression model with LASSO (L1) regularization
    lso = linear_model.Lasso(alpha=alpha)

    # Train the regression model
    lso.fit(X_train, y_train)

    # Predict on the test set
    y_pred = lso.predict(X_test)
    print(lso.coef_)

    # Gap between predicted and actual values, measured with MSE
    print('alpha: %s ' % alpha, ' Mean Square error: %.2f' % mean_squared_error(y_test, y_pred), '\n')


for alpha in alphalist:
    lassoModel(alpha)

[  33.67286538 -291.04286293  481.70890733  368.32942092 -898.42842086
  538.72767558  199.24998287  274.4977345   778.40546382   37.12251045]
alpha: 0.001   Mean Square error: 2933.45 

[  33.90076776 -279.56061243  487.80357151  362.98760434 -454.07885677
  194.49093524   -0.          204.9980631   615.48787274   31.81300179]
alpha: 0.01   Mean Square error: 2905.60 

[   0.         -198.92007047  480.66671601  330.63402569  -26.57095924
   -0.         -209.48136823    0.          490.41780815    0.49979948]
alpha: 0.1   Mean Square error: 2877.23 

[  0.          -0.         321.203877    57.74744332   0.
   0.          -0.           0.         332.41817196   0.        ]
alpha: 1   Mean Square error: 3505.84 

[ 0.  0.  0.  0.  0.  0. -0.  0.  0.  0.]
alpha: 10   Mean Square error: 5460.56 
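
The outputs above show more coefficients becoming exactly zero as `alpha` grows. A minimal sketch that counts the non-zero coefficients for each `alpha` makes this easier to read at a glance; it rebuilds the same split so it runs on its own, and the dictionary name `nonzero_counts` is illustrative.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

# Recreate the split used in the notebook
X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

alphalist = [0.001, 0.01, 0.1, 1, 10]
nonzero_counts = {}
for alpha in alphalist:
    lso = linear_model.Lasso(alpha=alpha).fit(X_train, y_train)
    # Count how many of the 10 coefficients the L1 penalty left non-zero
    nonzero_counts[alpha] = int(np.count_nonzero(lso.coef_))
    print('alpha: %s  non-zero coefficients: %d' % (alpha, nonzero_counts[alpha]))
```

Because the L1 penalty can set coefficients exactly to zero, Lasso is often used as a simple form of feature selection.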



### Ridge

In [121]:
def ridgeModel(alpha):
    # Build a linear regression model with Ridge (L2) regularization
    ridge = linear_model.Ridge(alpha=alpha)

    # Train the regression model
    ridge.fit(X_train, y_train)

    # Predict on the test set
    y_pred = ridge.predict(X_test)

    print(ridge.coef_)

    # Gap between predicted and actual values, measured with MSE
    print('alpha: %s ' % alpha, ' Mean Square error: %.2f' % mean_squared_error(y_test, y_pred), '\n')


for alpha in alphalist:
    ridgeModel(alpha)



[  34.52119955 -290.84626721  482.39358233  368.07364661 -852.49470901
  501.62792289  180.12864616  270.76704605  759.76023254   37.49066186]
alpha: 0.001   Mean Square error: 2931.77 

[  38.87160545 -283.00366006  485.04076958  362.44759568 -419.24994069
  168.12496699  -18.83740996  203.84384894  594.04104023   37.9221091 ]
alpha: 0.01   Mean Square error: 2908.78 

[  44.02025512 -241.69666596  452.98163524  332.04993719  -76.34010147
  -68.52063199 -164.98817213  149.9687712   431.61985919   58.51762582]
alpha: 0.1   Mean Square error: 2894.59 

[  48.8125786   -85.49511577  270.22532535  201.91767903   17.41308665
  -19.04346706 -136.47737574  122.26503311  247.60074795   95.59855598]
alpha: 1   Mean Square error: 3221.42 

[ 19.7381678   -2.31653333  62.15925697  49.54623554  18.92715009
  12.4573754  -39.60090964  42.81978067  61.57147383  35.24730561]
alpha: 10   Mean Square error: 4589.00 
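
Unlike Lasso, the Ridge outputs above shrink the coefficients without setting them exactly to zero. One way to summarize this is to track the L2 norm of the coefficient vector for each `alpha`, as in the sketch below. It rebuilds the same split so it runs on its own, and the list name `coef_norms` is illustrative.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

# Recreate the split used in the notebook
X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

alphalist = [0.001, 0.01, 0.1, 1, 10]
coef_norms = []
for alpha in alphalist:
    ridge = linear_model.Ridge(alpha=alpha).fit(X_train, y_train)
    # L2 norm of the coefficients: overall size of the fitted weights
    coef_norms.append(float(np.linalg.norm(ridge.coef_)))
    print('alpha: %s  ||coef||_2: %.2f' % (alpha, coef_norms[-1]))
```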



In [None]:
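# Among the alpha values tested, both LASSO and Ridge reach their lowest test MSE at alpha=0.1 (2877.23 and 2894.59, versus 2939.42 without regularization)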
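
The comparison above picks `alpha` by looking at test-set error. A common alternative is to choose `alpha` by cross-validation on the training set only, which keeps the test set untouched for the final evaluation. The sketch below uses scikit-learn's `LassoCV` and `RidgeCV` with the same candidate list; the 5-fold setting is an assumption made here, and the selected values may differ from the test-set choice above.

```python
from sklearn import datasets
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Recreate the split used in the notebook
X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

alphalist = [0.001, 0.01, 0.1, 1, 10]

# Select alpha with 5-fold cross-validation on the training data only
lasso_cv = LassoCV(alphas=alphalist, cv=5).fit(X_train, y_train)
ridge_cv = RidgeCV(alphas=alphalist, cv=5).fit(X_train, y_train)

# Evaluate the selected models once on the held-out test set
lasso_mse = mean_squared_error(y_test, lasso_cv.predict(X_test))
ridge_mse = mean_squared_error(y_test, ridge_cv.predict(X_test))

print('LassoCV alpha: %s  test MSE: %.2f' % (lasso_cv.alpha_, lasso_mse))
print('RidgeCV alpha: %s  test MSE: %.2f' % (ridge_cv.alpha_, ridge_mse))
```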