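## [Assignment Focus]
Use the Lasso and Ridge models in Sklearn to train on various datasets. Make sure you understand the **data type** of what is fed into the model for training, and also the meaning of each model parameter.

There are many kinds of machine learning models, but training data mostly comes in a fixed format. Make sure you understand that format, so that when you apply a new model you can get started with training as quickly as possible!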
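As a minimal sketch of what "understanding the data format" can look like in practice, the snippet below inspects the objects that a sklearn dataset hands to a model. It uses `load_diabetes` purely as an illustrative choice (it ships with scikit-learn and needs no download); the variable names `X` and `y` are assumptions for this example.

```python
import numpy as np
from sklearn import datasets

# load_diabetes is bundled with scikit-learn; chosen here only for illustration
dataset = datasets.load_diabetes()

X = dataset["data"]    # feature matrix: one row per sample, one column per feature
y = dataset["target"]  # target vector: one value per sample

# Both are plain NumPy arrays; estimators expect X to be 2-D and y to be 1-D
print(type(X), X.dtype, X.shape)
print(type(y), y.dtype, y.shape)
print(dataset["feature_names"])
```

Most sklearn estimators accept this same pair (a 2-D array of features and a 1-D array of targets), which is why checking these shapes first tends to make a new model easy to pick up.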

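## Practice Time
Try using other datasets from sklearn datasets (boston, ...) to train your own linear regression model, and add appropriate regularization to observe how training behaves.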
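For reference, here is a sketch of the objectives the two regularized models minimize, written in the form given in the scikit-learn documentation. The symbols are notation introduced for this note: $X$ is the feature matrix, $y$ the target vector, $w$ the coefficient vector, and $n$ the number of training samples. The intercept is not penalized.

$$
\text{Lasso:}\quad \min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 + \alpha\,\lVert w \rVert_1
$$

$$
\text{Ridge:}\quad \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha\,\lVert w \rVert_2^2
$$

In both cases a larger `alpha` means stronger regularization, and as `alpha` approaches 0 the solution approaches ordinary least squares. The L1 penalty in Lasso can set coefficients exactly to zero, while the L2 penalty in Ridge only shrinks them toward zero.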

In [86]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score
from sklearn.preprocessing import MinMaxScaler, LabelEncoder
import warnings

warnings.filterwarnings(action='ignore')
pd.set_option('display.max_columns',20)
pd.set_option('display.width', 320)

In [80]:
def evaluate_lm(model, train_X, test_X, train_y, test_y):
    """Fit the model on the training split; return its coefficients and test-set MSE."""
    model.fit(train_X, train_y)
    pred_y = model.predict(test_X)
    mse = mean_squared_error(test_y, pred_y)
    return model.coef_, mse

In [97]:
def corss_evaluate_lm(models, dataset):
    """Evaluate each model on the same 75/25 train/test split of the dataset."""
    train_X, test_X, train_y, test_y = train_test_split(
        dataset['data'], dataset['target'], test_size=0.25, random_state=4)

    coef_list = list()
    mse_list = list()
    for m in models:
        coef, mse = evaluate_lm(m, train_X, test_X, train_y, test_y)
        coef_list.append(coef)
        mse_list.append(mse)

    # The row labels assume exactly three models, passed in the order LR, Lasso, Ridge
    df = pd.DataFrame({"coef": coef_list, "MSE": mse_list}, index=['LR', 'LASSO', 'Ridge'])
    return df


In [178]:
# Iris dataset (the target is a class label 0-2, fitted here as a numeric value)
dataset = datasets.load_iris()
df = corss_evaluate_lm([LinearRegression(), Lasso(alpha=0.001), Ridge(alpha=0.09)], dataset)
df.head()

| | coef | MSE |
|---|---|---|
| LR | [-0.16123525788185863, -0.025981173734618224, ... | 0.04902 |
| LASSO | [-0.15701034971858832, -0.02032600900275493, 0... | 0.048536 |
| Ridge | [-0.16146469248420342, -0.024625896943732673, ... | 0.048877 |


In [167]:
# Wine dataset (the target is a class label 0-2, fitted here as a numeric value)
dataset = datasets.load_wine()
df = corss_evaluate_lm([LinearRegression(), Lasso(alpha=0.000001), Ridge(alpha=0.0001)], dataset)
df.head()

| | coef | MSE |
|---|---|---|
| LR | [-0.10139884196346022, 0.00890643104939967, -0... | 0.065124 |
| LASSO | [-0.10139377952886096, 0.008905867597935418, -... | 0.065125 |
| Ridge | [-0.1013988069782459, 0.008906441807992525, -0... | 0.065124 |


In [158]:
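# Boston dataset
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell only runs on older scikit-learn versions.
dataset = datasets.load_boston()
df = corss_evaluate_lm([LinearRegression(), Lasso(alpha=0.00000001), Ridge(alpha=0.00001)], dataset)
df.head()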

| | coef | MSE |
|---|---|---|
| LR | [-0.11890155774498122, 0.050568940069523956, -... | 26.951426 |
| LASSO | [-0.1189015560577721, 0.050568940310572734, -0... | 26.951426 |
| Ridge | [-0.1189014965936104, 0.05056895021575447, -0.... | 26.951433 |


In [140]:
# Breast cancer dataset (the target is a binary label 0/1, fitted here as a numeric value)
dataset = datasets.load_breast_cancer()
df = corss_evaluate_lm([LinearRegression(), Lasso(alpha=0.00003), Ridge(alpha=0.004)], dataset)
df.head()

| | coef | MSE |
|---|---|---|
| LR | [0.24562233023137803, -0.004481769031142763, -... | 0.059693 |
| LASSO | [0.03770127341004365, -0.006183951931555587, -... | 0.056349 |
| Ridge | [0.13373932187137977, -0.008777678820532198, -... | 0.057293 |
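The `alpha` values used in the cells above are very small, so in most cases the regularized models end up almost identical to plain linear regression. To observe the effect of regularization more clearly, one option is to sweep `alpha` over several orders of magnitude. The sketch below does this on `load_diabetes`; the dataset and the specific `alpha` values are illustrative assumptions, not part of the original exercise.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same split settings as the cells above, on an illustrative regression dataset
dataset = datasets.load_diabetes()
train_X, test_X, train_y, test_y = train_test_split(
    dataset["data"], dataset["target"], test_size=0.25, random_state=4)

alphas = [1e-4, 1e-2, 1.0, 100.0]  # illustrative values spanning weak to strong regularization
results = []
for alpha in alphas:
    for name, cls in [("LASSO", Lasso), ("Ridge", Ridge)]:
        model = cls(alpha=alpha).fit(train_X, train_y)
        mse = mean_squared_error(test_y, model.predict(test_X))
        n_zero = int(np.sum(model.coef_ == 0))       # coefficients set exactly to zero
        l2_norm = float(np.linalg.norm(model.coef_))  # overall coefficient size
        results.append((name, alpha, n_zero, l2_norm, mse))
        print(f"{name:5s} alpha={alpha:<8g} zero_coefs={n_zero:2d} "
              f"coef_norm={l2_norm:10.3f} MSE={mse:10.3f}")
```

Reading the printed rows side by side shows the two penalties behaving differently as `alpha` grows: Lasso zeroes out more and more coefficients, while Ridge keeps them nonzero but shrinks their overall size.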
