### Regularization
* Techniques that keep a model from overfitting the training data by penalizing large coefficients
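* Lasso
    - A linear regression model with an L1 penalty added
    - Tends to shrink the coefficients of less influential features all the way to 0
        * Regression coefficient: the change in the dependent variable for a one-unit change in an independent variable (holding the others fixed), i.e. how strongly that variable affects the target
    - alpha controls the strength of the penalty. Too small an alpha can leave the model overfit, while too large an alpha can underfit.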
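* Ridge
    - A linear regression model with an L2 penalty added
    - Shrinks coefficients toward 0 but generally does not make them exactly 0
    - alpha controls the strength of the penalty, with the same overfitting/underfitting trade-off as Lasso.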
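* ElasticNet
    - Combines the L1 and L2 penalties
    - Suited to datasets with many features
    - The L1 penalty reduces the number of features in use, and the L2 penalty keeps the remaining coefficients small
    - Parameters
        * alpha: the L1 strength (a) plus the L2 strength (b), so alpha = a + b and l1_ratio = a / (a + b)
        * l1_ratio = 0: pure L2 penalty; the closer l1_ratio is to 0, the more the model behaves like Ridge
        * l1_ratio = 1: pure L1 penalty; the closer l1_ratio is to 1, the more the model behaves like Lasso
        * 0 < l1_ratio < 1: a mix of the L1 and L2 penalties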
* Coefficient: the value the model computes from the training data when it is fitted
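For reference, the scikit-learn documentation writes the three objectives roughly as follows, where n is the number of samples, ρ is `l1_ratio`, and the intercept is not penalized. Note that Lasso and ElasticNet scale the squared error by 1/(2n) while Ridge does not, so the same alpha value is not directly comparable between Ridge and the other two models.

$$
\begin{aligned}
\text{Lasso:}\quad & \min_{w}\ \frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1 \\
\text{Ridge:}\quad & \min_{w}\ \lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_2^2 \\
\text{ElasticNet:}\quad & \min_{w}\ \frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\rho\lVert w\rVert_1 + \frac{\alpha(1-\rho)}{2}\lVert w\rVert_2^2
\end{aligned}
$$

Setting a = αρ and b = α(1 − ρ) in the ElasticNet line gives back alpha = a + b and l1_ratio = a / (a + b) from the parameter list above.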

```python
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')  # hide library warnings (e.g. deprecation notices) in the output
```

### Loading the dataset

```python
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs scikit-learn < 1.2
from sklearn.datasets import load_boston
data = load_boston()
print(data['DESCR'])  # dataset description
```

```
.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**

    :Number of Instances: 506

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
```

```python
# Features as columns, plus the target (median house value) as PRICE
df = pd.DataFrame(data['data'], columns=data['feature_names'])
df['PRICE'] = data['target']
df.head()
```

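|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | PRICE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |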


### Splitting into train and test sets

```python
from sklearn.model_selection import train_test_split

y = df['PRICE']                 # target
X = df.drop(['PRICE'], axis=1)  # features

# Hold out 30% of the rows as a test set; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=62)
```

```python
X_train.shape, X_test.shape
```

```
((354, 13), (152, 13))
```
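Why 152 test rows rather than 0.3 × 506 = 151.8? A minimal sketch, assuming only that `train_test_split` rounds a fractional `test_size` up and gives the remaining rows to the training set: splitting 506 plain row indices with the same settings reproduces the shapes above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 506 row indices stand in for the dataset; only the count matters here
rows = np.arange(506)
train_rows, test_rows = train_test_split(rows, test_size=0.3, random_state=62)

# The fractional test size is rounded up: ceil(0.3 * 506) = ceil(151.8) = 152
print(len(train_rows), len(test_rows))  # → 354 152
```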

### Building a Lasso (L1) model


```python
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.07)
lasso.fit(X_train, y_train)
train_pred = lasso.predict(X_train)
test_pred = lasso.predict(X_test)

# score() returns R^2; mean_squared_error is the mean of the squared errors
print("test : ", lasso.score(X_test, y_test),
      'mse : ', mean_squared_error(y_test, test_pred))

print("train : ", lasso.score(X_train, y_train),
      'mse : ', mean_squared_error(y_train, train_pred))
```

```
test :  0.7315170233570516 mse :  24.24600967947954
train :  0.7248696715438718 mse :  22.46812664143133
```
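The two numbers printed above measure different things: `score()` on a regressor returns the coefficient of determination R² (1 is a perfect fit, so higher is better), while `mean_squared_error` is in squared units of the target (lower is better). A minimal check on synthetic data from `make_regression` (an assumption, not the Boston set) that `score()` matches `r2_score` and equals 1 − MSE / Var(y):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in data (assumption): not the Boston housing set
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model = Lasso(alpha=0.07).fit(X, y)
pred = model.predict(X)

score = model.score(X, y)                                   # R^2 reported by the estimator
r2 = r2_score(y, pred)                                      # same quantity from sklearn.metrics
r2_from_mse = 1 - mean_squared_error(y, pred) / np.var(y)   # R^2 = 1 - MSE / Var(y)

print(np.isclose(score, r2), np.isclose(score, r2_from_mse))  # → True True
```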


### Hyperparameters and grid search

```python
alphas = [0.07, 0.1, 0.5, 1.3, 2]
for alpha in alphas:
    lasso = Lasso(alpha=alpha)
    lasso.fit(X_train, y_train)

    train_pred = lasso.predict(X_train)
    test_pred = lasso.predict(X_test)

    train_score = lasso.score(X_train, y_train)   # R^2 on the training set
    test_score = lasso.score(X_test, y_test)      # R^2 on the test set
    train_mse = mean_squared_error(y_train, train_pred)
    test_mse = mean_squared_error(y_test, test_pred)

    print('alpha : ', alpha)
    print('train score : ', train_score, 'mse : ', train_mse)
    print('test score : ', test_score, 'mse : ', test_mse)
    print('============')
```

```
alpha :  0.07
train score :  0.7248696715438718 mse :  22.46812664143133
test score :  0.7315170233570516 mse :  24.24600967947954
alpha :  0.1
train score :  0.7237532819130617 mse :  22.559295011516067
test score :  0.7319784944128576 mse :  24.20433541086938
alpha :  0.5
train score :  0.7064351942782625 mse :  23.9735519869266
test score :  0.7214728619622657 mse :  25.15307215861802
alpha :  1.3
train score :  0.6481352834449008 mse :  28.73453121861824
test score :  0.6623076594679015 mse :  30.496129995295945
alpha :  2
train score :  0.6196354069647545 mse :  31.061933063461336
test score :  0.6385534109506746 mse :  32.641315312737454
```
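As alpha grows from 0.07 to 2, both train and test R² drop: the stronger L1 penalty pushes coefficients toward 0 and eventually to exactly 0. A sketch on synthetic data (an assumption: 10 features, only 3 of which actually drive y) that counts how many coefficients stay nonzero at each alpha:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (assumption): 10 features, only 3 informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

alphas = [0.01, 1, 10, 1000]
nonzero_counts = []
for alpha in alphas:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    nonzero_counts.append(int(np.sum(coef != 0)))  # features the model still uses

print(dict(zip(alphas, nonzero_counts)))
```

With a large enough alpha every coefficient is 0 and the model simply predicts the mean of y, which is the underfitting end of the trade-off described at the top.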


```python
from sklearn.model_selection import GridSearchCV
params = {'alpha': [0.07, 0.1, 0.5, 1.3, 2]}

lasso = Lasso()
# 5-fold cross-validation on the training set; n_jobs=-1 uses all CPU cores
grid_cv = GridSearchCV(lasso, param_grid=params, cv=5, n_jobs=-1)

grid_cv.fit(X_train, y_train)

print('Best hyperparameters : ', grid_cv.best_params_)
print('train : ', grid_cv.score(X_train, y_train))
print('test : ', grid_cv.score(X_test, y_test))
```

```
Best hyperparameters :  {'alpha': 0.07}
train :  0.7248696715438718
test :  0.7315170233570516
```
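`GridSearchCV` picks alpha by the mean R² over the 5 validation folds of the training set only, then refits the best model on all of `X_train`; `grid_cv.score` uses that refitted model. This is why it chose 0.07 even though alpha=0.1 scored marginally higher on the test set in the loop above. A sketch on synthetic data (an assumption) with the same alpha grid, showing how the per-candidate validation scores can be inspected through `cv_results_`:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): not the Boston housing set
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

grid_cv = GridSearchCV(Lasso(), param_grid={'alpha': [0.07, 0.1, 0.5, 1.3, 2]}, cv=5)
grid_cv.fit(X, y)

# One row per candidate alpha: mean and spread of R^2 over the validation folds
results = pd.DataFrame(grid_cv.cv_results_)[
    ['param_alpha', 'mean_test_score', 'std_test_score', 'rank_test_score']]
print(results)

# best_params_ is the candidate ranked 1, i.e. the highest mean validation score
best_row = results.loc[results['rank_test_score'] == 1].iloc[0]
print(best_row['param_alpha'] == grid_cv.best_params_['alpha'])  # → True
```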


### Checking which coefficients have little influence
* The fitted regression coefficients are stored in the coef_ attribute.

```python
lasso = Lasso(alpha=0.07)
lasso.fit(X_train, y_train)
print(X_train.columns)
lasso.coef_   # one coefficient per column, in the same order
```

```
Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT'],
      dtype='object')

array([-0.10166253,  0.04797193, -0.02495535,  2.27344937, -0.        ,
        3.64086796, -0.01396383, -1.21125038,  0.2190226 , -0.01217624,
       -0.77323459,  0.00771795, -0.57778092])
```

The fifth coefficient, for NOX, is -0., meaning Lasso has set it to exactly 0.
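One plausible factor behind the zeroed NOX coefficient: the features here are not standardized, and the L1 penalty charges every coefficient by its raw size. A feature on a small numeric scale (NOX values in the table above are around 0.5) needs a large coefficient to matter, so it pays a large penalty; Ridge, further down, gives NOX a coefficient of about -15.6. Standardizing features before fitting is a common remedy; it is a swapped-in technique, not part of the original notebook. A sketch on synthetic data (an assumption) where one feature's units are shrunk by 1000:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): three equally important features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + 3 * X[:, 1] + 3 * X[:, 2] + rng.normal(scale=0.5, size=200)

# Express the third feature in units 1000 times larger, so its true coefficient becomes 3000
X_small = X.copy()
X_small[:, 2] *= 0.001

raw = Lasso(alpha=0.5).fit(X_small, y)
scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.5)).fit(X_small, y)

print(raw.coef_)                          # the third coefficient is driven to 0
print(scaled.named_steps['lasso'].coef_)  # after standardizing, it is kept
```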

### Building a Ridge (L2) model
* Shrinks regression coefficients toward 0 but generally does not make them exactly 0
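Why Ridge shrinks coefficients without zeroing them: its objective is smooth, and (ignoring the intercept) it has the closed-form solution w = (XᵀX + αI)⁻¹Xᵀy. Adding α to the diagonal shrinks the solution continuously as α grows instead of cutting individual coefficients off at 0. A sketch on synthetic data (an assumption), fitted without an intercept so the formula applies directly:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic stand-in data (assumption): 5 informative features
X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=0)
alpha = 10.0

# Closed-form ridge solution without intercept: solve (X^T X + alpha * I) w = X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)

print(np.allclose(ridge.coef_, w_closed))  # → True
print(ridge.coef_)                         # shrunk, but none exactly 0
```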
### Hyperparameters and GridSearchCV

```python
from sklearn.linear_model import Ridge

alphas = [0.01, 0.1, 1, 10, 100]
```

```python
for alpha in alphas:
    ridge = Ridge(alpha=alpha)   # the model here is Ridge, so name it accordingly
    ridge.fit(X_train, y_train)

    train_pred = ridge.predict(X_train)
    test_pred = ridge.predict(X_test)

    train_score = ridge.score(X_train, y_train)   # R^2 on the training set
    test_score = ridge.score(X_test, y_test)      # R^2 on the test set
    train_mse = mean_squared_error(y_train, train_pred)
    test_mse = mean_squared_error(y_test, test_pred)

    print('alpha : ', alpha)
    print('train score : ', train_score, 'mse : ', train_mse)
    print('test score : ', test_score, 'mse : ', test_mse)
    print('============')
```


```
alpha :  0.01
train score :  0.7346764750602819 mse :  21.667267991675196
test score :  0.7479396123757658 mse :  22.762927745240564
alpha :  0.1
train score :  0.7346066990135265 mse :  21.672966153211355
test score :  0.7471900208389715 mse :  22.830621436229922
alpha :  1
train score :  0.7325079915460022 mse :  21.844354111159326
test score :  0.7415374169631822 mse :  23.341093608433663
alpha :  10
train score :  0.7263040676657218 mse :  22.35098872391948
test score :  0.7334265758857685 mse :  24.07356288350886
alpha :  100
train score :  0.707472496673353 mse :  23.888842163370867
test score :  0.7186488673320299 mse :  25.408099877677298
```
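In the loop above the train R² falls steadily (0.7347 at alpha=0.01 down to 0.7075 at alpha=100) as the L2 penalty grows. The mechanism is that the overall size of the coefficient vector shrinks as alpha increases. A sketch on synthetic data (an assumption, not the Boston set) that prints the L2 norm of `Ridge.coef_` for the same alpha grid:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic stand-in data (assumption)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

alphas = [0.01, 0.1, 1, 10, 100]   # same grid as the Ridge loop above
norms = [np.linalg.norm(Ridge(alpha=a).fit(X, y).coef_) for a in alphas]
for a, n in zip(alphas, norms):
    print(f'alpha={a:>6}: ||w||_2 = {n:.3f}')
```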


```python
from sklearn.model_selection import GridSearchCV
params = {'alpha': [0.01, 0.1, 1, 10, 100]}

ridge = Ridge()
# 5-fold cross-validation on the training set; n_jobs=-1 uses all CPU cores
grid_cv = GridSearchCV(ridge, param_grid=params, cv=5, n_jobs=-1)

grid_cv.fit(X_train, y_train)

print('Best hyperparameters : ', grid_cv.best_params_)
print('train : ', grid_cv.score(X_train, y_train))
print('test : ', grid_cv.score(X_test, y_test))
```

```
Best hyperparameters :  {'alpha': 0.01}
train :  0.7346764750602819
test :  0.7479396123757658
```


```python
ridge = Ridge(alpha=0.01)
ridge.fit(X_train, y_train)
print(X_train.columns)
ridge.coef_   # one coefficient per column, in the same order
```

```
Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT'],
      dtype='object')

array([-1.11737208e-01,  4.44769908e-02,  2.72343147e-02,  3.19935163e+00,
       -1.56424780e+01,  3.70108235e+00, -3.40152534e-03, -1.46034656e+00,
        2.60073435e-01, -1.09004395e-02, -9.30784278e-01,  7.51138791e-03,
       -5.34889305e-01])
```

### ElasticNet
* Combines the L1 and L2 penalties
* l1_ratio
    - The closer to 0, the closer to L2 (Ridge)
    - The closer to 1, the closer to L1 (Lasso)
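scikit-learn's `ElasticNet` writes its penalty as alpha·l1_ratio·‖w‖₁ + 0.5·alpha·(1 − l1_ratio)·‖w‖₂². So a penalty a·‖w‖₁ + 0.5·b·‖w‖₂² corresponds to alpha = a + b and l1_ratio = a / (a + b). The sketch below checks this with a hypothetical helper `to_elasticnet_params` (not a scikit-learn function) and confirms on synthetic data (an assumption) that l1_ratio=1 reproduces Lasso:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

def to_elasticnet_params(a, b):
    """Hypothetical helper: map the penalty a*||w||_1 + 0.5*b*||w||_2^2
    to scikit-learn's (alpha, l1_ratio)."""
    alpha = a + b
    return alpha, a / alpha

alpha, l1_ratio = to_elasticnet_params(a=0.3, b=0.1)

# Both ways of writing the penalty give the same value for an arbitrary weight vector
w = np.array([1.5, -2.0, 0.0, 0.5])
penalty_ab = 0.3 * np.abs(w).sum() + 0.5 * 0.1 * np.sum(w ** 2)
penalty_sklearn = (alpha * l1_ratio * np.abs(w).sum()
                   + 0.5 * alpha * (1 - l1_ratio) * np.sum(w ** 2))
print(alpha, l1_ratio, np.isclose(penalty_ab, penalty_sklearn))

# With l1_ratio=1 only the L1 term remains, so ElasticNet reduces to Lasso
X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=0)
enet = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print(np.allclose(enet.coef_, lasso.coef_))  # → True
```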

```python
from sklearn.linear_model import ElasticNet

ratios = [0.2, 0.5, 0.8]
alphas = [0.1, 0.7, 1.5]
```

```python
for alpha in alphas:
    for ratio in ratios:
        el = ElasticNet(alpha=alpha, l1_ratio=ratio)
        el.fit(X_train, y_train)

        train_pred = el.predict(X_train)
        test_pred = el.predict(X_test)

        train_score = el.score(X_train, y_train)   # R^2 on the training set
        test_score = el.score(X_test, y_test)      # R^2 on the test set

        train_mse = mean_squared_error(y_train, train_pred)
        test_mse = mean_squared_error(y_test, test_pred)

        print('alpha : ', alpha, 'ratio : ', ratio)
        print('train score : ', train_score, 'mse : ', train_mse)
        print('test score : ', test_score, 'mse : ', test_mse)
        print('============')
```

```
alpha :  0.1 ratio :  0.2
train score :  0.7204973693611653 mse :  22.82518447546551
test score :  0.7296547702305272 mse :  24.41418498763473
alpha :  0.1 ratio :  0.5
train score :  0.7214541237372115 mse :  22.74705249122961
test score :  0.7303225463424128 mse :  24.35387983803101
alpha :  0.1 ratio :  0.8
train score :  0.7228141466531166 mse :  22.63598815571533
test score :  0.7314335443018998 mse :  24.253548459049664
alpha :  0.7 ratio :  0.2
train score :  0.6901739272760627 mse :  25.301505209698316
test score :  0.7035669872214132 mse :  26.77010582576043
alpha :  0.7 ratio :  0.5
train score :  0.6902294842780352 mse :  25.29696822621438
test score :  0.7043923079947568 mse :  26.695573221461366
alpha :  0.7 ratio :  0.8
train score :  0.692424964353938 mse :  25.117677470953314
test score :  0.7065509814240449 mse :  26.500628955289283
alpha :  1.5 ratio :  0.2
train score :  0.6690866533323439 mse :  27.023567419810416
test score :  0.6852820138548638 mse :  28.421374918
```

```python
params = {'l1_ratio': [0.2, 0.5, 0.8],
          'alpha': [0.001, 0.1, 0.7]}

el = ElasticNet()
# Every (l1_ratio, alpha) pair is scored with 5-fold cross-validation
grid_cv = GridSearchCV(el, param_grid=params, cv=5, n_jobs=-1)

grid_cv.fit(X_train, y_train)

print('Best hyperparameters : ', grid_cv.best_params_)
print('train : ', grid_cv.score(X_train, y_train))
print('test : ', grid_cv.score(X_test, y_test))
```

```
Best hyperparameters :  {'alpha': 0.001, 'l1_ratio': 0.8}
train :  0.7346180073100931
test :  0.747286425699128
```
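In all three grid searches the selected alpha was the smallest value in its grid (0.07 for Lasso, 0.01 for Ridge, 0.001 for ElasticNet). When the best value sits on the edge of the grid, the optimum may lie beyond it, so it is usually worth extending the grid in that direction before drawing conclusions. A sketch with a hypothetical helper `params_at_grid_edge` (not a scikit-learn function), assuming a dict-style `param_grid` and synthetic, almost noise-free data where regularization can only hurt:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

def params_at_grid_edge(grid_cv):
    """Hypothetical helper: names of hyperparameters whose best value is the
    smallest or largest value in a dict-style param_grid."""
    edge = []
    for name, best in grid_cv.best_params_.items():
        values = sorted(grid_cv.param_grid[name])
        if best in (values[0], values[-1]):
            edge.append(name)
    return edge

# Synthetic, nearly noise-free data (assumption)
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)
params = {'l1_ratio': [0.2, 0.5, 0.8], 'alpha': [0.001, 0.1, 0.7]}
grid_cv = GridSearchCV(ElasticNet(), param_grid=params, cv=5).fit(X, y)

print(grid_cv.best_params_, params_at_grid_edge(grid_cv))
```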
