# Grid Search
- Commonly used to find the hyperparameters that best suit a model
- Hyperparameter: a setting external to the model, chosen by the user
- Parameter: a value internal to the model, determined through training

<br/>

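- To tune hyperparameters, grid search trains and evaluates every combination of the candidate values
    - n_estimators: number of trees in a tree-ensemble model
    - learning_rate: step size of each update in gradient descent (in boosting, how much each new tree contributes)
        - Scales how far each step moves along the gradient
        - Usually kept small, roughly 0.01 to 0.1
    - min_samples_leaf, max_features
    - max_depth: for tree models, often one of the settings that affects performance the most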
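
To make "every combination" concrete, the sketch below enumerates a small grid using only the standard library. The candidate values are made up for illustration and do not come from these notes; the point is only that the number of models to fit is the product of the list lengths.

```python
from itertools import product

# Hypothetical candidate values, chosen only for illustration
param_grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.01, 0.1],
    "max_depth": [3, 5, 7],
}

names = list(param_grid)
# Cartesian product of all candidate lists: one dict per combination
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

print(len(combinations))  # 2 * 2 * 3 = 12 combinations
print(combinations[0])    # {'n_estimators': 100, 'learning_rate': 0.01, 'max_depth': 3}
```

With 5-fold cross validation these 12 combinations already mean 60 model fits, which is why grids are usually kept small.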

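# Cross Validation
- A model built and evaluated on one fixed training set can overfit to that particular split; cross validation evaluates it on several different splits instead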

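# K-Fold Cross Validation
- Split the data into K folds of (nearly) equal size
    - Train K times, holding out a different fold for evaluation each time and training on the remaining folds
    - Use scikit-learn's model_selection.KFold (K: the number of rounds; fold: one of the pieces the data is divided into)
- How to run cross validation
    - model: an instance of the model you want to test
    - cv: the number of folds to split the data into
    - cross_val_score returns the evaluation score of each fold
    - cross_val_predict returns the cross-validated predictions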
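
The steps above can be sketched as a manual loop over `KFold`, followed by `cross_val_predict`. This is a minimal illustration on the digits data with a default KNN model, not a prescribed workflow. One caveat: when `cv` is an integer and the model is a classifier, scikit-learn uses stratified folds, so the scores from `cross_val_score(model, x, y, cv=5)` can differ slightly from the plain `KFold` loop shown here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

x, y = load_digits(return_X_y=True)
model = KNeighborsClassifier()

# Manual K-fold loop: each fold serves as the evaluation set exactly once
kf = KFold(n_splits=5)
fold_scores = []
for train_idx, test_idx in kf.split(x):
    model.fit(x[train_idx], y[train_idx])
    fold_scores.append(model.score(x[test_idx], y[test_idx]))

print(np.round(fold_scores, 3))

# cross_val_predict: each sample is predicted by a model that did not train on it
pred = cross_val_predict(model, x, y, cv=5)
print(pred.shape, (pred == y).mean())
```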

In [1]:
from sklearn.datasets import load_digits
digits = load_digits()

y = digits['target']
x = digits['data']

In [2]:
# KNN model
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()

# No manual split is needed: cross_val_score divides the data into folds itself.
from sklearn.model_selection import cross_val_score
cross_val_score(model, x, y, cv=5)

array([0.94722222, 0.95555556, 0.96657382, 0.98050139, 0.9637883 ])

In [3]:
# Grid Search
from sklearn.model_selection import GridSearchCV

params = {
    'n_neighbors' : range(1, 10)
}

gs = GridSearchCV(model, params).fit(x, y)

In [7]:
# Results for the 9 candidates (n_neighbors = 1..9), each evaluated with 5-fold CV
import pandas as pd

pd.DataFrame(gs.cv_results_)

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_n_neighbors | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000876 | 0.000379 | 0.017084 | 0.005656 | 1 | {'n_neighbors': 1} | 0.961111 | 0.952778 | 0.966574 | 0.986072 | 0.955432 | 0.964393 | 0.011838 | 3 |
| 1 | 0.000413 | 9.4e-05 | 0.012427 | 0.002052 | 2 | {'n_neighbors': 2} | 0.961111 | 0.966667 | 0.969359 | 0.977716 | 0.961003 | 0.967171 | 0.006181 | 1 |
| 2 | 0.000356 | 8e-06 | 0.012341 | 0.000419 | 3 | {'n_neighbors': 3} | 0.955556 | 0.958333 | 0.966574 | 0.986072 | 0.966574 | 0.966622 | 0.010672 | 2 |
| 3 | 0.000366 | 8e-06 | 0.014077 | 0.000358 | 4 | {'n_neighbors': 4} | 0.947222 | 0.958333 | 0.966574 | 0.980501 | 0.966574 | 0.963841 | 0.010946 | 4 |
| 4 | 0.000365 | 1.2e-05 | 0.013697 | 0.000645 | 5 | {'n_neighbors': 5} | 0.947222 | 0.955556 | 0.966574 | 0.980501 | 0.963788 | 0.962728 | 0.011169 | 5 |
| 5 | 0.000366 | 7e-06 | 0.014107 | 0.00024 | 6 | {'n_neighbors': 6} | 0.944444 | 0.958333 | 0.966574 | 0.97493 | 0.952646 | 0.959386 | 0.010612 | 7 |
| 6 | 0.000384 | 2.3e-05 | 0.013606 | 0.000174 | 7 | {'n_neighbors': 7} | 0.936111 | 0.961111 | 0.969359 | 0.980501 | 0.952646 | 0.959946 | 0.015059 | 6 |
| 7 | 0.000387 | 3.3e-05 | 0.01381 | 0.000531 | 8 | {'n_neighbors': 8} | 0.936111 | 0.958333 | 0.969359 | 0.977716 | 0.949861 | 0.958276 | 0.01458 | 8 |
| 8 | 0.000378 | 4e-06 | 0.013885 | 0.000617 | 9 | {'n_neighbors': 9} | 0.930556 | 0.952778 | 0.972145 | 0.977716 | 0.949861 | 0.956611 | 0.016887 | 9 |


In [8]:
gs.best_score_

0.9671711544413494

In [9]:
gs.best_params_

{'n_neighbors': 2}

In [10]:
gs.best_estimator_

KNeighborsClassifier(n_neighbors=2)
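
Note that `best_score_` is a cross-validation mean computed on the same data the search used to pick `n_neighbors`. A common follow-up, sketched here as an assumption rather than something these notes do, is to hold out a test set first and score the refitted best model on it. The variable names and the split are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

x, y = load_digits(return_X_y=True)
# Keep a test set aside so the final score comes from data the search never saw
x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)

gs = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": range(1, 10)}, cv=5)
gs.fit(x_tr, y_tr)

# With the default refit=True, gs forwards predict/score to best_estimator_,
# which was retrained on all of x_tr with the best parameters
print(gs.best_params_)
print(gs.score(x_te, y_te))
```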

# XGBoost
- An ensemble machine learning algorithm based on decision trees

In [26]:
conda install py-xgboost

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/harin/opt/anaconda3

  added / updated specs:
    - py-xgboost


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _py-xgboost-mutex-2.0      |            cpu_0           8 KB
    conda-4.14.0               |   py39hecd8cb5_0         925 KB
    libxgboost-1.5.0           |       he9d5cce_2         1.2 MB
    py-xgboost-1.5.0           |   py39hecd8cb5_2         154 KB
    ------------------------------------------------------------
                                           Total:         2.3 MB

The following NEW packages will be INSTALLED:

  _py-xgboost-mutex  pkgs/main/osx-64::_py-xgboost-mutex-2.0-cpu_0
  libxgboost         pkgs/main/osx-64::libxgboost-1.5.0-he9d5cce_2
  py-xgboost         pkgs/main/osx-64::py-xgboost-1.5.0-py39hecd8cb5_2

The followi

In [29]:
import warnings
warnings.filterwarnings('ignore')

import xgboost as xgb
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

boston = load_boston()

x = boston['data']
y = boston['target']

# Hold out a test set (25% of the rows by default)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)

# Fit XGBoost with default settings and compare train vs. test R^2
model = xgb.XGBRegressor()
model.fit(x_tr, y_tr)
model.score(x_tr, y_tr), model.score(x_te, y_te)


(0.999999003030776, 0.7476326752660457)

In [31]:
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()

model.fit(x_tr, y_tr)
model.score(x_tr, y_tr), model.score(x_te, y_te)

(0.9838691924682718, 0.8140666424127151)
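
The gap between the training and test scores above (roughly 1.00 vs 0.75 for XGBoost) suggests the default settings overfit. One possible follow-up is to grid-search the hyperparameters listed in the Grid Search section. The sketch below is an assumption, not part of the original notebook: it swaps in scikit-learn's `GradientBoostingRegressor` and `load_diabetes` as stand-ins so that it runs without XGBoost or the Boston data, and the grid values are illustrative only.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in dataset (assumption): the notebook itself uses load_boston
x, y = load_diabetes(return_X_y=True)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)

# Illustrative grid: 2 * 2 * 2 = 8 combinations
params = {
    "n_estimators": [50, 100],
    "learning_rate": [0.01, 0.1],
    "max_depth": [2, 3],
}

gs = GridSearchCV(GradientBoostingRegressor(random_state=0), params, cv=3)
gs.fit(x_tr, y_tr)

print(gs.best_params_)
# Train vs. test R^2 of the refitted best model
print(gs.score(x_tr, y_tr), gs.score(x_te, y_te))
```

`xgb.XGBRegressor` follows the scikit-learn estimator interface, so it can be passed to `GridSearchCV` in the same way.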

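# LightGBM
- A learning algorithm based on gradient-boosted trees
    - Trees grow leaf-wise (vertically): the leaf with the largest loss reduction is split next, so a tree can become deep on one side while staying shallow on the other
    - Builds 100 trees by default
- Fast to train
- Overfits as trees get deeper: the training score is high while the test score is low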
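
The effect of depth on overfitting can be seen with a single decision tree. The sketch below uses scikit-learn's `DecisionTreeRegressor` on synthetic data as a stand-in; it is not LightGBM, and the data are made up for illustration. As the depth limit is relaxed the training score climbs toward 1.0, while the test score typically stops improving or drops.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data, made up for this illustration
rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=(300, 1))
y = np.sin(x[:, 0]) + rng.normal(scale=0.3, size=300)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)

scores = {}
for depth in (2, 4, None):  # None lets the tree grow until every leaf is pure
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(x_tr, y_tr)
    # Store (train R^2, test R^2) for each depth limit
    scores[depth] = (tree.score(x_tr, y_tr), tree.score(x_te, y_te))
    print(depth, scores[depth])
```

In LightGBM the corresponding limits are the `max_depth` and `num_leaves` parameters of `LGBMRegressor`.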


In [33]:
conda install lightgbm

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/harin/opt/anaconda3

  added / updated specs:
    - lightgbm


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    lightgbm-3.2.1             |   py39h23ab428_0         996 KB
    ------------------------------------------------------------
                                           Total:         996 KB

The following NEW packages will be INSTALLED:

  lightgbm           pkgs/main/osx-64::lightgbm-3.2.1-py39h23ab428_0



Downloading and Extracting Packages
lightgbm-3.2.1       | 996 KB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Retrieving notices: ...working... done

Note: you may need to restart the kernel to use updated packages.


In [35]:
from lightgbm import LGBMRegressor

model = LGBMRegressor()
model.fit(x_tr, y_tr)
model.score(x_tr, y_tr), model.score(x_te, y_te)

(0.9758903113309795, 0.7410313895708159)