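✔️ Hyperparameter

: A setting that affects model performance and is chosen by hand before training, rather than learned from the data

✔️ Model Parameter

: A variable internal to the model, such as a regression coefficient or weight, that is learned during training and used to make predictions on new samples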
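To make the distinction concrete, here is a minimal sketch. It assumes `Ridge` regression and a made-up toy dataset, neither of which appears in the original notes; they are used only for illustration. `alpha` is a hyperparameter set by hand, while `coef_` and `intercept_` are model parameters that exist only after `fit()`.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy data (made up for illustration): y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1

# alpha is a hyperparameter: we pick it before training
model = Ridge(alpha=1.0)
has_coef_before_fit = hasattr(model, "coef_")  # learned parameters do not exist yet

model.fit(X, y)

# coef_ and intercept_ are model parameters: learned from the data
print("alpha (hyperparameter):", model.alpha)
print("coef_ (model parameter):", model.coef_)
print("intercept_ (model parameter):", model.intercept_)
```

Changing `alpha` and refitting changes the learned `coef_`, which is the sense in which a hyperparameter "affects model performance".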

In [1]:
def add(a, b):
    return a + b

# a, b : parameters (the names declared in the function definition)

In [2]:
print(add(4, 5))

# 4, 5 : arguments (the values passed in the call)

9
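The same parameter/argument distinction can be inspected programmatically. The sketch below uses the standard-library `inspect` module, which the original notes do not use; the variable names are illustrative.

```python
import inspect

def add(a, b):
    return a + b

# Parameters: the names declared in the function definition
signature = inspect.signature(add)
param_names = list(signature.parameters)
print(param_names)

# Arguments: the values bound to those names for one particular call
bound = signature.bind(4, 5)
call_arguments = dict(bound.arguments)
print(call_arguments)
```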


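### Tuning to Improve Model Performance
✔️ Tuning means changing hyperparameter values to improve the model's evaluation score

✔️ The available hyperparameters differ from one learning model to another‼️

✔️ sklearn provides the GridSearchCV class to automate the search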
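As a rough picture of what a grid search automates, the sketch below loops over every hyperparameter combination by hand and keeps the one with the best cross-validated accuracy. It is a simplified illustration, not how `GridSearchCV` is implemented internally; the grid values and variable names are assumptions chosen for the example.

```python
from itertools import product

from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()
grid = {"kernel": ["linear", "rbf"], "C": [1, 10]}

best_score, best_params = -1.0, None
# Try every combination of hyperparameter values in the grid
for kernel, C in product(grid["kernel"], grid["C"]):
    # 5-fold cross-validation accuracy for this combination
    scores = cross_val_score(SVC(kernel=kernel, C=C), iris.data, iris.target, cv=5)
    if scores.mean() > best_score:
        best_score, best_params = scores.mean(), {"kernel": kernel, "C": C}

print(best_params, best_score)
```

`GridSearchCV` wraps this loop and additionally records per-fold results and refits the best model on the full data.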

In [3]:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

In [4]:
model = SVC()

In [5]:
SVC(C = 0.1, kernel = 'linear')

SVC(C=0.1, kernel='linear')

In [6]:
# 'rbf' is the default kernel, so the repr below omits it
SVC(C = 0.1, kernel = 'rbf')

SVC(C=0.1)
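The estimator repr shows only hyperparameters that differ from their defaults. To see every hyperparameter a model exposes, `get_params()` can be used, as in the sketch below. `DecisionTreeClassifier` is not part of the original notes; it is included only to show that different models have different hyperparameters.

```python
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

svc_params = SVC(C=0.1).get_params()
tree_params = DecisionTreeClassifier().get_params()

# kernel was not passed, so it keeps its default value
print(svc_params["kernel"])

# Hyperparameter names that exist for one model but not the other
print(sorted(set(svc_params) - set(tree_params)))
print(sorted(set(tree_params) - set(svc_params)))
```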

In [7]:
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters, return_train_score = True)
clf.fit(iris.data, iris.target)

sorted(clf.cv_results_.keys())

['mean_fit_time',
 'mean_score_time',
 'mean_test_score',
 'mean_train_score',
 'param_C',
 'param_kernel',
 'params',
 'rank_test_score',
 'split0_test_score',
 'split0_train_score',
 'split1_test_score',
 'split1_train_score',
 'split2_test_score',
 'split2_train_score',
 'split3_test_score',
 'split3_train_score',
 'split4_test_score',
 'split4_train_score',
 'std_fit_time',
 'std_score_time',
 'std_test_score',
 'std_train_score']

In [20]:
# The results are stored as attributes of the fitted GridSearchCV object
clf.cv_results_, clf.best_params_, clf.best_score_

({'mean_fit_time': array([0.00039983, 0.00019979, 0.        , 0.        ]),
  'std_fit_time': array([0.00048973, 0.00039959, 0.        , 0.        ]),
  'mean_score_time': array([0.00031133, 0.00060153, 0.0001049 , 0.0005022 ]),
  'std_score_time': array([0.00041773, 0.00049126, 0.00020981, 0.00063596]),
  'param_C': masked_array(data=[1, 1, 10, 10],
               mask=[False, False, False, False],
         fill_value='?',
              dtype=object),
  'param_kernel': masked_array(data=['linear', 'rbf', 'linear', 'rbf'],
               mask=[False, False, False, False],
         fill_value='?',
              dtype=object),
  'params': [{'C': 1, 'kernel': 'linear'},
   {'C': 1, 'kernel': 'rbf'},
   {'C': 10, 'kernel': 'linear'},
   {'C': 10, 'kernel': 'rbf'}],
  'split0_test_score': array([0.96666667, 0.96666667, 1.        , 0.96666667]),
  'split1_test_score': array([1.        , 0.96666667, 1.        , 1.        ]),
  'split2_test_score': array([0.96666667, 0.96666667, 0.9       , 0.

In [10]:
# Convert the CV results to a DataFrame
import pandas as pd

cvDF = pd.DataFrame(clf.cv_results_)

In [23]:
cvDF

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_C | param_kernel | params | split0_test_score | split1_test_score | split2_test_score | ... | mean_test_score | std_test_score | rank_test_score | split0_train_score | split1_train_score | split2_train_score | split3_train_score | split4_train_score | mean_train_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0004 | 0.00049 | 0.000311 | 0.000418 | 1 | linear | {'C': 1, 'kernel': 'linear'} | 0.966667 | 1.0 | 0.966667 | ... | 0.98 | 0.01633 | 1 | 0.975 | 0.975 | 0.991667 | 0.983333 | 0.983333 | 0.981667 | 0.006236 |
| 1 | 0.0002 | 0.0004 | 0.000602 | 0.000491 | 1 | rbf | {'C': 1, 'kernel': 'rbf'} | 0.966667 | 0.966667 | 0.966667 | ... | 0.966667 | 0.021082 | 4 | 0.983333 | 0.958333 | 0.983333 | 0.983333 | 0.958333 | 0.973333 | 0.012247 |
| 2 | 0.0 | 0.0 | 0.000105 | 0.00021 | 10 | linear | {'C': 10, 'kernel': 'linear'} | 1.0 | 1.0 | 0.9 | ... | 0.973333 | 0.038873 | 3 | 0.966667 | 0.966667 | 0.991667 | 0.991667 | 0.975 | 0.978333 | 0.011304 |
| 3 | 0.0 | 0.0 | 0.000502 | 0.000636 | 10 | rbf | {'C': 10, 'kernel': 'rbf'} | 0.966667 | 1.0 | 0.966667 | ... | 0.98 | 0.01633 | 1 | 0.975 | 0.983333 | 0.991667 | 0.991667 | 0.983333 | 0.985 | 0.006236 |
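The full results table is wide, so it can help to keep only the columns needed to compare candidates. The following is a small sketch that rebuilds the same search and sorts by rank; the column selection and the name `summary` are illustrative choices, not part of the original notebook.

```python
import pandas as pd
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {"kernel": ("linear", "rbf"), "C": [1, 10]}
clf = GridSearchCV(svm.SVC(), parameters, return_train_score=True)
clf.fit(iris.data, iris.target)

cvDF = pd.DataFrame(clf.cv_results_)

# Keep only the columns needed to compare candidates, best rank first
columns = ["param_kernel", "param_C", "mean_test_score",
           "std_test_score", "mean_train_score", "rank_test_score"]
summary = cvDF[columns].sort_values("rank_test_score").reset_index(drop=True)
print(summary)
```

Comparing `mean_train_score` against `mean_test_score` in this view is a quick way to spot candidates that fit the training folds much better than the validation folds.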


In [12]:
clf.best_params_

{'C': 1, 'kernel': 'linear'}

In [13]:
clf.best_score_

0.9800000000000001

In [24]:
# Model with the best hyperparameters applied
svc = clf.best_estimator_

In [25]:
svc

SVC(C=1, kernel='linear')
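With the default `refit=True`, `best_estimator_` has already been refit on the whole dataset passed to `fit()`, so it can be used for prediction directly. A minimal sketch, rebuilding the same search so the block runs on its own (the names `best_model` and `samples` are illustrative):

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {"kernel": ("linear", "rbf"), "C": [1, 10]}
clf = GridSearchCV(svm.SVC(), parameters)  # refit=True by default
clf.fit(iris.data, iris.target)

best_model = clf.best_estimator_

# Predict a few samples with the refit model
samples = iris.data[:5]
pred_from_best = best_model.predict(samples)

# The search object forwards predict() to best_estimator_
pred_from_search = clf.predict(samples)
print(pred_from_best, pred_from_search)
```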

In [16]:
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svc1 = svm.SVC(kernel = 'linear', C = 10)

In [17]:
svc2 = svm.SVC(kernel = 'linear', C = 3)

In [18]:
svc3 = svm.SVC(kernel = 'rbf', C = 10)
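The three hand-built models above are created but never fit or scored in the notes. One possible way to compare them, sketched here under the assumption of a simple stratified hold-out split (the split settings and the `candidates` dictionary are illustrative, not from the original), is:

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# Hold out 30% of the data for evaluation (split settings are illustrative)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

candidates = {
    "svc1": svm.SVC(kernel="linear", C=10),
    "svc2": svm.SVC(kernel="linear", C=3),
    "svc3": svm.SVC(kernel="rbf", C=10),
}

# Fit each model on the training split and record its test accuracy
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in candidates.items()}
print(scores)
```

A single hold-out split gives a noisier estimate than the 5-fold cross-validation used by `GridSearchCV`, so small differences between these scores should be read with caution.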