## Module Function Definitions

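| Parameter | Description |
|---|---|
| `C` (cost) | Regularization parameter that controls how much classification error is tolerated; a smaller value tolerates more error |
| `kernel` | Kernel function to use: `'linear'`, `'rbf'` (default), `'sigmoid'`, `'poly'` |
| `degree` | Degree of the polynomial kernel (used only when `kernel` is `'poly'`) |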
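To make the table concrete, here is a minimal sketch of how these three parameters are passed to scikit-learn's `SVC`. The synthetic dataset from `make_classification` and the variable names are assumptions for illustration and are not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small synthetic dataset (an assumption, used only for illustration)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Linear kernel with a small C, which tolerates more training error
linear_svc = SVC(kernel="linear", C=0.1)

# Polynomial kernel; `degree` only has an effect when kernel='poly'
poly_svc = SVC(kernel="poly", degree=3, C=1.0)

# No arguments: the kernel falls back to its default
default_svc = SVC()

for clf in (linear_svc, poly_svc, default_svc):
    clf.fit(X, y)

default_kernel = default_svc.get_params()["kernel"]
print(default_kernel)
```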

In [1]:
# Packages used by the module
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.model_selection import GridSearchCV
from pandas import DataFrame

In [2]:
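def singleML(modelName, x, y=None, cv=5, *kargs):
    # *kargs collects extra positional arguments only, so keyword
    # arguments such as kernel='linear' are rejected with a TypeError
    # Create the model
    model = modelName(*kargs)
    # Cross-validate and average the fold scores
    score = cross_val_score(model, x, y, cv=cv).mean()
    # Per-fold results (fit time, score time, test score) as a DataFrame
    df = DataFrame(cross_validate(model, x, y, cv=cv))
    return [model, score, df]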
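`singleML` runs cross-validation twice, once through `cross_val_score` and once through `cross_validate`. As a hedged sketch, the example below checks that the mean of the `test_score` column from a single `cross_validate` call matches the `cross_val_score` mean. The synthetic regression data and variable names are assumptions for illustration.

```python
from pandas import DataFrame
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, cross_validate

# Synthetic regression data (an assumption; the notebook reads an Excel file)
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

model = LinearRegression()

# One cross_validate call yields timings and per-fold test scores
cv_df = DataFrame(cross_validate(model, X, y, cv=5))
mean_from_frame = cv_df["test_score"].mean()

# cross_val_score uses the same unshuffled folds, so the means should agree
mean_from_scores = cross_val_score(model, X, y, cv=5).mean()

print(cv_df.columns.tolist())
print(abs(mean_from_frame - mean_from_scores) < 1e-9)
```

If the two values agree, the second cross-validation pass could be dropped in favor of `df['test_score'].mean()`.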

In [3]:
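def gridML(modelName, x, y=None, params={}, cv=5, *kargs):
    # Create the model (*kargs forwards extra positional arguments only)
    model = modelName(*kargs)
    # Exhaustive search over the parameter grid
    grid = GridSearchCV(model, param_grid=params, cv=cv)
    try:
        grid.fit(x, y)
    except Exception:
        # Fall back to fitting without a target
        grid.fit(x)
    # Collect each parameter combination with its mean test score
    result_df = DataFrame(grid.cv_results_['params'])
    result_df['mean_test_score'] = grid.cv_results_['mean_test_score']
    # sort_values returns a new frame, so assign the result back
    result_df = result_df.sort_values(by='mean_test_score', ascending=False)
    return [grid.best_estimator_, grid.best_params_, result_df]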
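`DataFrame.sort_values` returns a new, sorted frame and leaves the original untouched unless the result is assigned back (or `inplace=True` is passed). A minimal sketch with made-up scores, which are purely illustrative and not results from the notebook:

```python
from pandas import DataFrame

# Made-up scores purely for illustration
results = DataFrame({
    "C": [0.001, 0.01, 0.1],
    "mean_test_score": [0.62, 0.91, 0.88],
})

# Calling sort_values without assignment leaves `results` unchanged
results.sort_values(by="mean_test_score", ascending=False)
unchanged_first = results.iloc[0]["mean_test_score"]

# Assigning the return value keeps the sorted order
sorted_results = results.sort_values(by="mean_test_score", ascending=False)
sorted_first = sorted_results.iloc[0]["mean_test_score"]

print(unchanged_first, sorted_first)
```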

## Module Test

In [4]:
from pandas import read_excel
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC

In [5]:
origin = read_excel('https://data.hossam.kr/G02/breast_cancer.xlsx')
x = origin.drop('target', axis=1)
y = origin['target']
x.shape, y.shape


((569, 30), (569,))

In [6]:
model, score, df = singleML(LinearRegression, x, y)
model

LinearRegression()

In [7]:
score

0.704686173464433

In [8]:
df

| | fit_time | score_time | test_score |
|---|---|---|---|
| 0 | 0.002999 | 0.001001 | 0.623595 |
| 1 | 0.003999 | 0.001 | 0.698961 |
| 2 | 0.00403 | 0.001973 | 0.755933 |
| 3 | 0.003002 | 0.000998 | 0.773021 |
| 4 | 0.003973 | 0.0 | 0.67192 |


In [9]:
model, score, df = singleML(SVC, x, y, kernel='linear', C=0.1, random_state=777)
model

TypeError: singleML() got an unexpected keyword argument 'kernel'
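The `TypeError` above arises because `singleML` declares `*kargs`, which collects only extra positional arguments, while the call passes `kernel`, `C`, and `random_state` as keywords. One possible fix, sketched under assumptions, is to collect them with `**kwargs` and forward them to the model constructor. The name `single_ml` and the synthetic dataset are hypothetical and not part of the original notebook.

```python
from pandas import DataFrame
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC


def single_ml(model_class, x, y=None, cv=5, **kwargs):
    """Hypothetical variant of singleML that accepts keyword arguments."""
    # Forward keyword arguments such as kernel='linear' to the constructor
    model = model_class(**kwargs)
    # cross_validate fits clones, so `model` itself stays unfitted
    df = DataFrame(cross_validate(model, x, y, cv=cv))
    score = df["test_score"].mean()
    return [model, score, df]


# Synthetic data (an assumption; the notebook uses breast_cancer.xlsx)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model, score, df = single_ml(SVC, X, y, kernel="linear", C=0.1, random_state=777)
print(model)
```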

## Module Test 2

In [10]:
model, best, score = gridML(LinearRegression, x, y)
model

LinearRegression()

In [11]:
score

| | mean_test_score |
|---|---|
| 0 | 0.704686 |


In [13]:
params = {
    'C': [0.001, 0.01],
    'kernel': ['linear', 'rbf'],
    }
model, best, score = gridML(SVC, x, y, params=params, random_state=777)
model

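TypeError: gridML() got an unexpected keyword argument 'random_state'

Because In [13] raised before its assignment ran, `score` and `best` in the next two cells still hold the `LinearRegression` results from In [10].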

In [14]:
score

| | mean_test_score |
|---|---|
| 0 | 0.704686 |


In [15]:
best

{}
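The `TypeError` in In [13] has the same cause as the one in In [9]: `gridML` declares `*kargs`, so the keyword `random_state=777` is rejected. A hedged sketch of a variant that accepts keyword arguments follows. The name `grid_ml`, the `params=None` default, and the synthetic dataset are assumptions for illustration, so the scores it produces are not comparable to the notebook's.

```python
from pandas import DataFrame
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def grid_ml(model_class, x, y=None, params=None, cv=5, **kwargs):
    """Hypothetical variant of gridML that accepts keyword arguments."""
    # Fixed constructor arguments (e.g. random_state) arrive via **kwargs
    model = model_class(**kwargs)
    grid = GridSearchCV(model, param_grid=params or {}, cv=cv)
    grid.fit(x, y)
    # One row per parameter combination, best score first
    result_df = DataFrame(grid.cv_results_["params"])
    result_df["mean_test_score"] = grid.cv_results_["mean_test_score"]
    result_df = result_df.sort_values(by="mean_test_score", ascending=False)
    return [grid.best_estimator_, grid.best_params_, result_df]


# Synthetic data (an assumption; the notebook uses breast_cancer.xlsx)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
params = {"C": [0.001, 0.01], "kernel": ["linear", "rbf"]}
model, best, score = grid_ml(SVC, X, y, params=params, random_state=777)
print(best)
```

With this variant, `best` contains one entry per searched parameter instead of the empty dictionary shown above.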