# Supervised learning
- Training a model on labeled data.
- Divided into two types depending on the form of the target:
 * Categorical target: classification
 * Continuous target: regression

## KNN
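- Finds the k training points closest to the query point and assigns the class that is most common among them (majority vote).
- If the target is a continuous number, predicts the mean of the targets of the k nearest points.
- Lazy learning: keeps the entire training set in memory and defers all computation until a new test point arrives, rather than fitting a model in advance.
 - Pro: no separate training time is needed, so the model is ready to use immediately.
 - Con: the training data must stay in memory at prediction time, so the method is impractical when the data is much larger than the available memory.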
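
The description above can be sketched in a few lines of plain Python. This is only an illustration on made-up toy data, not how scikit-learn implements `KNeighborsClassifier`; the function names `knn_predict` and `knn_regress` are hypothetical.

```python
import math
from collections import Counter

def k_nearest(train_X, train_y, query, k):
    """Return the targets of the k training points closest to `query`."""
    # Lazy learning: nothing is fitted in advance, distances are computed on demand.
    distances = [(math.dist(x, query), target) for x, target in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    return [target for _, target in distances[:k]]

def knn_predict(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    votes = Counter(k_nearest(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Regression: mean target of the k nearest neighbors."""
    neighbors = k_nearest(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)

# Toy data (hypothetical): two well-separated clusters.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
class_y = ["a", "a", "a", "b", "b", "b"]
value_y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_predict(train_X, class_y, (1.1, 1.0), k=3))
print(knn_regress(train_X, value_y, (1.0, 1.0), k=3))
```

With an even `k` (such as `n_neighbors=2`), a vote can tie; this sketch resolves ties by whichever label `Counter` lists first, which is an arbitrary choice.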

__Loading the data and assigning features and target__

In [50]:
from sklearn import datasets
raw_iris=datasets.load_iris() # iris flower classification
X = raw_iris.data
y = raw_iris.target

__Splitting into training and test data__

In [4]:
from sklearn.model_selection import train_test_split
X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=0) # fix the random seed

__Standardizing the data__

In [6]:
from sklearn.preprocessing import StandardScaler
std_scale = StandardScaler()
std_scale.fit(X_tn)
X_tn_std = std_scale.transform(X_tn)
X_te_std = std_scale.transform(X_te)
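
As a rough sketch of what the fit/transform pair does for a single feature column: the mean and standard deviation are estimated from the training data only, and the same two numbers are then applied to both splits. The helper names `fit_scaler` and `transform` and the toy values are hypothetical; the population standard deviation is used because that is what `StandardScaler` computes.

```python
import statistics

def fit_scaler(train_col):
    """Estimate mean and (population) standard deviation from training data only."""
    mean = statistics.fmean(train_col)
    std = statistics.pstdev(train_col)
    return mean, std

def transform(col, mean, std):
    """Apply z = (x - mean) / std with statistics learned elsewhere."""
    return [(v - mean) / std for v in col]

# Toy single-feature data (hypothetical values).
train_col = [1.0, 2.0, 3.0, 4.0, 5.0]
test_col = [3.0, 6.0]

mean, std = fit_scaler(train_col)
train_std = transform(train_col, mean, std)
test_std = transform(test_col, mean, std)  # reuses the training statistics

print(train_std)
print(test_std)
```

Reusing the training statistics on the test split keeps information about the test data from leaking into preprocessing.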

__Training the model__

In [16]:
from sklearn.neighbors import KNeighborsClassifier
clf_knn = KNeighborsClassifier(n_neighbors=2)
clf_knn.fit(X_tn_std,y_tn)

KNeighborsClassifier(n_neighbors=2)

__Prediction__

In [17]:
knn_pred = clf_knn.predict(X_te_std)
print(knn_pred)

[2 1 0 2 0 2 0 1 1 1 1 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
 2]


__Accuracy__

In [18]:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_te,knn_pred)
print(accuracy)

0.9473684210526315


__Confusion matrix__

In [19]:
from sklearn.metrics import confusion_matrix
conf_matrix = confusion_matrix(y_te,knn_pred)
print(conf_matrix)

[[13  0  0]
 [ 0 15  1]
 [ 0  1  8]]


__Classification report__

In [20]:
from sklearn.metrics import classification_report
class_report = classification_report(y_te,knn_pred)
print(class_report)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       0.94      0.94      0.94        16
           2       0.89      0.89      0.89         9

    accuracy                           0.95        38
   macro avg       0.94      0.94      0.94        38
weighted avg       0.95      0.95      0.95        38
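
The numbers in the report can be recomputed by hand from the confusion matrix printed earlier (rows are true classes, columns are predicted classes). The sketch below assumes every class is predicted at least once, so no division by zero occurs; `per_class_metrics` is a hypothetical helper name.

```python
# Confusion matrix copied from the KNN output above.
conf = [[13, 0, 0],
        [0, 15, 1],
        [0, 1, 8]]

def per_class_metrics(conf):
    """Return (precision, recall, f1) for each class of a square confusion matrix."""
    n = len(conf)
    metrics = []
    for c in range(n):
        tp = conf[c][c]
        predicted_as_c = sum(conf[r][c] for r in range(n))  # column sum
        actually_c = sum(conf[c])                           # row sum
        precision = tp / predicted_as_c
        recall = tp / actually_c
        f1 = 2 * precision * recall / (precision + recall)
        metrics.append((precision, recall, f1))
    return metrics

# Accuracy is the diagonal (correct predictions) over the total count.
accuracy = sum(conf[i][i] for i in range(len(conf))) / sum(map(sum, conf))
metrics = per_class_metrics(conf)

print(accuracy)
for label, (p, r, f) in enumerate(metrics):
    print(label, round(p, 2), round(r, 2), round(f, 2))
```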



## Linear regression
- Lasso regression (L1 penalty)
- Ridge regression (L2 penalty)
- Elastic net (L1 penalty + L2 penalty)
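
To make the three penalties concrete, here is a minimal sketch of the terms that are added to the squared-error loss. The mixing form below (`alpha`, `l1_ratio`) follows an elastic-net-style convention; the exact scaling constants differ between estimators and libraries, so treat the numbers as illustrative only. All function names are hypothetical.

```python
def l1_penalty(weights):
    """Lasso term: sum of absolute coefficient values."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Ridge term: sum of squared coefficient values."""
    return sum(w * w for w in weights)

def elastic_penalty(weights, alpha, l1_ratio):
    """Elastic-net-style mix: l1_ratio=1 is pure L1, l1_ratio=0 is pure L2."""
    return alpha * (l1_ratio * l1_penalty(weights)
                    + 0.5 * (1.0 - l1_ratio) * l2_penalty(weights))

weights = [3.0, -4.0, 0.0]  # hypothetical coefficients

print(l1_penalty(weights))
print(l2_penalty(weights))
print(elastic_penalty(weights, alpha=1.0, l1_ratio=0.5))
```

The L1 term grows linearly with each coefficient, which tends to push small coefficients to exactly zero, while the L2 term shrinks large coefficients more strongly without zeroing them.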

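__Loading the data and assigning features and target__

In [25]:
from sklearn import datasets
raw_boston = datasets.load_boston() # house price prediction (load_boston was removed in scikit-learn 1.2, so this cell needs an older version)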
X = raw_boston.data
y = raw_boston.target

__Splitting and standardizing the data__

In [27]:
from sklearn.model_selection import train_test_split
X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=1)

from sklearn.preprocessing import StandardScaler
std_scale = StandardScaler()
std_scale.fit(X_tn)
X_tn_std = std_scale.transform(X_tn)
X_te_std = std_scale.transform(X_te)

__Training (linear regression)__

In [29]:
from sklearn.linear_model import LinearRegression
clf_lr = LinearRegression()
clf_lr.fit(X_tn_std,y_tn)
print(clf_lr.coef_)
print(clf_lr.intercept_)

[-1.07145146  1.34036243  0.26298069  0.66554537 -2.49842551  1.97524314
  0.19516605 -3.14274974  2.66736136 -1.80685572 -2.13034748  0.56172933
 -4.03223518]
22.344591029023768


__Training (ridge regression, L2 penalty)__

In [31]:
from sklearn.linear_model import Ridge
clf_ridge = Ridge(alpha=1)
clf_ridge.fit(X_tn_std,y_tn)
print(clf_ridge.coef_)
print(clf_ridge.intercept_)

[-1.05933451  1.31050717  0.23022789  0.66955241 -2.45607567  1.99086611
  0.18119169 -3.09919804  2.56480813 -1.71116799 -2.12002592  0.56264409
 -4.00942448]
22.344591029023768


__Training (lasso regression, L1 penalty)__

In [32]:
from sklearn.linear_model import Lasso
clf_lasso = Lasso(alpha=0.01)
clf_lasso.fit(X_tn_std,y_tn)
print(clf_lasso.coef_)
print(clf_lasso.intercept_)

[-1.04326518  1.27752711  0.1674367   0.66758228 -2.41559964  1.99244179
  0.14733958 -3.09473711  2.46431135 -1.60552274 -2.11046422  0.55200229
 -4.00809905]
22.344591029023768


__Training (elastic net)__

In [33]:
from sklearn.linear_model import ElasticNet
clf_elastic = ElasticNet(alpha=0.01,l1_ratio=0.01)
clf_elastic.fit(X_tn_std,y_tn)
print(clf_elastic.coef_)
print(clf_elastic.intercept_)

[-1.02916603  1.23681955  0.15236504  0.67859622 -2.34646781  2.02965524
  0.14575132 -2.98592423  2.32013379 -1.48829485 -2.09271972  0.56506801
 -3.9495281 ]
22.344591029023768


__Prediction__

In [34]:
pred_lr = clf_lr.predict(X_te_std)
pred_ridge = clf_ridge.predict(X_te_std)
pred_lasso = clf_lasso.predict(X_te_std)
pred_elastic = clf_elastic.predict(X_te_std)

__Model evaluation: R-squared__

In [36]:
from sklearn.metrics import r2_score
print(r2_score(y_te,pred_lr))
print(r2_score(y_te,pred_ridge))
print(r2_score(y_te,pred_lasso))
print(r2_score(y_te,pred_elastic))

0.7789410172622856
0.7789704562726605
0.7787621490259894
0.7787876079239251


__Model evaluation: MSE__

In [37]:
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_te,pred_lr))
print(mean_squared_error(y_te,pred_ridge))
print(mean_squared_error(y_te,pred_lasso))
print(mean_squared_error(y_te,pred_elastic))

21.897765396049508
21.894849212618745
21.91548381050483
21.912961890936877
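
Both evaluation metrics are short formulas. The sketch below computes them by hand on hypothetical toy values: MSE is the mean of the squared residuals, and R-squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """R-squared: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy values (hypothetical).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

print(mse(y_true, y_pred))
print(r2(y_true, y_pred))
```

A lower MSE and a higher R-squared both indicate a better fit, which is why the two rankings of the four models above agree.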


## Logistic regression
* Suited to classification problems
* __Sigmoid function__: maps the output of the linear regression expression to a value between 0 and 1
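
A minimal sketch of the sigmoid and how it turns a linear score into a probability. The coefficients and inputs are hypothetical, and the two-branch form is just one common way to avoid overflow for large negative scores.

```python
import math

def sigmoid(z):
    """Map any real number to the interval [0, 1] without overflowing."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def positive_class_probability(weights, intercept, x):
    """Linear score w.x + b passed through the sigmoid."""
    z = intercept + sum(w * v for w, v in zip(weights, x))
    return sigmoid(z)

# Hypothetical coefficients and one standardized sample.
weights = [0.8, -1.2]
intercept = 0.1
sample = [1.0, 0.5]

print(sigmoid(0.0))
print(positive_class_probability(weights, intercept, sample))
```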

__Loading, assigning, and splitting the data__

In [38]:
from sklearn import datasets
raw_cancer = datasets.load_breast_cancer() # breast cancer classification
X = raw_cancer.data
y = raw_cancer.target

from sklearn.model_selection import train_test_split
X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=0)

__Standardizing the data__

In [39]:
from sklearn.preprocessing import StandardScaler
std_scale = StandardScaler()
std_scale.fit(X_tn)
X_tn_std = std_scale.transform(X_tn)
X_te_std = std_scale.transform(X_te)

__Training and the estimated logistic regression coefficients__

In [40]:
from sklearn.linear_model import LogisticRegression
clf_logi_l2 = LogisticRegression(penalty='l2')
clf_logi_l2.fit(X_tn_std,y_tn)

print(clf_logi_l2.coef_)
print(clf_logi_l2.intercept_)

[[-0.29792942 -0.58056355 -0.3109406  -0.377129   -0.11984232  0.42855478
  -0.71131106 -0.85371164 -0.46688191  0.11762548 -1.38262136  0.0899184
  -0.94778563 -0.94686238  0.18575731  0.99305313  0.11090349 -0.3458275
   0.20290919  0.80470317 -0.91626377 -0.91726667 -0.8159834  -0.86539197
  -0.45539191  0.10347391 -0.83009341 -0.98445173 -0.5920036  -0.61086989]]
[0.02713751]


__Prediction__

In [41]:
pred_logistic = clf_logi_l2.predict(X_te_std)
print(pred_logistic)

[0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1
 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0
 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 1 1 1
 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 0]


__Predicting class probabilities__

In [42]:
pred_proba = clf_logi_l2.predict_proba(X_te_std)
print(pred_proba) # probability of belonging to each class; there are two classes, so the result has two columns

[[9.98638613e-01 1.36138656e-03]
 [3.95544804e-02 9.60445520e-01]
 [1.30896362e-03 9.98691036e-01]
 [1.24473354e-02 9.87552665e-01]
 [2.44132101e-04 9.99755868e-01]
 [4.50491513e-03 9.95495085e-01]
 [1.13985968e-04 9.99886014e-01]
 [1.82475894e-03 9.98175241e-01]
 [9.67965506e-05 9.99903203e-01]
 [1.75222878e-06 9.99998248e-01]
 [1.76572612e-01 8.23427388e-01]
 [8.24119135e-02 9.17588087e-01]
 [9.66067493e-06 9.99990339e-01]
 [5.39343196e-01 4.60656804e-01]
 [3.98187854e-01 6.01812146e-01]
 [9.95762760e-01 4.23724017e-03]
 [2.75612083e-03 9.97243879e-01]
 [9.99997097e-01 2.90271401e-06]
 [9.99926506e-01 7.34935682e-05]
 [9.99999997e-01 2.78313939e-09]
 [9.98738365e-01 1.26163489e-03]
 [9.81405399e-01 1.85946008e-02]
 [1.77902039e-02 9.82209796e-01]
 [9.65876713e-04 9.99034123e-01]
 [9.99464578e-01 5.35421808e-04]
 [6.73385015e-04 9.99326615e-01]
 [5.50833875e-05 9.99944917e-01]
 [9.69828919e-01 3.01710813e-02]
 [1.62119075e-03 9.98378809e-01]
 [9.99997821e-01 2.17867101e-06]
 [6.005712
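
To connect the probabilities to the hard predictions printed earlier, the sketch below picks the more probable column for a handful of rows copied (rounded to four decimals) from the output above. For a two-class problem this amounts to thresholding the second column at 0.5; `to_labels` is a hypothetical helper.

```python
# Rows 0, 1, 2, 13 and 14 of the predict_proba output, rounded.
proba_rows = [
    (0.9986, 0.0014),
    (0.0396, 0.9604),
    (0.0013, 0.9987),
    (0.5393, 0.4607),
    (0.3982, 0.6018),
]

def to_labels(rows):
    """Pick the index of the largest probability in each row."""
    return [max(range(len(row)), key=lambda c: row[c]) for row in rows]

labels = to_labels(proba_rows)
print(labels)
```

Rows such as `(0.5393, 0.4607)` sit close to the decision threshold, so the class probabilities carry more information than the hard label alone.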

__Precision__

In [45]:
from sklearn.metrics import precision_score
precision = precision_score(y_te,pred_logistic)
print(precision)

0.9666666666666667


__confusion matrix__

In [48]:
from sklearn.metrics import confusion_matrix
conf_matrix = confusion_matrix(y_te,pred_logistic)
print(conf_matrix)

[[50  3]
 [ 3 87]]


__Classification report__

In [49]:
from sklearn.metrics import classification_report
class_report = classification_report(y_te,pred_logistic)
print(class_report)

              precision    recall  f1-score   support

           0       0.94      0.94      0.94        53
           1       0.97      0.97      0.97        90

    accuracy                           0.96       143
   macro avg       0.96      0.96      0.96       143
weighted avg       0.96      0.96      0.96       143



## Naive Bayes

__Loading and splitting the data__

In [2]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
raw_wine = datasets.load_wine()
X = raw_wine.data
y = raw_wine.target

X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=0)

__Standardizing the data__

In [3]:
from sklearn.preprocessing import StandardScaler
std_scale = StandardScaler()
std_scale.fit(X_tn)
X_tn_std = std_scale.transform(X_tn)
X_te_std = std_scale.transform(X_te)

__Training the model__

In [5]:
from sklearn.naive_bayes import GaussianNB # Gaussian naive Bayes is used when the features are continuous
clf_gnb = GaussianNB()
clf_gnb.fit(X_tn_std,y_tn)

GaussianNB()

__Prediction__

In [6]:
pred_gnb = clf_gnb.predict(X_te_std)
print(pred_gnb)

[0 2 1 0 1 1 0 2 1 1 2 2 0 0 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2
 1 1 2 0 0 1 1 1]


__Recall and confusion matrix__

In [10]:
from sklearn.metrics import recall_score, confusion_matrix
recall = recall_score(y_te,pred_gnb,average='macro')
print(f'recall score: {recall}')

conf_matrix = confusion_matrix(y_te,pred_gnb)
print(f'confusion matrix : \n{conf_matrix}')

recall score: 0.9523809523809524
confusion matrix : 
[[16  0  0]
 [ 2 18  1]
 [ 0  0  8]]


__Re-evaluating after setting class priors__

In [15]:
clf_gnb = GaussianNB(priors=[20/100,60/100,20/100])
clf_gnb.fit(X_tn_std,y_tn)
pred_gnb = clf_gnb.predict(X_te_std)

recall = recall_score(y_te,pred_gnb,average='macro')
print(f'recall score: {recall}')

conf_matrix = confusion_matrix(y_te,pred_gnb)
print(f'confusion matrix : \n{conf_matrix}') # with the priors set, the recall score rises and the error rate for the second class drops

recall score: 0.9682539682539683
confusion matrix : 
[[16  0  0]
 [ 1 19  1]
 [ 0  0  8]]
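
Why a prior can flip a prediction follows from Bayes' rule: the posterior is proportional to the prior times the likelihood. The sketch below uses hypothetical likelihood values for a single sample and compares a flat prior with the `[0.2, 0.6, 0.2]` prior used above. (By default `GaussianNB` estimates priors from the class frequencies in the training data; the flat prior here is only for comparison.)

```python
def posterior(priors, likelihoods):
    """Normalize prior * likelihood so the class posteriors sum to 1."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical p(x | class) for one borderline sample.
likelihoods = [0.30, 0.25, 0.05]

flat = posterior([1 / 3, 1 / 3, 1 / 3], likelihoods)
weighted = posterior([0.2, 0.6, 0.2], likelihoods)

print(argmax(flat), flat)
print(argmax(weighted), weighted)
```

With the flat prior the sample goes to class 0; putting more prior mass on class 1 moves it to class 1, which is the kind of shift visible in the second row of the confusion matrix.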


__Classification report__

In [17]:
from sklearn.metrics import classification_report
class_report = classification_report(y_te,pred_gnb)
print(class_report)

              precision    recall  f1-score   support

           0       0.94      1.00      0.97        16
           1       1.00      0.90      0.95        21
           2       0.89      1.00      0.94         8

    accuracy                           0.96        45
   macro avg       0.94      0.97      0.95        45
weighted avg       0.96      0.96      0.96        45



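## Decision tree
- Classifies data with a categorical target
#### Entropy
- Measures impurity (how mixed the different classes are); lower is better.
#### Gini index
- Measures impurity (the probability that a sample drawn at random from the dataset is mislabeled when its label is assigned at random according to the class distribution); lower is better.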
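
Both impurity measures depend only on the class proportions in a node. A small sketch on hypothetical label lists:

```python
import math
from collections import Counter

def class_proportions(labels):
    """Fraction of samples belonging to each class."""
    counts = Counter(labels)
    return [c / len(labels) for c in counts.values()]

def entropy(labels):
    """Sum of -p * log2(p) over classes; 0 for a pure node."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))

def gini(labels):
    """1 - sum of p^2 over classes; 0 for a pure node."""
    return 1.0 - sum(p * p for p in class_proportions(labels))

pure = [0, 0, 0, 0]    # a single class
mixed = [0, 0, 1, 1]   # two classes, evenly mixed

print(entropy(pure), gini(pure))
print(entropy(mixed), gini(mixed))
```

A tree-building algorithm chooses the split that lowers the weighted impurity of the child nodes the most.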
## Regression tree
- Predicts a continuous target

In [18]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
raw_wine = datasets.load_wine()

X = raw_wine.data
y = raw_wine.target

X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=0)

__Standardizing the data and training__

In [19]:
from sklearn.preprocessing import StandardScaler
std_scaler = StandardScaler()
std_scaler.fit(X_tn)
X_tn_std = std_scaler.transform(X_tn)
X_te_std = std_scaler.transform(X_te)

In [20]:
from sklearn import tree
clf_tree = tree.DecisionTreeClassifier(random_state=0)
clf_tree.fit(X_tn_std,y_tn)

DecisionTreeClassifier(random_state=0)

__Prediction and evaluation__

In [21]:
pred_tree = clf_tree.predict(X_te_std)
print(pred_tree)

[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 1 0 1 1 1 1 1 1 1 2 0 0 1 0 0 0 2
 1 1 2 1 0 1 1 1]


In [22]:
# F1 score
from sklearn.metrics import f1_score
f1 = f1_score(y_te,pred_tree,average='macro')
print('f1 score:{}'.format(f1))

# confusion matrix
from sklearn.metrics import confusion_matrix
conf_matrix = confusion_matrix(y_te,pred_tree)
print('confusion matrix :\n{}'.format(conf_matrix))

f1 score:0.9349141206870346
confusion matrix :
[[14  2  0]
 [ 0 20  1]
 [ 0  0  8]]


In [23]:
from sklearn.metrics import classification_report
class_report = classification_report(y_te,pred_tree)
print(class_report)

              precision    recall  f1-score   support

           0       1.00      0.88      0.93        16
           1       0.91      0.95      0.93        21
           2       0.89      1.00      0.94         8

    accuracy                           0.93        45
   macro avg       0.93      0.94      0.93        45
weighted avg       0.94      0.93      0.93        45



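## Support vector machine (SVC)
Separates classes with a decision boundary (the center line) and the margin boundaries on either side of it. The data points lying on the margin boundaries are called support vectors, and they determine where the decision boundary falls.
#### Soft margin
Relaxes the support vector machine criterion so that some misclassified data points are tolerated (using slack variables).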
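
One common way to write the soft-margin idea is as an objective that trades margin width against total slack: `0.5 * ||w||^2 + C * sum(slack_i)`, where each slack is `max(0, 1 - y_i * (w.x_i + b))` and labels are coded as +1 / -1. The sketch below evaluates that objective for a fixed, hypothetical `w` and `b`; it does not optimize anything.

```python
def decision_value(w, b, x):
    """Signed score w.x + b of a point relative to the boundary."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def slack(w, b, x, y):
    """How far a point violates the margin (0 if it is on the correct side)."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))

def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * total slack."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    return margin_term + C * sum(slack(w, b, xi, yi) for xi, yi in zip(X, y))

# Hypothetical boundary and points; labels are +1 / -1.
w, b = [1.0, 0.0], 0.0
X = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0), (0.5, 1.0)]
y = [1, 1, -1, -1]

slacks = [slack(w, b, xi, yi) for xi, yi in zip(X, y)]
print(slacks)
print(soft_margin_objective(w, b, X, y, C=1.0))
```

A slack between 0 and 1 means the point is inside the margin but still correctly classified; a slack above 1 means it is misclassified. A larger `C` penalizes both more heavily.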
#### Kernel support vector machine
Transforms the feature space and then applies the support vector machine (like bending a flat sheet of paper).
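
A tiny illustration of why changing the feature space helps, using an explicit mapping rather than a real kernel: the hypothetical 1D data below cannot be split by any single threshold on `x`, but after adding the feature `x**2` one threshold is enough. The check assumes no two points from different classes share the same value.

```python
def separable_by_threshold(values, labels):
    """True if one threshold on `values` splits the two classes perfectly.

    Assumes points from different classes never have identical values.
    """
    ordered = [label for _, label in sorted(zip(values, labels))]
    # Separable iff the class changes at most once along the sorted axis.
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1

# Hypothetical data: the outer points are class 1, the inner points class 0.
xs = [-2.0, -1.0, 1.0, 2.0]
labels = [1, 0, 0, 1]

squared = [x * x for x in xs]  # the new feature after the mapping x -> x^2

print(separable_by_threshold(xs, labels))
print(separable_by_threshold(squared, labels))
```

Kernels such as `rbf` achieve a similar effect implicitly, without ever computing the mapped features.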
#### Support vector regression
Uses support vectors to build a regression model.

__Loading and splitting the data__

In [1]:
from sklearn import datasets
raw_wine = datasets.load_wine()
X = raw_wine.data
y = raw_wine.target

from sklearn.model_selection import train_test_split
X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=0)

__Standardizing the data and training__

In [2]:
from sklearn.preprocessing import StandardScaler
std_scale = StandardScaler()
std_scale.fit(X_tn)
X_tn_std = std_scale.transform(X_tn)
X_te_std = std_scale.transform(X_te)

from sklearn import svm
clf_svm_lr = svm.SVC(kernel='linear',random_state=0)
clf_svm_lr.fit(X_tn_std,y_tn)
# SVC : classification / SVR : regression
# kernel options : linear, poly, rbf, sigmoid, precomputed

SVC(kernel='linear', random_state=0)

__Prediction__

In [4]:
pred_svm = clf_svm_lr.predict(X_te_std)
print(pred_svm)

[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 1 0 1 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2
 1 1 2 0 0 1 1 1]


__Accuracy, confusion matrix, and classification report__

In [7]:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_te,pred_svm)
print('accuracy : {}'.format(accuracy))

from sklearn.metrics import confusion_matrix
conf_matrix = confusion_matrix(y_te,pred_svm)
print('\nconfusion matrix : \n{}'.format(conf_matrix))

from sklearn.metrics import classification_report
class_report = classification_report(y_te,pred_svm)
print('\nclassification report : \n{}'.format(class_report))

accuracy : 1.0

confusion matrix : 
[[16  0  0]
 [ 0 21  0]
 [ 0  0  8]]

classification report : 
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      1.00      1.00        21
           2       1.00      1.00      1.00         8

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45



## Cross Validation
Hyperparameter tuning using cross-validation.
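
The core of k-fold cross-validation is that each sample is used for validation exactly once. The sketch below generates plain, unshuffled, unstratified folds to show the mechanics; it is simpler than the `StratifiedKFold(shuffle=True)` used in this section, which additionally shuffles and preserves class proportions. `kfold_indices` is a hypothetical helper.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_indices, validation_indices) for each of n_splits folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(kfold_indices(10, 3))
for train, validation in folds:
    print(train, validation)
```

A model is fitted on each `train` part and scored on the matching `validation` part; the scores are then averaged.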

In [8]:
from sklearn import datasets
raw_wine = datasets.load_wine()

X = raw_wine.data
y = raw_wine.target

from sklearn.model_selection import train_test_split
X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=0)

from sklearn.preprocessing import StandardScaler
std_scale = StandardScaler()
std_scale.fit(X_tn)
X_tn_std = std_scale.transform(X_tn)
X_te_std = std_scale.transform(X_te)

In [10]:
from sklearn import svm
from sklearn.model_selection import StratifiedKFold # splits while preserving the class proportions
from sklearn.model_selection import GridSearchCV

param_grid = {'kernel':('linear','rbf'),
              'C':[0.5,1,10,100]}
kfold = StratifiedKFold(n_splits=5,shuffle=True,random_state=0)
svc = svm.SVC(random_state=0)
grid_cv = GridSearchCV(svc, param_grid,cv=kfold,scoring='accuracy')
grid_cv.fit(X_tn_std,y_tn)

GridSearchCV(cv=StratifiedKFold(n_splits=5, random_state=0, shuffle=True),
             estimator=SVC(random_state=0),
             param_grid={'C': [0.5, 1, 10, 100], 'kernel': ('linear', 'rbf')},
             scoring='accuracy')
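
A grid search simply evaluates every combination in `param_grid`. The sketch below enumerates the candidates for the grid used above with `itertools.product`; iterating over the sorted keys reproduces the candidate order seen in `cv_results_`. The fit count is simple arithmetic (candidates times folds), not something measured.

```python
from itertools import product

param_grid = {'kernel': ('linear', 'rbf'),
              'C': [0.5, 1, 10, 100]}
n_splits = 5

keys = sorted(param_grid)  # ['C', 'kernel']
candidates = [dict(zip(keys, values))
              for values in product(*(param_grid[k] for k in keys))]

for candidate in candidates:
    print(candidate)

# Each candidate is fitted once per fold during the search.
n_fits = len(candidates) * n_splits
print(n_fits)
```

With the default `refit=True`, one more fit on the whole training set follows, using the best candidate.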

In [11]:
grid_cv.cv_results_

{'mean_fit_time': array([0.00149064, 0.00170121, 0.00140681, 0.00100102, 0.00081921,
        0.00200429, 0.00179949, 0.00179415]),
 'std_fit_time': array([1.03138712e-03, 1.13354235e-03, 4.89861722e-04, 2.94246007e-05,
        4.11996107e-04, 6.40980404e-04, 7.54956361e-04, 3.98659802e-04]),
 'mean_score_time': array([0.00059743, 0.00092101, 0.        , 0.00062957, 0.        ,
        0.00080671, 0.00079732, 0.00019927]),
 'std_score_time': array([0.00048881, 0.00095741, 0.        , 0.00051757, 0.        ,
        0.0004038 , 0.00039866, 0.00039854]),
 'param_C': masked_array(data=[0.5, 0.5, 1, 1, 10, 10, 100, 100],
              mask=[False, False, False, False, False, False, False, False],
        fill_value='?',
             dtype=object),
 'param_kernel': masked_array(data=['linear', 'rbf', 'linear', 'rbf', 'linear', 'rbf',
                    'linear', 'rbf'],
              mask=[False, False, False, False, False, False, False, False],
        fill_value='?',
             dtype=ob

In [12]:
import numpy as np
import pandas as pd
np.transpose(pd.DataFrame(grid_cv.cv_results_))

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| mean_fit_time | 0.001491 | 0.001701 | 0.001407 | 0.001001 | 0.000819 | 0.002004 | 0.001799 | 0.001794 |
| std_fit_time | 0.001031 | 0.001134 | 0.00049 | 0.000029 | 0.000412 | 0.000641 | 0.000755 | 0.000399 |
| mean_score_time | 0.000597 | 0.000921 | 0.0 | 0.00063 | 0.0 | 0.000807 | 0.000797 | 0.000199 |
| std_score_time | 0.000489 | 0.000957 | 0.0 | 0.000518 | 0.0 | 0.000404 | 0.000399 | 0.000399 |
| param_C | 0.5 | 0.5 | 1 | 1 | 10 | 10 | 100 | 100 |
| param_kernel | linear | rbf | linear | rbf | linear | rbf | linear | rbf |
| params | {'C': 0.5, 'kernel': 'linear'} | {'C': 0.5, 'kernel': 'rbf'} | {'C': 1, 'kernel': 'linear'} | {'C': 1, 'kernel': 'rbf'} | {'C': 10, 'kernel': 'linear'} | {'C': 10, 'kernel': 'rbf'} | {'C': 100, 'kernel': 'linear'} | {'C': 100, 'kernel': 'rbf'} |
| split0_test_score | 0.888889 | 0.962963 | 0.888889 | 0.925926 | 0.888889 | 0.925926 | 0.888889 | 0.925926 |
| split1_test_score | 0.962963 | 1.0 | 0.962963 | 0.962963 | 0.962963 | 0.962963 | 0.962963 | 0.962963 |
| split2_test_score | 0.925926 | 0.962963 | 0.925926 | 0.962963 | 0.925926 | 0.962963 | 0.925926 | 0.962963 |


In [15]:
print(grid_cv.best_score_)
print(grid_cv.best_params_)

0.9774928774928775
{'C': 0.5, 'kernel': 'rbf'}


In [16]:
clf = grid_cv.best_estimator_
print(clf)

SVC(C=0.5, random_state=0)


In [17]:
grid_cv.best_estimator_

SVC(C=0.5, random_state=0)

__Cross-validation scores (1)__

In [18]:
# first method
from sklearn.model_selection import cross_validate
metrics = ['accuracy','precision_macro','recall_macro','f1_macro']
cv_scores = cross_validate(clf, X_tn_std, y_tn, cv=kfold, scoring=metrics)
cv_scores

{'fit_time': array([0.00301456, 0.00200462, 0.00098133, 0.00185299, 0.00130367]),
 'score_time': array([0.00196218, 0.0019896 , 0.0019927 , 0.00250006, 0.00314784]),
 'test_accuracy': array([0.96296296, 1.        , 0.96296296, 0.96153846, 1.        ]),
 'test_precision_macro': array([0.96296296, 1.        , 0.96969697, 0.96969697, 1.        ]),
 'test_recall_macro': array([0.96666667, 1.        , 0.96296296, 0.95833333, 1.        ]),
 'test_f1_macro': array([0.9628483 , 1.        , 0.96451914, 0.96190476, 1.        ])}

__Cross-validation scores (2)__

In [19]:
# second method
from sklearn.model_selection import cross_val_score
cv_score = cross_val_score(clf, X_tn_std, y_tn, cv=kfold, scoring='accuracy')
print(cv_score) # accuracy per split
print(cv_score.mean()) # mean accuracy
print(cv_score.std()) # standard deviation of the accuracy

[0.96296296 1.         0.96296296 0.96153846 1.        ]
0.9774928774928775
0.01838434849561446
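
The two summary numbers can be reproduced from the per-split scores with the standard library. The scores below are copied from the printed array (so they are rounded to eight decimals); the population standard deviation is used because NumPy's `std()` divides by `n` by default.

```python
import statistics

# Per-split accuracies copied from the output above.
split_scores = [0.96296296, 1.0, 0.96296296, 0.96153846, 1.0]

mean_score = statistics.fmean(split_scores)
std_score = statistics.pstdev(split_scores)  # population std (divide by n)

print(mean_score)
print(std_score)
```

The mean equals the `best_score_` reported by the grid search, since both are averages over the same five folds.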


__Prediction__

In [20]:
pred_svm = clf.predict(X_te_std)
print(pred_svm)

[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 1 0 1 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2
 1 1 2 0 0 1 1 1]


__Accuracy, confusion matrix, and classification report__

In [21]:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_te,pred_svm)
print('accuracy : {}'.format(accuracy))

from sklearn.metrics import confusion_matrix
conf_matrix = confusion_matrix(y_te,pred_svm)
print('\nconfusion matrix : \n{}'.format(conf_matrix))

from sklearn.metrics import classification_report
class_report = classification_report(y_te,pred_svm)
print('\nclassification report : \n{}'.format(class_report))

accuracy : 1.0

confusion matrix : 
[[16  0  0]
 [ 0 21  0]
 [ 0  0  8]]

classification report : 
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      1.00      1.00        21
           2       1.00      1.00      1.00         8

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

