### Breast Cancer Wisconsin Data Set
### Nine features of 683 breast tissue samples, obtained by fine-needle aspiration (FNA) cytology

### caret::confusionMatrix(
   #### data,      # predicted values, or a contingency table
   #### reference  # actual (true) values
### )
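For a two-class problem, most of the statistics `confusionMatrix()` prints are simple ratios of the four cells. The sketch below recomputes them in base R from the radial-kernel counts reported later in this notebook. It treats the first factor level (`benign`) as the positive class, which is caret's default. The helper name `binary_stats` is hypothetical and is not part of caret.

```r
# Hypothetical helper: recompute the main confusionMatrix() statistics from a
# 2x2 table (rows = Prediction, columns = Reference). The first level is the
# "positive" class, matching caret's default.
binary_stats <- function(tab) {
  tab <- unname(as.matrix(tab))
  tp <- tab[1, 1]  # predicted positive, actually positive
  fp <- tab[1, 2]  # predicted positive, actually negative
  fn <- tab[2, 1]  # predicted negative, actually positive
  tn <- tab[2, 2]  # predicted negative, actually negative
  n  <- sum(tab)
  acc  <- (tp + tn) / n
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  # Cohen's kappa: observed agreement corrected for chance agreement
  pe <- (sum(tab[1, ]) * sum(tab[, 1]) + sum(tab[2, ]) * sum(tab[, 2])) / n^2
  c(Accuracy         = acc,
    Kappa            = (acc - pe) / (1 - pe),
    Sensitivity      = sens,
    Specificity      = spec,
    PosPredValue     = tp / (tp + fp),
    NegPredValue     = tn / (tn + fn),
    Prevalence       = (tp + fn) / n,
    BalancedAccuracy = (sens + spec) / 2)
}

# Radial-kernel counts from this notebook, filled column by column
radial_counts <- matrix(c(146, 7, 2, 73), nrow = 2,
                        dimnames = list(Prediction = c("benign", "malignant"),
                                        Reference  = c("benign", "malignant")))
round(binary_stats(radial_counts), 4)
```

The rounded values match the Accuracy, Kappa, Sensitivity, Specificity, predictive-value, Prevalence and Balanced Accuracy lines of the radial-kernel output.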

In [32]:
library(lattice)   # caret dependency
library(ggplot2)   # caret dependency
library(e1071)     # svm()
library(caret)     # confusionMatrix()

In [33]:
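# stringsAsFactors = TRUE keeps Y a factor in R >= 4.0; svm() and confusionMatrix() need a factor
cancer <- read.csv('cancer.csv', stringsAsFactors = TRUE)
head(cancer)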

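| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | Y |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 1000025 | 5 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | benign |
| 1002945 | 5 | 4 | 4 | 5 | 7 | 10 | 3 | 2 | 1 | benign |
| 1015425 | 3 | 1 | 1 | 1 | 2 | 2 | 3 | 1 | 1 | benign |
| 1016277 | 6 | 8 | 8 | 1 | 3 | 4 | 3 | 7 | 1 | benign |
| 1017023 | 4 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 | benign |
| 1017122 | 8 | 10 | 10 | 8 | 7 | 10 | 9 | 7 | 1 | malignant |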
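Since R 4.0.0, `read.csv()` no longer converts strings to factors by default. However, `svm()` only chooses C-classification when the response is a factor, and `confusionMatrix()` expects factors. The sketch below reads three of the rows shown above from an inline string, so it runs without `cancer.csv`, and shows the resulting level order. Because `benign` sorts first alphabetically, it becomes the first level. This is why caret reports `'Positive' Class : benign` in the outputs below.

```r
# Three rows of cancer.csv, inlined so the example is self-contained
csv_text <- "X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,Y
1000025,5,1,1,1,2,1,3,1,1,benign
1002945,5,4,4,5,7,10,3,2,1,benign
1017122,8,10,10,8,7,10,9,7,1,malignant"

raw <- read.csv(text = csv_text)                           # R >= 4.0: Y stays character
fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)  # Y becomes a factor

class(raw$Y)   # "character" in R >= 4.0
class(fac$Y)   # "factor"
levels(fac$Y)  # "benign" "malignant": benign is the first level
```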


In [34]:
cancer <- cancer[,-1]  # drop X1, the sample ID, which is not a predictor

In [35]:
set.seed(111)
N <- nrow(cancer)
# Random 2/3 training / 1/3 test split (N*2/3 is truncated: 455 of 683 rows for training)
tr.idx <- sample(1:N, size = N*2/3, replace = FALSE)
train <- cancer[tr.idx,]
test <- cancer[-tr.idx,]
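Here `sample()` receives a non-integer `size` (683 * 2/3 is about 455.3), which is truncated. The result is 455 training rows and 228 test rows, and 228 matches the cell totals of every confusion matrix below. Below is a minimal sketch of the same split with the rounding made explicit. It uses a stand-in data frame with the 683 rows stated for this data set. The `_demo` names are hypothetical, chosen so they do not overwrite `train` and `test`.

```r
set.seed(111)
N <- 683                                        # rows in the data set
dummy <- data.frame(id = seq_len(N))            # stand-in for `cancer`
n_train <- floor(N * 2 / 3)                     # truncation made explicit: 455
idx_demo <- sample(seq_len(N), size = n_train, replace = FALSE)
train_demo <- dummy[idx_demo, , drop = FALSE]   # 2/3 for training
test_demo  <- dummy[-idx_demo, , drop = FALSE]  # remaining 1/3 for testing
c(train = nrow(train_demo), test = nrow(test_demo))
```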

In [36]:
m1 <- svm(Y ~ ., train)                          # default kernel: radial (RBF)
m2 <- svm(Y ~ ., train, kernel = 'polynomial')
m3 <- svm(Y ~ ., train, kernel = 'sigmoid')
m4 <- svm(Y ~ ., train, kernel = 'linear')
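The four models differ only in the kernel. According to the e1071 `svm()` documentation, the kernels are linear `u'v`, polynomial `(gamma*u'v + coef0)^degree`, radial `exp(-gamma*|u - v|^2)` and sigmoid `tanh(gamma*u'v + coef0)`. The defaults are `gamma = 1/(data dimension)`, `coef0 = 0` and `degree = 3`. With 9 predictors the default `gamma` is 1/9, or about 0.1111111, which is the value printed in each `summary()` below. The plain-R functions here are only an illustrative sketch of those formulas, not e1071's internal code. By default `svm()` also standardizes the predictors (`scale = TRUE`), so the fitted models apply these kernels to scaled values. The raw rows below are for illustration only.

```r
# Illustrative kernel functions following the formulas in e1071's ?svm
p <- 9                        # predictors after dropping X1
gamma_default <- 1 / p        # default gamma, 0.1111111

k_linear  <- function(u, v) sum(u * v)
k_poly    <- function(u, v, degree = 3, coef0 = 0) (gamma_default * sum(u * v) + coef0)^degree
k_radial  <- function(u, v) exp(-gamma_default * sum((u - v)^2))
k_sigmoid <- function(u, v, coef0 = 0) tanh(gamma_default * sum(u * v) + coef0)

# Two rows from head(cancer) without the ID column (unscaled, for illustration)
u <- c(5, 1, 1, 1, 2, 1, 3, 1, 1)      # benign
v <- c(8, 10, 10, 8, 7, 10, 9, 7, 1)   # malignant
sapply(list(linear = k_linear, polynomial = k_poly,
            radial = k_radial, sigmoid = k_sigmoid),
       function(k) k(u, v))
```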

In [37]:
summary(m1)


Call:
svm(formula = Y ~ ., data = train)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  1 
      gamma:  0.1111111 

Number of Support Vectors:  70

 ( 22 48 )


Number of Classes:  2 

Levels: 
 benign malignant




In [38]:
pred11 <- predict(m1, test)
confusionMatrix(pred11, test$Y)

Confusion Matrix and Statistics

           Reference
Prediction  benign malignant
  benign       146         2
  malignant      7        73
                                          
               Accuracy : 0.9605          
                 95% CI : (0.9264, 0.9818)
    No Information Rate : 0.6711          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.9121          
 Mcnemar's Test P-Value : 0.1824          
                                          
            Sensitivity : 0.9542          
            Specificity : 0.9733          
         Pos Pred Value : 0.9865          
         Neg Pred Value : 0.9125          
             Prevalence : 0.6711          
         Detection Rate : 0.6404          
   Detection Prevalence : 0.6491          
      Balanced Accuracy : 0.9638          
                                          
       'Positive' Class : benign          
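The test lines in this output can presumably be reproduced with base R's `binom.test()` and `mcnemar.test()`. The accuracy CI would be an exact binomial interval. `P-Value [Acc > NIR]` would be a one-sided binomial test of accuracy against the No Information Rate, the share of the largest reference class (153/228). McNemar's test would use the off-diagonal cells. Below is a sketch with the radial-kernel counts. The variable names are hypothetical.

```r
# Radial-kernel counts (rows = Prediction, columns = Reference), column-major
tab_rbf <- matrix(c(146, 7, 2, 73), nrow = 2,
                  dimnames = list(Prediction = c("benign", "malignant"),
                                  Reference  = c("benign", "malignant")))
n_correct <- sum(diag(tab_rbf))            # 219 correct predictions
n_total   <- sum(tab_rbf)                  # 228 test samples
nir <- max(colSums(tab_rbf)) / n_total     # No Information Rate: 153 / 228

# Exact (Clopper-Pearson) 95% confidence interval for accuracy
acc_ci <- binom.test(n_correct, n_total)$conf.int
# One-sided test: is accuracy greater than the No Information Rate?
acc_vs_nir <- binom.test(n_correct, n_total, p = nir, alternative = "greater")$p.value
# McNemar's test on the disagreements (7 vs 2), with continuity correction
mcnemar_p <- mcnemar.test(tab_rbf)$p.value

round(c(ci_lower = acc_ci[1], ci_upper = acc_ci[2], nir = nir, mcnemar_p = mcnemar_p), 4)
```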
                                          

In [39]:
summary(m2)


Call:
svm(formula = Y ~ ., data = train, kernel = "polynomial")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  polynomial 
       cost:  1 
     degree:  3 
      gamma:  0.1111111 
     coef.0:  0 

Number of Support Vectors:  73

 ( 34 39 )


Number of Classes:  2 

Levels: 
 benign malignant




In [40]:
pred12 <- predict(m2, test)
confusionMatrix(pred12, test$Y)

Confusion Matrix and Statistics

           Reference
Prediction  benign malignant
  benign       152        11
  malignant      1        64
                                          
               Accuracy : 0.9474          
                 95% CI : (0.9099, 0.9725)
    No Information Rate : 0.6711          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.8766          
 Mcnemar's Test P-Value : 0.009375        
                                          
            Sensitivity : 0.9935          
            Specificity : 0.8533          
         Pos Pred Value : 0.9325          
         Neg Pred Value : 0.9846          
             Prevalence : 0.6711          
         Detection Rate : 0.6667          
   Detection Prevalence : 0.7149          
      Balanced Accuracy : 0.9234          
                                          
       'Positive' Class : benign          
                                          

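### We need to decide which error is more dangerous: a false positive or a false negative.
### For a disease diagnosis, a false negative (a malignant sample predicted as benign, i.e. a missed cancer) is judged more dangerous. The polynomial kernel makes this error 11 times, versus 2 times for the radial kernel. Because caret treats `benign` as the 'Positive' class here, these cases sit in the Prediction = benign, Reference = malignant cell and lower Specificity, not Sensitivity.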
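`confusionMatrix()` accepts a `positive` argument, so `confusionMatrix(pred11, test$Y, positive = "malignant")` would count a missed cancer as a false negative and report malignant recall as Sensitivity. The base-R sketch below computes that quantity for the two models so far, using the counts printed above. The helper `make_tab` is hypothetical.

```r
# Counts copied from the outputs above (rows = Prediction, columns = Reference)
lv <- c("benign", "malignant")
make_tab <- function(counts) {
  matrix(counts, nrow = 2, dimnames = list(Prediction = lv, Reference = lv))
}
tabs <- list(radial     = make_tab(c(146, 7, 2, 73)),
             polynomial = make_tab(c(152, 1, 11, 64)))

# With "malignant" as the positive class, a missed cancer is a false negative
missed_cancer    <- sapply(tabs, function(m) m["benign", "malignant"])
malignant_recall <- sapply(tabs, function(m) m["malignant", "malignant"] / sum(m[, "malignant"]))
rbind(missed_cancer, malignant_recall = round(malignant_recall, 4))
```

The malignant recall values (73/75 and 64/75) are the Specificity figures caret printed with `benign` as the positive class.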

In [41]:
summary(m3)


Call:
svm(formula = Y ~ ., data = train, kernel = "sigmoid")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  sigmoid 
       cost:  1 
      gamma:  0.1111111 
     coef.0:  0 

Number of Support Vectors:  32

 ( 16 16 )


Number of Classes:  2 

Levels: 
 benign malignant




In [42]:
pred13 <- predict(m3, test)
confusionMatrix(pred13, test$Y)

Confusion Matrix and Statistics

           Reference
Prediction  benign malignant
  benign       146         2
  malignant      7        73
                                          
               Accuracy : 0.9605          
                 95% CI : (0.9264, 0.9818)
    No Information Rate : 0.6711          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.9121          
 Mcnemar's Test P-Value : 0.1824          
                                          
            Sensitivity : 0.9542          
            Specificity : 0.9733          
         Pos Pred Value : 0.9865          
         Neg Pred Value : 0.9125          
             Prevalence : 0.6711          
         Detection Rate : 0.6404          
   Detection Prevalence : 0.6491          
      Balanced Accuracy : 0.9638          
                                          
       'Positive' Class : benign          
                                          

In [43]:
summary(m4)


Call:
svm(formula = Y ~ ., data = train, kernel = "linear")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  1 
      gamma:  0.1111111 

Number of Support Vectors:  36

 ( 18 18 )


Number of Classes:  2 

Levels: 
 benign malignant




In [45]:
pred14 <- predict(m4, test)
confusionMatrix(pred14, test$Y)

Confusion Matrix and Statistics

           Reference
Prediction  benign malignant
  benign       149         3
  malignant      4        72
                                          
               Accuracy : 0.9693          
                 95% CI : (0.9378, 0.9876)
    No Information Rate : 0.6711          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.9307          
 Mcnemar's Test P-Value : 1               
                                          
            Sensitivity : 0.9739          
            Specificity : 0.9600          
         Pos Pred Value : 0.9803          
         Neg Pred Value : 0.9474          
             Prevalence : 0.6711          
         Detection Rate : 0.6535          
   Detection Prevalence : 0.6667          
      Balanced Accuracy : 0.9669          
                                          
       'Positive' Class : benign          
                                          

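### The linear kernel has the highest accuracy (0.9693), but it misses more malignant cases (3 predicted as benign, versus 2 for the radial kernel), so it is judged riskier.
### The radial basis function (RBF) kernel is therefore judged more suitable; on this split, the sigmoid kernel produced an identical confusion matrix.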
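The comparison behind this conclusion can be tabulated from the four confusion matrices above. The sketch below ranks the kernels by missed malignancies first and by accuracy second. This ranking rule is an assumption that mirrors the reasoning in the note, not a method taken from a library. `make_tab` is a hypothetical helper.

```r
lv <- c("benign", "malignant")
make_tab <- function(counts) {
  matrix(counts, nrow = 2, dimnames = list(Prediction = lv, Reference = lv))
}
# Counts copied from the four confusionMatrix() outputs (column-major order)
tabs <- list(radial     = make_tab(c(146, 7, 2, 73)),
             polynomial = make_tab(c(152, 1, 11, 64)),
             sigmoid    = make_tab(c(146, 7, 2, 73)),
             linear     = make_tab(c(149, 4, 3, 72)))

cmp <- data.frame(
  kernel        = names(tabs),
  accuracy      = sapply(tabs, function(m) sum(diag(m)) / sum(m)),
  missed_cancer = sapply(tabs, function(m) m["benign", "malignant"]),
  row.names     = NULL
)
# Fewest missed malignancies first, then highest accuracy (ties keep list order)
cmp <- cmp[order(cmp$missed_cancer, -cmp$accuracy), ]
cmp$accuracy <- round(cmp$accuracy, 4)
cmp
```

Because the split is a single random draw, the radial and linear kernels differ by only one missed case out of 75 malignant test samples.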