### A kernel transforms the original x data into a new space
### A good transformation preserves the information in the original x data while making the classes easy to separate
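The idea of "transforming x into a new space" can be made concrete with a small base-R sketch. It assumes a degree-2 polynomial kernel on 2-D inputs, chosen only because its feature map is short enough to write out; `phi`, `poly2_kernel`, `x`, and `z` are illustrative names and values, not part of the notebook's data.

```r
# Explicit feature map for a degree-2 polynomial kernel on 2-D inputs:
# phi maps (x1, x2) to (x1^2, sqrt(2) * x1 * x2, x2^2).
phi <- function(x) c(x[1]^2, sqrt(2) * x[1] * x[2], x[2]^2)

# The matching kernel gives the same inner product without building phi(x).
poly2_kernel <- function(x, z) sum(x * z)^2

# Two made-up points (hypothetical values).
x <- c(1, 2)
z <- c(3, -1)

explicit <- sum(phi(x) * phi(z))  # inner product in the transformed space
implicit <- poly2_kernel(x, z)    # kernel evaluated in the original space

print(c(explicit = explicit, implicit = implicit))
```

Both values agree up to floating-point rounding, which is why an SVM can work in the transformed space while only ever evaluating the kernel on the original features.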

In [2]:
library(lattice)
library(ggplot2)
library(e1071)
library(caret)
# caret: package used to build the confusion matrix (misclassification cross-table)

In [4]:
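# stringsAsFactors = TRUE keeps Species a factor on R >= 4.0, as svm() and confusionMatrix() expect
iris <- read.csv('iris.csv', stringsAsFactors = TRUE)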

In [5]:
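set.seed(11)              # fix the random seed so the split is reproducible
N <- nrow(iris)
tr.idx <- sample(1:N, size = N*2/3, replace = FALSE)  # row indices of the 2/3 training sample
y <- iris[,5]             # class labels (Species); not used below
train <- iris[tr.idx, ]
test <- iris[-tr.idx, ]   # remaining 1/3 held out for testing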
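A quick sanity check of the split can be sketched as follows. This assumes R's built-in `iris` data frame as a stand-in for `iris.csv` (the file's row order may differ, so the class counts printed here need not match the confusion matrices in this notebook); the `*_chk` names are hypothetical and chosen to avoid overwriting the objects above.

```r
# Assumption: the built-in iris data stands in for iris.csv.
iris_chk <- datasets::iris

set.seed(11)
N_chk <- nrow(iris_chk)
idx_chk <- sample(1:N_chk, size = N_chk * 2 / 3, replace = FALSE)
train_chk <- iris_chk[idx_chk, ]
test_chk <- iris_chk[-idx_chk, ]

# No row may appear in both sets, and together they must cover the data.
overlap <- intersect(rownames(train_chk), rownames(test_chk))
stopifnot(length(overlap) == 0)
stopifnot(nrow(train_chk) + nrow(test_chk) == N_chk)

# Class counts per split; a simple random split is not guaranteed to be balanced.
print(table(train_chk$Species))
print(table(test_chk$Species))
```

If the class counts turn out very uneven, a stratified split such as `caret::createDataPartition` is one alternative to consider.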

In [7]:
m1 <- svm(Species ~ ., train)
summary(m1)


Call:
svm(formula = Species ~ ., data = train)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  1 
      gamma:  0.25 

Number of Support Vectors:  39

 ( 7 17 15 )


Number of Classes:  3 

Levels: 
 setosa versicolor virginica




In [9]:
pred11 <- predict(m1, test)
confusionMatrix(pred11, test$Species)

Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         14          0         0
  versicolor      0         15         1
  virginica       0          1        19

Overall Statistics
                                          
               Accuracy : 0.96            
                 95% CI : (0.8629, 0.9951)
    No Information Rate : 0.4             
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.9393          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                   1.00            0.9375           0.9500
Specificity                   1.00            0.9706           0.9667
Pos Pred Value                1.00            0.9375           0.9500
Neg Pred Value                1.00            0.9706           0.9667
Prevalence                    0.28          

### Reference: actual class, Prediction: predicted class
### Accuracy: 0.96 (48 of 50 test samples classified correctly)
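The headline numbers can be recomputed by hand from the confusion matrix. The sketch below copies the radial-kernel counts from the output above into a base-R matrix; the variable names are illustrative, and `caret` is not needed for the arithmetic.

```r
# Confusion matrix copied from the radial-kernel output
# (rows: Prediction, columns: Reference).
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(14,  0,  0,
                0, 15,  1,
                0,  1, 19),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

accuracy    <- sum(diag(cm)) / sum(cm)  # correct predictions / all test samples
sensitivity <- diag(cm) / colSums(cm)   # per class: true positives / actual count
pos_pred    <- diag(cm) / rowSums(cm)   # per class: true positives / predicted count

print(accuracy)
print(round(sensitivity, 4))
print(round(pos_pred, 4))
```

The results reproduce the Accuracy, Sensitivity, and Pos Pred Value rows reported by `confusionMatrix()`.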

In [11]:
m2 <- svm(Species ~ ., train, kernel = 'polynomial')
summary(m2)


Call:
svm(formula = Species ~ ., data = train, kernel = "polynomial")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  polynomial 
       cost:  1 
     degree:  3 
      gamma:  0.25 
     coef.0:  0 

Number of Support Vectors:  38

 ( 5 20 13 )


Number of Classes:  3 

Levels: 
 setosa versicolor virginica




In [18]:
pred12 <- predict(m2, test)
confusionMatrix(pred12, test$Species)

Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         14          0         0
  versicolor      0         16         6
  virginica       0          0        14

Overall Statistics
                                          
               Accuracy : 0.88            
                 95% CI : (0.7569, 0.9547)
    No Information Rate : 0.4             
    P-Value [Acc > NIR] : 2.514e-12       
                                          
                  Kappa : 0.8206          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                   1.00            1.0000           0.7000
Specificity                   1.00            0.8235           1.0000
Pos Pred Value                1.00            0.7273           1.0000
Neg Pred Value                1.00            1.0000           0.8333
Prevalence                    0.28          

### 6 virginica samples were misclassified as versicolor.
### The polynomial kernel performs worse here (accuracy 0.88, versus 0.96 for the radial kernel).
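To see what the `degree`, `gamma`, and `coef.0` parameters in the summaries control, the kernel formulas from the `e1071::svm` documentation can be written as plain R functions. This is a sketch for illustration, not the library's internal code: the function names and the two sample points are made up, and `svm()` scales the features by default before applying its kernel, which this sketch does not do.

```r
# Kernel formulas as documented for e1071::svm, written as plain R functions.
# u and v are numeric vectors of equal length.
linear_kernel  <- function(u, v) sum(u * v)
poly_kernel    <- function(u, v, gamma, coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree
}
radial_kernel  <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))
sigmoid_kernel <- function(u, v, gamma, coef0 = 0) tanh(gamma * sum(u * v) + coef0)

# Two made-up 4-feature points (hypothetical values, not taken from iris.csv).
u <- c(1, 0, 2, 1)
v <- c(0, 1, 1, 2)
gamma <- 0.25  # same value as in the summaries: 1 / (number of predictors)

k <- c(linear  = linear_kernel(u, v),
       poly    = poly_kernel(u, v, gamma),
       radial  = radial_kernel(u, v, gamma),
       sigmoid = sigmoid_kernel(u, v, gamma))
print(round(k, 4))
```

With `coef.0 = 0` the polynomial kernel reduces to a pure cubic of the scaled inner product, so trying other `degree` or `coef0` values is one thing to explore before concluding that the polynomial kernel fits this data poorly.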

In [13]:
m3 <- svm(Species ~ ., train, kernel = 'sigmoid')
summary(m3)


Call:
svm(formula = Species ~ ., data = train, kernel = "sigmoid")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  sigmoid 
       cost:  1 
      gamma:  0.25 
     coef.0:  0 

Number of Support Vectors:  37

 ( 3 18 16 )


Number of Classes:  3 

Levels: 
 setosa versicolor virginica




In [14]:
pred13 <- predict(m3, test)
confusionMatrix(pred13, test$Species)

Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         14          0         0
  versicolor      0         13         1
  virginica       0          3        19

Overall Statistics
                                          
               Accuracy : 0.92            
                 95% CI : (0.8077, 0.9778)
    No Information Rate : 0.4             
    P-Value [Acc > NIR] : 1.565e-14       
                                          
                  Kappa : 0.878           
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                   1.00            0.8125           0.9500
Specificity                   1.00            0.9706           0.9000
Pos Pred Value                1.00            0.9286           0.8636
Neg Pred Value                1.00            0.9167           0.9643
Prevalence                    0.28          

In [16]:
m4 <- svm(Species ~ ., train, kernel = 'linear')
summary(m4)


Call:
svm(formula = Species ~ ., data = train, kernel = "linear")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  1 
      gamma:  0.25 

Number of Support Vectors:  22

 ( 2 11 9 )


Number of Classes:  3 

Levels: 
 setosa versicolor virginica




In [17]:
pred14 <- predict(m4, test)
confusionMatrix(pred14, test$Species)

Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         14          0         0
  versicolor      0         15         0
  virginica       0          1        20

Overall Statistics
                                          
               Accuracy : 0.98            
                 95% CI : (0.8935, 0.9995)
    No Information Rate : 0.4             
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.9696          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                   1.00            0.9375           1.0000
Specificity                   1.00            1.0000           0.9667
Pos Pred Value                1.00            1.0000           0.9524
Neg Pred Value                1.00            0.9714           1.0000
Prevalence                    0.28          

### The linear kernel performs quite well (accuracy 0.98).
### When the data structure is simple, a linear kernel can be good enough.
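The four results can be put side by side by recomputing accuracy from each reported confusion matrix. The sketch below hard-codes the counts printed above (rows: Prediction, columns: Reference); `make_cm` and `cms` are hypothetical helper names.

```r
classes <- c("setosa", "versicolor", "virginica")

# Build a 3x3 confusion matrix from counts listed row by row.
make_cm <- function(values) {
  matrix(values, nrow = 3, byrow = TRUE,
         dimnames = list(Prediction = classes, Reference = classes))
}

# Counts copied from the confusionMatrix() outputs in this notebook.
cms <- list(
  radial     = make_cm(c(14, 0, 0,  0, 15, 1,  0, 1, 19)),
  polynomial = make_cm(c(14, 0, 0,  0, 16, 6,  0, 0, 14)),
  sigmoid    = make_cm(c(14, 0, 0,  0, 13, 1,  0, 3, 19)),
  linear     = make_cm(c(14, 0, 0,  0, 15, 0,  0, 1, 20))
)

# Accuracy = correct predictions on the diagonal / all test samples.
accuracy <- sapply(cms, function(cm) sum(diag(cm)) / sum(cm))
print(sort(accuracy, decreasing = TRUE))
```

The ranking comes from a single split with 50 test samples, and the reported 95% confidence intervals overlap, so the gap between the linear and radial kernels should be read as tentative.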