In [1]:
suppressWarnings(suppressMessages(library("kernlab")))
suppressWarnings(suppressMessages(library("caret")))
suppressWarnings(suppressMessages(library(kknn)))

In [2]:
data <- read.table("./credit_card_data.txt", header = FALSE)

In [3]:
c_list <- c(0.00001, 0.001, 0.01, 0.1, 1, 3, 5, 10, 15, 20, 100, 1000, 1000000)

In [4]:
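# Fit a linear SVM for each C and report metrics on the training data.
# Note: confusionMatrix() treats the first factor level ("0") as the positive class by default,
# so the precision, recall and F1 printed below all refer to class 0.
for (C_val in c_list) {   # avoid reusing the name `c` (base::c) for the loop variable
    cat("\n----- For C = ", C_val, " -----\n")
    model <- ksvm(as.matrix(data[, 1:10]), as.factor(data[, 11]), type = "C-svc",
                  kernel = "vanilladot", C = C_val, scaled = TRUE)
    prediction <- predict(model, as.matrix(data[, 1:10]))
    result <- confusionMatrix(prediction, as.factor(data[, 11]))
    accuracy <- result$overall["Accuracy"] * 100
    precision <- result$byClass["Pos Pred Value"]
    recall <- result$byClass["Sensitivity"]
    f_measure <- 2 * ((precision * recall) / (precision + recall))   # F1 = harmonic mean of precision and recall
    cat("\nAccuracy: ", accuracy, "%")
    cat("\nPrecision: ", precision)
    cat("\nRecall: ", recall)
    cat("\nF1 score: ", f_measure)
}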


----- For C =  1e-05  -----
 Setting default kernel parameters  

Accuracy:  54.74006 %
Precision:  0.5474006
Recall:  1
F1 score:  0.7075099
----- For C =  0.001  -----
 Setting default kernel parameters  

Accuracy:  83.79205 %
Precision:  0.8088235
Recall:  0.9217877
F1 score:  0.8616188
----- For C =  0.01  -----
 Setting default kernel parameters  

Accuracy:  86.39144 %
Precision:  0.9438944
Recall:  0.7988827
F1 score:  0.8653555
----- For C =  0.1  -----
 Setting default kernel parameters  

Accuracy:  86.39144 %
Precision:  0.9438944
Recall:  0.7988827
F1 score:  0.8653555
----- For C =  1  -----
 Setting default kernel parameters  

Accuracy:  86.39144 %
Precision:  0.9438944
Recall:  0.7988827
F1 score:  0.8653555
----- For C =  3  -----
 Setting default kernel parameters  

Accuracy:  86.39144 %
Precision:  0.9438944
Recall:  0.7988827
F1 score:  0.8653555
----- For C =  5  -----
 Setting default kernel parameters  

Accuracy:  86.39144 %
Precision:  0.9438944
Recall:  0.7

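Conclusion:
Accuracy drops for extremely small and extremely large values of C:
C = 1e-05 ----> Accuracy = 54.74%
C = 1e+06 ----> Accuracy = 62.54%

At C = 1e-05 the model predicts class 0 for every observation: recall for class 0 is 1 and precision equals the accuracy (0.5474), so the accuracy is simply the share of class-0 rows in the data.

It is interesting to note that accuracy, precision and recall remain the same for all values of C tested between 0.01 and 100.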
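To make the metric definitions concrete, here is a minimal base-R sketch that computes accuracy, precision, recall and F1 directly from predicted and true labels for one chosen positive class. The helper `binary_metrics` is hypothetical (it is not part of caret), and the labels are made up to mimic a model that always predicts the majority class "0", as happens at C = 1e-05.

```r
# Hypothetical helper (not part of caret): accuracy, precision, recall and F1
# for one chosen positive class, computed from raw labels with base R only.
binary_metrics <- function(pred, truth, positive) {
  pred  <- as.character(pred)
  truth <- as.character(truth)
  tp <- sum(pred == positive & truth == positive)   # predicted positive, truly positive
  fp <- sum(pred == positive & truth != positive)   # predicted positive, truly negative
  fn <- sum(pred != positive & truth == positive)   # predicted negative, truly positive
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  f1        <- 2 * precision * recall / (precision + recall)
  c(accuracy = mean(pred == truth), precision = precision, recall = recall, f1 = f1)
}

# Made-up labels: a model that predicts the majority class "0" for every row
truth <- c(rep("0", 6), rep("1", 4))
pred  <- rep("0", 10)
m <- binary_metrics(pred, truth, positive = "0")
print(m)   # accuracy = 0.6, precision = 0.6, recall = 1, f1 = 0.75
```

With recall equal to 1, F1 reduces to 2p / (1 + p); plugging in p = 0.5474006 gives about 0.7075, which matches the F1 score printed for C = 1e-05.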

In [5]:
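# Refit with C = 100 (the value used in the homework) to extract the classifier's coefficients.
model <- ksvm(as.matrix(data[, 1:10]), as.factor(data[, 11]), type = "C-svc",
              kernel = "vanilladot", C = 100, scaled = TRUE)

# Calculate a1...am: weighted sum of the support vectors (coef holds alpha_i * y_i).
# Because scaled = TRUE, these coefficients apply to the standardized predictors.
a <- colSums(model@xmatrix[[1]] * model@coef[[1]])
a
# Calculate a0: kernlab's decision function subtracts b, so the intercept is -b.
a0 <- -model@b
a0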

 Setting default kernel parameters  
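The coefficients a1...am and a0 define the linear decision value f(x) = a0 + a1*z1 + ... + am*zm, where z is the predictor vector standardized with the training means and standard deviations (because the model was fitted with scaled = TRUE). The sketch below shows that arithmetic on hypothetical numbers, not the fitted values; mapping a positive decision value to class 1 is also an assumption here and should be checked against `predict()` before relying on it.

```r
# Minimal sketch of the linear SVM decision rule f(x) = a0 + sum_j a_j * z_j.
# All numbers are hypothetical, chosen only to illustrate the arithmetic.
a      <- c(0.5, -1.2, 2.0)     # hypothetical a1..a3
a0     <- 0.1                   # hypothetical intercept
center <- c(1, 1, 1)            # hypothetical training means
spread <- c(2, 1, 0.5)          # hypothetical training standard deviations

X <- matrix(c(3, 3, 2,
              1, 2, 0.5), nrow = 2, byrow = TRUE)
Z <- scale(X, center = center, scale = spread)   # standardize with the training statistics
f <- a0 + drop(Z %*% a)                          # one decision value per row
pred_label <- ifelse(f > 0, 1, 0)                # ASSUMPTION: positive value -> class 1
print(f)            # 2.2 -3.1
print(pred_label)   # 1 0
```

Standardizing with the training statistics (rather than the new data's own mean and standard deviation) keeps new points on the same scale the coefficients were learned on.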


In [6]:
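# Try a different kernel: it was mentioned on the Slack channel that the rbfdot kernel reaches almost 100% accuracy.
# Note: with the default kpar = "automatic", sigma is estimated from a random sample of the data,
# so the result can vary slightly between runs.
model <- ksvm(as.matrix(data[, 1:10]), as.factor(data[, 11]), type = "C-svc",
              kernel = "rbfdot", C = 100, scaled = TRUE)
pred <- predict(model, as.matrix(data[, 1:10]))
sum(pred == data[, 11]) / nrow(data)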

The rbfdot model reaches about 95% accuracy. However, this accuracy is measured on the same data used to train the classifier, so it most likely reflects overfitting: the classifier performs well on data it has already seen but will probably perform worse on data it has not seen, i.e., it generalizes poorly. Confirming this requires evaluating the model on held-out data.
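Why training accuracy can be misleading is easiest to see with a model that memorizes its training set. The sketch below swaps in a different technique, a 1-nearest-neighbour classifier written in base R (the helper `one_nn` and the simulated data are hypothetical), and compares its accuracy on the training points with its accuracy on a random held-out split.

```r
set.seed(42)
n <- 200
x <- matrix(rnorm(n * 2), ncol = 2)
# Labels follow a linear rule plus noise, so some points sit on the "wrong" side of the boundary
y <- ifelse(x[, 1] + x[, 2] + rnorm(n) > 0, 1, 0)

# 1-nearest-neighbour classifier in base R (hypothetical helper)
one_nn <- function(train_x, train_y, query_x) {
  apply(query_x, 1, function(q) {
    d <- colSums((t(train_x) - q)^2)   # squared Euclidean distance to every training row
    train_y[which.min(d)]
  })
}

train_idx <- sample(n, 150)            # random 150/50 train/test split
# Training accuracy is always 1 here: every point is its own nearest neighbour
train_acc <- mean(one_nn(x[train_idx, ], y[train_idx], x[train_idx, ]) == y[train_idx])
test_acc  <- mean(one_nn(x[train_idx, ], y[train_idx], x[-train_idx, ]) == y[-train_idx])
cat("training accuracy:", train_acc, " held-out accuracy:", test_acc, "\n")
```

The same held-out comparison (fit on one part of the data, score on the rest) is what would confirm or rule out overfitting for the rbfdot model.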

In [7]:
data$V11 <- as.factor(data$V11)

In [8]:
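# Leave-one-out cross-validation: for each k, predict every row from a model fitted on all other rows.
for (k in 1:20) {
    pred <- character(nrow(data))   # preallocate one predicted label per row
    for (i in 1:nrow(data)) {
        knn <- kknn(V11 ~ ., data[-i, ], data[i, ], k = k, distance = 2,
                    kernel = "rectangular", scale = TRUE)
        # Store the predicted label itself instead of relying on internal factor codes (the old `pred - 1`)
        pred[i] <- as.character(knn$fitted.values)
    }
    accuracy <- mean(pred == as.character(data$V11))
    cat("\n For K = ", k, "accuracy = ", accuracy, "\n")
}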


 For K =  1 accuracy =  0.8149847 

 For K =  2 accuracy =  0.8149847 

 For K =  3 accuracy =  0.82263 

 For K =  4 accuracy =  0.82263 

 For K =  5 accuracy =  0.8455657 

 For K =  6 accuracy =  0.8455657 

 For K =  7 accuracy =  0.8470948 

 For K =  8 accuracy =  0.8470948 

 For K =  9 accuracy =  0.8318043 

 For K =  10 accuracy =  0.8318043 

 For K =  11 accuracy =  0.8348624 

 For K =  12 accuracy =  0.8348624 

 For K =  13 accuracy =  0.8302752 

 For K =  14 accuracy =  0.8302752 

 For K =  15 accuracy =  0.8256881 

 For K =  16 accuracy =  0.8256881 

 For K =  17 accuracy =  0.82263 

 For K =  18 accuracy =  0.82263 

 For K =  19 accuracy =  0.82263 

 For K =  20 accuracy =  0.82263 
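The loop above is leave-one-out cross-validation (kknn also offers `train.kknn`, which performs leave-one-out directly). The base-R sketch below spells out the same idea with a plain, unweighted majority vote. It is a simplification rather than kknn's algorithm: it assumes the predictors are already on comparable scales (no `scale = TRUE`), and distance and vote ties are broken by order, which may differ from kknn. The helper name and the toy data are hypothetical.

```r
# Leave-one-out accuracy of a plain majority-vote kNN classifier (hypothetical helper).
loocv_knn_accuracy <- function(X, y, k) {
  X <- as.matrix(X)
  y <- as.factor(y)
  n <- nrow(X)
  pred <- character(n)
  for (i in seq_len(n)) {
    others <- X[-i, , drop = FALSE]                 # all rows except the held-out one
    d <- sqrt(colSums((t(others) - X[i, ])^2))      # Euclidean distance from row i to each other row
    nn <- order(d)[seq_len(k)]                      # the k closest remaining rows
    votes <- table(y[-i][nn])                       # class counts among those neighbours
    pred[i] <- names(votes)[which.max(votes)]       # majority vote (first level wins a tie)
  }
  mean(pred == as.character(y))
}

# Made-up 1-D data: two well-separated groups of three points each
X <- matrix(c(0, 0.1, 0.2, 1, 1.1, 1.2), ncol = 1)
y <- c(0, 0, 0, 1, 1, 1)
acc <- sapply(1:5, function(k) loocv_knn_accuracy(X, y, k))
print(acc)   # 1.0 1.0 1.0 0.5 0.0
```

Once k approaches the size of a class, neighbours from the other group start to outvote a point's own group, which is one reason accuracy eventually declines as k grows.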
