# Grid Search 

Importing the dataset

In [1]:
# Load the data and keep only columns 3 to 5: Age, EstimatedSalary, Purchased
dataset = read.csv('Social_Network_Ads.csv')
dataset = dataset[, 3:5]
head(dataset)

Age,EstimatedSalary,Purchased
19,19000,0
35,20000,0
26,43000,0
27,57000,0
19,76000,0
27,58000,0


Encoding the target feature as factor

In [2]:
dataset$Purchased = factor(dataset$Purchased, levels = c(0, 1))

Splitting the dataset into the Training set and Test set

In [3]:
library(caTools)
set.seed(123)
split = sample.split(dataset$Purchased, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

Feature Scaling

In [4]:
training_set[-3] = scale(training_set[-3])
test_set[-3] = scale(test_set[-3])
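
The cell above standardises the test set with its own mean and standard deviation. A common alternative is to reuse the training set's statistics so both sets share one transformation. The following is a minimal sketch of that alternative, not the notebook's method; `train_toy` and `test_toy` are hypothetical stand-ins with the same column layout.

```r
# Hypothetical toy frames with the same three columns as training_set / test_set
train_toy = data.frame(Age = c(19, 35, 26, 27),
                       EstimatedSalary = c(19000, 20000, 43000, 57000),
                       Purchased = factor(c(0, 0, 0, 1), levels = c(0, 1)))
test_toy = data.frame(Age = c(19, 27),
                      EstimatedSalary = c(76000, 58000),
                      Purchased = factor(c(0, 0), levels = c(0, 1)))

# Scale the training features and keep the statistics that scale() used
scaled_train = scale(train_toy[-3])
train_center = attr(scaled_train, "scaled:center")
train_scale = attr(scaled_train, "scaled:scale")
train_toy[-3] = scaled_train

# Apply the training mean and standard deviation to the test features
test_toy[-3] = scale(test_toy[-3], center = train_center, scale = train_scale)
test_toy
```

Switching the notebook to this approach would change the scaled test values, so the confusion matrix further down could differ from the one shown.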

Fitting Kernel SVM to the Training set

In [5]:
library(e1071)
classifier = svm(formula = Purchased ~ .,
                 data = training_set,
                 type = 'C-classification',
                 kernel = 'radial')

Predicting the Test set results

In [6]:
y_pred = predict(classifier, newdata = test_set[-3])
y_pred

Making the Confusion Matrix

In [7]:
cm = table(test_set[, 3], y_pred)
cm

   y_pred
     0  1
  0 58  6
  1  4 32
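
The test-set accuracy can be read off this matrix as the diagonal sum divided by the total. A small sketch that rebuilds the matrix from the counts printed above (`cm_shown` and `accuracy_shown` are names introduced here for illustration):

```r
# Rebuild the printed confusion matrix; matrix() fills column by column
cm_shown = matrix(c(58, 4, 6, 32), nrow = 2,
                  dimnames = list(actual = c("0", "1"), y_pred = c("0", "1")))

# Correct predictions sit on the diagonal
accuracy_shown = sum(diag(cm_shown)) / sum(cm_shown)
accuracy_shown   # (58 + 32) / 100 = 0.9
```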

Applying k-Fold Cross Validation

In [8]:
library(caret)
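# Split the training rows into 10 folds; each list element holds held-out row indices
folds = createFolds(training_set$Purchased, k = 10)
cv = lapply(folds, function(x) {
  # Train on the other 9 folds and evaluate on the held-out fold
  training_fold = training_set[-x, ]
  test_fold = training_set[x, ]
  classifier = svm(formula = Purchased ~ .,
                   data = training_fold,
                   type = 'C-classification',
                   kernel = 'radial')
  y_pred = predict(classifier, newdata = test_fold[-3])
  cm = table(test_fold[, 3], y_pred)
  # Accuracy = correct predictions / all predictions in this fold
  accuracy = (cm[1,1] + cm[2,2]) / (cm[1,1] + cm[2,2] + cm[1,2] + cm[2,1])
  return(accuracy)
})
# Average the 10 per-fold accuracies
accuracy = mean(as.numeric(cv))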

accuracy

The mean accuracy across the 10 cross-validation folds is about 91%.
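
To make the fold structure concrete, here is a base-R sketch of how 300 training rows can be partitioned into 10 held-out index sets. It is an illustration only: `toy_folds` is a name introduced here, and unlike caret's `createFolds` this version does not stratify by class.

```r
set.seed(123)
n = 300   # number of training rows, as reported by caret in this notebook
k = 10    # number of folds

# Shuffle the row indices, then deal them into k groups of equal size
toy_folds = split(sample(seq_len(n)), rep(seq_len(k), length.out = n))

# Every row index appears in exactly one held-out fold
lengths(toy_folds)
```

Each element of `toy_folds` plays the role of `x` in the `lapply` call: the rows it lists are held out for evaluation and the remaining rows are used for training.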

Applying Grid Search to find the best parameters

In [9]:
# caret's train() resamples the training set and compares several candidate values of C
classifier = train(form = Purchased ~ ., data = training_set, method = 'svmRadial')
classifier

Support Vector Machines with Radial Basis Function Kernel 

300 samples
  2 predictor
  2 classes: '0', '1' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 300, 300, 300, 300, 300, 300, ... 
Resampling results across tuning parameters:

  C     Accuracy   Kappa    
  0.25  0.9148058  0.8172672
  0.50  0.9166393  0.8211613
  1.00  0.9172711  0.8224596

Tuning parameter 'sigma' was held constant at a value of 2.251496
Accuracy was used to select the optimal model using  the largest value.
The final values used for the model were sigma = 2.251496 and C = 1.
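
With default settings, `train` varied only `C` over three values and held `sigma` constant. To search over both parameters, caret accepts an explicit grid through `tuneGrid` and a resampling scheme through `trControl`. The sketch below runs on synthetic data so that it is self-contained; the data frame `toy`, the candidate values in `grid`, and the choice of 5-fold cross-validation are all assumptions made for illustration, not values taken from this notebook.

```r
library(caret)   # method = 'svmRadial' also needs the kernlab package installed

set.seed(123)
# Synthetic stand-in for training_set (hypothetical data, same column names)
toy = data.frame(Age = rnorm(120), EstimatedSalary = rnorm(120))
toy$Purchased = factor(ifelse(toy$Age + toy$EstimatedSalary > 0, 1, 0),
                       levels = c(0, 1))

# Every combination of these candidate values is evaluated: 3 x 3 = 9 models
grid = expand.grid(sigma = c(0.5, 1, 2), C = c(0.25, 1, 4))

# 5-fold cross-validation instead of the default bootstrap resampling
ctrl = trainControl(method = 'cv', number = 5)

tuned = train(form = Purchased ~ ., data = toy, method = 'svmRadial',
              trControl = ctrl, tuneGrid = grid)
tuned$bestTune   # the (sigma, C) pair with the highest resampled accuracy
```

Replacing `toy` with `training_set` would apply the same search to the notebook's data; suitable candidate ranges for `sigma` and `C` depend on the dataset.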

In [10]:
classifier$bestTune

     sigma C
3 2.251496 1
