# Learning SVM Training and Testing Using the Hold-Out Method

# The code below demonstrates how an SVM model is trained and tested under the hold-out method. For better understanding, it builds and tests an SVM model first on a data set bundled with scikit-learn and then on a user-specific data set.


# 1. Loading libraries

In [1]:
import pandas as pd
from sklearn import datasets # imports the built-in data sets shipped with scikit-learn
from sklearn import svm   # imports SVM classifier
from sklearn import metrics
from sklearn.model_selection import train_test_split # using scikit learn for hold-out

# 2. Loading data set

In [2]:
# Loading the wine data set with load_wine()

dataset_wine = datasets.load_wine()


# 3. Creating Hold-out Environment

In [3]:
# Creating hold-out environment

winedata_train, winedata_test, winetarget_train, winetarget_test = train_test_split(dataset_wine.data, dataset_wine.target, test_size=0.3)

# The pair of arrays winedata_train and winetarget_train will be used for learning
# the supervised model,
# whereas winedata_test and winetarget_test will be used for model testing.
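
As a quick check of what the hold-out split produces, the sketch below repeats the split and prints the sizes of the two parts. The `random_state` and `stratify` arguments are illustrative additions that the call above does not use; they are assumed here only to make the split repeatable and to keep class proportions similar in both parts. The variable names are hypothetical as well.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

wine = datasets.load_wine()

# random_state and stratify are illustrative assumptions, not part of the original call.
X_train, X_test, y_train, y_test = train_test_split(
    wine.data,
    wine.target,
    test_size=0.3,         # 30% of the 178 samples are held out for testing
    random_state=42,       # hypothetical seed so the same split is produced on every run
    stratify=wine.target,  # keep the class proportions similar in both parts
)

print(X_train.shape)  # (124, 13)
print(X_test.shape)   # (54, 13)
```

The 54 held-out samples match the total support reported in the classification report in step 6.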

# 4.  Building SVM model 

We will use the training set created in step 3 to train the model. The supervised model we build here is a Support Vector Machine (SVM). In theory, an SVM supports different types of kernels, such as linear, polynomial and radial basis function (RBF). A linear kernel is suitable for linearly separable problems, whereas the others are suitable for problems that are not linearly separable. When building an SVM model, we try different kernels and keep the one that performs best on unseen observations.
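
The kernel search described above can be sketched as a simple loop. This is a minimal illustration rather than the notebook's own procedure: the seed, the variable names and the choice of three kernels are assumptions, and the accuracies it prints depend on the particular split.

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0  # hypothetical seed
)

# Fit one SVC per kernel on the same training set and score it on the same test set.
kernel_accuracy = {}
for kernel in ("linear", "poly", "rbf"):
    model = svm.SVC(kernel=kernel)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    kernel_accuracy[kernel] = metrics.accuracy_score(y_test, predictions)

for kernel, accuracy in kernel_accuracy.items():
    print(f"{kernel:>6}: {accuracy:.2f}")
```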

In [18]:
# Create an SVM classifier. The kernels supported in scikit-learn include linear, poly and rbf.

SVMmodel_1 = svm.SVC(kernel='rbf') # RBF kernel

# Train the model using the training set

SVMfitted_1 = SVMmodel_1.fit(winedata_train, winetarget_train)

# 5. Testing trained SVM model on the test set 

In [19]:

# Predict the response on the test data set

SVM_predictions_1 = SVMfitted_1.predict(winedata_test)

# 6. Evaluating the performance of the model 

In [20]:
# Computing Model Accuracy

print("Accuracy:",round(metrics.accuracy_score(winetarget_test, SVM_predictions_1),2) * 100, "%")

print ("---------------")

# Printing confusion matrix

print ("Confusion matrix")

print ("---------------")

print(metrics.confusion_matrix(winetarget_test, SVM_predictions_1))

# Model detailed classification report
target_names = ['class 0', 'class 1', 'class 2']


print ("---------------")

print("Classification report", metrics.classification_report(winetarget_test, SVM_predictions_1,target_names =target_names))

Accuracy: 41.0 %
---------------
Confusion matrix
---------------
[[ 1 20  0]
 [ 0 19  0]
 [ 0 12  2]]
---------------
Classification report              precision    recall  f1-score   support

    class 0       1.00      0.05      0.09        21
    class 1       0.37      1.00      0.54        19
    class 2       1.00      0.14      0.25        14

avg / total       0.78      0.41      0.29        54
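
The figures in the report can be recomputed by hand from the confusion matrix, which helps when reading it: rows are true classes and columns are predicted classes. The sketch below copies the matrix from the output above and uses plain Python; the variable names are illustrative.

```python
# Confusion matrix copied from the output above (rows = true class, columns = predicted class).
confusion = [
    [1, 20, 0],
    [0, 19, 0],
    [0, 12, 2],
]

n_classes = len(confusion)
total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(n_classes))  # diagonal entries
accuracy = correct / total

# Recall divides the diagonal entry by its row sum (all true members of the class).
recall = [confusion[i][i] / sum(confusion[i]) for i in range(n_classes)]
# Precision divides the diagonal entry by its column sum (all predictions of the class).
precision = [
    confusion[i][i] / sum(confusion[r][i] for r in range(n_classes))
    for i in range(n_classes)
]

print(f"accuracy = {correct}/{total} = {accuracy:.2f}")
for i in range(n_classes):
    print(f"class {i}: precision = {precision[i]:.2f}, recall = {recall[i]:.2f}")
```

Running it prints an accuracy of 22/54 = 0.41 and the same per-class precision and recall as the report. Almost every test point is predicted as class 1, which is why that class has perfect recall but low precision.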



# SVM on a user-specific data set

# 1. Loading data set

In [7]:
# Loading a data set from the local machine. The data set is used for predicting liver disorder.
    
My_dataset = pd.read_csv('/Users/sakshibabbar/Documents/ML/Supervised learning/datasets/liver_dataset.csv')

# 2. Dividing the data set into indicator features and the target variable

In [10]:
# My_data contains all data points of My_dataset, from the first feature to the 6th feature (indicator features)
My_data = My_dataset.iloc[:, 0:6].values

# My_target contains the class label of every data point, which is the 7th column of My_dataset

My_target = My_dataset.iloc[:, 6].values

print(My_data)
print(My_target)

[[85. 92. 45. 27. 31.  0.]
 [85. 64. 59. 32. 23.  0.]
 [86. 54. 33. 16. 54.  0.]
 ...
 [98. 77. 55. 35. 89. 15.]
 [91. 68. 27. 26. 14. 16.]
 [98. 99. 57. 45. 65. 20.]]
[1 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 1 1 1 1
 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1
 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 2 2
 2 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1
 1 1 1 1 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 2 2 2 2
 2 1 1 2 2 2 2 1 2 1 1 1]
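
To make the `iloc` slicing concrete, the sketch below applies the same two expressions to a tiny table. The three rows reuse the first three data points and labels printed above; the column names and the `toy` DataFrame itself are hypothetical, since the real CSV header is not shown.

```python
import pandas as pd

# Hypothetical toy table: six feature columns followed by one class column.
toy = pd.DataFrame(
    [
        [85, 92, 45, 27, 31, 0.0, 1],
        [85, 64, 59, 32, 23, 0.0, 2],
        [86, 54, 33, 16, 54, 0.0, 2],
    ],
    columns=["f1", "f2", "f3", "f4", "f5", "f6", "label"],
)

features = toy.iloc[:, 0:6].values  # columns 0..5; the stop index 6 is excluded
target = toy.iloc[:, 6].values      # column 6 only, returned as a 1-D array

print(features.shape)  # (3, 6)
print(target)          # [1 2 2]
```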


# 3. Creating Hold-out environment for the data set from the step above

In [11]:
# The pair of arrays liverdata_train and livertarget_train will be used for learning the supervised model,
# whereas liverdata_test and livertarget_test will be used for model testing.

liverdata_train, liverdata_test, livertarget_train, livertarget_test = train_test_split(My_data, My_target, test_size=0.3)



# 4.  Building SVM model

In [15]:
# Create an SVM classifier. The kernels supported in scikit-learn include linear, poly and rbf.

SVMmodel_2 = svm.SVC(kernel='rbf') # RBF kernel

# Train the model using the training set

SVMfitted_2 = SVMmodel_2.fit(liverdata_train, livertarget_train)

# 5. Testing trained SVM model on the test set

In [16]:

# Predict the response on the test data set

SVM_predictions_2 = SVMfitted_2.predict(liverdata_test)

# 6. Evaluating the performance of the model

In [17]:
# Computing Model Accuracy

print("Accuracy:",round(metrics.accuracy_score(livertarget_test, SVM_predictions_2),2) * 100, "%")

print ("---------------")

# Printing confusion matrix


print ("Confusion matrix")

print ("---------------")
print(metrics.confusion_matrix(livertarget_test, SVM_predictions_2))


# User-specific target names
    
target_names = ['disorder', 'nodisorder']

# Model detailed classification report

print ("---------------")
print("Classification report", metrics.classification_report(livertarget_test, SVM_predictions_2,target_names =target_names))

Accuracy: 62.0 %
---------------
Confusion matrix
---------------
[[ 3 39]
 [ 0 62]]
---------------
Classification report              precision    recall  f1-score   support

   disorder       1.00      0.07      0.13        42
 nodisorder       0.61      1.00      0.76        62

avg / total       0.77      0.62      0.51       104
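
One way to put the 62% accuracy in context is to compare it with a classifier that always predicts the most frequent class in the test set. The sketch below does this arithmetic on the confusion matrix copied from the output above; the variable names are illustrative, and the comparison is a reading aid rather than part of the original evaluation.

```python
# Confusion matrix copied from the output above (rows = true class, columns = predicted class).
confusion = [
    [3, 39],
    [0, 62],
]

total = sum(sum(row) for row in confusion)
model_accuracy = (confusion[0][0] + confusion[1][1]) / total

# Accuracy of always predicting the most frequent true class in the test set.
majority_support = max(sum(row) for row in confusion)
baseline_accuracy = majority_support / total

print(f"model accuracy    = {model_accuracy:.3f}")     # 0.625
print(f"majority baseline = {baseline_accuracy:.3f}")  # 0.596
```

On this particular split the model is only slightly ahead of the majority-class baseline, which is consistent with the confusion matrix showing nearly every test point assigned to the second class.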

