# Foundations of AI & ML
## Session 08
### Casestudy 2
###  Comparison of Linear, Logistic regression, Quadratic, MLP and SVM 

## SVM

“Support Vector Machine” (SVM) is a supervised machine learning algorithm which can be used for both classification or regression challenges. In this algorithm, we plot each data item as a point in n-dimensional space (where n is number of features you have) with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyper-plane that differentiate the two classes very well.

In this experiment 
1. We will apply SVM classifier on credit card dataset to classify the data into fraud or genuine. 
2. We will tune the hyper parameters of the svm classifier(kernels).
3. We will calculate the training and testing time.

#### Importing Required packages

In [1]:
from sklearn import datasets
import numpy as np
import time
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn import svm

In [2]:
accuracy_data = {}

#### Generating the dataset

In [3]:
# Loading credit card dataset
data = pd.read_csv("../Datasets/10kcc.csv")
# Loading the features and storing them in X
X = data.iloc[:, 0:30]
# Loading the labels and storing them in y
y = data["Class"]

In [4]:
X.head()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,0.251412,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62
1,0,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.069083,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69
2,1,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.52498,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66
3,1,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,...,-0.208038,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.5
4,2,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,...,0.408542,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99


#### Splitting the data into train,test and validation sets.

In [5]:
## Train = 60 % , Test = 20 % , Train = 20 %
X_train, X_test, X_validation = np.split(X, [int(.6 * len(X)), int(.8 * len(X))])
Y_train, Y_test, Y_validation = np.split(y, [int(.6 * len(y)), int(.8 * len(y))])

#### Implementing Svm

#### Applying Linearsvc

In [6]:
from sklearn.svm import LinearSVC
## Creating the svm object
clf = LinearSVC(random_state=0)
# Intilizing the time object
t0 = time.time()
t1 = time.time()
## Fitting the data into the trained model
clf.fit(X_train, Y_train)
print("Training time is ", round(time.time() - t0, 3),  "seconds")

Training time is  0.017 seconds


#### Predicting the values

In [7]:
# Testing the data on the trained model
y_pred = clf.predict(X_test)
print("Testing time is ", round(time.time() - t1, 3),  "seconds")

Testing time is  0.029 seconds


#### Calculating the accuracy

In [8]:
print(accuracy_score(Y_test, y_pred))
accuracy_data[clf.__class__] = accuracy_score(Y_test, y_pred)

0.011


#### Parameter tuning  

Let us try to tune kernel parameter of svm

### rbf kernel

In [9]:
# Creating the svm object
clf = svm.SVC(kernel='rbf', gamma=0.7)
# Creating the time object
t0 = time.time()
t1 = time.time()
# Fitting the data into the model
clf.fit(X_train, Y_train)
#Calculating the training time
print("Training time is ", round(time.time() - t0, 3),  "seconds")

Training time is  1.412 seconds


#### predicting the values

In [10]:
#testing the data on the trained model
y_pred = clf.predict(X_test)
# Calculating the testing time
print("Testing time is ", round(time.time() - t1, 3),  "seconds")

Testing time is  1.671 seconds


#### Calculating the accuracy

In [11]:
accuracy_score(Y_test,y_pred)
accuracy_data[clf.__class__] = accuracy_score(Y_test, y_pred)

WTF!!!

### polynomial kernel

In [12]:
# Creating the svm object
clf = svm.SVC(kernel='poly', degree=3)
# Creating the time object
t0 = time.time()
t1 = time.time()
# Fitting the data into the model
clf.fit(X_train, Y_train)
# Calculating the training time
print("training time is ", round(time.time() - t0), "seconds")

training time is  1 seconds


#### Predicting the values

In [13]:
# Testing the data on the trained model
y_pred = clf.predict(X_test)
# Calculating the testing time
print("testingtime is ", round(time.time() - t1), "seconds")

testingtime is  1 seconds


#### Calculating the accuracy

In [14]:
accuracy_score(Y_test,y_pred)
accuracy_data[clf.__class__] = accuracy_score(Y_test, y_pred)

** Exercise 1:** Apply Linear regression, Logistic regression, MLP, and quadratic on the data 

** Exercise 2:** Tabularize the accuracy of each classifier 

In [15]:
from sklearn.linear_model import LinearRegression

clf = LinearRegression()

t0 = time.time()
# Fitting the data into the model
clf.fit(X_train, Y_train)
# Calculating the training time
print("training time is ", round(time.time() - t0), "seconds")

# Testing the data on the trained model
y_pred = clf.predict(X_test)
# Calculating the testing time
print("testingtime is ", round(time.time() - t1), "seconds")

print(accuracy_score(Y_test, np.round(y_pred)))
accuracy_data[clf.__class__] = accuracy_score(Y_test, np.round(y_pred))

training time is  0 seconds
testingtime is  1 seconds
0.9995


In [16]:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()

t0 = time.time()
# Fitting the data into the model
clf.fit(X_train, Y_train)
# Calculating the training time
print("training time is ", round(time.time() - t0), "seconds")

# Testing the data on the trained model
t1 = time.time()
y_pred = clf.predict(X_test)
# Calculating the testing time
print("testingtime is ", round(time.time() - t1), "seconds")

print(accuracy_score(Y_test,y_pred))
accuracy_data[clf.__class__] = accuracy_score(Y_test, y_pred)

training time is  0 seconds
testingtime is  0 seconds
0.99


In [17]:
from sklearn.neural_network import MLPRegressor

clf = MLPRegressor()

t0 = time.time()
# Fitting the data into the model
clf.fit(X_train, Y_train)
# Calculating the training time
print("training time is ", round(time.time() - t0), "seconds")

# Testing the data on the trained model
t1 = time.time()
y_pred = clf.predict(X_test)
# Calculating the testing time
print("testingtime is ", round(time.time() - t1), "seconds")

print(accuracy_score(Y_test, np.round(y_pred)))
accuracy_data[clf.__class__] = accuracy_score(Y_test, np.round(y_pred))

training time is  1 seconds
testingtime is  0 seconds
0.957


In [22]:
from pprint import pprint
pprint(accuracy_data)
pprint?

{<class 'sklearn.linear_model.base.LinearRegression'>: 0.9995,
 <class 'sklearn.svm.classes.SVC'>: 0.9985,
 <class 'sklearn.neural_network.multilayer_perceptron.MLPRegressor'>: 0.957,
 <class 'sklearn.svm.classes.LinearSVC'>: 0.011,
 <class 'sklearn.linear_model.logistic.LogisticRegression'>: 0.99}
