# Different models with the different parameter to find the best one

- one of the problems we have is that the dataset is small so we use Polynomial Features to increase the number of features from 6 feature to 27 feature.
- Polynomial Features with degree = 2

#### Models
- 1) Stochastic Gradient Descent
- 2) Support vector machine

- load dataset

In [1]:
# import library
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn.preprocessing import PolynomialFeatures

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC

from sklearn.metrics import classification_report
import joblib

from subprocess import check_output
from datetime import time

# load data
data = pd.read_csv("../../data.csv")

# Get valuse from data (datafram)
data = data.values
X = data[:,0:6] # all rows, no Lebal
y = data[:,6] # all rows, label only


- expand dataset features

In [2]:
# Define PolynomialFeatures variable with degree 2 and without bias
poly = PolynomialFeatures(degree=2 , include_bias=False)
X=poly.fit_transform(X)
X.shape

(11077, 27)

- split dataset to train and test dataset
    - with ratio 75% to train the dataset and 25% to the test dataset

In [3]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

### Train Stochastic Gradient Descent Model

 It can be regarded as a stochastic approximation of gradient descent optimization since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data)

- Define the Stochastic Gradient Descent model
- Train(fit) the Stochastic Gradient Descent model
- Predict the output of X_test using Stochastic Gradient Descent model
- Calculate the accuracy of Stochastic Gradient Descent model
- Evaluate Model with classification report 

In [4]:
# Stochastic Gradient Descent
SGD = SGDClassifier(loss="hinge", penalty="l2")
SGD.fit(X_train, y_train)
ypred3=SGD.predict(X_test)
m3accurcy = SGD.score(X_test, y_test)

In [5]:
# print Stochastic Gradient Descent accuracy
print("model3_accuracy = " , m3accurcy)

model3_accuracy =  0.4660649819494585


In [6]:
# classification report of Stochastic Gradient Descent
print(classification_report(y_test, ypred3))

              precision    recall  f1-score   support

         1.0       0.97      0.41      0.57       498
         2.0       0.67      0.82      0.74       480
         3.0       0.38      0.01      0.02       654
         4.0       0.67      0.07      0.13       223
         5.0       0.36      0.66      0.46       382
         6.0       0.34      0.78      0.48       533

    accuracy                           0.47      2770
   macro avg       0.56      0.46      0.40      2770
weighted avg       0.55      0.47      0.40      2770



### Train Support vector machine Model

is supervised learning model with associated learning algorithms that analyze data used for classification and regression analysis based on the statistical learning framework.

An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on the side of the gap on which they fall.


- Define the Support vector machine model with parameter
    - kernel = radial basis function(RBF kernel)
    - max_iter = 2000
    - c = 1.0 (how much you want to avoid misclassifying each training)
    - gamma = auto
- Train(fit) the Support vector machine model
- Predict the output of X_test using Support vector machine model
- Calculate the accuracy of Support vector machine model
- Evaluate Model with classification report 

In [7]:
# Support vector machine
svcmodel = SVC(kernel='rbf',
               max_iter=2000,
               C=1.0,
              gamma='auto')
svcmodel.fit(X_train, y_train)
ypred4=SGD.predict(X_test)
m4accurcy = svcmodel.score(X_test, y_test)



In [8]:
# print Support vector machine accuracy
print("model4_accuracy = " , m4accurcy) 

model4_accuracy =  0.6379061371841155


In [9]:
# classification report of Support vector machine
print(classification_report(y_test, ypred4))

              precision    recall  f1-score   support

         1.0       0.97      0.41      0.57       498
         2.0       0.67      0.82      0.74       480
         3.0       0.38      0.01      0.02       654
         4.0       0.67      0.07      0.13       223
         5.0       0.36      0.66      0.46       382
         6.0       0.34      0.78      0.48       533

    accuracy                           0.47      2770
   macro avg       0.56      0.46      0.40      2770
weighted avg       0.55      0.47      0.40      2770



### save the model after training.
- so we can load it and run without the need to train anymore

In [10]:
# save the model to disk
joblib.dump(SGD, 'SGDC.sav')
joblib.dump(svcmodel,'svcmodel.sav')

['svcmodel.scav']

## Train Support vector machine models with different parameters
- Define the Support vector machine model 
- Train(fit) the Support vector machine model
- Calculate the accuracy of Support vector machine model

####  Support vector machine parameters
- kernel = radial basis function(RBF kernel)
- max_iter = 1000
- gamma = auto

In [11]:
test_model = SVC(kernel='rbf',
               max_iter=1000,
              gamma='auto')

test_model.fit(X_train, y_train)
accurcy = test_model.score(X_test, y_test)

print("test_model_accuracy = " , accurcy) 




test_model_accuracy =  0.6375451263537906


#### Support vector machine parameters
- kernel = sigmoid
- max_iter = 1000
- gamma = auto

In [12]:
test_model = SVC(kernel='sigmoid',
               max_iter=1000,
              gamma='auto')

test_model.fit(X_train, y_train)
accurcy = test_model.score(X_test, y_test)

print("test_model_accuracy = " , accurcy) 



test_model_accuracy =  0.19241877256317688


#### Support vector machine parameters
- kernel = sigmoid
- max_iter = 2000
- gamma = auto

In [13]:
test_model = SVC(kernel='sigmoid',
                        max_iter=2000,
                        gamma='auto')

test_model.fit(X_train, y_train)
accurcy = test_model.score(X_test, y_test)

print("test_model_accuracy = " , accurcy)  

test_model_accuracy =  0.23610108303249097


####  Support vector machine parameters
- kernel = radial basis function(RBF kernel)
- max_iter = 2000
- gamma = auto

In [14]:
test_model = SVC(kernel='rbf',
                 max_iter=2000,
                 gamma='auto')

test_model.fit(X_train, y_train)
accurcy = test_model.score(X_test, y_test)

print("test_model_accuracy = " , accurcy)   



test_model_accuracy =  0.6379061371841155


####  Support vector machine parameters
- kernel = radial basis function(RBF kernel)
- max_iter = 3000
- gamma = auto

In [15]:
test_model = SVC(kernel='rbf',
                 max_iter=3000,
                 gamma='auto')
 
test_model.fit(X_train, y_train)
accurcy = test_model.score(X_test, y_test)

print("test_model_accuracy = " , accurcy)   



test_model_accuracy =  0.6393501805054151


####  Support vector machine parameters
- kernel = linear
- max_iter = 2500
- c = 0.1 (how much you want to avoid misclassifying each training)
- gamma = auto

In [16]:
# svc
test_model = SVC(kernel='linear',
                 max_iter=2500,
                 C=0.1,
                 gamma='auto')
 
test_model.fit(X_train, y_train)
accurcy = test_model.score(X_test, y_test)

print("test_model_accuracy = " , accurcy)  



test_model_accuracy =  0.3931407942238267


#### Support vector machine parameters
- kernel = radial basis function(RBF kernel)
- max_iter = 10000
- c = 10 (how much you want to avoid misclassifying each training)
- gamma = auto

In [17]:
# svc
test_model = SVC(kernel='rbf',
               max_iter=10000,
               C=10,
              gamma='auto')
 
test_model.fit(X_train, y_train)
accurcy = test_model.score(X_test, y_test)

print("test_model_accuracy = " , accurcy)  



test_model_accuracy =  0.6274368231046932


####  Support vector machine parameters
- kernel = poly
- max_iter = 10000
- c = 10 (how much you want to avoid misclassifying each training)
- gamma = auto

In [18]:
# svc
test_model = SVC(kernel='poly',
               max_iter=10000,
               C=10,
              gamma='auto')
 
test_model.fit(X_train, y_train)
accurcy = test_model.score(X_test, y_test)

print("test_model_accuracy = " , accurcy)  



test_model_accuracy =  0.2884476534296029


####  Support vector machine parameters
- kernel = sigmoid
- max_iter = 10000
- c = 10 (how much you want to avoid misclassifying each training)
- gamma = auto

In [19]:
# svc
test_model = SVC(kernel='sigmoid',
               max_iter=10000,
               C=10,
              gamma='auto')
 
test_model.fit(X_train, y_train)
accurcy = test_model.score(X_test, y_test)

print("test_model_accuracy = " , accurcy)  

test_model_accuracy =  0.23610108303249097
