## ECE 579M ST: Machine Learning in Cybersecurity
### Project Three: Side-Channel Attack Analysis

Implement various machine learning and deep learning techniques (kNN, decision tree, SVM, autoencoder, CNN, ensemble, random forest, naive Bayes, etc.) for the analysis of side-channel data obtained from web browser profiling.

| ML/DL Technique                  | Training Accuracy (%) | Testing Accuracy (%) |
| :-                               | :-                    | :-                   |
| 1. k-Nearest Neighbor            | 73.2                  | 68.0                 |
| 2. Support Vector Classifier     | 80.2                  | 73.1                 |
| 3. Adaboost/Linear SVM           | 58.9                  | 53.0                 |
| 4. Ensemble Voting Classifier    | 100.0                 | 79.5                 |
| 5. Multilayer Perceptron         | 100.0                 | 78.2                 |

For the methods above, extensive hyperparameter tuning was needed, which took up a lot of time. Because the dataset is small, the training and test sets were concatenated and run through cross-validation, so that each model is trained and scored on several different splits of the data. Additionally, the data was normalized so that the features have equal weighting, and it was shuffled (as the training labels seemed to be sequential).
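
The normalization and shuffling steps can be illustrated on a toy array. This is a minimal sketch, not the project's data: `X_demo` and `y_demo` are hypothetical stand-ins with made-up feature scales and sequential labels.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.utils import shuffle

rng = np.random.RandomState(0)

# Hypothetical stand-in for the traces: 8 samples, 3 features on very
# different scales, with labels stored in sequential order.
X_demo = rng.normal(loc=[0.0, 50.0, 1000.0], scale=[1.0, 10.0, 200.0], size=(8, 3))
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Standardize: every feature ends up with zero mean and unit variance,
# so no single feature dominates distance- or margin-based models.
X_scaled = StandardScaler().fit_transform(X_demo)

# Shuffle rows and labels together so each row keeps its own label.
X_shuf, y_shuf = shuffle(X_scaled, y_demo, random_state=0)

print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
print(y_shuf)
```

Passing both arrays to a single `shuffle` call is what keeps samples and labels aligned; shuffling them separately would scramble the labels.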


The first three algorithms were evaluated with repeated shuffled train/test splits (20 splits for k-Nearest Neighbors, 5 for the other two); the last two were fit once on the training set and scored on the test set. The algorithms were k-Nearest Neighbors (ten neighbors), a support vector classifier with an RBF kernel, AdaBoost (boosting 10 linear SVM classifiers), and an ensemble soft-voting classifier (with three sub-classifiers: a random forest, an SVM, and gradient boosting). Finally, a simple multilayer perceptron with a single hidden layer of 500 units was used.
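
Hyperparameters such as the number of neighbors can be chosen with a cross-validated grid search. The sketch below is only an illustration of that workflow, assuming synthetic data from `make_classification` and an arbitrary candidate list; it does not reproduce the tuning actually done for this project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical synthetic data standing in for the side-channel traces.
X_demo, y_demo = make_classification(n_samples=200, n_features=30, n_informative=10,
                                     n_classes=4, random_state=5)

# Same splitting scheme as the notebook cells: repeated shuffled 80/20 splits.
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=5)

# Candidate values are illustrative, not the ones tried in the project.
param_grid = {'n_neighbors': [1, 5, 10, 20]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=cv, scoring='accuracy')
search.fit(X_demo, y_demo)

print(search.best_params_, round(search.best_score_, 3))
```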

Convolutional neural networks, autoencoders, and LSTM-RNN networks were also considered, but they were not pursued further because of GPU compute constraints.

*A Principal Component Analysis (PCA) of the initial dataset showed that the 6000 features could be reduced to 1500 latent features, which would significantly reduce training time at the cost of slightly higher train/test error.*

 ---
## Step 0: Import required packages

In [146]:
## LIST OF ALL IMPORTS
import os
import csv
import math
import random
import time
import os.path as path
import numpy as np
import scipy as sp

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA,KernelPCA
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC,SVC

from sklearn.model_selection import cross_validate,ShuffleSplit,cross_val_predict,cross_val_score,GridSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from sklearn.metrics import accuracy_score

from sklearn.ensemble import AdaBoostClassifier,VotingClassifier,RandomForestClassifier,GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

from sklearn.neural_network import MLPClassifier
from keras.models import Sequential
from keras.layers import Dense

from keras.wrappers.scikit_learn import KerasClassifier

from sknn import ae, mlp

random_seed=5

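In [123]:
import os
from keras import backend as K

try:
    from importlib import reload  # Python 3; on Python 2 reload is a builtin
except ImportError:
    pass

def set_keras_backend(backend):
    # Switch the Keras backend by setting KERAS_BACKEND and reloading the module
    if K.backend() != backend:
        os.environ['KERAS_BACKEND'] = backend
        reload(K)
        assert K.backend() == backend

set_keras_backend("tensorflow")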

---
## Step 1: Load Datasets & Basic Exploration of Dataset

In [2]:
## DATA PATHS
train_data_path='PerfWeb/X_train.dat'
train_label_path='PerfWeb/Y_train.dat'
test_data_path='PerfWeb/X_test.dat'
test_label_path='PerfWeb/Y_test.dat'

print("Reading test and train data files.")
X_train=np.genfromtxt(train_data_path,delimiter=',')
Y_train=np.genfromtxt(train_label_path,delimiter=',').T
X_test=np.genfromtxt(test_data_path,delimiter=',')
Y_test=np.genfromtxt(test_label_path,delimiter=',').T

print("Loaded side-channel data.")

Reading test and train data files.
Loaded side-channel data.


In [3]:
print("Training data has a shape of {}.".format(X_train.shape))
print("Training labels has a shape of {}.".format(Y_train.shape))
print("Testing data has a shape of {}.".format(X_test.shape))
print("Testing labels has a shape of {}.".format(Y_test.shape))

print("Training data has {} features and {} data points.".format(X_train.shape[1],X_train.shape[0]))
print("Testing data has {} features and {} data points.".format(X_test.shape[1],X_test.shape[0]))

assert X_train.shape[0]==Y_train.shape[0],'Train data and train labels must have the same number of rows.'
assert X_test.shape[0]==Y_test.shape[0],'Test data and test labels must have the same number of rows.'

Training data has a shape of (1600, 6000).
Training labels has a shape of (1600,).
Testing data has a shape of (400, 6000).
Testing labels has a shape of (400,).
Training data has 6000 features and 1600 data points.
Testing data has 6000 features and 400 data points.


Since the number of data points is much smaller than the number of features (1600 vs. 6000) and the test set is large relative to the training set (one fourth of its size), **concatenate the training and testing data, then perform K-Fold cross-validation with every training algorithm.**
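
One way to pool the two splits and cross-validate is sketched below. It is a variant under stated assumptions rather than the exact procedure used in the cells that follow: it uses `StratifiedKFold` (imported in Step 0) instead of `ShuffleSplit`, it places the scaler inside a pipeline so that it is re-fit on each training fold, and all arrays are hypothetical random stand-ins.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(5)

# Hypothetical stand-ins for the original train/test files (same 4:1 ratio).
X_tr, y_tr = rng.normal(size=(80, 20)), np.repeat(np.arange(4), 20)
X_te, y_te = rng.normal(size=(20, 20)), np.repeat(np.arange(4), 5)

# Pool both splits so every sample is used for training and for testing.
X_all = np.vstack((X_tr, X_te))
y_all = np.concatenate((y_tr, y_te))

# Scaling inside the pipeline means the scaler only sees the training fold.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Stratified folds keep the class proportions equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=5)
scores = cross_validate(model, X_all, y_all, cv=cv, scoring='accuracy',
                        return_train_score=True)

print(scores['train_score'].mean(), scores['test_score'].mean())
```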

In [4]:
scaler=StandardScaler().fit(X_train)
X_train=scaler.transform(X_train) 
X_test=scaler.transform(X_test)

X=np.vstack((X_train,X_test))
Y_train=np.reshape(Y_train,(len(Y_train),1))
Y_test=np.reshape(Y_test,(len(Y_test),1))
Y=np.vstack((Y_train,Y_test))
print("Data Shape {}, Labels Shape {}".format(X.shape,Y.shape))
print("Shuffling data.")

from sklearn.utils import shuffle
X,Y=shuffle(X,Y,random_state=0)

Y=np.reshape(Y,(len(Y),))

Data Shape (2000, 6000), Labels Shape (2000, 1)
Shuffling data.


----

## Step 2: Data Analysis

In [28]:
# Perform dimensionality reduction on features
def perform_pca(dataset,n_components=None,perform_pca=True):
    # Returns (dataset, fitted_pca); fitted_pca is None when PCA is skipped
    if perform_pca:
        pca=PCA(n_components=n_components,svd_solver='randomized').fit(dataset)
        reduced_dataset=pca.transform(dataset)
        return reduced_dataset,pca
    else:
        return dataset,None

# Find the optimum number of components for dimension reduction using an RBF-kernel SVM classifier
def pca_ocheck(X_dataset,y_dataset):
    t0_pcacheck=time.time()
    pca=PCA(svd_solver='randomized')
#     pca=PCA(kernel='linear',random_state=random_seed,n_jobs=-1)
    clf=SVC(C=2, kernel='rbf', max_iter=1000,shrinking=True,
                random_state=random_seed, tol=0.0001,verbose=0)
    
    pipeline=Pipeline(steps=[('pca',pca),('svm',clf)])
    
    n_components=(500,1000,1500,2000,2500,3000,3500,4000,4500,5000,5500,6000)
   
    random_estimator=GridSearchCV(pipeline,dict(pca__n_components=n_components))
    random_estimator.fit(X_dataset,y_dataset)
    t1_pcacheck=time.time()
    print("GridSearchCV took {} seconds.".format(round(t1_pcacheck-t0_pcacheck,3)))
    
    return (random_estimator.cv_results_,random_estimator.best_estimator_)

In [29]:
# Applying PCA to obtain reduced dimension representation.

print("Performing PCA decomposition to find the best reduced dimensionality representation using grid search cross-validation.")
print("Assuming no features have to be normalized as they are all from the same data source.")
results,choice=pca_ocheck(X_train,np.reshape(Y_train,(len(Y_train),)))

print("Cross-Validation Results:", results)
print("Best Estimator:", choice)

Performing PCA decomposition to find the best reduced dimensionality representation using grid search cross-validation.
Assuming no features have to be normalized as they are all from the same data source.
GridSearchCV took 472.937 seconds.
Cross-Validation Results: {'split1_train_score': array([0.98888889, 0.96388889, 0.95833333, 0.95833333, 0.95833333,
       0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,
       0.95833333, 0.95833333]), 'mean_test_score': array([0.673125, 0.685625, 0.686875, 0.686875, 0.686875, 0.686875,
       0.686875, 0.686875, 0.686875, 0.686875, 0.686875, 0.686875]), 'split0_train_score': array([0.98557692, 0.95961538, 0.95769231, 0.95769231, 0.95769231,
       0.95769231, 0.95769231, 0.95769231, 0.95769231, 0.95769231,
       0.95769231, 0.95769231]), 'param_pca__n_components': masked_array(data=[500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500,
                   5000, 5500, 6000],
             mask=[False, False, False, False, False, False, F

**The PCA analysis above shows that the training data can be reduced from [n_examples x n_features], where n_features is 6000, to [n_examples x n_reduced], where n_reduced is 1500. PCA projects the dataset onto the latent features with maximum variance (with some loss of information), which would make training faster but result in slightly lower accuracy.**
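
A complementary way to pick the number of components is to inspect the cumulative explained variance. The sketch below assumes hypothetical low-rank data (`latent`, `mixing`) and an arbitrary 95% threshold. It also shows that PCA cannot return more components than `min(n_samples, n_features)`, which may be why the grid-search scores above stop changing once `n_components` exceeds the number of rows in a training fold.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(5)

# Hypothetical data: 60 samples, 200 features, driven by 10 latent factors
# plus a small amount of noise.
latent = rng.normal(size=(60, 10))
mixing = rng.normal(size=(10, 200))
X_demo = latent @ mixing + 0.1 * rng.normal(size=(60, 200))

# n_components=None keeps min(n_samples, n_features) components.
pca = PCA().fit(X_demo)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components explaining at least 95% of the variance.
n_keep = int(np.searchsorted(cumulative, 0.95) + 1)

print(pca.n_components_, n_keep)
```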

---
### 2.1: k-Nearest Neighbors

In [5]:
n_splits=20

knn_clf=KNeighborsClassifier(n_neighbors=10)

cross_validate_shuffle=ShuffleSplit(n_splits=n_splits,test_size=0.2,random_state=random_seed)
knn_scores=cross_validate(knn_clf,X,Y,cv=cross_validate_shuffle,scoring='accuracy',return_train_score=True,n_jobs=-1)

print(sorted(knn_scores.keys()))
print("Scores",knn_scores)

train_accuracy_array=knn_scores['train_score']
test_accuracy_array=knn_scores['test_score']

print("Training accuracy: {} (+/- {})" .format(round(train_accuracy_array.mean(),3),
                                               round(train_accuracy_array.std()*2,3)))

print("Testing accuracy: {} (+/- {})" .format(round(test_accuracy_array.mean(),3),
                                               round(test_accuracy_array.std()*2,3)))


['fit_time', 'score_time', 'test_score', 'train_score']
Scores {'score_time': array([9.63397121, 9.60387492, 9.73712039, 9.67707562, 9.81315112,
       9.66166234, 9.58163428, 9.65785241, 9.60549831, 9.8603456 ,
       9.61689854, 9.26228714, 9.2852025 , 9.76329279, 9.66433597,
       9.55012488, 9.75235558, 9.56144786, 9.28945518, 9.52209425]), 'fit_time': array([0.3906846 , 0.80480051, 0.71776509, 0.7332921 , 0.70535374,
       0.66731048, 0.60951853, 0.6636796 , 0.67183137, 0.67307115,
       0.71574068, 0.67207074, 0.6440208 , 0.55777931, 0.67455888,
       0.6189394 , 0.65745544, 0.73351693, 0.57717037, 0.58255339]), 'train_score': array([0.74125 , 0.735   , 0.73    , 0.736875, 0.731875, 0.734375,
       0.7325  , 0.735   , 0.731875, 0.73125 , 0.7275  , 0.72625 ,
       0.72625 , 0.731875, 0.73375 , 0.72875 , 0.73375 , 0.728125,
       0.733125, 0.735   ]), 'test_score': array([0.6725, 0.6775, 0.6875, 0.67  , 0.6925, 0.6925, 0.67  , 0.705 ,
       0.6725, 0.655 , 0.6875, 0.725 , 0

---
### 2.2: Support Vector Machine

In [32]:
# linear_svc_clf=LinearSVC(C=1.0, loss='squared_hinge', max_iter=1000, dual=True, 
#                 penalty='l2', random_state=random_seed, tol=0.0001,verbose=0)

svc_clf=SVC(C=2, kernel='rbf', max_iter=1000,shrinking=True,
                random_state=random_seed, tol=0.0001,verbose=1) # 2,poly good 2, rbf better


n_splits=5

cross_validate_shuffle=ShuffleSplit(n_splits=n_splits,test_size=0.2,random_state=random_seed)
svm_scores=cross_validate(svc_clf,X,Y,cv=cross_validate_shuffle,scoring='accuracy',
                          return_train_score=True,n_jobs=-1)

print(sorted(svm_scores.keys()))
print("Scores",svm_scores)

train_accuracy_array=svm_scores['train_score']
test_accuracy_array=svm_scores['test_score']

print("Training accuracy: {} (+/- {})" .format(round(train_accuracy_array.mean(),3),
                                               round(train_accuracy_array.std()*2,3)))

print("Testing accuracy: {} (+/- {})" .format(round(test_accuracy_array.mean(),3),
                                               round(test_accuracy_array.std()*2,3)))

['fit_time', 'score_time', 'test_score', 'train_score']
Scores {'score_time': array([6.38561392, 6.48737097, 6.41299987, 6.42047834, 4.48027539]), 'fit_time': array([11.85021877, 12.02202535, 11.92711115, 11.75989532, 11.14598298]), 'train_score': array([0.81    , 0.801875, 0.804375, 0.795   , 0.80125 ]), 'test_score': array([0.72  , 0.7175, 0.735 , 0.7375, 0.745 ])}
Training accuracy: 0.802 (+/- 0.01)
Testing accuracy: 0.731 (+/- 0.021)
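
The mean and `+/- 2*std` summary printed by the cells in this step could be factored into a small helper. The sketch below is one possible tidy-up (`summarize_scores` is a hypothetical name, not part of the notebook); the per-split scores are copied from the SVC output above, and the testing line it prints matches the reported 0.731 (+/- 0.021).

```python
import numpy as np

def summarize_scores(scores, digits=3):
    """Return (mean, 2*std) of per-split accuracies, both rounded."""
    scores = np.asarray(scores, dtype=float)
    mean = round(float(scores.mean()), digits)
    spread = round(float(scores.std() * 2), digits)
    return mean, spread

# Per-split scores copied from the SVC cross-validation output above.
svc_train = [0.81, 0.801875, 0.804375, 0.795, 0.80125]
svc_test = [0.72, 0.7175, 0.735, 0.7375, 0.745]

print("Training accuracy: {} (+/- {})".format(*summarize_scores(svc_train)))
print("Testing accuracy: {} (+/- {})".format(*summarize_scores(svc_test)))
```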


---
### 2.3: SVM/ Adaboost

In [110]:
# Without K-Fold cross-validation (results in extremely low test/train accuracy)
# With K-Fold cross-validation, still low values
n_splits=5

Y_train=np.reshape(Y_train,(len(Y_train),))

adaboost_clf=AdaBoostClassifier(SVC(probability=True,kernel='linear'),
                                n_estimators=10,learning_rate=2,algorithm='SAMME.R',random_state=0)

cross_validate_shuffle=ShuffleSplit(n_splits=n_splits,test_size=0.25,random_state=random_seed)
adaboost_scores=cross_validate(adaboost_clf,X,Y,cv=cross_validate_shuffle,scoring='accuracy',return_train_score=True,n_jobs=-1)

print(sorted(adaboost_scores.keys()))
print("Scores",adaboost_scores)

train_accuracy_array=adaboost_scores['train_score']
test_accuracy_array=adaboost_scores['test_score']

print("Training accuracy: {} (+/- {})" .format(round(train_accuracy_array.mean(),3),
                                               round(train_accuracy_array.std()*2,3)))

print("Testing accuracy: {} (+/- {})" .format(round(test_accuracy_array.mean(),3),
                                               round(test_accuracy_array.std()*2,3)))


['fit_time', 'score_time', 'test_score', 'train_score']
Scores {'score_time': array([80.47567606, 80.81861782, 80.76564455, 69.78369212, 60.25447512]), 'fit_time': array([1262.34320617, 1232.52710223, 1232.32786369, 1176.02218604,
       1165.50443244]), 'train_score': array([0.578     , 0.60533333, 0.59333333, 0.56733333, 0.6       ]), 'test_score': array([0.54 , 0.534, 0.542, 0.518, 0.514])}
Training accuracy: 0.589 (+/- 0.028)
Testing accuracy: 0.53 (+/- 0.023)
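
Boosting is more commonly applied to cheap, very weak learners such as depth-1 decision trees, which is also scikit-learn's default base estimator for AdaBoost. The sketch below shows that alternative on hypothetical synthetic data. It is not the configuration used above and makes no claim about how it would perform on the side-channel traces, but it avoids the fit times of roughly 1200 seconds per split reported for the SVM-based version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import ShuffleSplit, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data standing in for the side-channel traces.
X_demo, y_demo = make_classification(n_samples=300, n_features=30, n_informative=10,
                                     n_classes=4, random_state=5)

# Depth-1 trees ("stumps") as the weak learner. The base estimator is passed
# positionally because its keyword name differs between scikit-learn versions.
stump_boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                 n_estimators=50, random_state=0)

cv = ShuffleSplit(n_splits=5, test_size=0.25, random_state=5)
scores = cross_validate(stump_boost, X_demo, y_demo, cv=cv, scoring='accuracy',
                        return_train_score=True)

print(scores['train_score'].mean(), scores['test_score'].mean())
```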


---
### 2.4: Voting Classifier

In [52]:
voting_clf_1=RandomForestClassifier(n_estimators=15,criterion='gini', 
                                     max_depth=None,min_samples_split=2,
                                     min_samples_leaf=2,max_features='auto',max_leaf_nodes=None,
                                     bootstrap=True,n_jobs=-1,
                                     random_state=random_seed,warm_start=True, class_weight=None)

voting_clf_2=SVC(C=1, kernel='rbf', max_iter=1000,shrinking=True,probability=True,
                random_state=random_seed, tol=0.0001,verbose=1)

voting_clf_3=GradientBoostingClassifier(learning_rate=0.1,n_estimators=100,
                                        subsample=1.0,criterion='friedman_mse',max_depth=4, 
                                        max_leaf_nodes=None,presort='auto')

estimators=[('randforest',voting_clf_1),('svm',voting_clf_2),('gradientboost',voting_clf_3)]                                    

voting_classifier=VotingClassifier(estimators=estimators,voting='soft',flatten_transform=True)

voting_classifier=voting_classifier.fit(X_train,Y_train)
Y_train_predict=voting_classifier.predict(X_train)
Y_test_predict=voting_classifier.predict(X_test)   

train_accuracy=accuracy_score(Y_train,Y_train_predict)
test_accuracy=accuracy_score(Y_test,Y_test_predict)

print("Training accuracy: {}" .format(round(train_accuracy,3)))

print("Testing accuracy: {}" .format(round(test_accuracy,3)))

[LibSVM]

Training accuracy: 1.0
Testing accuracy: 0.795
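
Soft voting averages the class probabilities predicted by the sub-classifiers and picks the most probable class. The sketch below checks this by hand on hypothetical synthetic data, with smaller models than the ones configured above so that it runs quickly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.svm import SVC

# Hypothetical synthetic data standing in for the side-channel traces.
X_demo, y_demo = make_classification(n_samples=120, n_features=20, n_informative=8,
                                     n_classes=3, random_state=5)

estimators = [('randforest', RandomForestClassifier(n_estimators=15, random_state=5)),
              ('svm', SVC(kernel='rbf', probability=True, random_state=5)),
              ('gradientboost', GradientBoostingClassifier(n_estimators=20, random_state=5))]
voter = VotingClassifier(estimators=estimators, voting='soft').fit(X_demo, y_demo)

# Average the per-class probabilities of the fitted sub-classifiers,
# then take the class with the highest mean probability.
mean_proba = np.mean([clf.predict_proba(X_demo) for clf in voter.estimators_], axis=0)
manual_pred = voter.classes_[np.argmax(mean_proba, axis=1)]

print(np.array_equal(manual_pred, voter.predict(X_demo)))
```

This is why every sub-classifier needs `predict_proba`, and hence why the SVC is created with `probability=True`.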


---
### 2.5: MultiLayer Perceptron (MLP)

In [71]:
mlp_clf=MLPClassifier(hidden_layer_sizes=(500,),activation='relu',solver='lbfgs',
                      alpha=0.0001,batch_size='auto',learning_rate='adaptive', 
                      learning_rate_init=0.001,max_iter=300, 
                      shuffle=True,random_state=random_seed,tol=0.00001, 
                      verbose=True,warm_start=True,momentum=0.9,
                      nesterovs_momentum=True,early_stopping=False,validation_fraction=0.1,
                      epsilon=1e-08)

mlp_clf=mlp_clf.fit(X_train,Y_train)

Y_train_predict=mlp_clf.predict(X_train)
Y_test_predict=mlp_clf.predict(X_test)   

train_accuracy=accuracy_score(Y_train,Y_train_predict)
test_accuracy=accuracy_score(Y_test,Y_test_predict)

print("Training accuracy: {}" .format(round(train_accuracy,3)))

print("Testing accuracy: {}" .format(round(test_accuracy,3)))

Training accuracy: 1.0
Testing accuracy: 0.795
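
The gap between a training accuracy of 1.0 and the lower testing accuracy suggests overfitting. One option that was not tried here is early stopping with a stronger L2 penalty. The sketch below assumes synthetic data and illustrative values for `alpha` and `n_iter_no_change`. In scikit-learn, `early_stopping` only takes effect with the `sgd` or `adam` solvers, so the solver differs from the `lbfgs` used above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical synthetic data standing in for the side-channel traces.
X_demo, y_demo = make_classification(n_samples=400, n_features=50, n_informative=10,
                                     n_classes=4, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          stratify=y_demo, random_state=5)

# Hold out 10% of the training data and stop once the validation score
# has not improved for 10 consecutive epochs.
mlp_es = MLPClassifier(hidden_layer_sizes=(500,), solver='adam', alpha=0.01,
                       early_stopping=True, validation_fraction=0.1,
                       n_iter_no_change=10, max_iter=300, random_state=5)
mlp_es.fit(X_tr, y_tr)

print(mlp_es.n_iter_, mlp_es.score(X_tr, y_tr), mlp_es.score(X_te, y_te))
```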


---
### 2.6: Recurrent Neural Network

---
### 2.7: AutoEncoder