**The dataset** was obtained from the UCI Machine Learning Repository via the following [link](https://archive.ics.uci.edu/ml/datasets/Unmanned+Aerial+Vehicle+%28UAV%29+Intrusion+Detection).

This notebook uses the first dataset (Bidirectional-flow / Parrot Bebop1). The combined first dataset can be [downloaded](http://mason.gmu.edu/~lzhao9/materials/data/UAV/data/pub_dataset1.mat) from Liang Zhao's homepage. Bidirectional-flow mode involves 9 features × 2 sources × 3 flow directions = 54 features; for more information, visit this [link](http://mason.gmu.edu/~lzhao9/materials/data/UAV/).

Save the data under its default name, `pub_dataset1.mat`, in the `__data__` directory.
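A minimal sketch of that step, assuming the direct link above still serves the file. The helper name `ensure_dataset` is hypothetical; it downloads `pub_dataset1.mat` into `__data__` only when no local copy exists yet, so re-running the notebook does not fetch it again.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Direct link from the dataset description above; it may move or disappear.
DATASET_URL = "http://mason.gmu.edu/~lzhao9/materials/data/UAV/data/pub_dataset1.mat"


def ensure_dataset(url=DATASET_URL, dest="./__data__/pub_dataset1.mat"):
    """Hypothetical helper: download `url` to `dest` unless `dest` already exists."""
    path = Path(dest)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)  # create __data__/ if needed
        urlretrieve(url, path)                           # plain HTTP download
    return path
```

Calling `ensure_dataset()` once before the loading cell is enough; it does not validate the file contents.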

In [1]:
import numpy as np
import pandas as pd
import h5py

import matplotlib.pyplot as plt
import seaborn as sns

from pprint import pprint

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [2]:
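# MATLAB v7.3 .mat files are HDF5 files, so h5py can read them.
# MATLAB stores arrays column-major, so each array is transposed back to (rows, cols).
with h5py.File('./__data__/pub_dataset1.mat', 'r') as f:
    data = {k: np.array(v).T for k, v in f.items()}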
data.keys()

dict_keys(['D', 'H', 'data_te', 'data_tr'])

$n$ is the number of training samples  
$k$ is the number of features  
$n^{\prime}$ is the number of testing samples  
$k^{\prime}$ is the number of feature computational components  
The last column of `data_te` and `data_tr` is the label: `1` means UAV, `0` otherwise.

--- 
$\mathtt{data\_tr} \in \mathbb{R}^{n \times (k+1)}$  
$\mathtt{data\_te} \in \mathbb{R}^{n^{\prime} \times (k+1)}$  
$D \in \mathbb{R}^{k \times 1}$: the generation runtime of each feature.  
$H \in \mathbb{R}^{k^{\prime} \times k}$: the incidence matrix of the feature computational hypergraph (see the paper for details).
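The shape definitions above can be turned into a quick sanity check on the loaded `data` dictionary. This is a sketch that assumes the arrays come out exactly as documented (with $k = 54$ from 9 × 2 × 3); the function name `check_shapes` is hypothetical, and the demo runs on synthetic arrays, not the real dataset.

```python
import numpy as np


def check_shapes(data, k=54):
    """Hypothetical check of data_tr, data_te, D and H against the documented shapes."""
    n, cols_tr = data["data_tr"].shape
    n_te, cols_te = data["data_te"].shape
    assert cols_tr == cols_te == k + 1, "expected k features plus one label column"
    assert data["D"].shape == (k, 1), "D holds one runtime per feature"
    assert data["H"].shape[1] == k, "H has one column per feature"
    for name in ("data_tr", "data_te"):
        labels = set(np.unique(data[name][:, -1]))
        assert labels <= {0.0, 1.0}, f"{name} labels must be 0 or 1"
    return {"n": n, "n_prime": n_te, "k": k, "k_prime": data["H"].shape[0]}


# Synthetic arrays with the documented shapes (NOT the real data).
rng = np.random.default_rng(0)
fake = {
    "data_tr": np.column_stack([rng.normal(size=(8, 54)), rng.integers(0, 2, 8)]),
    "data_te": np.column_stack([rng.normal(size=(4, 54)), rng.integers(0, 2, 4)]),
    "D": rng.random((54, 1)),
    "H": rng.integers(0, 2, (5, 54)),
}
print(check_shapes(fake))  # {'n': 8, 'n_prime': 4, 'k': 54, 'k_prime': 5}
```

On the real file, `check_shapes(data)` would report the actual $n$, $n^{\prime}$ and $k^{\prime}$.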


In [3]:
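def reset_random_seed(seed=1917):
    # Estimators created with random_state=None draw from NumPy's global RNG,
    # so seeding it here makes the scikit-learn runs below repeatable.
    np.random.seed(seed)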

In [4]:
X = data['data_tr'][:, :-1]
y = data['data_tr'][:, -1]

X_test = data['data_te'][:, :-1]
y_test = data['data_te'][:, -1]

## MLP
### Accuracy 0.9937035566396278

In [5]:
from sklearn.neural_network import MLPClassifier

In [6]:
reset_random_seed()
model = MLPClassifier()
model.fit(X, y)
model.score(X_test, y_test)

0.9937035566396278

In [7]:
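def encoder(data, ae, encoding_layers_count=3):
    """Map `data` to the autoencoder's latent space.

    Runs the first `encoding_layers_count` layers of the fitted MLPRegressor `ae`
    by hand, applying the same tanh activation the network used in training.
    """
    layer = np.asarray(data, dtype=float)
    for i in range(encoding_layers_count):
        layer = np.tanh(layer @ ae.coefs_[i] + ae.intercepts_[i])
    return layer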
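A manual forward pass like `encoder` is only trustworthy if it reproduces what scikit-learn computes. The sketch below checks this on a small synthetic autoencoder: in an `MLPRegressor`, hidden layers apply the chosen activation (here tanh) and the output layer is linear, so running all layers by hand should match `predict`. The helper `forward` and the toy sizes are assumptions for illustration.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

warnings.filterwarnings("ignore", category=ConvergenceWarning)  # tiny demo fit


def forward(data, ae, n_layers):
    """Hypothetical helper: apply the first n_layers layers of a fitted tanh MLPRegressor."""
    layer = np.asarray(data, dtype=float)
    output_layer = len(ae.coefs_) - 1  # the last layer is linear ('identity')
    for i in range(n_layers):
        layer = layer @ ae.coefs_[i] + ae.intercepts_[i]
        if i != output_layer:
            layer = np.tanh(layer)  # hidden layers use the training activation
    return layer


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
ae_demo = MLPRegressor(hidden_layer_sizes=(4, 2, 4), activation="tanh",
                       max_iter=50, random_state=0).fit(X_demo, X_demo)

latent = forward(X_demo, ae_demo, n_layers=2)                   # 2-unit bottleneck
recon = forward(X_demo, ae_demo, n_layers=len(ae_demo.coefs_))  # whole network
print(latent.shape)  # (200, 2)
print(np.allclose(recon, ae_demo.predict(X_demo)))
```

If the reconstruction matches `predict`, the partial pass up to the bottleneck is the same computation the network performs internally.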

## Auto Encoder
### Accuracy 0.5536332179930796

In [8]:
from sklearn.neural_network import MLPRegressor

In [9]:
# Encoder structure
n_encoder1 = 25
n_encoder2 = 10

n_latent = 2

encoding_layers_count = 3

# Decoder structure
n_decoder2 = 10
n_decoder1 = 25

hidden_layer_sizes = (
    n_encoder1, 
    n_encoder2, 
    n_latent, 
    n_decoder2, 
    n_decoder1
)
reset_random_seed()
auto_encoder = MLPRegressor(
    hidden_layer_sizes=hidden_layer_sizes,
    activation='tanh',
    solver='adam',
    learning_rate_init=0.0001,
    max_iter=200,
    tol=1e-7,
    verbose=True,
)
auto_encoder.fit(X, X)

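To see which weights `encoder` uses, it helps to list the weight-matrix shapes this architecture produces. The sketch assumes `X` has the 54 columns from the dataset description; a fitted `MLPRegressor` stores one matrix per layer in `coefs_`, so `auto_encoder.coefs_` should have these shapes, and `encoding_layers_count = 3` stops at the 2-unit latent layer.

```python
n_features = 54                          # 9 features × 2 sources × 3 flow directions
hidden_layer_sizes = (25, 10, 2, 10, 25)
encoding_layers_count = 3

sizes = [n_features, *hidden_layer_sizes, n_features]  # input and target are both X
coef_shapes = list(zip(sizes[:-1], sizes[1:]))
print(coef_shapes)
# [(54, 25), (25, 10), (10, 2), (2, 10), (10, 25), (25, 54)]
print(coef_shapes[encoding_layers_count - 1][1])       # latent width: 2
```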

In [None]:
# Not a true softmax: the index of the larger of the 2 latent units is used as the predicted label
accuracy_score(y_test, np.argmax(encoder(X_test, auto_encoder), axis=1))
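Because the autoencoder is trained without labels, nothing ties latent unit 0 to class 0, so argmax accuracy also depends on an arbitrary unit order. The toy example below uses made-up latent codes and labels (not from the dataset) to show the same predictions scoring 0.0 or 1.0 depending only on which unit is read as which class.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up 2-unit latent codes and labels, for illustration only.
latent = np.array([[0.9, -0.2], [-0.4, 0.1], [0.3, 0.8], [0.5, -0.7]])
y_true = np.array([1, 0, 0, 1])

pred = np.argmax(latent, axis=1)         # index of the larger unit: [0, 1, 1, 0]
print(accuracy_score(y_true, pred))      # 0.0 with the unit order as-is
print(accuracy_score(y_true, 1 - pred))  # 1.0 once the two units are swapped
```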

## AUTO ENCODER + SVM + Standard Scaler
### Accuracy 0.9157637982869136

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

In [None]:
svm = make_pipeline(StandardScaler(), SVC(gamma='auto'))
svm.fit(encoder(X, auto_encoder), y)
accuracy_score(y_test, svm.predict(encoder(X_test, auto_encoder)))
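The encoder, scaler and SVM can also be chained into one scikit-learn pipeline, so that exactly the same transformation is applied at `fit` and at `predict`. This is a sketch on synthetic data: `encode` mirrors the notebook's `encoder`, and the pre-fitted autoencoder is passed to `FunctionTransformer` through `kw_args`, so the pipeline does not retrain it.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import SVC


def encode(X, ae, n_layers=3):
    """tanh forward pass through the first n_layers of a fitted autoencoder."""
    h = np.asarray(X, dtype=float)
    for W, b in zip(ae.coefs_[:n_layers], ae.intercepts_[:n_layers]):
        h = np.tanh(h @ W + b)
    return h


# Synthetic stand-in for X and y (NOT the UAV data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 8))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

ae = MLPRegressor(hidden_layer_sizes=(6, 3, 6), activation="tanh",
                  max_iter=100, random_state=0).fit(X_demo, X_demo)

clf = make_pipeline(
    FunctionTransformer(encode, kw_args={"ae": ae}),  # frozen, pre-fitted encoder
    StandardScaler(),
    SVC(gamma="auto"),
)
clf.fit(X_demo, y_demo)
print(clf.score(X_demo, y_demo))
```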

## AUTO ENCODER (7-unit latent) + SVM + Standard Scaler
### Accuracy 0.9934766577797947

In [None]:
reset_random_seed()
# Encoder 100 -> 30 -> 7 (latent), decoder 30 -> 50
AE = MLPRegressor(
    hidden_layer_sizes=(100, 30, 7, 30, 50),
    activation='tanh',
    solver='adam',
    learning_rate_init=0.0001,
    max_iter=30,
    tol=1e-7,
    verbose=True,
)
AE.fit(X, X)
svm = make_pipeline(StandardScaler(), SVC(gamma='auto'))
svm.fit(encoder(X, AE), y)
accuracy_score(y_test, svm.predict(encoder(X_test, AE)))

The proposed method is as follows:  
- classify the samples with the standard MLP  
- print the misclassified samples  
- train a second model, such as an SVM, on the misclassified samples  
- use the SVM for samples of that kind  
- otherwise, use the standard MLP model
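The list above leaves open how new samples get routed to the SVM. Below is a minimal sketch under one assumed reading: a "gate" SVC learns to flag inputs the MLP tends to misclassify, and a "specialist" SVC trained only on the MLP's training mistakes handles those flagged inputs. The class name `MLPWithSVMFallback`, the gate, and the synthetic data are all hypothetical, not the author's implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


class MLPWithSVMFallback:
    """Hypothetical sketch of the MLP + SVM routing idea (not the author's code)."""

    def __init__(self, random_state=0):
        self.mlp = MLPClassifier(max_iter=300, random_state=random_state)
        self.gate = None        # SVC flagging inputs the MLP tends to get wrong
        self.specialist = None  # SVC trained only on the MLP's mistakes

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.mlp.fit(X, y)
        wrong = self.mlp.predict(X) != y  # classify, then find the mistakes
        print(f"MLP misclassified {wrong.sum()} of {len(y)} training samples")
        # Both SVCs need two classes in their targets; otherwise keep the MLP alone.
        if 0 < wrong.sum() < len(y) and len(np.unique(y[wrong])) > 1:
            self.gate = SVC(gamma="scale").fit(X, wrong)                  # assumed gate
            self.specialist = SVC(gamma="scale").fit(X[wrong], y[wrong])  # SVM on wrong data
        return self

    def predict(self, X):
        X = np.asarray(X)
        pred = self.mlp.predict(X)  # default: the standard MLP
        if self.gate is not None:
            flagged = self.gate.predict(X).astype(bool)
            if flagged.any():
                pred[flagged] = self.specialist.predict(X[flagged])  # SVM for flagged samples
        return pred


# Synthetic, slightly noisy data (NOT the UAV data).
X_demo, y_demo = make_classification(n_samples=400, n_features=10,
                                     flip_y=0.1, random_state=0)
model = MLPWithSVMFallback().fit(X_demo, y_demo)
pred = model.predict(X_demo)
print(pred.shape)  # (400,)
```

With the notebook's data this would be `MLPWithSVMFallback().fit(X, y)` followed by scoring on `X_test`; whether the gate helps depends on how separable the MLP's mistakes are.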

## Gradient Boosting (scikit-learn `GradientBoostingClassifier`, not the XGBoost library)
### Accuracy 100%

In [None]:
from sklearn.ensemble import GradientBoostingClassifier

In [None]:
reset_random_seed()
xgboost = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1)
xgboost.fit(X, y)

In [None]:
accuracy_score(y_test, xgboost.predict(X_test))
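With `max_depth=1`, every boosting stage is a decision stump (one split), and `learning_rate=1.0` adds each stump's correction at full strength. The sketch below confirms the stump structure on synthetic data; for a binary problem, scikit-learn keeps one regression tree per stage in `estimators_`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for X and y (NOT the UAV data).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                max_depth=1, random_state=0).fit(X_demo, y_demo)

print(gb.estimators_.shape)  # (100, 1): 100 stages, one tree each for binary labels
depths = {tree.get_depth() for tree in gb.estimators_[:, 0]}
print(depths)
```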

## Runtime

> Real-time responses are often understood to be in the order of milliseconds, and sometimes microseconds. 
 
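So if the per-sample prediction times measured below stay in the millisecond range, the gradient-boosting model (`xgboost`) can be considered a real-time process.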

In [None]:
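import time

# time.perf_counter() is a monotonic, high-resolution clock meant for short intervals
prediction_times = []
for x in X_test:
    x = x.reshape(1, -1)  # predict expects a 2-D array: one sample, all features
    t0 = time.perf_counter()
    xgboost.predict(x)
    t1 = time.perf_counter()
    prediction_times.append(t1 - t0)

prediction_times = np.array(prediction_times)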

In [None]:
print(f"prediction time: mean={np.mean(prediction_times)*1000:.3f} ms, std={np.std(prediction_times)*1000:.3f} ms")
print(f"slowest prediction: {prediction_times.max()*1000:.3f} ms")
print(f"fastest prediction: {prediction_times.min()*1000:.3f} ms")
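For a real-time claim, tail latency usually matters more than the mean, so a percentile summary of `prediction_times` is a useful complement. The helper `summarize_latency` below is a hypothetical sketch; it takes per-sample times in seconds and reports milliseconds.

```python
import numpy as np


def summarize_latency(times_s):
    """Hypothetical helper: summarize per-sample prediction times (seconds) in ms."""
    ms = np.asarray(times_s, dtype=float) * 1000.0
    return {
        "mean_ms": ms.mean(),
        "std_ms": ms.std(),
        "p50_ms": np.percentile(ms, 50),  # median latency
        "p99_ms": np.percentile(ms, 99),  # tail latency
        "max_ms": ms.max(),
    }


print(summarize_latency([0.001, 0.002, 0.003, 0.004]))
```

In the notebook, `summarize_latency(prediction_times)` gives the same view of the measured predictions.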

In [None]:
import platform
print(f"platform machine: {platform.machine()}")
print(f"platform system: {platform.system()}")
print(f"platform processor: {platform.processor()}")
print(f"platform detail: {platform.platform()}")