## Botnet Detection with an Autoencoder
20 May 2021  
This notebook was created for a course at Istanbul Technical University.
- We implement (a simplified version of) the autoencoder-based anomaly detection described in the N-BaIoT paper [1].

[1] Meidan, Yair, et al. "N-BaIoT—Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders." IEEE Pervasive Computing 17.3 (2018): 12-22. https://arxiv.org/pdf/1805.03409  

In [None]:
import os
import numpy as np
import tensorflow as tf

from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras import layers, losses, Sequential
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

---
1. N-BaIoT Dataset
2. Autoencoder Architecture
3. Python Reimplementation
4. Conclusion

---
# 1. N-BaIoT Dataset
https://archive.ics.uci.edu/ml/datasets/detection_of_IoT_botnet_attacks_N_BaIoT
- Normal traffic was captured for 9 IoT devices connected to the network.
- Then, they were infected with Mirai and BASHLITE (aka gafgyt) malware.
- Traffic was captured for each device for different phases of the malware execution.
- From the network traffic, 115 features were extracted as described in [1].

For now, we start with data from a smart doorbell: normal execution and the different phases of Mirai.

In [None]:
def load_nbaiot(filename):
    return np.loadtxt(
        os.path.join("/kaggle/input/nbaiot-dataset", filename),
        delimiter=",",
        skiprows=1
    )

benign = load_nbaiot("1.benign.csv")
X_train = benign[:40000]
X_test0 = benign[40000:]
X_test1 = load_nbaiot("1.mirai.scan.csv")
X_test2 = load_nbaiot("1.mirai.ack.csv")
X_test3 = load_nbaiot("1.mirai.syn.csv")
X_test4 = load_nbaiot("1.mirai.udp.csv")
X_test5 = load_nbaiot("1.mirai.udpplain.csv")

In [None]:
print(X_train.shape, X_test0.shape, X_test1.shape, X_test2.shape,
      X_test3.shape, X_test4.shape, X_test5.shape)

---
# 2. Autoencoder Architecture
Relevant parts of [1] describing the autoencoder architecture:

- The general idea is autoencoder-based anomaly detection, p. 4:
> [W]e use deep autoencoders and maintain a  
> model for each IoT device separately. An autoencoder is a neural  
> network which is trained to reconstruct its inputs after some  
> compression. The compression ensures that the network learns the  
> meaningful concepts and the relation among its input features. If an  
> autoencoder is trained on benign instances only, then it will succeed  
> at reconstructing normal observations, but fail at reconstructing  
> abnormal observations (unknown concepts). When a significant  
> reconstruction error is detected, then we classify the given  
> observations as being an anomaly.

- Details, p. 5:
> Each autoencoder had an input layer whose dimension is equal to the  
> number of features in the dataset (i.e., 115). As noted by [16] and  
> [15], autoencoders effectively perform dimensionality reduction  
> internally, such that the code layer between the encoder(s) and  
> decoder(s) efficiently compresses the input layer and reflects its  
> essential characteristics. In our experiments, four hidden layers of  
> encoders were set at decreasing sizes of 75%, 50%, 33%, and 25% of the  
> input layer’s dimension. The next layers were decoders, with the same  
> sizes as the encoders, however with an increasing order (starting from  
> 33%).

- Anomaly detection threshold, p. 4:
> This anomaly threshold, above which an instance is considered  
> anomalous, is calculated as the sum of the sample mean and standard  
> deviation of [the mean squared error over the validation set].

- Sequences of packets, p. 4:
> Preliminary experiments revealed that deciding whether a device’s  
> packet stream is anomalous or not based on a single instance enables  
> very accurate detection of IoT-based botnet attacks (high TPR).  
> However, benign instances were too often (in approximately 5-7% of  
> cases) falsely marked as anomalous. Thus we base the abnormality  
> decision on a sequence of instances by implementing a majority vote on  
> a moving window. We determine the minimal window size ws∗ as the  
> shortest sequence of instances, a majority vote which produces 0% FPR  
> on [the validation set].

- Final hyperparameters for the Danmini smart doorbell, p. 5:  
    - Learning rate: 0.012  
    - Number of epochs: 800  
    - Anomaly Threshold: 0.042  
    - Window Size: 82
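
The threshold rule and the window-size search quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `anomaly_threshold`, `majority_vote` and `minimal_window_size` are hypothetical helper names, the validation errors are synthetic, and the search simply tries window sizes in increasing order.

```python
import numpy as np

def anomaly_threshold(val_errors):
    """Mean plus one standard deviation of the validation errors."""
    val_errors = np.asarray(val_errors, dtype=float)
    return val_errors.mean() + val_errors.std()

def majority_vote(flags, window_size):
    """True wherever more than half of the last `window_size` flags are True."""
    flags = np.asarray(flags, dtype=bool)
    return np.array([flags[i - window_size:i].mean() > 0.5
                     for i in range(window_size, len(flags) + 1)], dtype=bool)

def minimal_window_size(benign_flags):
    """Smallest window whose majority vote raises no alarm on benign data.

    Returns None if no window size up to len(benign_flags) reaches 0% FPR.
    """
    for ws in range(1, len(benign_flags) + 1):
        if not majority_vote(benign_flags, ws).any():
            return ws
    return None

# Synthetic stand-in (assumption) for per-instance reconstruction errors
# on benign validation data.
rng = np.random.default_rng(0)
val_errors = rng.exponential(scale=0.01, size=2000)

threshold = anomaly_threshold(val_errors)
benign_flags = val_errors > threshold  # single-instance false alarms
ws_star = minimal_window_size(benign_flags)
print(f"threshold={threshold:.4f}, "
      f"single-instance FPR={benign_flags.mean():.1%}, ws*={ws_star}")
```

With real data, `val_errors` would be the per-instance mean squared errors of the trained autoencoder on held-out benign traffic.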

---
# 3. Python Reimplementation
Adapting this Keras tutorial: https://github.com/tensorflow/docs/blob/master/site/en/tutorials/generative/autoencoder.ipynb

In [None]:
class Autoencoder(Model):
    def __init__(self):
        super().__init__()
        # Encoder: 115 -> 86 -> 57 -> 37 -> 28 units,
        # i.e. roughly 75%, 50%, 33% and 25% of the 115 input features.
        self.encoder = Sequential([
            layers.Dense(115, activation="relu"),
            layers.Dense(86, activation="relu"),
            layers.Dense(57, activation="relu"),
            layers.Dense(37, activation="relu"),
            layers.Dense(28, activation="relu")
        ])
        # Decoder: mirrors the encoder back up to 115 outputs; the sigmoid
        # matches inputs scaled to [0, 1] by MinMaxScaler.
        self.decoder = Sequential([
            layers.Dense(37, activation="relu"),
            layers.Dense(57, activation="relu"),
            layers.Dense(86, activation="relu"),
            layers.Dense(115, activation="sigmoid")
        ])
    
    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
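
The hard-coded layer widths in the class above follow from the percentages quoted in Section 2. A small sketch of that arithmetic; the use of truncation rather than rounding is an assumption inferred from the widths in the code, since the paper excerpt does not say how fractional sizes are handled.

```python
n_features = 115
encoder_fractions = [0.75, 0.50, 0.33, 0.25]

# Truncate to whole units (assumption: int() reproduces the hard-coded widths).
encoder_sizes = [int(n_features * f) for f in encoder_fractions]

# The decoder mirrors the encoder, skipping the code layer
# and ending at the input width.
decoder_sizes = encoder_sizes[-2::-1] + [n_features]

print(encoder_sizes)
print(decoder_sizes)
```

Rounding to the nearest integer would give 58, 38 and 29 for the last three encoder layers, so the choice matters if the widths are generated rather than typed in.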

How can we determine the hyperparameters?
- In Keras, the default learning rate for the Adam optimizer is `0.001`. Training was relatively slow with that value, so we tried `0.01` instead.
- We use early stopping on the validation loss to find the number of epochs.
- The anomaly threshold is calculated as one standard deviation above the mean of the training-data losses (the paper uses the validation set instead).

In [None]:
scaler = MinMaxScaler()
x = scaler.fit_transform(X_train)

ae = Autoencoder()
ae.compile(optimizer=Adam(learning_rate=0.01), loss='mse')
monitor = EarlyStopping(
    monitor='val_loss',
    min_delta=1e-9,
    patience=5,
    verbose=1,
    mode='auto'
)
ae.fit(
    x=x,
    y=x,
    epochs=800,
    validation_split=0.3,
    shuffle=True,
    callbacks=[monitor]
)

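# Per-sample reconstruction error on the scaled benign training data
training_loss = losses.mse(x, ae(x))
# Threshold: mean plus one standard deviation of those errors
threshold = np.mean(training_loss) + np.std(training_loss)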

In [None]:
def predict(x, threshold=threshold, window_size=82):
    x = scaler.transform(x)
    predictions = losses.mse(x, ae(x)) > threshold
    # Majority voting over `window_size` predictions
    return np.array([np.mean(predictions[i-window_size:i]) > 0.5
                     for i in range(window_size, len(predictions)+1)])

def print_stats(data, outcome):
    print(f"Shape of data: {data.shape}")
    print(f"Detected anomalies: {np.mean(outcome)*100}%")
    print()
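
For long captures, the Python loop inside `predict` can become slow. The sketch below shows one possible vectorized form of the same majority vote using `np.convolve`, checked against a loop-based reference on random flags. The names `majority_vote_loop` and `majority_vote_fast` are hypothetical and not part of the notebook.

```python
import numpy as np

def majority_vote_loop(flags, window_size):
    """Reference version: same logic as the list comprehension in predict()."""
    return np.array([np.mean(flags[i - window_size:i]) > 0.5
                     for i in range(window_size, len(flags) + 1)], dtype=bool)

def majority_vote_fast(flags, window_size):
    """Count anomalous flags per window with a single convolution."""
    flags = np.asarray(flags, dtype=float)
    if len(flags) < window_size:
        # No complete window fits, so there is nothing to vote on.
        return np.array([], dtype=bool)
    counts = np.convolve(flags, np.ones(window_size), mode="valid")
    # mean > 0.5 is the same as count > window_size / 2
    return counts > window_size / 2

# Random per-instance flags as a stand-in for thresholded reconstruction errors.
rng = np.random.default_rng(42)
flags = rng.random(1000) > 0.7

slow = majority_vote_loop(flags, 82)
fast = majority_vote_fast(flags, 82)
print(np.array_equal(slow, fast), len(fast))
```

Both versions return one decision per complete window, that is `len(flags) - window_size + 1` values.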

In [None]:
test_data = [X_test0, X_test1, X_test2, X_test3, X_test4, X_test5]

for i, x in enumerate(test_data):
    print(i)
    outcome = predict(x)
    print_stats(x, outcome)

---
# 4. Conclusion
Based on the output above, the approach appears to work well on the doorbell data.

Some possible next steps:
- Run on the full N-BaIoT dataset: all 9 devices, all attacks.
- Implement (some of) the improvements from [2].
- In [3], the authors suggest using a subset of 23 features instead of all 115. However, that work used different algorithms.
    - Question: Are these 23 features also sufficient for an autoencoder-based system such as [1] or [2]?
- Run on the MedBIoT dataset [4].
- Run on the IoT-23 dataset [5] after performing feature extraction.

# References
[1] Meidan, Yair, et al. "N-BaIoT—Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders." IEEE Pervasive Computing 17.3 (2018): 12-22. https://arxiv.org/pdf/1805.03409  
[2] Mirsky, Yisroel, et al. "Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection." NDSS (2018). https://arxiv.org/abs/1802.09089v2  
[3] Alhowaide, Alaa, et al. "Towards the design of real-time autonomous IoT NIDS." Cluster Computing (2021): 1-14. https://doi.org/10.1007/s10586-021-03231-5  
[4] Guerra-Manzanares, Alejandro, et al. "MedBIoT: Generation of an IoT Botnet Dataset in a Medium-sized IoT Network." ICISSP 1 (2020): 207-218. https://doi.org/10.5220/0009187802070218  
[5] Garcia, Sebastian, et al. "IoT-23: A labeled dataset with malicious and benign IoT network traffic" (2020). (Version 1.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4743746