<a href="https://www.kaggle.com/code/sullivansmith12/cyber-security-poster?scriptVersionId=218500218" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

These are the imports needed for the project.

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.preprocessing import MinMaxScaler
print("Done with initial imports.")

Done with initial imports.


This is the code that kaggle needs to use to get the directory system all setup and good to go.

In [2]:
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
print("Done setting up file structure")

/kaggle/input/wsnds/WSN-DS.csv
Done setting up file structure


This is the line that reads the data in from the file.

In [3]:
#Read the csv file into a dataframe
df = pd.read_csv('../input/wsnds/WSN-DS.csv')
print("Done reading from file.")

Done reading from file.


This is the code that cleans up the data variously as needed. 
1.) Need to drop the id column as this isn't needed for the analysis.
2.) Rename 'Attack type' column to 'label' for easier manipulation.
3.) See what the different attack types that this data set holds.
4.) Re-map the literal text of attack types to a more machine friendly integer.
5.) Get rid of rows where there may have been null data.
6.) Seperate the labels and features from each other for easier manipulation.
7.) Divide the data into training/testing features/labels.
8.) Convert the arrays to numpy arrays. 
9.) Normalize the data in each column.
After all of these steps are taken, the data is now ready to be used to train the model.

In [4]:
#Drop the id, this is useless towards the model's calculations
df = df.drop(columns=[' id'])

#Fix the output label
df = df.rename(columns={'Attack type': "label"})
#Convert the string values to ints for predictions
print(df['label'].unique())
attack_mapping = {'Normal':0, 'Grayhole':1, 'Blackhole':2, 'TDMA':3, 'Flooding':4}
df['label'] = df['label'].map(attack_mapping)

#Convert all the data to numerical data and drop null values from df
df = df.apply(pd.to_numeric, errors='coerce')
df = df.dropna()

#Seperate the features from the labels
features = df.drop('label', axis=1)
labels = df['label']

#Split the data into testing/training data (features and labels)
feat_train, feat_test, label_train, label_test = train_test_split(features, labels, test_size=0.3, random_state=83) 

#Just for noting the shapes of the data
print("Training features shape:", feat_train.shape)
print("Training labels shape:", label_train.shape)
print("Testing features shape:", feat_test.shape)
print("Testing labels shape:", label_test.shape)

#Convert them all to NumPy arrays
feat_train = np.array(feat_train)
feat_test = np.array(feat_test)
label_train = np.array(label_train)
label_test = np.array(label_test)

#Normalize the data
scaler = MinMaxScaler()
feat_train = scaler.fit_transform(feat_train)
feat_test = scaler.fit_transform(feat_test)

print("Done cleaning data.")

['Normal' 'Flooding' 'TDMA' 'Grayhole' 'Blackhole']
Training features shape: (262262, 17)
Training labels shape: (262262,)
Testing features shape: (112399, 17)
Testing labels shape: (112399,)
Done cleaning data.


This is the import statement for the model.

In [5]:
from tensorflow import keras
from tensorflow.keras import layers
print("Done importing libraries.")

Done importing libraries.


We define our model here, compile it, and set the random seed for reproducibility. 

In [6]:
model = keras.Sequential([
        layers.Dense(512, activation="relu"),
        layers.Dense(1024, activation="relu"),
        layers.Dense(512, activation="softmax")
])

model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

keras.utils.set_random_seed(812)

print("The model has been created.")

The model has been created.


We can now train our model using the prepared data from earlier. We do this for 5 epochs to prevent the model from overfitting the training data. A batch size of 128 is taken to speed up the training process.

In [7]:
model.fit(feat_train, label_train, epochs=5, batch_size=128)
print("Done training model")

Epoch 1/5
[1m2049/2049[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m37s[0m 17ms/step - accuracy: 0.9529 - loss: 0.1831
Epoch 2/5
[1m2049/2049[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m36s[0m 18ms/step - accuracy: 0.9746 - loss: 0.0573
Epoch 3/5
[1m2049/2049[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 18ms/step - accuracy: 0.9797 - loss: 0.0466
Epoch 4/5
[1m2049/2049[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m37s[0m 18ms/step - accuracy: 0.9807 - loss: 0.0431
Epoch 5/5
[1m2049/2049[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m38s[0m 19ms/step - accuracy: 0.9821 - loss: 0.0411
Done training model


Now that the model is trained, we can evaluate it's accuracy and loss over the test data from earlier.

In [8]:
test_loss, test_acc = model.evaluate(feat_test, label_test) 
print(f"test_acc: {test_acc}")
print(f"test_loss: {test_loss}")

[1m3513/3513[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 4ms/step - accuracy: 0.9825 - loss: 0.0371
test_acc: 0.9819126725196838
test_loss: 0.037685226649045944


To see how well our model is doing for each specific class of intrusion, we need to see the specific predictions of the model for the test data. We assume that the intrusion that the model most probably predicts is its prediction for that input.

In [9]:
predictions = model.predict(feat_test)
predicted_classes = np.argmax(predictions, axis=1)

[1m3513/3513[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 4ms/step


Finally, we print that stats for the model's performance over each intrusion type. 

In [10]:
attack_mapping = {'Normal':0, 'Grayhole':1, 'Blackhole':2, 'TDMA':3, 'Flooding':4}
report = classification_report(label_test, predicted_classes, target_names=list(attack_mapping.keys()))
print(report)

              precision    recall  f1-score   support

      Normal       1.00      1.00      1.00    102105
    Grayhole       0.74      0.91      0.82      4317
   Blackhole       0.84      0.58      0.69      3025
        TDMA       1.00      0.92      0.96      1977
    Flooding       0.90      0.99      0.94       975

    accuracy                           0.98    112399
   macro avg       0.90      0.88      0.88    112399
weighted avg       0.98      0.98      0.98    112399

