# Machine Learning for Network Intrusion Detection

Riyanda Cavin Sinambela
<br>
Network Security Class

## Objective

This assessment introduces students to **machine learning (ML) for network intrusion detection**. Students will:

- Work with a **PCAP** dataset from **Kaggle**.
- Extract network traffic features using **Scapy**.
- Train a **machine learning model** to classify traffic as **normal or malicious**.
- Evaluate model performance.

## Load Data and Extract Features

In [1]:
import pandas as pd

# CICIDS2017: load only the four feature columns and the label
usecols = ["ttl", "total_len", "protocol", "t_delta", "label"]
df_cicds = pd.read_csv("Payload_data_CICIDS2017.csv", usecols=usecols)

# UNSW: load the full file, then keep the same columns as CICIDS2017
df_unsw = pd.read_csv("Payload_data_UNSW.csv")
df_unsw = df_unsw[["ttl", "total_len", "protocol", "t_delta", "label"]]

In [2]:
print(df_unsw["protocol"].value_counts())

protocol
tcp            42428
udp            29188
others          5657
ospf             521
sctp             345
gre               95
sep               81
mobile            81
swipe             81
sun-nd            81
unas              60
pim               50
crtp              46
ipip              46
rdp               46
pipe              46
micp              46
vmtp              46
snp               46
gmtp              46
iplt              46
nvp               46
etherip           46
ax.25             46
leaf-2            46
hmp               46
sps               46
ib                46
secure-vmtp       46
rsvp              46
ipv6              46
fire              46
egp               46
ggp               46
sccopmce          46
emcon             46
dgp               46
crudp             46
arp                9
fc                 5
icmp               3
Name: count, dtype: int64
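
The protocol counts above have a long tail: most values appear only 46 times, and a few fewer than 10. One common option, shown here only as a sketch, is to fold rare protocols into the existing `others` bucket before encoding. The helper name `bucket_rare_categories`, the `min_count` threshold, and the toy series are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd


def bucket_rare_categories(series, min_count=100, other_label="others"):
    """Replace categories seen fewer than `min_count` times with `other_label`."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; overwrite the rest with the bucket label
    return series.where(~series.isin(rare), other_label)


# Toy stand-in for df_unsw["protocol"] (hypothetical counts)
protocols = pd.Series(
    ["tcp"] * 500 + ["udp"] * 300 + ["ospf"] * 120 + ["arp"] * 9 + ["icmp"] * 3
)
bucketed = bucket_rare_categories(protocols, min_count=100)
print(bucketed.value_counts())
```

Whether bucketing helps depends on the data; a rare protocol can itself be a strong attack indicator, so the threshold is worth validating rather than assuming.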


In [3]:
print(df_unsw["label"].value_counts())

label
normal            21000
generic           17580
exploits          13992
fuzzers           12722
reconnaissance     7562
dos                3397
backdoor           1239
analysis           1208
shellcode          1088
worms                93
Name: count, dtype: int64


## Prepare the Data for ML

In [4]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


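def preprocess_data(df, dataset_type="", binary=True):
    """Encode `protocol` as integers and `label` as a binary or multiclass target."""
    df = df.copy()
    df["label"] = df["label"].astype(str).str.lower()
    # Map each protocol name to an integer code
    df["protocol"] = LabelEncoder().fit_transform(df["protocol"])

    if binary:
        # Name of the non-attack class in each dataset
        benign_names = {"cicds": "benign", "unsw": "normal"}
        if dataset_type not in benign_names:
            raise ValueError(f"Unknown dataset_type: {dataset_type!r}")
        # 0 = normal traffic, 1 = any attack
        df["label"] = (df["label"] != benign_names[dataset_type]).astype(int)
    else:
        # One integer per class, assigned in alphabetical order of the label names
        df["label"] = LabelEncoder().fit_transform(df["label"])
    return df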

# CICIDS
df_cicds_binary = preprocess_data(df_cicds, dataset_type="cicds", binary=True)
df_cicds_multi = preprocess_data(df_cicds, dataset_type="cicds", binary=False)

# UNSW
df_unsw_binary = preprocess_data(df_unsw, dataset_type="unsw", binary=True)
df_unsw_multi = preprocess_data(df_unsw, dataset_type="unsw", binary=False)

# Split dataset
def split_data(df):
    X = df.drop(columns=["label"])
    y = df["label"]
    return train_test_split(X, y, test_size=0.3, random_state=42)

# CICIDS
X_train_cb, X_test_cb, y_train_cb, y_test_cb = split_data(df_cicds_binary)
X_train_cm, X_test_cm, y_train_cm, y_test_cm = split_data(df_cicds_multi)

# UNSW
X_train_ub, X_test_ub, y_train_ub, y_test_ub = split_data(df_unsw_binary)
X_train_um, X_test_um, y_train_um, y_test_um = split_data(df_unsw_multi)
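
`LabelEncoder` turns protocol names into integers, which gives them an arbitrary numeric order. Tree models can usually cope with that, but a linear model such as logistic regression treats the codes as magnitudes. A possible alternative is one-hot encoding, sketched below on a toy frame. The frame `toy` and the `proto` prefix are made up for illustration.

```python
import pandas as pd

# Toy rows using the notebook's column names (values are invented)
toy = pd.DataFrame(
    {
        "ttl": [64, 128, 64, 255],
        "total_len": [60, 1500, 52, 84],
        "protocol": ["tcp", "udp", "tcp", "icmp"],
    }
)

# One indicator column per protocol instead of a single integer code
encoded = pd.get_dummies(toy, columns=["protocol"], prefix="proto", dtype=int)
print(encoded)
```

With about forty distinct protocols this adds about forty columns, so it may be worth combining with some grouping of rare values.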

In [5]:
print(df_unsw_binary["label"].value_counts())
print(df_unsw_multi["label"].value_counts())

label
1    58881
0    21000
Name: count, dtype: int64
label
6    21000
5    17580
3    13992
4    12722
7     7562
2     3397
1     1239
0     1208
8     1088
9       93
Name: count, dtype: int64
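
The class counts above are heavily imbalanced (the smallest class has 93 rows), so a purely random split can leave rare classes under-represented in the test set. A minimal sketch of a stratified split follows, using the `stratify` argument of `train_test_split` on synthetic data. The toy frame and its 90/10 class ratio are assumptions, not the notebook's data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in: 900 "normal" rows (0) and 100 "attack" rows (1)
toy = pd.DataFrame(
    {
        "ttl": rng.integers(1, 255, size=1000),
        "total_len": rng.integers(40, 1500, size=1000),
        "label": [0] * 900 + [1] * 100,
    }
)

X = toy.drop(columns=["label"])
y = toy["label"]

# stratify=y keeps the class proportions the same in both partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(f"attack share in train: {y_train.mean():.3f}, in test: {y_test.mean():.3f}")
```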


## Train Machine Learning Models

In [6]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report


def train_evaluate_dt(X_train, X_test, y_train, y_test, name=""):
    clf = DecisionTreeClassifier()
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)
    cr = classification_report(y_test, y_pred)

    print(f"==> Dataset: {name}")
    print("=== Decision Tree ===")
    print(f"Accuracy: {accuracy*100:.2f}%")
    print("Confusion Matrix:")
    print(cm)
    print("Classification Report:")
    print(cr)
    print("\n")

In [7]:
train_evaluate_dt(X_train_cb, X_test_cb, y_train_cb, y_test_cb, "CICIDS - Binary")
train_evaluate_dt(X_train_cm, X_test_cm, y_train_cm, y_test_cm, "CICIDS - Multiclass")

==> Dataset: CICIDS - Binary
=== Decision Tree ===
Accuracy: 99.87%
Confusion Matrix:
[[107714    411]
 [   144 314808]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00    108125
           1       1.00      1.00      1.00    314952

    accuracy                           1.00    423077
   macro avg       1.00      1.00      1.00    423077
weighted avg       1.00      1.00      1.00    423077



==> Dataset: CICIDS - Multiclass
=== Decision Tree ===
Accuracy: 76.04%
Confusion Matrix:
[[107702     94      3      0      0      0      0      0      0    325
       0      0      1      0      0]
 [   107    658      0      0      0      0      0      0      0      6
       0      0      0      0      0]
 [     6      0  72677      0      4      1      2      1      0      0
       5      0      0      0      0]
 [     0      0      3   2067  32693   1854   1725     34      0      0
       0      8     29      1      2]

In [8]:
train_evaluate_dt(X_train_ub, X_test_ub, y_train_ub, y_test_ub, "UNSW - Binary")
train_evaluate_dt(X_train_um, X_test_um, y_train_um, y_test_um, "UNSW - Multiclass")

==> Dataset: UNSW - Binary
=== Decision Tree ===
Accuracy: 99.90%
Confusion Matrix:
[[ 6337    18]
 [    7 17603]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      6355
           1       1.00      1.00      1.00     17610

    accuracy                           1.00     23965
   macro avg       1.00      1.00      1.00     23965
weighted avg       1.00      1.00      1.00     23965



==> Dataset: UNSW - Multiclass
=== Decision Tree ===
Accuracy: 77.10%
Confusion Matrix:
[[  28   58  107   46   64   53    0   20    0    0]
 [  66   26  104   35   60   53    0   25    0    0]
 [  91   41  166  420  118   92    0   50    6    0]
 [ 110   70  310 3079  287  134    2  129   28    6]
 [  81   53  145  809 2474  138    2   47   77    3]
 [  68   40  124  340  171 4510    3   32    7    4]
 [   0    0    1    9    6    2 6337    0    0    0]
 [  77   45  118  196   76   57    0 1686    0    0]
 [   0    0    7   45  
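
The classification reports round every score to two decimals, so the binary results all read as 1.00. The underlying rates can be recovered from the confusion matrix itself. The sketch below does this for the UNSW binary Decision Tree matrix printed above; the variable names are chosen here for illustration.

```python
import numpy as np

# UNSW - Binary, Decision Tree (rows = true class, columns = predicted class)
cm = np.array([[6337, 18],
               [7, 17603]])

# For a 2x2 matrix with labels [0, 1], ravel() yields TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)            # share of flagged traffic that is an attack
recall = tp / (tp + fn)               # share of attacks that were flagged
false_positive_rate = fp / (fp + tn)  # share of normal traffic wrongly flagged

print(f"accuracy:  {accuracy * 100:.2f}%")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
print(f"FPR:       {false_positive_rate:.4f}")
```

For intrusion detection the false positive rate is often the more telling number, since even a small rate can mean many false alerts on high-volume links.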

In [9]:
from sklearn.linear_model import LogisticRegression

def train_evaluate_lr(X_train, X_test, y_train, y_test, name=""):
    log_model = LogisticRegression(
        max_iter=1000
    )
    log_model.fit(X_train, y_train)
    log_pred = log_model.predict(X_test)
    print(f"==> Dataset: {name}")
    print("=== Logistic Regression ===")
    print(f"Model Accuracy: {accuracy_score(y_test, log_pred) * 100:.2f}%")

    print("Confusion Matrix:")
    print(confusion_matrix(y_test, log_pred))

    print("\nClassification Report:")
    print(classification_report(y_test, log_pred))

In [10]:
train_evaluate_lr(X_train_cb, X_test_cb, y_train_cb, y_test_cb, "CICIDS - Binary")
train_evaluate_lr(X_train_cm, X_test_cm, y_train_cm, y_test_cm, "CICIDS - Multiclass")

==> Dataset: CICIDS - Binary
=== Logistic Regression ===
Model Accuracy: 94.22%
Confusion Matrix:
[[ 86847  21278]
 [  3179 311773]]

Classification Report:
              precision    recall  f1-score   support

           0       0.96      0.80      0.88    108125
           1       0.94      0.99      0.96    314952

    accuracy                           0.94    423077
   macro avg       0.95      0.90      0.92    423077
weighted avg       0.94      0.94      0.94    423077

==> Dataset: CICIDS - Multiclass
=== Logistic Regression ===
Model Accuracy: 61.00%
Confusion Matrix:
[[86057     0  2871     0 10728     0     9   333     0  7828    97   202
      0     0     0]
 [  339     0     8     0   204     0     0    15     0   205     0     0
      0     0     0]
 [    6     0 72616     0    74     0     0     0     0     0     0     0
      0     0     0]
 [    6     0     0     0 38387     0     0    23     0     0     0     0
      0     0     0]
 [    4     0     0     0 75070   

STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
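
The warning above suggests scaling the data. One way to do that, sketched here under stated assumptions, is to put `StandardScaler` and `LogisticRegression` in a single pipeline so the scaler is fit on the training split only. The data comes from `make_classification` and is rescaled by hand to mimic features with very different ranges; it is not the notebook's data, and the scores say nothing about CICIDS or UNSW.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 4-feature problem standing in for ttl, total_len, protocol, t_delta
X, y = make_classification(
    n_samples=2000, n_features=4, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, random_state=42,
)
# Give the columns very different magnitudes, as raw packet features tend to have
X = X * [1, 1500, 10, 0.001]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The scaler's mean and variance are learned from X_train only
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)

score = pipe.score(X_test, y_test)
n_iter = int(pipe.named_steps["logisticregression"].n_iter_.max())
print(f"accuracy: {score * 100:.2f}%  (solver iterations: {n_iter})")
```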


In [11]:
train_evaluate_lr(X_train_ub, X_test_ub, y_train_ub, y_test_ub, "UNSW - Binary")
train_evaluate_lr(X_train_um, X_test_um, y_train_um, y_test_um, "UNSW - Multiclass")

==> Dataset: UNSW - Binary
=== Logistic Regression ===
Model Accuracy: 99.72%
Confusion Matrix:
[[ 6323    32]
 [   35 17575]]

Classification Report:
              precision    recall  f1-score   support

           0       0.99      0.99      0.99      6355
           1       1.00      1.00      1.00     17610

    accuracy                           1.00     23965
   macro avg       1.00      1.00      1.00     23965
weighted avg       1.00      1.00      1.00     23965

==> Dataset: UNSW - Multiclass
=== Logistic Regression ===
Model Accuracy: 59.65%
Confusion Matrix:
[[  16    8    0   22   14  314    2    0    0    0]
 [  12   12    0    8   21  309    7    0    0    0]
 [  15    9    0  360   66  531    3    0    0    0]
 [  22   12    0 2574  325 1217    5    0    0    0]
 [  14   11    0  735  670 2394    5    0    0    0]
 [  13   12    0  351  220 4699    4    0    0    0]
 [   0    0    0   24    2    6 6323    0    0    0]
 [   6    8    0    5   20 2207    9    0    0    0

STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
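
Several columns of the multiclass logistic regression confusion matrices are entirely zero, which means the model never predicts those classes. Precision is undefined for such a class. The sketch below shows one way to list the never-predicted classes and to make the report's handling of them explicit through the `zero_division` argument. The tiny label arrays are invented for illustration.

```python
import numpy as np
from sklearn.metrics import classification_report

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 1])  # class 2 is never predicted

# Classes present in the ground truth but absent from the predictions
never_predicted = np.setdiff1d(y_true, y_pred).tolist()
print("never predicted:", never_predicted)

# zero_division=0 reports 0.0 for undefined precision instead of warning
report = classification_report(y_true, y_pred, zero_division=0, output_dict=True)
print("precision for class 2:", report["2"]["precision"])
```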


In [12]:
from sklearn.ensemble import RandomForestClassifier

def train_evaluate_RF(X_train, X_test, y_train, y_test, name=""):
    rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
    rf_model.fit(X_train, y_train)
    rf_pred = rf_model.predict(X_test)
    print(f"==> Dataset: {name}")
    print("=== Random Forest ===")
    print(f"Accuracy: {accuracy_score(y_test, rf_pred) * 100:.2f}%")

    print("Confusion Matrix:")
    print(confusion_matrix(y_test, rf_pred))

    print("\nClassification Report:")
    print(classification_report(y_test, rf_pred))

In [13]:
train_evaluate_RF(X_train_cb, X_test_cb, y_train_cb, y_test_cb, "CICIDS - Binary")
train_evaluate_RF(X_train_cm, X_test_cm, y_train_cm, y_test_cm, "CICIDS - Multiclass")

==> Dataset: CICIDS - Binary

=== Random Forest ===
Accuracy: 99.84%
Confusion Matrix:
[[107574    551]
 [   129 314823]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.99      1.00    108125
           1       1.00      1.00      1.00    314952

    accuracy                           1.00    423077
   macro avg       1.00      1.00      1.00    423077
weighted avg       1.00      1.00      1.00    423077

==> Dataset: CICIDS - Multiclass

=== Random Forest ===
Accuracy: 76.16%
Confusion Matrix:
[[107580     90      2      0      0      0      0      0      0    452
       0      0      1      0      0]
 [    92    668      0      0      0      0      0      0      0      8
       0      3      0      0      0]
 [     6      0  72672      0      4      1      2      1      0      0
      10      0      0      0      0]
 [     0      0      0   1297  32449   2446   2138     35      0      0
       0     10     38      1     

In [14]:
train_evaluate_RF(X_train_ub, X_test_ub, y_train_ub, y_test_ub, "UNSW - Binary")
train_evaluate_RF(X_train_um, X_test_um, y_train_um, y_test_um, "UNSW - Multiclass")

==> Dataset: UNSW - Binary

=== Random Forest ===
Accuracy: 99.92%
Confusion Matrix:
[[ 6337    18]
 [    2 17608]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      6355
           1       1.00      1.00      1.00     17610

    accuracy                           1.00     23965
   macro avg       1.00      1.00      1.00     23965
weighted avg       1.00      1.00      1.00     23965

==> Dataset: UNSW - Multiclass

=== Random Forest ===
Accuracy: 77.12%
Confusion Matrix:
[[  24   28   67   51  110   62    0   34    0    0]
 [  36   17   61   47  104   62    0   41    1    0]
 [  52   28  117  421  180  103    0   72   11    0]
 [  59   40  184 3108  379  164    2  172   43    4]
 [  46   31  112  794 2503  152    1   67  119    4]
 [  39   21   74  363  222 4502    2   60   11    5]
 [   0    0    1    9    6    2 6337    0    0    0]
 [  49   20   85  187  131   65    0 1714    4    0]
 [   0    0    9   4
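
A fitted random forest exposes `feature_importances_`, which can hint at which of the four features drive the predictions. The sketch below shows how to read it. It uses synthetic data, so the ranking it prints carries no information about the real datasets; only the feature names are borrowed from the notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = ["ttl", "total_len", "protocol", "t_delta"]

# Synthetic data with the same number of features as the notebook
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=2, n_redundant=0, random_state=42
)

rf = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# Impurity-based importances; they are non-negative and sum to 1
importances = pd.Series(rf.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```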

In [15]:
from sklearn.ensemble import GradientBoostingClassifier

def train_evaluate_GB(X_train, X_test, y_train, y_test, name=""):
    gb_model = GradientBoostingClassifier(n_estimators=100, random_state=42)
    gb_model.fit(X_train, y_train)
    gb_pred = gb_model.predict(X_test)

    print(f"\n==> Dataset: {name}")
    print("=== Gradient Boosting ===")
    print(f"Accuracy: {accuracy_score(y_test, gb_pred) * 100:.2f}%")

    print("Confusion Matrix:")
    print(confusion_matrix(y_test, gb_pred))

    print("\nClassification Report:")
    print(classification_report(y_test, gb_pred))
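
The four `train_evaluate_*` functions differ only in the estimator they build. One possible refactor, shown as a sketch rather than the notebook's own code, is a single helper that takes any scikit-learn classifier and returns the accuracy, so results can be collected in a dictionary. The helper name `train_evaluate` and the synthetic data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_evaluate(model, X_train, X_test, y_train, y_test, name=""):
    """Fit any scikit-learn classifier, print its accuracy, and return it."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f"==> Dataset: {name} | {type(model).__name__}: {acc * 100:.2f}%")
    return acc


# Synthetic stand-in for one of the train/test splits
X, y = make_classification(n_samples=600, n_features=4, n_informative=3,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
results = {
    label: train_evaluate(model, X_train, X_test, y_train, y_test, "synthetic")
    for label, model in models.items()
}
```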

In [16]:
train_evaluate_GB(X_train_cb, X_test_cb, y_train_cb, y_test_cb, "CICIDS - Binary")
train_evaluate_GB(X_train_cm, X_test_cm, y_train_cm, y_test_cm, "CICIDS - Multiclass")


=== Gradient Boosting ===
Accuracy: 99.37%
Confusion Matrix:
[[105979   2146]
 [   524 314428]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.98      0.99    108125
           1       0.99      1.00      1.00    314952

    accuracy                           0.99    423077
   macro avg       0.99      0.99      0.99    423077
weighted avg       0.99      0.99      0.99    423077


=== Gradient Boosting ===
Accuracy: 76.80%
Confusion Matrix:
[[106378      0      0      0      0      0      0      1      5   1708
      17      3      9      0      4]
 [   468    151      0      0     65      1      0      0      0     74
       0     10      0      0      2]
 [     6      0  72641      0      6      0      0     12      0      0
       0     31      0      0      0]
 [     0      0      0    425  35505    960   1284     53      0      0
       1     24    158      1      5]
 [     0      0      0    395  68800   1198   4417

In [17]:
train_evaluate_GB(X_train_ub, X_test_ub, y_train_ub, y_test_ub, "UNSW - Binary")
train_evaluate_GB(X_train_um, X_test_um, y_train_um, y_test_um, "UNSW - Multiclass")


=== Gradient Boosting ===
Accuracy: 99.89%
Confusion Matrix:
[[ 6337    18]
 [    8 17602]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      6355
           1       1.00      1.00      1.00     17610

    accuracy                           1.00     23965
   macro avg       1.00      1.00      1.00     23965
weighted avg       1.00      1.00      1.00     23965


=== Gradient Boosting ===
Accuracy: 78.13%
Confusion Matrix:
[[ 143    5  119   36   53    2    0   18    0    0]
 [ 136   10  103   49   56    3    0   12    0    0]
 [ 162    7  116  442  193   22    1   35    6    0]
 [ 156    8  134 3349  360   40    0  101    7    0]
 [ 159    6  126  788 2633   29    0   30   57    1]
 [ 142    7  101  395  256 4360    4   26    8    0]
 [   0    0    0    8    8    2 6337    0    0    0]
 [ 166    6  101  193   90    2    0 1695    2    0]
 [   0    0    0   22  195    7    0   13   81    0]
 [   0    0    0 
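
The accuracies printed throughout this notebook are easier to compare side by side. The sketch below collects them into one table and picks the best model per task. The figures are copied from the outputs above, with the Gradient Boosting values assigned in call order (binary first, then multiclass) since that output does not print the dataset name.

```python
import pandas as pd

# Test-set accuracy (%) as reported in the cell outputs
accuracy = pd.DataFrame(
    {
        "CICIDS - Binary": [99.87, 94.22, 99.84, 99.37],
        "CICIDS - Multiclass": [76.04, 61.00, 76.16, 76.80],
        "UNSW - Binary": [99.90, 99.72, 99.92, 99.89],
        "UNSW - Multiclass": [77.10, 59.65, 77.12, 78.13],
    },
    index=["Decision Tree", "Logistic Regression", "Random Forest", "Gradient Boosting"],
)

print(accuracy)

# Highest-accuracy model for each dataset/task column
best = accuracy.idxmax()
print(best)
```

Accuracy alone is a coarse measure on data this imbalanced; the per-class rows of the confusion matrices show that the multiclass models separate some attack classes poorly.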