## GNN4ID Heterogeneous Graph Model

This notebook shows how to use our heterogeneous graph models. We provide two architectures:

1. **Model without Edge Attributes**: Edges carry only connectivity information, so the model relies solely on the structural relationships within the graph.
2. **Model with Edge Attributes**: Edges carry their own attributes/features in addition to connectivity, so the model can also use the information attached to each edge, which may improve performance.
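
To make the difference between the two variants concrete, here is a minimal plain-Python sketch of mean aggregation over incoming edges. It is only an illustration: the toy graph, the feature values, and the helper `aggregate_mean` are hypothetical, and appending the edge features to the message is just one common way to use them. `HeteroGNN` and `HeteroGNN_Edge` in `Utility.Model` may combine features differently.

```python
# Toy illustration only: plain-Python mean aggregation over incoming edges.
# The graph, the feature values, and the helper name `aggregate_mean` are
# hypothetical and are not taken from Utility.Model.

def aggregate_mean(num_nodes, node_feats, edges, edge_feats=None):
    """Return, for each node, the mean of the messages on its incoming edges.

    Without edge_feats, a message is the source node's feature vector.
    With edge_feats, the edge's feature vector is appended to the message.
    """
    inbox = [[] for _ in range(num_nodes)]
    for idx, (src, dst) in enumerate(edges):
        message = list(node_feats[src])
        if edge_feats is not None:
            message += list(edge_feats[idx])
        inbox[dst].append(message)

    out = []
    for messages in inbox:
        if not messages:
            out.append([])  # node has no incoming edges
            continue
        dim = len(messages[0])
        out.append([sum(m[d] for m in messages) / len(messages) for d in range(dim)])
    return out


node_feats = [[1.0, 0.0], [3.0, 2.0], [0.0, 0.0]]
edges = [(0, 2), (1, 2)]        # two edges pointing at node 2
edge_feats = [[10.0], [30.0]]   # one feature per edge

structure_only = aggregate_mean(3, node_feats, edges)
with_edge_attrs = aggregate_mean(3, node_feats, edges, edge_feats)
print(structure_only[2])    # mean of the two source-node vectors
print(with_edge_attrs[2])   # same, plus the mean of the edge features
```

In the structure-only case node 2 sees just the averaged neighbor features; with edge attributes the aggregated message gains an extra dimension carrying the averaged edge information.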

In [1]:
import torch  # used directly below (torch.device, torch.save)
from Utility.Functions import *
from Utility.Model import *
from Utility.Training import *
from Utility.Additional_Features import *
from torch_geometric.loader import DataLoader
from tqdm import tqdm

### Reading Graph Objects

**dir**: The root data directory (`data` in the cells below). It contains two folders, `raw` and `processed`; the graph objects are stored in the `processed` folder.
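
The layout described above can be sketched with the standard library alone. This is a hedged illustration: the temporary root is hypothetical (the notebook itself uses `root='data'`), and the sketch only creates the two folders that a PyG-style dataset looks for under its root.

```python
import os
import tempfile

# Hypothetical root used only for this illustration; the notebook uses root='data'.
root = tempfile.mkdtemp(prefix="gnn4id_demo_")

raw_dir = os.path.join(root, "raw")              # input CSV files go here
processed_dir = os.path.join(root, "processed")  # graph objects are written here

for path in (raw_dir, processed_dir):
    os.makedirs(path, exist_ok=True)

print(sorted(os.listdir(root)))
```

Keeping the input files in `raw` and the generated graphs in `processed` lets the dataset skip reprocessing when the processed files already exist.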

In [4]:
Dict_x = {'Fuzzers': 0 , 
          'Reconnaissance': 1, 
          'Shellcode': 2,
          'Analysis' : 3,
          'Backdoors' : 4,
          'Dos' : 5,
          'Exploits' : 6,
          'Worms': 7
         }


In [5]:
import os, shutil

# Original preprocessed files
src_dir = os.path.join('data', 'processed_unsw')
files = ['train.csv','val.csv','test.csv']  # files that have already been generated

# raw directory that PyG reads from (relative to root="data")
raw_dir = os.path.join('data','raw')
os.makedirs(raw_dir, exist_ok=True)

# Copy
for f in files:
    src = os.path.join(src_dir, f)
    dst = os.path.join(raw_dir, f)
    if os.path.exists(src) and not os.path.exists(dst):
        shutil.copy(src, dst)
        print('copied', src, '->', dst)
    else:
        print('skip', src)
# From here on, pass basenames as filename to NIDSDataset
Files = ['train.csv'] 
data_Hetero = NIDSDataset(root='data', label_dict=Dict_x, filename=Files, skip_processing=False, test=False, single_file=True)

copied data\processed_unsw\train.csv -> data\raw\train.csv
copied data\processed_unsw\val.csv -> data\raw\val.csv
copied data\processed_unsw\test.csv -> data\raw\test.csv


Processing...
Reading File ---> train.csv
100%|██████████| 1419254/1419254 [25:46<00:00, 917.44it/s] 
Done!


In [6]:
data_Hetero

NIDSDataset(1419254)

### Initializing the Model

In [14]:
## Arguments for running the model
args = {
    'device': torch.device('cuda' if torch.cuda.is_available() else 'cpu'),
    'hidden_size': 64,
    'epochs': 10,
    'weight_decay': 1e-5,
    'lr': 0.01,
    'attn_size': 32,
    'eps': 1.0,
}

In [15]:
## Initializing a Data Instance for Model Initialization
data_model=data_Hetero[0].to(args['device'])

In [16]:
## Model without edge attributes
model = HeteroGNN(data_model, args, aggr="mean").to(args['device'])

## Model with Edge attributes
# model = HeteroGNN_Edge(data_model, args, aggr="mean").to(args['device'])

### Training Loop


In [17]:
train_loader = DataLoader(data_Hetero, batch_size=512, shuffle=True)

In [18]:
# For training the model without edge attributes
train(train_loader, model, args, args["device"])

# # For training the model with edge attributes 
# train_with_edge_Att(train_loader, model, args, args["device"])

 56%|█████▌    | 1554/2772 [1:30:52<1:11:13,  3.51s/it]


KeyboardInterrupt: 

### Testing Loop

In [12]:
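# Reuse the same root as in the training cells ('data').
# Note: Files is still ['train.csv'] here; set it to the held-out file (e.g. ['test.csv']) to evaluate on unseen data.
data_Hetero = NIDSDataset(root='data', label_dict=Dict_x, filename=Files, skip_processing=True, test=True, single_file=True)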

In [13]:
## For testing the model
test_loader = DataLoader(data_Hetero, batch_size=1, shuffle=False)

In [None]:
# For testing the model without edge attributes
acc, prediction, label = test_cm(test_loader,model)

# # For testing the model with edge attributes 
# acc, prediction, label = test_cm_with_edge_att(test_loader,model)

#### Classification Report

In [None]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
print(classification_report(label, prediction))
print('\n')
print(f"                    Accuracy %: {accuracy_score(label, prediction) * 100:.2f}")
print('\n')

#### Confusion Matrix

In [None]:
import seaborn as sns  # required for sns.heatmap below

cm = confusion_matrix(label, prediction, normalize='true')  # each row (true class) sums to 1
plt.figure(figsize=(10, 10))
ax = plt.axes()
sns.heatmap(cm, annot=True, cmap='Blues', fmt='.1%', ax=ax)  # use fmt='d' with unnormalized integer counts
ax.set_ylabel('True label')
ax.set_xlabel('Predicted label')
labels = sorted(Dict_x, key=Dict_x.get)  # class names ordered by label index, taken from Dict_x
ax.xaxis.set_ticklabels(labels); ax.yaxis.set_ticklabels(labels)
plt.show()
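
For readers unsure what `normalize='true'` does, the following plain-Python sketch reproduces the idea on made-up labels: count the (true, predicted) pairs, then divide each row by its total. The labels here are hypothetical and unrelated to the notebook's results.

```python
# Toy labels, hypothetical and unrelated to the notebook's results.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 0]
num_classes = 3

# counts[i][j] = number of samples with true class i predicted as class j
counts = [[0] * num_classes for _ in range(num_classes)]
for t, p in zip(y_true, y_pred):
    counts[t][p] += 1

# Divide each row by its total so every non-empty row sums to 1.
normalized = [
    [c / sum(row) if sum(row) else 0.0 for c in row]
    for row in counts
]
for row in normalized:
    print(row)
```

With this row-wise normalization, each diagonal entry is the recall of the corresponding class, which is why the heatmap is formatted as percentages.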


#### Saving/Loading Model

In [None]:
torch.save(model, '/scratch/user/yasir.ali/GNN_Project/Saved_Model/GNN4ID_8_Classes/model.pth')
# On PyTorch >= 2.6, loading a fully pickled model requires weights_only=False
# model = torch.load('/scratch/user/yasir.ali/GNN_Project/Saved_Model/GNN4ID_8_Classes/model.pth', weights_only=False)