# Deep Autoencoder with Soft Triple Loss and Grid Search

This notebook implements a Deep Autoencoder (DAE) combined with Soft Triple Loss to detect anomalies in network traffic data. The code runs a grid search to find the best-performing latent dimension (`latent_dims`) and number of centers per class (`K`).

## Main steps:
1. Preprocess the data from a CSV file.
2. Build and train the DAE with Soft Triple Loss.
3. Grid search over `latent_dims` ([8, 16, 32]) and `K` ([3, 5, 7, 11]).
4. Extract features and evaluate them with classification models.
5. Save the grid search results to markdown files and plot the loss curves.


In [1]:
# Import libraries
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix
from imblearn.under_sampling import RandomUnderSampler
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import time
import os
from itertools import product

## Data preprocessing

The `preprocess_data` function performs these steps:
- Load the CSV file and check that the `Label` column exists (it is stored as `' Label'`, with a leading space, in the CSV header).
- Map labels to binary values (0: BENIGN, 1: anything else).
- Remove infinite and NaN values.
- Drop non-numeric columns and low-variance features (variance of at most 1e-3 on the benign rows).
- Scale the data with MinMaxScaler, fitted on the benign rows.
- Balance the data with RandomUnderSampler.
- Split into training and test sets.
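
The cleaning and scaling steps above can be sketched on a toy frame. This is only an illustration, not the notebook's pipeline: the column names `flow_bytes`, `pkt_count` and `constant` are made up, and the min-max scaling is written out by hand (it mirrors what MinMaxScaler computes with its default `[0, 1]` range) so the sketch needs only pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the CSV; feature names are hypothetical.
toy = pd.DataFrame({
    "flow_bytes": [10.0, 20.0, np.inf, 30.0, 500.0],
    "pkt_count": [1.0, 2.0, 3.0, 4.0, 40.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
    " Label": ["BENIGN", "BENIGN", "BENIGN", "BENIGN", "DDoS"],
})

# Infinite values become NaN, then every row containing NaN is dropped.
toy = toy.replace([np.inf, -np.inf], np.nan).dropna()

# Binary labels: 0 for BENIGN, 1 for anything else.
y = (~toy[" Label"].str.contains("BENIGN")).astype(int)

features = toy.drop(columns=[" Label"])
benign = features[y == 0]

# Keep only features whose variance on the benign rows exceeds the threshold.
kept = benign.loc[:, benign.var() > 1e-3].columns
benign, features = benign[kept], features[kept]

# Min-max scaling with statistics taken from the benign rows only.
lo, hi = benign.min(), benign.max()
scaled = (features - lo) / (hi - lo)

print(list(kept))
print(scaled)
```

Because the minimum and maximum come from benign traffic only, benign rows land in `[0, 1]` while an attack row can fall well outside that range, as the last row of the toy frame does.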

In [2]:
def map_labels_to_numeric(labels):
    # Binary labels: 0 if the label contains 'BENIGN', 1 for everything else.
    return labels.apply(lambda x: 0 if 'BENIGN' in str(x) else 1)

def preprocess_data(data_path, features):
    data = pd.read_csv(data_path, low_memory=False)
    if ' Label' not in data.columns:
        raise ValueError("Column ' Label' does not exist in the data.")
    missing_features = [f for f in features if f not in data.columns]
    if missing_features:
        features = [f for f in features if f in data.columns]
    if len(features) == 0:
        raise ValueError("No valid features found in the data.")
    data = data.replace([np.inf, -np.inf], np.nan).dropna()
    numeric_labels = map_labels_to_numeric(data[' Label'])
    benign_data = data[data[' Label'].str.contains('BENIGN', case=False, na=False)][features]
    all_data = data[features]
    if benign_data.empty:
        raise ValueError("No valid rows with a 'BENIGN' label were found.")
    non_numeric_cols = benign_data.select_dtypes(exclude=['float64', 'int64']).columns
    if len(non_numeric_cols) > 0:
        benign_data = benign_data.drop(columns=non_numeric_cols)
        all_data = all_data.drop(columns=non_numeric_cols)
    benign_data = benign_data.loc[:, benign_data.var() > 1e-3]
    all_data = all_data[benign_data.columns].dropna()
    numeric_labels = numeric_labels[all_data.index]
    if benign_data.empty:
        raise ValueError("Benign DataFrame is empty after filtering.")
    selected_features = benign_data.columns.tolist()
    scaler = MinMaxScaler()
    benign_data_scaled = scaler.fit_transform(benign_data)
    all_data_scaled = scaler.transform(all_data)
    train_data, test_data = train_test_split(benign_data_scaled, test_size=0.2, random_state=42)
    rus = RandomUnderSampler(sampling_strategy=1.0, random_state=42)
    all_data_resampled, test_labels_resampled = rus.fit_resample(all_data_scaled, numeric_labels)
    return train_data, test_data, all_data_resampled, test_labels_resampled, scaler, selected_features

## Deep Autoencoder model

The `DeepAutoencoder` class builds a deep autoencoder with:
- **Encoder**: reduces the input through layers of width [512, 256, 128, latent_dim].
- **Decoder**: reconstructs the input from the latent space.
- Every block except the final output layer uses SELU, BatchNorm, and Dropout (0.2) to stabilize training.
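
To see which layer sizes the class actually produces, the two loops in `__init__` can be traced without PyTorch. The helper `layer_shapes` below is hypothetical (it is not part of the notebook) and `input_dim=70` is an assumed value; the loop logic is copied from the class as written.

```python
def layer_shapes(input_dim, latent_dim):
    """Return (encoder, decoder) lists of (in_features, out_features) pairs."""
    hidden_dims = [512, 256, 128, latent_dim]
    encoder, prev = [], input_dim
    for dim in hidden_dims:
        encoder.append((prev, dim))
        prev = dim
    decoder = []
    for dim in hidden_dims[::-1][:-1]:  # same slice the class uses
        decoder.append((prev, dim))
        prev = dim
    decoder.append((prev, input_dim))  # final reconstruction layer
    return encoder, decoder

encoder, decoder = layer_shapes(input_dim=70, latent_dim=8)
print(encoder)  # [(70, 512), (512, 256), (256, 128), (128, 8)]
print(decoder)  # [(8, 8), (8, 128), (128, 256), (256, 70)]
```

The trace shows that the decoder is not an exact mirror of the encoder: it starts with a `latent_dim` to `latent_dim` layer and never returns to width 512.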

In [3]:
class DeepAutoencoder(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super(DeepAutoencoder, self).__init__()
        hidden_dims = [512, 256, 128, latent_dim]
        encoder_layers = []
        prev_dim = input_dim
        for dim in hidden_dims:
            encoder_layers.extend([
                nn.Linear(prev_dim, dim),
                nn.SELU(),
                nn.BatchNorm1d(dim),
                nn.Dropout(0.2)
            ])
            prev_dim = dim
        self.encoder = nn.Sequential(*encoder_layers)
        decoder_layers = []
        hidden_dims = hidden_dims[::-1]
        for dim in hidden_dims[:-1]:
            decoder_layers.extend([
                nn.Linear(prev_dim, dim),
                nn.SELU(),
                nn.BatchNorm1d(dim),
                nn.Dropout(0.2)
            ])
            prev_dim = dim
        decoder_layers.append(nn.Linear(prev_dim, input_dim))
        self.decoder = nn.Sequential(*decoder_layers)
    
    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded, encoded

## Soft Triple Loss

The `soft_triple_loss` function shapes the latent representation by using several centers (`K`) for each class (benign and anomalous). Default parameters:
- `gamma=0.1`, `delta=0.01`, `lambda_param=2.0`.
- Uses L2 normalization and softmax to compute similarities.
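
As a rough illustration of the computation, here is a NumPy re-statement of the same steps on a two-sample toy batch. The function name `soft_triple_loss_np` and the toy centers and embeddings are assumptions made for this sketch; it mirrors the PyTorch function in the next cell but is not a replacement for it.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_triple_loss_np(encoded, labels_one_hot, centers,
                        gamma=0.1, delta=0.01, lambda_param=2.0):
    # L2-normalise embeddings (B, d) and centers (C, K, d).
    x = encoded / np.linalg.norm(encoded, axis=1, keepdims=True)
    w = centers / np.linalg.norm(centers, axis=2, keepdims=True)
    sims = np.einsum("bd,ckd->bck", x, w)      # cosine similarity to every center
    weights = softmax(sims / gamma, axis=2)    # soft assignment over the K centers
    class_sim = (weights * sims).sum(axis=2)   # (B, C) relaxed class similarity
    # The margin delta is subtracted from the true-class similarity only.
    logits = lambda_param * (class_sim - delta * labels_one_hot)
    probs = softmax(logits, axis=1)
    true_prob = (probs * labels_one_hot).sum(axis=1)
    return float(-np.log(true_prob + 1e-6).mean())

# Two classes, K=2 centers each, d=2 (toy values).
centers = np.array([[[1.0, 0.0], [0.9, 0.1]],
                    [[0.0, 1.0], [0.1, 0.9]]])
encoded = np.array([[2.0, 0.1], [0.1, 3.0]])
labels = np.eye(2)[[0, 1]]

loss_ok = soft_triple_loss_np(encoded, labels, centers)
loss_flipped = soft_triple_loss_np(encoded, labels[:, ::-1], centers)
print(loss_ok, loss_flipped)
```

With embeddings that point towards the centers of their own class, the loss is small; swapping the labels makes it noticeably larger, which is the behaviour the training loop relies on.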

In [4]:
def soft_triple_loss(encoded, labels, centers, gamma=0.1, delta=0.01, lambda_param=2.0):
    if len(encoded) == 0 or centers is None:
        return torch.tensor(0.0, device=encoded.device)
    
    K = centers.size(1)
    num_classes = 2
    d = encoded.shape[1]
    
    encoded_norm = F.normalize(encoded, p=2, dim=1)
    centers_norm = F.normalize(centers, p=2, dim=2)
    inner_logits = torch.einsum('bd,nkd->bnk', encoded_norm, centers_norm)
    inner_softmax = F.softmax(inner_logits / gamma, dim=2)
    S = lambda_param * (torch.sum(inner_softmax * inner_logits, dim=2) - delta * labels)
    outer_softmax = F.softmax(S, dim=1)
    loss = -torch.sum(torch.log(torch.sum(outer_softmax * labels, dim=1) + 1e-6))
    return loss / encoded.size(0)

## Helper functions

- `initialize_centers`: randomly initializes the centers for Soft Triple Loss.
- `train_DAE_soft_triplet`: trains the DAE with Soft Triple Loss added to the MSE reconstruction loss, using early stopping and a learning-rate scheduler.
- `extract_features`: extracts features from the latent space.
- `evaluate_model`: evaluates a classifier using Precision, Recall, F1-score, and the Confusion Matrix.
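
The early-stopping rule in `train_DAE_soft_triplet` can be replayed in isolation. The helper `replay_early_stopping` is hypothetical and leaves out the scheduler and checkpoint saving; the loss values are the first 20 epoch losses printed for `latent_dim=8, K=3` in the run output of this notebook.

```python
def replay_early_stopping(losses, patience=10):
    """Return (best_epoch, best_loss, stopped_at); epochs are 1-based."""
    best_loss, best_epoch, counter = float("inf"), None, 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss:
            # This is the point where the training loop saves a checkpoint.
            best_loss, best_epoch, counter = loss, epoch, 0
        else:
            counter += 1
            if counter >= patience:
                return best_epoch, best_loss, epoch
    return best_epoch, best_loss, None

logged = [0.2652, 0.1362, 0.1223, 0.1136, 0.1100, 0.1056, 0.1044, 0.1009,
          0.0998, 0.0961, 0.0943, 0.0956, 0.0923, 0.0936, 0.0910, 0.5876,
          0.1098, 0.1002, 0.0955, 0.0937]

print(replay_early_stopping(logged))              # (15, 0.091, None)
print(replay_early_stopping(logged, patience=3))  # (15, 0.091, 18)
```

With the default `patience=10`, the spike at epoch 16 does not stop training within these 20 epochs; the best checkpoint remains the one from epoch 15.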

In [5]:
def initialize_centers(d, num_classes=2, K=5):
    return torch.randn(num_classes, K, d, requires_grad=True)

def train_DAE_soft_triplet(model, train_loader, optimizer, scheduler, centers, epochs=30, device='cuda', patience=10, gamma=0.1, delta=0.01, lambda_param=2.0):
    model.train()
    losses = []
    best_loss = float('inf')
    patience_counter = 0
    for epoch in range(epochs):
        epoch_loss = 0
        batch_count = 0
        for data, labels in train_loader:
            data, labels = data.to(device), labels.to(device)
            labels_one_hot = F.one_hot(labels, num_classes=2).float()
            output, encoded = model(data)
            recon_loss = nn.MSELoss()(data, output)
            soft_triple = soft_triple_loss(encoded, labels_one_hot, centers, gamma, delta, lambda_param)
            total_loss = recon_loss + soft_triple
            if torch.isnan(total_loss):
                continue
            optimizer.zero_grad()
            total_loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            epoch_loss += total_loss.item()
            batch_count += 1
        if batch_count == 0:
            continue
        avg_loss = epoch_loss / batch_count
        losses.append(avg_loss)
        scheduler.step(avg_loss)
        print(f'Epoch [{epoch+1}/{epochs}], Total Loss: {avg_loss:.4f}, LR: {scheduler.get_last_lr()[0]:.6f}')
        if not torch.isnan(torch.tensor(avg_loss)) and avg_loss < best_loss:
            best_loss = avg_loss
            patience_counter = 0
            torch.save(model.state_dict(), f'best_DAE_soft_triplet_latent_{model.encoder[-4].out_features}_K_{centers.size(1)}.pth')
        else:
            patience_counter += 1
            if patience_counter >= patience:
                print(f"Early stopping at epoch {epoch+1}")
                break
    return losses

def extract_features(model, data, device='cuda'):
    model.eval()
    data_tensor = torch.tensor(data, dtype=torch.float32).to(device)
    with torch.no_grad():
        _, encoded = model(data_tensor)
    return encoded.cpu().numpy()

def evaluate_model(model, model_name, data, labels):
    start_time = time.time()
    if model_name == 'Linear Regression':
        predictions_proba = model.predict(data)
        predictions = (predictions_proba >= 0.5).astype(int)
    else:
        predictions = model.predict(data)
    # Note: this interval covers model.predict only, so it is prediction time despite the name.
    training_time = time.time() - start_time
    cm = confusion_matrix(labels, predictions)
    precision = precision_score(labels, predictions, zero_division=0)
    recall = recall_score(labels, predictions, zero_division=0)
    f1 = f1_score(labels, predictions, zero_division=0)
    print(f"{model_name} - Precision: {precision:.4f}")
    print(f"{model_name} - Recall: {recall:.4f}")
    print(f"{model_name} - F1-score: {f1:.4f}")
    print(f"{model_name} - Confusion Matrix:\n{cm}")
    print(f"{model_name} - Anomalies detected: {np.sum(predictions)}")
    print(f"{model_name} - Training time: {training_time:.2f} seconds")
    return precision, recall, f1, training_time, cm
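
The scores printed by `evaluate_model` can be cross-checked directly from a confusion matrix. The sketch below uses a hypothetical helper, `metrics_from_confusion`, and the Linear Regression matrix reported in the run output for `latent_dim=8, K=3`; it assumes the `[[TN, FP], [FN, TP]]` layout that `confusion_matrix` returns for labels 0 and 1.

```python
def metrics_from_confusion(cm):
    """cm is [[TN, FP], [FN, TP]]; returns precision, recall, F1, and flagged count."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1, tp + fp  # tp + fp = samples predicted as anomalous

cm = [[104417, 3693], [495, 107614]]
precision, recall, f1, flagged = metrics_from_confusion(cm)
print(f"{precision:.4f} {recall:.4f} {f1:.4f} {flagged}")  # 0.9668 0.9954 0.9809 111307
```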

## Main function with Grid Search

The `main` function:
- Preprocesses the data.
- Runs a grid search over `latent_dims=[8, 16, 32]` and `K=[3, 5, 7, 11]`.
- Trains the DAE with Soft Triple Loss for each combination.
- Extracts features and evaluates them with the classification models.
- Saves the results to markdown files and plots the loss curves.
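
For orientation, the grid itself is small. A minimal sketch of how `itertools.product` enumerates it, together with the checkpoint file name pattern the training function uses for each combination:

```python
from itertools import product

latent_dims = [8, 16, 32]
K_values = [3, 5, 7, 11]

# product iterates the last argument fastest: all K values for latent_dim=8 first.
grid = list(product(latent_dims, K_values))
checkpoints = [f"best_DAE_soft_triplet_latent_{d}_K_{k}.pth" for d, k in grid]

print(len(grid))          # 12
print(grid[0], grid[-1])  # (8, 3) (32, 11)
print(checkpoints[0])     # best_DAE_soft_triplet_latent_8_K_3.pth
```

Each of the 12 combinations trains one DAE and then fits the four classifiers, so the results table ends up with 48 rows.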

In [6]:
def main():
    data_path = r'C:\Users\belon\Downloads\combine.csv\combine.csv'
    df = pd.read_csv(data_path, nrows=0)  # header only; preprocess_data loads the full file
    features = df.columns.drop([' Label', ' Destination Port'])
    train_data, test_data, all_data_resampled, test_labels_resampled, scaler, selected_features = preprocess_data(data_path, features)
    if len(train_data) == 0 or len(test_data) == 0:
        raise ValueError("Training or test data is empty.")
    
    input_dim = len(selected_features)
    all_data_tensor = torch.tensor(all_data_resampled, dtype=torch.float32)
    labels_tensor = torch.tensor(test_labels_resampled.values, dtype=torch.long)
    dataset_all = torch.utils.data.TensorDataset(all_data_tensor, labels_tensor)
    train_loader_all = torch.utils.data.DataLoader(dataset_all, batch_size=2048, shuffle=True, num_workers=4)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    models = {
        'Linear Regression': LinearRegression(),
        'Gradient Boosting': GradientBoostingClassifier(random_state=42),
        'Logistic Regression': LogisticRegression(random_state=42, max_iter=1000),
        'Decision Tree': DecisionTreeClassifier(random_state=42)
    }
    
    # Grid Search parameters
    latent_dims = [8, 16, 32]
    K_values = [3, 5, 7, 11]
    results = []
    
    for latent_dim, K in product(latent_dims, K_values):
        print(f"\nTraining DAE with Soft Triple Loss (latent_dim={latent_dim}, K={K})...")
        DAE_soft_triplet = DeepAutoencoder(input_dim=input_dim, latent_dim=latent_dim).to(device)
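        # Re-create the centers as a leaf tensor on the target device; a tensor moved to
        # another device with .to() is a non-leaf tensor, which the optimizer rejects.
        centers = initialize_centers(d=latent_dim, num_classes=2, K=K).to(device).detach().requires_grad_(True)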
        optimizer = torch.optim.Adam(list(DAE_soft_triplet.parameters()) + [centers], lr=0.05)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=10)
        
        start_time = time.time()
        losses = train_DAE_soft_triplet(
            DAE_soft_triplet, train_loader_all, optimizer, scheduler, centers, 
            epochs=30, device=device, gamma=0.1, delta=0.01, lambda_param=2.0
        )
        training_time = time.time() - start_time
        
        plt.plot(losses)
        plt.xlabel('Epoch')
        plt.ylabel('Total Loss')
        plt.title(f'DAE with Soft Triple Loss (latent_dim={latent_dim}, K={K})')
        plt.savefig(f'loss_plot_latent_{latent_dim}_K_{K}.png')
        plt.close()
        
        model_path = f'best_DAE_soft_triplet_latent_{latent_dim}_K_{K}.pth'
        if os.path.exists(model_path):
            DAE_soft_triplet.load_state_dict(torch.load(model_path, map_location=device))
        
        soft_triplet_features = extract_features(DAE_soft_triplet, all_data_resampled, device)
        train_features, test_features, train_labels, test_labels = train_test_split(
            soft_triplet_features, test_labels_resampled, test_size=0.2, random_state=42, stratify=test_labels_resampled
        )
        
        for model_name, model in models.items():
            print(f"\nEvaluating {model_name} with latent_dim={latent_dim}, K={K}...")
            start_time = time.time()
            model.fit(train_features, train_labels)
            eval_time = time.time() - start_time
            precision, recall, f1, _, cm = evaluate_model(model, model_name, test_features, test_labels)
            results.append({
                'Latent Dim': latent_dim,
                'K': K,
                'Model': model_name,
                'Precision': precision,
                'Recall': recall,
                'F1-score': f1,
                'Training Time (DAE)': training_time,
                'Training Time (Model)': eval_time
            })
    
    # Save the grid search results
    results_df = pd.DataFrame(results)
    for model_name in models.keys():
        model_results = results_df[results_df['Model'] == model_name]
        markdown_table = f"# Grid Search Results for {model_name}\n\n"
        markdown_table += "| Latent Dim | K | Precision | Recall | F1-score | DAE training time (s) | Model training time (s) |\n"
        markdown_table += "|------------|---|-----------|--------|----------|-----------------------|-------------------------|\n"
        for _, row in model_results.iterrows():
            markdown_table += f"| {row['Latent Dim']} | {row['K']} | {row['Precision']:.4f} | {row['Recall']:.4f} | {row['F1-score']:.4f} | {row['Training Time (DAE)']:.2f} | {row['Training Time (Model)']:.2f} |\n"
        print(markdown_table)
        with open(f'grid_search_results_{model_name.lower().replace(" ", "_")}.md', 'w', encoding='utf-8') as f:
            f.write(markdown_table)
    
    return results_df

## Run the program

Run the `main` function to execute the whole pipeline. Make sure the file `combine.csv` is available.

In [7]:
results_df = main()


Training DAE with Soft Triple Loss (latent_dim=8, K=3)...
Epoch [1/30], Total Loss: 0.2652, LR: 0.050000
Epoch [2/30], Total Loss: 0.1362, LR: 0.050000
Epoch [3/30], Total Loss: 0.1223, LR: 0.050000
Epoch [4/30], Total Loss: 0.1136, LR: 0.050000
Epoch [5/30], Total Loss: 0.1100, LR: 0.050000
Epoch [6/30], Total Loss: 0.1056, LR: 0.050000
Epoch [7/30], Total Loss: 0.1044, LR: 0.050000
Epoch [8/30], Total Loss: 0.1009, LR: 0.050000
Epoch [9/30], Total Loss: 0.0998, LR: 0.050000
Epoch [10/30], Total Loss: 0.0961, LR: 0.050000
Epoch [11/30], Total Loss: 0.0943, LR: 0.050000
Epoch [12/30], Total Loss: 0.0956, LR: 0.050000
Epoch [13/30], Total Loss: 0.0923, LR: 0.050000
Epoch [14/30], Total Loss: 0.0936, LR: 0.050000
Epoch [15/30], Total Loss: 0.0910, LR: 0.050000
Epoch [16/30], Total Loss: 0.5876, LR: 0.050000
Epoch [17/30], Total Loss: 0.1098, LR: 0.050000
Epoch [18/30], Total Loss: 0.1002, LR: 0.050000
Epoch [19/30], Total Loss: 0.0955, LR: 0.050000
Epoch [20/30], Total Loss: 0.0937, LR:


Evaluating Linear Regression with latent_dim=8, K=3...
Linear Regression - Precision: 0.9668
Linear Regression - Recall: 0.9954
Linear Regression - F1-score: 0.9809
Linear Regression - Confusion Matrix:
[[104417   3693]
 [   495 107614]]
Linear Regression - Anomalies detected: 111307
Linear Regression - Training time: 0.00 seconds

Evaluating Gradient Boosting with latent_dim=8, K=3...
Gradient Boosting - Precision: 0.9831
Gradient Boosting - Recall: 0.9952
Gradient Boosting - F1-score: 0.9891
Gradient Boosting - Confusion Matrix:
[[106258   1852]
 [   518 107591]]
Gradient Boosting - Anomalies detected: 109443
Gradient Boosting - Training time: 0.26 seconds

Evaluating Logistic Regression with latent_dim=8, K=3...
Logistic Regression - Precision: 0.9707
Logistic Regression - Recall: 0.9947
Logistic Regression - F1-score: 0.9825
Logistic Regression - Confusion Matrix:
[[104863   3247]
 [   577 107532]]
Logistic Regression - Anomalies detected: 110779
L


Evaluating Linear Regression with latent_dim=8, K=5...
Linear Regression - Precision: 0.9417
Linear Regression - Recall: 0.9953
Linear Regression - F1-score: 0.9677
Linear Regression - Confusion Matrix:
[[101443   6667]
 [   513 107596]]
Linear Regression - Anomalies detected: 114263
Linear Regression - Training time: 0.01 seconds

Evaluating Gradient Boosting with latent_dim=8, K=5...
Gradient Boosting - Precision: 0.9707
Gradient Boosting - Recall: 0.9958
Gradient Boosting - F1-score: 0.9831
Gradient Boosting - Confusion Matrix:
[[104863   3247]
 [   450 107659]]
Gradient Boosting - Anomalies detected: 110906
Gradient Boosting - Training time: 0.25 seconds

Evaluating Logistic Regression with latent_dim=8, K=5...
Logistic Regression - Precision: 0.9456
Logistic Regression - Recall: 0.9922
Logistic Regression - F1-score: 0.9683
Logistic Regression - Confusion Matrix:
[[101933   6177]
 [   838 107271]]
Logistic Regression - Anomalies detected: 113448
L


Evaluating Linear Regression with latent_dim=8, K=7...
Linear Regression - Precision: 0.9752
Linear Regression - Recall: 0.9962
Linear Regression - F1-score: 0.9856
Linear Regression - Confusion Matrix:
[[105368   2742]
 [   415 107694]]
Linear Regression - Anomalies detected: 110436
Linear Regression - Training time: 0.00 seconds

Evaluating Gradient Boosting with latent_dim=8, K=7...
Gradient Boosting - Precision: 0.9896
Gradient Boosting - Recall: 0.9970
Gradient Boosting - F1-score: 0.9932
Gradient Boosting - Confusion Matrix:
[[106972   1138]
 [   328 107781]]
Gradient Boosting - Anomalies detected: 108919
Gradient Boosting - Training time: 0.31 seconds

Evaluating Logistic Regression with latent_dim=8, K=7...
Logistic Regression - Precision: 0.9770
Logistic Regression - Recall: 0.9941
Logistic Regression - F1-score: 0.9855
Logistic Regression - Confusion Matrix:
[[105578   2532]
 [   639 107470]]
Logistic Regression - Anomalies detected: 110002
L


Evaluating Linear Regression with latent_dim=8, K=11...
Linear Regression - Precision: 0.9729
Linear Regression - Recall: 0.9665
Linear Regression - F1-score: 0.9697
Linear Regression - Confusion Matrix:
[[105201   2909]
 [  3625 104484]]
Linear Regression - Anomalies detected: 107393
Linear Regression - Training time: 0.00 seconds

Evaluating Gradient Boosting with latent_dim=8, K=11...
Gradient Boosting - Precision: 0.9863
Gradient Boosting - Recall: 0.9964
Gradient Boosting - F1-score: 0.9913
Gradient Boosting - Confusion Matrix:
[[106612   1498]
 [   390 107719]]
Gradient Boosting - Anomalies detected: 109217
Gradient Boosting - Training time: 0.27 seconds

Evaluating Logistic Regression with latent_dim=8, K=11...
Logistic Regression - Precision: 0.9783
Logistic Regression - Recall: 0.9954
Logistic Regression - F1-score: 0.9868
Logistic Regression - Confusion Matrix:
[[105719   2391]
 [   497 107612]]
Logistic Regression - Anomalies detected: 11000


Evaluating Linear Regression with latent_dim=16, K=3...
Linear Regression - Precision: 0.9741
Linear Regression - Recall: 0.9944
Linear Regression - F1-score: 0.9842
Linear Regression - Confusion Matrix:
[[105254   2856]
 [   605 107504]]
Linear Regression - Anomalies detected: 110360
Linear Regression - Training time: 0.00 seconds

Evaluating Gradient Boosting with latent_dim=16, K=3...
Gradient Boosting - Precision: 0.9866
Gradient Boosting - Recall: 0.9968
Gradient Boosting - F1-score: 0.9917
Gradient Boosting - Confusion Matrix:
[[106649   1461]
 [   346 107763]]
Gradient Boosting - Anomalies detected: 109224
Gradient Boosting - Training time: 0.30 seconds

Evaluating Logistic Regression with latent_dim=16, K=3...
Logistic Regression - Precision: 0.9739
Logistic Regression - Recall: 0.9958
Logistic Regression - F1-score: 0.9847
Logistic Regression - Confusion Matrix:
[[105223   2887]
 [   450 107659]]
Logistic Regression - Anomalies detected: 11054


Evaluating Linear Regression with latent_dim=16, K=5...
Linear Regression - Precision: 0.9779
Linear Regression - Recall: 0.9726
Linear Regression - F1-score: 0.9753
Linear Regression - Confusion Matrix:
[[105738   2372]
 [  2961 105148]]
Linear Regression - Anomalies detected: 107520
Linear Regression - Training time: 0.02 seconds

Evaluating Gradient Boosting with latent_dim=16, K=5...
Gradient Boosting - Precision: 0.9898
Gradient Boosting - Recall: 0.9967
Gradient Boosting - F1-score: 0.9932
Gradient Boosting - Confusion Matrix:
[[107005   1105]
 [   362 107747]]
Gradient Boosting - Anomalies detected: 108852
Gradient Boosting - Training time: 0.30 seconds

Evaluating Logistic Regression with latent_dim=16, K=5...
Logistic Regression - Precision: 0.9773
Logistic Regression - Recall: 0.9942
Logistic Regression - F1-score: 0.9857
Logistic Regression - Confusion Matrix:
[[105615   2495]
 [   630 107479]]
Logistic Regression - Anomalies detected: 10997


Evaluating Linear Regression with latent_dim=16, K=7...
Linear Regression - Precision: 0.9635
Linear Regression - Recall: 0.9062
Linear Regression - F1-score: 0.9340
Linear Regression - Confusion Matrix:
[[104396   3714]
 [ 10138  97971]]
Linear Regression - Anomalies detected: 101685
Linear Regression - Training time: 0.01 seconds

Evaluating Gradient Boosting with latent_dim=16, K=7...
Gradient Boosting - Precision: 0.9771
Gradient Boosting - Recall: 0.9960
Gradient Boosting - F1-score: 0.9864
Gradient Boosting - Confusion Matrix:
[[105584   2526]
 [   436 107673]]
Gradient Boosting - Anomalies detected: 110199
Gradient Boosting - Training time: 0.30 seconds

Evaluating Logistic Regression with latent_dim=16, K=7...


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Logistic Regression - Precision: 0.9653
Logistic Regression - Recall: 0.9709
Logistic Regression - F1-score: 0.9681
Logistic Regression - Confusion Matrix:
[[104336   3774]
 [  3150 104959]]
Logistic Regression - Anomalies detected: 108733
Logistic Regression - Training time: 0.00 seconds

Evaluating Decision Tree with latent_dim=16, K=7...
Decision Tree - Precision: 0.9975
Decision Tree - Recall: 0.9982
Decision Tree - F1-score: 0.9978
Decision Tree - Confusion Matrix:
[[107836    274]
 [   193 107916]]
Decision Tree - Anomalies detected: 108190
Decision Tree - Training time: 0.02 seconds

Training DAE with Soft Triple Loss (latent_dim=16, K=11)...
Epoch [1/30], Total Loss: 0.2533, LR: 0.050000
Epoch [2/30], Total Loss: 0.1202, LR: 0.050000
Epoch [3/30], Total Loss: 0.1124, LR: 0.050000
Epoch [4/30], Total Loss: 0.1072, LR: 0.050000
Epoch [5/30], Total Loss: 0.1015, LR: 0.050000
Epoch [6/30], Total Loss: 0.0988, LR: 0.050000
Epoch [7/30], Total Loss: 0.0936, LR: 


Evaluating Linear Regression with latent_dim=16, K=11...
Linear Regression - Precision: 0.9734
Linear Regression - Recall: 0.9957
Linear Regression - F1-score: 0.9844
Linear Regression - Confusion Matrix:
[[105174   2936]
 [   468 107641]]
Linear Regression - Anomalies detected: 110577
Linear Regression - Training time: 0.00 seconds

Evaluating Gradient Boosting with latent_dim=16, K=11...
Gradient Boosting - Precision: 0.9868
Gradient Boosting - Recall: 0.9965
Gradient Boosting - F1-score: 0.9916
Gradient Boosting - Confusion Matrix:
[[106667   1443]
 [   376 107733]]
Gradient Boosting - Anomalies detected: 109176
Gradient Boosting - Training time: 0.25 seconds

Evaluating Logistic Regression with latent_dim=16, K=11...
Logistic Regression - Precision: 0.9749
Logistic Regression - Recall: 0.9952
Logistic Regression - F1-score: 0.9849
Logistic Regression - Confusion Matrix:
[[105336   2774]
 [   522 107587]]
Logistic Regression - Anomalies detected: 11


Evaluating Linear Regression with latent_dim=32, K=3...
Linear Regression - Precision: 0.9750
Linear Regression - Recall: 0.9954
Linear Regression - F1-score: 0.9851
Linear Regression - Confusion Matrix:
[[105352   2758]
 [   492 107617]]
Linear Regression - Anomalies detected: 110375
Linear Regression - Training time: 0.00 seconds

Evaluating Gradient Boosting with latent_dim=32, K=3...
Gradient Boosting - Precision: 0.9874
Gradient Boosting - Recall: 0.9956
Gradient Boosting - F1-score: 0.9915
Gradient Boosting - Confusion Matrix:
[[106733   1377]
 [   478 107631]]
Gradient Boosting - Anomalies detected: 109008
Gradient Boosting - Training time: 0.30 seconds

Evaluating Logistic Regression with latent_dim=32, K=3...
Logistic Regression - Precision: 0.9758
Logistic Regression - Recall: 0.9957
Logistic Regression - F1-score: 0.9856
Logistic Regression - Confusion Matrix:
[[105438   2672]
 [   467 107642]]
Logistic Regression - Anomalies detected: 11031


Evaluating Linear Regression with latent_dim=32, K=5...
Linear Regression - Precision: 0.9723
Linear Regression - Recall: 0.9931
Linear Regression - F1-score: 0.9826
Linear Regression - Confusion Matrix:
[[105051   3059]
 [   746 107363]]
Linear Regression - Anomalies detected: 110422
Linear Regression - Training time: 0.02 seconds

Evaluating Gradient Boosting with latent_dim=32, K=5...
Gradient Boosting - Precision: 0.9888
Gradient Boosting - Recall: 0.9968
Gradient Boosting - F1-score: 0.9928
Gradient Boosting - Confusion Matrix:
[[106890   1220]
 [   342 107767]]
Gradient Boosting - Anomalies detected: 108987
Gradient Boosting - Training time: 0.32 seconds

Evaluating Logistic Regression with latent_dim=32, K=5...
Logistic Regression - Precision: 0.9741
Logistic Regression - Recall: 0.9969
Logistic Regression - F1-score: 0.9854
Logistic Regression - Confusion Matrix:
[[105246   2864]
 [   334 107775]]
Logistic Regression - Anomalies detected: 11063


Evaluating Linear Regression with latent_dim=32, K=7...
Linear Regression - Precision: 0.9734
Linear Regression - Recall: 0.9915
Linear Regression - F1-score: 0.9823
Linear Regression - Confusion Matrix:
[[105178   2932]
 [   923 107186]]
Linear Regression - Anomalies detected: 110118
Linear Regression - Training time: 0.01 seconds

Evaluating Gradient Boosting with latent_dim=32, K=7...
Gradient Boosting - Precision: 0.9880
Gradient Boosting - Recall: 0.9960
Gradient Boosting - F1-score: 0.9919
Gradient Boosting - Confusion Matrix:
[[106799   1311]
 [   437 107672]]
Gradient Boosting - Anomalies detected: 108983
Gradient Boosting - Training time: 0.33 seconds

Evaluating Logistic Regression with latent_dim=32, K=7...
Logistic Regression - Precision: 0.9754
Logistic Regression - Recall: 0.9918
Logistic Regression - F1-score: 0.9835
Logistic Regression - Confusion Matrix:
[[105403   2707]
 [   891 107218]]
Logistic Regression - Anomalies detected: 10992


Evaluating Linear Regression with latent_dim=32, K=11...
Linear Regression - Precision: 0.9747
Linear Regression - Recall: 0.9966
Linear Regression - F1-score: 0.9855
Linear Regression - Confusion Matrix:
[[105308   2802]
 [   364 107745]]
Linear Regression - Anomalies detected: 110547
Linear Regression - Training time: 0.00 seconds

Evaluating Gradient Boosting with latent_dim=32, K=11...
Gradient Boosting - Precision: 0.9882
Gradient Boosting - Recall: 0.9970
Gradient Boosting - F1-score: 0.9926
Gradient Boosting - Confusion Matrix:
[[106820   1290]
 [   326 107783]]
Gradient Boosting - Anomalies detected: 109073
Gradient Boosting - Training time: 0.30 seconds

Evaluating Logistic Regression with latent_dim=32, K=11...


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Logistic Regression - Precision: 0.9753
Logistic Regression - Recall: 0.9963
Logistic Regression - F1-score: 0.9857
Logistic Regression - Confusion Matrix:
[[105386   2724]
 [   402 107707]]
Logistic Regression - Anomalies detected: 110431
Logistic Regression - Training time: 0.03 seconds

Evaluating Decision Tree with latent_dim=32, K=11...
Decision Tree - Precision: 0.9977
Decision Tree - Recall: 0.9986
Decision Tree - F1-score: 0.9982
Decision Tree - Confusion Matrix:
[[107865    245]
 [   153 107956]]
Decision Tree - Anomalies detected: 108201
Decision Tree - Training time: 0.02 seconds
# Grid Search Results for Linear Regression

| Latent Dim | K | Precision | Recall | F1-score | DAE training time (s) | Model training time (s) |
|------------|---|-----------|--------|----------|-----------------------|-------------------------|
| 8 | 3 | 0.9668 | 0.9954 | 0.9809 | 1604.09 | 0.12 |
| 8 | 5 | 0.9417 | 0.9953 | 0.9677 | 1676.76 | 0.09 |