muntakim1/trafficmaml-framework

TrafficMAML — A Self-Resilient Meta-Learning Framework for Few-Shot Network-Traffic Classification

Public architecture overview. This repository documents the research-level architecture of TrafficMAML and showcases the system. The source code is withheld pending patent and copyright clearance — see Availability. The directory tree below mirrors the private implementation so the organization is transparent even though the code is not published here.

[Figure: TrafficMAML pipeline animation]

The animation walks one flow through the pipeline under each operating scenario — in-domain inference, cross-domain routing, zero-day detection, and drift detect-and-recover.


Project context

This work is conducted under the thesis "Deep Packet Inspection for Application Classification Enforcement and Advanced Threat Detection," a research project funded by Telekom Malaysia (TM). TrafficMAML is the framework we propose in fulfillment of that thesis — a meta-learned, self-resilient approach to few-shot traffic classification, novel-class (zero-day) threat detection, and drift resilience for deep-packet-inspection deployments.


1. Problem & thesis

Task. Few-shot network-traffic classification that stays usable as conditions change: new protocols, new domains, novel (zero-day) classes, and distribution drift over time.

Framing (the novelty). The contribution is a unified self-resilient framework, not a single model. One meta-learned backbone supports a four-mechanism lifecycle:

| Mechanism | What it does | Component |
|---|---|---|
| Adapt | few-shot transfer to a new protocol/domain | inner-loop adaptation + adapters + sparse episodic transformer |
| Detect (drift) | flag covariate distribution shift | MMD drift detector |
| Detect (zero-day) | flag novel classes | dual-readout Stage-1 kNN + prototype registry |
| Retain | retrain without forgetting | GPM gradient projection (+ replay buffer) |

A central empirical finding ties it together: the sparse cross-attention that powers adaptation actively suppresses novelty detectability. A dual readout resolves this tension: classification uses the post-cross-attention embedding, while detection is performed on the pre-cross-attention (Stage-1) representation.


2. Datasets

All datasets are loaded with leakage-controlled 70/30 stratified train/test splits (manifests in splits/; the scaler is fit on training rows only). Zero-day classes are held out entirely from training in each dataset.
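The split protocol above can be sketched in a few lines of NumPy. This is a minimal illustration, not the withheld split_generator.py: the function names, the toy data, the choice of class 4 as the zero-day class, and the decision to route every zero-day row into the evaluation set are all assumptions made for the example.

```python
import numpy as np

def stratified_split(labels, test_frac=0.3, seed=0):
    """Return (train_idx, test_idx) with every class split roughly 70/30."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.sort(np.array(train_idx)), np.sort(np.array(test_idx))

def fit_scaler(x_train):
    """Standardization statistics computed from training rows only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return mean, std

# Toy data: 5 classes x 20 rows; class 4 plays the (hypothetical) zero-day class.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.repeat(np.arange(5), 20)
zero_day = [4]

# Split only the known classes, then map back to row indices of the full table.
known = np.flatnonzero(~np.isin(y, zero_day))
train_rel, test_rel = stratified_split(y[known])
train_idx, test_idx = known[train_rel], known[test_rel]

# Assumption: zero-day rows are seen at evaluation time only.
test_idx = np.concatenate([test_idx, np.flatnonzero(np.isin(y, zero_day))])

# The scaler never sees test or zero-day rows.
mean, std = fit_scaler(X[train_idx])
X_train = (X[train_idx] - mean) / std
X_test = (X[test_idx] - mean) / std
```

Persisting `train_idx` and `test_idx` (for example as JSON) is what would make such a split reproducible across runs, which is the role the manifests in splits/ appear to play.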

| Dataset | Rows | Classes | Role |
|---|---|---|---|
| CESNET-TLS22 | 12,504 | 48 | cross-protocol source/target |
| CESNET-QUIC22 | 14,705 | 48 (same as TLS) | cross-protocol source/target |
| CICIoT2023 | 2,400 | 6 | main IoT training set (small) |
| RT-IoT2022 | 123,117 → ≥200/class → 10 | 10 | cross-domain eval target |

Evaluation scenarios: cross-protocol (TLS↔QUIC), cross-domain (CICIoT→RT-IoT), zero-day detection (held-out classes), and covariate/segment drift (chronological slices of the TLS test set).


3. Model architecture

Meta-learned (MAML bi-level) few-shot classifier. Episodic: every forward pass processes a support set (labeled few-shot examples) and a query set.

flow feature vector (per-domain dim)
      │
      ▼
[ input projection ]    domain-specific: Linear → LayerNorm, one per dataset
      │                 (TLS, QUIC, CICIoT, RT-IoT each have their own)
      ▼
[ shared backbone ]     SHARED across domains — the transferable core
      │                 2× (Linear(64→64) + GELU)
      ▼
[ adapter ]             domain-specific low-rank bottleneck (64→16→64, residual)
      │
      ▼  reshape 64-d → 4 tokens × 16-d
[ Sparse Episodic Transformer ]
      │   Stage 1: intra-flow self-attention (tokens within a flow)
      │   Stage 2: inter-flow sparse cross-attention (query → top-k support)
      ▼
   embedding → nearest-prototype classification

Design principle: domain-specific input/adapter shells, a shared backbone in the middle. Transfer happens because the heavy representation is shared; each domain only gets a thin projection plus a low-rank adapter.
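As a rough illustration of the shell/backbone layout in the diagram, the forward pass up to the token reshape might look like the NumPy sketch below. It is not the withheld models.py. The per-domain input widths (30 and 80), the GELU inside the adapter bottleneck, the random initialization, and the omission of LayerNorm's learnable scale and shift are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(d_in, d_out):
    """Randomly initialized (weight, bias) pair for a dense layer."""
    return rng.normal(scale=d_in ** -0.5, size=(d_in, d_out)), np.zeros(d_out)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

class Encoder:
    def __init__(self, domain_dims, hidden=64, rank=16):
        # Domain-specific shells: one input projection and one adapter per dataset.
        self.proj = {d: linear(n, hidden) for d, n in domain_dims.items()}
        self.adapter = {d: (linear(hidden, rank), linear(rank, hidden))
                        for d in domain_dims}
        # Shared core: the same two layers serve every domain.
        self.backbone = [linear(hidden, hidden) for _ in range(2)]

    def __call__(self, x, domain):
        w, b = self.proj[domain]
        h = layer_norm(x @ w + b)                    # Linear -> LayerNorm
        for w, b in self.backbone:
            h = gelu(h @ w + b)                      # 2x (Linear(64->64) + GELU)
        (wd, bd), (wu, bu) = self.adapter[domain]
        h = h + (gelu(h @ wd + bd) @ wu + bu)        # 64 -> 16 -> 64, residual
        return h.reshape(len(x), 4, -1)              # 64-d -> 4 tokens x 16-d

# Hypothetical feature widths; the real per-domain dimensions are not stated.
enc = Encoder({"tls": 30, "rtiot": 80})
tokens = enc(rng.normal(size=(5, 30)), "tls")        # shape (5, 4, 16)
```

Only `proj` and `adapter` are keyed by domain, so adding a new dataset in this sketch means adding two small entries while the backbone weights stay untouched.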

3.1 Sparse Episodic Transformer (the distinctive module)

  • Stage 1 — intra-flow self-attention. The tokens of a single flow attend to each other (residual + LayerNorm), shared between support and query. This is the OOD-detection readout (see §5.2).
  • Stage 2 — inter-flow sparse cross-attention. Each query flow attends only to its top-k nearest support flows (hard top-k mask before softmax). This is the few-shot matching metric that drives cross-protocol transfer.
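The hard top-k mask in Stage 2 can be illustrated with a small NumPy sketch. It works on one vector per flow rather than on 4×16 token grids, uses scaled dot-product similarity as the notion of "nearest", and has no learned query/key/value projections; all three simplifications are assumptions, and the function name is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_cross_attention(query, support, k=3):
    """query: (Q, d), support: (S, d). Returns attended (Q, d) and weights (Q, S)."""
    d = query.shape[-1]
    scores = query @ support.T / np.sqrt(d)          # similarity of each query to each support flow
    k = min(k, support.shape[0])
    # Column indices of the k largest scores in every row.
    topk = np.argpartition(scores, -k, axis=1)[:, -k:]
    # Hard mask applied before the softmax: everything outside the top-k gets -inf.
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, topk, 0.0, axis=1)
    weights = softmax(scores + mask, axis=1)         # masked entries become exactly 0
    return query + weights @ support, weights        # residual connection
```

Because the mask is applied before the softmax, each query's attention mass is redistributed over exactly k support flows instead of being spread thinly over the whole support set.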

3.2 Prototype classification & registry

Class prototypes are the mean of support embeddings per class; queries are classified by nearest prototype. An adaptive per-class OOD-radius registry supports novelty scoring.
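A minimal sketch of prototype classification with a per-class radius follows. The README does not say how the adaptive radius is computed, so the example assumes Euclidean distance and takes the radius to be a quantile of support-to-prototype distances; the function names are hypothetical.

```python
import numpy as np

def build_prototypes(emb, labels):
    """Class prototype = mean of that class's support embeddings."""
    classes = np.unique(labels)
    protos = np.stack([emb[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def build_radii(emb, labels, classes, protos, quantile=0.95):
    """Assumed registry rule: radius = quantile of support distances to the prototype."""
    radii = []
    for c, p in zip(classes, protos):
        dist = np.linalg.norm(emb[labels == c] - p, axis=1)
        radii.append(np.quantile(dist, quantile))
    return np.array(radii)

def classify(query, classes, protos, radii):
    """Nearest-prototype label plus a novelty score (distance / class radius)."""
    dist = np.linalg.norm(query[:, None, :] - protos[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    nearest_dist = dist[np.arange(len(query)), nearest]
    novelty = nearest_dist / np.maximum(radii[nearest], 1e-12)
    return classes[nearest], novelty

support = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
support_labels = np.array([0, 0, 1, 1])
classes, protos = build_prototypes(support, support_labels)
radii = build_radii(support, support_labels, classes, protos)

queries = np.array([[0.0, 1.5], [9.0, 11.0], [50.0, 50.0]])
pred, novelty = classify(queries, classes, protos, radii)
```

In this toy episode the third query is still assigned to its nearest class, but its novelty score is far above 1, which is the kind of signal a radius registry can expose to a zero-day detector.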

3.3 Shift-gated routing (resilience path)

A lightweight shift gate measures distance to the training-domain manifold and routes accordingly: in-domain flows traverse the full Stage-1 → Stage-2 → prototype path, while out-of-domain flows skip Stage-2 and read out from Stage-1, recovering cross-domain accuracy that Stage-2 otherwise erodes.
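The routing decision can be sketched as follows. How the gate actually measures distance to the training-domain manifold is not described, so this example assumes a mean k-nearest-neighbour distance to stored training embeddings with a threshold set from a leave-one-out quantile on the training set. `ShiftGate`, `route`, and `stage2_fn` are hypothetical names.

```python
import numpy as np

def knn_distance(x, reference, k=5, exclude_self=False):
    """Mean distance from each row of x to its k nearest rows in reference."""
    d = np.linalg.norm(x[:, None, :] - reference[None, :, :], axis=2)
    d.sort(axis=1)
    start = 1 if exclude_self else 0   # skip the zero self-distance when x is reference
    return d[:, start:start + k].mean(axis=1)

class ShiftGate:
    def __init__(self, train_emb, k=5, quantile=0.99):
        self.train_emb, self.k = train_emb, k
        train_scores = knn_distance(train_emb, train_emb, k, exclude_self=True)
        self.threshold = np.quantile(train_scores, quantile)

    def in_domain(self, x):
        return knn_distance(x, self.train_emb, self.k) <= self.threshold

def route(stage1_emb, gate, stage2_fn):
    """Apply Stage 2 only to flows the gate considers in-domain.

    Out-of-domain flows keep their Stage-1 embedding as the readout.
    """
    out = stage1_emb.copy()
    mask = gate.in_domain(stage1_emb)
    if mask.any():
        out[mask] = stage2_fn(stage1_emb[mask])
    return out, mask
```

A usage pattern would be `out, mask = route(stage1_emb, gate, stage2_fn)`, where `stage2_fn` stands in for the sparse cross-attention stage and `mask` records which flows took the full path.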


4. Training

  • Inner loop: adapt the projection + adapter "fast weights" to an episode's support set. The shared backbone is not updated in the inner loop.
  • Outer loop: Adam over the encoder + transformer, learning an initialization that adapts well across episodes.
  • Episode mix: alternation between cross-protocol episodes (TLS/QUIC) and in-domain CICIoT episodes.
  • Loss: query cross-entropy + supervised-contrastive structuring + cross-protocol prototype alignment (+ rehearsal penalty in the resilience variant).
  • Toggleable mechanisms (ablation knobs): adapters, GPM gradient projection (anti-forgetting), supervised contrastive, and prototype alignment.
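The split between fast and slow weights in the two loops above can be shown on a toy problem. The sketch below is a deliberate simplification, not the withheld maml.py: it uses a two-factor linear regression model, a first-order approximation of the meta-gradient, and plain SGD in the outer loop instead of Adam. `P` stands in for the projection/adapter fast weights and `b` for the shared backbone.

```python
import numpy as np

def loss_and_grads(P, b, x, y):
    """MSE of pred = (x @ P) @ b, with gradients for P (fast) and b (slow)."""
    h = x @ P
    err = h @ b - y
    g = 2.0 * err / len(y)                       # dLoss/dpred
    return float(np.mean(err ** 2)), x.T @ np.outer(g, b), h.T @ g

def inner_adapt(P, b, x_s, y_s, lr=0.01, steps=3):
    """Inner loop: only the fast weights move; b is read but never updated."""
    fast = P.copy()
    for _ in range(steps):
        _, grad_P, _ = loss_and_grads(fast, b, x_s, y_s)
        fast = fast - lr * grad_P
    return fast

def outer_step(P, b, episodes, meta_lr=0.01):
    """Outer loop (first-order): query gradients at the adapted weights update the init."""
    gP, gb = np.zeros_like(P), np.zeros_like(b)
    for x_s, y_s, x_q, y_q in episodes:
        fast = inner_adapt(P, b, x_s, y_s)
        _, grad_P, grad_b = loss_and_grads(fast, b, x_q, y_q)
        gP += grad_P
        gb += grad_b
    n = len(episodes)
    return P - meta_lr * gP / n, b - meta_lr * gb / n

rng = np.random.default_rng(0)

def make_episode():
    """Toy task: a random linear target, split into support and query halves."""
    w = rng.normal(size=4)
    x = rng.normal(size=(20, 4))
    y = x @ w
    return x[:10], y[:10], x[10:], y[10:]

P = rng.normal(scale=0.1, size=(4, 3))
b = np.ones(3)
episodes = [make_episode() for _ in range(8)]
P_new, b_new = outer_step(P, b, episodes)
```

The point of the design is visible in `inner_adapt`: adaptation is cheap because it touches only the thin domain-specific parameters, while the shared weights change only through the outer update.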

5. Evaluation

5.1 Few-shot accuracy

Adapt on support (inner loop), classify the query by nearest prototype. Reported as N-way K-shot. All evaluation uses test partitions only.
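For readers unfamiliar with the N-way K-shot protocol, an episode sampler can be as small as the sketch below. It is an assumed illustration rather than the sampler in dataset.py; the function name and the default of 10 query flows per class are hypothetical.

```python
import numpy as np

def sample_episode(labels, n_way=5, k_shot=5, n_query=10, rng=None):
    """Draw one N-way K-shot episode from a label array.

    Returns (support_idx, query_idx, chosen_classes); support and query
    indices are disjoint and cover the same n_way classes.
    """
    if rng is None:
        rng = np.random.default_rng()
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    # Only classes with enough rows for both support and query are eligible.
    eligible = classes[counts >= k_shot + n_query]
    chosen = rng.choice(eligible, size=n_way, replace=False)
    support, query = [], []
    for c in chosen:
        idx = rng.permutation(np.flatnonzero(labels == c))
        support.extend(idx[:k_shot])
        query.extend(idx[k_shot:k_shot + n_query])
    return np.array(support), np.array(query), chosen
```

Passing only the labels of the test partition to such a sampler is one way to guarantee that evaluation episodes never draw support or query flows from training rows.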

5.2 Zero-day OOD — dual readout (key mechanism)

Stage-2 cross-attention blends each query with its nearest support flows, which assimilates novel queries into the known manifold and erases the novelty signal. The fix: score OOD on Stage-1 embeddings (before cross-attention) using kNN distance normalized per episode, with the threshold calibrated on a train-only pseudo-novel split (no test leakage).
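A hedged sketch of the Stage-1 scoring path is shown below. The exact normalization and calibration rules are not given in this README, so the example assumes that scores are divided by the episode's own support-to-support kNN scale and that the threshold maximizes Youden's J (TPR minus FPR) on a known versus pseudo-novel split drawn from training data. All function names are hypothetical.

```python
import numpy as np

def knn_mean(a, b, k, skip_self):
    """Mean distance from each row of a to its k nearest rows in b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    d.sort(axis=1)
    s = 1 if skip_self else 0
    return d[:, s:s + k].mean(axis=1)

def knn_ood_scores(query_s1, support_s1, k=3):
    """kNN novelty score on Stage-1 embeddings, normalized per episode."""
    scale = np.median(knn_mean(support_s1, support_s1, k, skip_self=True))
    return knn_mean(query_s1, support_s1, k, skip_self=False) / max(scale, 1e-12)

def calibrate_threshold(known_scores, pseudo_novel_scores):
    """Pick the threshold maximizing TPR - FPR on a train-only calibration split."""
    candidates = np.unique(np.concatenate([known_scores, pseudo_novel_scores]))
    j = [(pseudo_novel_scores > t).mean() - (known_scores > t).mean()
         for t in candidates]
    return float(candidates[int(np.argmax(j))])

def auroc(known_scores, novel_scores):
    """Probability that a novel flow scores above a known flow (ties count half)."""
    diff = novel_scores[:, None] - known_scores[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())
```

Dividing by the support set's own neighbour scale makes scores comparable across episodes whose embeddings are more or less spread out, which is one plausible reading of "normalized per episode".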

5.3 Drift detect-and-recover

An MMD detector flags covariate drift on a streaming test partition; on detection the model re-adapts on fresh support, recovering accuracy versus a no-response control.
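One common way to build such a detector is a two-sample MMD test with a permutation null, sketched below. The kernel, bandwidth rule, and significance test used in the actual framework are not stated; the RBF kernel, the median heuristic, and the permutation test here are assumptions, as are the function names.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(x, y, gamma):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def median_gamma(z):
    """Median heuristic: gamma = 1 / median squared pairwise distance."""
    sq = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    return 1.0 / np.median(sq[sq > 0])

def drift_detected(reference, window, n_perm=200, alpha=0.05, seed=0):
    """Permutation test: is MMD(reference, window) larger than chance re-splits?"""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([reference, window])
    gamma = median_gamma(pooled)
    observed = mmd2(reference, window, gamma)
    n = len(reference)
    null = np.empty(n_perm)
    for i in range(n_perm):
        p = rng.permutation(len(pooled))
        null[i] = mmd2(pooled[p[:n]], pooled[p[n:]], gamma)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return bool(p_value < alpha), float(p_value)
```

In a streaming setting, `reference` would hold features from the clean period and `window` the most recent batch; a positive result is the trigger for re-adapting on fresh support.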

5.4 Retain / forgetting

Sequential continual learning with GPM gradient projection as the working retain mechanism, evaluated for backward transfer (retention) and plasticity.
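The core idea of gradient projection memory, removing from each new gradient the component that lies in the input subspace used by earlier tasks, can be sketched for a single linear layer. This is a simplified illustration under assumed names and an assumed energy threshold, not the withheld implementation.

```python
import numpy as np

def gpm_basis(activations, energy=0.99):
    """Orthonormal basis (in_dim, r) of the input subspace used by an old task.

    Keeps the smallest number of left singular vectors whose squared
    singular values reach the requested share of the total.
    """
    u, s, _ = np.linalg.svd(activations.T, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(ratio, energy) + 1)
    return u[:, :r]

def project_gradient(grad_w, basis):
    """Remove the part of a (out_dim, in_dim) gradient that lies in the stored subspace."""
    return grad_w - grad_w @ basis @ basis.T

rng = np.random.default_rng(0)

# Old-task inputs that occupy a 3-dimensional subspace of an 8-d input space.
x_old = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 8))
M = gpm_basis(x_old, energy=1 - 1e-9)

W = rng.normal(size=(4, 8))            # layer weights after the old task
grad = rng.normal(size=(4, 8))         # gradient coming from a new task
W_new = W - 0.1 * project_gradient(grad, M)

# The layer's responses to old-task inputs are unchanged by the projected step.
same_outputs = np.allclose(W_new @ x_old.T, W @ x_old.T)
```

Because the projected update is orthogonal to every stored input direction, the layer keeps its old responses exactly in this idealized case, while the remaining directions stay free for learning the new task.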


6. Headline results

Reference production run, 5 seeds, held-out test partitions.

  • Cross-protocol (Adapt): confirmed positive. ≈ 70.8% ± 4.0 (5-shot), versus ≈ 57% for ProtoNet and 20% for chance.
  • Cross-domain (Adapt): honest negative plus a benchmark caveat. A random-init model matches the trained one on RT-IoT, and a shortcut audit shows that ports alone reach 95.6%, so the benchmark is shortcut-saturated. Meta-training's benefit is protocol-bound, not domain-general.
  • Zero-day (Detect): dual readout works. Stage-1 kNN AUROC ≈ 0.76–0.88, versus 0.33–0.44 for the legacy post-attention readout (below the 0.5 chance level).
  • Drift (Detect+Respond): loop validated. Clean 80% → drift-without-response 47% → re-adapt 57% (+11 pts recovery).
  • Retain: GPM stabilizes retraining; rehearsal/replay is a documented negative in this regime.

7. Repository structure

The folders below mirror the private implementation. They are intentionally empty here (placeholders) because the source is withheld; the map documents how the system is organized.

trafficmaml/            core package
  config.py             paths, hyperparameters, dataset loaders
  dataset.py            episodic sampler, temporal slices
  split_generator.py    leakage-controlled split manifests
  models.py             encoder, Sparse Episodic Transformer, prototype registry
  maml.py               bi-level meta-learning, GPM gradient projection
  engine.py             base build / train / eval
  zeroday_eval.py       dual-readout OOD evaluation
  shortcut_audit.py     feature-shortcut diagnostic
  ood_diagnosis.py      OOD scorer / embedding diagnosis
  baselines*.py         baseline models
  resilience/           drift + continual-learning layer (La-MAML, MMD, replay)
  adaptive/             adaptive attention + shift-gated routing (experimental)

pipeline/               experiment orchestration
  run_pipeline.py       multiseed entry point
  flows/                Prefect flows
  tasks/                train / evaluate / stats / plots / report

scripts/                standalone utilities (e.g. runtime benchmarks)
splits/                 leakage-controlled split manifests (JSON)
results/                metrics + figures
graphs/                 generated figures
docs/                   design notes
papers/                 IEEE Access + Elsevier Computer Networks drafts
thesis/                 thesis chapters
demo/                   interactive pipeline animation + GNS3 closed-loop design

8. Two-paper program

Two distinct papers stem from the one framework (non-overlapping metrics):

  • IEEE Access (adaptation): meta-adaptation is protocol-bound, not domain-general; the shortcut-audit methodology; the sparse episodic transformer.
  • Elsevier Computer Networks (resilience): the dual-readout adaptation–detection tension (crown-jewel novelty); the detect → re-adapt → GPM-retrain self-healing loop; a GNS3 firewall-enforcement demo.

Availability & licensing

The source code, trained checkpoints, and raw datasets are not included in this public repository. They are withheld pending patent and copyright clearance.

This repository (the architecture documentation, figures, and demo asset) is shared for research dissemination and review only. All rights reserved. Reuse, redistribution, or derivative works require prior written permission.

For collaboration, access requests, or licensing inquiries, please open an issue or contact the author.

Citation

If you reference this work, please cite the associated papers (IEEE Access / Elsevier Computer Networks — in preparation). A CITATION.cff will be added on publication.
