Public architecture overview. This repository documents the research-level architecture of TrafficMAML and showcases the system. The source code is withheld pending patent and copyright clearance — see Availability. The directory tree below mirrors the private implementation so the organization is transparent even though the code is not published here.
The animation walks one flow through the pipeline under each operating scenario — in-domain inference, cross-domain routing, zero-day detection, and drift detect-and-recover.
This work is conducted under the thesis "Deep Packet Inspection for Application Classification Enforcement and Advanced Threat Detection," a research project funded by Telekom Malaysia (TM). TrafficMAML is the framework we propose in fulfillment of that thesis — a meta-learned, self-resilient approach to few-shot traffic classification, novel-class (zero-day) threat detection, and drift resilience for deep-packet-inspection deployments.
Task. Few-shot network-traffic classification that stays usable as conditions change: new protocols, new domains, novel (zero-day) classes, and distribution drift over time.
Framing (the novelty). The contribution is a unified self-resilient framework, not a single model. One meta-learned backbone supports a four-mechanism lifecycle:
| Mechanism | What it does | Component |
|---|---|---|
| Adapt | few-shot transfer to a new protocol/domain | inner-loop adaptation + adapters + sparse episodic transformer |
| Detect (drift) | flag covariate distribution shift | MMD drift detector |
| Detect (zero-day) | flag novel classes | dual-readout Stage-1 kNN + prototype registry |
| Retain | retrain without forgetting | GPM gradient projection (+ replay buffer) |
A central empirical finding ties the mechanisms together: the sparse cross-attention that powers adaptation also suppresses novelty detectability. A dual readout resolves the tension by performing detection on the pre-cross-attention representation.
All datasets are loaded with leakage-controlled 70/30 stratified train/test splits
(manifests in splits/; the scaler is fit on train rows only). Zero-day classes are
held out entirely from training in each dataset.
| Dataset | Rows | Classes | Role |
|---|---|---|---|
| CESNET-TLS22 | 12,504 | 48 | cross-protocol source/target |
| CESNET-QUIC22 | 14,705 | 48 (same as TLS) | cross-protocol source/target |
| CICIoT2023 | 2,400 | 6 | main IoT training set (small) |
| RT-IoT2022 | 123,117 (only classes with ≥200 rows kept) | 10 | cross-domain eval target |
Evaluation scenarios: cross-protocol (TLS↔QUIC), cross-domain (CICIoT→RT-IoT), zero-day detection (held-out classes), and covariate/segment drift (chronological slices of the TLS test set).
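As a rough illustration of what "leakage-controlled" means for the splits described above, the sketch below splits indices 70/30 per class and computes standardization statistics from train rows only. It uses only the standard library. The helper names (`stratified_split`, `fit_scaler`, `transform`) and the toy data are assumptions for illustration, not the withheld `split_generator.py`.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev

def stratified_split(labels, test_frac=0.3, seed=0):
    """Return (train_idx, test_idx), preserving per-class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)

def fit_scaler(rows):
    """Per-feature mean and std computed from the given rows only."""
    cols = list(zip(*rows))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against constant features
    return mu, sd

def transform(rows, mu, sd):
    return [[(x - m) / s for x, m, s in zip(r, mu, sd)] for r in rows]

# Toy data (hypothetical): 10 rows of class "a", 20 rows of class "b".
labels = ["a"] * 10 + ["b"] * 20
rows = [[float(i), float(i % 3)] for i in range(30)]

train_idx, test_idx = stratified_split(labels)
mu, sd = fit_scaler([rows[i] for i in train_idx])   # train rows only
test_scaled = transform([rows[i] for i in test_idx], mu, sd)
```

The point of the design is that the test rows never influence `mu` or `sd`; the same statistics are simply reused at evaluation time.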
Meta-learned (MAML bi-level) few-shot classifier. Episodic: every forward pass processes a support set (labeled few-shot examples) and a query set.
flow feature vector (per-domain dim)
│
▼
[ input projection ] domain-specific: Linear → LayerNorm, one per dataset
│ (TLS, QUIC, CICIoT, RT-IoT each have their own)
▼
[ shared backbone ] SHARED across domains — the transferable core
│ 2× (Linear(64→64) + GELU)
▼
[ adapter ] domain-specific low-rank bottleneck (64→16→64, residual)
│
▼ reshape 64-d → 4 tokens × 16-d
[ Sparse Episodic Transformer ]
│ Stage 1: intra-flow self-attention (tokens within a flow)
│ Stage 2: inter-flow sparse cross-attention (query → top-k support)
▼
embedding → nearest-prototype classification
Design principle: domain-specific input/adapter shells, a shared backbone in the middle. Transfer happens because the heavy representation is shared; each domain only gets a thin projection plus a low-rank adapter.
- Stage 1 — intra-flow self-attention. The tokens of a single flow attend to each other (residual + LayerNorm), shared between support and query. This is the OOD-detection readout (see §5.2).
- Stage 2 — inter-flow sparse cross-attention. Each query flow attends only to its top-k nearest support flows (hard top-k mask before softmax). This is the few-shot matching metric that drives cross-protocol transfer.
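The "hard top-k mask before softmax" in Stage 2 can be pictured with the minimal single-query sketch below. It assumes scaled dot-product similarity as the notion of "nearest" and omits learned projections and multiple heads; the function name `topk_sparse_attention` is hypothetical and the real layer may differ.

```python
import math

def topk_sparse_attention(query, support, k):
    """Attend from one query vector to its k highest-scoring support vectors."""
    d = len(query)
    # Assumed similarity: scaled dot product between query and each support.
    scores = [sum(q * s for q, s in zip(query, sv)) / math.sqrt(d) for sv in support]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    # Hard mask: non-selected supports get -inf, so their softmax weight is 0.
    masked = [s if i in keep else -math.inf for i, s in enumerate(scores)]
    m = max(masked)
    exps = [math.exp(s - m) for s in masked]
    z = sum(exps)
    weights = [e / z for e in exps]
    out = [sum(w * sv[j] for w, sv in zip(weights, support)) for j in range(d)]
    return weights, out

query = [1.0, 0.0]
support = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]]
weights, out = topk_sparse_attention(query, support, k=2)
```

Because the mask is applied before the softmax, the two discarded supports contribute nothing to `out`, and the remaining weights still sum to one.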
Class prototypes are the mean of support embeddings per class; queries are classified by nearest prototype. An adaptive per-class OOD-radius registry supports novelty scoring.
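A minimal sketch of the prototype step follows, assuming Euclidean distance and plain Python lists for embeddings. The function names are hypothetical, and the per-class OOD-radius registry is not modeled here.

```python
import math
from collections import defaultdict

def class_prototypes(embeddings, labels):
    """Mean support embedding per class."""
    groups = defaultdict(list)
    for e, y in zip(embeddings, labels):
        groups[y].append(e)
    return {y: [sum(col) / len(vs) for col in zip(*vs)] for y, vs in groups.items()}

def nearest_prototype(query, prototypes):
    """Label of the closest prototype (Euclidean distance is an assumption)."""
    return min(prototypes, key=lambda y: math.dist(query, prototypes[y]))

support = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
support_labels = ["benign", "benign", "scan", "scan"]
protos = class_prototypes(support, support_labels)
pred = nearest_prototype([1.0, 1.5], protos)
```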
A lightweight shift gate measures distance to the training-domain manifold and routes accordingly: in-domain flows traverse the full Stage-1 → Stage-2 → prototype path, while out-of-domain flows skip Stage-2 and read out from Stage-1, recovering cross-domain accuracy that Stage-2 otherwise erodes.
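The gate's actual statistic is not published here. Purely as an assumption, the sketch below measures "distance to the training-domain manifold" as the nearest-neighbour distance to a bank of training embeddings, with a threshold taken from a quantile of leave-one-out distances inside that bank. The names `gate_threshold` and `route` and the route labels are hypothetical.

```python
import math

def gate_threshold(train_bank, quantile=0.95):
    """Threshold from leave-one-out nearest-neighbour distances in the bank."""
    dists = sorted(
        min(math.dist(x, y) for j, y in enumerate(train_bank) if j != i)
        for i, x in enumerate(train_bank)
    )
    return dists[min(len(dists) - 1, int(quantile * len(dists)))]

def route(x, train_bank, threshold):
    """Far-from-training flows skip Stage 2 and read out from Stage 1."""
    d = min(math.dist(x, y) for y in train_bank)
    return "stage1_readout" if d > threshold else "full_path"

bank = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
thr = gate_threshold(bank)
near = route([0.1, 0.1], bank, thr)
far = route([10.0, 10.0], bank, thr)
```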
- Inner loop: adapt the projection + adapter "fast weights" to an episode's support set. The shared backbone is not updated in the inner loop.
- Outer loop: Adam over the encoder + transformer, learning an initialization that adapts well across episodes.
- Episode mix: alternation between cross-protocol episodes (TLS/QUIC) and in-domain CICIoT episodes.
- Loss: query cross-entropy + supervised-contrastive structuring + cross-protocol prototype alignment (+ rehearsal penalty in the resilience variant).
- Toggleable mechanisms (ablation knobs): adapters, GPM gradient projection (anti-forgetting), supervised contrastive, and prototype alignment.
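The inner/outer split above can be illustrated with a deliberately tiny toy: one scalar `s` stands in for the shared backbone and one scalar `f` for the fast weights. This is a first-order approximation (full MAML differentiates through the inner step) and uses plain SGD in place of Adam; the model, task sampler, and learning rates are all assumptions for illustration.

```python
import random

def loss_and_grads(s, f, xs, ys):
    """MSE of pred = s * f * x, with analytic gradients w.r.t. s and f."""
    n = len(xs)
    errs = [s * f * x - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errs) / n
    g_s = sum(2 * e * f * x for e, x in zip(errs, xs)) / n
    g_f = sum(2 * e * s * x for e, x in zip(errs, xs)) / n
    return loss, g_s, g_f

def inner_adapt(s, f, xs, ys, lr=0.1, steps=1):
    """Inner loop: update only the fast weight f; the shared weight s is frozen."""
    for _ in range(steps):
        _, _, g_f = loss_and_grads(s, f, xs, ys)
        f = f - lr * g_f
    return f

def sample_task(rng, n=8):
    """Toy task y = a * x; returns (support, query) as (xs, ys) pairs."""
    a = rng.uniform(1.0, 3.0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(2 * n)]
    ys = [a * x for x in xs]
    return (xs[:n], ys[:n]), (xs[n:], ys[n:])

rng = random.Random(0)
s, f = 1.0, 1.0
for _ in range(200):
    (sx, sy), (qx, qy) = sample_task(rng)
    f_fast = inner_adapt(s, f, sx, sy)              # adapt on support
    _, g_s, g_f = loss_and_grads(s, f_fast, qx, qy)  # query loss at adapted weights
    s -= 0.01 * g_s                                  # outer step updates the initialization
    f -= 0.01 * g_f
```

The outer step moves the initialization `(s, f)`, not the adapted `f_fast`, which is discarded after each episode.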
Adapt on support (inner loop), classify the query by nearest prototype. Reported as N-way K-shot. All evaluation uses test partitions only.
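For readers unfamiliar with the N-way K-shot protocol, an episode can be drawn roughly as below. The sampler is a hypothetical stand-in for the withheld `dataset.py`; the query size and the 5-way 5-shot setting in the usage lines are illustrative choices.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Pick n_way classes, then k_shot support and n_query query indices per class."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    # Only classes with enough rows for disjoint support and query sets qualify.
    eligible = sorted(c for c, idx in by_class.items() if len(idx) >= k_shot + n_query)
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += picked[:k_shot]
        query += picked[k_shot:]
    return classes, support, query

labels = [i % 6 for i in range(120)]  # toy test partition, 6 classes x 20 rows
classes, support, query = sample_episode(labels, n_way=5, k_shot=5, n_query=10,
                                         rng=random.Random(0))
```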
Stage-2 cross-attention blends each query with its nearest support flows, which assimilates novel queries into the known manifold and erases the novelty signal. The fix: score OOD on Stage-1 embeddings (before cross-attention) using kNN distance normalized per episode, with the threshold calibrated on a train-only pseudo-novel split (no test leakage).
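A minimal sketch of a per-episode normalized kNN score follows. The exact normalization is not stated above, so this version assumes dividing each query's mean k-nearest-support distance by the mean leave-one-out kNN distance among the supports themselves. The inputs stand for Stage-1 embeddings, the function name is hypothetical, and at least two support embeddings are assumed.

```python
import math

def knn_ood_scores(query_emb, support_emb, k=3):
    """Higher score = farther from the episode's support set, in episode units."""
    def knn_mean(x, bank):
        d = sorted(math.dist(x, b) for b in bank)[:k]
        return sum(d) / len(d)

    # Episode scale: typical kNN distance of a support to the other supports.
    scale = sum(
        knn_mean(s, support_emb[:i] + support_emb[i + 1:])
        for i, s in enumerate(support_emb)
    ) / len(support_emb)
    return [knn_mean(q, support_emb) / (scale or 1.0) for q in query_emb]

support = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
queries = [[0.5, 0.5], [5.0, 5.0]]
scores = knn_ood_scores(queries, support, k=2)
```

A query is then flagged as novel when its score exceeds a threshold; per the text above, that threshold would be calibrated on a train-only pseudo-novel split rather than on test data.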
An MMD detector flags covariate drift on a streaming test partition; on detection the model re-adapts on fresh support, recovering accuracy versus a no-response control.
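For reference, a biased estimate of squared MMD with an RBF kernel can be written in a few lines. The kernel choice, `gamma`, and the fixed `threshold` argument are assumptions; a deployed detector would more likely calibrate the threshold, for example with a permutation test.

```python
import math
import random

def rbf(x, y, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y."""
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / (len(X) ** 2)
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / (len(Y) ** 2)
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy

def drift_detected(reference, window, threshold, gamma=1.0):
    return mmd2(reference, window, gamma) > threshold

rng = random.Random(0)
reference = [[rng.gauss(0, 1)] for _ in range(40)]
same = [[rng.gauss(0, 1)] for _ in range(40)]      # no drift
shifted = [[rng.gauss(3, 1)] for _ in range(40)]   # covariate shift
```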
Sequential continual learning with GPM gradient projection as the working retain mechanism, evaluated for backward transfer (retention) and plasticity.
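The core operation in gradient projection is removing the part of a new gradient that lies in a protected subspace. In GPM that subspace is estimated from activations of earlier tasks; the sketch below skips that step and assumes an orthonormal basis is already given. The function name and the hand-picked basis are hypothetical.

```python
def project_out(grad, basis):
    """Remove from grad its components along an orthonormal basis."""
    g = list(grad)
    for b in basis:
        coeff = sum(gi * bi for gi, bi in zip(grad, b))
        g = [gi - coeff * bi for gi, bi in zip(g, b)]
    return g

grad = [3.0, 4.0, 5.0]
protected = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # directions important to old tasks
safe_grad = project_out(grad, protected)
```

Updating with `safe_grad` leaves the model unchanged along the protected directions, which is what limits forgetting.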
Results are from the reference production run: 5 seeds, held-out test partitions.
- Cross-protocol (Adapt) — confirmed positive. ≈ 70.8% ± 4.0 (5-shot) vs ProtoNet ≈ 57% vs 20% chance.
- Cross-domain (Adapt) — honest negative + benchmark caveat. A random-init model matches the trained one on RT-IoT and a shortcut audit shows ports alone reach 95.6%; the benchmark is shortcut-saturated. Meta-training's benefit is protocol-bound, not domain-general.
- Zero-day (Detect) — dual readout works. Stage-1 kNN AUROC ≈ 0.76–0.88 vs legacy post-attention 0.33–0.44 (below chance).
- Drift (Detect+Respond) — loop validated. Clean 80% → drift-without-response 47% → re-adapt 57% (+11 pts recovery).
- Retain — GPM stabilizes retraining; rehearsal/replay is a documented negative in this regime.
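The shortcut audit mentioned in the cross-domain result can be pictured as asking how far a single feature gets on its own. The sketch below is one hypothetical way to do that: a majority-class lookup table over one feature (for example a port number), fit on train rows and scored on test rows. It is not the withheld `shortcut_audit.py`, and the toy values are invented for illustration.

```python
from collections import Counter, defaultdict

def single_feature_accuracy(train_vals, train_labels, test_vals, test_labels):
    """Accuracy of predicting the majority train label for each feature value."""
    table = defaultdict(Counter)
    for v, y in zip(train_vals, train_labels):
        table[v][y] += 1
    lookup = {v: c.most_common(1)[0][0] for v, c in table.items()}
    fallback = Counter(train_labels).most_common(1)[0][0]  # for unseen values
    hits = sum(lookup.get(v, fallback) == y for v, y in zip(test_vals, test_labels))
    return hits / len(test_labels)

train_ports = [80, 80, 443, 443, 1883]
train_y = ["web", "web", "tls", "tls", "mqtt"]
test_ports = [80, 443, 1883, 22]
test_y = ["web", "tls", "mqtt", "ssh"]
acc = single_feature_accuracy(train_ports, train_y, test_ports, test_y)
```

If such a lookup approaches the accuracy of the full model, the benchmark rewards the shortcut rather than the learned representation.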
The folders below mirror the private implementation. They are intentionally empty here (placeholders) because the source is withheld; the map documents how the system is organized.
trafficmaml/ core package
config.py paths, hyperparameters, dataset loaders
dataset.py episodic sampler, temporal slices
split_generator.py leakage-controlled split manifests
models.py encoder, Sparse Episodic Transformer, prototype registry
maml.py bi-level meta-learning, GPM gradient projection
engine.py base build / train / eval
zeroday_eval.py dual-readout OOD evaluation
shortcut_audit.py feature-shortcut diagnostic
ood_diagnosis.py OOD scorer / embedding diagnosis
baselines*.py baseline models
resilience/ drift + continual-learning layer (La-MAML, MMD, replay)
adaptive/ adaptive attention + shift-gated routing (experimental)
pipeline/ experiment orchestration
run_pipeline.py multiseed entry point
flows/ Prefect flows
tasks/ train / evaluate / stats / plots / report
scripts/ standalone utilities (e.g. runtime benchmarks)
splits/ leakage-controlled split manifests (JSON)
results/ metrics + figures
graphs/ generated figures
docs/ design notes
papers/ IEEE Access + Elsevier Computer Networks drafts
thesis/ thesis chapters
demo/ interactive pipeline animation + GNS3 closed-loop design
Two distinct papers stem from the one framework (non-overlapping metrics):
- IEEE Access (adaptation): meta-adaptation is protocol-bound, not domain-general; the shortcut-audit methodology; the sparse episodic transformer.
- Elsevier Computer Networks (resilience): the dual-readout adaptation–detection tension (crown-jewel novelty); the detect → re-adapt → GPM-retrain self-healing loop; a GNS3 firewall-enforcement demo.
The source code, trained checkpoints, and raw datasets are not included in this public repository. They are withheld pending patent and copyright clearance.
This repository (the architecture documentation, figures, and demo asset) is shared for research dissemination and review only. All rights reserved. Reuse, redistribution, or derivative works require prior written permission.
For collaboration, access requests, or licensing inquiries, please open an issue or contact the author.
If you reference this work, please cite the associated papers (IEEE Access /
Elsevier Computer Networks — in preparation). A CITATION.cff will be added on
publication.
