TrafficMAML — A Self-Resilient Meta-Learning Framework for Few-Shot Network-Traffic Classification

Public architecture overview. This repository documents the research-level architecture of TrafficMAML and showcases the system. The source code is withheld pending patent and copyright clearance — see Availability. The directory tree below mirrors the private implementation so the organization is transparent even though the code is not published here.

The animation walks one flow through the pipeline under each operating scenario — in-domain inference, cross-domain routing, zero-day detection, and drift detect-and-recover.

Project context

This work is conducted under the thesis "Deep Packet Inspection for Application Classification Enforcement and Advanced Threat Detection," a research project funded by Telekom Malaysia (TM). TrafficMAML is the framework we propose in fulfillment of that thesis — a meta-learned, self-resilient approach to few-shot traffic classification, novel-class (zero-day) threat detection, and drift resilience for deep-packet-inspection deployments.

1. Problem & thesis

Task. Few-shot network-traffic classification that stays usable as conditions change: new protocols, new domains, novel (zero-day) classes, and distribution drift over time.

Framing (the novelty). The contribution is a unified self-resilient framework, not a single model. One meta-learned backbone supports a four-mechanism lifecycle:

Mechanism	What it does	Component
Adapt	few-shot transfer to a new protocol/domain	inner-loop adaptation + adapters + sparse episodic transformer
Detect (drift)	flag covariate distribution shift	MMD drift detector
Detect (zero-day)	flag novel classes	dual-readout Stage-1 kNN + prototype registry
Retain	retrain without forgetting	GPM gradient projection (+ replay buffer)

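A central empirical finding ties the four mechanisms together: the sparse cross-attention that powers adaptation also suppresses novelty detectability. The framework resolves this tension with a dual readout, in which classification uses the post-cross-attention embedding and detection is performed on the pre-cross-attention (Stage-1) representation.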

2. Datasets

All datasets are loaded with leakage-controlled 70/30 stratified train/test splits (manifests in splits/; the scaler is fit on train rows only). Zero-day classes are held out entirely from training, per dataset.

Dataset	Rows	Classes	Role
CESNET-TLS22	12,504	48	cross-protocol source/target
CESNET-QUIC22	14,705	48 (same as TLS)	cross-protocol source/target
CICIoT2023	2,400	6	main IoT training set (small)
RT-IoT2022	123,117 (filtered to classes with ≥200 rows)	10	cross-domain eval target

Evaluation scenarios: cross-protocol (TLS↔QUIC), cross-domain (CICIoT→RT-IoT), zero-day detection (held-out classes), and covariate/segment drift (chronological slices of the TLS test set).
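
The split protocol described at the top of this section could look roughly like the sketch below. It uses only the Python standard library; the function names, the toy data, and the choice to send every held-out zero-day row to the test side are illustrative assumptions, not the withheld implementation.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev

def stratified_split(labels, zero_day, test_frac=0.3, seed=0):
    """Split row indices per class (about 70/30); zero-day classes never reach train."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = list(by_class[y])
        if y in zero_day:
            test_idx += idx  # assumption: novel classes are only seen at test time
            continue
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)

def fit_scaler(rows):
    """Per-feature mean and standard deviation, computed from the given rows only."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]

def apply_scaler(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

# Toy data: 30 flows, 2 features, 3 classes; class 2 plays the zero-day role.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(5, 2)] for _ in range(30)]
y = [i % 3 for i in range(30)]

train_idx, test_idx = stratified_split(y, zero_day={2})
means, stds = fit_scaler([X[i] for i in train_idx])   # statistics from train rows only
X_train = apply_scaler([X[i] for i in train_idx], means, stds)
X_test = apply_scaler([X[i] for i in test_idx], means, stds)
```

Fitting the scaler before splitting would let test-set statistics influence the training features, which is the leakage the manifests are meant to rule out.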

3. Model architecture

Meta-learned (MAML bi-level) few-shot classifier. Episodic: every forward pass processes a support set (labeled few-shot examples) and a query set.

flow feature vector (per-domain dim)
      │
      ▼
[ input projection ]    domain-specific: Linear → LayerNorm, one per dataset
      │                 (TLS, QUIC, CICIoT, RT-IoT each have their own)
      ▼
[ shared backbone ]     SHARED across domains — the transferable core
      │                 2× (Linear(64→64) + GELU)
      ▼
[ adapter ]             domain-specific low-rank bottleneck (64→16→64, residual)
      │
      ▼  reshape 64-d → 4 tokens × 16-d
[ Sparse Episodic Transformer ]
      │   Stage 1: intra-flow self-attention (tokens within a flow)
      │   Stage 2: inter-flow sparse cross-attention (query → top-k support)
      ▼
   embedding → nearest-prototype classification

Design principle: domain-specific input/adapter shells, a shared backbone in the middle. Transfer happens because the heavy representation is shared; each domain only gets a thin projection plus a low-rank adapter.
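
To make the shapes in the diagram concrete, here is a minimal forward-pass sketch with random weights, written in plain Python. The per-domain input sizes, the weight initialization, the GELU inside the adapter, and the omission of learned LayerNorm parameters are all assumptions made for illustration; only the 64-d width, the 64→16→64 residual adapter, and the 4 × 16 token reshape come from the description above.

```python
import math
import random

HIDDEN, BOTTLENECK, TOKENS = 64, 16, 4

def linear(x, W, b):
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def layer_norm(x, eps=1e-5):
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

def gelu(x):
    return [0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x]

def init(rng, n_out, n_in):
    s = 1.0 / math.sqrt(n_in)
    W = [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

rng = random.Random(0)
input_dims = {"tls": 30, "ciciot": 46}  # hypothetical feature counts per domain

# Domain-specific shells around one shared backbone.
proj = {d: init(rng, HIDDEN, n) for d, n in input_dims.items()}
backbone = [init(rng, HIDDEN, HIDDEN) for _ in range(2)]
adapter = {d: (init(rng, BOTTLENECK, HIDDEN), init(rng, HIDDEN, BOTTLENECK))
           for d in input_dims}

def encode(x, domain):
    h = layer_norm(linear(x, *proj[domain]))        # input projection
    for W, b in backbone:                           # shared: 2x (Linear + GELU)
        h = gelu(linear(h, W, b))
    down, up = adapter[domain]
    a = linear(gelu(linear(h, *down)), *up)         # 64 -> 16 -> 64 (GELU assumed)
    h = [hi + ai for hi, ai in zip(h, a)]           # residual connection
    width = HIDDEN // TOKENS
    return [h[i * width:(i + 1) * width] for i in range(TOKENS)]

tokens_tls = encode([0.1] * 30, "tls")
tokens_iot = encode([0.1] * 46, "ciciot")
```

Both domains end up as 4 tokens of 16 dimensions even though their raw feature vectors differ in length, which is what lets the transformer stages operate on a common format.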

3.1 Sparse Episodic Transformer (the distinctive module)

Stage 1 — intra-flow self-attention. The tokens of a single flow attend to each other (residual + LayerNorm), shared between support and query. This is the OOD-detection readout (see §5.2).
Stage 2 — inter-flow sparse cross-attention. Each query flow attends only to its top-k nearest support flows (hard top-k mask before softmax). This is the few-shot matching metric that drives cross-protocol transfer.
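
The hard top-k mask in Stage 2 can be illustrated with a small sketch. It assumes scaled dot-product similarity as the notion of "nearest" and leaves out learned query/key/value projections and multiple heads; those details are not stated above, so treat the function as a hypothetical illustration of the masking step only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sparse_cross_attention(query, support, k):
    """Attend from one query vector to its k most similar support vectors (k >= 1).

    Scores outside the top-k are set to -inf before the softmax, so the
    corresponding support flows receive exactly zero weight.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, s) / scale for s in support]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    masked = [sc if i in keep else -math.inf for i, sc in enumerate(scores)]
    top = max(masked)
    exps = [math.exp(sc - top) for sc in masked]    # exp(-inf) evaluates to 0.0
    total = sum(exps)
    weights = [e / total for e in exps]
    attended = [sum(w * s[d] for w, s in zip(weights, support))
                for d in range(len(query))]
    return attended, weights

support = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.9, 0.1]]
attended, weights = sparse_cross_attention([1.0, 0.0], support, k=2)
```

In this toy call only the first and last support vectors survive the mask, so the attended output is a blend of those two flows alone.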

3.2 Prototype classification & registry

Class prototypes are the mean of support embeddings per class; queries are classified by nearest prototype. An adaptive per-class OOD-radius registry supports novelty scoring.
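
A minimal sketch of prototype classification with a per-class radius follows. The prototype and nearest-prototype steps follow the description above; the radius rule (a slack factor times the largest support-to-prototype distance) and the ratio-style novelty score are assumptions, since the registry's actual rule is not specified here.

```python
import math
from collections import defaultdict

def class_prototypes(embeddings, labels):
    """Mean support embedding per class."""
    groups = defaultdict(list)
    for e, y in zip(embeddings, labels):
        groups[y].append(e)
    return {y: [sum(col) / len(vs) for col in zip(*vs)] for y, vs in groups.items()}

def nearest_prototype(query, prototypes):
    return min(prototypes, key=lambda y: math.dist(query, prototypes[y]))

def class_radii(embeddings, labels, prototypes, slack=1.5):
    """Hypothetical radius rule: slack * largest support distance to the own prototype."""
    radii = defaultdict(float)
    for e, y in zip(embeddings, labels):
        radii[y] = max(radii[y], math.dist(e, prototypes[y]))
    return {y: max(slack * r, 1e-12) for y, r in radii.items()}

def novelty_score(query, prototypes, radii):
    """Distance to the nearest prototype in units of that class's radius (>1 is outside)."""
    y = nearest_prototype(query, prototypes)
    return math.dist(query, prototypes[y]) / radii[y]

support = [[0, 0], [0, 2], [10, 10], [10, 12]]
labels = ["a", "a", "b", "b"]
protos = class_prototypes(support, labels)
radii = class_radii(support, labels, protos)

predicted = nearest_prototype([1, 1], protos)
inside = novelty_score([1, 1], protos, radii)
outside = novelty_score([5, 5], protos, radii)
```

A query near a prototype scores below 1, while a query between the clusters scores well above 1 even though it still has a nearest class.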

3.3 Shift-gated routing (resilience path)

A lightweight shift gate measures distance to the training-domain manifold and routes accordingly: in-domain flows traverse the full Stage-1 → Stage-2 → prototype path, while out-of-domain flows skip Stage-2 and read out from Stage-1, recovering cross-domain accuracy that Stage-2 otherwise erodes.
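
The routing decision can be sketched as follows. The gate here is a deliberately simple stand-in (distance to the training centroid compared against a training-set quantile); the actual manifold-distance measure is not described above, and `stage1` / `stage2` are placeholder callables.

```python
import math

def fit_gate(train_feats, quantile=0.95):
    """Summarize the training domain by its centroid and a distance threshold."""
    centroid = [sum(c) / len(c) for c in zip(*train_feats)]
    dists = sorted(math.dist(x, centroid) for x in train_feats)
    threshold = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return centroid, threshold

def route(x, gate, stage1, stage2):
    """In-domain flows take Stage 1 then Stage 2; out-of-domain flows stop at Stage 1."""
    centroid, threshold = gate
    h = stage1(x)
    if math.dist(x, centroid) <= threshold:
        return "in-domain", stage2(h)
    return "out-of-domain", h

# Placeholder stages so the routing itself can be exercised.
identity = lambda h: list(h)
double = lambda h: [2 * v for v in h]

gate = fit_gate([[0, 0], [1, 0], [0, 1], [1, 1]])
near = route([0.5, 0.5], gate, identity, double)
far = route([10, 10], gate, identity, double)
```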

4. Training

Inner loop: adapt the projection + adapter "fast weights" to an episode's support set. The shared backbone is not updated in the inner loop.
Outer loop: Adam over the encoder + transformer, learning an initialization that adapts well across episodes.
Episode mix: alternation between cross-protocol episodes (TLS/QUIC) and in-domain CICIoT episodes.
Loss: query cross-entropy + supervised-contrastive structuring + cross-protocol prototype alignment (+ rehearsal penalty in the resilience variant).
Toggleable mechanisms (ablation knobs): adapters, GPM gradient projection (anti-forgetting), supervised contrastive, and prototype alignment.
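
The inner/outer split can be illustrated on a toy problem. The sketch below is a first-order approximation of MAML on a scalar model y = w·x + b, with plain gradient descent in the outer loop instead of Adam; w stands in for the shared backbone (untouched in the inner loop) and b for the fast weights. The task family, learning rates, and names are assumptions chosen so the example stays small.

```python
def grads(w, b, xs, ys):
    """Gradients of the mean squared error of y_hat = w * x + b."""
    n = len(xs)
    err = [w * x + b - y for x, y in zip(xs, ys)]
    gw = sum(2 * e * x for e, x in zip(err, xs)) / n
    gb = sum(2 * e for e in err) / n
    return gw, gb

def make_task(offset):
    """Task family y = 2x + offset; support and query use different x values."""
    sx, qx = [-1.0, 0.0, 1.0], [-2.0, 2.0]
    sy = [2.0 * x + offset for x in sx]
    qy = [2.0 * x + offset for x in qx]
    return (sx, sy), (qx, qy)

tasks = [make_task(c) for c in (-1.0, 0.0, 1.0)]
w, b0 = 0.0, 1.0                 # shared weight and fast-weight initialization
inner_lr, outer_lr = 0.25, 0.05

for step in range(300):
    gw_sum = gb_sum = 0.0
    for (sx, sy), (qx, qy) in tasks:
        # Inner loop: one step on the fast weight only; w is left untouched.
        _, gb = grads(w, b0, sx, sy)
        b_fast = b0 - inner_lr * gb
        # Outer signal: query-loss gradient evaluated at the adapted parameters
        # (first-order: the dependence of b_fast on b0 is ignored).
        gw_q, gb_q = grads(w, b_fast, qx, qy)
        gw_sum += gw_q
        gb_sum += gb_q
    w -= outer_lr * gw_sum / len(tasks)
    b0 -= outer_lr * gb_sum / len(tasks)
```

After meta-training, w settles at the slope shared by all tasks and b0 at the average offset, which is the initialization from which one inner step adapts fastest across this task family.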

5. Evaluation

5.1 Few-shot accuracy

Adapt on support (inner loop), classify the query by nearest prototype. Reported as N-way K-shot. All evaluation uses test partitions only.
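
An N-way K-shot episode can be drawn as in the sketch below. The sampler name, the number of query flows per class, and the toy label vector are illustrative assumptions; in the protocol above the indices would come from a test partition only.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Pick n_way classes, then k_shot support and n_query query rows per class."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    eligible = sorted(y for y, idx in by_class.items() if len(idx) >= k_shot + n_query)
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for y in classes:
        picked = rng.sample(by_class[y], k_shot + n_query)  # without replacement
        support += picked[:k_shot]
        query += picked[k_shot:]
    return classes, support, query

labels = [i % 8 for i in range(160)]      # toy test partition: 8 classes, 20 rows each
classes, support, query = sample_episode(labels, n_way=5, k_shot=5, n_query=10,
                                         rng=random.Random(0))
```

Sampling support and query rows in one draw without replacement guarantees that no flow appears on both sides of an episode.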

5.2 Zero-day OOD — dual readout (key mechanism)

Stage-2 cross-attention blends each query with its nearest support flows, which assimilates novel queries into the known manifold and erases the novelty signal. The fix: score OOD on Stage-1 embeddings (before cross-attention) using kNN distance normalized per episode, with the threshold calibrated on a train-only pseudo-novel split (no test leakage).
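
A minimal sketch of the Stage-1 scoring idea follows. The per-episode normalization (dividing by the typical leave-one-out kNN distance within the support set) and the threshold rule (maximizing true-positive rate minus false-positive rate on known versus pseudo-novel training scores) are assumptions; the text above states only that the distance is normalized per episode and calibrated on a train-only pseudo-novel split.

```python
import math

def knn_distance(x, points, k):
    """Mean distance from x to its k nearest points."""
    nearest = sorted(math.dist(x, p) for p in points)[:k]
    return sum(nearest) / len(nearest)

def episode_scale(support, k):
    """Typical within-support kNN distance (leave-one-out); needs >= 2 support points."""
    vals = [knn_distance(s, support[:i] + support[i + 1:], k)
            for i, s in enumerate(support)]
    return sum(vals) / len(vals) or 1.0

def ood_score(x, support, k=1):
    """kNN distance of a Stage-1 embedding, normalized by the episode's own scale."""
    return knn_distance(x, support, k) / episode_scale(support, k)

def calibrate_threshold(known_scores, pseudo_novel_scores):
    """Pick the cut-off that best separates known from pseudo-novel training scores."""
    best_thr, best_gap = None, -1.0
    for thr in sorted(set(known_scores + pseudo_novel_scores)):
        tpr = sum(s >= thr for s in pseudo_novel_scores) / len(pseudo_novel_scores)
        fpr = sum(s >= thr for s in known_scores) / len(known_scores)
        if tpr - fpr > best_gap:
            best_thr, best_gap = thr, tpr - fpr
    return best_thr

support = [[0, 0], [0, 1], [1, 0], [1, 1]]   # stand-ins for Stage-1 embeddings
known = ood_score([0.5, 0.5], support)
novel = ood_score([5, 5], support)
thr = calibrate_threshold([0.2, 0.5, 0.7], [2.0, 3.5, 5.0])
```

Because the score is computed before any cross-attention, a novel query is not pulled toward its nearest support flows, so its distance to the known classes is preserved.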

5.3 Drift detect-and-recover

An MMD detector flags covariate drift on a streaming test partition; on detection the model re-adapts on fresh support, recovering accuracy versus a no-response control.
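
The detection step can be sketched with a kernel two-sample test. The example below computes a biased estimate of squared MMD with an RBF kernel and turns it into a decision with a permutation test; the kernel bandwidth, window sizes, and the permutation-based threshold are assumptions, as the detector's actual configuration is not given above.

```python
import math
import random

def rbf(a, b, gamma):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def mmd2(X, Y, gamma=0.5):
    """Biased estimate of squared maximum mean discrepancy with an RBF kernel."""
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / len(X) ** 2
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy

def drift_detected(ref, cur, gamma=0.5, n_perm=100, alpha=0.05, seed=0):
    """Permutation test: is the observed MMD larger than under random relabeling?"""
    rng = random.Random(seed)
    observed = mmd2(ref, cur, gamma)
    pooled = ref + cur
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2(pooled[:len(ref)], pooled[len(ref):], gamma) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return p_value < alpha, observed, p_value

rng = random.Random(0)
ref = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]       # reference window
shifted = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]   # drifted window
flag, stat, p_value = drift_detected(ref, shifted)
```

When the flag is raised, the loop described above would re-adapt on fresh support; that step is not shown here.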

5.4 Retain / forgetting

Sequential continual learning with GPM gradient projection as the working retain mechanism, evaluated for backward transfer (retention) and plasticity.
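
The core of gradient projection can be shown in a few lines: before an update, the gradient is stripped of its components along directions that matter for earlier tasks. The sketch builds the protected basis with Gram-Schmidt for brevity, which is a simplification of GPM (the published method derives the basis from an SVD of layer activations with an energy threshold); the vectors and names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt over the directions to protect (stand-in for GPM's SVD step)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:                      # skip directions already spanned
            basis.append([wi / norm for wi in w])
    return basis

def project_out(grad, basis):
    """Remove the gradient's components inside the protected subspace."""
    g = list(grad)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g

protected = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]   # directions important to old tasks
basis = orthonormal_basis(protected)
safe_grad = project_out([3.0, 4.0, 5.0], basis)
```

The projected gradient is orthogonal to every protected direction, so a step along it leaves the old tasks' responses along those directions unchanged to first order.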

6. Headline results

Reference production run, 5 seeds, held-out test partitions.

Cross-protocol (Adapt): confirmed positive. ≈ 70.8% ± 4.0 (5-shot), compared with ≈ 57% for the ProtoNet baseline and 20% chance.
Cross-domain (Adapt): honest negative, with a benchmark caveat. A random-init model matches the trained one on RT-IoT, and a shortcut audit shows that port features alone reach 95.6%, so the benchmark is shortcut-saturated. Meta-training's benefit is protocol-bound, not domain-general.
Zero-day (Detect): dual readout works. Stage-1 kNN AUROC ≈ 0.76–0.88 vs the legacy post-attention readout at 0.33–0.44 (below the 0.5 chance level).
Drift (Detect+Respond): loop validated. Clean 80% → drift-without-response 47% → re-adapt 57% (reported as +11 pts recovery; the rounded endpoints shown differ by 10 pts).
Retain: GPM stabilizes retraining; rehearsal/replay is a documented negative result in this regime.

7. Repository structure

The folders below mirror the private implementation. They are intentionally empty here (placeholders) because the source is withheld; the map documents how the system is organized.

trafficmaml/            core package
  config.py             paths, hyperparameters, dataset loaders
  dataset.py            episodic sampler, temporal slices
  split_generator.py    leakage-controlled split manifests
  models.py             encoder, Sparse Episodic Transformer, prototype registry
  maml.py               bi-level meta-learning, GPM gradient projection
  engine.py             base build / train / eval
  zeroday_eval.py       dual-readout OOD evaluation
  shortcut_audit.py     feature-shortcut diagnostic
  ood_diagnosis.py      OOD scorer / embedding diagnosis
  baselines*.py         baseline models
  resilience/           drift + continual-learning layer (La-MAML, MMD, replay)
  adaptive/             adaptive attention + shift-gated routing (experimental)

pipeline/               experiment orchestration
  run_pipeline.py       multiseed entry point
  flows/                Prefect flows
  tasks/                train / evaluate / stats / plots / report

scripts/                standalone utilities (e.g. runtime benchmarks)
splits/                 leakage-controlled split manifests (JSON)
results/                metrics + figures
graphs/                 generated figures
docs/                   design notes
papers/                 IEEE Access + Elsevier Computer Networks drafts
thesis/                 thesis chapters
demo/                   interactive pipeline animation + GNS3 closed-loop design

8. Two-paper program

Two distinct papers stem from the one framework (non-overlapping metrics):

IEEE Access (adaptation): meta-adaptation is protocol-bound, not domain-general; the shortcut-audit methodology; the sparse episodic transformer.
Elsevier Computer Networks (resilience): the dual-readout adaptation–detection tension (crown-jewel novelty); the detect → re-adapt → GPM-retrain self-healing loop; a GNS3 firewall-enforcement demo.

Availability & licensing

The source code, trained checkpoints, and raw datasets are not included in this public repository. They are withheld pending patent and copyright clearance.

This repository (the architecture documentation, figures, and demo asset) is shared for research dissemination and review only. All rights reserved. Reuse, redistribution, or derivative works require prior written permission.

For collaboration, access requests, or licensing inquiries, please open an issue or contact the author.

Citation

If you reference this work, please cite the associated papers (IEEE Access / Elsevier Computer Networks — in preparation). A CITATION.cff will be added on publication.
