### High-level flow across the three phases

```
Smoke-test  ➜  Full Phase-1 sweep  ➜  Narrow winners
            ➜  Smoke-test Phase-2  ➜  Full continual sweep
                                    ➜  Narrow winners
                                           ➜ Smoke-test Phase-3 (NDP)
                                           ➜ Full NDP rigidity sweep
```

Each arrow is a **gated checkpoint**: you analyse the previous results, trim the factor space, and promote only the most informative settings to the next, heavier stage.

---

## 📑 Run matrices

### Notation

* **Topo**: FC (fully connected), RS (= random-sparse), WS (Watts-Strogatz small-world), MOD (modular), WS+MOD
* **Reg** (regime): EE (=equal-edge), EP (=equal-parameter)
* **Lay**: 1-L, 2-L, 3-L
* **Task-C1**: CartPole, LunarLander, MtCarCont
* **Alg**: A2C, PPO
* **Seed**: 3 RNG seeds
* **Curr**: Continual-World (CW), MiniGrid-LevelGen (MG), ProcGen (PG)
* **Mode**: FZ (freeze backbone), FT (fine-tune backbone)
* **RegBas**: none, EWC, SI
* **RigidityCtrl**: rigid-all, soft-gates (learned), NDP-evolved

---

### Phase 1 — single-task topology fitness

| Matrix    | Factors                                 | Levels                | Total runs |
| --------- | --------------------------------------- | --------------------- | ---------- |
| **Smoke** | Topo × Reg × Lay × Task-C1 × Alg × Seed | 5 × 1 × 1 × 1 × 1 × 1 | **5**      |
| **Full**  | Topo × Reg × Lay × Task-C1 × Alg × Seed | 5 × 2 × 3 × 3 × 2 × 3 | **540**    |

*(WS uses β = 0.10 only; MOD uses M = 4 only in Phase-1.)*
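As a sanity check on the run counts, the matrices can be expanded programmatically. The sketch below is a minimal illustration using only the factor levels from the Notation list; the seed values and the choice to pin the smoke matrix to the first level of every non-topology factor are assumptions, not part of the blueprint.

```python
from itertools import product

# Factor levels copied from the Notation list (Phase 1 uses these six factors).
FACTORS = {
    "topo": ["FC", "RS", "WS", "MOD", "WS+MOD"],
    "reg": ["EE", "EP"],
    "lay": [1, 2, 3],
    "task": ["CartPole", "LunarLander", "MtCarCont"],
    "alg": ["A2C", "PPO"],
    "seed": [0, 1, 2],  # placeholder seed values (assumption)
}


def expand(factors):
    """Return one config dict per cell of the full factorial design."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


full_p1 = expand(FACTORS)

# Smoke matrix: vary topology only, pin every other factor to its first level
# (which level is pinned is an assumption made for this sketch).
smoke_p1 = expand({k: (v if k == "topo" else v[:1]) for k, v in FACTORS.items()})

print(len(full_p1), len(smoke_p1))  # 540 5
```

Generating the configs from one dictionary keeps the table and the launcher in sync: if a level is added, the run count changes in exactly one place.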

---

### Phase 2 — continual learning

| Matrix    | Factors                                                                             | Levels            | Total runs |
| --------- | ----------------------------------------------------------------------------------- | ----------------- | ---------- |
| **Smoke** | **Top winners from Phase-1 (2)** × Curr (CW) × Mode (FT) × RegBas (none) × Seed (1) | 2                 | **2**      |
| **Full**  | **Top 3 winners** × Curr × Mode × RegBas × Seed                                     | 3 × 3 × 2 × 3 × 3 | **162**    |

*“Winners” = highest AUC/parameter on all three tasks in Phase-1.*

---

### Phase 3 — developmental rigidity (NDP)

| Matrix    | Factors                                                                                              | Levels        | Total runs |
| --------- | ---------------------------------------------------------------------------------------------------- | ------------- | ---------- |
| **Smoke** | Single topology (best from Phase-2) × Curr (CW) × RigidityCtrl (rigid-all vs. soft-gates) × Seed (1) | 1 × 1 × 2 × 1 | **2**      |
| **Full**  | Same topology × Curr × RigidityCtrl × Seed                                                           | 1 × 3 × 3 × 3 | **27**     |

*(Soft-gates = learnable α ∈ \[0,1] per edge; NDP-evolved = third level of RigidityCtrl.)*
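To make the soft-gate bookkeeping concrete, here is a small sketch of how per-edge α values and the plasticity cost Σ(1−α) could be computed. The sigmoid parameterisation, the reading of α = 1 as "fully rigid", and the 0.5 threshold are assumptions made for illustration; the blueprint only fixes α ∈ [0, 1] per edge.

```python
import math


def gate_values(logits):
    """Map unconstrained per-edge logits to alpha in (0, 1) via a sigmoid (assumed)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def plasticity_cost(alphas):
    """Sum of (1 - alpha) over edges, as listed in the logging table."""
    return sum(1.0 - a for a in alphas)


def rigid_fraction(alphas, threshold=0.5):
    """Fraction of edges treated as rigid; the threshold is a hypothetical choice."""
    return sum(a >= threshold for a in alphas) / len(alphas)


alphas = gate_values([0.0, 2.0, -2.0])   # three illustrative edges
cost = plasticity_cost(alphas)           # sigmoid(2) + sigmoid(-2) = 1, so cost is 1.5
frac = rigid_fraction(alphas)
```

Logging both the cost and the rigid fraction gives the "fraction rigid edges vs. performance" plot its x-axis without re-reading the full mask.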

---

## 📥 What every run must write to disk

| Category               | Fields                                                                       |
| ---------------------- | ---------------------------------------------------------------------------- |
| **Meta**               | run-ID, date, GPU type, code git-hash, PyTorch seed, graph seed              |
| **Topology**           | N, E, regime (EE/EP), knob values (p, β, M…), adjacency hash                 |
| **Structural metrics** | ρ, ⟨k⟩, C, L, σ, Q, assortativity r, diameter D                              |
| **Compute**            | #parameters, MACs/forward, wall-clock per 1 k steps                          |
| **Learning curve**     | episodic return × timestep (numpy array)                                     |
| **Continual only**     | per-task: first-episode reward, final reward, forgetting Δ, forward-transfer |
| **Phase-3 only**       | rigidity mask (vector), plasticity cost (Σ(1−α)), NDP genotype               |

Everything saved as a single `.npz` + metadata `.json` per run.
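One possible way to produce the `.npz` + `.json` pair is sketched below. The function name `save_run`, the key names, and the example values are all hypothetical; only the two-file layout comes from the blueprint.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_run(out_dir, run_id, arrays, meta):
    """Write <run_id>.npz (array data) and <run_id>.json (scalar metadata)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    npz_path = out / f"{run_id}.npz"
    json_path = out / f"{run_id}.json"
    np.savez_compressed(npz_path, **arrays)
    json_path.write_text(json.dumps(meta, indent=2, sort_keys=True))
    return npz_path, json_path


# Illustrative values only; the key names are placeholders, not a fixed schema.
arrays = {
    "timesteps": np.arange(0, 5000, 1000),
    "episodic_return": np.array([10.0, 35.0, 80.0, 120.0, 150.0]),
}
meta = {"run_id": "p1-smoke-000", "regime": "EE", "torch_seed": 0, "graph_seed": 0}

with tempfile.TemporaryDirectory() as tmp:
    npz_path, json_path = save_run(tmp, meta["run_id"], arrays, meta)
    # Round-trip check: read both files back before the temp dir is removed.
    with np.load(npz_path) as data:
        loaded_returns = data["episodic_return"].copy()
    loaded_meta = json.loads(json_path.read_text())
```

Keeping arrays and scalars in separate files means the metadata can be grepped or loaded into a table without touching the heavier array data.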

---

## 📊 Metrics to evaluate

| Phase | Primary figures                                                                  | Statistical tests                           | Secondary analyses                                                      |
| ----- | -------------------------------------------------------------------------------- | ------------------------------------------- | ----------------------------------------------------------------------- |
| **1** | • AUC of reward curve<br>• Final reward at 1 M steps<br>• Params vs. AUC scatter | 2-way ANOVA: Topo × Reg per task            | Pearson/Spearman of AUC with (σ, Q, ρ)                                  |
| **2** | • Avg Forgetting<br>• Forward Transfer<br>• Final average reward                 | 3-way ANOVA: Topology × Mode × RegBas       | Plasticity cost vs. Forgetting regression                               |
| **3** | • Avg Forgetting with rigidity control<br>• Fraction rigid edges vs. performance | Repeated-measures ANOVA across RigidityCtrl | Mantel test: rigidity-matrix ↔ adjacency; MI(rigid mask, module labels) |

All plots share a common style-sheet; violin + box for ANOVA, line plots for curves, bar charts for forgetting.
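The Phase-1 headline number, AUC of the reward curve, can be computed with a plain trapezoidal rule. The sketch below assumes the learning curve is stored as two equal-length sequences (timesteps and episodic returns); the parameter count used for AUC/param is an illustrative value.

```python
def reward_auc(timesteps, returns):
    """Trapezoidal area under the episodic-return curve."""
    if len(timesteps) != len(returns) or len(timesteps) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    area = 0.0
    for i in range(1, len(timesteps)):
        dt = timesteps[i] - timesteps[i - 1]
        if dt <= 0:
            raise ValueError("timesteps must be strictly increasing")
        area += 0.5 * (returns[i] + returns[i - 1]) * dt
    return area


def normalised_auc(timesteps, returns):
    """AUC divided by the time span, i.e. the time-averaged return."""
    return reward_auc(timesteps, returns) / (timesteps[-1] - timesteps[0])


steps = [0, 100, 200, 300]
rets = [0.0, 10.0, 20.0, 20.0]
auc = reward_auc(steps, rets)        # 500 + 1500 + 2000 = 4000
auc_per_param = auc / 2000           # 2000 parameters is a made-up count
```

Dividing by the time span makes runs with different step budgets comparable; whether to use the raw or the normalised value is a design choice the blueprint leaves open.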

---

### 🔁 Smoke-test check-list (identical for each phase)

1. Launch the smoke matrix on one GPU/CPU.
2. Verify logs contain **all** mandatory fields.
3. Plot quick-and-dirty curves; confirm no NaNs or diverging losses.
4. Run the exact ANOVA script; it should execute end-to-end and emit p-values, even though they are not meaningful at this sample size.
5. Push artefacts to repo; freeze the code version before full sweep.
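Step 2 of the checklist is easy to automate. The sketch below checks a run's metadata record against a list of required fields; the snake_case key names are hypothetical renderings of the logging table and should be adapted to whatever schema the runs actually write.

```python
# Required fields per block; names are assumed, derived loosely from the logging table.
REQUIRED = {
    "meta": ["run_id", "date", "gpu_type", "git_hash", "torch_seed", "graph_seed"],
    "topology": ["N", "E", "regime", "knobs", "adjacency_hash"],
    "compute": ["n_params", "macs_per_forward", "wallclock_per_1k_steps"],
}


def missing_fields(record, required=REQUIRED):
    """Return dotted names of required fields that are absent or None."""
    missing = []
    for block, keys in required.items():
        section = record.get(block)
        if not isinstance(section, dict):
            missing.append(block)  # the whole block is absent
            continue
        missing.extend(f"{block}.{k}" for k in keys if section.get(k) is None)
    return missing


# Illustrative record with one missing field and one missing block.
record = {
    "meta": {"run_id": "p1-smoke-000", "date": "2025-01-01", "gpu_type": "cpu",
             "git_hash": "abc1234", "torch_seed": 0, "graph_seed": None},
    "topology": {"N": 64, "E": 256, "regime": "EE", "knobs": {"beta": 0.10},
                 "adjacency_hash": "deadbeef"},
}
problems = missing_fields(record)
print(problems)  # ['meta.graph_seed', 'compute']
```

Running this over every smoke-run record and failing on a non-empty list turns the "all mandatory fields" check into a hard gate rather than a visual inspection.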

Following this blueprint, you move through the three phases with **controlled scope**, so that differences in the final results can be attributed to **topological structure rather than to hidden hyper-parameters or sloppy logging**.


# Additions to make it more fool-proof

Below is an **updated experimental blueprint** that *absorbs every red-flag* raised in the critical review.
Green = new / changed; strike-through = dropped.

---

## 0 Principles now baked-in

* **≥ 7 seeds** for every “full” matrix → more statistical power than the original 3 seeds.
* **Harder cheap tasks** (CartPole-Swingup, LunarLander, MountainCarCont) instead of rapidly-saturated CartPole.
* FC is **never “edge-matched”**; we use

  * **FC-CAP**: a “mainstream” width (e.g. 128) – capacity reference;
  * **FC-FLOP**: width chosen so dense MACs ≈ sparse MACs – compute-fairness reference.
* Two *extra* baselines demanded by the literature:

  * **PRUNED-40 %** (magnitude-pruned FC);
  * **DST-RigL** (dynamic sparse training starting from a random mask).
* Each topology gets *its own* tuned learning rate and entropy coefficient (picked by a tiny grid search on CartPole-Swingup **before** logging any results).
* Verify graph **connectedness**, log the **execution-order DAG**, and snapshot adjacency every **10 k steps**.
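The FC-FLOP reference needs a concrete rule for picking the width. A minimal sketch follows, assuming an MLP with equal-width hidden layers, one MAC per weight, and a sparse network whose MAC count equals its number of active edges; the dimensions in the example are illustrative and the function names are hypothetical.

```python
def dense_macs(d_in, d_out, width, n_hidden):
    """MACs per forward pass of an MLP with n_hidden equal-width hidden layers."""
    return d_in * width + (n_hidden - 1) * width * width + width * d_out


def fc_flop_width(target_macs, d_in, d_out, n_hidden, max_width=4096):
    """Hidden width whose dense MAC count is closest to target_macs."""
    return min(
        range(1, max_width + 1),
        key=lambda w: abs(dense_macs(d_in, d_out, w, n_hidden) - target_macs),
    )


# Example: match a sparse net with 2000 active edges (assumed 1 MAC per edge).
width_2l = fc_flop_width(target_macs=2000, d_in=8, d_out=4, n_hidden=2)
width_1l = fc_flop_width(target_macs=2000, d_in=8, d_out=4, n_hidden=1)
print(width_2l, dense_macs(8, 4, width_2l, 2))  # 39 1989
print(width_1l, dense_macs(8, 4, width_1l, 1))  # 167 2004
```

An exact match is rarely possible with integer widths, so logging the residual MAC gap alongside the chosen width keeps the compute-fairness claim auditable.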

---

## 1 Run matrices

### Symbols

| Token      | Meaning                                               |
| ---------- | ----------------------------------------------------- |
| **TOP**    | `{RS, WS, MOD, WS+MOD, FC-CAP, FC-FLOP, PRUNED, DST}` |
| **LAY**    | `{1 L, 2 L, 3 L}`                                     |
| **TSK-P1** | `{Swingup, Lunar, MtCarC}`                            |
| **λ**      | TD-lambda `{0.95, 0.99}`                              |
| **ALG**    | `PPO` (A2C only in smoke)                             |
| **SEED**   | `7` (smoke = 1)                                       |

### Phase 1 (single-task fitness)

| Matrix       | Factors                                                       | Levels            | Runs      |
| ------------ | ------------------------------------------------------------- | ----------------- | --------- |
| **Smoke-P1** | TOP (4) × LAY (2 L) × TSK-P1 (Swingup) × ALG (A2C) × SEED (1) | 4                 | **4**     |
| **Full-P1**  | TOP (8) × LAY (3) × TSK-P1 (3) × λ (2) × ALG (1) × SEED (7)   | 8 × 3 × 3 × 2 × 7 | **1 008** |

*WS fixed β = 0.10; MOD fixed M = 4; all sparse masks sized to match FC-FLOP MACs.*

---

### Phase 2 (continual)

“Winners” = top-3 of Phase 1 on **AUC/param** aggregated over tasks.

| Token    | Meaning                                           |
| -------- | ------------------------------------------------- |
| **WIN**  | `3` topologies                                    |
| **CUR**  | `{Continual-World, MiniGrid-LG, ProcGen-CoinRun}` |
| **MODE** | `{FZ (freeze), FT (fine-tune)}`                   |
| **REG**  | `{none, EWC, SI}`                                 |

| Matrix       | Factors                                                | Levels            | Runs    |
| ------------ | ------------------------------------------------------ | ----------------- | ------- |
| **Smoke-P2** | WIN (1) × CUR (CW) × MODE (FT) × REG (none) × SEED (1) | 1                 | **1**   |
| **Full-P2**  | WIN (3) × CUR (3) × MODE (2) × REG (3) × SEED (7)      | 3 × 3 × 2 × 3 × 7 | **378** |

λ is fixed, per WIN topology, to the value that maximised AUC in Phase 1.
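The winner selection can be scripted so that the promotion step is reproducible. The blueprint does not say how AUC/param is aggregated over tasks; the sketch below assumes a mean of per-task ranks (so a task with a larger return scale cannot dominate) and a balanced design in which every topology was run on every task. The record keys are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def top_k_winners(results, k=3):
    """results: one dict per run with keys 'topo', 'task', 'auc', 'n_params'."""
    # 1) Average AUC/param over seeds within each (topology, task) cell.
    cell = defaultdict(list)
    for r in results:
        cell[(r["topo"], r["task"])].append(r["auc"] / r["n_params"])
    cell_mean = {key: mean(vals) for key, vals in cell.items()}

    # 2) Rank topologies within each task (1 = best). Assumes a balanced design.
    tasks = sorted({task for _, task in cell_mean})
    topos = sorted({topo for topo, _ in cell_mean})
    ranks = defaultdict(list)
    for task in tasks:
        ordered = sorted(topos, key=lambda p: cell_mean[(p, task)], reverse=True)
        for rank, topo in enumerate(ordered, start=1):
            ranks[topo].append(rank)

    # 3) Aggregate by mean rank; ties fall back to alphabetical order.
    return sorted(topos, key=lambda p: mean(ranks[p]))[:k]


# Toy data: AUC/param values per (topology, task), all with 100 parameters.
toy = {"A": (0.5, 0.1), "B": (0.4, 0.3), "C": (0.1, 0.2), "D": (0.2, 0.05)}
results = [
    {"topo": p, "task": t, "auc": v * 100, "n_params": 100}
    for p, vals in toy.items()
    for t, v in zip(("t1", "t2"), vals)
]
winners = top_k_winners(results, k=2)
print(winners)  # ['B', 'A']
```

If a different aggregation is preferred (for example, a mean of normalised AUC/param), only step 2 and 3 need to change; the per-cell averaging over seeds stays the same.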

---

### Phase 3 (NDP-driven rigidity)

| Token     | Meaning                        |
| --------- | ------------------------------ |
| **RIG**   | `{rigid-all, soft-gates, NDP}` |
| **CUR**   | same 3 curricula               |
| **TOPO★** | single best from Phase 2       |

| Matrix       | Factors                                                      | Levels        | Runs   |
| ------------ | ------------------------------------------------------------ | ------------- | ------ |
| **Smoke-P3** | TOPO★ × CUR (CW) × RIG (rigid-all vs. soft-gates) × SEED (1) | 2             | **2**  |
| **Full-P3**  | TOPO★ × CUR (3) × RIG (3) × SEED (7)                         | 1 × 3 × 3 × 7 | **63** |

---

## 2 What each run records  (Δ = new item)

| Block                | Fields                                                               |
| -------------------- | -------------------------------------------------------------------- |
| **Meta**             | git-hash, GPU, Torch seed, graph seed                                |
| **Topology static**  | N, E, β or M, execution-order DAG hash, **path-multiplicity Δ**      |
| **Topology dynamic** | adjacency snapshot every 10 k steps → `σ(t), Q(t)`                   |
| **Training**         | LR, entropy, λ, optimiser stats                                      |
| **Compute**          | params, MACs, *measured* ms/forward                                  |
| **Learning curves**  | episodic return, loss values                                         |
| **Continual only**   | per-task { first-reward, final, forgetting Δ, Fisher diag (×layer) } |
| **Phase 3 only**     | rigidity α‐vector snapshots, plasticity cost                         |

All dumped in `[run-id].npz` + `[run-id].json`.
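The continual-only fields can be derived from a single reward matrix. The sketch below assumes `R[i][j]` holds the mean reward on task `j` evaluated after training on task `i` has finished, and defines forgetting as the best earlier score minus the final score, which is one common convention rather than something the blueprint specifies. Forward transfer is omitted because it needs a from-scratch baseline per task.

```python
def continual_metrics(R):
    """R[i][j]: reward on task j after finishing training on task i (square matrix)."""
    n = len(R)
    final = R[-1]
    forgetting = []
    for j in range(n - 1):  # the last task has had no chance to be forgotten
        best_earlier = max(R[i][j] for i in range(j, n - 1))
        forgetting.append(best_earlier - final[j])
    return {
        "final_avg_reward": sum(final) / n,
        "per_task_forgetting": forgetting,
        "avg_forgetting": sum(forgetting) / len(forgetting) if forgetting else 0.0,
    }


# Illustrative 3-task curriculum.
R = [
    [10.0, 1.0, 0.0],
    [8.0, 12.0, 2.0],
    [6.0, 9.0, 15.0],
]
metrics = continual_metrics(R)
print(metrics["per_task_forgetting"], metrics["avg_forgetting"])  # [4.0, 3.0] 3.5
```

Storing the full matrix in the `.npz` and only the summary in the `.json` keeps the option open to recompute forgetting under a different definition later.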

---

## 3 Metrics & statistical tests

| Phase  | Primary metrics                                        | Tests                                                                | Secondary / new                                        |
| ------ | ------------------------------------------------------ | -------------------------------------------------------------------- | ------------------------------------------------------ |
| **P1** | • AUC/param • AUC/MACs • Early-learn slope (0–50 eps)  | Two-way RM-ANOVA (TOP × LAY) per task; factors balanced with 7 seeds | • Δσ = σ(final)−σ(init) • ΔQ                           |
| **P2** | • Avg Forgetting • Forward Transfer • Final Reward     | 3-way RM-ANOVA (Topology × MODE × REG)                               | Regression: Forgetting ∼ Fisher loss + Δσ              |
| **P3** | • Same as P2 + Plasticity-cost                         | RM-ANOVA (RIG)                                                       | Mantel test: rigidity vs. adjacency; MI(rigid, module) |

All ANOVAs checked for homoscedasticity; Welch variant if violated.
p-values FDR-corrected across tasks and curricula.
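The blueprint does not name the FDR procedure. A minimal sketch of the Benjamini-Hochberg step-up adjustment, assumed here as the most common choice, is shown below in plain Python; the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values, e.g. one ANOVA per task or curriculum.
raw = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(raw)
print([round(x, 4) for x in q])  # [0.04, 0.0533, 0.0533, 0.2]
```

The family of tests being corrected should be fixed before looking at the results; here that would be all per-task and per-curriculum tests within a phase.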

---

## 4 Smoke-test checklist (revised)

1. **Connectivity OK?** auto-redraw until connected.
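2. **Adjacency snapshots** file written? Number of snapshots == ⌈steps / 10 k⌉.
3. **Dynamic metrics finite?** No NaN in σ(t).
4. **Wall-clock log** exists → confirms the sparse kernels are not blowing up runtime.
5. Quick Welch t-test across seeds to ensure variance ≠ 0 (this needs at least 2 seeds per group, so add a second seed to the smoke matrix for this check).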
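Checks 1 and 2 can be scripted with the standard library alone. The sketch below treats the graph as undirected when testing connectivity and draws a uniformly random edge set, both of which are assumptions; a directed execution-order DAG would need a reachability check from the inputs instead.

```python
import math
import random
from collections import deque


def is_connected(n, edges):
    """BFS connectivity test on an undirected graph with nodes 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u] - seen:
            seen.add(v)
            queue.append(v)
    return len(seen) == n


def draw_connected_graph(n, n_edges, seed, max_tries=1000):
    """Redraw a random edge set until it is connected; return (edges, attempts)."""
    rng = random.Random(seed)
    all_pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    for attempt in range(1, max_tries + 1):
        edges = rng.sample(all_pairs, n_edges)
        if is_connected(n, edges):
            return edges, attempt
    raise RuntimeError("no connected graph found; raise n_edges or max_tries")


def expected_snapshots(total_steps, every=10_000):
    """Number of adjacency snapshots the log should contain."""
    return math.ceil(total_steps / every)


edges, attempts = draw_connected_graph(n=16, n_edges=30, seed=0)
n_snapshots = expected_snapshots(25_000)  # ceil(25000 / 10000) = 3
```

Logging `attempts` next to the graph seed is cheap and shows how often the redraw was needed, which matters if very sparse regimes bias the sampled topologies.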

---

### What changed vs. the first blueprint?

| Old                         | New                                                            |
| --------------------------- | -------------------------------------------------------------- |
| Edge-matched FC baseline    | **Dropped** (unfair).  Replaced with **FC-FLOP** & **FC-CAP**. |
| 3 seeds                     | **7 seeds** in every full matrix.                              |
| CartPole                    | **CartPole-Swingup** (harder).                                 |
| No dynamic masks            | Added **DST-RigL** baseline.                                   |
| Static topology logged once | **Snapshot every 10 k steps** for Δσ, ΔQ.                      |
| No pruning baseline         | Added **40 % magnitude-pruned FC**.                            |
| Single LR                   | **Per-topology LR/entropy grid-search** (pre-phase).           |
| No λ factor                 | Added **TD-λ ∈ {0.95, 0.99}** (Phase 1 only).                  |

---

## 5 Why the new design plugs the leaks

* *Dynamic drift* captured via Δσ/ΔQ → if all graphs converge, we know the initial mask mattered only for speed.
* *Compute fairness* comes from FC-FLOP runs and measured wall-clock.
* *Optimiser bias* killed by per-topology LR tuning.
* *Statistical power* rises with 7 seeds.
* *Modern baselines* (pruning, DST) ensure the story is relevant in 2025 literature.
* *Hardware caveat* documented via ms/forward.

---

### Ready to launch

Spin the **Smoke-P1** matrix first (4 tiny runs).
If logs tick every box, lock the git-hash and scale out with the *Full matrices* above.
Ping me when you want the LR-tuner script, the RigL patch, or the ANOVA notebook template.
