### High-level flow across the three phases

```
Smoke-test  ➜  Full Phase-1 sweep  ➜  Narrow winners
            ➜  Smoke-test Phase-2  ➜  Full continual sweep
                                    ➜  Narrow winners
                                           ➜ Smoke-test Phase-3 (NDP)
                                           ➜ Full NDP rigidity sweep
```

Each arrow is a **gated checkpoint**: you analyse the previous results, trim the factor space, and promote only the most informative settings to the next, heavier stage.

---

## 📑 Run matrices

### Notation

* **Topo**: FC (fully connected), RS (random-sparse), WS (Watts-Strogatz small-world), MOD (modular), WS+MOD
* **Reg** (regime): EE (equal-edge), EP (equal-parameter)
* **Lay**: 1-L, 2-L, 3-L
* **Task-C1**: CartPole, LunarLander, MtCarCont
* **Alg**: A2C, PPO
* **Seed**: 3 RNG seeds
* **Curr**: Continual-World (CW), MiniGrid-LevelGen (MG), ProcGen (PG)
* **Mode**: FZ (freeze backbone), FT (free-tune backbone)
* **RegBas**: none, EWC, SI
* **RigidityCtrl**: rigid-all, soft-gates (learned), NDP-evolved

---

### Phase 1: single-task topology fitness

| Matrix    | Factors                                 | Levels                | Total runs |
| --------- | --------------------------------------- | --------------------- | ---------- |
| **Smoke** | Topo × Reg × Lay × Task-C1 × Alg × Seed | 5 × 1 × 1 × 1 × 1 × 1 | **5**      |
| **Full**  | Topo × Reg × Lay × Task-C1 × Alg × Seed | 5 × 2 × 3 × 3 × 2 × 3 | **540**    |

*(WS uses β = 0.10 only; MOD uses M = 4 only in Phase-1.)*
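
The factor grid above can be expanded mechanically, which also double-checks the run counts. This is a minimal sketch: the dictionary keys, the seed values and the `expand` helper are illustrative names, and the choice of which single level the smoke matrix pins for each non-topology factor is an assumption (the table only fixes the counts).

```python
from itertools import product

# Factor levels copied from the Notation list; labels are illustrative.
FULL_P1 = {
    "topo": ["FC", "RS", "WS", "MOD", "WS+MOD"],
    "regime": ["EE", "EP"],
    "layers": [1, 2, 3],
    "task": ["CartPole", "LunarLander", "MtCarCont"],
    "alg": ["A2C", "PPO"],
    "seed": [0, 1, 2],
}

# Smoke matrix: every topology, but only the first level of every other factor.
# Picking the *first* level is an assumption made for this sketch.
SMOKE_P1 = {k: v[:1] for k, v in FULL_P1.items()}
SMOKE_P1["topo"] = FULL_P1["topo"]


def expand(matrix):
    """Return one config dict per cell of the full factorial grid."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*matrix.values())]


smoke_runs = expand(SMOKE_P1)
full_runs = expand(FULL_P1)
print(len(smoke_runs), len(full_runs))
```

Each returned dict can be handed directly to a launcher, so the table and the actual job list cannot drift apart.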

---

### Phase 2: continual learning

| Matrix    | Factors                                                                             | Levels            | Total runs |
| --------- | ----------------------------------------------------------------------------------- | ----------------- | ---------- |
| **Smoke** | **Top winners from Phase-1 (2)** × Curr (CW) × Mode (FT) × RegBas (none) × Seed (1) | 2                 | **2**      |
| **Full**  | **Top 3 winners** × Curr × Mode × RegBas × Seed                                     | 3 × 3 × 2 × 3 × 3 | **162**    |

*“Winners” = highest AUC/parameter on all three tasks in Phase-1.*
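
One way to turn the winner rule into code is sketched below. The AUC is a plain trapezoid over the return-vs-timestep curve; combining tasks by mean rank is an assumption of this sketch (the text does not fix an aggregation rule), chosen because raw AUC values are not comparable across tasks with different reward scales. The `results` layout and both function names are hypothetical.

```python
def auc(returns, steps):
    """Trapezoidal area under the return-vs-timestep curve."""
    return sum(
        0.5 * (returns[i] + returns[i - 1]) * (steps[i] - steps[i - 1])
        for i in range(1, len(steps))
    )


def pick_winners(results, k):
    """Rank topologies by AUC/parameter per task, then sort by mean rank.

    results: {topology: {task: (returns, steps, n_params)}}  (assumed layout)
    """
    tasks = sorted(next(iter(results.values())))
    mean_rank = {topo: 0.0 for topo in results}
    for task in tasks:
        score = {
            topo: auc(*runs[task][:2]) / runs[task][2]
            for topo, runs in results.items()
        }
        ordered = sorted(score, key=score.get, reverse=True)
        for rank, topo in enumerate(ordered):
            mean_rank[topo] += rank / len(tasks)
    # Lowest mean rank = best across tasks.
    return sorted(mean_rank, key=mean_rank.get)[:k]
```

With several seeds per cell, the curves would be averaged over seeds before calling `auc`.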

---

### Phase 3: developmental rigidity (NDP)

| Matrix    | Factors                                                                                              | Levels        | Total runs |
| --------- | ---------------------------------------------------------------------------------------------------- | ------------- | ---------- |
| **Smoke** | Single topology (best from Phase-2) × Curr (CW) × RigidityCtrl (rigid-all vs. soft-gates) × Seed (1) | 1 × 1 × 2 × 1 | **2**      |
| **Full**  | Same topology × Curr × RigidityCtrl × Seed                                                           | 1 × 3 × 3 × 3 | **27**     |

*(Soft-gates = learnable α ∈ [0, 1] per edge; NDP-evolved = third level of RigidityCtrl.)*
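
A rough sketch of how per-edge gates and the plasticity cost Σ(1−α) logged for Phase-3 runs could be computed. Parametrising each α as the sigmoid of an unconstrained value is an assumption made here to keep α inside the unit interval; the function names are hypothetical.

```python
import math


def gates_from_logits(logits):
    """Map unconstrained per-edge parameters to alpha in (0, 1) via a sigmoid."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def plasticity_cost(alpha):
    """Sum of (1 - alpha) over all edges."""
    return sum(1.0 - a for a in alpha)


rigid_all = [1.0] * 4                               # every edge fully rigid
soft = gates_from_logits([0.0, 2.0, -2.0, 0.0])     # learned, partly plastic
print(plasticity_cost(rigid_all), plasticity_cost(soft))
```

Under this convention the rigid-all condition has zero plasticity cost by construction, which gives the soft-gates and NDP conditions a fixed reference point.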

---

## 📥 What every run must write to disk

| Category               | Fields                                                                       |
| ---------------------- | ---------------------------------------------------------------------------- |
| **Meta**               | run-ID, date, GPU type, code git-hash, PyTorch seed, graph seed              |
| **Topology**           | N, E, regime (EE/EP), knob values (p, β, M…), adjacency hash                 |
| **Structural metrics** | ρ, ⟨k⟩, C, L, σ, Q, assortativity r, diameter D                              |
| **Compute**            | #parameters, MACs/forward, wall-clock per 1 k steps                          |
| **Learning curve**     | episodic return √ó timestep (numpy array)                                     |
| **Continual only**     | per-task: first-episode reward, final reward, forgetting Δ, forward-transfer |
| **Phase-3 only**       | rigidity mask (vector), plasticity cost (Σ(1−α)), NDP genotype               |

Everything is saved as a single `.npz` plus a metadata `.json` per run.
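
A minimal sketch of the per-run writer, assuming NumPy is available and that array-valued fields (learning curve, masks) go into the `.npz` while scalar metadata goes into the `.json`. The `save_run` name and its argument layout are hypothetical.

```python
import json
from pathlib import Path

import numpy as np


def save_run(out_dir, run_id, arrays, meta):
    """Write <run_id>.npz (arrays) and <run_id>.json (scalar metadata)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    npz_path = out_dir / f"{run_id}.npz"
    json_path = out_dir / f"{run_id}.json"
    # Compressed archive: one named array per logged quantity.
    np.savez_compressed(npz_path, **arrays)
    # Sorted keys keep diffs between runs readable.
    with open(json_path, "w") as fh:
        json.dump(meta, fh, indent=2, sort_keys=True)
    return npz_path, json_path
```

A call might look like `save_run("runs", "p1-ws-seed0", {"returns": curve}, {"git_hash": "...", "torch_seed": 0})`, where the run ID and field names are placeholders.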

---

## 📊 Metrics to evaluate

| Phase | Primary figures                                                                  | Statistical tests                           | Secondary analyses                                                      |
| ----- | -------------------------------------------------------------------------------- | ------------------------------------------- | ----------------------------------------------------------------------- |
| **1** | • AUC of reward curve<br>• Final reward at 1 M steps<br>• Params vs. AUC scatter | 2-way ANOVA: Topo × Reg per task            | Pearson/Spearman of AUC with (σ, Q, ρ)                                  |
| **2** | • Avg Forgetting<br>• Forward Transfer<br>• Final average reward                 | 3-way ANOVA: Topology × Mode × RegBas       | Plasticity cost vs. Forgetting regression                               |
| **3** | • Avg Forgetting with rigidity control<br>• Fraction rigid edges vs. performance | Repeated-measures ANOVA across RigidityCtrl | Mantel test: rigidity-matrix ↔ adjacency; MI(rigid mask, module labels) |

All plots share a common style-sheet; violin + box for ANOVA, line plots for curves, bar charts for forgetting.
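
The blueprint does not define "Avg Forgetting" formally. The sketch below assumes a commonly used definition: for each task except the last, take its best evaluation return at any stage before the final one and subtract its return after the final task. The evaluation matrix `R` and the function name are assumptions of this sketch.

```python
def average_forgetting(R):
    """Mean drop from best earlier return to final return, over all but the last task.

    R[i][j] = evaluation return on task j after finishing training on task i.
    Requires at least two tasks.
    """
    last = len(R) - 1
    drops = [
        max(R[i][j] for i in range(j, last)) - R[last][j]
        for j in range(last)
    ]
    return sum(drops) / len(drops)


# Three tasks trained in sequence; rows are evaluation sweeps after each task.
R = [
    [10, 0, 0],
    [8, 9, 0],
    [6, 7, 10],
]
print(average_forgetting(R))
```

A value of zero means nothing was lost; negative values indicate backward transfer.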

---

### 🔁 Smoke-test check-list (identical for each phase)

1. Launch the smoke matrix on one GPU/CPU.
2. Verify logs contain **all** mandatory fields.
3. Plot quick-and-dirty curves; confirm no NaNs or diverging losses.
4. Run the exact ANOVA script; expect p-values even if not meaningful.
5. Push artefacts to repo; freeze the code version before full sweep.
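
Steps 2 and 3 of the check-list are easy to automate. The sketch below checks one run for missing metadata and non-finite curve values; the field names are illustrative stand-ins for the "Meta" row of the logging table, not an agreed schema.

```python
import math

# Illustrative names for the mandatory "Meta" fields.
MANDATORY_META = {"run_id", "date", "gpu", "git_hash", "torch_seed", "graph_seed"}


def check_run(meta, returns):
    """Return a list of problems; an empty list means the run passes."""
    missing = sorted(MANDATORY_META - set(meta))
    problems = [f"missing field: {name}" for name in missing]
    if not returns:
        problems.append("empty learning curve")
    elif any(math.isnan(r) or math.isinf(r) for r in returns):
        problems.append("non-finite value in learning curve")
    return problems
```

Running this over every smoke run and failing on the first non-empty list gives a cheap gate before the code version is frozen.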

Following this blueprint, you move through the three phases with **controlled scope**, so that variance in the final results can be traced back to **topological structure, not to hidden hyper-parameters or sloppy logging**.


# Additions to make it more fool-proof

Below is an **updated experimental blueprint** that *absorbs every red-flag* raised in the critical review.
Green = new / changed; strike-through = dropped.

---

## 0 Principles now baked-in

* **≥ 7 seeds** for every “full” matrix → enough statistical power.
* **Harder cheap tasks** (CartPole-Swingup, LunarLander, MountainCarCont) instead of rapidly-saturated CartPole.
* FC is **never “edge-matched”**; we use

  * **FC-CAP**: a “mainstream” width (e.g. 128) – capacity reference;
  * **FC-FLOP**: width chosen so dense MACs ≈ sparse MACs – compute-fairness reference.
* Two *extra* baselines demanded by the literature:

  * **PRUNED-40 %** (magnitude-pruned FC);
  * **DST-RigL** (dynamic sparse training starting from a random mask).
* Each topology gets *its own* tuned learning rate and entropy coefficient (picked by a tiny LR grid on CartPole-Swingup **before** logging any results).
* Verify graph **connectedness**, log the **execution-order DAG**, and snapshot adjacency every **10 k steps**.
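
The FC-FLOP width can be derived rather than hand-picked. The sketch below assumes an MLP with equal-width hidden layers, ignores biases, and treats the sparse network's cost as one MAC per edge; both function names are hypothetical.

```python
def dense_macs(width, n_in, n_out, n_hidden_layers):
    """MACs per forward pass of an MLP with equal-width hidden layers (no biases)."""
    return (
        n_in * width
        + (n_hidden_layers - 1) * width * width
        + width * n_out
    )


def fc_flop_width(target_macs, n_in, n_out, n_hidden_layers):
    """Hidden width whose dense MAC count is closest to target_macs."""
    width = 1
    # dense_macs grows with width, so step up until the next width overshoots.
    while dense_macs(width + 1, n_in, n_out, n_hidden_layers) <= target_macs:
        width += 1
    lo, hi = width, width + 1
    lo_gap = abs(dense_macs(lo, n_in, n_out, n_hidden_layers) - target_macs)
    hi_gap = abs(dense_macs(hi, n_in, n_out, n_hidden_layers) - target_macs)
    return lo if lo_gap <= hi_gap else hi
```

For example, with 8 inputs, 4 outputs, 2 hidden layers and a sparse budget of 2 000 edges, `fc_flop_width(2000, 8, 4, 2)` picks the width whose dense cost lands nearest that budget; logging both the target and the achieved MACs makes the residual mismatch visible.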

---

## 1 Run matrices

### Symbols

| Token      | Meaning                                               |
| ---------- | ----------------------------------------------------- |
| **TOP**    | `{RS, WS, MOD, WS+MOD, FC-CAP, FC-FLOP, PRUNED, DST}` |
| **LAY**    | `{1 L, 2 L, 3 L}`                                     |
| **TSK-P1** | `{Swingup, Lunar, MtCarC}`                            |
| **λ**      | TD-lambda `{0.95, 0.99}`                              |
| **ALG**    | `PPO` (A2C only in smoke)                             |
| **SEED**   | `7` (smoke = 1)                                       |

### Phase 1 (single-task fitness)

| Matrix       | Factors                                                       | Levels            | Runs      |
| ------------ | ------------------------------------------------------------- | ----------------- | --------- |
| **Smoke-P1** | TOP (4) × LAY (2 L) × TSK-P1 (Swingup) × ALG (A2C) × SEED (1) | 4                     | **4**     |
| **Full-P1**  | TOP (8) × LAY (3) × TSK-P1 (3) × λ (2) × ALG (1) × SEED (7)   | 8 × 3 × 3 × 2 × 1 × 7 | **1 008** |

*WS fixed β = 0.10; MOD fixed M = 4; all sparse masks sized to match FC-FLOP MACs.*

---

### Phase 2 (continual)

“Winners” = top-3 of Phase 1 on **AUC/param** aggregated over tasks.

| Token    | Meaning                                           |
| -------- | ------------------------------------------------- |
| **WIN**  | `3` topologies                                    |
| **CUR**  | `{Continual-World, MiniGrid-LG, ProcGen-CoinRun}` |
| **MODE** | `{FZ (freeze), FT (fine-tune)}`                   |
| **REG**  | `{none, EWC, SI}`                                 |

| Matrix       | Factors                                                | Levels            | Runs    |
| ------------ | ------------------------------------------------------ | ----------------- | ------- |
| **Smoke-P2** | WIN (1) × CUR (CW) × MODE (FT) × REG (none) × SEED (1) | 1                 | **1**   |
| **Full-P2**  | WIN (3) × CUR (3) × MODE (2) × REG (3) × SEED (7)      | 3 × 3 × 2 × 3 × 7 | **378** |

λ is fixed, for each WIN, to the value that maximised AUC in Phase 1.

---

### Phase 3 (NDP-driven rigidity)

| Token     | Meaning                        |
| --------- | ------------------------------ |
| **RIG**   | `{rigid-all, soft-gates, NDP}` |
| **CUR**   | same 3 curricula               |
| **TOPO★** | single best from Phase 2       |

| Matrix       | Factors                                                      | Levels        | Runs   |
| ------------ | ------------------------------------------------------------ | ------------- | ------ |
| **Smoke-P3** | TOPO★ × CUR (CW) × RIG (rigid-all vs. soft-gates) × SEED (1) | 2             | **2**  |
| **Full-P3**  | TOPO★ × CUR (3) × RIG (3) × SEED (7)                         | 1 × 3 × 3 × 7 | **63** |
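
For budgeting, the three revised full matrices can be tallied in a few lines. The level counts are copied from the tables above; the dictionary keys are just labels.

```python
from math import prod

# Level counts per factor for each revised full matrix.
FULL = {
    "Full-P1": {"TOP": 8, "LAY": 3, "TSK-P1": 3, "lambda": 2, "ALG": 1, "SEED": 7},
    "Full-P2": {"WIN": 3, "CUR": 3, "MODE": 2, "REG": 3, "SEED": 7},
    "Full-P3": {"TOPO*": 1, "CUR": 3, "RIG": 3, "SEED": 7},
}

runs = {name: prod(levels.values()) for name, levels in FULL.items()}
total = sum(runs.values())
print(runs, total)
```

Multiplying `total` by a measured wall-clock per run from the smoke tests gives a first estimate of the compute bill before anything is launched.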

---

## 2 What each run records (Δ = new item)

| Block                | Fields                                                               |
| -------------------- | -------------------------------------------------------------------- |
| **Meta**             | git-hash, GPU, Torch seed, graph seed                                |
| **Topology static**  | N, E, β or M, execution-order DAG hash, **path-multiplicity Δ**      |
| **Topology dynamic** | adjacency snapshot every 10 k steps → `σ(t), Q(t)`                   |
| **Training**         | LR, entropy, λ, optimiser stats                                      |
| **Compute**          | params, MACs, *measured* ms/forward                                  |
| **Learning curves**  | episodic return, loss values                                         |
| **Continual only**   | per-task { first-reward, final, forgetting Δ, Fisher diag (×layer) } |
| **Phase 3 only**     | rigidity α-vector snapshots, plasticity cost                         |

All dumped in `[run-id].npz` + `[run-id].json`.

---

## 3 Metrics & statistical tests

| Phase  | Primary metrics                                        | Tests                                                                | Secondary / new                                        |
| ------ | ------------------------------------------------------ | -------------------------------------------------------------------- | ------------------------------------------------------ |
| **P1** | • AUC/param<br>• AUC/MACs<br>• Early-learn slope (0–50 episodes) | Two-way RM-ANOVA (TOP × LAY) per task; factors balanced with 7 seeds | • Δσ = σ(final) − σ(init)<br>• ΔQ |
| **P2** | • Avg Forgetting<br>• Forward Transfer<br>• Final Reward | 3-way RM-ANOVA (Topology × MODE × REG) | Regression: Forgetting ∼ Fisher loss + Δσ |
| **P3** | • Same as P2 + Plasticity-cost | RM-ANOVA (RIG) | Mantel test: rigidity vs. adjacency; MI(rigid, module) |

All ANOVAs checked for homoscedasticity; Welch variant if violated.
p-values FDR-corrected across tasks and curricula.
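
The text does not say which FDR procedure is meant. The sketch below assumes Benjamini-Hochberg, the most common choice, and implements it in plain Python so the adjustment is transparent; a library routine would normally replace it in the real analysis notebook.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# One raw p-value per (task or curriculum) ANOVA; numbers are made up.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

An effect is then reported as significant when its adjusted value falls below the chosen FDR level.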

---

## 4 Smoke-test checklist (revised)

1. **Connectivity OK?** auto-redraw until connected.
2. **Adjacency snapshots** file written? size == ⌈steps/10 k⌉.
3. **Dynamic metrics finite?** no NaN in σ(t).
4. **Wall-clock log** exists → confirms sparse kernels not exploding.
5. Quick Welch-t across seeds to ensure variance ≠ 0.
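
Items 1 and 2 can be sketched as follows. The connectivity test treats edges as undirected (i.e. it checks weak connectivity), and the random-sparse generator simply samples a fixed number of node pairs; both are assumptions of this sketch, as are all function names.

```python
import math
import random
from collections import deque


def is_connected(n_nodes, edges):
    """Breadth-first search over an undirected edge list (n_nodes >= 1)."""
    neighbours = {v: [] for v in range(n_nodes)}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    seen, queue = {0}, deque([0])
    while queue:
        for w in neighbours[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n_nodes


def draw_connected(n_nodes, n_edges, seed, max_tries=1000):
    """Redraw a random sparse graph until it is connected."""
    rng = random.Random(seed)
    all_pairs = [(u, v) for u in range(n_nodes) for v in range(u + 1, n_nodes)]
    for _ in range(max_tries):
        edges = rng.sample(all_pairs, n_edges)
        if is_connected(n_nodes, edges):
            return edges
    raise RuntimeError("no connected graph found; increase n_edges")


def expected_snapshots(total_steps, every=10_000):
    """Number of adjacency snapshots a finished run should contain."""
    return math.ceil(total_steps / every)
```

Logging how many redraws were needed is cheap and flags edge budgets so tight that connectivity itself is biasing which graphs get tested.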

---

### What changed vs. the first blueprint?

| Old                         | New                                                            |
| --------------------------- | -------------------------------------------------------------- |
| Edge-matched FC baseline    | **Dropped** (unfair).  Replaced with **FC-FLOP** & **FC-CAP**. |
| 3 seeds                     | **7 seeds** everywhere full.                                   |
| CartPole                    | **CartPole-Swingup** (harder).                                 |
| No dynamic masks            | Added **DST-RigL** baseline.                                   |
| Static topology logged once | **Snapshot every 10 k steps** for Δσ, ΔQ.                      |
| No pruning baseline         | Added **40 % magnitude-pruned FC**.                            |
| Single LR                   | **Per-topology LR/entropy grid-search** (pre-phase).           |
| No λ factor                 | Added **TD-λ ∈ {0.95, 0.99}** (Phase 1 only).                  |

---

## 5 Why the new design plugs the leaks

* *Dynamic drift* is captured via Δσ/ΔQ → if all graphs converge, we know the initial mask mattered only for speed.
* *Compute fairness* comes from FC-FLOP runs and measured wall-clock.
* *Optimiser bias* is removed by per-topology LR tuning.
* *Statistical power* rises with 7 seeds.
* *Modern baselines* (pruning, DST) keep the story relevant to the 2025 literature.
* *Hardware caveat* is documented via ms/forward.

---

### Ready to launch

Spin the **Smoke-P1** matrix first (4 tiny runs).
If logs tick every box, lock the git-hash and scale out with the *Full matrices* above.
Ping me when you want the LR-tuner script, the RigL patch, or the ANOVA notebook template.
