# 1. Big picture 
| Phase                          | Core research goal                                                         | Delivarables                                                                                                                                                                                                                                                    | Details                                                                                                                                                                  |
| ------------------------------ | -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1. Topology -> RL fitness   | Quantify how structural form affects one-shot RL performance           | – Implement FC, random-sparse, Watts-Strogatz SW, explicit modular, SW + modular (NEAT or direct generators)  <br>– Compute structural metrics (ρ, C, L, σ, Q, etc.)  <br>– Reward curves, sample-efficiency, parameter count  <br>– Correlation / ANOVA linking structure <-> RL score | • Which RL tasks? (simple control vs. Atari vs. MuJoCo)  <br>• Which RL algorithm family (value-based, actor-critic, evolutionary)?  <br>• How many random seeds for significance?                    |
| 2. Continual / Lifelong    | Identify structures that retain & transfer across a task sequence      | – Define task curriculum (e.g., Continual-World, MiniGrid-LevelGen, ProcGen variations)  <br>– Metrics:  Forward transfer, Average forgetting, final area under reward curve  <br>– Compare topologies from Phase 1 under identical curricula                             | • Curriculum length & similarity gradient  <br>• Whether to freeze weights vs. fine-tune  <br>• Use regularisation baselines (EWC, SI, L2) as controls                                                |
| 3. NDP-controlled rigidity | Let a Neural Developmental Program decide what stays rigid vs. plastic | – Implement or re-use a Nisioti-style NDP that outputs a connection graph and a “rigidity gate” per edge/neuron  <br>– Train NDP over the same curricula; measure emergent rigidity patterns  <br>– Analyse correlations between learned rigidity and metrics from Phases 1–2          | • Encoding of rigidity (scalar temperature? binary mask? synaptic consolidation coefficient)  <br>• How often NDP can mutate structure during life  <br>• Compute cost of plasticity vs. frozen parts |



### 1 Experimental stack

| Layer                  | Plan                                                                                                                                                                                                                                             | Rationale                                                                                                             |
| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
| RL environments    | Start small: CartPole, MountainCarContinuous, LunarLander, then scale to *MiniGrid LevelGen*, *Continual-World (HalfCheetah angles)*, or *ProcGen* variants                                                                                           | Cheap early iterations; later tasks give you continuous control and vision-based curricula                           |
| RL algorithms      | • PPO (stable, on-policy)<br>• A2C (fast baseline)<br>• ES/NEAT (for graph-evolving baselines)                                                                                                                                                                 | Gives you gradient-based + evolutionary flavours, and both can share the same graph substrate                       |
| Graph generators   | • FC: trivial dense adjacency<br>• Random-sparse: Erdős–Rényi *(N, p)*<br>• Modular: Stochastic Block Model (p\_intra ≫ p\_inter)<br>• SW: Watts-Strogatz (k, β)<br>• SW+Mod: SBMs with WS inside blocks + random long-range shortcuts | All have two or three tunable knobs you can sweep while “staying inside” the topology class (as we detailed earlier) |
| Structural metrics | ρ, C, L, σ, Q, assortativity, average degree, weighted efficiency                                                                                                                                                                    | Store per-run; you’ll need them for correlation & ANOVA.                                                              |
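
A minimal sketch of the first four generators from the table, written in plain Python so it runs without extra dependencies. The function names and the edge-set representation are illustrative assumptions, not part of the plan; a graph library such as NetworkX offers equivalent generators. The SW+Mod hybrid is left out here.

```python
import random
from itertools import combinations


def fully_connected(n):
    """FC: every unordered node pair is an edge."""
    return set(combinations(range(n), 2))


def erdos_renyi(n, p, rng):
    """Random-sparse: keep each possible edge independently with probability p."""
    return {pair for pair in combinations(range(n), 2) if rng.random() < p}


def stochastic_block(sizes, p_intra, p_inter, rng):
    """Modular: edge probability is p_intra inside a block, p_inter across blocks."""
    block = [b for b, size in enumerate(sizes) for _ in range(size)]
    edges = set()
    for u, v in combinations(range(len(block)), 2):
        p = p_intra if block[u] == block[v] else p_inter
        if rng.random() < p:
            edges.add((u, v))
    return edges, block


def watts_strogatz(n, k, beta, rng):
    """SW: ring lattice (k even, k < n), each lattice edge rewired with probability beta."""
    def key(a, b):
        return (a, b) if a < b else (b, a)

    edges = {key(u, (u + j) % n) for u in range(n) for j in range(1, k // 2 + 1)}
    for u in range(n):
        for j in range(1, k // 2 + 1):
            old = key(u, (u + j) % n)
            if old in edges and rng.random() < beta:
                free = [w for w in range(n) if w != u and key(u, w) not in edges]
                if free:  # rewire only when a new, non-duplicate target exists
                    edges.remove(old)
                    edges.add(key(u, rng.choice(free)))
    return edges


rng = random.Random(0)  # fixed seed so a topology can be regenerated per run
graphs = {
    "fc": fully_connected(20),
    "random_sparse": erdos_renyi(20, 0.2, rng),
    "modular": stochastic_block([10, 10], 0.6, 0.05, rng)[0],
    "small_world": watts_strogatz(20, 4, 0.1, rng),
}
for name, edges in graphs.items():
    print(name, len(edges))
```

Rewiring in this sketch preserves the edge count, which makes it easier to compare small-world graphs against a ring lattice or a random graph with the same budget.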



### 2 Statistics & analysis blueprint

1. Phase 1 cross-section:
   Two-way ANOVA with factors Topology × Task on (a) final episodic reward, (b) AUC of learning curve, (c) parameter count
   Follow with Pearson/Spearman correlation between each structural metric and each performance metric

2. Phase 2 continual:
   Compute for every task t in the curriculum:

   $$
   \text{Forgetting}(t)=\max_{k\le t} R_k - R_t
   $$

   then Average Forgetting, Forward Transfer (first-episode reward on task t+1), and Backward Transfer
   Same ANOVA structure but substitute Topology × Curriculum

3. Phase 3 rigidity patterns:
   *Cluster* learned rigidity coefficients and relate cluster membership to (a) module boundaries, (b) node centrality, (c) structural metric values
   Possible tools: Mantel test (matrix correlation between rigidity matrix and adjacency), mutual information between rigidity mask and community labels
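
The Phase 2 quantities can be sketched as follows. This assumes a hypothetical evaluation matrix `R`, where `R[i][j]` is the mean evaluation reward on task `j` measured after training on task `i` has finished; that layout, the function name, and the toy numbers are assumptions for illustration. Forgetting follows the formula above, with the maximum taken over all stages from the one where the task was trained up to the final one. Forward transfer is approximated by the reward on the next task before any training on it, as a stand-in for the first-episode reward.

```python
def continual_metrics(R):
    """Summarise a (stages x tasks) reward matrix R, with R[i][j] as described above."""
    last = len(R) - 1  # index of the final training stage
    # Forgetting per task: best reward seen since the task was trained minus final reward.
    forgetting = [
        max(R[i][j] for i in range(j, last + 1)) - R[last][j] for j in range(last)
    ]
    # Backward transfer: change on old tasks between "just trained" and "end of curriculum".
    backward = [R[last][j] - R[j][j] for j in range(last)]
    # Forward transfer proxy: reward on task t+1 right after training on task t.
    forward = [R[t][t + 1] for t in range(last)]
    return {
        "forgetting_per_task": forgetting,
        "average_forgetting": sum(forgetting) / last,
        "backward_transfer": sum(backward) / last,
        "forward_transfer": sum(forward) / last,
    }


# Toy 3-task curriculum with made-up normalised rewards.
R = [
    [1.0, 0.2, 0.1],
    [0.8, 0.9, 0.3],
    [0.6, 0.7, 1.0],
]
metrics = continual_metrics(R)
print(metrics)
```

One such dictionary per (topology, curriculum, seed) run gives the dependent variables for the Topology × Curriculum ANOVA.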

### 3 Open design questions

| Question                                  | Why it matters                                                                        | Typical choices                                                                      |
| ----------------------------------------- | ------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
| What counts as “good” RL performance? | Sample-efficiency vs. asymptotic vs. compute budget can lead to opposite conclusions. | Use both AUC (early learning) and final reward; normalise by FLOPs or wall-clock.    |
| Weight-sharing vs. per-edge weights?  | Graphs with repeated edge weights behave differently (esp. modular).                  | Probably unique weights; but you can test hash-based parameter tying as an ablation. |
| Plasticity rule during lifetime?      | Hebbian, Oja, or gradient? For NDP you might inject online Hebbian updates.           | Start with SGD/Adam only; later add Hebbian channels if time allows.                 |
| Plasticity cost                       | Free plasticity biases toward fully plastic networks.                                 | Impose an L1 penalty on non-rigid gates or limit number of plastic edges.            |
| Computation budget                    | Determines number of random seeds × tasks × topologies you can afford.                | Rough planning: 5 seeds × 5 topologies × 4 tasks ≈ 100 training runs per phase.      |
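
The rough budget in the last row can be checked by enumerating the run grid. The topology labels and the choice of four tasks below are placeholders, since the task set is still an open question.

```python
from itertools import product

seeds = range(5)
topologies = ["fc", "random_sparse", "small_world", "modular", "sw_modular"]
tasks = ["CartPole", "MountainCarContinuous", "LunarLander", "MiniGrid"]  # placeholder choice

# One dictionary per training run; each can be handed to a job scheduler.
runs = [
    {"seed": seed, "topology": topology, "task": task}
    for seed, topology, task in product(seeds, topologies, tasks)
]
print(len(runs))  # 5 * 5 * 4 = 100
```

Adding a further factor (for example PPO vs. A2C vs. ES) multiplies the count again, so it is worth fixing the grid before committing compute.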

# 2. Network Analysis

### 1. Metrics that distinguish the five topologies

| Metric (symbol)                                       | Why it matters              | Fully-Connected         | Random (Erdős-Rényi) | Small-World (Watts-Strogatz-like) | Modular (Community-rich)                     | Small-World + Modular                                       |
|---------------------------------------------------|-------------------------|---------------------|------------------|-------------------------------|------------------------------------------|---------------------------------------------------------|
|Edge density ρ = 2E / (N(N−1))                 | Basic sparsity              | ρ = 1                   | ρ ≈ p (user-set)     | ρ low-to-medium                   | medium overall, high inside modules          | similar to Modular                                          |
|Degree distribution P(k)                         | Homogeneity vs. dispersion  | δ-function at N−1       | Binomial → Poisson   | Narrow (peaks near k)             | Multimodal (peaks per module)                | Multimodal + local narrow peaks                             |
|Global clustering coefficient C                 | Triangle abundance          | C = 1                   | C ≈ ρ (tiny)         | C ≫ C\_random                     | C\_high inside modules, moderate global      | C\_high inside modules, moderate to high global             |
|Characteristic path length L                    | Global efficiency           | L = 1                   | L ≈ log N / log ⟨k⟩  | L ≈ L\_random (small)             | L inflated by inter-module hops              | Close to random at global scale, but modules shrink local L |
|Small-world index σ = (C/C\_random)/(L/L\_random) | Captures SW trade-off       | σ = 1 trivially (the density-matched random graph is also complete; uninformative) | σ ≈ 1                | σ ≫ 1 (typically > 2)             | σ modest to large (driven by intra-module C) | σ large                                                     |
|Modularity (Newman Q)                            | Detects community structure | Q ≈ 0                     | Q ≈ 0                  | Q ≈ 0–0.1                           | Q ≈ 0.3–0.9                                  | Q ≈ 0.3–0.9                                                 |
|Assortativity r                                  | Like-with-like linking      | undefined (all degrees equal) | ≈ 0                  | ≈ 0                               | may be ± depending on module sizes           | similar to Modular                                          |
|Diameter D                                        | Worst-case distance         | 1                       | \~log N              | \~log N                           | grows with module spacing                    | \~log N                                                     |


* Fully connected - stands out by the unique combination ρ = 1, C = 1, L = 1
* Random sparse - is the only one with C ≈ ρ and σ ≈ 1
* Small-world - shows C ≫ C\_random while keeping L close to random, giving σ ≫ 1 but Q ≈ 0
* Modular - pushes Q high; C is high within modules but global L increases
* Small-world + Modular - inherits high Q and high σ; it looks modular locally and small-world globally
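
A minimal sketch of how ρ, C, L and σ from the table could be computed on an edge set, using only the standard library. C is taken here as transitivity (closed triples over all connected triples), and σ uses the analytic random-graph references from the table (C\_random ≈ ρ, L\_random ≈ log N / log ⟨k⟩) rather than sampled random graphs; both choices and all function names are assumptions.

```python
import math
from collections import deque
from itertools import combinations


def adjacency(n, edges):
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def density(n, edges):
    return 2 * len(edges) / (n * (n - 1))


def transitivity(adj):
    """Fraction of connected triples (paths of length 2) that are closed into triangles."""
    closed = triples = 0
    for nbrs in adj.values():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples if triples else 0.0


def char_path_length(adj):
    """Mean BFS distance over all reachable ordered node pairs."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


def small_world_sigma(n, edges):
    """sigma with analytic references; needs mean degree > 1 so the log is positive."""
    adj = adjacency(n, edges)
    mean_k = 2 * len(edges) / n
    c_rand = density(n, edges)
    l_rand = math.log(n) / math.log(mean_k)
    return (transitivity(adj) / c_rand) / (char_path_length(adj) / l_rand)


complete = set(combinations(range(6), 2))
ring = {tuple(sorted((u, (u + j) % 20))) for u in range(20) for j in (1, 2)}
print(density(6, complete), transitivity(adjacency(6, complete)))
print(transitivity(adjacency(20, ring)), small_world_sigma(20, ring))
```

For disconnected graphs the path length above silently ignores unreachable pairs, so connectivity should be checked separately before comparing L across topologies.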


### 2. Tunable parameters that keep the network class

| Topology                  | Parameter that can be varied                                                                                                                                                                                                                                           |
|-----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|Fully connected      | •N (nodes)– any positive integer.<br>• Weighted: edge weights can vary, but every possible edge must exist                                                                                                                                                                                     |
|Random (E-R)         | •p (link probability)in (0, 1) – as long as p is not 1 and edges are independent.<br>•Narbitrary.<br>• Can allow self-loops or direction without breaking “random independent” nature                                                                                                              |
|Small-world          | •k (mean degree)– choose any even k ≪ N.<br>•β (rewire probability)in roughly 0.01–0.2 range keeps high C and low L; β→0 becomes a ring lattice (not SW) and β→1 becomes random<br>•N arbitrary.                                                                                              |
|Modular              | •M (number of modules/communities)≥ 2.<br>•p\_intra vs. p\_inter– keep p\_intra ≫ p\_inter so that Q stays high (>0.3).<br>•Module size distributioncan be equal or heterogeneous.<br>•Edge sparsity-: can be dense inside modules and still modular if between-module density stays low |
|Small-world + Modular| All Modular knobs-plus-a-shortcut strategy (e.g., sprinkle random long-range links) whose density preserves small global L while leaving intra-module C high. Typical design: <10 % of all edges are long shortcuts.<br>• Rewiring confined to inter-module edges helps maintain modularity          |
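
To check during a parameter sweep that a generated graph is still in the modular class (Q > 0.3), Newman's Q can be evaluated against the planted module labels. The sketch below uses the standard formula Q = Σ\_c [e\_c / m − (d\_c / 2m)²], where e\_c is the number of edges inside module c, d\_c the summed degree of its nodes, and m the total edge count; the function name and the toy graph are assumptions.

```python
def modularity(edges, community):
    """Newman's Q for an undirected, unweighted edge list and a node -> module mapping."""
    m = len(edges)
    if m == 0:
        return 0.0
    intra = {}       # edges with both endpoints inside the same module
    degree_sum = {}  # summed node degree per module
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(
        intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items()
    )


# Two triangles: first disconnected, then joined by one inter-module edge.
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
print(modularity(triangles, labels))
print(modularity(triangles + [(2, 3)], labels))
```

Each added inter-module edge lowers Q, which is the effect that the p\_intra ≫ p\_inter constraint is meant to bound.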


What we vary:
1. Same Edge-Count vs. Same Parameter Count across topologies
2. 1-Layer, 2-Layer, 3-Layer
3. CartPole, LunarLander, GridWorld
4. Seed 1, Seed 2, Seed 3
5. PPO, A2C, ES
6. MLP, RNN
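
For the "same edge-count" condition in item 1, generator parameters can be derived from a shared edge budget. The sketch below assumes equal-sized modules and a fixed p\_intra / p\_inter ratio for the SBM; the function name, its arguments, and the example numbers are hypothetical.

```python
def match_edge_budget(n, target_edges, n_modules, intra_to_inter_ratio):
    """Generator parameters whose expected edge count is approximately target_edges."""
    max_edges = n * (n - 1) // 2
    if not 0 < target_edges <= max_edges:
        raise ValueError("target_edges must lie in (0, N(N-1)/2]")
    size = n // n_modules  # assumes n is divisible by n_modules
    intra_pairs = n_modules * size * (size - 1) // 2
    inter_pairs = max_edges - intra_pairs
    # Solve E = p_intra * intra_pairs + p_inter * inter_pairs with p_intra = ratio * p_inter.
    p_inter = target_edges / (intra_to_inter_ratio * intra_pairs + inter_pairs)
    p_intra = intra_to_inter_ratio * p_inter
    if p_intra > 1:
        raise ValueError("edge budget too large for this ratio and module count")
    return {
        "er_p": target_edges / max_edges,          # Erdős–Rényi: E = p * N(N-1)/2
        "ws_k": 2 * round(target_edges / n),       # Watts-Strogatz: E = N * k / 2, k even
        "sbm_p_intra": p_intra,
        "sbm_p_inter": p_inter,
    }


params = match_edge_budget(n=64, target_edges=256, n_modules=4, intra_to_inter_ratio=20)
print(params)
```

Only the Watts-Strogatz count is exact (and only after rounding k to an even number); the Erdős–Rényi and SBM counts match in expectation, so the realised edge count should still be logged per run.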



Which factors do we want to rule out? How can we make sure that the differences we see in the performance metrics are actually due to the topology?


- How do weight updates differ? (Does the degree of a node play a role?)

Questions for further development of the PhD in bio-inspired machine learning:
- Is there a way to use these ideas for future work on World Models/Representations for RL, or
- Is there a way to use these ideas for future work on anything with multi-agent ideas applied to indirect encodings?