# Definition of Network Topologies

### Common Network Metrics

| Type                         |   avg_clustering |   avg_degree |   avg_path_length |   density |
|:-----------------------------|-----------------:|-------------:|------------------:|----------:|
| Fully Connected              |           1      |      69      |            1      |    1      |
| Modular                      |           0.7047 |      25.7429 |            2.0489 |    0.1865 |
| Random-sparse                |           0.1622 |      12.4375 |            1.8914 |    0.0987 |
| Small-world (NEAT)           |           0.429  |      11.5429 |            2.4166 |    0.0836 |
| Small-world (Watts-Strogatz) |           0.4022 |      11.3143 |            3.1391 |    0.082  |

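The four columns above can be computed from an adjacency structure with a few lines of standard-library Python. The sketch below is only illustrative: it assumes an undirected simple graph stored as a dict of neighbour sets, and the helper name `graph_metrics` is hypothetical. The fully connected row (degree 69, density 1) is consistent with 70 nodes under this convention; the other rows may have been computed under a different convention (for example, directed edges), so this sketch is not guaranteed to reproduce them.

```python
from collections import deque
from itertools import combinations


def graph_metrics(adj):
    """Return the four table metrics for an undirected, connected graph.

    adj maps each node to the set of its neighbours (assumed representation).
    """
    n = len(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    density = 2 * n_edges / (n * (n - 1))
    avg_degree = 2 * n_edges / n

    # Local clustering: fraction of neighbour pairs that are themselves linked.
    clustering = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        clustering.append(2 * links / (k * (k - 1)))

    # Average shortest path length via breadth-first search from every node.
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1

    return {
        "avg_clustering": sum(clustering) / n,
        "avg_degree": avg_degree,
        "avg_path_length": total / pairs,
        "density": density,
    }


# Fully connected graph on 70 nodes (node count inferred from avg_degree = 69).
n = 70
complete = {i: {j for j in range(n) if j != i} for i in range(n)}
metrics = graph_metrics(complete)
```

In practice a graph library such as networkx offers equivalent functions; the hand-written version is used here only to keep the example dependency-free.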
### Core Network Metrics

| Type                         |   core_avg_clustering |   core_avg_degree |   core_avg_path_length |   core_density |
|:-----------------------------|----------------------:|------------------:|-----------------------:|---------------:|
| Modular                      |                0.7692 |           27.7812 |                 1.9623 |         0.2205 |
| Small-world (NEAT)           |                0.4536 |           12.25   |                 2.3175 |         0.0972 |
| Small-world (Watts-Strogatz) |                0.4661 |           12      |                 3.1002 |         0.0952 |

### Topology Specific Metrics

Modular:
- modularity: 0.6291
- n_communities: 4
- p_inter: 0.0531
- p_intra: 0.8

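One common way to realize a pair of `p_intra` / `p_inter` probabilities is a planted-partition model: node pairs inside a community are linked with probability `p_intra`, pairs across communities with probability `p_inter`. Whether the project's modular generator works exactly like this is an assumption; the function name `planted_partition`, the node count of 64, and the equal-sized contiguous communities are all hypothetical choices for illustration.

```python
import random


def planted_partition(n_nodes, n_communities, p_intra, p_inter, seed=None):
    """Sample an undirected modular graph as a dict of neighbour sets."""
    rng = random.Random(seed)
    # Contiguous, (near-)equal-sized communities: an assumption of this sketch.
    community = [i * n_communities // n_nodes for i in range(n_nodes)]
    adj = {i: set() for i in range(n_nodes)}
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            p = p_intra if community[i] == community[j] else p_inter
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj, community


# Parameters from the list above; 64 nodes is an assumed graph size.
adj, community = planted_partition(64, 4, p_intra=0.8, p_inter=0.0531, seed=42)
```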
Small-world (Watts-Strogatz):
- beta: 0.1
- k: 6

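For reference, the two parameters enter the Watts-Strogatz construction as follows: start from a ring lattice in which every node is linked to its `k` nearest neighbours, then rewire each lattice edge with probability `beta`. The sketch below is a plain standard-library version under stated assumptions: the graph is undirected, the node count of 64 is assumed, and the function name `watts_strogatz` is hypothetical. It is not necessarily how the project builds its graphs.

```python
import random


def watts_strogatz(n, k, beta, seed=None):
    """Undirected Watts-Strogatz graph as a dict of neighbour sets."""
    if k % 2 or not 0 < k < n:
        raise ValueError("k must be even and satisfy 0 < k < n")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}

    # Step 1: ring lattice, each node linked to k/2 neighbours on either side.
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            adj[i].add(j)
            adj[j].add(i)

    # Step 2: rewire the far end of each lattice edge with probability beta.
    for offset in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + offset) % n
            if j not in adj[i] or rng.random() >= beta:
                continue
            candidates = [w for w in range(n) if w != i and w not in adj[i]]
            if not candidates:
                continue  # node i is already linked to every other node
            w = rng.choice(candidates)
            adj[i].discard(j)
            adj[j].discard(i)
            adj[i].add(w)
            adj[w].add(i)
    return adj


# Parameters from the list above; 64 nodes is an assumed graph size.
graph = watts_strogatz(n=64, k=6, beta=0.1, seed=42)
```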
**TODO: Short discussion what these metrics tell us**

# Visualization of Topologies

1. Fully Connected
2. Modular
3. Random-sparse
4. Small-world grown with NEAT
5. Small-world generated with the Watts-Strogatz model

# Training on standard benchmarks

1. CartPole
2. LunarLander
3. GridWorld
4. (Parity)

Convert to Policy Networks
- Wrap each topology in a functional neural network:
    - Assign inputs/outputs (e.g., first n input nodes, last m output nodes)
    - Assign fixed or trainable activations (e.g., ReLU/Linear)
    - Initialize edge weights (random, or shared init)

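The wrapping steps above could look roughly like the following. This is a minimal sketch, not the project's implementation: it assumes the topology is given as a mask where `mask[j][i] == 1` means an edge from node `i` to node `j`, that the first `n_in` nodes are inputs and the last `n_out` nodes are outputs, that hidden nodes use ReLU and output nodes are linear, and that activity is propagated for a fixed number of synchronous steps. The names `init_masked_weights` and `graph_policy_forward` are hypothetical.

```python
import math
import random


def init_masked_weights(mask, seed=None):
    """Random weights on existing edges, exact zeros everywhere else."""
    rng = random.Random(seed)
    n = len(mask)
    scale = 1.0 / math.sqrt(n)
    return [
        [rng.uniform(-scale, scale) if mask[j][i] else 0.0 for i in range(n)]
        for j in range(n)
    ]


def graph_policy_forward(obs, weights, n_in, n_out, steps=3):
    """Propagate an observation through the graph and read the output nodes."""
    n = len(weights)
    if len(obs) != n_in:
        raise ValueError("observation size must equal n_in")
    state = [0.0] * n
    state[:n_in] = obs  # input nodes are clamped to the observation
    for _ in range(steps):
        new_state = state[:]
        for j in range(n_in, n):
            total = sum(weights[j][i] * state[i] for i in range(n))
            is_output = j >= n - n_out
            # Hidden nodes use ReLU; output nodes stay linear.
            new_state[j] = total if is_output else max(0.0, total)
        state = new_state
    return state[n - n_out:]


# Tiny example: input node 0 -> hidden node 1 -> output node 3 (node 2 unused).
weights = [[0.0] * 4 for _ in range(4)]
weights[1][0] = 1.0
weights[3][1] = 2.0
action_values = graph_policy_forward([1.0], weights, n_in=1, n_out=1, steps=2)
```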
Using different RL algorithms
1. Value-Based
    - DQN
2. Policy-Based
    - REINFORCE
3. Actor-Critic
    - PPO
    - SAC
4. Model-Based
    - MBPO

(To begin with: PPO, SAC)


### PPO

### SAC

# Analysis of Evaluation Metrics

To systematically assess the impact of network topologies:

- Performance Metrics:
    - Final Reward
    - Learning Curves: Track cumulative rewards over episodes.
    - Sample Efficiency: Measure rewards relative to the number of interactions.
    - Stability: Evaluate variance across multiple training runs.
    - Policy Robustness
- Structural Metrics:
    - Degree distribution
    - Clustering Coefficient: Indicates the degree to which nodes cluster together.
    - Average Path Length: Measures the average number of steps between nodes.
    - Sparsity: Represents the proportion of zero-valued weights.
    - Modularity
- Statistical Analysis:
    - Correlation Analysis: Determine relationships between structural and performance metrics.
    - ANOVA: Assess differences in performance across topologies.
    - Regression Models: Predict performance outcomes based on structural features.

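As a small illustration of the correlation analysis, the sketch below computes a Pearson coefficient between one structural metric and final reward. The numbers are the seed-42 values from the parameter-fair CartPole run reported later in these notes; the helper `pearson_r` is hypothetical. The coefficient comes out strongly negative, but with five points from a single seed it is driven almost entirely by the fully connected network, so it should be read as a demonstration of the procedure rather than a finding.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Order: fully connected, modular, random sparse, small-world NEAT, Watts-Strogatz.
avg_path_length = [1.0, 2.58, 2.46, 2.46, 2.48]
final_reward = [104, 41, 46, 46, 47]
r = pearson_r(avg_path_length, final_reward)
```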
Analyzing how symmetry breaks during training may reveal which structural differentiations (e.g. modularization, edge pruning) emerge to support specialized functions.




### Across 3 Seeds: 42, 43, 44




# Scaling

Vary the scale of the networks: instead of a single hidden layer, use two or three.

### 1 vs. 2 hidden layers
Expectation:
If the hidden size stays at 64 per layer and the same density (10 %) is applied, random-sparse (RS) should still learn.
Small‑world and modular networks should catch up with or even overtake RS, because with more than one hidden layer their connectivity patterns finally come into play.

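A quick back-of-the-envelope calculation shows how the edge budget changes between one and two hidden layers. The sketch assumes that density is applied to the layer-to-layer weight matrices only, uses the CartPole-v1 interface (4 observations, 2 actions), and the helper name `edge_budget` is hypothetical.

```python
def edge_budget(layer_sizes, density):
    """Return (dense weight count, weights kept at the given density)."""
    dense = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    return dense, round(dense * density)


# CartPole-v1: 4 observation inputs, 2 discrete actions.
one_hidden = edge_budget([4, 64, 2], density=0.10)
two_hidden = edge_budget([4, 64, 64, 2], density=0.10)
```

Under these assumptions the second hidden layer adds a 64 x 64 block of potential hidden-to-hidden connections, which dominates the budget and is the only place where the wiring pattern can differ substantially between topologies.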
# Discussion

Generalisation / transfer.
Sparse, small‑world, or modular graphs avoid “over‑smoothing” and reduce co‑adaptation, which helps on tasks with noisy inputs or when transferring to variants.

Online continual learning.
Modular graphs isolate sub‑functions; pruning or freezing a module hurts only part of the behaviour.

## Which option is best for the project?

| option                                                         | advantages                                                                                                                                                              | drawbacks                                                                                                                     | when to use                                                                                       |
| -------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| **Same global edge count** (density target)                    | *Very easy* to implement; good first sanity check.                                                                                                                      | Capacity differs whenever you change **n\_total** or strip H→H edges; can still leave “dead” hidden units.                    | Pilot experiments, scaling studies where you keep `n_total` fixed.                                |
| **Parameter‑fair (same total weights)**                        | Cleanest answer to “does wiring help *for a fixed capacity*?”  Avoids criticism that one model just has more weights.                                                   | Requires active rescaling of density per topology & per stub rule; makes summary plots a little less intuitive.               | Papers, final benchmarking, cross‑task generalisation claims.                                     |
| **Fixed local fan‑in/out** (e.g. *each output gets 8 parents*) | Matches biological “sparse but regular” notion; optimisation is stable because per-node gradients stay on the same scale; parameter count grows automatically with `n_out`. | Edge count differs across topologies with different IO wiring; needs a second pass to ensure reachability/clustering targets. | Ablations of biological realism; studying how modular or SW wiring interacts with limited fan‑in. |

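To make the third option concrete, the sketch below builds a connectivity mask in which every output unit receives exactly the same number of parents. It is an illustration under assumptions: parents are drawn uniformly at random, the sizes (64 hidden units, 2 outputs) are examples, and the function name `fixed_fan_in_mask` is hypothetical.

```python
import random


def fixed_fan_in_mask(n_in, n_out, fan_in, seed=None):
    """Binary mask of shape n_out x n_in with exactly fan_in ones per row."""
    if not 0 < fan_in <= n_in:
        raise ValueError("fan_in must be between 1 and n_in")
    rng = random.Random(seed)
    mask = []
    for _ in range(n_out):
        parents = set(rng.sample(range(n_in), fan_in))
        mask.append([1 if i in parents else 0 for i in range(n_in)])
    return mask


# Each of the 2 outputs gets 8 parents among 64 hidden units.
mask = fixed_fan_in_mask(n_in=64, n_out=2, fan_in=8, seed=0)
n_weights = sum(sum(row) for row in mask)
```

The weight count is `fan_in * n_out` by construction, which is why it grows automatically with `n_out` but is not tied to a global edge budget.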



# What I did so far

- Implemented the topology types with the same global edge count for all of them (first sanity check)
- Trained them with PPO on CartPole
    - Environment: CartPole-v1
    - Training settings: PPO with lr=0.0003, batch_size=64, n_steps=2048, n_epochs=10, gamma=0.99, gae_lambda=0.95, clip_range=0.2, total_timesteps=20,000
    - Evaluation: comparison of topologies (fc, rs, sw_neat, sw_ws, modular) across 3 seeds

-> Fully Connected performs best

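For readers less familiar with PPO, the sketch below shows where three of the settings listed above (`gamma`, `gae_lambda`, `clip_range`) enter the algorithm: generalized advantage estimation and the clipped surrogate objective for a single sample. It is a simplified, dependency-free illustration, not the training code used here; the function names are hypothetical, and `dones[t]` is assumed to mean that the episode ended after step `t`.

```python
def gae_advantages(rewards, values, last_value, dones, gamma=0.99, gae_lambda=0.95):
    """Generalized advantage estimates for one rollout, computed backwards."""
    advantages = [0.0] * len(rewards)
    next_value, next_adv = last_value, 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        next_adv = delta + gamma * gae_lambda * not_done * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """PPO objective for one sample: min of unclipped and clipped terms."""
    clipped = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    return min(ratio * advantage, clipped * advantage)
```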
Next: Parameter‑fair (same total weights)
- Probably cleanest answer to “does wiring help for a fixed capacity?”  
- Avoids criticism that one model just has more weights
- Requires active rescaling of density per topology & per stub rule
- Makes plots less intuitive

This codebase investigates how network topology influences RL performance. I started with CartPole and PPO; LunarLander and GridWorld will follow as additional tasks, and SAC as an additional RL algorithm.

In simple terms, this is what I did to make the topologies comparable:

- Current status of the project
  - Built five topology generators (fc, random‑sparse, small‑world‑NEAT, Watts‑Strogatz, modular)
  - Forced each network to hold exactly the same global edge count first, then moved to the same total trainable‑weight budget (parameter‑fair)
  - Training done with PPO on CartPole‑v1, identical hyper‑parameters, 20 000 timesteps
- Observations from equal weight‑budget run (seed 42)
  - Fully connected
    - Final mean reward 104
    - Sparsity 0.00, average degree 140, average path length 1.0
  - Modular
    - Final mean reward 41
    - Sparsity 0.97, average degree 4.0, average path length 2.58
  - Random sparse
    - Final mean reward 46
    - Sparsity 0.97, average degree 4.2, average path length 2.46
  - Small‑world NEAT
    - Final mean reward 46
    - Sparsity 0.97, average degree 4.2, average path length 2.46
  - Watts‑Strogatz small‑world
    - Final mean reward 47
    - Sparsity 0.97, average degree 4.2, average path length 2.48

- Simple explanation of why fully connected still wins
  - Gradient flow and credit assignment
    - Every hidden unit in FC receives error signals every update, none are starved
    - Sparse graphs pass reward signals through few edges, many weights get tiny or no gradients
  - Effective capacity
    - Same number of weights does not equal same usefulness
    - FC puts all weights in direct paths, sparse nets hide most weights behind multi‑hop routes and behave like lower‑width models during early learning
  - Task bias
    - CartPole rewards a fast, almost linear policy
    - Dense receptive fields of FC match this bias; small‑world clustering or long‑range reuse helps only on harder, high‑dimensional tasks
- Why switching to parameter‑fair was worth it
  - Removes arguments about raw capacity differences
  - Confirms wiring alone (without more gradient reach) does not close the gap under current settings
  - Codebase now pads every topology to the same weight budget and unit tests verify counts
- Remaining issues and next practical steps
  - Modular generator occasionally fails to meet clustering and modularity targets before padding; refine fallback so it preserves community structure as well as budget
  - Move to harder environments (for example LunarLander‑v2 or an Atari game) where representation reuse matters more
  - Log gradient L2 norms per layer to confirm weight starvation in sparse nets and guide further architectural tweaks

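The padding step mentioned above can be pictured as follows. This is a hedged sketch, not the code in the repository: it assumes topologies are stored as sets of directed `(source, target)` edges without self-loops, that padding adds uniformly random missing edges, and the name `pad_to_budget` is hypothetical.

```python
import random


def pad_to_budget(edges, n_nodes, budget, seed=None):
    """Add random missing directed edges until exactly `budget` edges exist."""
    edges = set(edges)
    if len(edges) > budget:
        raise ValueError("topology already exceeds the weight budget")
    rng = random.Random(seed)
    missing = [
        (u, v)
        for u in range(n_nodes)
        for v in range(n_nodes)
        if u != v and (u, v) not in edges
    ]
    need = budget - len(edges)
    if need > len(missing):
        raise ValueError("budget exceeds the number of possible edges")
    edges.update(rng.sample(missing, need))
    return edges


# Two toy topologies padded to the same budget of 12 edges on 6 nodes.
ring = {(i, (i + 1) % 6) for i in range(6)}
star = {(0, i) for i in range(1, 6)}
padded = [pad_to_budget(t, n_nodes=6, budget=12, seed=0) for t in (ring, star)]
```

Uniform random padding of this kind is exactly what can dilute community structure, which is why a modular generator would need a padding rule that prefers intra-community edges.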