## What does it mean to be a good network topology for RL?
- Evaluate how various network topologies affect RL performance on standard tasks
- Which topologies (small-world, fully-connected, random sparse, modular) provide superior performance in standard RL tasks?
- What structural properties (e.g., clustering, sparsity, path length) correlate with improved RL performance?

Desired outcome:
- Empirical understanding of which structural properties correlate with good RL performance
a. Implement and evaluate standard architectures (fully-connected, random sparse, small-world via NEAT)
b. Analyze metrics: clustering, sparsity, path length.
c. Quantitative analysis (e.g., correlation and ANOVA) to identify beneficial structural properties.
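To make item b concrete, here is a minimal sketch of the three metrics in plain Python. It assumes the network is reduced to an undirected, unweighted graph stored as a dict of neighbor sets. This is a simplification, since NEAT genomes are directed and weighted. Sparsity is taken as 1 minus edge density. Clustering is the mean local clustering coefficient. Path length is the mean shortest-path length over reachable node pairs. The function names are illustrative only.

```python
from collections import deque
from itertools import combinations


def density(adj):
    """Fraction of possible undirected edges present; sparsity = 1 - density."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def avg_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among the neighbors of this node.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def char_path_length(adj):
    """Mean shortest-path length over reachable ordered node pairs (BFS per node)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


# Example: a 4-node cycle 0-1-2-3-0.
ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(density(ring), avg_clustering(ring), char_path_length(ring))
```

For larger graphs, networkx's `average_clustering` and `average_shortest_path_length` are the safer choice over this hand-rolled version.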

Remarks:
- Which tasks should be used for this test?
    - Parity Task
- Implement different network topologies
    - Fully-Connected
    - Small-World grown with NEAT
    - Random-sparse
    - What else could be tested?
    --> Which metrics/parameters? p and k (e.g., rewiring probability and mean degree in a Watts–Strogatz-type model?)
- What other structural properties can be interesting?
    - What exactly do clustering, sparsity, and path length mean here? Run subtests on each of them?
- Interplay with modularity: what role does structural modularity play, and is there also a functional modularity (i.e., specialization)?
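Below is a sketch of generators for three of the listed topology families. It assumes an undirected graph stored as a dict of neighbor sets. The small-world case uses the Watts–Strogatz model as a stand-in. This is not how NEAT grows networks. It does, however, expose a rewiring probability p and a neighbors-per-node k, which may or may not be the p and k intended in the note above. All function names are hypothetical.

```python
import random


def fully_connected(n):
    """Every node linked to every other node."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}


def random_sparse(n, p_edge, seed=0):
    """Erdős–Rényi G(n, p): each undirected edge kept independently with prob p_edge."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_edge:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def watts_strogatz(n, k, p, seed=0):
    """Ring lattice with k nearest neighbors (k even), each edge rewired with prob p.

    p = 0 keeps the regular lattice; p = 1 approaches a random graph.
    Rewiring preserves the total number of edges (n * k / 2).
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            if rng.random() < p and j in adj[i]:
                # Move the edge (i, j) to a random node not yet linked to i.
                candidates = [v for v in range(n) if v != i and v not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].remove(j)
                    adj[j].remove(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj
```

Sweeping p from 0 to 1 at fixed k gives a controlled path from lattice to random graph, which is the regime in which small-worldness is usually studied.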



- Train each network on standard benchmarks (e.g., CartPole, LunarLander, GridWorld).
- Extract topology stats after training (or during, if topology changes dynamically).
- Run analysis to find which structural properties correlate with high performance.



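Safeguards for preserving the intended topology:
- Avoid layers that could reintroduce full connectivity (e.g., LayerNorm, which couples all units through shared mean/variance statistics, or residual connections that shortcut around the topology).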
- Ensure no automatic pruning or rewiring during training unless intended.
- Monitor adjacency matrices (or weight masks) during training to verify structure preservation.
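One way to enforce and monitor structure is sketched below in plain Python. It assumes each layer's weights are a dense matrix paired with a fixed 0/1 mask of the same shape. Updates are multiplied by the mask, and a check confirms that no masked-out weight has drifted from zero. In PyTorch, `torch.nn.utils.prune.custom_from_mask` gives a similar weight-times-mask reparametrization. The helper names below are hypothetical.

```python
def apply_mask(weights, mask):
    """Zero every weight whose mask entry is 0 (mask is a 0/1 adjacency matrix)."""
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(weights, mask)]


def masked_sgd_step(weights, grads, mask, lr=0.01):
    """Plain SGD step restricted to existing connections; absent edges stay at 0."""
    return [
        [(w - lr * g) * m for w, g, m in zip(w_row, g_row, m_row)]
        for w_row, g_row, m_row in zip(weights, grads, mask)
    ]


def structure_preserved(weights, mask, tol=0.0):
    """True if every weight outside the mask is still (numerically) zero."""
    return all(
        abs(w) <= tol
        for w_row, m_row in zip(weights, mask)
        for w, m in zip(w_row, m_row)
        if m == 0
    )


mask = [[1, 0], [0, 1]]
W = apply_mask([[0.5, 0.3], [0.2, 0.4]], mask)
W = masked_sgd_step(W, [[1.0, 1.0], [1.0, 1.0]], mask, lr=0.1)
print(structure_preserved(W, mask))  # True
```

Running `structure_preserved` after every update (or every N updates) turns the monitoring bullet into an assertion that fails loudly if some component bypasses the mask.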

## What should network structures look like to be easily adapted by RL for lifelong learning?
- Determine what structures can be effectively adapted by RL across tasks
- What are the topological features that support adaptation across different tasks in a lifelong learning scenario?
- How does network structure impact robustness against catastrophic forgetting?

Desired Outcome:
Identification of network properties conducive to lifelong learning, and an understanding of the trade-offs between adaptability and forgetting
a. Decide on a controlled task distribution (idea: keperas)
b. Incrementally train networks on task sequences, measure how efficiently they adapt
c. Compare the structures identified in Phase 1 for their lifelong learning capability
d. Explicitly assess catastrophic forgetting
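Below is a minimal sketch of the incremental protocol in item b, assuming tasks are trained strictly in sequence. `train(agent, task)` and `evaluate(agent, task)` are hypothetical placeholders for the actual RL loop. The output matrix R has one row per training stage and one column per task. This layout is the usual input for forgetting measures. `steps_to_threshold` is one simple, assumed proxy for how efficiently a network adapts.

```python
def run_task_sequence(agent, tasks, train, evaluate):
    """Train on tasks in order; after each stage, evaluate on every task.

    Returns R with R[i][j] = score on task j after training stage i.
    `train(agent, task)` and `evaluate(agent, task)` are placeholders for the
    actual RL training and evaluation loops.
    """
    R = []
    for task in tasks:
        train(agent, task)
        R.append([evaluate(agent, t) for t in tasks])
    return R


def steps_to_threshold(curve, threshold):
    """First evaluation index at which a learning curve reaches `threshold`, else None."""
    for step, score in enumerate(curve):
        if score >= threshold:
            return step
    return None


# Toy stand-in: an "agent" that only remembers the most recent task.
agent = {"known": None}
R = run_task_sequence(
    agent,
    ["A", "B"],
    train=lambda a, t: a.update(known=t),
    evaluate=lambda a, t: 1.0 if a["known"] == t else 0.0,
)
print(R)  # [[1.0, 0.0], [0.0, 1.0]]: complete forgetting of task A
```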

Tasks:
- How to test for lifelong learning capabilities?
- How to test for catastrophic forgetting?
- What role does scale (number of parameters) play?
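For the forgetting question, here is a sketch of three measures commonly used in the continual-learning literature. They are computed from a score matrix R, where R[i][j] is the score on task j after training stage i (an assumed layout). The measures are:

- the average final score;
- backward transfer as defined in GEM (Lopez-Paz & Ranzato, 2017);
- per-task forgetting, i.e., the drop from the best earlier score.

Negative backward transfer or positive forgetting indicates catastrophic forgetting. At least two tasks are required.

```python
def average_score(R):
    """Mean score over all tasks after the final training stage."""
    return sum(R[-1]) / len(R[-1])


def backward_transfer(R):
    """Mean of R[T-1][i] - R[i][i] over earlier tasks; negative means forgetting."""
    T = len(R)
    return sum(R[T - 1][i] - R[i][i] for i in range(T - 1)) / (T - 1)


def forgetting(R):
    """Per earlier task: best score before the final stage minus final score."""
    T = len(R)
    return [
        max(R[stage][j] for stage in range(T - 1)) - R[T - 1][j]
        for j in range(T - 1)
    ]
```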


## How can an NDP learn to differentiate flexible vs. rigid structures in neural networks?
- Can an NDP learn which parts of a network should remain rigid or become flexible over extended task distributions?
- How does this affect the network’s lifelong learning capability?
- What role does the layer play? --> Also relates to where inputs/outputs are positioned


- Insight into how developmental processes influence structural adaptability, and NDP-driven rigidity management aimed at improving lifelong learning capabilities
    1. Implement a basic NDP architecture (check this with Nisioti et al.) that outputs not only connection structures but also rigidity/flexibility parameters per connection or neuron
    2. Train NDP-generated networks on lifelong learning task sequences from Phase 2, allowing the NDP to control structural rigidity
    3. Analyze emergent rigidity patterns, relating them explicitly to network performance, adaptability, and resistance to forgetting
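Below is a hypothetical sketch of how per-connection rigidity from step 1 could act during learning. It assumes the NDP emits one rigidity logit per connection, squashed to [0, 1] with a sigmoid. Each weight update is then scaled by the flexibility 1 - rigidity. This gating rule is an illustrative assumption, not a method from the cited work.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rigidity_gated_step(weights, grads, rigidity_logits, lr=0.01):
    """SGD step where each connection's update is scaled by 1 - rigidity.

    rigidity_logits stands in for a hypothetical NDP output head (one logit per
    connection): rigidity near 1 freezes a weight, near 0 leaves it plastic.
    """
    return [
        [w - lr * (1.0 - sigmoid(r)) * g for w, g, r in zip(w_row, g_row, r_row)]
        for w_row, g_row, r_row in zip(weights, grads, rigidity_logits)
    ]


# One rigid and one flexible connection receiving the same gradient.
print(rigidity_gated_step([[1.0, 1.0]], [[1.0, 1.0]], [[50.0, -50.0]], lr=0.1))
```

Logging the rigidity map over a task sequence would give the raw material for step 3. Examples include which connections freeze, and whether frozen regions align with task-relevant subnetworks.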

# Next REAL Journal Entry:


TODO: Tackle the relation to memory and functional modularity

What makes memory:
- Not as in the paper, which claims that attentional bias gives rise to memory (what did Chris critique here?)

Functional Modularity:
- Seems to be an emergent phenomenon, but it does not emerge randomly; it requires a certain topology and a certain pressure (e.g., sparse resources?)


## Future Work
- Emergence of modularity
- Concerning Modularity: How could functional modularity (in contrast to structural modularity) interplay with the whole project?
- By analyzing how symmetry breaks through training, we might uncover what structural differentiations (e.g. modularization, edge pruning) emerge to support specialized functions.
    --> Found your "Structurally Flexible Neural Networks: Evolving the Building Blocks for General Agents" paper, in which the network topology is sparse and sampled randomly per lifetime: around 50% of possible connections are removed at random. This random sampling is not evolved; it acts as a symmetry-breaking mechanism. Random sparse connectivity leads to better generalization and adaptability across tasks. Fully connected networks (SFNN_fully) perform worse: they tend to oversmooth and collapse into homogeneous activations, failing to differentiate functions across units. Keeping the topology fixed during evolution (Fixed_SFNN) overfits and loses the generalization benefits of structural variation.
    --> Online RL requires topology to be good for gradient updates to work well.
    --> SFNN requires topology to be good for self-organization during lifetime.
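Here is a minimal sketch of the per-lifetime symmetry-breaking step described above. It assumes each possible connection is dropped independently with probability 0.5. The note says "around 50%"; the exact sampling scheme in the SFNN paper may differ. The mask is redrawn for every lifetime and is not evolved. The helper name is hypothetical.

```python
import random


def sample_lifetime_mask(n_pre, n_post, drop_frac=0.5, rng=None):
    """Draw a fresh 0/1 connectivity mask for one lifetime.

    Each of the n_pre * n_post possible connections is removed independently
    with probability drop_frac; nothing about the mask is evolved.
    """
    if rng is None:
        rng = random.Random()
    return [[0 if rng.random() < drop_frac else 1 for _ in range(n_post)] for _ in range(n_pre)]


rng = random.Random(0)
masks = [sample_lifetime_mask(8, 8, rng=rng) for _ in range(3)]  # one per lifetime
print([sum(map(sum, m)) for m in masks])  # surviving connections per lifetime
```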

# References: 

### Dynamics of specialization in neural modules under resource constraints - https://www.nature.com/articles/s41467-024-55188-9
- "Modularity is an enticing concept that naturally fits the way we attempt to understand and engineer complex systems. We tend to break down difficult concepts into smaller, more manageable parts. Modularity has a clear effect in terms of robustness and interpretability in such systems. Disentangled functionality means that the impairment of a module doesn’t lead to the impairment of the whole, while making it easy to spot critical failure points."
- "As a more emergent principle linking structure and function, it has been suggested that modularity emerged as a byproduct of the evolutionary pressure to reduce the metabolic costs of building, maintaining and operating neurons and synapses. It has been shown that brain networks are near-optimal when it comes to minimizing both its wiring and running metabolic costs." 
- "distinguish two types of modularity, structural and functional, and understand how they are related. We take structural modularity to mean the degree to which a neural network is organized into discrete and differentiated modules. Such modules show much denser connections internally than across other modules. This is usually computed using the Q-metric, measuring how much more clustered these modules are when compared to a graph connected at random."
- "Although note that many other techniques are possible, and that module detection in networks[17] as well as defining measures of modularity[18,19] are complex and interesting fields in their own right."
- While this structural definition is important, it doesn’t necessarily inform us on the function of the modules. 
--> This is where we come in, with the influence on RL
--> Here we do not target functional modularity directly (only if it emerges naturally?)
- Functional modularity: "*Separate modifiability* means that the impairment of one module should not affect the functioning of another."
--> Is this what is necessary for safe mutations? 
- "Generally, the link between structural and functional modularity is context-dependent and involves a complex and dynamic interplay of several internal and external variables. However, it is unclear the extent to which structural modularity is important for the emergence of specialization through training. We show here a case where even under strict structural modularity conditions, modules exhibit entangled functional behaviors. We then explore the space of architectures, resource constraints, and environmental structure in a more systematic way, and find sets of necessary constraints (within our constrained setup) for the emergence of specialized functions in the modules."
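Assuming the standard definition, the Q-metric quoted above is Newman's modularity: Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j). Here A is the adjacency matrix, k_i the degree of node i, and m the number of edges. δ(c_i, c_j) is 1 when nodes i and j belong to the same module and 0 otherwise. Below is a plain-Python sketch for undirected, unweighted graphs:

```python
def modularity_q(adj, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    adj: dict node -> set of neighbors; community: dict node -> module label.
    Q = (1 / 2m) * sum_ij [A_ij - k_i * k_j / (2m)] * delta(c_i, c_j)
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue  # delta(c_i, c_j) = 0
            a_ij = 1.0 if j in adj[i] else 0.0
            q += a_ij - degree[i] * degree[j] / two_m
    return q / two_m


# Two triangles joined by a single edge (2-3), split into the obvious modules.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(modularity_q(adj, community))
```

For networkx graphs, `networkx.community.modularity(G, communities)` computes the same quantity and also supports weighted edges.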

Methods:
- "these modules consist of vanilla RNNs, but the code is written to allow the use of other modules types, such as GRUs."
- "We vary n, p, the pathway structure, and the presence of the bottleneck layer."
- "The choice of a recurrent rather than feed-forward architecture was made to keep a consistent architecture throughout the paper and to simplify the definitions of functional specialization and modularity."
- Structural modularity: "We define the fraction of connections between two modules of size n as p ∈ [1/n², 1]. The same fraction of connections is used in each direction. The smallest value is p = 1/n² corresponding to a single connection in each direction, and the largest fraction p = 1 corresponds to n² connections in each direction (all-to-all connectivity)."
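Below is a sketch of the two-module setup quoted above. It assumes each module is a dense recurrent block of size n, with all n² within-module connections present. The fraction p is realized by sampling round(p·n²) distinct inter-module connections, independently in each direction. The convention is that A[i][j] = 1 means a connection from unit i to unit j. The helper name and sampling scheme are illustrative, not the paper's code.

```python
import random


def two_module_adjacency(n, p, seed=0):
    """Directed 0/1 adjacency for two modules of n units each.

    Within each module all n*n recurrent connections exist (an assumption).
    Between modules, round(p * n * n) connections are sampled in each
    direction, so p = 1/n**2 gives one link and p = 1 gives all-to-all.
    """
    rng = random.Random(seed)
    size = 2 * n
    A = [[0] * size for _ in range(size)]
    for block in (range(n), range(n, size)):
        for i in block:
            for j in block:
                A[i][j] = 1
    n_links = max(1, round(p * n * n))
    pairs = [(i, j) for i in range(n) for j in range(n, size)]
    for i, j in rng.sample(pairs, n_links):
        A[i][j] = 1  # module 0 -> module 1
    for i, j in rng.sample(pairs, n_links):
        A[j][i] = 1  # module 1 -> module 0, sampled independently
    return A
```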

Results:
- "in at least one simple type of network, imposing moderately high levels of structural modularity doesn’t directly lead to the emergence of specialized modules."
- "a high level of structural modularity isn’t a sufficient condition for the emergence of specialization."
- "One limitation on this conclusion is that we have relied on the well-established Q-metric from network theory. This metric is widely used in connectomics research, but may not be the best measure of structural modularity (although we did not systematically investigate alternatives)."
- "other studies have taken a more emergent approach: various researchers[54,55] have investigated whether modular properties can emerge directly from the first principle of minimizing connection costs. Spatially-embedded networks, regularized to minimize connection costs while learning, do end up displaying modular and small-world features, but we note that both works had to introduce an additional regularization or optimization technique to see it emerge. Understanding if, and if so how, structural and functional modularity can emerge from purely low level and naturalistic principles outside of a controlled setup thus remains an open question."
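One common way to impose the connection-cost pressure mentioned in the quote is a distance-weighted L1 penalty added to the task loss. This assumes each neuron has a fixed spatial position: cost = λ Σ_ij |w_ij|·d(i, j). The exact regularizers in the cited works may differ. The sketch below only illustrates the idea that long-range connections cost more than short ones.

```python
import math


def wiring_cost(weights, positions, lam=1e-3):
    """Distance-weighted L1 penalty: lam * sum_ij |w_ij| * ||pos_i - pos_j||.

    weights[i][j] is the connection from neuron i to neuron j; positions[i]
    is neuron i's (assumed fixed) coordinate tuple.
    """
    cost = 0.0
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            cost += abs(w) * math.dist(positions[i], positions[j])
    return lam * cost


print(wiring_cost([[0.0, 1.0], [2.0, 0.0]], [(0.0, 0.0), (3.0, 4.0)], lam=1.0))
```

Setting lam = 0 recovers unconstrained training. This makes λ a natural knob for testing whether modularity only appears once the connection-cost pressure is strong enough.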




Difference between modularity and small-worldness:
- Modularity refers to the presence of tightly connected subgroups (modules) within a network that are loosely connected to other modules. Each module might specialize in a sub-function or sub-task.
- Small-worldness describes networks that exhibit:
    - High clustering (like modularity) and
    - Short average path lengths (like random graphs), which facilitates efficient communication across the network.
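Small-worldness is often summarized by the coefficient σ = (C / C_rand) / (L / L_rand). This compares clustering C and path length L against a random graph with the same size and mean degree. The sketch below uses the common Erdős–Rényi approximations C_rand ≈ k/n and L_rand ≈ ln(n)/ln(k), assuming a mean degree k > 1. σ > 1 is the usual, loose criterion for a small-world network. networkx's `sigma` and `omega` functions estimate the comparison from sampled random graphs instead.

```python
import math


def small_world_sigma(C, L, n, k):
    """sigma = (C / C_rand) / (L / L_rand) with Erdős–Rényi approximations.

    C, L: measured clustering and characteristic path length;
    n: number of nodes; k: mean degree (must be > 1).
    """
    c_rand = k / n
    l_rand = math.log(n) / math.log(k)
    return (C / c_rand) / (L / l_rand)


print(small_world_sigma(C=0.5, L=2.5, n=100, k=10))
```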

Difference between continuous learning and lifelong learning:
- The terms are sometimes used interchangeably in casual contexts, even though they differ
- Lifelong Learning
    - Focuses on an agent's ability to retain and reuse knowledge across tasks over a long time.
    - Involves task boundaries (task A, then task B, etc.).
    - Concerns: catastrophic forgetting, transfer learning, memory mechanisms.
- Continuous Learning
    - Emphasizes learning in a streaming, non-stationary environment.
    - Often assumes no explicit task boundaries.
    - Focuses on gradual adaptation, robustness to distribution shifts.
- TL;DR: 
    - Lifelong = "curriculum of tasks with memory"
    - Continuous = "ongoing environmental change with adaptation"
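Here is a toy illustration of the distinction, assuming the environment is summarized by one value per step. A lifelong schedule switches between discrete task ids at explicit boundaries. A continuous stream drifts smoothly, with no boundary to detect. Both functions are hypothetical helpers for building test environments.

```python
import math


def lifelong_schedule(step, steps_per_task, n_tasks):
    """Piecewise-constant task id with explicit boundaries every steps_per_task steps."""
    return min(step // steps_per_task, n_tasks - 1)


def continuous_drift(step, period=10_000):
    """Smoothly drifting environment parameter; no task boundaries exist."""
    return math.sin(2 * math.pi * step / period)


print([lifelong_schedule(s, 100, 3) for s in (0, 99, 100, 250, 999)])
print([round(continuous_drift(s), 3) for s in (0, 1250, 2500)])
```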