# Neuroevolution of Augmenting Topologies

## Evolutionary Algorithms
Evolutionary algorithms (EA) are global optimization techniques that search for solutions to a problem using heuristics inspired by the principles of natural evolution.

The basic idea is that a population of candidate solutions is kept in some meaningful representation (bitstrings are often used in genetic algorithms, which are one type of EA). The encoded representations of individual solutions are called *genotypes* and the solutions they represent are called *phenotypes*. Each genotype is made up of *genes* (the encoded string as a whole is often called a *chromosome*), and altering them changes the solution.

A *fitness function* is used to rank how good each solution is. The otherwise random search is then guided using evolutionary techniques such as *crossover/mating* of two individuals, *mutation*, and *selection*. The search is iterative: in each iteration some individuals are usually discarded while others are kept and can then mate, mutate, etc. to create new individuals that might be better.
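The loop described above can be sketched on a toy problem. This is only a rough sketch: the OneMax objective (count the 1-bits), truncation selection, one-point crossover, and all parameter values are assumptions picked for illustration, not something prescribed by any particular EA.

```python
import random

def one_max(bits):
    """Toy fitness function: the number of 1-bits in the genotype."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=50, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Random initial population of bitstring genotypes.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half (truncation selection, an assumption).
        population.sort(key=one_max, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = parents + children
    return max(population, key=one_max)

best = evolve()
print(one_max(best), best)
```

Because the parents are carried over unchanged, the best fitness in the population can never decrease from one generation to the next.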

Research in this area usually explores different techniques for mating, mutation, which individuals are chosen to continue, the mapping between genotypes and phenotypes (encoding), etc.

TODO: critique, slow, no guarantee of convergence? etc, no mathematical guarantees TODO: better results with simulated annealing? or other heuristic searches?

TODO: building block hypothesis

I think the main attraction of this type of algorithm is that it is easy to use when the objective is not nicely defined mathematically or is full of discontinuities and bad gradients. TODO: is this really a problem for DL since they are general function approximators given enough complexity?

## EA for Neural Networks
Evolutionary techniques have been combined with neural networks with some success. This is called neuroevolution (NE) here.

A neural network is defined by its *topology*, its *weights* and its *activation functions*. Most research in neural networks uses some form of hand-designed topology, meaning the shape and activation functions of the network are fixed from the beginning and only the weights are then optimised through some optimisation technique like gradient descent. Initially, NE was used to find the weights of the networks (as an alternative to gradient descent/backprop? maybe before backprop was popular). Later NE was used to evolve the topologies as well.

It's known that a fully connected network of enough complexity can approximate any continuous function, but a sparser network found through NE is thought to have the advantage of learning faster. Another advantage is that it removes the need for a human to decide on a topology, although depending on the representation used in NE, that need is not completely removed either. The paper goes further and argues that "if done right, evolving structure along with connection weights can significantly enhance the performance of NE".

The representations used for the individual solutions (genotypes) are usually divided into *direct* and *indirect* encodings. A direct encoding specifies every node and connection explicitly, meaning there is basically no difference between the genotype and the phenotype. An indirect encoding instead specifies rules that describe how the neural net (phenotype) should be built. There are pros and cons to both, but direct encodings are usually simpler to implement while indirect encodings can be more compact. The choice of representation also makes the different genetic operators harder or easier to implement.

I have also seen examples where NE is combined with backprop in which NE is used only to find a good topology and backprop to optimise the weights of a certain topology. This seems like a much better idea in all cases?


## Put this somewhere
As far as I know, neuroevolution seems to be a bit outside the main research stage. I think this is true for most research in evolutionary algorithms. TODO: common critiques here. My own gut feeling is also kind of biased against them, maybe because they are not as well founded theoretically as other techniques (at least right now).

TODO: a fully connected network with enough (infinite?) hidden nodes is a universal approximator of any continuous function (Neal?), connection to gaussian processes. The only problem is having enough time to optimise, so corners need to be cut somewhere, which is what current research does in different ways

NEAT: one goal is to minimize the search space of weights, i.e. reduce number of weights needed

TODO: my own view? slightly biased against this? :P

TODO: competing conventions problem, crossover is hard for neural networks

TODO: other competing methods for doing this? either for network topology itself or fitting similar problems
http://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf
more recurrent neural network stuff? http://jmlr.org/proceedings/papers/v37/jozefowicz15.pdf

## Basics of NEAT
The authors present NEAT (NeuroEvolution of Augmenting Topologies) which is an NE method that tries to take advantage of structure to minimize the weight space, i.e. using fewer nodes connected in a certain way to give the same results as a denser network. They say that this should show gains in learning speed. "Improved efficiency results from topologies being minimized throughout evolution, rather than only at the very end."

The authors list three challenges with using NE for evolving topologies and propose the NEAT method, which consists of techniques to solve them.

* **Is there a genetic representation that allows disparate topologies to cross over in a meaningful way?** The problem here is that it's hard to align genes describing the same thing in two different individuals. Sometimes the genomes have different sizes. Other times, genes in the exact same position on different chromosomes may be expressing completely different traits. In addition, genes expressing the same trait may appear at different positions on different chromosomes. How can these complications be resolved? This is solved with an alignment procedure that tracks genes using innovation numbers.

* **How can topological innovation that needs a few generations to be optimized be protected so that it does not disappear from the population prematurely?** This is solved with *Speciation* explained below.

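* **How can topologies be minimized throughout evolution without the need for a specially contrived fitness function that measures complexity?** This is solved by starting from minimal topologies and growing them only as needed, explained below.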

They also talk about the competing conventions problem, which is a common problem in NE: the same network can have several different encodings. This is a problem because crossover between two such encodings can produce damaged offspring.

TODO: solution to competing conventions? the alignment?

### Encoding
NEAT uses a direct encoding because indirect encodings usually require more knowledge about genetic and neural mechanisms. NEAT's representation is defined as follows.

<img src="figs/NEAT/genotype.png" width="75%" height="75%">

NEAT’s genetic encoding scheme is designed to allow corresponding genes to be easily lined up when two genomes cross over during mating.

*Node genes* provide a list of inputs, hidden nodes, and outputs that can be connected.

Each genotype also includes a list of *connection genes* (edges basically). Each connection gene specifies the in-node, the out-node, the weight of the connection, and if it's enabled. It also has an *innovation number* which is used for finding corresponding genes. TODO: explain further
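The genotype described above could be written down as plain data classes. This is a sketch of my own; the class and field names (`NodeGene`, `ConnectionGene`, `Genome`, `kind`, ...) are made up for illustration and are not taken from the paper or from any library.

```python
from dataclasses import dataclass, field

@dataclass
class NodeGene:
    node_id: int
    kind: str  # "input", "hidden", or "output"

@dataclass
class ConnectionGene:
    in_node: int
    out_node: int
    weight: float
    enabled: bool
    innovation: int  # historical marking used to line genes up

@dataclass
class Genome:
    nodes: list = field(default_factory=list)
    connections: list = field(default_factory=list)

# Two inputs wired straight to one output, no hidden nodes.
genome = Genome(
    nodes=[NodeGene(1, "input"), NodeGene(2, "input"), NodeGene(3, "output")],
    connections=[
        ConnectionGene(1, 3, weight=0.7, enabled=True, innovation=1),
        ConnectionGene(2, 3, weight=-0.5, enabled=True, innovation=2),
    ],
)
print(genome.connections[0])
```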

### Gene tracking
We want to track the historical origin of each gene because two genes with the same origin must represent the same thing (apart from weight differences). This tracking is used to align genes when doing crossover.

This is achieved by simply having a global incrementing number that is assigned to each new gene appearing through structural mutation (new nodes or edges).

See section on crossover for how it's used when creating offspring


### Genetic Operators

#### Mutation
NEAT's mutation can change both connection weights and topology.

At each generation, each connection weight is either perturbed or left unchanged with some probability. *Structural mutations* happen either by adding a node or by adding a connection between two previously unconnected nodes. This is shown in the figure below. In the add-node case an existing connection is split and the new node is placed in between: the old connection is disabled and two new connections are added. The new connection into the new node gets a weight of 1 and the one out of it gets the old connection's weight.

<img src="figs/NEAT/add-connection-add-node.png" width="50%" height="50%">
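The two structural mutations can be sketched as follows. Assumptions on my part: connection genes are plain dicts, innovation numbers come from a simple counter, and the sketch neither distinguishes input/output nodes nor forbids cycles. The weight convention for the split (1.0 into the new node, the old weight out of it) is how I read the paper.

```python
import itertools
import random

def add_connection(connections, node_ids, next_innovation, rng):
    """Connect two previously unconnected nodes with a random weight."""
    existing = {(c["in"], c["out"]) for c in connections}
    candidates = [(a, b) for a in node_ids for b in node_ids
                  if a != b and (a, b) not in existing]
    if not candidates:
        return None  # nothing left to connect
    a, b = rng.choice(candidates)
    gene = {"in": a, "out": b, "weight": rng.uniform(-1, 1),
            "enabled": True, "innovation": next(next_innovation)}
    connections.append(gene)
    return gene

def add_node(connections, node_ids, next_innovation, rng):
    """Split an enabled connection and place a new node in between."""
    old = rng.choice([c for c in connections if c["enabled"]])
    old["enabled"] = False
    new_id = max(node_ids) + 1
    node_ids.append(new_id)
    # Weight 1.0 in and the old weight out, so the network initially
    # behaves much like it did before the split.
    connections.append({"in": old["in"], "out": new_id, "weight": 1.0,
                        "enabled": True, "innovation": next(next_innovation)})
    connections.append({"in": new_id, "out": old["out"], "weight": old["weight"],
                        "enabled": True, "innovation": next(next_innovation)})
    return new_id

rng = random.Random(0)
counter = itertools.count(3)  # innovations 1 and 2 are already used below
nodes = [1, 2, 3]
conns = [
    {"in": 1, "out": 3, "weight": 0.7, "enabled": True, "innovation": 1},
    {"in": 2, "out": 3, "weight": -0.5, "enabled": True, "innovation": 2},
]
new_id = add_node(conns, nodes, counter, rng)
new_gene = add_connection(conns, nodes, counter, rng)
for c in conns:
    print(c)
```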

Whenever a new connection or node gene is added, it is given an innovation number. A problem is that the same mutation can be given different innovation numbers, which happens if two individuals in the same generation mutate the same new connection or node. To solve this, a list of the new innovations in each generation is kept, which makes sure that identical structural innovations are given the same innovation number.
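A rough sketch of that bookkeeping: a global counter plus a per-generation lookup. The class name `InnovationTracker` and the choice to identify a structural innovation by its `(in_node, out_node)` pair are my own assumptions. A full implementation would also have to hand out consistent ids for new nodes.

```python
class InnovationTracker:
    """Hands out innovation numbers; identical structural mutations
    within one generation share the same number."""

    def __init__(self):
        self.next_number = 1
        self.this_generation = {}  # (in_node, out_node) -> innovation number

    def get(self, in_node, out_node):
        key = (in_node, out_node)
        if key not in self.this_generation:
            self.this_generation[key] = self.next_number
            self.next_number += 1
        return self.this_generation[key]

    def new_generation(self):
        # Forget this generation's innovations; the counter keeps going.
        self.this_generation.clear()

tracker = InnovationTracker()
a = tracker.get(1, 4)
b = tracker.get(1, 4)  # same mutation in another individual, same generation
c = tracker.get(2, 4)
tracker.new_generation()
d = tracker.get(1, 4)  # same structure, but in a later generation
print(a, b, c, d)  # 1 1 2 3
```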


#### Crossover / mating
Each gene has an innovation number as described earlier. This number is used to see which genes are actually the same. When doing the crossover, an inherited gene (connection or node) will have the same innovation number as it had in the parent it came from.

Innovation numbers cannot change during this step; new ones are only assigned by mutations.

When doing the actual crossover, genes with the same innovation number are matched up (*matching genes*) and then, depending on the crossover strategy, the offspring's corresponding gene is determined from those two. Often one of the matching genes is simply chosen at random to pass to the offspring.

If a gene cannot be aligned because only one of the parents has it (a *disjoint* or *excess* gene), the paper inherits it from the more fit parent. When both parents have equal fitness, unmatched genes are inherited from both parents at random.

The following image demonstrates this.

<img src="figs/NEAT/crossover.png" width="65%" height="65%">
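The alignment step can be sketched with genomes stored as dicts keyed by innovation number (my own simplification). The sketch assumes one parent is strictly fitter and takes unmatched genes only from that parent, which is how I read the paper; the equal-fitness case is not handled.

```python
import random

def crossover(fitter, weaker, rng):
    """Both parents map innovation number -> connection gene."""
    child = {}
    for innovation, gene in fitter.items():
        if innovation in weaker:
            # Matching gene: pick either parent's version at random.
            child[innovation] = rng.choice([gene, weaker[innovation]])
        else:
            # Disjoint or excess gene: taken from the fitter parent.
            child[innovation] = gene
    return child

# Strings stand in for real connection genes.
parent_a = {1: "a1", 2: "a2", 4: "a4"}
parent_b = {1: "b1", 2: "b2", 3: "b3", 5: "b5"}
child = crossover(parent_a, parent_b, random.Random(0))
print(sorted(child))  # [1, 2, 4]
```

Genes 3 and 5 exist only in the weaker parent, so they do not show up in the child.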


### Speciation
Innovation in evolved networks happens by adding new structure in some way (mutation/crossover), but new structure usually causes the fitness to go down at first. For example, adding a new node can introduce a new nonlinearity, but its weights may not be very good in the beginning. It might take a few iterations before the new structure performs better than what came before, so it is necessary to protect such innovations and let them survive for a while.

One way to solve this is *speciation*, which lets individuals evolve into different species whose members only compete with each other. This requires a *compatibility function* to tell whether two individuals belong to different species.

NEAT uses the number of excess and disjoint genes (unmatched genes basically) between two individuals to determine whether they belong to different species. The compatibility function is defined as follows.

$$\delta = \frac{c_1 E}{N} + \frac{c_2 D}{N} + c_3 \overline{W}$$

Where $E$ is the number of excess genes (innovation numbers above the other's max innovation number), $D$ is the number of disjoint genes, $N$ is the number of genes in the larger genome, and $\overline{W}$ is the average weight difference in matched connection weights. $c_1,\ c_2,\ c_3$ are parameters to adjust the importance of each part. A compatibility threshold $\delta_t$ is then used for the cutoff. Is the threshold fixed?
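The formula translates fairly directly into code. In this sketch a genome is just a dict mapping innovation number to connection weight (my simplification), both genomes are assumed to be non-empty, and the default coefficient values are placeholders rather than recommended settings.

```python
def compatibility(genome_a, genome_b, c1=1.0, c2=1.0, c3=0.4):
    """Compatibility distance between two genomes
    (dicts of innovation number -> weight)."""
    innovations_a, innovations_b = set(genome_a), set(genome_b)
    matching = innovations_a & innovations_b
    unmatched = innovations_a ^ innovations_b
    # Excess genes lie beyond the other genome's highest innovation number.
    cutoff = min(max(innovations_a), max(innovations_b))
    excess = sum(1 for i in unmatched if i > cutoff)
    disjoint = len(unmatched) - excess
    n = max(len(genome_a), len(genome_b))  # size of the larger genome
    if matching:
        mean_weight_diff = sum(abs(genome_a[i] - genome_b[i])
                               for i in matching) / len(matching)
    else:
        mean_weight_diff = 0.0
    return c1 * excess / n + c2 * disjoint / n + c3 * mean_weight_diff

a = {1: 0.5, 2: -0.3, 4: 0.8}
b = {1: 0.1, 2: -0.3, 3: 0.9, 5: 0.2, 6: -0.7}
# E = 2 (genes 5, 6), D = 2 (genes 3, 4), N = 5, mean weight diff = 0.2
delta = compatibility(a, b)
print(delta)  # approximately 0.88
```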

During evolution an ordered list of species is kept, and each genotype is placed in the first species it is compatible with, i.e. the first one whose representative is within the threshold $\delta_t$. The representative of each species is a random genome from that species in the previous generation.

If no species is compatible enough for an individual, a new species is created with that individual as its representative.
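The assignment procedure could look roughly like this. The sketch places each genome in the first species whose representative is within the threshold (my reading of the paper) and takes the distance function as a parameter; the function name `speciate` and its signature are my own.

```python
def speciate(genomes, representatives, distance, threshold):
    """Assign each genome to the first compatible species.

    representatives: one genome per existing species, in order.
    Returns a list of species (lists of genomes); new species are appended.
    """
    representatives = list(representatives)
    species = [[] for _ in representatives]
    for genome in genomes:
        for index, representative in enumerate(representatives):
            if distance(genome, representative) < threshold:
                species[index].append(genome)
                break
        else:
            # No compatible species: start a new one with this genome
            # as its representative.
            representatives.append(genome)
            species.append([genome])
    return species

# Plain numbers stand in for genomes, absolute difference for the distance.
groups = speciate([1.0, 1.2, 5.0, 5.3, 9.0], representatives=[1.1],
                  distance=lambda a, b: abs(a - b), threshold=1.0)
print(groups)  # [[1.0, 1.2], [5.0, 5.3], [9.0]]
```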

NEAT reproduces within species using *explicit fitness sharing*, which forces individuals with similar genotypes (members of the same species) to share their fitness payoff. Thus, innovations in NEAT are protected within their own species, and a species cannot afford to become too big even if many of its members perform well. This keeps any one species from taking over the entire population. The adjusted fitness of an individual $i$ in the population is defined as follows.

$$f'_i = \frac{f_i}{\sum_j sh(\delta(i, j))}$$

Where $f_i$ is the fitness to the task of the individual $i$ and $sh(\delta(i, j))$ is 0 when $\delta$ is above the threshold and 1 otherwise, in other words the denominator is the number of individuals in the same species as $i$.

Every species is assigned a number of offspring in proportion to the sum of adjusted fitness for its member individuals.
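Fitness sharing and the offspring allocation can be sketched together. Assumptions on my part: each species is given as a list of raw, non-negative fitness values, and offspring counts are simply rounded, so they may not always sum exactly to the population size.

```python
def adjusted_fitness(species):
    """Divide each raw fitness by the size of the individual's species."""
    return [[f / len(members) for f in members] for members in species]

def allocate_offspring(species, population_size):
    """Offspring per species, proportional to its summed adjusted fitness."""
    totals = [sum(adjusted) for adjusted in adjusted_fitness(species)]
    grand_total = sum(totals)
    return [round(population_size * total / grand_total) for total in totals]

species = [[4.0, 6.0, 8.0, 2.0], [9.0, 3.0]]
print(adjusted_fitness(species))       # [[1.0, 1.5, 2.0, 0.5], [4.5, 1.5]]
print(allocate_offspring(species, 6))  # [3, 3]
```

Note that the summed adjusted fitness of a species is just its mean raw fitness (5.0 and 6.0 here), which is why a big species does not get more offspring merely for being big.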

Species then reproduce by first removing the lowest performing individuals. The entire population is then replaced with the offspring of the remaining individuals in each species.

Does crossover only happen within species?

### Initial Topology of Individuals
NE algorithms can be sensitive to starting from a population of random initial topologies, since that can produce infeasible networks with no path from inputs to outputs.

Experiments also show that it is beneficial to evolve minimal topologies (Occam's razor basically? too much complexity is bad). But it can be hard to get small networks when starting from random topologies, since shrinking a network through evolution can be problematic. Penalising size in the fitness function is also difficult, since it might then favor topologies that are too small.

NEAT solves this by just starting minimally and letting NE solve it itself. This means that NEAT starts with no hidden nodes and grows the structure when more is needed. In other words, the initial topology is just the input nodes directly connected to the output nodes.
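A minimal starting genome could be built like this. The dict layout and the weight range are assumptions of mine, and a bias input is left out to keep the sketch short.

```python
import random

def minimal_genome(n_inputs, n_outputs, rng):
    """Every input connected directly to every output, no hidden nodes."""
    inputs = list(range(n_inputs))
    outputs = list(range(n_inputs, n_inputs + n_outputs))
    connections = []
    innovation = 0
    for i in inputs:
        for o in outputs:
            innovation += 1
            connections.append({"in": i, "out": o,
                                "weight": rng.uniform(-1, 1),
                                "enabled": True, "innovation": innovation})
    return {"inputs": inputs, "outputs": outputs, "hidden": [],
            "connections": connections}

genome = minimal_genome(2, 1, random.Random(0))
print(len(genome["connections"]), genome["hidden"])  # 2 []
```

Since every individual starts with the same structure, the initial connections get the same innovation numbers in all of them and only the weights differ.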

New structure only comes from structural mutations.

TODO: how does it solve the problem of infeasible networks though? Enough by just punishing them in fitness function adequately?

### TLDR
The three main things with NEAT are
* The historical markings of genes to keep track of which genes are the same to make crossover possible
* Speciation, so that new structure only competes with similar topologies and gets time to improve, which pays off in the long run
* Starting from minimal topologies, i.e. with as little complexity as possible, and letting them grow as needed

### Parameter settings
TODO


## Pseudo code
Is this about it? What is the order of when things happen, do we maintain the speciation after mutation? or before?

```python
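def run_neat(init_population, evaluate_fitness, speciate, cull,
             crossover_within_species, mutate, good_enough, max_generations=100):
    """One possible ordering of a NEAT generation; every helper is a placeholder."""
    population = init_population()
    best = None
    for _ in range(max_generations):
        # Score every individual on the task.
        for individual in population:
            individual.fitness = evaluate_fitness(individual)
        best = max(population, key=lambda individual: individual.fitness)
        if good_enough(best):
            break

        # Group individuals into species using the compatibility distance.
        species = speciate(population)

        # Remove the lowest-performing individuals of each species.
        species = [cull(members) for members in species]

        # Create offspring by crossover within each species, then mutate them.
        offspring = [child for members in species
                     for child in crossover_within_species(members)]
        population = [mutate(child) for child in offspring]

    return best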
```


## Discussion and Thoughts
With GPUs (optimized for large matrix multiplications), isn't it easier to just have weights go to zero rather than finding a sparse topology?

Is this completely outdated? Seems like it tbh. TODO: look at new research using this or HyperNEAT or other extensions of NEAT

Are cycles in networks possible with NEAT? How is the output computed in that case? Or do we avoid them using some rule when doing crossover/mutation etc.? Actually, a cycle is just a recurrent connection, which means the network can remember stuff.

How are recurrent connections actually handled when using the network? as well as when evaluating the fitness?

What are some competing methods for similar problems? (Deep) Q-learning, other reinforcement learning


# Small example
Toy example of finding a neural network that solves XOR via NEAT. TODO

```python
import numpy as np
```

```python
X = [[0, 0],
     [1, 0],
     [0, 1],
     [1, 1]]

y = [0, 1, 1, 0]
```
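A fitness function for this toy problem could look like the following. The `(4 - total error)^2` form is how I remember the XOR fitness from the NEAT paper, so treat it as an assumption. `network` is any hypothetical callable that maps one input row to a number, and the two example "networks" are stand-ins, not evolved ones.

```python
def xor_fitness(network, inputs, targets):
    """Higher is better: (number of cases - total absolute error) squared."""
    error = sum(abs(network(x) - t) for x, t in zip(inputs, targets))
    return (len(targets) - error) ** 2

X = [[0, 0],
     [1, 0],
     [0, 1],
     [1, 1]]
y = [0, 1, 1, 0]

def perfect(x):
    return x[0] ^ x[1]  # the true XOR, for reference

def constant(x):
    return 0.5  # always unsure

print(xor_fitness(perfect, X, y))   # 16
print(xor_fitness(constant, X, y))  # 4.0
```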