# Neuroevolution class

In this class we will investigate the use of NEAT (NeuroEvolution of Augmenting Topologies) to solve the [Gym Lunar Lander game](https://www.gymlibrary.dev/environments/box2d/lunar_lander/).

First, execute the following code, which is used to render Gym environments in Jupyter:

In [1]:
# Virtual display
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(14, 9))
virtual_display.start()
import matplotlib.pyplot as plt
%matplotlib inline
from IPython import display

def jrender(env, step=0, info=""):
    plt.figure("display",(100,3))
    plt.clf()
    plt.imshow(env.render())
    plt.title("%s | Step: %d | %s" % (env.spec.id,step, info))
    plt.axis('off')

    display.clear_output(wait=True)
    display.display(plt.gcf())
    plt.close()

## Task 1: Gym

Create an instance of the `LunarLander-v2` environment by modifying the following example (adapted from https://www.gymlibrary.dev/content/basic_usage/).

Before running the code make the following changes:
- change `render_mode` to `"rgb_array"`
- modify the code to run for 200 steps
- add `jrender(env)` as the last instruction of the `for` loop to update the screen
- pass the step number as the second parameter to `jrender`

In [2]:
import gym
env = gym.make("LunarLander-v2", render_mode="human")

observation, info = env.reset()

for _ in range(1000):
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())

    if terminated or truncated:
        observation, info = env.reset()

env.close()



In [3]:
...

Ellipsis

Make the following changes to the code:
- remove the `if` statement (including its body) and change the `for` loop into a `while` loop that runs the environment for as long as the `step` method returns neither terminated nor truncated
- pass the total reward accumulated so far as the third parameter to `jrender`. You should see the reward drop by about 100 points when the lander crashes.
- to speed up the visualisation, you can change the loop to call `jrender` only every 5 frames, or when the episode is terminated or truncated (to show the final reward)
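
One possible shape for this loop is sketched below. It is an illustration under assumptions rather than the reference solution: `run_episode`, `policy`, `render` and `render_every` are names invented here, and `render` is expected to be a callable with the same signature as `jrender`.

```python
def run_episode(env, policy, render=None, render_every=5):
    """Run one episode and return (total_reward, steps).

    env     -- any object following the Gym reset/step API used above
    policy  -- hypothetical callable mapping an observation to an action
    render  -- optional callable with jrender's signature (env, step, info)
    """
    observation, info = env.reset()
    total_reward = 0.0
    step = 0
    terminated = truncated = False

    # Loop until the environment reports the end of the episode.
    while not (terminated or truncated):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        step += 1

        # Draw every few frames, and always draw the final frame.
        if render is not None and (step % render_every == 0
                                   or terminated or truncated):
            render(env, step, "Reward: %.1f" % total_reward)

    return total_reward, step
```

In the notebook this could be called as `run_episode(env, lambda obs: env.action_space.sample(), render=jrender)` to reproduce the random agent from the previous cell.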

In [4]:
...

Ellipsis

Refresh your memory of the different types of observation and action spaces at https://www.gymlibrary.dev/content/basic_usage/#spaces, then print out the environment's observation and action spaces.

In [5]:
...

Ellipsis

What does each value in the observation and action space represent? Consult the documentation of [Lunar Lander](https://www.gymlibrary.dev/environments/box2d/lunar_lander/) to find out.
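
As a reading aid, the layout described on the linked Lunar Lander page can be written down as two lists. The names below are labels chosen here, not identifiers from Gym, and they should be checked against the documentation for the installed version. `describe_observation` is a hypothetical helper.

```python
# Labels for the 8 observation values, in the order given by the
# Lunar Lander documentation (illustrative names, not Gym identifiers).
OBSERVATION_LABELS = [
    "x position", "y position",
    "x velocity", "y velocity",
    "angle", "angular velocity",
    "left leg contact", "right leg contact",
]

# Labels for the 4 discrete actions, indexed by action number.
ACTION_LABELS = [
    "do nothing",
    "fire left orientation engine",
    "fire main engine",
    "fire right orientation engine",
]


def describe_observation(observation):
    """Pair each observation value with its label."""
    return dict(zip(OBSERVATION_LABELS, observation))
```

Calling `describe_observation(observation)` on the array returned by `env.reset()` or `env.step()` makes it easier to see which entries change as the lander moves.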

In [6]:
...

Ellipsis

## Task 2: Neat

We will now experiment with the basic XOR NEAT example.

- Download the configuration file https://raw.githubusercontent.com/CodeReclaimers/neat-python/master/examples/xor/config-feedforward-partial by running the following cell:

In [7]:
! wget https://raw.githubusercontent.com/CodeReclaimers/neat-python/master/examples/xor/config-feedforward-partial

--2025-02-10 12:47:57--  https://raw.githubusercontent.com/CodeReclaimers/neat-python/master/examples/xor/config-feedforward-partial
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8002::154, 2606:50c0:8003::154, 2606:50c0:8000::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8002::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1931 (1.9K) [text/plain]
Saving to: ‘config-feedforward-partial’


2025-02-10 12:47:57 (13.8 MB/s) - ‘config-feedforward-partial’ saved [1931/1931]



You can edit the downloaded file by clicking on it from the Jupyter home page.


Execute the following code, a modified version of the base XOR NEAT example at https://neat-python.readthedocs.io/en/latest/xor_example.html

In [8]:
"""
2-input XOR example -- this is most likely the simplest possible example.
"""

from __future__ import print_function
import os
import neat
# import visualize

# 2-input XOR inputs and expected outputs.
xor_inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
xor_outputs = [   (0.0,),     (1.0,),     (1.0,),     (0.0,)]


def eval_genomes(genomes, config):
    for genome_id, genome in genomes:
        genome.fitness = 4.0
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        for xi, xo in zip(xor_inputs, xor_outputs):
            output = net.activate(xi)
            genome.fitness -= (output[0] - xo[0]) ** 2


def run(config_file):
    # Load configuration.
    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                         neat.DefaultSpeciesSet, neat.DefaultStagnation,
                         config_file)

    # Create the population, which is the top-level object for a NEAT run.
    p = neat.Population(config)

    # Add a stdout reporter to show progress in the terminal.
    p.add_reporter(neat.StdOutReporter(True))
#     stats = neat.StatisticsReporter()
#     p.add_reporter(stats)
#     p.add_reporter(neat.Checkpointer(5))

    # Run for up to 300 generations.
    winner = p.run(eval_genomes, 300)

    # Display the winning genome.
    print('\nBest genome:\n{!s}'.format(winner))

    # Show output of the most fit genome against training data.
    print('\nOutput:')
    winner_net = neat.nn.FeedForwardNetwork.create(winner, config)
    for xi, xo in zip(xor_inputs, xor_outputs):
        output = winner_net.activate(xi)
        print("input {!r}, expected output {!r}, got {!r}".format(xi, xo, output))


Execute the run function passing the configuration file `config-feedforward-partial` downloaded before.

In [9]:
run("config-feedforward-partial")


 ****** Running generation 0 ****** 

Population's average fitness: 2.28860 stdev: 0.26536
Best fitness: 2.98951 - size: (1, 1) - species 1 - id 73
Average adjusted fitness: 0.288
Mean genetic distance 1.749, standard deviation 0.557
Population of 150 members in 2 species:
   ID   age  size  fitness  adj fit  stag
     1    0    77      3.0    0.336     0
     2    0    73      2.9    0.240     0
Total extinctions: 0
Generation time: 0.021 sec

 ****** Running generation 1 ****** 

Population's average fitness: 2.21862 stdev: 0.30503
Best fitness: 2.98951 - size: (1, 1) - species 1 - id 73
Average adjusted fitness: 0.217
Mean genetic distance 1.757, standard deviation 0.638
Population of 150 members in 2 species:
   ID   age  size  fitness  adj fit  stag
     1    1    77      3.0    0.271     1
     2    1    73      2.9    0.163     1
Total extinctions: 0
Generation time: 0.019 sec (0.020 average)

 ****** Running generation 2 ****** 

Population's average fitness: 2.25302 stdev: 0.

Take a moment to study the evolution progress in the XOR example.

## Task 3: Neat + Gym

Evolve a controller for the lunar lander starting from the above XOR and Gym examples.

The `eval_genomes` function should be changed to evaluate the fitness by computing the total reward accumulated by each network in the environment:

- for each genome, create a neural network from the genome and the config
- call `env.reset`, which returns the initial observation
- at each step, the action applied to the environment can be selected by choosing the output neuron with the highest output, e.g. using the `np.argmax` function
- finally, remember to store the total reward as the fitness of the genome by setting `genome.fitness`

As the lunar lander starts with a different velocity at each reset, the reward obtained in each simulation (episode) will be different. To obtain a more stable fitness you can average the reward over multiple episodes. The number of episodes provides a tradeoff between fitness noise and computation time. You can set it to 5 episodes for now.
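
The steps above can be sketched as follows. This is one possible arrangement, not the reference solution: `evaluate_network`, `make_eval_genomes` and `build_net` are names invented here. The network builder is passed in as a parameter, and the action is selected with a pure-Python equivalent of `np.argmax`.

```python
def evaluate_network(net, env, episodes=5):
    """Average total reward of `net` over several episodes."""
    total = 0.0
    for _ in range(episodes):
        observation, info = env.reset()
        terminated = truncated = False
        while not (terminated or truncated):
            outputs = net.activate(observation)
            # Index of the output neuron with the highest activation
            # (same result as np.argmax, first index wins on ties).
            action = max(range(len(outputs)), key=lambda i: outputs[i])
            observation, reward, terminated, truncated, info = env.step(action)
            total += reward
    return total / episodes


def make_eval_genomes(env, build_net, episodes=5):
    """Return an eval_genomes(genomes, config) function bound to `env`.

    build_net -- hypothetical parameter: callable (genome, config) -> network
    """
    def eval_genomes(genomes, config):
        for genome_id, genome in genomes:
            net = build_net(genome, config)
            genome.fitness = evaluate_network(net, env, episodes)
    return eval_genomes
```

With neat-python, `build_net` would typically be `neat.nn.FeedForwardNetwork.create`, as in the XOR example, so that `eval_genomes = make_eval_genomes(env, neat.nn.FeedForwardNetwork.create)` can be passed to the population's `run` method.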

To evolve the agent with Gym, you can create a copy of the XOR configuration file from the Jupyter home page and modify the number of inputs and outputs according to the Lunar Lander observation and action spaces. You can set the fitness threshold (`fitness_threshold`) to 300, or disable the fitness termination check by setting `no_fitness_termination` to `True`.
Full documentation of the parameters is available from the [neat-python doc page](https://neat-python.readthedocs.io/en/latest/).
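
If you prefer to make the copy programmatically, a minimal sketch using the standard `configparser` module is shown below. The helper name `make_lander_config` and the target file name are hypothetical, and the defaults of 8 inputs and 4 outputs are assumptions based on the Lunar Lander documentation that should be checked against the spaces printed earlier. Note that rewriting the file this way drops the comments from the original.

```python
import configparser


def make_lander_config(source, target, num_inputs=8, num_outputs=4,
                       fitness_threshold=300):
    """Copy a neat-python config file, adjusting it for Lunar Lander."""
    parser = configparser.ConfigParser(interpolation=None)
    with open(source) as f:
        parser.read_file(f)

    # Network size must match the observation and action spaces.
    parser["DefaultGenome"]["num_inputs"] = str(num_inputs)
    parser["DefaultGenome"]["num_outputs"] = str(num_outputs)
    # Evolution stops once the fitness criterion reaches this value.
    parser["NEAT"]["fitness_threshold"] = str(fitness_threshold)

    with open(target, "w") as f:
        parser.write(f)
```

For example, `make_lander_config("config-feedforward-partial", "config-lunar-lander")` would write the adjusted copy under the (made-up) name `config-lunar-lander`.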

Before evolving the agent, it is useful to change the interface of the `run` function so that it returns the population after the evolution and can optionally take a starting population as a parameter. In this way you can call `run` repeatedly to evolve the population incrementally.
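
A minimal sketch of such an interface is given below, assuming the same neat-python calls as the XOR example. The parameter names `generations` and `population`, and passing `eval_genomes` in as an argument, are choices made here for illustration.

```python
def run(config_file, eval_genomes, generations=10, population=None):
    """Evolve for `generations` and return the population.

    If `population` is None a fresh one is created from `config_file`;
    otherwise evolution continues from the population passed in.
    """
    if population is None:
        # neat is only needed when a new population has to be built.
        import neat
        config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                             neat.DefaultSpeciesSet, neat.DefaultStagnation,
                             config_file)
        population = neat.Population(config)
        population.add_reporter(neat.StdOutReporter(True))

    # Each call continues from the population's current state.
    population.run(eval_genomes, generations)
    return population
```

A first call such as `p = run(config_file, eval_genomes, 10)` creates and evolves a population, and a later `p = run(config_file, eval_genomes, 10, population=p)` continues from where it stopped.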

Evolve the controller for 10 generations and look at the behaviour of the best controller after the evolution. To do so, you can use `population.best_genome` to access the best genome after the 10 generations. This can then be used together with the configuration to create a network and simulate it on a new test episode, calling `jrender` after each step as done previously.

You can also enable the checkpointer to save the progress every 5 generations by uncommenting the appropriate line in the XOR example. A population can then be loaded from a checkpoint using `p = neat.Checkpointer().restore_checkpoint("neat-checkpoint-[NUM-CHECKPOINT]")`

In [10]:
...

Ellipsis

The fitness of the agent after 10 generations is unlikely to be very high. Improving the performance usually requires a large number of generations. Use the following strategies to reduce the training time:

- **Reward shaping:** Instead of evaluating the agent until the end of the episode, you can stop the evaluation after 500-700 steps and decrease the fitness by 100 if the agent has not landed (`terminated` is `False`) by that time. Alternatively, you can subtract a penalty for every timestep. This biases the search toward more aggressive landing strategies, which also terminate faster.
- **Incremental learning:** The fitness obtained by averaging 5 episodes (initial conditions) is too noisy for the evolution to progress reliably. Using about 20 episodes would make the fitness measurement much more reliable, at the cost of substantially increasing the fitness computation time. A possible shortcut is to use a fixed set of episodes for a number of generations. This can be done by passing a `seed` parameter (integer) to `env.reset` so that the fitness is evaluated over the same set of episodes across multiple generations. The episode counter can be used as the seed, potentially with some offset. In this way 40-60 generations are usually sufficient to reach a reasonable fitness (100-200) on the training episodes. You can verify that the agent's behaviour is good when tested using the seeds of the training episodes, but it can be significantly worse when the agent is tested using different seeds. Once the agent has reached a basic flying competence on the training episodes, you can help it learn more general strategies by continuing training on either (i) different seeds, (ii) more episodes, or (iii) random episodes again. However, there is a chance that the initial set of episodes is not very representative of the general case, so the evolution can overfit to a deep local optimum and take a long time to recover when presented with different episodes. This can also happen if the agent is trained for too long on the same episodes.
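
Both strategies can be combined in a single evaluation function, sketched below under assumptions. `evaluate_shaped`, `seeds`, `max_steps` and `timeout_penalty` are names and defaults chosen here for illustration. The step cap and the penalty of 100 follow the first bullet, and the fixed seeds follow the second.

```python
def evaluate_shaped(net, env, seeds, max_steps=600, timeout_penalty=100.0):
    """Average shaped reward of `net` over one episode per seed."""
    total = 0.0
    for seed in seeds:
        # Reusing a seed reproduces the same initial conditions.
        observation, info = env.reset(seed=seed)
        terminated = truncated = False
        steps = 0
        while not (terminated or truncated) and steps < max_steps:
            outputs = net.activate(observation)
            action = max(range(len(outputs)), key=lambda i: outputs[i])
            observation, reward, terminated, truncated, info = env.step(action)
            total += reward
            steps += 1
        # Penalise episodes that ended without the agent landing or crashing.
        if not terminated:
            total -= timeout_penalty
    return total / len(seeds)
```

Using the episode counter with an offset as the seed then amounts to passing something like `seeds=range(offset, offset + 5)`, and changing `offset` later switches the agent to a different training set.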

In [11]:
...

Ellipsis