## Evolutionary Robotics

In this notebook, we discuss evolutionary robotics, and you will implement an evolutionary algorithm to solve one of the tasks in OpenAI Gym, namely the continuous mountain car task. As you will see, a first implementation of an evolutionary algorithm for a robotics task is easy to make. However, depending on the task, obtaining good results may be hard, and understanding the evolved solution may be harder still.

The figure below shows the typical evolutionary robotics approach. An initial population is randomly generated. Then there is an iterative process of: (1) evaluating all individuals (genomes) in the population, resulting in a fitness value for each individual, (2) selecting the individuals that will be allowed to procreate, i.e., form the new generation, and (3) varying the genomes of the selected individuals (using crossover, mutation, etc.). The process typically terminates either after a specified number of generations or once the population has converged on a solution. Evaluation involves converting the genome (genotype) to the phenotype (e.g., setting the weights of a neural network to the values in the genome). The phenotype is then tested on the task, typically in simulation but in some works also on real robots. In robotics tasks, evaluation is a stochastic process, and execution of the task by the robot can take a long time.

<img src="evolutionary_robotics_process.jpg" width="50%"></img>
*Figure 1:* Depiction of the typical evolutionary robotics approach. Figure from: _Doncieux, S., Bredeche, N., Mouret, J. B., & Eiben, A. E. G. (2015). Evolutionary robotics: what, why, and where to. Frontiers in Robotics and AI, 2, 4._
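
The loop described above can be sketched in a few lines of Python. This is a minimal illustration on a toy problem, not the implementation expected for the mountain car task. The names `toy_fitness` and `evolve`, the choice of truncation selection, uniform crossover, and Gaussian mutation, and all parameter values are assumptions made for brevity.

```python
import numpy as np

def toy_fitness(genome):
    # Hypothetical stand-in for a robot evaluation: higher is better,
    # with the optimum at the all-zero genome.
    return -float(np.sum(genome ** 2))

def evolve(fitness_fn, genome_size=5, pop_size=20, n_generations=50,
           n_parents=5, mutation_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial population, one genome per row.
    population = rng.uniform(-1.0, 1.0, size=(pop_size, genome_size))
    best_genome, best_fitness = None, -np.inf
    for _ in range(n_generations):
        # (1) Evaluation: one fitness value per individual.
        fitness = np.array([fitness_fn(g) for g in population])
        top = int(np.argmax(fitness))
        if fitness[top] > best_fitness:
            best_genome, best_fitness = population[top].copy(), float(fitness[top])
        # (2) Selection: keep the n_parents fittest individuals (truncation).
        parents = population[np.argsort(fitness)[-n_parents:]]
        # (3) Variation: uniform crossover of two random parents, then mutation.
        mothers = parents[rng.integers(n_parents, size=pop_size)]
        fathers = parents[rng.integers(n_parents, size=pop_size)]
        mask = rng.random((pop_size, genome_size)) < 0.5
        population = np.where(mask, mothers, fathers)
        population = population + rng.normal(0.0, mutation_std, size=population.shape)
    return best_genome, best_fitness

best_genome, best_fitness = evolve(toy_fitness)
print('Best fitness = ' + str(best_fitness))
```

For a robotics task, `toy_fitness` would be replaced by a function that decodes the genome into a controller and runs it on the task, which is usually where most of the computation time goes.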


## MountainCarContinuous-v0

In this notebook, you will apply an evolutionary robotics approach to the continuous mountain car task. In this task, the car needs to reach the flag on the right mountain. It observes its position and velocity, and acts by accelerating left or right. The car cannot drive straight up the mountain, but has to build up momentum to succeed. The fitness function rewards reaching the flag and penalizes the use of control actions (using less energy to reach the hilltop is better). Please see the general description of the task <A HREF="https://gym.openai.com/envs/MountainCarContinuous-v0/" TARGET="_blank">here</A> and the details of the task <A HREF="https://github.com/openai/gym/wiki/MountainCarContinuous-v0" TARGET="_blank">here</A>.

<img src="continuous_mountain_car.png" width="50%"></img>
*Figure 2:* Screenshot of the continuous mountain car task. 
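
To make the trade-off in the fitness function concrete, the sketch below mirrors the per-step reward as it is commonly described for this environment: a bonus of 100 when the flag is reached, minus 0.1 times the squared action at every step. The function `step_reward` is a hypothetical helper written for illustration, not part of Gym, and the constants are an assumption that should be checked against the Gym version you have installed.

```python
def step_reward(action, reached_flag):
    """Assumed per-step reward: goal bonus minus a quadratic control cost."""
    reward = 100.0 if reached_flag else 0.0
    reward -= 0.1 * action ** 2
    return reward

# An agent that pushes at full throttle for 1000 steps without reaching the flag
# only accumulates control costs.
total = sum(step_reward(1.0, False) for _ in range(1000))
print('Total reward = ' + str(total))
```

Under these assumptions, an agent that never reaches the flag can do no better than a total reward of zero, which it obtains by not acting at all. Doing nothing is therefore a local optimum that an evolutionary algorithm may get stuck in.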


Below you will find code that evaluates an agent a single time in the mountain car environment. Please study the code, and note that the method ```act``` should eventually be replaced by a learned controller.

In [1]:
import run_cart
import gym
import numpy as np

class random_agent(object):
    """Random agent"""

    def act(self, observation, reward, done):
        return [2.0*np.random.rand()-1.0]

agent = random_agent()
reward = run_cart.run_cart_continuous(agent, graphics=False)
print('Reward = ' + str(reward))

Reward = -32.9296307284816




## Using a continuous time recurrent neural network as controller

Of course, random control is not going to solve the task. Below, we define an agent that uses a continuous time recurrent neural network (CTRNN) as its controller.

In [4]:
from CTRNN import CTRNN
from scipy.sparse import csr_matrix

class CTRNN_agent(object):
    
    """ Continuous Time Recurrent Neural Network agent. """
    
    n_observations = 2
    n_actions = 1

    def __init__(self, network_size, weights=None, taus=None, gains=None, biases=None):
        # The network needs at least one neuron per observation and per action.
        self.network_size = max(network_size, self.n_observations + self.n_actions)
        self.cns = CTRNN(self.network_size, step_size=0.1)
        # Override the network's default parameters only where values are given.
        if weights is not None and len(weights) > 0:
            # weights must be a matrix of size network_size x network_size
            self.cns.weights = csr_matrix(weights)
        if biases is not None and len(biases) > 0:
            self.cns.biases = biases
        if taus is not None and len(taus) > 0:
            self.cns.taus = taus
        if gains is not None and len(gains) > 0:
            # gains are set on the network, like the other parameters
            self.cns.gains = gains
    
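    def act(self, observation, reward, done):
        # Feed the observation to the first neurons; the others get no external input.
        external_inputs = np.zeros(self.network_size)
        external_inputs[0:self.n_observations] = observation
        self.cns.euler_step(external_inputs)
        # The outputs of the last neurons are used as the action.
        return self.cns.outputs[-self.n_actions:]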

# run a CTRNN agent:
n_neurons = 10
weights = np.zeros([n_neurons, n_neurons])
taus = np.asarray([0.1]*n_neurons)
gains = np.ones([n_neurons,])
biases = np.zeros([n_neurons,])
agent = CTRNN_agent(n_neurons, weights=weights, taus=taus, gains=gains, biases=biases)
reward = run_cart.run_cart_continuous(agent, graphics=False)

print('Reward = ' + str(reward))

Reward = -24.99999999999965
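
To evolve such a controller, the genome has to be converted into the parameters of the network. The sketch below shows one possible mapping from a flat vector of real numbers to the four parameter arrays. The function `decode_genome`, the ordering of the genes, and the mapping of the time constants to values of at least 0.1 are all assumptions made for illustration; other encodings are equally valid.

```python
import numpy as np

def decode_genome(genome, n_neurons):
    """Split a flat genome into CTRNN parameters (hypothetical gene layout)."""
    genome = np.asarray(genome, dtype=float)
    n_weights = n_neurons * n_neurons
    expected = n_weights + 3 * n_neurons
    if genome.size != expected:
        raise ValueError('Expected %d genes, got %d' % (expected, genome.size))
    # The first n_neurons^2 genes form the weight matrix.
    weights = genome[:n_weights].reshape(n_neurons, n_neurons)
    # The remaining genes hold one row each for taus, gains, and biases.
    rest = genome[n_weights:].reshape(3, n_neurons)
    # Time constants must be positive; here they are kept at 0.1 or larger.
    taus = 0.1 + np.abs(rest[0])
    gains = rest[1]
    biases = rest[2]
    return weights, taus, gains, biases

n_neurons = 10
rng = np.random.default_rng(0)
genome = rng.uniform(-1.0, 1.0, size=n_neurons * n_neurons + 3 * n_neurons)
weights, taus, gains, biases = decode_genome(genome, n_neurons)
print(weights.shape, taus.shape, gains.shape, biases.shape)
```

The returned arrays can be passed to `CTRNN_agent` in the same way as the hand-set parameters in the cell above.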
