<img src="../imgs/logo.png" width="20%" align="right" style="margin:0px 20px">


# Evolutionary Computation

## 5.3 Deep Neuroevolution

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" align="left" src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" /></a>&nbsp;| Dennis G. Wilson | <a href="https://d9w.github.io/evolution/">https://d9w.github.io/evolution/</a>

# Deep Neuroevolution

Artificial neural networks are commonly used today in many applications, from phone apps to automatic piloting systems to search engines. These machine learning models contain many parameters and are usually optimized with stochastic gradient descent. However, evolutionary strategies can also be a great tool for optimizing neural network parameters, especially when there isn't a clear direction the training of the network should take. This is the case for reinforcement learning, so we'll look at a classic RL task in this section.

Because of the success of deep learning, where neural network architectures are "deep" by having many layers, this field is sometimes called deep neuroevolution. However, remember from tutorial 4 that researchers have been evolving neural networks long before the advent of deep learning.

In today's notebook, I'll be using some Python RL environments and using PyCall to interact with them in Julia.

In [2]:
using PyCall
using Conda



In [3]:
Conda.add("gym")

┌ Info: Running `conda install -y gym` in root environment
└ @ Conda /home/gbrivady/.julia/packages/Conda/x2UxR/src/Conda.jl:127

[Installation log truncated. Conda.jl installs miniconda under /home/gbrivady/.julia/conda/3, and the package plan lists gym-0.21.0 from conda-forge.]

I'll also select a random seed. This means that whenever I generate random numbers, they'll follow a defined sequence. Finally, I'll import all of the code from the CMA-ES notebook, which I've put in a separate file.

In [4]:
import Random
Random.seed!(1234);

In [5]:
include("cmaes.jl");
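As a quick check of what seeding does, the sketch below draws numbers, reseeds with the same value, and draws again. It uses a local `MersenneTwister` object, which is my choice for illustration so that the notebook's global seed set above is not disturbed.

```julia
import Random

# A local generator, so the global seed set earlier is left untouched.
rng = Random.MersenneTwister(1234)
first_draw = rand(rng, 3)

# Reseeding with the same value restarts the sequence.
Random.seed!(rng, 1234)
second_draw = rand(rng, 3)

first_draw == second_draw   # same seed, same sequence
```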

Let's make a simple neural network. Remember the model of a neuron, which multiplies its inputs by synaptic weights and then adds a bias. We'll construct a network with two internal, or hidden, layers. These layers will be fully connected: every neuron in one layer connects to every neuron in the next layer.

<img src="../imgs/neuron_model.jpeg" width="50%">
<img src="../imgs/cnn.png" width="50%">
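To make the neuron model concrete, here is a minimal sketch with made-up numbers. The function `neuron` and the values of `x`, `W`, and `b` are illustrative assumptions, not part of the notebook's code. It shows that a fully connected layer is the same computation applied once per neuron, which can be written as a matrix-vector product.

```julia
# One neuron: weighted sum of the inputs plus a bias.
neuron(x, w, b) = sum(w .* x) + b

x = [1.0, 2.0, 3.0]      # three inputs
# Each row of W holds the weights of one neuron.
W = [0.5 0.0 2.0;
     1.0 1.0 1.0]
b = [0.1, 0.2]           # one bias per neuron

# A fully connected layer applies every neuron to the same inputs.
layer_out = W * x .+ b
per_neuron = [neuron(x, W[i, :], b[i]) for i in 1:size(W, 1)]
```

Both `layer_out` and `per_neuron` hold the same two activations, one per row of `W`.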

In [6]:
struct FCLayer
    w::Array{Float64}
    b::Array{Float64}
end

struct SimpleANN
    l1::FCLayer
    l2::FCLayer
    out::FCLayer
end

We can write a constructor that initializes all weights and biases to zero. We'll fill these in with the genetic information later.

In [7]:
function SimpleANN(input::Int, N1::Int, N2::Int, output::Int)
    l1 = FCLayer(zeros(N1, input), zeros(N1))
    l2 = FCLayer(zeros(N2, N1), zeros(N2))
    out = FCLayer(zeros(output, N2), zeros(output))
    SimpleANN(l1, l2, out)
end

SimpleANN

Finally, we'll use our network to compute: we pass an input to the first layer, propagate it through each layer in turn, and return the activation of the output layer.

In [8]:
ann = SimpleANN(5, 64, 64, 4);

In [9]:
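# Forward pass: each layer multiplies by its weight matrix and adds its bias.
# No activation function is applied between layers.
function compute(inputs::Array{Float64}, ann::SimpleANN)
    x = ann.l1.w * inputs .+ ann.l1.b
    x = ann.l2.w * x .+ ann.l2.b
    x = ann.out.w * x .+ ann.out.b
    x
end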

compute (generic function with 1 method)

Since all weights and biases are zeros, if we pass in zeros we should also get out zeros.

In [10]:
compute(zeros(5), ann)

4-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0
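One property of `compute` is worth noting: it applies no activation function between layers, so the stacked layers are mathematically equivalent to a single affine map. The sketch below checks this on small random matrices. All names are illustrative, and the `tanh` nonlinearity at the end is a common choice that I am assuming here, not something the notebook uses.

```julia
import Random

rng = Random.MersenneTwister(42)   # local RNG, leaves the global seed alone

W1, b1 = randn(rng, 3, 2), randn(rng, 3)
W2, b2 = randn(rng, 2, 3), randn(rng, 2)
x = randn(rng, 2)

# Two stacked layers without an activation...
stacked = W2 * (W1 * x .+ b1) .+ b2
# ...equal one affine layer with merged weights and bias.
merged = (W2 * W1) * x .+ (W2 * b1 .+ b2)

# With a nonlinearity between the layers (tanh, as an assumption),
# the two layers can no longer be merged this way in general.
nonlinear = W2 * tanh.(W1 * x .+ b1) .+ b2
```

A linear policy is enough for a simple task like CartPole, but harder tasks would likely need the nonlinear form.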

Now that we have an ANN, let's test it. We'll evaluate individuals in the CartPole environment, where they must balance a pole on a cart to keep it upright. The actions our agent can take are to move the cart either right or left.

In [11]:
gym = pyimport("gym")

LoadError: PyError (PyImport_ImportModule

The Python package gym could not be imported by pyimport. Usually this means
that you did not install gym in the Python version being used by PyCall.

PyCall is currently configured to use the Python version at:

/usr/bin/python3

and you should use whatever mechanism you usually use (apt-get, pip, conda,
etcetera) to install the Python package containing the gym module.

One alternative is to re-configure PyCall to use a different Python
version on your system: set ENV["PYTHON"] to the path/name of the python
executable you want to use, run Pkg.build("PyCall"), and re-launch Julia.

Another alternative is to configure PyCall to use a Julia-specific Python
distribution via the Conda.jl package (which installs a private Anaconda
Python distribution), which has the advantage that packages can be installed
and kept up-to-date via Julia.  As explained in the PyCall documentation,
set ENV["PYTHON"]="", run Pkg.build("PyCall"), and re-launch Julia. Then,
To install the gym module, you can use `pyimport_conda("gym", PKG)`,
where PKG is the Anaconda package that contains the module gym,
or alternatively you can use the Conda package directly (via
`using Conda` followed by `Conda.add` etcetera).

) <class 'ModuleNotFoundError'>
ModuleNotFoundError("No module named 'gym'")
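The error above suggests that PyCall is using the system Python at `/usr/bin/python3`, while `Conda.add("gym")` installed the package into Conda.jl's private environment. One way to reconcile the two, following the second alternative in the error message, is sketched below. The Conda package name `"gym"` is an assumption based on the install command shown earlier.

```julia
# Point PyCall at Conda.jl's private Python, as the error message suggests.
ENV["PYTHON"] = ""
import Pkg
Pkg.build("PyCall")

# Restart Julia (or the notebook kernel) afterwards, then run:
# using PyCall
# gym = pyimport_conda("gym", "gym")   # imports gym, installing it through Conda if missing
```

The rebuild only needs to be done once; later sessions reuse the stored configuration.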


In [None]:
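env = gym.make("CartPole-v1")
n_in = 4    # observation: cart position, cart velocity, pole angle, pole angular velocity
n_out = 2;  # actions: 0 pushes the cart left, 1 pushes it right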

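We'll run an entire episode. An episode ends when the pole tilts too far from vertical, when the cart leaves the track, or, in CartPole-v1, after 500 steps. Since each step gives a reward of 1, the maximum score is 500.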

In [None]:
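function play_env(ann::SimpleANN; render=false)
    env = gym.make("CartPole-v1")
    env.seed(0)          # fixed seed: every evaluation sees the same episode
    obs = env.reset()
    total_reward = 0.0
    done = false

    while !done
        # Convert to Float64 in case the observation arrives as Float32.
        # argmax is 1-based in Julia, while Gym actions are 0-based.
        action = argmax(compute(Float64.(obs), ann)) - 1
        obs, reward, done, _ = env.step(action)
        if render
            env.render()
        end
        total_reward += reward
    end
    env.close()
    env = nothing
    GC.gc()              # release the Python environment object
    total_reward
end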

Our zero network won't last very long. Both outputs are always 0, so `argmax` returns the first index and the agent always takes action 0, pushing the cart left at every step.
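The `-1` in `play_env` converts Julia's 1-based index into Gym's 0-based action. A small sketch of that step, where the helper name `select_action` is hypothetical:

```julia
# Hypothetical helper mirroring the action choice inside play_env.
select_action(outputs) = argmax(outputs) - 1

a_clear = select_action([0.2, 0.7])   # second output is largest, so action 1
a_tie = select_action([0.0, 0.0])     # tie: argmax returns the first index, so action 0
```

The tie case is exactly what happens with an all-zero network.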

In [None]:
ann = SimpleANN(n_in, 5, 5, n_out)
play_env(ann; render=true)

Let's write a new constructor for our network which takes in genes and sets all of the network parameters. We'll then optimize these genes.

In [None]:
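function SimpleANN(genes::Array{Float64})
    ann = SimpleANN(n_in, 5, 5, n_out)
    # Parameter arrays in the order the genes fill them
    layers = [ann.l1.w, ann.l1.b, ann.l2.w, ann.l2.b, ann.out.w, ann.out.b]
    L = 1   # index of the array being filled
    j = 1   # position within that array (column-major for matrices)
    for i in eachindex(genes)
        if j > length(layers[L])   # current array is full, move to the next
            L += 1
            j = 1
        end
        layers[L][j] = genes[i]
        j += 1
    end
    ann
end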
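The same genotype-to-parameter mapping can be sketched in isolation. The helper `decode_genes` below is hypothetical and not part of the notebook. It slices a flat gene vector into arrays of given shapes and checks that the gene count matches. As in the constructor above, matrices are filled in column-major order.

```julia
# Hypothetical helper: split a flat gene vector into arrays with the given shapes.
function decode_genes(genes::Vector{Float64}, shapes)
    @assert length(genes) == sum(prod, shapes) "gene count must match parameter count"
    params = Array{Float64}[]
    start = 1
    for shape in shapes
        n = prod(shape)
        push!(params, reshape(genes[start:start+n-1], shape))
        start += n
    end
    params
end

shapes = [(2, 3), (2,)]                       # a 2x3 weight matrix and a 2-element bias
params = decode_genes(collect(1.0:8.0), shapes)
```

Here `params[1]` receives genes 1 to 6, filled column by column, and `params[2]` receives genes 7 and 8.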

The objective function then simply creates an ANN from the genes and evaluates it on one episode of the CartPole benchmark. Because our CMA-ES implementation minimizes, we return the negative of the total reward.

In [None]:
function objective(genes::Array{Float64})
    ann = SimpleANN(genes)
    -play_env(ann)
end

Let's see how many genes we have now:

In [None]:
N = n_in*5 + 5 + 5*5 + 5 + 5*n_out + n_out
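The count above is one weight matrix plus one bias vector per layer. A small sketch that generalizes the arithmetic to any chain of fully connected layers, with `count_params` as a hypothetical helper name:

```julia
# Hypothetical helper: weights plus biases for a chain of fully connected layers.
function count_params(sizes::Vector{Int})
    total = 0
    for i in 1:length(sizes)-1
        total += sizes[i] * sizes[i+1] + sizes[i+1]   # weight matrix + bias vector
    end
    total
end

n_genes = count_params([4, 5, 5, 2])   # (4*5 + 5) + (5*5 + 5) + (5*2 + 2) = 67
```

For comparison, the 64-unit network built earlier with `SimpleANN(5, 64, 64, 4)` would need several thousand genes, which is why a much smaller network is used here.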

Now we can try a random individual; maybe it will do better!

In [None]:
ann = SimpleANN(randn(N))
play_env(ann; render=true)

Let's use the CMAES function we defined in the last notebook and optimize for just a few steps.

In [None]:
c = CMAES(N=N, µ=10, λ=30, τ=sqrt(N), τ_c=N^2, τ_σ=sqrt(N))
for i in 1:5
    step!(c, objective)
    println(i, " ", maximum(.-c.F_λ))
end

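We might notice that the best fitness sometimes decreases from one generation to the next. Remember that CMA-ES is not elitist: the best individual is not carried over to the next generation. We should keep an external archive of the best results.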

In [None]:
best = nothing
best_fit = -Inf
c = CMAES(N=N, µ=10, λ=30, τ=sqrt(N), τ_c=N^2, τ_σ=sqrt(N))
for i in 1:20
    step!(c, objective)
    bestind = argmin(c.F_λ)
    maxfit = -c.F_λ[bestind]
    println(i, " ", maxfit)
    if maxfit > best_fit
        best = copy(c.offspring[bestind])
        best_fit = maxfit
    end
    if best_fit >= 500   # 500 is the maximum score in CartPole-v1
        break
    end
end
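The archive idea does not depend on CMA-ES. As a minimal sketch, the function below uses plain random sampling as a stand-in for a non-elitist generation and a toy `sphere` objective in place of the CartPole score. Both are assumptions made only to keep the example self-contained. The point is that the archived fitness never gets worse, even when a generation does.

```julia
import Random

sphere(x) = sum(abs2, x)   # toy objective to minimize

function run_with_archive(rng; generations=10, popsize=20, dim=3)
    best, best_fit = nothing, Inf
    history = Float64[]
    for gen in 1:generations
        # Stand-in for a non-elitist generation: fresh samples, nothing carried over.
        population = [randn(rng, dim) for _ in 1:popsize]
        fits = sphere.(population)
        i = argmin(fits)
        if fits[i] < best_fit          # archive the best individual seen so far
            best, best_fit = copy(population[i]), fits[i]
        end
        push!(history, best_fit)
    end
    best, best_fit, history
end

best_x, best_f, history = run_with_archive(Random.MersenneTwister(0))
```

`history` records the archived fitness after each generation, so it is non-increasing by construction.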

Finally, we can see how the CMA-ES optimized individual does on this benchmark.

In [None]:
ann = SimpleANN(best)
play_env(ann; render=true)

<div class="alert alert-success">
    <b>Exercise</b>
    <br/>
    We were sort of cheating before. This neural network only learned how to do well on one instance of the task, the one which comes from seeding the environment with 0. Test it on an environment with a different seed. Does it still do well? Finally, re-run the evolution, but don't fix the environment seed, or change it every time. What is the impact of a stochastic fitness on evolution?
</div>