<img src="images/deep_ga.png" align=right width=50%></img>
# Deep Neuroevolution
Author: Jin Yeom (jinyeom@utexas.edu)

## Contents
- [Configuration](#Configuration)
- [Genotype](#Genotype)
- [Phenotype](#Phenotype)
- [Environment](#Environment)
- [Genetic Algorithm (GA)](#Genetic-Algorithm-%28GA%29)

In [10]:
import random
from copy import deepcopy

import gym
import torch
from torch import nn
from torch.nn import functional as F
from torchvision.transforms import functional as T
from torchsummary import summary
from deap import creator, base, tools

## Configuration

In [2]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device =", device)

device = cpu


In [19]:
SIGMA = 0.005
ENV_NAME = "CartPole-v1"
SCREEN_SIZE = 84
MAX_ITER = 500
N_GEN = 1000
POP_SIZE = 1000
N_SEL = 200

## Genotype

In [4]:
def rand_seed():
    return random.randint(0, 2**31-1)

In [5]:
def mutate(ind):
    ind.append(rand_seed())
    return ind,

In [6]:
def decode(genotype, tmpl_model, sigma):
    model = deepcopy(tmpl_model)
    for seed in genotype:
        # NOTE: in the paper, the first seed is used for initialization,
        # but in such case, every individual in the first generation is
        # initialized around zero; we probably don't want that.
        #
        # Instead, we're going to assume that all individuals are already
        # initialized with a better initialization method, e.g., Xavier.
        torch.manual_seed(seed)
        for param in model.parameters():
            param.data.add_(torch.randn_like(param) * sigma)
    return model

Ooh, looks like we're going to have to implement the phenotype before testing `deocde`.

## Phenotype

In [20]:
class NatureDQN(nn.Module):
    def __init__(self, in_channels=4, act_dim=18):
        super(NatureDQN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, 32, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)
        self.fc4 = nn.Linear(7 * 7 * 64, 512)
        self.fc5 = nn.Linear(512, act_dim)
        
        def init_weights(m):
            if type(m) in {nn.Conv2d, nn.Linear}:
                torch.nn.init.xavier_uniform_(m.weight)
        self.apply(init_weights)
        
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc4(x))
        return F.softmax(self.fc5(x), dim=-1)

Now, let's take a look at the network. Remember, `tmpl_model` below will be used to decode each individual genotype during evolution.

In [21]:
TMPL_MODEL = NatureDQN().to(device)
summary(TMPL_MODEL, (4, SCREEN_SIZE, SCREEN_SIZE))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 32, 20, 20]           8,224
            Conv2d-2             [-1, 64, 9, 9]          32,832
            Conv2d-3             [-1, 64, 7, 7]          36,928
            Linear-4                  [-1, 512]       1,606,144
            Linear-5                   [-1, 18]           9,234
Total params: 1,693,362
Trainable params: 1,693,362
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.11
Forward/backward pass size (MB): 0.17
Params size (MB): 6.46
Estimated Total Size (MB): 6.73
----------------------------------------------------------------


## Environment

In [13]:
def render(env):
    # from https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
    # NOTE: this function is slightly modified for conciseness
    screen = env.render(mode="rgb_array").transpose((2, 0, 1))
    screen = screen[:, 160:320]
    view_width = 320
    
    def get_cart_location():
        world_width = env.x_threshold * 2
        scale = screen_width / world_width
        return int(env.state[0] * scale + screen_width / 2.0)
    
    cart_location = get_cart_location()
    if cart_location < view_width // 2:
        slice_range = slice(view_width)
    elif cart_location > (screen_width - view_width // 2):
        slice_range = slice(-view_width, None)
    else:
        slice_range = slice(cart_location - view_width // 2,
                            cart_location + view_width // 2)
        
    screen = screen[:, :, slice_range]
    screen = np.ascontiguousarray(screen, dtype=np.float32) / 255
    screen = T.to_pil_image(screen)
    screen = T.resize(screen, SCREEN_SIZE)
    screen = T.to_tensor(screen)
    
    return screen.unsqueeze(0).to(device)

In [16]:
env = gym.make(ENV_NAME)
render(env) # doesn't work on a chromebook!

[33mWARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.[0m


NameError: name 'base' is not defined

In [22]:
def evaluate(ind):
    env = gym.make(ENV_NAME)
    policy = decode(ind, TMPL_MODEL, SIGMA)
    policy.eval()
    
    obs = render(env)
    i = done = fitness = 0
    while not done and i < MAX_ITER:
        act_probs = policy(obs)
        action = torch.argmax(act_probs)
        _, reward, done, _ = env.step(action)
        obs = render(env)
        fitness += reward
        i += 1
    return fitness,

## Genetic Algorithm (GA)

In [13]:
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)



In [23]:
toolbox = base.Toolbox()
toolbox.register("individual", tools.initRepeat, list, rand_seed, n=1)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mutate", mutate)
toolbox.register("evaluate", evaluate)
toolbox.register("select", tools.selBest, k=N_SEL)