# Solving video games with *Artificial Intelligence* and *Evolution*

<a href="https://colab.research.google.com/github/deepmind/educational/blob/master/colabs/introductory/fluttering_avians.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



> <p><small><small>Copyright 2021 DeepMind Technologies Limited.</small></small></p>
> <p><small><small> Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at </small></small></p>
> <p><small><small> <a href="https://www.apache.org/licenses/LICENSE-2.0">https://www.apache.org/licenses/LICENSE-2.0</a> </small></small></p>
> <p><small><small> Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. </small></small></p>



**Aim of this Colab**

This Colab aims to give you a basic intuition for Artificial Intelligence and how Evolution fits into it. It does so by walking you through the development of an agent that plays a video game.

**Disclaimer**

This code is intended for educational purposes and, in the name of usability by a non-technical audience, it does not always follow best practices for software engineering.

All images are copyright of Alphabet Inc.

**Links to resources**
- [What is Colab?](https://colab.research.google.com/notebooks/intro.ipynb) If you have never used Colab before, get started here!

# Artificial Intelligence and Games

## What do we mean by Artificial Intelligence?

Usually, when we think of *Artificial Intelligence* (or *AI*) in games, we think of the opponents *being programmed* to pose a challenge. These opponents adapt their strategy based on the difficulty level, or even our actions.

<center>
<img src="https://storage.googleapis.com/dm-educational/assets/FlutteringAvians/fighting.png" width="500"/>
</center>

However, here we are talking of a different kind of AI. The whole idea is to create *something* that can play the game the way a person would, that is, by using a (virtual) controller with the same functionality as the one that *we* would use to play the game.

<center>
<img src="https://storage.googleapis.com/dm-educational/assets/FlutteringAvians/robot_playing.png" width="250"/>
</center>

We refer to this *something* as an **agent**. Agents interact with an **environment** via *actions* (e.g. buttons pressed). The environment is the game, and the agent is the player.

<center>
<img src="https://storage.googleapis.com/dm-educational/assets/FlutteringAvians/rl_setting.png" width="500"/>
</center>

The goal of the agent is to get the highest possible *score* in the game, or, as we say in the AI community, to maximise its *reward*.
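
The loop below is a minimal sketch of this interaction, not the code used later in this Colab. `ToyEnvironment`, `always_press` and `play` are made-up names for illustration: the environment is a trivial game that gives one point of reward each time the button is pressed and ends after five steps.

```python
class ToyEnvironment:
  """A made-up game: pressing the button (action 1) earns 1 reward."""

  def __init__(self, num_steps=5):
    self._num_steps = num_steps
    self._steps_left = num_steps

  def reset(self):
    self._steps_left = self._num_steps
    return self._steps_left  # The observation: how many steps remain.

  def step(self, action):
    self._steps_left -= 1
    reward = 1 if action == 1 else 0
    done = self._steps_left == 0
    return self._steps_left, reward, done


def always_press(observation):
  """A trivial agent: ignores what it sees and always presses the button."""
  return 1


def play(agent, env):
  """Let the agent act in the environment until the game ends."""
  observation = env.reset()
  total_reward = 0
  done = False
  while not done:
    action = agent(observation)
    observation, reward, done = env.step(action)
    total_reward += reward
  return total_reward


print(play(always_press, ToyEnvironment()))  # 5
```

An agent that never presses the button would finish the same game with a total reward of 0, so here "maximising reward" simply means pressing the button on every step.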

-------

But first, let's have some fun playing the game! :)

# Fluttering Avians

*Fluttering Avians* is a game we made so you could learn about *Artificial Intelligence*. It is similar to the famous [*Flappy Bird*](https://en.wikipedia.org/wiki/Flappy_Bird) game of 2013.

------

Hover your mouse over the `[ ]` sign in a cell and it should change to a play button ▶️. Click the play button to run the cell.

In [None]:
#@title Run your first cell!

print("Yay! you just ran the cell!")

If you ran the cell correctly, you should see:

`Yay! you just ran the cell!`


------

Now, let's get all the code and libraries we need to create a *Fluttering Avians* environment. Run the cell below. It should take less than a minute to complete.

Don't worry, you don't need to understand how this is happening, but if you are curious, you can take a peek at the actual code by double-clicking the cell below; the one that is called: `Set up the code for the Fluttering Avians environment`.

This will open up the whole code so you can see it. You can hide it again by double-clicking on the name (which will be to the right of the code).


In [None]:
#@title Set up the code for the Fluttering Avians environment

%%capture

!pip install dm_env
!pip install colabtools

from base64 import b64encode
import copy
import dm_env
import math
import matplotlib.pyplot as plt
from matplotlib import animation, rc
import numpy as np
import PIL
from PIL import ImageDraw
from PIL import ImageFont
import threading
import tree
import time

from google.colab import output
import IPython

import requests
from io import BytesIO



def load_resource(url):
  r = requests.get(
      'https://storage.googleapis.com/dm-educational/assets/FlutteringAvians/' +
      '{}'.format(url), stream=True)
  return PIL.Image.open(BytesIO(r.content)).convert()

def load_font(size=18):
  r = requests.get(
      'https://storage.googleapis.com/dm-educational/assets/FlutteringAvians/' +
      'Roboto-Regular.ttf', stream=True)
  return PIL.ImageFont.truetype(BytesIO(r.content), size)

def encode_observation(ts):
  b = BytesIO()
  ts.observation['Pixels'].save(b, format='png')
  return b64encode(b.getvalue()).decode('utf-8')


class FlutteringAvians(dm_env.Environment):

  def __init__(self, render_pixels: bool = True, flutter_speed: float = 0.02,
               pipe_speed: float = 0.02, pipe_hole_size: float = 0.2,
               flutter_up_angle: float = 30, flutter_down_angle: float = 175,
               flutter_angle_delta: float = 10, frames_per_pipe: int = 30):
    """Fluttering Avians is a fast-paced, obstacle-avoidance, auto-scroller.
    """

    # Load the sprites
    self._bg = load_resource('Background.png')
    self._pipes_up = load_resource('Obstacle_From_Top.png')
    self._pipes_dn = load_resource('Obstacle_From_Bottom.png')
    self._flutter1 = load_resource('Red_Bird_Wing_Down.png')
    self._flutter2 = load_resource('Red_Bird_Wing_Mid.png')
    self._flutter3 = load_resource('Red_Bird_Wing_Up.png')
    self._flappies = [
        self._flutter1, self._flutter2, self._flutter3, self._flutter2]

    # The background image sets the whole frame size
    self._width = self._bg.size[0]
    self._height = self._bg.size[1]

    self._render_pixels = render_pixels
    self._flutter_speed = flutter_speed * self._height
    self._pipe_speed = pipe_speed * self._width
    self._pipe_hole_size = pipe_hole_size * self._height

    # Angles are given clockwise, 0 pointing up.
    self._flutter_up_angle = flutter_up_angle
    self._flutter_down_angle = flutter_down_angle
    self._flutter_angle_delta = flutter_angle_delta
    self._frames_per_pipe = frames_per_pipe
    self._sprite_count_frames = 3

    self._max_pipes = int(
        self._width / self._pipe_speed / self._frames_per_pipe) + 2

    self._font = load_font(32)
    self._score_font = load_font(18)
    self._init_state()

  def _init_state(self):
    self._flutter_angle = 60
    self._flutter_x = self._width * 0.25
    self._flutter_y = self._height * 0.5
    self._score = 0
    self._next_pipe = self._frames_per_pipe
    # Put first pipe in state.
    self._current_pipes = []
    self._add_pipe()
    self._game_over_flag = False
    self._sprite_count = 0

  def _add_pipe(self):
    # Minimum distance from edge of screen the hole of the pipe can be.
    bounds = 150
    self._current_pipes.append(
        [self._width + self._pipes_up.size[0],
         np.random.randint(bounds, self._height - bounds)])

  def _get_frame(self):
    frame = self._bg.copy()
    flutter = self._flappies[self._sprite_count // self._sprite_count_frames %
                            len(self._flappies)]

    sprite = flutter.rotate(90 - self._flutter_angle,
                            resample=PIL.Image.BICUBIC, expand=True)
    flutter_sz = sprite.size
    frame.paste(sprite,
                (int(self._flutter_x - flutter_sz[0]/2),
                 int(self._flutter_y - flutter_sz[1]/2)),
                sprite)
    pipe_sz = self._pipes_up.size
    for pipe in self._current_pipes:
      frame.paste(self._pipes_up,
                  (int(pipe[0] - pipe_sz[0]/2),
                   int(pipe[1] - pipe_sz[1] - self._pipe_hole_size/2)),
                  self._pipes_up)
      frame.paste(self._pipes_dn,
                  (int(pipe[0] - pipe_sz[0]/2),
                   int(pipe[1] + self._pipe_hole_size/2)),
                  self._pipes_dn)
    draw = ImageDraw.Draw(frame)
    if self._game_over_flag:
      draw.text((50, 160), "GAME OVER", (204, 136, 204), font=self._font)
    draw.text((20, 20), f"Score: {int(self._score)}",
              (0, 0, 0), font=self._score_font)
    return frame

  def _get_observation(self):
    # Copy the pipes
    pipes = self._current_pipes[:]
    assert len(pipes) <= self._max_pipes
    while len(pipes) < self._max_pipes:
      pipes.append([0, 0])
    obs = {
        'Features': {
            'angle': self._flutter_angle,
            'x': self._flutter_x,
            'y': self._flutter_y,
            'pipes': pipes,
        }}
    if self._render_pixels:
      obs['Pixels'] = self._get_frame()
    return obs

  def _collides_with_pipe(self, pipe, point):
    # Only check for a collision while the character overlaps the pipe
    # horizontally.
    clearance_x = self._pipes_up.size[0] + self._flutter1.size[0]*0.75
    if (pipe[0] + clearance_x/2 >= point[0] >= pipe[0] - clearance_x/2):
      # Inside the hole there is no collision; anywhere else there is.
      clearance_y = self._pipe_hole_size - self._flutter1.size[1]*0.75
      if (pipe[1] + clearance_y/2 >= point[1] >= pipe[1] - clearance_y/2):
        return False
      else:
        return True
    return False

  def reset(self):
    self._init_state()

    return dm_env.TimeStep(
        step_type=dm_env.StepType.FIRST,
        reward=0,
        discount=1.0,
        observation=self._get_observation())

  def step(self, action):
    if self._game_over_flag:
      return self.reset()

    reward = 0
    self._next_pipe -= 1
    self._sprite_count += 1
    if self._next_pipe <= 0:
      self._next_pipe = self._frames_per_pipe
      self._add_pipe()

    self._flutter_y -= self._flutter_speed * math.cos(
        math.radians(self._flutter_angle))

    if action:
      self._flutter_angle = self._flutter_up_angle
    else:
      self._flutter_angle = min(
          self._flutter_down_angle,
          self._flutter_angle + self._flutter_angle_delta)

    # If the pipe has left the screen, remove it
    if self._current_pipes[0][0] < -self._pipes_up.size[0]:
      self._current_pipes = self._current_pipes[1:]

    for pipe in self._current_pipes:
      pipe[0] -= self._pipe_speed
      if self._collides_with_pipe(pipe, [self._flutter_x, self._flutter_y]):
        reward = -1

    if self._flutter_y >= self._height or self._flutter_y <= 0:
      # End episode, show game over.
      reward = -1

    self._score += 1.0/self._frames_per_pipe

    if reward < 0:
      self._game_over_flag = True

    obs = self._get_observation()
    return dm_env.TimeStep(
        step_type=dm_env.StepType.MID if reward >= 0 else dm_env.StepType.LAST,
        reward=reward,
        discount=1.0,
        observation=obs)

  def action_spec(self):
    # Two possible actions: 0 (do nothing) and 1 (flutter).
    return dm_env.specs.DiscreteArray(2)

  def observation_spec(self):
    spec = {
        'Features': {
            'angle': dm_env.specs.Array([], np.float32),
            'x': dm_env.specs.Array([], np.float32),
            'y': dm_env.specs.Array([], np.float32),
            'pipes': dm_env.specs.Array([self._max_pipes, 2], np.float32),
        }
    }
    if self._render_pixels:
      spec['Pixels'] = dm_env.specs.Array(
          [self._bg.size[0], self._bg.size[1], 4], np.uint8)
    return spec




def get_trace(agent, env, max_steps: int = 500):
  ts = env.reset()
  trace = [ts.observation]
  while ts.step_type != dm_env.StepType.LAST and max_steps > 0:
    policy_val = agent.step(ts.observation)
    action = 1 if policy_val > 0.5 else 0
    ts = env.step(action)
    trace.append(ts.observation)
    max_steps -= 1
  return trace

def prepare_animation():
  fig, ax = plt.subplots()
  fig.set_figheight(512/60)
  fig.set_figwidth(288/60)
  ax.grid(False)
  ax.axis('off')
  image = ax.imshow(np.zeros([512, 288, 4]), aspect='equal')
  return fig, ax, image



env = FlutteringAvians()


Now we can play the game!

In [None]:
#@title Play the game!

#@markdown Avoid the obstacles! Click the image to flutter up.

def encode_image(img):
  b = BytesIO()
  img.save(b, format='png')
  return b64encode(b.getvalue()).decode('utf-8')

IPython.display.display(IPython.display.Javascript('''
bgimage = new Image();
bgimage.src = 'data:image/png;base64,{}';

flutter1image = new Image();
flutter1image.src = 'data:image/png;base64,{}';
flutter2image = new Image();
flutter2image.src = 'data:image/png;base64,{}';
flutter3image = new Image();
flutter3image.src = 'data:image/png;base64,{}';

pipesupimage = new Image();
pipesupimage.src = 'data:image/png;base64,{}';
pipesdnimage = new Image();
pipesdnimage.src = 'data:image/png;base64,{}';

'''.format(encode_image(env._bg),
           encode_image(env._flutter1),
           encode_image(env._flutter2),
           encode_image(env._flutter3),
           encode_image(env._pipes_up),
           encode_image(env._pipes_dn),
           )))

IPython.display.display(IPython.display.HTML('''
<center>
<div id='maindiv'>
<canvas id='canvas' width="288" height="512"
style="border:3px solid #616161;"></canvas>
</div>
<div id='play_button_div'>
<button onclick="restart()">Restart game</button>
</div>
</center>

<script>
bg_w = 288
bg_h = 512
flutter_w = 31
flutter_h = 23
pipe_w = 52
pipe_h = 320

class FlutteringAvians {
  constructor(flutter_speed = 0.02, pipe_speed = 0.02,
              pipe_hole_size = 0.2, flutter_up_angle = 30,
              flutter_down_angle = 175, flutter_angle_delta = 10,
              frames_per_pipe = 30) {
    this._flappies = [
        flutter1image, flutter2image, flutter3image, flutter2image];

    // The background image sets the whole frame size
    this._width = bg_w;
    this._height = bg_h;

    this._flutter_speed = flutter_speed * this._height;
    this._pipe_speed = pipe_speed * this._width;
    this._pipe_hole_size = pipe_hole_size * this._height;

    // Angles are given clockwise, 0 pointing up.
    this._flutter_up_angle = flutter_up_angle;
    this._flutter_down_angle = flutter_down_angle;
    this._flutter_angle_delta = flutter_angle_delta;
    this._frames_per_pipe = frames_per_pipe;
    this._sprite_count_frames = 3;

    this._max_pipes = parseInt(
        this._width / this._pipe_speed / this._frames_per_pipe) + 2;

    this._init_state();
  }

  _init_state() {
    this._flutter_angle = 60;
    this._flutter_x = this._width * 0.25;
    this._flutter_y = this._height * 0.5;
    this._score = 0;
    this._next_pipe = this._frames_per_pipe;
    // Put first pipe in state.
    this._current_pipes = [];
    this._add_pipe();
    this._game_over_flag = false;
    this._sprite_count = 0;
  }

  _add_pipe() {
    // Minimum distance from edge of screen the hole of the pipe can be.
    var bounds = 150;
    this._current_pipes.push(
        [this._width + pipe_w,
         Math.random() * (this._height - 2*bounds) + bounds]);
  }

  _draw_frame() {
    var c = document.getElementById("canvas");
    var ctx = c.getContext("2d");

    var flutter = this._flappies[
      parseInt(this._sprite_count / this._sprite_count_frames) %
      this._flappies.length];

    ctx.drawImage(bgimage, 0, 0)
    for (var i = 0; i < this._current_pipes.length; i++) {
      var pipe = this._current_pipes[i];
      ctx.drawImage(pipesupimage,
                    pipe[0] - pipe_w/2,
                    pipe[1] - pipe_h - this._pipe_hole_size/2);
      ctx.drawImage(pipesdnimage,
                    pipe[0] - pipe_w/2,
                    pipe[1] + this._pipe_hole_size/2);
    }
    if (this._game_over_flag) {
      ctx.font = "32px Arial";
      ctx.fillStyle = "#CC88CC";
      ctx.fillText("GAME OVER", 50, 160);
    }
    ctx.font = "18px Arial";
    ctx.fillStyle = "#000000";
    ctx.fillText("Score: " + Math.floor(this._score), 20, 20);
    var rot = Math.PI * (this._flutter_angle - 90) / 180.0;
    var x = this._flutter_x;
    var y = this._flutter_y;
    ctx.translate(x, y);
    ctx.rotate(rot);
    ctx.drawImage(flutter, -flutter_w/2, -flutter_h/2);
    ctx.rotate(-rot);
    ctx.translate(-x, -y);
  }

  _collides_with_pipe(pipe, point) {
    var clearance_x = pipe_w + flutter_w*0.75
    if (pipe[0] + clearance_x/2 >= point[0] &&
        point[0] >= pipe[0] - clearance_x/2) {
      var clearance_y = this._pipe_hole_size - flutter_h*0.75;
      if (pipe[1] + clearance_y/2 >= point[1] &&
          point[1] >= pipe[1] - clearance_y/2) {
        return false;
      }
      else { return true; }
    }
    return false;
  }

  reset() {
    this._init_state();
  }

  step(action) {
    if (this._game_over_flag) {
      this.reset();
    }

    var reward = 0;
    this._next_pipe -= 1;
    this._sprite_count += 1;
    if (this._next_pipe <= 0) {
      this._next_pipe = this._frames_per_pipe;
      this._add_pipe();
    }

    this._flutter_y -= this._flutter_speed * Math.cos(
        Math.PI * this._flutter_angle / 180.0);

    if (action) {
      this._flutter_angle = this._flutter_up_angle;
    }
    else {
      this._flutter_angle = Math.min(
          this._flutter_down_angle,
          this._flutter_angle + this._flutter_angle_delta);
    }

    // If the pipe has left the screen, remove it
    if (this._current_pipes[0][0] < -pipe_w) {
      this._current_pipes = this._current_pipes.slice(
          1, this._current_pipes.length);
    }

    for ( var i = 0; i < this._current_pipes.length; i++ ) {
      var pipe = this._current_pipes[i];
      pipe[0] -= this._pipe_speed;
      if (this._collides_with_pipe(pipe, [this._flutter_x, this._flutter_y])) {
        reward = -1;
      }
    }

    if (this._flutter_y >= this._height || this._flutter_y <= 0) {
      // End episode, show game over.
      reward = -1;
    }

    this._score += 1.0/this._frames_per_pipe;

    if (reward < 0) {
      this._game_over_flag = true;
    }

    // Render frame
    this._draw_frame();

    return reward;
  }
};

window.onload = function() {
  var c = document.getElementById("canvas");
  var ctx = c.getContext("2d");
  var img = document.getElementById("img");
  ctx.drawImage(img, 10, 10, 150, 180);
};

env = new FlutteringAvians();
console.log('Loaded environment');
myStepFn = null

const move = () => {
  action = 1;
}
const step = () => {
  var reward = env.step(action);
  action = 0;
  stepCount++;
  if (reward < 0) {
    clearInterval(myStepFn);
  }
}
const restart = () => {
  action = 0;
  env.reset();
  clearInterval(myStepFn);
  myStepFn = setInterval(step, 50);
  stepCount = 0;
}

restart();

document.getElementById("canvas").addEventListener('click', move);
</script>
'''))

That's a hard game, huh?

Imagine a computer that would be able to learn to play perfectly and get the maximum score!

Well, that's what AI is all about. In particular, *Machine Learning*, a subfield of AI, studies how to make machines that *learn* directly from their own experience through trial and error, similar to the way *we* learn.

# This agent plays the game, so you don't have to!

An *agent* is anything that observes the current situation in the game (what we call a **state**) and chooses which action to take. In our case, whether to flutter or not.

Our game looks like this:

<center>
<img src="https://storage.googleapis.com/dm-educational/assets/FlutteringAvians/fluttering_avians.png" width="200"/>
</center>

So we can define our state to be made up of:

1. The current **_y_-coordinate** of the character
2. The current **angle** of the character
3. The **distance** to the next obstacle
4. The **height** of the hole in the obstacle

<center>
<img src="https://storage.googleapis.com/dm-educational/assets/FlutteringAvians/features.png" width="200" />
</center>

We take those four numbers, and we plug them into our agent. Out the other side we get the action.

<center>
<img src="https://storage.googleapis.com/dm-educational/assets/FlutteringAvians/agent.png" width="500"/>
</center>

Can you think of a way to combine those four values to choose the right action for any state?

In many ways, this is what programmers do all the time! We write the code inside of those agents so that they do what we need them to. A simple idea would be to check if **our height** is similar to the **height of the hole**. If we are too low, we need to flutter! That function would look like:

```python
def fluttering_agent(y_coord, angle, distance, height):
  if y_coord > height:
    return 1
  else:
    return 0
```
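
To see what this rule does, we can call it on a couple of made-up states. The numbers below are illustrative only, and the function is renamed `simple_agent` (a hypothetical name) so that it does not overwrite your own `fluttering_agent`. Note that the _y_ coordinate is measured from the top of the screen, so a *larger* `y_coord` means the character is *lower*.

```python
def simple_agent(y_coord, angle, distance, height):
  # y grows downwards, so y_coord > height means we are below the hole.
  if y_coord > height:
    return 1
  else:
    return 0

# Character at y=300 is below a hole centred at y=250: flutter.
print(simple_agent(y_coord=300, angle=90, distance=100, height=250))  # 1
# Character at y=200 is above the same hole: do nothing and fall.
print(simple_agent(y_coord=200, angle=90, distance=100, height=250))  # 0
```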

The arguments (inputs) to the function implementing our agent are:
*   **`y_coord`**: The **_y_** coordinate of the character. `0` is at the top of the screen, the maximum value is `512` at the bottom of the screen.
*   **`angle`**: The angle of the character. These are measured clockwise in degrees starting at the top for `0`, right for `90`, down for `180`. Fluttering sets this angle to `30`. The maximum angle, when nose-diving, is `175`.
*   **`distance`**: The distance to the center of the next obstacle. This is always a positive number. The width of the screen is `288`. Our character is `72` pixels from the left edge.
*   **`height`**: The height in pixels of the center of the hole in the obstacle. Obstacles have an opening of `102` pixels. This is in the same units as **`y_coord`**.
  
The function should return either `0` for no action, or `1` to flutter.
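
The angle matters because it decides how far the character moves up or down on each frame. The sketch below reproduces the vertical-movement rule from the environment code above, assuming its default settings (a speed of 2% of the 512-pixel screen height per frame). The helper name `vertical_move` is made up for this example.

```python
import math

SCREEN_HEIGHT = 512
FLUTTER_SPEED = 0.02 * SCREEN_HEIGHT  # Pixels per frame (default setting).

def vertical_move(angle):
  """Change in y_coord for one frame; negative means moving up the screen."""
  return -FLUTTER_SPEED * math.cos(math.radians(angle))

print(round(vertical_move(30), 2))   # -8.87: just fluttered, moving up
print(abs(vertical_move(90)) < 0.01) # True: flying level, barely moving
print(round(vertical_move(175), 2))  # 10.2: nose-diving, moving down
```

So a single flutter buys a few frames of upward movement before the angle drifts past `90` and the character starts to fall again.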

-----

Give it a go! Improve the function in any way you want and see how your *agent* fares. :)

## Hand-code the agent!

**NOTE**: Don't forget to run the cell with your function once you've written it (and any time you change it).

In [None]:
def fluttering_agent(y_coord, angle, distance, height):
  # Your code goes below here.
  
  return 0

### One possible solution

Below is a manually written black box function that mostly works :). You can show it if you click on the _`↳ 1 cell hidden`_ message below.

The solution is not perfect, but it shows that this kind of function can be written, at least *in principle*. Don't worry if it doesn't make much sense, or if your solution looks very different to it. There are *many* ways of solving this problem!

In [None]:
def fluttering_agent(y_coord, angle, distance, height):
  if distance >= 140:
    if angle > 120:
      return 1
  if distance < 140:
    if height + 25 < y_coord:
      return 1
  return 0

## Test your manually coded agent!

Let's try out the agent you just coded to see how it performs in the game:

In [None]:
#@title Run your agent!


class ManualAgent(object):
  def __init__(self):
    pass

  def _state_from_obs(self, observation):
    x = observation['Features']['x']
    y = observation['Features']['y']
    for i, p in enumerate(observation['Features']['pipes']):
      if p[0] > x:
        next_p = observation['Features']['pipes'][i]
        break
    for i, p in enumerate(observation['Features']['pipes']):
      if p[0] > x and p[0] < next_p[0]:
        next_p = observation['Features']['pipes'][i]
    angle = observation['Features']['angle']
    # Hand-crafted features: y, angle, distance to next pipe and hole position.
    return [y, angle, next_p[0] - x, next_p[1]]

  def step(self, observation):
    """Evaluate the policy of the agent given an observation.

    Returns:
      1 to flutter, or 0 for no action (the output of `fluttering_agent`).
    """
    state = self._state_from_obs(observation)
    return fluttering_agent(*state)



trace = get_trace(ManualAgent(), env)
fig, ax, image = prepare_animation()

def update(frame):
    image.set_data(np.asarray(trace[frame]['Pixels']))
    return image, 

anim = animation.FuncAnimation(fig, update, frames=range(len(trace)), blit=True,
                               interval=50)

plt.close()
rc('animation', html='jshtml')
anim



# How to train your agent

Don't worry if you are not familiar with programming. The whole point is that we want to use machine learning to make the computer learn on its own!

How would a computer do *that*?

Well, the computer will not write code like we do (although some researchers are working on that problem!). Instead, the computer will create a function with a whole lot of **parameters**, which are like little knobs and switches that can make it create just about any function we can think of. Then, over time, it will adapt those **parameters** based on its experience to make the function better and better at what it does.
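
As a rough illustration of what a parameter is, here is a made-up, one-knob version of the hand-coded rule from earlier. `parametrised_agent` and `margin` are hypothetical names, not part of this Colab's code. Changing the value of `margin` changes the behaviour of the function without rewriting any of its code, which is the kind of adjustment a learning algorithm can make automatically.

```python
def parametrised_agent(y_coord, height, margin):
  """Flutter when the character is more than `margin` pixels below the hole."""
  return 1 if y_coord > height + margin else 0

# Same state, different knob settings, different actions.
state = dict(y_coord=270, height=250)
print(parametrised_agent(**state, margin=0))   # 1: flutters as soon as it is below
print(parametrised_agent(**state, margin=25))  # 0: waits until it is 25 pixels below
```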

A very popular choice nowadays is what we call an **Artificial Neural Network**. It is loosely inspired by our brains, which are made up of a huge number of neurons: each one is not that sophisticated on its own, but together they make up a whole human brain. Our *artificial neurons* are much simpler than real ones, and we will not have billions of them, but just a handful. Regardless, it will be enough to conquer Fluttering Avians!

So, we will now replace our black box with an artificial neural network like the one depicted below. Don't worry too much about the details. Just notice that the neurons are taking the same inputs that our black box function did above, and they output whether we should flutter or not.

<center>
<img src="https://storage.googleapis.com/dm-educational/assets/FlutteringAvians/ann.png" width="500"/>
</center>
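
To make the picture a little more concrete, the sketch below pushes one state through a tiny network with two hidden neurons. The weights are numbers picked by hand purely for illustration (a real agent starts with random ones), and the names `hidden_weights`, `output_weights` and `forward` are made up for this example. Each hidden neuron computes a weighted sum of the inputs and keeps it only if it is positive; the output neuron squashes its own weighted sum into a number between 0 and 1.

```python
import numpy as np

# One row of weights per hidden neuron: [distance, y, angle, hole height, bias].
hidden_weights = np.array([
    [0.0, 0.01, 0.0, -0.01, 0.0],   # Positive when we are below the hole.
    [0.0, -0.01, 0.0, 0.01, 0.0],   # Positive when we are above the hole.
])
# Weights of the output neuron: [hidden 1, hidden 2, bias].
output_weights = np.array([5.0, -5.0, 0.0])

def forward(distance, y, angle, hole_height):
  inputs = np.array([distance, y, angle, hole_height, 1.0])
  hidden = np.maximum(hidden_weights @ inputs, 0.0)  # Keep positive sums only.
  hidden = np.append(hidden, 1.0)                    # Bias input for the output.
  output = hidden @ output_weights
  return 1.0 / (1.0 + np.exp(-output))               # Squash into (0, 1).

print(forward(100, 300, 90, 250) > 0.5)  # True: below the hole, so flutter
print(forward(100, 200, 90, 250) > 0.5)  # False: above the hole, so wait
```

With these hand-picked weights the network behaves like the simple "flutter when too low" rule. Learning amounts to finding good weights automatically instead of choosing them by hand.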


In [None]:
#@title Define an Agent using an Artificial Neural Network

def relu(x):
  return x if x > 0 else 0

def logistic(x, shape=1):
  value = x*shape
  if value > 100:
    return 1
  if value < -100:
    return 0
  return math.exp(value) / (1 + math.exp(value))

class Agent(object):
  def __init__(self,
               obs_spec: dict,
               act_spec: dm_env.specs.DiscreteArray,
               num_hidden: int):
    num_inputs = 4

    # Initialise all parameters uniformly at random between -1 and 1.
    # Parameters of the hidden layer (the extra column is for the bias).
    self._params_hidden = 2 * np.random.random_sample(
        (num_hidden, int(num_inputs) + 1)) - 1

    # Parameters of the output layer (the extra entry is for the bias).
    self._params_output = 2 * np.random.random_sample(num_hidden + 1) - 1

  def _state_from_obs(self, observation):
    x = observation['Features']['x']
    y = observation['Features']['y']
    for i, p in enumerate(observation['Features']['pipes']):
      if p[0] > x:
        next_p = observation['Features']['pipes'][i]
        break
    for i, p in enumerate(observation['Features']['pipes']):
      if p[0] > x and p[0] < next_p[0]:
        next_p = observation['Features']['pipes'][i]
    angle = observation['Features']['angle']
    # Hand crafted feature of distance to next pipe, y, angle and hole position.
    return [next_p[0] - x, y, angle, next_p[1], 1.]

  def step(self, observation):
    """Evaluate the policy of the agent given an observation.

    Returns:
      a number between 0 and 1; values above 0.5 mean the 'up' action is taken.
    """
    state = self._state_from_obs(observation)

    hidden = np.matmul(self._params_hidden, state)
    hidden = np.array([relu(x) for x in hidden])
    hidden = np.concatenate([hidden, [1.]])  # for bias in output layer
    output = np.dot(hidden, self._params_output)
    return logistic(output)

  def mutate(self, sigma: float = 0.05):
    self._params_hidden += np.random.normal(0, sigma, self._params_hidden.shape)
    self._params_output += np.random.normal(0, sigma, self._params_output.shape)

  def copy_from(self, agent: 'Agent'):
    np.copyto(self._params_hidden, agent._params_hidden)
    np.copyto(self._params_output, agent._params_output)




def eval_agent(
    agent: Agent, env: dm_env.Environment, max_fitness: int = 200) -> float:
  """Get the fitness of the given agent policy."""
  fitness = 0
  ts = env.reset()
  while ts.step_type != dm_env.StepType.LAST and fitness < max_fitness:
    fitness += 1
    policy_val = agent.step(ts.observation)
    action = 1 if policy_val > 0.5 else 0
    ts = env.step(action)
  return fitness

def eval_population(
    population: list, env: dm_env.Environment, replicas: int = 10,
    ) -> np.ndarray:
  fitness = np.zeros((len(population),))
  for i, ag in enumerate(population):
    for _ in range(replicas):
      fitness[i] += eval_agent(ag, env)
  return fitness / replicas

def init_population(pop_size, num_hidden, env):
  return [Agent(env.observation_spec(), env.action_spec(), num_hidden)
          for _ in range(pop_size)]

def selection(population, new_pop, env, sigma=0.1, mu=0.4):
  pop_size = len(population)
  fitness = eval_population(population, env)

  best = population[np.argmax(fitness)]

  # Get fitness by order
  order = sorted(enumerate(fitness), key=lambda x: x[1])
  # assign pseudo fitness based on ranking.
  pseudo_f = np.zeros((pop_size,))
  for i, pair in enumerate(order):
    pseudo_f[pair[0]] = i
  choices = np.random.choice(pop_size, size=pop_size,
                             replace=True, p=pseudo_f/sum(pseudo_f))
  for i in range(pop_size):
    new_pop[i].copy_from(population[choices[i]])
    if np.random.random_sample() < mu:
      new_pop[i].mutate(sigma)

  return fitness, best

def run_agent(agent, env, delay=0.05, ignore_result=True):
  ts = env.reset()
  while ts.step_type != dm_env.StepType.LAST:
    policy_val = agent.step(ts.observation)
    action = 1 if policy_val > 0.5 else 0
    ts = env.step(action)
    src = 'data:image/png;base64,{}'.format(encode_observation(ts))
    output.eval_js(
        f'document.getElementById("img").setAttribute("src", "{src}")',
        ignore_result=ignore_result)
    if delay:
      time.sleep(delay)


agent = Agent(env.observation_spec(), env.action_spec(), num_hidden=2)


The cell above creates an *agent* that has an artificial neural network inside, like the one in the image above, and that knows how to interact with our Fluttering Avians environment.

-----

Now, let's see how well it does in the game!

In [None]:
#@title Run the agent!

trace = get_trace(agent, env)
fig, ax, image = prepare_animation()

def update(frame):
    image.set_data(np.asarray(trace[frame]['Pixels']))
    return image, 

anim = animation.FuncAnimation(fig, update, frames=range(len(trace)), blit=True,
                               interval=50)

plt.close()
rc('animation', html='jshtml')
anim


Not that smart, sadly! This is because the agent is not learning yet. This is its very first interaction with the world. It is much less capable than even a newborn baby... in fact, it is acting essentially at random.

# Train Fluttering Avians

Ok, we've created an agent, but we still haven't figured out how it will change all its internal switches and knobs (its parameters). This is where the crux of machine learning lies!

There are many, many (many!) ways to do this, and it is not clear that one of them is always better than the other. This is an active area of research where we still don't know the best solutions (also called **algorithms**). How very exciting, huh?! Maybe *you* will come up with a better solution than the ones we have! Why not?

One of the most famous techniques to solve these kinds of problems is *Reinforcement Learning* (of which DeepMind does quite a bit). However, here we will use something different: we will use a solution from the family of *Evolutionary Algorithms* (or *EAs*).

## Evolutionary Algorithms

Instead of having a single agent that tries to solve the game, we will have a whole bunch of them, a **population**.

<center>
<img src="https://storage.googleapis.com/dm-educational/assets/FlutteringAvians/population.png" width="300" />
</center>

Each of the agents will play the game for a while. We will keep track of which agents get further in the game than the others. That is, we will **evaluate** them.

<center>
<img src="https://storage.googleapis.com/dm-educational/assets/FlutteringAvians/evaluation.png" width="300" />
</center>

Then, we will remove the worst performing agents to make space for new agents to try to solve the game. The new agents will be descendants of the best performing agents of the previous generation. That is, agents will **reproduce** over time.

<center>
<img src="https://storage.googleapis.com/dm-educational/assets/FlutteringAvians/reproduction.png" width="300" />
</center>
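
One possible way to write this step down is sketched below. It is only an illustration under a few assumptions, and not necessarily how this Colab does it: agents are NumPy arrays of parameters, the worse half of the population is removed, and parents are picked with a chance that depends on their rank. The function name `select_and_reproduce` is made up for this example. Note that the children here are still exact copies of their parents.

```python
import numpy as np

def select_and_reproduce(population, scores, rng):
  """Keeps the better half and refills the population with copies of them."""
  order = np.argsort(scores)[::-1]  # Agent indices, from best to worst.
  n_survivors = max(1, len(population) // 2)
  survivors = [population[i] for i in order[:n_survivors]]

  # Rank-based weights: the best survivor gets the largest weight, so it is
  # the most likely to be picked as a parent.
  weights = np.arange(n_survivors, 0, -1, dtype=float)
  weights /= weights.sum()

  n_children = len(population) - n_survivors
  parent_ids = rng.choice(n_survivors, size=n_children, p=weights)
  children = [survivors[i].copy() for i in parent_ids]
  return survivors + children

rng = np.random.default_rng(0)
# Six toy agents; agent i has both of its parameters set to i.
population = [np.full(2, float(i)) for i in range(6)]
scores = np.array([1.0, 5.0, 3.0, 0.0, 4.0, 2.0])
new_population = select_and_reproduce(population, scores, rng)
print([agent[0] for agent in new_population])
```

With these scores, agents 1, 4 and 2 survive, and every child is a copy of one of them. The population keeps the same size from one generation to the next.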

The best agents reproduce more than worse ones. When an agent reproduces, its *offspring* are not exact copies of their parent (what would be the point of that!?). Instead, we will wiggle their internal parameters a little bit in the hope that they discover something new. That is, they will **mutate**.

<center>
<img src="https://storage.googleapis.com/dm-educational/assets/FlutteringAvians/mutation.png" width="300" />
</center>
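
A mutation can be sketched in a few lines. The version below is an assumption made for illustration, not necessarily what this Colab does internally: with some probability (`mutation_rate`) the child gets a small random nudge added to every parameter, and the size of the nudge is controlled by `mutation_intensity`. Using bell-shaped (Gaussian) noise for the nudge is a common choice, but it is our choice here.

```python
import numpy as np

def mutate(params, mutation_rate, mutation_intensity, rng):
  """Returns a (possibly) wiggled copy of an agent's parameters."""
  child = params.copy()  # Never change the parent itself.
  if rng.random() < mutation_rate:
    # Add a small random nudge to every parameter.
    child += rng.normal(scale=mutation_intensity, size=child.shape)
  return child

rng = np.random.default_rng(0)
parent = np.zeros(4)
child = mutate(parent, mutation_rate=1.0, mutation_intensity=0.1, rng=rng)
clone = mutate(parent, mutation_rate=0.0, mutation_intensity=0.1, rng=rng)
print('Parent:', parent)
print('Child: ', child)
print('Clone: ', clone)
```

With `mutation_rate=1.0` the child always differs a little from its parent, and with `mutation_rate=0.0` it is always an exact copy.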

-----

And that's it! We just repeat this process over and over until we get a good solution! These algorithms are called *Evolutionary* because they roughly mimic the process of evolution through natural selection.
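
As a rough illustration of the whole loop, here is a tiny, self-contained sketch on a made-up problem. It is not the algorithm used in this Colab. There is no game here: an agent is just three numbers, and its score is higher the closer those numbers are to a hidden `target`. To keep it short, it also simplifies things: the better half always survives, and each survivor has exactly one child, which always mutates.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([0.5, -0.2, 0.8])  # Made-up "perfect" parameters.

def score(agent):
  return -np.sum((agent - target) ** 2)  # Higher (closer to 0) is better.

population = rng.normal(size=(20, 3))  # 20 agents, 3 parameters each.
best_per_generation = []

for generation in range(30):
  # Evaluate: one score per agent.
  scores = np.array([score(agent) for agent in population])
  best_per_generation.append(scores.max())
  # Select: keep the 10 best agents.
  parents = population[np.argsort(scores)[::-1][:10]]
  # Reproduce and mutate: each parent gets one slightly wiggled child.
  children = parents + rng.normal(scale=0.1, size=parents.shape)
  population = np.concatenate([parents, children])

print('Best score in the first generation:', best_per_generation[0])
print('Best score in the last generation: ', best_per_generation[-1])
```

Because the best agents are always kept unchanged, the best score in this sketch can never get worse from one generation to the next.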

## Evolution on a population to train an agent

Now we are ready to train some agents!

You can choose below how many agents there will be in your population, and how many neurons each of them will have. Run the cell to create the population.


In [None]:
#@title Initialise the population

#@markdown This is the number of agents in the population. The larger it is, the
#@markdown more diversity of behaviours we will see, but the slower it will run.
population_size = 50 #@param {type:"slider", min:10, max:100, step:10}

#@markdown This is the number of internal (or *hidden*) neurons in each agent's
#@markdown artificial brain.
number_of_hidden_neurons = 3 #@param {type:"slider", min:1, max:10, step:1}

train_env = FlutteringAvians(render_pixels=False)  
pop = init_population(population_size, number_of_hidden_neurons, train_env)
new_pop = init_population(population_size, number_of_hidden_neurons, train_env)

OK, now that we have a population, we can do a few generations of the algorithm. We'll keep track of the best agent so far so we can see its progress.

You can re-run the cells below to train and see your best agent's performance. If you want to start over with a new population, just run the **Initialise the population** cell above.

----

That's it! Try to get an agent that gets as close to 200 as you can. **Good luck!**

In [None]:
#@title Evaluate, select, reproduce and mutate!

from google.colab import widgets
grid = widgets.Grid(1, 1)

#@markdown This is the number of generations that we will run our evolutionary
#@markdown algorithm. The larger the number, the better (usually) our agents
#@markdown will get. More generations will take longer to run. (Remember you can
#@markdown re-run this cell to keep training anyway).
generations = 30 #@param {type:"slider", min:1, max:50, step:1}

#@markdown This is the probability that any new agent (i.e. a product of
#@markdown reproduction) will change its parameters.
mutation_rate = 0.6 #@param {type:"slider", min:0, max:1, step:0.1}

#@markdown This is the amount of wiggle that a mutation will cause in an agent's
#@markdown parameters. Larger values mean more exploration of new behaviours,
#@markdown but values that are too large might break good behaviours that are
#@markdown already there.
mutation_intensity = 0.08 #@param {type:"slider", min:0.01, max:0.5, step:0.01}

#@markdown Whether to plot the performance of the best agent in the population.
plot_best = True #@param {type:"boolean"}

def plot_fitnesses(i, all_fitness):
  axes = plt.subplot()
  axes.set_title('Agent performance')
  axes.set_xlabel('Generation')
  axes.set_ylabel('Score')
  if plot_best:
    axes.plot(range(i+1), [np.max(f) for f in all_fitness], label='Best',
              linestyle='--')
  axes.plot(range(i+1), [np.mean(f) for f in all_fitness], label='Mean')
  axes.fill_between(range(i+1),
                    [np.percentile(f, 5) for f in all_fitness],
                    [np.percentile(f, 95) for f in all_fitness], alpha=0.4)
  axes.legend()
  return axes

# List of all fitness values over time.
fitness_all = []

with grid.output_to(0, 0):
  grid.clear_cell()
  plot_fitnesses(-1, fitness_all)

for i in range(generations):
  fitness, best = selection(
      pop, new_pop, train_env, mutation_intensity, mutation_rate)
  fitness_all.append(fitness)
  with grid.output_to(0, 0):
    grid.clear_cell()
    plot_fitnesses(i, fitness_all)

  # print('fitness', list(reversed(sorted(fitness))))
  # Swap populations
  new_pop, pop = pop, new_pop


In [None]:
#@title Run the best agent in the population!

trace = get_trace(best, env)
fig, ax, image = prepare_animation()

def update(frame):
    image.set_data(np.asarray(trace[frame]['Pixels']))
    return image, 

anim = animation.FuncAnimation(fig, update, frames=range(len(trace)), blit=True,
                               interval=50)

plt.close()
rc('animation', html='jshtml')
anim


# THE END