#### Copyright 2019 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Reinforcement Learning

Reinforcement Learning (RL) teaches a machine to make complex decisions based on inputs from the environment in which an agent operates. It is the third paradigm of machine learning, after supervised and unsupervised learning, and is sometimes described as the AI problem. The reinforcement part allows an AI agent to experiment in its environment while optimizing its actions based on consequences, observations, and rewards. Deep Reinforcement Learning merges Deep Learning with Reinforcement Learning and is taught in depth in this [UC Berkeley course](https://sites.google.com/corp/view/deep-rl-bootcamp/lectures).


## Overview

### Learning Objectives

* Markov Decision Process
* Planning (Dynamic Programming)
* Policy Gradient Methods
* Deep RL

### Prerequisites

* CNN
* Sequential Models
* Deep Neural Networks

### Estimated Duration

60 minutes

### Grading Criteria

Each exercise is worth 3 points. The rubric for calculating those points is:

| Points | Description |
|--------|-------------|
| 0      | No attempt at exercise |
| 1      | Attempted exercise, but code does not run |
| 2      | Attempted exercise, code runs, but produces incorrect answer |
| 3      | Exercise completed successfully |

There is 1 exercise in this Colab, so there are 3 points available. The grading scale will be 3 points.

## Reinforcement Learning (RL)

Over the past few years, RL has been applied to robotics, flight, autonomous driving, video games, board games, and other simulations. Facebook has open-sourced [AI Habitat](https://ai.facebook.com/blog/open-sourcing-ai-habitat-an-simulation-platform-for-embodied-ai-research/), a simulation platform for training agents in photo-realistic environments.

Reinforcement learning will continue to advance the state of the art, enabling AI agents to reason about more complex tasks in both physical and digital decision making.

### Fields of RL

<img src="https://raw.githubusercontent.com/googleDBS/images/master/RL.png" width="400">

[source: David Silver](http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching_files/intro_RL.pdf)

### Markov Decision Process (MDP)

RL is rooted in the simple agent-state model described by the Markov Decision Process (MDP).

<img src="https://docs.google.com/a/google.com/drawings/d/e/2PACX-1vRkmWkNLKmlxMIh3E_8UtTsXQvucl4NZyTfrrgi14MMU_KWjcX2YZkxRFb7jtQ_9G5yGLde4Q0t3FIo/pub?w=795&amp;h=443"
     width="500">

The assumption here is that the agent is free to observe the state.
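
As a rough illustration of how a fully observable MDP can be written down and solved by planning, the sketch below defines a hypothetical two-state MDP and runs value iteration, a standard dynamic programming method. The state names, actions, probabilities, rewards, and the discount factor `gamma` are all made up for this example and do not come from the figure above.

```python
# Toy MDP (illustrative assumption, not taken from the course material).
# transitions[state][action] -> list of (probability, next_state, reward)
transitions = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(1.0, "warm", -1.0)],
    },
}


def action_value(outcomes, values, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in every state, the action with the highest expected value."""
    return {
        s: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for s, actions in transitions.items()
    }


values = value_iteration(transitions)
policy = greedy_policy(transitions, values)
print(values)
print(policy)
```

Because the agent can observe the state, the resulting policy is simply a lookup from state to action. Planning like this requires the transition probabilities and rewards to be known in advance, which is usually not the case in the environments used later in this Colab.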

### OpenAI `gym` with Space Invaders

In [0]:
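import gym
from gym import wrappers

# Create the Atari Space Invaders environment.
env = gym.make('SpaceInvaders-v0')
# Record the episode as a video under ./gym-results (overwriting old results).
env = wrappers.Monitor(env, "./gym-results", force=True)
env.reset()

# Play up to 1000 steps, choosing a uniformly random action each time.
for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    if done:  # The episode has ended.
        break
env.close()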
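
The cell above discards the reward returned by each step. The sketch below shows one way to accumulate the total reward of an episode using the same `reset`/`step` pattern. To keep it runnable without Atari dependencies, it uses `CoinFlipEnv`, a hypothetical stand-in class written for this example that only mimics the four-value `step` return used above; it is not part of `gym`.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment mimicking the reset/step interface."""

    n_actions = 2

    def __init__(self, max_steps=10, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # Initial observation.

    def step(self, action):
        self.steps += 1
        coin = self.rng.randrange(2)
        reward = 1.0 if action == coin else 0.0  # Reward a correct guess.
        done = self.steps >= self.max_steps
        return coin, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the sum of rewards collected."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


policy_rng = random.Random(1)


def random_policy(observation):
    """Ignore the observation and act uniformly at random."""
    return policy_rng.randrange(CoinFlipEnv.n_actions)


episode_return = run_episode(CoinFlipEnv(max_steps=10), random_policy)
print(episode_return)
```

Passing the policy in as a function makes it easy to swap the random policy for a learned one later without changing the episode loop.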

In [0]:
import io
import base64
from IPython.display import HTML

# Read the video recorded by the Monitor wrapper in the previous cell.
video_path = './gym-results/openaigym.video.%s.video000000.mp4' % env.file_infix
with io.open(video_path, 'rb') as f:
    video = f.read()

# Embed the video in the notebook as a base64-encoded data URI.
encoded = base64.b64encode(video)
HTML(data='''
    <video width="360" height="auto" alt="test" controls>
    <source src="data:video/mp4;base64,{0}" type="video/mp4" /></video>'''
.format(encoded.decode('ascii')))

# Resources

* [Full Stack Deep Learning](https://fullstackdeeplearning.com)
* [UCL Course on RL](http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html)
* [Berkeley Deep RL](https://sites.google.com/corp/view/deep-rl-bootcamp/lectures)
* [OpenAI Spinup](https://spinningup.openai.com/en/latest/index.html)
* [OpenAI gym](http://gym.openai.com/)

# Exercises

## Exercise 1

Use OpenAI `gym` to create an environment and run an agent in it, following the Space Invaders example above.



### Student Solution

In [0]:
# Your answer goes here

### Answer Key

**Solution**

In [0]:
# Put the recommended solution here; if there is more than one "good" solution
# that you think students should know put those solutions in subsequent code
# boxes with "# Solution" in the first line.

**Validation**

In [0]:
# If the solution can be auto-graded, perform the autograding here.