<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 1200px">
</div>

# OpenAI gym Lab

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) In this lesson you learn:<br>
 - How to use OpenAI gym
  
## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Library Requirements

<img alt="Caution" title="Caution" style="vertical-align: text-bottom; position: relative; height:1.3em; top:0.0em" src="https://files.training.databricks.com/static/images/icon-warning.svg"/> Additional libraries must be attached to your cluster for this lesson to work.

We will use the PyPI library **`gym==0.15.4`**.
* This library is used to develop and compare reinforcement learning algorithms

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) References
* [Gym documentation](https://gym.openai.com/docs/)
* Sutton and Barto, *Reinforcement Learning: An Introduction*, Chapter 2

### Exercise 1
0. Familiarize yourself with OpenAI gym by reading the [Gym documentation](https://gym.openai.com/docs/).
0. Familiarize yourself with the `Env` class and how to extend it, using [`discrete.py`](https://github.com/openai/gym/blob/master/gym/envs/toy_text/discrete.py) as an example.

### Exercise 2
Consider the following environment:

1. There is a 5 by 5 grid.
2. There are 4 actions: UP, DOWN, LEFT and RIGHT. Each action results in a move. The reward for each action is 0 unless the action would take you off the grid, which can only happen from an edge cell. In that case you stay where you are and the reward is -1. If you are at A or B, ANY action results in the move shown below and you receive a reward of +10 or +5, respectively.
3. Create a `GridWorldAdvancedEnvironment` class to represent the environment for this problem.
<br>
![gridenv](https://files.training.databricks.com/images/rl/actions_ab.png)
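
The answer code below refers to each cell by a single integer. The following is a minimal sketch of that numbering, assuming cells are counted row by row from 0 at the top-left to 24 at the bottom-right, which appears to be what the answer uses (state 1 is A, state 3 is B, and they lead to states 21 and 13). The helper names `to_row_col`, `to_state` and `on_edge` are hypothetical and are not part of gym.

```python
# Assumption: row-major numbering, state 0 at the top-left, state 24 at the bottom-right.
N_ROWS, N_COLS = 5, 5


def to_row_col(state):
    """Convert a state index (0..24) into a (row, col) pair."""
    return divmod(state, N_COLS)


def to_state(row, col):
    """Convert a (row, col) pair back into a state index."""
    return row * N_COLS + col


def on_edge(state):
    """For each action, report whether taking it from `state` would leave the grid."""
    row, col = to_row_col(state)
    return {
        "UP": row == 0,
        "DOWN": row == N_ROWS - 1,
        "LEFT": col == 0,
        "RIGHT": col == N_COLS - 1,
    }


print(to_row_col(21))  # (4, 1): bottom row, second column
print(to_state(2, 3))  # 13: third row, fourth column
print(on_edge(0))      # the top-left corner blocks UP and LEFT
```

Working with `(row, col)` pairs makes the edge checks easier to read than hand-written `range(...)` expressions, at the cost of a conversion on every step.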

In [5]:
# ANSWER
# Required libraries
import gym
import numpy as np
from gym import spaces
np.random.seed(1234)


class GridWorldAdvancedEnvironment(gym.Env):
  """A 5x5 grid world in which the special states A and B move the agent and pay a bonus reward"""
  
  def __init__(self):
    
    # Define the immediate reward, number of actions and number of states
    
    self.UP = 0
    self.DOWN = 1
    self.RIGHT = 2
    self.LEFT = 3
    
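    # Number of actions
    self.na = 4
    # Number of states
    self.ns = 25
    # Expose the same sizes through the standard Gym space attributes
    self.action_space = spaces.Discrete(self.na)
    self.observation_space = spaces.Discrete(self.ns)
    # Initial state
    self.state = 0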
    

  # Define the reset() method. This method resets the environment to the given state (state 0 by default).
  def reset(self, state=0):
    return self._reset(state)
  
  # Define step() method. This method describes how the environment responds. 
  def step(self, action):
    return self._step(action)
  
  
  
  def _step(self, action):
  
    
    # Going up
    if action == self.UP:
      
      if self.state in range(5) and self.state != 1 and self.state != 3:
        self.state = self.state
        self.reward = -1
      
      elif self.state == 1:
        self.reward = 10
        self.state = 21
      
      elif self.state == 3:
        self.reward = 5
        self.state = 13
      
      else:
        self.reward = 0
        self.state -= 5
        
    # Going down    
    if action == self.DOWN:
      
      if self.state in range(20, 25) and self.state != 1 and self.state != 3:
        self.state = self.state
        self.reward = -1
        
      elif self.state == 1:
        self.reward = 10
        self.state = 21
      
      elif self.state == 3:
        self.reward = 5
        self.state = 13
      
      else:
        self.reward = 0
        self.state += 5
        
        
    # Going right   
    if action == self.RIGHT:
      
      if self.state in range(4, 25, 5) and self.state != 1 and self.state != 3:
        self.state = self.state
        self.reward = -1
        
      elif self.state == 1:
        self.reward = 10
        self.state = 21
      
      elif self.state == 3:
        self.reward = 5
        self.state = 13
      
      else:
        self.reward = 0
        self.state += 1
        
        
    # Going left    
    if action == self.LEFT:
      
      
      if self.state in range(0, 21, 5) and self.state != 1 and self.state != 3:
        self.state = self.state
        self.reward = -1
        
      elif self.state == 1:
        self.reward = 10
        self.state = 21
      
      elif self.state == 3:
        self.reward = 5
        self.state = 13
      else:
        self.reward = 0
        self.state -= 1
  
    
    self.is_done = False
  
    
    return [self.state, self.reward, self.is_done, {}]
  
  # This method resets the episode and returns the new initial state
  def _reset(self, state):
    # Initial state
    self.state = state
    return self.state
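
The four branches of `_step` repeat the same A/B and edge checks. As a point of comparison, here is a gym-free sketch of the same transition rules written as one table-driven function, followed by a short random rollout. It assumes the row-major state numbering and the action encoding used in the class above; `grid_step`, `MOVES` and `SPECIAL` are hypothetical names introduced only for this illustration.

```python
import random

N = 5                                # grid side length
UP, DOWN, RIGHT, LEFT = 0, 1, 2, 3   # same action encoding as the class above
MOVES = {UP: (-1, 0), DOWN: (1, 0), RIGHT: (0, 1), LEFT: (0, -1)}
SPECIAL = {1: (21, 10), 3: (13, 5)}  # A and B: state -> (next_state, reward)


def grid_step(state, action):
    """Return (next_state, reward) for one move under the lab's rules."""
    # At A or B, any action triggers the special move and its bonus reward.
    if state in SPECIAL:
        return SPECIAL[state]
    row, col = divmod(state, N)
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    # Trying to leave the grid keeps the agent in place and costs -1.
    if not (0 <= new_row < N and 0 <= new_col < N):
        return state, -1
    return new_row * N + new_col, 0


# Roll out ten steps of a uniformly random policy from state 0.
rng = random.Random(1234)
state, total_reward = 0, 0
for t in range(10):
    action = rng.choice([UP, DOWN, RIGHT, LEFT])
    state, reward = grid_step(state, action)
    total_reward += reward
print(f"final state: {state}, total reward: {total_reward}")
```

Keeping the transition as a pure function makes it easy to unit-test each rule in isolation; a `gym.Env` subclass could then call it from `step()` and store the returned state.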

### Exercise 3
A Gridworld example is shown below for your reference. Now that you have created this class, complete the following steps:<br><br>

0. Create an object of this class.
0. Explore the properties of the object:
 - How many actions are there?
 - How many states are there?
 - What are the possible actions for states 0 and 5?
0. For a random policy, what is the probability of moving up given you are at state 8? How about for state 10? 
0. What is the probability of moving up at \\(t\_{10}\\) if you are at state 9 given you were at state 6 at \\(t_0\\)?

![gridenv](https://files.training.databricks.com/images/rl/actions_ab.png)

In [7]:
# ANSWER
# Answers to q1 and q2
my_obj = GridWorldAdvancedEnvironment()
print(f" There are {my_obj.na} actions.")
print(f" There are {my_obj.ns} states.")
print(f" Possible actions for state 0 are UP, DOWN, LEFT, RIGHT")
print(f" Possible actions for state 5 are UP, DOWN, LEFT, RIGHT")


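# Answers to q3 and q4
# A uniformly random policy picks each of the 4 actions with probability 1/4,
# regardless of the current state or of how the agent got there.
# 3. 0.25, 0.25
# 4. 0.25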
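
The 0.25 answers can also be checked empirically. The sketch below assumes that "random policy" means a uniform choice over the four actions, and estimates the probability of UP by sampling. `random_policy` is a hypothetical helper, and the printed estimates are approximate.

```python
import random
from collections import Counter

ACTIONS = ["UP", "DOWN", "RIGHT", "LEFT"]


def random_policy(state, rng):
    """Uniform random policy: the state argument is deliberately ignored."""
    return rng.choice(ACTIONS)


rng = random.Random(0)
n_samples = 100_000
for state in (8, 10, 9):
    counts = Counter(random_policy(state, rng) for _ in range(n_samples))
    freq_up = counts["UP"] / n_samples
    print(f"state {state}: estimated P(UP) = {freq_up:.3f}")
```

Because the policy never reads the state or the history, the estimate stays near 0.25 for every state, including state 9 at the tenth step after starting from state 6.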


&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>