**Practice session**

Build your own RL agent.

In [1]:
import gym
#import gym.envs.box2d.lunar_lander as ll
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
np.set_printoptions(precision=3)

This practical session uses three environments.

- [Blackjack](https://gym.openai.com/envs/Blackjack-v0/). The classic casino game.
- [Inverted Pendulum](https://gym.openai.com/envs/Pendulum-v0/). The inverted pendulum swingup problem is a classic problem in the control literature. In this version of the problem, the pendulum starts in a random position, and the goal is to swing it up so it stays upright.
- [Lunar Lander](https://gym.openai.com/envs/LunarLander-v2/). Land a lunar module on the moon (just like the Chinese recently did).

<div class="alert alert-warning"><b>Exercise:</b><br>
    Use the previous classes and the "getting started" below to implement an agent that learns an optimal control policy for one or several environments. Start with Blackjack as a toy problem, then move on to one of the other two.
</div>

# Blackjack

In [2]:
blackjack = gym.make('Blackjack-v0')


In [3]:
help(blackjack)

Help on BlackjackEnv in module gym.envs.toy_text.blackjack object:

class BlackjackEnv(gym.core.Env)
 |  Simple blackjack environment
 |  
 |  Blackjack is a card game where the goal is to obtain cards that sum to as
 |  near as possible to 21 without going over.  They're playing against a fixed
 |  dealer.
 |  Face cards (Jack, Queen, King) have point value 10.
 |  Aces can either count as 11 or 1, and it's called 'usable' at 11.
 |  This game is placed with an infinite deck (or with replacement).
 |  The game starts with each (player and dealer) having one face up and one
 |  face down card.
 |  
 |  The player can request additional cards (hit=1) until they decide to stop
 |  (stick=0) or exceed 21 (bust).
 |  
 |  After the player sticks, the dealer reveals their facedown card, and draws
 |  until their sum is 17 or greater.  If the dealer goes bust the player wins.
 |  
 |  If neither player nor dealer busts, the outcome (win, lose, draw) is
 |  decided by whose sum is closer to 21.

In [None]:
# state = [player's current sum, dealer's one showing card, player has a usable ace]
x = blackjack.reset()
print(x)
# actions = 0: stick (stop their turn), 1: hit (ask for an additional card)
print("action space: ", blackjack.action_space)
# rewards: -1 for losing, 0 for a draw, +1 for winning
y,r,d,_ = blackjack.step(0)
print(y)
print(r)
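
As a possible starting point for the exercise, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. It assumes the 4-tuple `step` API used in this notebook and hashable states (Blackjack states are tuples). The names `QLearningAgent`, `train` and `CoinFlipEnv` are illustrative and not part of gym; `CoinFlipEnv` is a hypothetical one-step environment included only so the block runs on its own. The hyperparameter values are arbitrary defaults, not tuned settings.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""

    def __init__(self, n_actions, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        # Q[state] is a list of action values; unseen states start at zero.
        self.Q = defaultdict(lambda: [0.0] * n_actions)

    def greedy(self, state):
        values = self.Q[state]
        return max(range(self.n_actions), key=lambda a: values[a])

    def act(self, state):
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return self.greedy(state)

    def update(self, s, a, r, s_next, done):
        # No bootstrapping from the next state once the episode has ended.
        target = r if done else r + self.gamma * max(self.Q[s_next])
        self.Q[s][a] += self.alpha * (target - self.Q[s][a])


def train(env, agent, n_episodes):
    """Run n_episodes with the reset()/step() API used in this notebook."""
    for _ in range(n_episodes):
        s, done = env.reset(), False
        while not done:
            a = agent.act(s)
            s_next, r, done, _ = env.step(a)
            agent.update(s, a, r, s_next, done)
            s = s_next
    return agent


class CoinFlipEnv:
    """Hypothetical one-step stand-in env: action 1 pays +1, action 0 pays -1."""

    def reset(self):
        return "start"

    def step(self, action):
        return "end", (1.0 if action == 1 else -1.0), True, {}


agent = train(CoinFlipEnv(), QLearningAgent(n_actions=2), n_episodes=200)
print(agent.greedy("start"))
```

With gym installed, the same loop could be tried on the real environment with something like `train(blackjack, QLearningAgent(n_actions=2), n_episodes=50000)`; the number of episodes here is a guess and will likely need adjusting.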

# Pendulum

In [None]:
pendulum = gym.make('Pendulum-v0')

In [None]:
# State space, action space
# x = (cos(theta), sin(theta), thetaDot)
x = pendulum.reset()  # random initialization
print(x)
# action = applied torque
a = [0.]
print("action space type:", pendulum.action_space)
print("lower bound on torque:", pendulum.action_space.low)
print("upper bound on torque:", pendulum.action_space.high)

In [None]:
# Transitions, rendering
pendulum.render()
for i in range(100):
    y,r,d,_=pendulum.step(a)
    pendulum.render()

For this session, you will work with discrete action spaces. Even though the simulator takes continuous actions, we only use a small, fixed set of torque values.

In [None]:
# Simplified, discrete action space
actions = [[-2.], [0.], [2.]]
for i in range(100):
    y,r,d,_=pendulum.step(actions[2])
    pendulum.render()
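
One way to let an agent choose among the discrete torques by index is a thin wrapper around the environment. The sketch below is only an illustration: `DiscreteActions` and `EchoEnv` are hypothetical names, not gym classes, and `EchoEnv` is a stand-in that echoes the action back as the observation so the block runs without a simulator.

```python
class DiscreteActions:
    """Expose a continuous-action env through a fixed list of actions."""

    def __init__(self, env, actions):
        self.env = env
        self.actions = actions
        self.n_actions = len(actions)

    def reset(self):
        return self.env.reset()

    def step(self, index):
        # Translate the agent's integer choice into the env's own action format.
        return self.env.step(self.actions[index])


class EchoEnv:
    """Hypothetical stand-in env: the observation is the action it received."""

    def reset(self):
        return [0.0]

    def step(self, action):
        return list(action), 0.0, False, {}


wrapped = DiscreteActions(EchoEnv(), [[-2.0], [0.0], [2.0]])
obs, reward, done, info = wrapped.step(2)  # index 2 selects the torque [2.0]
print(obs)
```

With the real simulator, `DiscreteActions(pendulum, actions)` would play the same role, so that the agent only ever sees action indices 0, 1 and 2.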

# Lunar lander

In [None]:
lunarLander = gym.make('LunarLander-v2')

In [None]:
# State space, action space
# x = [position_x, 
#      position_y, 
#      velocity_x, 
#      velocity_y, 
#      angle, 
#      angular_velocity, 
#      left_leg_touches_ground, 
#      right_leg_touches_ground]
# action 0: do nothing, 
# action 1: fire left orientation engine, 
# action 2: fire main engine,
# action 3: fire right orientation engine
x = lunarLander.reset()
print(x)
print(lunarLander.action_space)

In [None]:
lunarLander.render()
for i in range(100):
    y,r,d,_=lunarLander.step(np.random.randint(4))
    lunarLander.render()
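
The loops above always run for 100 steps and ignore the `done` flag. When training an agent, it is usually more convenient to stop at the end of the episode and record the total reward. A minimal sketch, assuming the same 4-tuple `step` API: `run_episode` and `CountdownEnv` are illustrative names, and `CountdownEnv` is a hypothetical stand-in so the block runs without Box2D.

```python
import random


def run_episode(env, policy, max_steps=1000):
    """Roll out one episode; return (total reward, number of steps taken)."""
    obs = env.reset()
    total, steps = 0.0, 0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        steps += 1
        if done:  # stop as soon as the environment signals termination
            break
    return total, steps


class CountdownEnv:
    """Hypothetical stand-in env: pays +1 per step and ends after 3 steps."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}


def random_policy(obs):
    # Uniform choice among 4 actions, like the random lander above.
    return random.randrange(4)


total, steps = run_episode(CountdownEnv(), random_policy)
print(total, steps)
```

Calling `run_episode(lunarLander, random_policy)` should give a baseline return for a random lander, against which a learned policy can be compared.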

# Closing windows

In [None]:
pendulum.close()
lunarLander.close()

# Your RL agent

Use the above examples, the previous classes and your imagination to experiment with your own RL agent.