## <font color='darkblue'>Preface</font>
Here we are going to use a toy testing environment `GridWorld` to demonstrate the usage of this lab.

### <font color='darkgreen'>Importing Packages</font>
Firstly, let's import all the necessary packages:

In [1]:
from skyline import lab
from skyline.lab import gridworld_env
from skyline.lab import gridworld_utils

### <font color='darkgreen'>Make Lab Environment</font>
We can list the supported environments as below:

In [2]:
lab.list_env()

===== GridWorld =====
This is a environment to show case of Skyline lab. The environment is a grid world where you can move up, down, right and leftif you don't encounter obstacle. When you obtain the reward (-1, 1, 2), the game is over. You can use env.info() to learn more.




Then we use the function <font color='blue'>make</font> to obtain the desired environment, e.g.:

In [3]:
grid_env = lab.make(lab.Env.GridWorld)

In [4]:
# Check what our environment looks like:
grid_env.info()

- environment is a grid world
- x means you can't go there
- s means start position
- number means reward at that state
.  .  .  1
.  x  . -1
.  .  .  x
s  x  .  2



In [5]:
# Show available actions
grid_env.available_actions()

['U', 'D', 'L', 'R']

In [6]:
# Get current state
grid_env.current_state

GridState(i=3, j=0)

Let's take an action and check the state change:

In [7]:
# Take action 'Up'
grid_env.step('U')

# Check current state
grid_env.current_state

GridState(i=2, j=0)

After taking action `U`, we expect the row index `i` to decrease from 3 to 2 (one cell up), and the output state confirms this. Let's reset the environment by calling the method <font color='blue'>reset</font>, which brings the environment back to its initial state `GridState(i=3, j=0)`:

In [8]:
# Reset environment
grid_env.reset()

# Check current state
grid_env.current_state

GridState(i=3, j=0)
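
The state changes above can be reproduced with a small standalone sketch. It assumes that `i` is the row index counted from the top, that `j` is the column index, and that a move into an obstacle or off the grid leaves the state unchanged. The local `GridState` stand-in, the `ACTION_DELTAS` table and the `move` helper are hypothetical names used for illustration only; they are not part of `skyline.lab`.

```python
from typing import NamedTuple


class GridState(NamedTuple):
    """Local stand-in for the lab's GridState (illustration only)."""
    i: int  # row index, 0 is the top row
    j: int  # column index, 0 is the leftmost column


# Layout copied from the output of grid_env.info().
GRID = [
    ['.', '.', '.', '1'],
    ['.', 'x', '.', '-1'],
    ['.', '.', '.', 'x'],
    ['s', 'x', '.', '2'],
]

# Assumed mapping from action name to (row delta, column delta).
ACTION_DELTAS = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}


def move(state, action):
    """Return the next state; blocked moves keep the state (assumption)."""
    di, dj = ACTION_DELTAS[action]
    i, j = state.i + di, state.j + dj
    inside = 0 <= i < len(GRID) and 0 <= j < len(GRID[0])
    if not inside or GRID[i][j] == 'x':
        return state
    return GridState(i, j)


start = GridState(3, 0)
print(move(start, 'U'))  # GridState(i=2, j=0)
print(move(start, 'R'))  # GridState(i=3, j=0), because (3, 1) is an obstacle
```

Under these assumptions, `U` lowers `i` by one, which matches the transition from `GridState(i=3, j=0)` to `GridState(i=2, j=0)` shown above.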

## <font color='darkblue'>Experiments of RL algorithms</font>
Here we are going to test some well-known RL algorithms and demonstrate the usage of this lab:

### <font color='darkgreen'>Monte Carlo Method</font>
<b><font size='3ptx'>In this method, we simply simulate many trajectories (<font color='darkbrown'>decision processes</font>), and calculate the average returns.</font></b> ([wiki page](https://en.wikiversity.org/wiki/Reinforcement_Learning#Monte_Carlo_policy_evaluation))
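
To make "simulate many trajectories and average the returns" concrete, here is a minimal sketch of first-visit Monte Carlo policy evaluation. It is not the code in `monte_carlo.py`: it only evaluates a fixed random policy and does not learn a policy. The environment is a hypothetical four-cell corridor, and `sample_episode` and `first_visit_mc` are illustrative names.

```python
import random
from collections import defaultdict


def sample_episode(rng, start=1):
    """Random walk on cells 0..3; returns a list of (state, reward) pairs.

    The reward is received when leaving `state`: -1 for stepping into
    cell 0, +1 for stepping into cell 3, and 0 otherwise.
    """
    state, episode = start, []
    while state not in (0, 3):
        next_state = state + rng.choice((-1, 1))
        reward = {0: -1.0, 3: 1.0}.get(next_state, 0.0)
        episode.append((state, reward))
        state = next_state
    return episode


def first_visit_mc(num_episodes, gamma=1.0, seed=0):
    """Estimate state values by averaging first-visit returns."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    visit_count = defaultdict(int)
    for _ in range(num_episodes):
        g = 0.0
        first_visit_return = {}
        # Walk backwards so g is the discounted return from each step on.
        for state, reward in reversed(sample_episode(rng)):
            g = reward + gamma * g
            # Earlier time steps overwrite later ones, so the value that
            # remains belongs to the first visit of the state.
            first_visit_return[state] = g
        for state, g_first in first_visit_return.items():
            returns_sum[state] += g_first
            visit_count[state] += 1
    return {s: returns_sum[s] / visit_count[s] for s in returns_sum}


values = first_visit_mc(num_episodes=20000)
print({s: round(v, 2) for s, v in sorted(values.items())})
```

With `gamma=1.0` the exact values for this corridor are -1/3 for cell 1 and 1/3 for cell 2, so the printed estimates should land close to those numbers.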

We implement this algorithm in `monte_carlo.py`. The code below demonstrates its usage:

In [9]:
from skyline.lab.alg import monte_carlo

In [10]:
mc_alg = monte_carlo.MonteCarlo()

In [11]:
grid_env.info()

- environment is a grid world
- x means you can't go there
- s means start position
- number means reward at that state
.  .  .  1
.  x  . -1
.  .  .  x
s  x  .  2



In [12]:
grid_env.random_action(gridworld_env.GridState(1, 0))

'U'

In [13]:
# Training
mc_alg.fit(grid_env)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:03<00:00, 2538.64it/s]


Let's check what value function we get:

In [14]:
gridworld_utils.print_values(mc_alg._state_2_value, grid_env)

---------------------------
 1.18| 1.31| 1.46| 1.00|
---------------------------
 1.31| 0.00| 1.62|-1.00|
---------------------------
 1.46| 1.62| 1.80| 0.00|
---------------------------
 1.31| 0.00| 2.00| 2.00|
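
The values along the path to the reward of 2 look like that reward discounted once per move. The discount factor of 0.9 used below is inferred from the printed table and is not stated in this notebook, so treat the sketch as a sanity check under that assumption. Here `k` is the number of moves still to be made before the final, rewarding move.

```python
gamma = 0.9   # assumed discount factor, inferred from the printed values
reward = 2.0  # terminal reward in the bottom-right cell

# Value of a state that is k moves away from the final rewarding move.
values = [round(reward * gamma ** k, 2) for k in range(6)]
print(values)  # [2.0, 1.8, 1.62, 1.46, 1.31, 1.18]
```

These numbers match the table entries 2.00, 1.80, 1.62, 1.46, 1.31 and 1.18 when read backwards from the cell next to the reward.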


Then let's print the learned policy:

In [15]:
gridworld_utils.print_policy(mc_alg._policy, grid_env)

---------------------------
  R  |  R  |  D  |  ?  |
---------------------------
  D  |  x  |  D  |  ?  |
---------------------------
  R  |  R  |  D  |  x  |
---------------------------
  U  |  x  |  R  |  ?  |


Finally, let's reset the environment and play the game:

In [16]:
grid_env.reset()

In [17]:
# Play game until done
print(f'Begin state={grid_env.current_state}')
while not grid_env.is_done:
    result = mc_alg.play(grid_env)
    print(result)
print(f'Final state={grid_env.current_state}')

Begin state=GridState(i=3, j=0)
ActionResult(action='U', state=GridState(i=2, j=0), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='R', state=GridState(i=2, j=1), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='R', state=GridState(i=2, j=2), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='D', state=GridState(i=3, j=2), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='R', state=GridState(i=3, j=3), reward=2, is_done=True, is_truncated=False, info=None)
Final state=GridState(i=3, j=3)


In [18]:
mc_alg._state_2_value

{GridState(i=0, j=0): 1.1764648192771077,
 GridState(i=0, j=1): 1.3084620719178073,
 GridState(i=0, j=2): 1.4580000000000002,
 GridState(i=1, j=0): 1.3086440381679392,
 GridState(i=1, j=2): 1.6197544101433314,
 GridState(i=2, j=0): 1.4580000000000002,
 GridState(i=2, j=1): 1.6196147443519608,
 GridState(i=2, j=2): 1.7995799299883288,
 GridState(i=3, j=0): 1.3068093930421927,
 GridState(i=3, j=2): 2.0}

In [19]:
mc_alg._q

{GridState(i=0, j=0): {'U': 0,
  'D': 1.1605357643002028,
  'L': 0,
  'R': 1.1764648192771077},
 GridState(i=0, j=1): {'U': 0,
  'D': 0,
  'L': 1.0560951529411764,
  'R': 1.3084620719178073},
 GridState(i=0, j=2): {'U': 0,
  'D': 1.4580000000000002,
  'L': 1.1764287976539605,
  'R': 1.0},
 GridState(i=1, j=0): {'U': 1.04715191696751,
  'D': 1.3086440381679392,
  'L': 0,
  'R': 0},
 GridState(i=1, j=2): {'U': 1.3066297297297307,
  'D': 1.6197544101433314,
  'L': 0,
  'R': -1.0},
 GridState(i=2, j=0): {'U': 1.161856872000001,
  'D': 1.1668365269461092,
  'L': 0,
  'R': 1.4580000000000002},
 GridState(i=2, j=1): {'U': 0,
  'D': 0,
  'L': 1.3050743237704914,
  'R': 1.6196147443519608},
 GridState(i=2, j=2): {'U': 1.4530344827586203,
  'D': 1.7995799299883288,
  'L': 1.4499778441558437,
  'R': 0.0},
 GridState(i=3, j=0): {'U': 1.3068093930421927, 'D': 0, 'L': 0, 'R': 0},
 GridState(i=3, j=2): {'U': 1.6100205338809028, 'D': 0, 'L': 0, 'R': 2.0}}
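
The policy printed earlier agrees with choosing, in each state, the action with the largest Q-value. Whether `MonteCarlo` builds `_policy` in exactly this way is an assumption. The sketch below applies that greedy rule to three entries copied from the output above, with plain `(i, j)` tuples standing in for `GridState`.

```python
# Three Q-table entries copied from the output above; keys are (i, j).
q = {
    (2, 0): {'U': 1.161856872000001, 'D': 1.1668365269461092,
             'L': 0, 'R': 1.4580000000000002},
    (2, 2): {'U': 1.4530344827586203, 'D': 1.7995799299883288,
             'L': 1.4499778441558437, 'R': 0.0},
    (3, 2): {'U': 1.6100205338809028, 'D': 0, 'L': 0, 'R': 2.0},
}

# Greedy policy: for each state, pick the action with the highest Q-value.
greedy_policy = {
    state: max(action_values, key=action_values.get)
    for state, action_values in q.items()
}
print(greedy_policy)  # {(2, 0): 'R', (2, 2): 'D', (3, 2): 'R'}
```

The result matches the corresponding cells of the policy table: `R` at `(2, 0)`, `D` at `(2, 2)` and `R` at `(3, 2)`.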

## <font color='darkblue'>Supplement</font>
* [Udemy - Artificial Intelligence: Reinforcement Learning in Python](https://www.udemy.com/course/artificial-intelligence-reinforcement-learning-in-python/)