## <font color='darkblue'>Preface</font>
Here we are going to use a toy testing environment `GridWorld` to demonstrate the usage of this lab.

### <font color='darkgreen'>Importing Packages</font>
Firstly, let's import all the necessary packages:

In [1]:
from skyline import lab
from skyline.lab import gridworld_env
from skyline.lab import gridworld_utils

### <font color='darkgreen'>Make Lab Environment</font>
We use the function <font color='blue'>make</font> to obtain the testing environment, for example:

In [2]:
grid_env = lab.make(lab.Env.GridWorld)

In [3]:
# Show available actions
grid_env.available_actions()

['U', 'D', 'L', 'R']

In [4]:
# Get current state
grid_env.current_state

GridState(i=2, j=0)

Let's take an action and check how the state changes:

In [5]:
# Take action 'Up'
grid_env.step('U')

# Check current state
grid_env.current_state

GridState(i=1, j=0)

After taking action `U`, we expect the row index `i` to decrease from 2 to 1 (one cell up), and the output state confirms it. Let's reset the environment by calling the method <font color='blue'>reset</font>, which brings the environment back to its initial state `GridState(i=2, j=0)`:

In [6]:
# Reset environment
grid_env.reset()

# Check current state
grid_env.current_state

GridState(i=2, j=0)
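
To make the `step`/`reset` behavior above concrete, here is a minimal, self-contained sketch of a grid environment. It is not the skyline implementation: the names `ToyState` and `ToyGrid`, the 3x4 size, the wall position, and the rule that an illegal move leaves the state unchanged are all assumptions made for illustration.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for illustration; not the skyline classes.
@dataclass(frozen=True)
class ToyState:
    i: int  # row index, 0 is the top row
    j: int  # column index, 0 is the leftmost column


class ToyGrid:
    """A small grid with one wall and a fixed start cell."""

    # Each action maps to a (row delta, column delta) pair.
    MOVES = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}

    def __init__(self, rows=3, cols=4, start=(2, 0), walls=((1, 1),)):
        self.rows, self.cols = rows, cols
        self.start = ToyState(*start)
        self.walls = {ToyState(*w) for w in walls}
        self.current_state = self.start

    def step(self, action):
        di, dj = self.MOVES[action]
        candidate = ToyState(self.current_state.i + di,
                             self.current_state.j + dj)
        inside = 0 <= candidate.i < self.rows and 0 <= candidate.j < self.cols
        # Assumption: moving off the grid or into a wall is a no-op.
        if inside and candidate not in self.walls:
            self.current_state = candidate
        return self.current_state

    def reset(self):
        # Bring the environment back to its initial state.
        self.current_state = self.start
        return self.current_state


env = ToyGrid()
env.step('U')
print(env.current_state)  # ToyState(i=1, j=0)
env.reset()
print(env.current_state)  # ToyState(i=2, j=0)
```

Moving up lowers `i` because row 0 is the top row, which is the same convention the `GridState` outputs above follow.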

## <font color='darkblue'>Experiments of RL algorithms</font>
Here we are going to test some well-known RL algorithms and demonstrate the usage of this lab:

### <font color='darkgreen'>Monte Carlo Method</font>
<b><font size='3ptx'>In this method, we simply simulate many trajectories (<font color='darkbrown'>decision processes</font>), and calculate the average returns.</font></b> ([wiki page](https://en.wikiversity.org/wiki/Reinforcement_Learning#Monte_Carlo_policy_evaluation))
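
As a rough illustration of "simulate trajectories and average the returns", the sketch below does first-visit Monte Carlo evaluation on hand-written episodes. The function names, the `(state, reward)` episode format, and the discount factor are assumptions for this example and are not taken from `monte_carlo.py`.

```python
from collections import defaultdict


def discounted_returns(rewards, gamma):
    """Return G_t for every step t, using G_t = r_t + gamma * G_{t+1}."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def mc_evaluate(episodes, gamma=0.9):
    """First-visit Monte Carlo: average the return following the first
    visit to each state.

    Each episode is a list of (state, reward) pairs, where the reward is
    the one received after acting in that state.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        states = [s for s, _ in episode]
        returns = discounted_returns([r for _, r in episode], gamma)
        seen = set()
        for state, g in zip(states, returns):
            if state in seen:
                continue  # only the first visit in an episode counts
            seen.add(state)
            totals[state] += g
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}


# Two tiny hand-made episodes over states 'a' and 'b'.
episodes = [
    [('a', 0), ('b', 1)],
    [('a', 0), ('b', 0), ('b', 1)],
]
values = mc_evaluate(episodes, gamma=0.5)
print(values)  # {'a': 0.375, 'b': 0.75}
```

With more episodes the averages settle toward the expected return of each state under the policy that generated the trajectories.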

We implement this algorithm in `monte_carlo.py`. The code below demonstrates its usage:

In [7]:
from skyline.lab.alg import monte_carlo

In [8]:
mc_alg = monte_carlo.MonteCarlo()

In [9]:
grid_env.info()

- environment is a grid world
- x means you can't go there
- s means start position
- number means reward at that state
.  .  .  1
.  x  . -1
s  .  .  .



In [10]:
grid_env.random_action(gridworld_env.GridState(1, 0))

'U'

In [11]:
# Training
mc_alg.fit(grid_env)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:03<00:00, 2812.32it/s]


In [12]:
gridworld_utils.print_values(mc_alg._state_2_value, grid_env)

---------------------------
 0.81| 0.90| 1.00| 0.00|
---------------------------
 0.73| 0.00| 0.90| 0.00|
---------------------------
 0.66| 0.73| 0.81| 0.72|


In [13]:
gridworld_utils.print_policy(mc_alg._policy, grid_env)

---------------------------
  R  |  R  |  R  |     |
---------------------------
  U  |     |  U  |     |
---------------------------
  U  |  R  |  U  |  L  |
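
One common way to read a policy off a table of state values is a one-step lookahead: in each state, pick the action whose next cell has the best `reward + gamma * value`. The sketch below applies that to the rounded values printed above. It is only an illustration: the discount factor 0.9, the tie-breaking order, and the assumption that reward is paid on entering a terminal cell are guesses, and `monte_carlo.py` may derive its policy differently.

```python
GAMMA = 0.9  # assumed discount factor; not stated in the notebook
MOVES = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}

# Assumption: the reward is received on entering a terminal cell.
TERMINAL_REWARDS = {(0, 3): 1.0, (1, 3): -1.0}

# Values copied from the printed table (rounded to two decimals).
# The wall at (1, 1) is left out, so it is treated like an off-grid cell.
VALUES = {
    (0, 0): 0.81, (0, 1): 0.90, (0, 2): 1.00, (0, 3): 0.00,
    (1, 0): 0.73,               (1, 2): 0.90, (1, 3): 0.00,
    (2, 0): 0.66, (2, 1): 0.73, (2, 2): 0.81, (2, 3): 0.72,
}


def greedy_action(state):
    """Pick the action with the highest one-step lookahead score."""
    best_action, best_score = None, float('-inf')
    for action, (di, dj) in MOVES.items():
        nxt = (state[0] + di, state[1] + dj)
        if nxt not in VALUES:
            continue  # off the grid or into the wall
        score = TERMINAL_REWARDS.get(nxt, 0.0) + GAMMA * VALUES[nxt]
        # Strict comparison: ties keep the earlier action in MOVES order.
        if score > best_score:
            best_action, best_score = action, score
    return best_action


policy = {s: greedy_action(s) for s in VALUES if s not in TERMINAL_REWARDS}
for row in range(3):
    print([policy.get((row, col), ' ') for col in range(4)])
```

Under these assumptions the lookahead reproduces the arrows in the printed policy. Note that the start cell `(2, 0)` is a tie between `U` and `R` at this rounding, so the result there depends on the tie-breaking order.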


In [14]:
grid_env.reset()

In [15]:
# Play the game until it is done
while not grid_env.is_done:
    result = mc_alg.play(grid_env)
    print(result)

ActionResult(action='U', state=GridState(i=1, j=0), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='U', state=GridState(i=0, j=0), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='R', state=GridState(i=0, j=1), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='R', state=GridState(i=0, j=2), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='R', state=GridState(i=0, j=3), reward=1, is_done=True, is_truncated=False, info=None)
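
The trajectory above collects a reward of 1 only on its final step, so the discounted return from each visited state is just a power of the discount factor. The sketch below works this out, assuming a discount factor of 0.9; the notebook does not state the value, and `path_values` is a hypothetical helper written for this check.

```python
def path_values(n_steps, gamma):
    """Return from each state on a path whose only reward (+1) arrives
    on the last of n_steps transitions: gamma ** (steps remaining - 1)."""
    return [gamma ** (n_steps - 1 - t) for t in range(n_steps)]


# States the agent acted in, in the order they were visited.
visited = [(2, 0), (1, 0), (0, 0), (0, 1), (0, 2)]

for state, value in zip(visited, path_values(len(visited), gamma=0.9)):
    print(state, f"{value:.2f}")
# (2, 0) 0.66
# (1, 0) 0.73
# (0, 0) 0.81
# (0, 1) 0.90
# (0, 2) 1.00
```

These numbers agree with the entries for the same cells in the value table printed earlier, which is consistent with (though not proof of) a discount factor of 0.9 in the lab's Monte Carlo implementation.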


## <font color='darkblue'>Supplement</font>
* [Udemy - Artificial Intelligence: Reinforcement Learning in Python](https://www.udemy.com/course/artificial-intelligence-reinforcement-learning-in-python/)