# Introduction to the Environment

Mario, an iconic game that needs no introduction, has often been used to demonstrate the capabilities of reinforcement learning. In this notebook, we walk through how the environment works and give some background on what the Mario gym environment can do.

### Imports
Before anything else, we import `gym_super_mario_bros`, the `JoypadSpace` wrapper, and an action set. The gym environment itself is created at runtime.

In [None]:
from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

The full NES action space has 256 discrete actions, but `gym_super_mario_bros.actions` provides three smaller action sets (`RIGHT_ONLY`, `SIMPLE_MOVEMENT`, and `COMPLEX_MOVEMENT`) for use with the `nes_py.wrappers.JoypadSpace` wrapper. The actions in `RIGHT_ONLY` are listed below; the other sets are listed in the Appendix.

``` Py
"""Static action sets for binary to discrete action space wrappers."""
# actions for the simple run right environment
RIGHT_ONLY = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
]
```
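To see where the figure of 256 comes from, and what an action set does, here is a minimal sketch that turns each button combination into a single controller byte (eight buttons, so 2^8 = 256 possible bytes). The `BUTTON_BITS` table and the `encode_buttons` helper are illustrative names, and the exact bit positions are an assumption modelled on how `nes_py` lays out the controller; check the library source before relying on them.

```python
# Illustrative bit layout for the eight NES controller buttons.
# Assumption: modelled on nes_py's JoypadSpace; the exact bit positions
# should be checked against the library source.
BUTTON_BITS = {
    'right':  0b10000000,
    'left':   0b01000000,
    'down':   0b00100000,
    'up':     0b00010000,
    'start':  0b00001000,
    'select': 0b00000100,
    'B':      0b00000010,
    'A':      0b00000001,
    'NOOP':   0b00000000,
}

RIGHT_ONLY = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
]


def encode_buttons(buttons):
    """Combine a list of button names into a single controller byte."""
    byte = 0
    for name in buttons:
        byte |= BUTTON_BITS[name]
    return byte


# Discrete action index -> controller byte for the RIGHT_ONLY set.
right_only_bytes = [encode_buttons(combo) for combo in RIGHT_ONLY]
for index, (combo, byte) in enumerate(zip(RIGHT_ONLY, right_only_bytes)):
    print(index, combo, format(byte, '08b'))
```

With a wrapper of this kind, the agent only has to choose an index from 0 to 4 instead of one of 256 raw button bytes.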

In the code that follows, `gym_super_mario_bros.make` is an alias for `gym.make`, and the call to `env.render()` can be removed to speed up the code.

The code starts by creating the Super Mario Bros environment. Several environment ids are available, including:
- SuperMarioBros-v0
- SuperMarioBros-v1
- SuperMarioBros-v2
- SuperMarioBros-v3
- SuperMarioBros2-v0
- SuperMarioBros2-v1

To be more specific about the stage we play, we can also target an individual stage.

We can use the template: `SuperMarioBros-<world>-<stage>-v<version>`

Where:
- `<world>` is a number in {1, 2, 3, 4, 5, 6, 7, 8} indicating the world
- `<stage>` is a number in {1, 2, 3, 4} indicating the stage within a world
- `<version>` is a number in {0, 1, 2, 3} specifying the ROM mode to use
    - 0: standard ROM
    - 1: downsampled ROM
    - 2: pixel ROM
    - 3: rectangle ROM
    
For example, we could use: `SuperMarioBros-4-2-v1`
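As a small sketch of the template, the helper below builds an environment id from a world, stage, and version and rejects values outside the ranges listed above. `mario_env_id` is a hypothetical helper written for this notebook, not part of `gym_super_mario_bros`.

```python
def mario_env_id(world, stage, version=0):
    """Build an environment id such as 'SuperMarioBros-4-2-v1'.

    Hypothetical helper (not part of gym_super_mario_bros); the valid
    ranges come from the id template described in this notebook.
    """
    if world not in range(1, 9):
        raise ValueError(f"world must be in 1..8, got {world}")
    if stage not in range(1, 5):
        raise ValueError(f"stage must be in 1..4, got {stage}")
    if version not in range(0, 4):
        raise ValueError(f"version must be in 0..3, got {version}")
    return f"SuperMarioBros-{world}-{stage}-v{version}"


env_id = mario_env_id(4, 2, version=1)
print(env_id)
```

The resulting string can then be passed to `gym_super_mario_bros.make`.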

There is also an option for random stage selection. For that, refer to the appendix.

In [None]:
env = gym_super_mario_bros.make('SuperMarioBros-4-2-v1')

In the next line of code, we wrap the environment in `JoypadSpace` and pass it the set of actions we have at our disposal.

In [None]:
env = JoypadSpace(env, SIMPLE_MOVEMENT)

We also initialize the variable `done` to `True`, so that the environment is reset on the first iteration of the loop.

In [None]:
done = True

### Reward Function
The reward encourages the agent to move as far right as possible, as fast as possible, without dying. Three variables make up the reward:
1. v: the difference in the agent's x position between states
    - in this case, this is the instantaneous velocity for the given step
    - v = x1 - x0
        - x0 is the x position before the step
        - x1 is the x position after the step
    - moving right: v > 0 
    - moving left: v < 0
    - not moving: v = 0
2. c: the difference in the game clock between states
    - the penalty prevents the agent from standing still
    - c = c1 - c0
        - c0 is the clock reading before the step
        - c1 is the clock reading after the step
    - no clock tick: c = 0
    - clock tick: c < 0
3. d: a death penalty that penalizes the agent for dying in a state
    - this penalty encourages the agent to avoid death
    - alive: d = 0
    - dead: d = -15

The reward is clipped to the range (-15, 15), and the formula for the reward is r = v + c + d.
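The three terms can be sketched in a few lines. This is an illustration of the formula, not the environment's actual implementation: `step_reward` is a hypothetical helper, the clipping bounds are assumed to be inclusive, and the clock term is taken as "after minus before" so that a tick of the countdown clock gives a negative value, in line with "clock tick: c < 0".

```python
DEATH_PENALTY = -15
REWARD_MIN, REWARD_MAX = -15, 15  # assumption: bounds treated as inclusive


def step_reward(x0, x1, c0, c1, died):
    """Sketch of r = v + c + d for one step, clipped to the reward range.

    x0, x1: x position before and after the step
    c0, c1: clock reading before and after the step (the clock counts down)
    died:   True if the agent died on this step
    """
    v = x1 - x0                       # positive when moving right
    c = c1 - c0                       # negative when the clock ticks
    d = DEATH_PENALTY if died else 0  # penalty for dying
    return max(REWARD_MIN, min(REWARD_MAX, v + c + d))


print(step_reward(100, 103, 400, 400, died=False))  # moved right, no tick
print(step_reward(100, 100, 400, 399, died=False))  # stood still, clock ticked
print(step_reward(100, 101, 400, 399, died=True))   # died on this step
```

Standing still while the clock ticks gives a negative reward, which is why the clock term discourages idling.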



### `info` dictionary
| Key        | Type   | Description                                           |
|:-----------|:-------|:------------------------------------------------------|
| `coins`    | `int`  | The number of collected coins                         |
| `flag_get` | `bool` | True if Mario reached a flag or ax                    |
| `life`     | `int`  | The number of lives left, i.e., _{3, 2, 1}_           |
| `score`    | `int`  | The cumulative in-game score                          |
| `stage`    | `int`  | The current stage, i.e., _{1, ..., 4}_                |
| `status`   | `str`  | Mario's status, i.e., _{'small', 'tall', 'fireball'}_ |
| `time`     | `int`  | The time left on the clock                            |
| `world`    | `int`  | The current world, i.e., _{1, ..., 8}_                |
| `x_pos`    | `int`  | Mario's _x_ position in the stage (from the left)     |
| `y_pos`    | `int`  | Mario's _y_ position in the stage (from the bottom)   |
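As a quick illustration of how the `info` dictionary might be used, the sketch below formats a one-line progress summary. The values in `sample_info` are made up for the example (real values come back from `env.step`), and `describe_progress` is a hypothetical helper.

```python
# Made-up values for illustration; a real `info` dict is returned by env.step().
sample_info = {
    'coins': 3, 'flag_get': False, 'life': 2, 'score': 1200,
    'stage': 2, 'status': 'small', 'time': 312, 'world': 4,
    'x_pos': 640, 'y_pos': 79,
}


def describe_progress(info):
    """Return a short, human-readable summary of an `info` dictionary."""
    level = f"{info['world']}-{info['stage']}"
    outcome = "cleared" if info['flag_get'] else "in progress"
    return (f"level {level} ({outcome}): x={info['x_pos']}, "
            f"time left={info['time']}, lives={info['life']}, "
            f"status={info['status']}")


summary = describe_progress(sample_info)
print(summary)
```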

In [None]:
for step in range(5000):
    if done:
        state = env.reset()
    state, reward, done, info = env.step(env.action_space.sample())
    env.render()

env.close()
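The same loop can also collect a few statistics, such as the return of each finished episode and the furthest `x_pos` reached. The sketch below assumes the four-value `step` interface used in this notebook. So that it runs without the emulator, it uses `StubEnv` and `StubActionSpace`, two hypothetical stand-ins written for this example; with the real environment, pass the wrapped `env` to `run_random_steps` instead.

```python
import random


class StubActionSpace:
    """Mimics the `sample()` method of a gym discrete action space."""

    def __init__(self, n, seed=0):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        return self._rng.randrange(self.n)


class StubEnv:
    """Hypothetical stand-in with the same reset/step interface as the
    Mario env, so this sketch runs without the emulator."""

    def __init__(self, episode_length=10):
        self.action_space = StubActionSpace(7)  # seven actions, like SIMPLE_MOVEMENT
        self._episode_length = episode_length
        self._t = 0
        self._x = 0

    def reset(self):
        self._t, self._x = 0, 0
        return self._x

    def step(self, action):
        # Actions 1-4 move right and action 6 moves left, loosely following
        # the order of SIMPLE_MOVEMENT; everything else stands still.
        move = 1 if 1 <= action <= 4 else (-1 if action == 6 else 0)
        self._t += 1
        self._x += move
        done = self._t >= self._episode_length
        return self._x, float(move), done, {'x_pos': self._x}


def run_random_steps(env, num_steps):
    """Take random actions; return (returns of finished episodes, best x_pos)."""
    episode_returns = []
    best_x = 0
    total = 0.0
    done = True
    for _ in range(num_steps):
        if done:
            env.reset()
            total = 0.0
        _, reward, done, info = env.step(env.action_space.sample())
        total += reward
        best_x = max(best_x, info['x_pos'])
        if done:
            episode_returns.append(total)
    return episode_returns, best_x


episode_returns, best_x = run_random_steps(StubEnv(episode_length=10), num_steps=35)
print(len(episode_returns), best_x)
```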

Putting the notebook cells together, this time with the full-game `SuperMarioBros-v0` environment:

In [None]:
from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)

done = True
for step in range(5000):
    if done:
        state = env.reset()
    state, reward, done, info = env.step(env.action_space.sample())
    env.render()

env.close()

## Appendix

### Action Spaces Available
These are the action sets available to the environment, taken from [gym_super_mario_bros/actions.py](https://github.com/Kautenja/gym-super-mario-bros/blob/master/gym_super_mario_bros/actions.py).

``` Py
"""Static action sets for binary to discrete action space wrappers."""


# actions for the simple run right environment
RIGHT_ONLY = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
]

# actions for very simple movement
SIMPLE_MOVEMENT = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
    ['A'],
    ['left'],
]


# actions for more complex movement
COMPLEX_MOVEMENT = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
    ['A'],
    ['left'],
    ['left', 'A'],
    ['left', 'B'],
    ['left', 'A', 'B'],
    ['down'],
    ['up'],
]
```

### Random Stages

The random stage selection environment picks a stage at random and gives the agent a single attempt to clear it. When Mario dies, the environment resets and randomly selects a new stage. To use it, we append `RandomStages` to the `SuperMarioBros` id. The `stages` argument restricts the selection to a subset of stages.

``` py
env = gym_super_mario_bros.make('SuperMarioBrosRandomStages-v0', stages=['1-4', '2-4', '3-4', '4-4'])
```

We can also seed the random stage selection, either by calling the env's `seed` method (`env.seed(222)`) before any calls to `reset`, or by passing the seed to `reset` directly (`reset(seed=222)`).
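To illustrate what seeding buys us, here is a conceptual sketch using Python's standard `random` module. It is not how the library selects stages internally; `pick_stages` is a hypothetical helper that only shows that the same seed reproduces the same sequence of stages.

```python
import random

STAGES = ['1-4', '2-4', '3-4', '4-4']


def pick_stages(seed, count, stages=STAGES):
    """Draw `count` stages with a seeded RNG (conceptual illustration only)."""
    rng = random.Random(seed)
    return [rng.choice(stages) for _ in range(count)]


first = pick_stages(222, 5)
second = pick_stages(222, 5)
print(first == second)  # True: the same seed gives the same sequence
```

Reproducible stage sequences make it easier to compare agents, since each one faces the same stages in the same order.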