There is an issue with rendering in Visual Studio's Jupyter integration: force-quitting the game window shuts down the kernel.

In [3]:
import gym
import time
import warnings
from collections import deque
from gym import spaces
import numpy as np
warnings.filterwarnings("ignore", category=DeprecationWarning)

In [2]:
gym.__version__

'0.25.1'

## Wrappers

You can opt not to use `render_mode='human'` when you initialize the `env`.  
This will stop the game window from appearing. 

In [None]:
env = gym.make("BreakoutNoFrameskip-v4")
# env = gym.make("BreakoutNoFrameskip-v4", render_mode='human')

print("Observation Space: ", env.observation_space.shape)
print("Action Space       ", env.action_space)

Our observation space is a `Box` space of shape (210, 160, 3), corresponding to an RGB pixel observation of the same size.  
Our action space contains 4 discrete actions (Left, Right, Do Nothing, Fire).

In [None]:
num_episodes = 5
for i_episode in range(num_episodes):
    # Start every episode from a fresh environment state and a zero score
    obs = env.reset()
    score = 0
    for i in range(1000):
        action = env.action_space.sample()  # random policy
        obs, reward, done, info = env.step(action)
        score += reward
        time.sleep(0.01)
        if done:
            break
    print("Episode:{} Score:{}".format(i_episode, score))
env.close()

Now that we have our environment loaded, let us suppose we have to make certain changes to the Atari Environment. It's a common practice in Deep RL that we construct our observation by concatenating the past `k` frames together. We have to modify the Breakout Environment such that both our `reset` and `step` functions return concatenated observations.

For this we define a class of type `gym.Wrapper` to override the `reset` and `step` functions of the Breakout `Env`. The `Wrapper` class, as the name suggests, is a wrapper on top of an `Env` class that modifies some of its attributes and functions.

The `__init__` function takes the `Env` instance to be wrapped and the number of past frames to be concatenated. Note that we also need to redefine the observation space, since we are now using concatenated frames as our observations. (We modify the observation space from (210, 160, 3) to (num_past_frames, 210, 160, 3).)
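There are two common layouts for a stacked observation, and they differ only in where the frame axis goes. The sketch below computes both shapes with plain tuple arithmetic; `k = 4` is just an example value. The wrapper in this notebook builds its `Box` from `(k,) + shp`, which is the first layout.

```python
frame_shape = (210, 160, 3)  # height, width, RGB channels
k = 4                        # number of past frames (example value)

# Layout A: stack the frames along a new leading axis
stacked_shape = (k,) + frame_shape

# Layout B: concatenate the frames along the channel axis
channel_concat_shape = frame_shape[:-1] + (frame_shape[-1] * k,)

print(stacked_shape)         # (4, 210, 160, 3)
print(channel_concat_shape)  # (210, 160, 12)
```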

In the `reset` function, while we are initializing the environment, since we don't have any previous observations to concatenate, we concatenate just the initial observations repeatedly.
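The buffering idea can be sketched without gym at all. `FrameStack` below is a hypothetical helper, not a gym class, and the frames are plain strings so the contents of the buffer are easy to read. It assumes a `deque` with `maxlen=k`, which drops the oldest item automatically when a new one is appended.

```python
from collections import deque


class FrameStack:
    """Hypothetical helper (not part of gym): keeps the last k frames."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # No history yet, so fill the buffer with k copies of the first frame
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, new_frame):
        # Appending to a full deque silently drops the oldest frame
        self.frames.append(new_frame)
        return list(self.frames)


stack = FrameStack(k=3)
after_reset = stack.reset("f0")
after_step = stack.step("f1")
print(after_reset)  # ['f0', 'f0', 'f0']
print(after_step)   # ['f0', 'f0', 'f1']
```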

In [11]:
class ConcatObs(gym.Wrapper):
    def __init__(self, env, k):
        super().__init__(env)
        self.k = k
        # Buffer that holds the k most recent frames
        self.frames = deque([], maxlen=k)
        shp = env.observation_space.shape
        # Stacked observations have shape (k, H, W, C)
        self.observation_space = \
            spaces.Box(low=0, high=255, shape=((k,) + shp), dtype=env.observation_space.dtype)

    def reset(self):
        ob = self.env.reset()
        # No history yet: fill the buffer with k copies of the first frame
        for _ in range(self.k):
            self.frames.append(ob)
        return self._get_ob()

    def step(self, action):
        ob, reward, done, info = self.env.step(action)
        self.frames.append(ob)
        return self._get_ob(), reward, done, info

    def _get_ob(self):
        # Stack the buffered frames along a new leading axis
        return np.array(self.frames)

In [12]:
env = gym.make("BreakoutNoFrameskip-v4")
wrapped_env = ConcatObs(env, 4)
print("The new observation space is", wrapped_env.observation_space)

The new observation space is Box(0, 255, (4, 210, 160, 3), uint8)


Let us now verify whether the observations are indeed concatenated.
> Note: `reset`, `step`, and `_get_ob` must be indented inside the `ConcatObs` class body. If they are defined at module level they are never called, the wrapper falls back to the default `gym.Wrapper` behaviour, and the shapes below come out as (210, 160, 3) instead of (4, 210, 160, 3).

In [13]:
wrapped_env.observation_space

Box(0, 255, (4, 210, 160, 3), uint8)

In [15]:
# Reset the Env
obs = wrapped_env.reset()
print("Initial obs is of the shape", obs.shape)

# Take one step
obs, _, _, _  = wrapped_env.step(0)
print("Obs after taking a step is", obs.shape)

Initial obs is of the shape (4, 210, 160, 3)
Obs after taking a step is (4, 210, 160, 3)


There is more to Wrappers than the vanilla Wrapper class. Gym also provides you with specific wrappers that target specific elements of the environment, such as observations, rewards, and actions. Their use is demonstrated in the following section.

`ObservationWrapper`: This helps us make changes to the observation using the observation method of the wrapper class.  
`RewardWrapper`: This helps us make changes to the reward using the reward function of the wrapper class.  
`ActionWrapper`: This helps us make changes to the action using the action function of the wrapper class.  
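To make the division of labour concrete, here is a toy sketch of how such specialised wrappers can hook into `step`. Every class here (`ToyEnv`, `ToyWrapper`, and so on) is hypothetical and written only for illustration; this is not gym's source code, although it follows the same four-value `step` return used elsewhere in this notebook.

```python
class ToyEnv:
    """Hypothetical stand-in for an environment; not a gym class."""

    def reset(self):
        return 0

    def step(self, action):
        # The observation echoes the action; the reward is always 5
        return action * 100, 5, False, {}


class ToyWrapper:
    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


class ToyObservationWrapper(ToyWrapper):
    # Only the observation passes through the hook
    def reset(self):
        return self.observation(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self.observation(obs), reward, done, info


class ToyRewardWrapper(ToyWrapper):
    # Only the reward passes through the hook
    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, self.reward(reward), done, info


class ToyActionWrapper(ToyWrapper):
    # The action is rewritten before it reaches the inner env
    def step(self, action):
        return self.env.step(self.action(action))


class HalveObservation(ToyObservationWrapper):
    def observation(self, obs):
        return obs / 2


class ClipReward(ToyRewardWrapper):
    def reward(self, reward):
        return max(0, min(1, reward))


class BlockActionThree(ToyActionWrapper):
    def action(self, action):
        return 0 if action == 3 else action


toy = HalveObservation(ClipReward(BlockActionThree(ToyEnv())))
result_blocked = toy.step(3)  # action 3 is replaced by 0 before reaching ToyEnv
result_allowed = toy.step(2)
print(result_blocked)  # (0.0, 1, False, {})
print(result_allowed)  # (100.0, 1, False, {})
```

Each subclass overrides a single hook, so the three concerns stay independent and can be nested in any order.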


Let us suppose that we have to make the following changes to our environment:

* We have to normalize the pixel observations by 255.
* We have to clip the rewards between 0 and 1.
* We have to prevent the slider from moving to the left (action 3).

In [4]:
import random 

class ObservationWrapper(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
    
    def observation(self, obs):
        # Normalise observation by 255
        return obs / 255.0

class RewardWrapper(gym.RewardWrapper):
    def __init__(self, env):
        super().__init__(env)
    
    def reward(self, reward):
        # Clip reward between 0 to 1
        return np.clip(reward, 0, 1)
    
class ActionWrapper(gym.ActionWrapper):
    def __init__(self, env):
        super().__init__(env)
    
    def action(self, action):
        if action == 3:
            return random.choice([0,1,2])
        else:
            return action

In [5]:
env = gym.make("BreakoutNoFrameskip-v4", render_mode='human')
wrapped_env = ObservationWrapper(RewardWrapper(ActionWrapper(env)))

obs = wrapped_env.reset()

for step in range(500):
    action = wrapped_env.action_space.sample()
    obs, reward, done, info = wrapped_env.step(action)
    
    # Raise a flag if the observations have not been normalised properly
    if (obs > 1.0).any() or (obs < 0.0).any():
        print("Max and min value of observations out of range")
    
    # Fail if the reward has not been clipped
    assert 0.0 <= reward <= 1.0, "Reward out of bounds"
    
    # Start a new episode if the current one has ended
    if done:
        obs = wrapped_env.reset()
    
    # Short pause so the rendering can be checked: the slider should never move left
    time.sleep(0.001)

wrapped_env.close()

print("All checks passed")

A.L.E: Arcade Learning Environment (version 0.7.5+db37282)
[Powered by Stella]


All checks passed


In [6]:
print("Wrapped Env:", wrapped_env)
print("Unwrapped Env", wrapped_env.unwrapped)
print("Getting the meaning of actions", wrapped_env.unwrapped.get_action_meanings())

Wrapped Env: <ObservationWrapper<RewardWrapper<ActionWrapper<TimeLimit<OrderEnforcing<StepAPICompatibility<PassiveEnvChecker<AtariEnv<BreakoutNoFrameskip-v4>>>>>>>>>
Unwrapped Env <AtariEnv<BreakoutNoFrameskip-v4>>
Getting the meaning of actions ['NOOP', 'FIRE', 'RIGHT', 'LEFT']
