## Notebook 1

To begin with, we need a game design and a map on which to play it. We start with something very simple: a room with a goal G in one corner and the agent A in the other, walls w between them and around the outside, and a hole H that the agent must not fall into. In particular, we have:

In [1]:
# Define level
level = """
wwwwwwwwwwwww
wA       w  w
w   w      ww
www w   wwwww
w       wwG w
w H  w      w
wwwwwwwwwwwww
"""

# Define game
game = """
BasicGame 
    LevelMapping
        G > goal
        A > avatar
        H > hole
        w > wall
        
    InteractionSet
        avatar wall > stepBack
        goal avatar > killSprite
        avatar hole > killSprite

    SpriteSet  
        structure > Immovable
            goal > color=GREEN
            hole > color=RED
            wall > color=BROWN
            
    TerminationSet
        # SpriteCounter stype=goal limit=0 win=True
        SpriteCounter stype=goal   win=True
        SpriteCounter stype=avatar win=False
"""

# Import necessary functions
import sys
sys.path.insert(0, 'pyvgdlmaster/vgdl')
from mdpmap import MDPconverter
from core import VGDLParser
from rlenvironment import RLEnvironment
import pygame
import numpy as np

# Start game and produce image
g = VGDLParser().parseGame(game)
g.buildLevel(level)
rle = RLEnvironment(game, level, observationType='global', visualize=True)
rle._game._drawAll()
pygame.image.save(rle._game.screen, "example.png")

The agent gets +10 points for reaching the goal, -10 points for falling in the hole, and -1 point every turn that it has not reached either the goal or the hole (so it has an incentive to reach the goal as quickly as possible). This ends up giving us the following game:

<img src="example.png">
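
The reward scheme described above can be written as a small function. This is only a sketch: `step_reward` is a hypothetical helper name, and the rewards are assumed to be assigned outside the VGDL description from two flags, `ended` and `won`.

```python
def step_reward(ended, won):
    """Reward for one time step under the scheme described above."""
    if ended:
        # Episode over: +10 for reaching the goal, -10 for falling in the hole
        return 10 if won else -10
    # Episode still running: -1 per turn, to encourage short paths
    return -1


# Hypothetical episode: three ordinary steps, then the goal is reached
flags = [(False, False)] * 3 + [(True, True)]
episode_return = sum(step_reward(ended, won) for ended, won in flags)
```

Under this scheme, reaching the goal sooner always yields a higher total return, which is the incentive mentioned above.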

We now want to extract information in the form of (state, reward, action)\_t for each time step t, so that we can learn schemas. Each state can be extracted from the VGDL as a list that describes the world in terms of objects and their positions, together with the action taken and the reward received at that state, as follows:

In [None]:
# Set up RLE
rle.actionDelay = 200
rle.recordingEnabled = True
rle.reset()

# Get initial state information
initState = rle._obstypes.copy()
state = rle.getState()
initState['agent'] = [(state[0], state[1])]
initReward = 0

# Initialise parameters
numSteps = 1
actions = np.array([0,0,1,0])
ended = False
rStates = []
rReward = []
rActions = []
rStates.append(initState)
rReward.append(initReward)

# Perform sequence of actions
for i in range(numSteps):
    if not ended:
        # Take and record action
        rle._performAction(actions)
        action = rle._allEvents[-1][1]
        rActions.append(action)
        # Get and record new state information
        newState = rle._obstypes.copy()
        state = rle.getState()
        newState['agent'] = [(state[0], state[1])]
        rStates.append(newState)
        # Get and record new reward information
        (ended, won) = rle._isDone()
        if ended:
            if won:
                newReward = 10
            else:
                newReward = -10
        else:
            newReward = -1
        rReward.append(newReward)
# Record final action as None
rActions.append(None)
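
The three recorded lists line up index by index, so one way to view them as (state, reward, action)\_t tuples is to zip them together. The sketch below uses small made-up stand-in lists: the `demo_` names and all of their values are hypothetical and are not output from the environment.

```python
# Hypothetical stand-ins for rStates, rReward and rActions after two steps
demo_states = [{'agent': [(1, 1)]}, {'agent': [(1, 2)]}, {'agent': [(1, 3)]}]
demo_rewards = [0, -1, -1]
demo_actions = [(0, 1), (0, 1), None]  # final action recorded as None

# One (state, reward, action) tuple per time step t
trajectory = list(zip(demo_states, demo_rewards, demo_actions))

for t, (state_t, reward_t, action_t) in enumerate(trajectory):
    print(t, state_t['agent'], reward_t, action_t)
```

Because `zip` stops at the shortest input, the lists must have equal length; recording the final action as `None` is what keeps them aligned.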

From the raw state data above we construct sets of binary-encoded matrices for learning schemas. See https://www.overleaf.com/read/nrfnchwgmpdg for details on the form of the matrices required. The schemas are learnt as columns of weight matrices and then converted to logical representations for use in planning and policy formation.
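
As a rough illustration of what binary encoding a state can look like, the sketch below turns a dictionary of object positions into one binary grid per object type. It is not the matrix form specified in the linked document. The helper `encode_state`, the assumption that positions are (column, row) grid cells, and the example state are all hypothetical.

```python
import numpy as np


def encode_state(state, width, height):
    """Return the object types and one binary (height x width) grid per type."""
    types = sorted(state)  # fixed, reproducible ordering of object types
    grids = np.zeros((len(types), height, width), dtype=np.uint8)
    for k, name in enumerate(types):
        for (x, y) in state[name]:
            grids[k, y, x] = 1  # mark the cell occupied by this object type
    return types, grids


# Hypothetical state for the 13 x 7 level above, positions as (column, row)
example_state = {'agent': [(1, 1)], 'goal': [(10, 4)], 'hole': [(2, 5)]}
types, grids = encode_state(example_state, width=13, height=7)
```

Flattening each grid with `grids.reshape(len(types), -1)` would give one binary row vector per object type, which may be closer to the form a weight-matrix learner expects.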