## Introduction to Psychsim
Psychsim is a tool for rapidly constructing and simulating social scenarios. It implements Partially Observable Markov Decision Processes (POMDPs), which let rational agents make decisions under uncertainty. In this tutorial, we build a maze in a 2D grid world and create an agent that solves it. The maze spans coordinates 0 to 5 along each axis (6 tiles by 6 tiles), with many obstacles blocking the path.

This scenario can be visualized by running the demo.py script, which requires the pyglet graphics library.


Begin by importing the Psychsim modules:

In [1]:
from psychsim.reward import *
from psychsim.pwl import *
from psychsim.action import *
from psychsim.world import *
from psychsim.agent import *

## Creating an Agent
First, create a world as the setting for the scenario. Then create an agent, give it a name, and set its horizon, the number of steps it looks ahead when planning. The agent must be added to the world before it can act.

In [2]:
world = World()
actor = Agent('Agent')
actor.setHorizon(5)
world.addAgent(actor)
print(list(world.agents.keys()))

['Agent']


## Defining the Agent States
An agent can have states, each defined in the world by the agent's name, the state's name, and its type. In the grid world, we define the agent's x and y coordinates as states, along with the coordinates of the end of the maze (goal_x and goal_y).

In [3]:
world.defineState(actor.name,'x',int)
world.defineState(actor.name,'y',int)
world.defineState(actor.name, 'goal_x', int)
world.defineState(actor.name, 'goal_y', int);

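We can also set the initial values of these states and read them back. Note that getState returns a probability distribution over values rather than a bare number. In a fully observable, deterministic model such as this one, each state has a single value with probability 100%, which is why the output below shows 100% before each value.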

In [4]:
world.setState(actor.name, 'x', 0)
world.setState(actor.name, 'y', 0)
world.setState(actor.name, 'goal_x', 5)
world.setState(actor.name, 'goal_y', 5)

print('x: ' + str(world.getState(actor.name,'x')) + ' , y: ' + str(world.getState(actor.name,'y')))
print('goal x: ' + str(world.getState(actor.name,'goal_x')) + ' , goal y: ' + str(world.getState(actor.name,'goal_y')))

x: 100%	0 , y: 100%	0
goal x: 100%	5 , goal y: 100%	5
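The `100%` in the printout reflects that a state holds a probability distribution over values. As a rough mental model only (an illustrative sketch, not Psychsim's own data structure), such a distribution can be pictured as a dictionary mapping each value to its probability. The hypothetical `format_distribution` helper below prints one in the same probability-tab-value style as the output above.

```python
def format_distribution(dist):
    """Render a {value: probability} dict as 'P%<tab>value' lines, most likely first."""
    lines = []
    for value, prob in sorted(dist.items(), key=lambda item: -item[1]):
        lines.append(f"{prob:.0%}\t{value}")
    return "\n".join(lines)

# A deterministic state: x is 0 with probability 1.0
print(format_distribution({0: 1.0}))

# An uncertain belief, as could arise in a partially observable variant of the maze
print(format_distribution({0: 0.75, 1: 0.25}))
```

In a deterministic, fully observable setting like this tutorial, every distribution has a single entry, so only the first case appears.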


Next, we define the obstacles that the agent must move around, as a list of (x, y) coordinates.

In [5]:
obstacles = [(0,1),(0,2),(0,3),(1,3),(2,3),(3,3),(4,3),(4,2),(4,1),(2,1),(2,0)]
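To see the layout the agent faces, the obstacle list can be drawn as text. This is a plain-Python sketch independent of Psychsim; it assumes, as the boundary checks later in this tutorial do, that coordinates run from 0 to 5 and that y increases upward. `S` marks the start at (0, 0), `G` the goal at (5, 5), and `#` an obstacle.

```python
OBSTACLES = [(0,1),(0,2),(0,3),(1,3),(2,3),(3,3),(4,3),(4,2),(4,1),(2,1),(2,0)]
SIZE = 6  # coordinates run from 0 to 5 on each axis
START, GOAL = (0, 0), (5, 5)

def render(obstacles, start, goal, size=SIZE):
    """Return the maze as text, with the highest y on the top row."""
    blocked = set(obstacles)
    rows = []
    for y in range(size - 1, -1, -1):
        row = ""
        for x in range(size):
            if (x, y) == start:
                row += "S"
            elif (x, y) == goal:
                row += "G"
            elif (x, y) in blocked:
                row += "#"
            else:
                row += "."
        rows.append(row)
    return "\n".join(rows)

print(render(OBSTACLES, START, GOAL))
# .....G
# ......
# #####.
# #...#.
# #.#.#.
# S.#...
```

The drawing shows that the row at y = 3 is blocked everywhere except x = 5, so the agent must detour along the bottom of the maze before it can head up to the goal.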

## Define the Agent Actions
In the grid world, the agent can only move right, left, up, and down, so we add these four actions to the agent. Each action also changes the state of the world: for example, moving right increases the agent's x-coordinate by one. For each action, we encode this effect as a decision tree and register it with world.setDynamics, so that the agent can predict the outcome of its actions when deciding what to do.

In [6]:
move_right = actor.addAction({'verb': 'MoveRight'})
tree = makeTree(incrementMatrix(stateKey(move_right['subject'], 'x'), 1.))
world.setDynamics(stateKey(move_right['subject'], 'x'), move_right, tree)

move_left = actor.addAction({'verb': 'MoveLeft'})
tree = makeTree(incrementMatrix(stateKey(move_left['subject'], 'x'), -1.))
world.setDynamics(stateKey(move_left['subject'], 'x'), move_left, tree)

move_up = actor.addAction({'verb': 'MoveUp'})
tree = makeTree(incrementMatrix(stateKey(move_up['subject'], 'y'), 1.))
world.setDynamics(stateKey(move_up['subject'], 'y'), move_up, tree)

move_down = actor.addAction({'verb': 'MoveDown'})
tree = makeTree(incrementMatrix(stateKey(move_down['subject'], 'y'), -1.))
world.setDynamics(stateKey(move_down['subject'], 'y'), move_down, tree)
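Stripped of the decision-tree machinery, each dynamics tree above simply adds +1 or -1 to one coordinate. The sketch below writes that transition function in plain Python for comparison; `DELTAS` and `apply_action` are illustrative names, not part of Psychsim. Like the dynamics trees, it ignores obstacles and boundaries, which the tutorial handles separately through the legality rules.

```python
# Hypothetical plain-Python model of the dynamics: each action
# adds a fixed (dx, dy) offset to the agent's position.
DELTAS = {
    'MoveRight': (1, 0),
    'MoveLeft': (-1, 0),
    'MoveUp': (0, 1),
    'MoveDown': (0, -1),
}

def apply_action(position, action):
    """Return the new (x, y) after the action, without checking legality."""
    dx, dy = DELTAS[action]
    x, y = position
    return (x + dx, y + dy)

pos = (0, 0)
for action in ['MoveRight', 'MoveUp', 'MoveUp']:
    pos = apply_action(pos, action)
print(pos)  # (1, 2)
```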

## Determining the Legality of Actions
Next, we define when each of the agent's actions is legal. The agent cannot move outside the map's boundaries, nor onto an obstacle. To express this, we build a decision tree that compares the agent's states against fixed values. The tree below shows the legality of the MoveRight action with respect to two obstacles on the map; as the number of obstacles grows, the trees become more complicated.
<img src="files/decisiontree.png">

The following recursive functions build these trees, checking the agent's current x and y values against each obstacle in the list:

In [7]:
def add_branch_plus_x(i):
    """Legality subtree for MoveRight over obstacles[0..i]."""
    if i == -1:
        return True  # no obstacle blocks the move
    rest = add_branch_plus_x(i-1)  # checks for the remaining obstacles
    # Illegal if an obstacle sits at (x+1, y)
    return {'if': equalRow(stateKey(actor.name, 'x'), obstacles[i][0]-1),
            True: {'if': equalRow(stateKey(actor.name, 'y'), obstacles[i][1]),
                   True: False, False: rest},
            False: rest}

def add_branch_minus_x(i):
    """Legality subtree for MoveLeft over obstacles[0..i]."""
    if i == -1:
        return True
    rest = add_branch_minus_x(i-1)
    # Illegal if an obstacle sits at (x-1, y)
    return {'if': equalRow(stateKey(actor.name, 'x'), obstacles[i][0]+1),
            True: {'if': equalRow(stateKey(actor.name, 'y'), obstacles[i][1]),
                   True: False, False: rest},
            False: rest}

def add_branch_plus_y(i):
    """Legality subtree for MoveUp over obstacles[0..i]."""
    if i == -1:
        return True
    rest = add_branch_plus_y(i-1)
    # Illegal if an obstacle sits at (x, y+1)
    return {'if': equalRow(stateKey(actor.name, 'y'), obstacles[i][1]-1),
            True: {'if': equalRow(stateKey(actor.name, 'x'), obstacles[i][0]),
                   True: False, False: rest},
            False: rest}

def add_branch_minus_y(i):
    """Legality subtree for MoveDown over obstacles[0..i]."""
    if i == -1:
        return True
    rest = add_branch_minus_y(i-1)
    # Illegal if an obstacle sits at (x, y-1)
    return {'if': equalRow(stateKey(actor.name, 'y'), obstacles[i][1]+1),
            True: {'if': equalRow(stateKey(actor.name, 'x'), obstacles[i][0]),
                   True: False, False: rest},
            False: rest}
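The trees above are written as nested dictionaries: `'if'` holds a test, and the `True` and `False` keys hold the subtree to follow. To make that structure concrete, the sketch below rebuilds the MoveRight obstacle check with the same recursion, but replaces Psychsim's `equalRow` test with a hypothetical `(feature, value)` pair and adds a small, equally hypothetical `evaluate` function that walks the tree for a given state.

```python
OBSTACLES = [(0,1),(0,2),(0,3),(1,3),(2,3),(3,3),(4,3),(4,2),(4,1),(2,1),(2,0)]

def branch_plus_x(i):
    """Nested-dict tree: False if an obstacle in OBSTACLES[0..i] is at (x+1, y)."""
    if i == -1:
        return True  # no obstacle blocks the move
    rest = branch_plus_x(i - 1)
    return {'if': ('x', OBSTACLES[i][0] - 1),
            True: {'if': ('y', OBSTACLES[i][1]), True: False, False: rest},
            False: rest}

def evaluate(tree, state):
    """Follow the branch matching each (feature, value) test until reaching a leaf."""
    while isinstance(tree, dict):
        feature, value = tree['if']
        tree = tree[state[feature] == value]
    return tree

right_tree = branch_plus_x(len(OBSTACLES) - 1)
print(evaluate(right_tree, {'x': 1, 'y': 0}))  # False: obstacle at (2, 0)
print(evaluate(right_tree, {'x': 1, 'y': 2}))  # True: (2, 2) is free
```

Each test only checks one coordinate, which is why an obstacle needs two nested tests and why the tree grows with every obstacle added.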

Moving right is invalid if:<br>
1) agent's x+1 = obstacle's x and agent's y = obstacle's y<br>
2) agent's x = 5, which is the rightmost map boundary.<br>

In [8]:
tree = makeTree({'if': equalRow(stateKey(actor.name, 'x'), 5),
                 True: False, False: add_branch_plus_x(len(obstacles)-1)})
actor.setLegal(move_right, tree)

Moving left is invalid if:<br>
1) agent's x-1 = obstacle's x and agent's y = obstacle's y<br>
2) agent's x = 0, which is the leftmost map boundary.<br>

In [9]:
tree = makeTree({'if': equalRow(stateKey(actor.name, 'x'), 0),
                 True: False, False: add_branch_minus_x(len(obstacles)-1)})
actor.setLegal(move_left, tree)

Moving up is invalid if:<br>
1) agent's y+1 = obstacle's y and agent's x = obstacle's x<br>
2) agent's y = 5, which is the topmost map boundary.<br>

In [10]:
tree = makeTree({'if': equalRow(stateKey(actor.name, 'y'), 5),
                 True: False, False: add_branch_plus_y(len(obstacles)-1)})
actor.setLegal(move_up, tree)

Moving down is invalid if:<br>
1) agent's y-1 = obstacle's y and agent's x = obstacle's x<br>
2) agent's y = 0, which is the bottommost map boundary.<br>

In [11]:
tree = makeTree({'if': equalRow(stateKey(actor.name, 'y'), 0),
                 True: False, False: add_branch_minus_y(len(obstacles)-1)})
actor.setLegal(move_down, tree)
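Taken together, the four legality trees encode a simple rule: a move is legal if the square it leads to is on the map and is not an obstacle. The plain-Python sketch below states that rule directly; the names are illustrative and not Psychsim's API, and the 0 to 5 bounds mirror the boundary checks above. In the tutorial itself, the same logic is expressed as decision trees passed to setLegal.

```python
OBSTACLES = {(0,1),(0,2),(0,3),(1,3),(2,3),(3,3),(4,3),(4,2),(4,1),(2,1),(2,0)}
MAX_COORD = 5  # matches the boundary checks above (x and y in 0..5)
DELTAS = {'MoveRight': (1, 0), 'MoveLeft': (-1, 0),
          'MoveUp': (0, 1), 'MoveDown': (0, -1)}

def is_legal(x, y, action):
    """An action is legal if it stays on the map and avoids every obstacle."""
    dx, dy = DELTAS[action]
    nx, ny = x + dx, y + dy
    in_bounds = 0 <= nx <= MAX_COORD and 0 <= ny <= MAX_COORD
    return in_bounds and (nx, ny) not in OBSTACLES

def legal_actions(x, y):
    """List the legal actions from (x, y), in the order DELTAS defines them."""
    return [a for a in DELTAS if is_legal(x, y, a)]

print(legal_actions(0, 0))  # ['MoveRight']
print(legal_actions(3, 2))  # ['MoveLeft', 'MoveDown']
```

At the start position (0, 0), only MoveRight is legal, which is consistent with the first step of the run below, where no values are listed for alternative actions.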

## Setting the Agent Reward Function
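Agents choose their actions according to their reward functions. In this example, the reward is based on the L1 (Manhattan) distance between the agent and the end of the maze: each of the two calls below rewards minimizing the difference along one axis, and with equal weights of 1.0 they together reward minimizing the L1 distance. On its own this objective is greedy; the lookahead horizon set earlier is what helps the agent plan a route around the obstacles.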

In [12]:
actor.setReward(minimizeDifference(stateKey(actor.name, 'x'), stateKey(actor.name, 'goal_x')), 1.0)
actor.setReward(minimizeDifference(stateKey(actor.name, 'y'), stateKey(actor.name, 'goal_y')), 1.0)
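Assuming each minimizeDifference term contributes the negative absolute difference between its two features (an assumption about Psychsim's internals, made for illustration), the combined reward with weights of 1.0 is the negative L1 distance to the goal. The hypothetical `l1_reward` function below computes it in plain Python.

```python
def l1_reward(position, goal):
    """Negative Manhattan distance: 0 at the goal, more negative farther away."""
    (x, y), (gx, gy) = position, goal
    return -(abs(x - gx) + abs(y - gy))

GOAL = (5, 5)
print(l1_reward((0, 0), GOAL))  # -10
print(l1_reward((3, 2), GOAL))  # -5
print(l1_reward((3, 1), GOAL))  # -6
```

The last two lines show that stepping down from (3, 2) to (3, 1) lowers the immediate reward from -5 to -6, even though that step lies on the only route to the goal.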

## Check for Termination
When the agent reaches the end of the maze, the simulation should terminate. To define this termination condition, we build a decision tree that checks whether the agent's coordinates match its goal coordinates.

In [13]:
tree = makeTree({'if': equalFeatureRow(stateKey(actor.name, 'x'), stateKey(actor.name, 'goal_x')),
                 True: {'if': equalFeatureRow(stateKey(actor.name, 'y'), stateKey(actor.name, 'goal_y')),
                        True: True,
                        False: False},
                 False: False})
world.addTermination(tree)
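Unlike the legality trees, which compare a state against a constant with equalRow, this tree uses equalFeatureRow to compare two states against each other. A hypothetical plain-Python evaluator makes the logic explicit: it uses the same nested-dictionary layout, with a `(feature, feature)` pair standing in for equalFeatureRow, and the world terminates only when both coordinates match the goal.

```python
def evaluate(tree, state):
    """Walk a nested-dict tree whose tests compare two state features."""
    while isinstance(tree, dict):
        left, right = tree['if']
        tree = tree[state[left] == state[right]]
    return tree

termination_tree = {'if': ('x', 'goal_x'),
                    True: {'if': ('y', 'goal_y'), True: True, False: False},
                    False: False}

state = {'x': 5, 'y': 4, 'goal_x': 5, 'goal_y': 5}
print(evaluate(termination_tree, state))  # False: y does not match yet
state['y'] = 5
print(evaluate(termination_tree, state))  # True: both coordinates match
```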

## Set Turn Order
In scenarios with multiple agents, the order in which agents take turns matters. If all agents act at the same time, use a parallel order by passing a list that contains one set of agent names. If agents act one after another, use a sequential order by passing the agent names as a list.

In [14]:
# Parallel action
# world.setOrder([set(world.agents.keys())])
# Sequential action
world.setOrder(list(world.agents.keys()))
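As a simplified model (an assumption for illustration, not Psychsim's actual scheduler), the order can be read as a list that is cycled through one entry per step, where an entry is either a single agent name or a set of names that act together. The hypothetical `schedule` function below shows the difference for two made-up agents, `A` and `B`.

```python
def schedule(order, steps):
    """Return the set of agents acting at each step, cycling through order."""
    turns = []
    for t in range(steps):
        entry = order[t % len(order)]
        # A set means those agents act in parallel; a name means one agent acts alone
        turns.append(set(entry) if isinstance(entry, (set, frozenset)) else {entry})
    return turns

print(schedule(['A', 'B'], 4))    # sequential: [{'A'}, {'B'}, {'A'}, {'B'}]
print(schedule([{'A', 'B'}], 2))  # parallel: both agents act at every step
```

In this model, a single agent gets the same schedule either way: it acts at every step.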

Now we can run the scenario until it terminates. At each time step, world.explain prints the action the agent chose, along with the values (V) it computed for its candidate actions.

In [15]:
while not world.terminated():
    result = world.step()
    world.explain(result,2)

100%
Agent-MoveRight
100%
Agent-MoveUp
	V(Agent-MoveLeft) = -49.000
	V(Agent-MoveUp) = -41.000
100%
Agent-MoveUp
	V(Agent-MoveDown) = -43.000
	V(Agent-MoveUp) = -37.000
100%
Agent-MoveRight
	V(Agent-MoveDown) = -39.000
	V(Agent-MoveRight) = -35.000
100%
Agent-MoveRight
	V(Agent-MoveLeft) = -35.000
	V(Agent-MoveRight) = -33.000
100%
Agent-MoveDown
	V(Agent-MoveDown) = -33.000
	V(Agent-MoveLeft) = -33.000
100%
Agent-MoveDown
	V(Agent-MoveDown) = -31.000
	V(Agent-MoveUp) = -33.000
100%
Agent-MoveRight
	V(Agent-MoveRight) = -27.000
	V(Agent-MoveUp) = -35.000
100%
Agent-MoveRight
	V(Agent-MoveLeft) = -31.000
	V(Agent-MoveRight) = -21.000
100%
Agent-MoveUp
	V(Agent-MoveLeft) = -25.000
	V(Agent-MoveUp) = -15.000
100%
Agent-MoveUp
	V(Agent-MoveDown) = -19.000
	V(Agent-MoveUp) = -10.000
100%
Agent-MoveUp
	V(Agent-MoveDown) = -13.000
	V(Agent-MoveUp) = -6.000
100%
Agent-MoveUp
	V(Agent-MoveDown) = -8.000
	V(Agent-MoveUp) = -3.000
100%
Agent-MoveUp
	V(Agent-MoveDown) = -4.000
	V(Agent-MoveLeft) =

<img src="files/demo.gif">
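As an independent sanity check on the run above, a breadth-first search over the same grid finds the shortest obstacle-free route from (0, 0) to (5, 5). This is a standard graph search in plain Python, not how Psychsim's agent decides; it reuses this tutorial's obstacle list and its 0 to 5 boundaries, and the helper names are illustrative.

```python
from collections import deque

OBSTACLES = {(0,1),(0,2),(0,3),(1,3),(2,3),(3,3),(4,3),(4,2),(4,1),(2,1),(2,0)}
MAX_COORD = 5
DELTAS = {'MoveRight': (1, 0), 'MoveLeft': (-1, 0),
          'MoveUp': (0, 1), 'MoveDown': (0, -1)}

def shortest_path(start, goal):
    """Breadth-first search; returns the list of actions on a shortest route."""
    queue = deque([start])
    parent = {start: None}  # cell -> (previous cell, action taken to reach it)
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        for action, (dx, dy) in DELTAS.items():
            nxt = (cell[0] + dx, cell[1] + dy)
            on_map = 0 <= nxt[0] <= MAX_COORD and 0 <= nxt[1] <= MAX_COORD
            if on_map and nxt not in OBSTACLES and nxt not in parent:
                parent[nxt] = (cell, action)
                queue.append(nxt)
    if goal not in parent:
        return None  # goal unreachable
    actions = []
    cell = goal
    while parent[cell] is not None:  # walk back from the goal to the start
        cell, action = parent[cell]
        actions.append(action)
    return actions[::-1]

path = shortest_path((0, 0), (5, 5))
print(len(path))  # 14
print(path)
```

The search confirms that the shortest route takes 14 moves and matches the sequence of actions the agent chose in the run above, including the two MoveDown steps needed to get around the obstacles.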