# Bayesian Theory Of Mind Problem Set


1. [Building The Model](#Distance-Calc)
  1. [Initializing The Model](#init)
  2. [Distance Calculation](#distance-calc)
  3. [Getting Points Within The Range](#within-range)
2. [The Mind Model](#Mind-Model)
  1. [Updating Beliefs](#updating-beliefs)
  2. [Transition Matrix](#transition-matrix)
  3. [Inferring Intent](#inferring-intent)
3. [Analysis of Bayesian Theory Of Mind](#analysis)

## Building The Model

<img src="scenario.png"/>


Before we start constructing a way to infer intent, we first need a way to represent our "world", the space in which we will infer intent. In this case, our world is a 15x15 square grid that our agent (in this case, Mark Watney) can move around. Scattered in known locations on the grid are resources A, B, and C. Mark is trying to retrieve these resources, but we're not sure which one he wants. This is where the intent comes in!

To be more technical and specific, we will write a class Mind that keeps track of our model's estimate of Mark's intent, his beliefs about the world, and how likely he is to be going for a certain resource given his location.

Below, we'll initialize this Mind class, then dive a bit deeper into how to use it.



### Initialize The Mind (5 pts) <a id="init"/>

Our first step is to initialize the Bayesian Theory of Mind (BToM). Below is a skeleton implementation of the class we'll use to model our BToM. We've initialized some attributes for you, but you'll have to do the rest. Let's initialize both our beliefs and our intents as a *uniform distribution*. That is, every belief starts out with the same probability, and so does every intent.
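
One way to build these uniform distributions without typing the values by hand is sketched below. The helper name `uniform_distribution` is an assumption made for illustration and is not part of the problem set; the six possible worlds are simply the permutations of the three resource labels.

```python
from itertools import permutations

def uniform_distribution(n):
    """Return a list of n equal probabilities that sum to 1 (hypothetical helper)."""
    return [1 / n] * n

resources = 'ABC'
# Every ordering of the three resources is one possible world.
belief_worlds = [''.join(p) for p in permutations(resources)]
beliefs = uniform_distribution(len(belief_worlds))
intents = dict(zip(resources, uniform_distribution(len(resources))))
```

This yields six worlds with probability 1/6 each and three intents with probability 1/3 each, matching the values the test cell checks for.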



In [10]:
class Mind:
    def __init__(self):
        # where resources are
        self.world = [(10,0), (0,9), (6,10)] # this array is the coordinates of the location of the resources
        self.map_length = 15 # How large the grid is
        self.actual_world = 'ABC' # Resources
        self.transition_matrix = []
    
        self.beliefs_worlds = ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA'] # Possible Beliefs
        self.prev_position = (0,0)
        self.position = (0,0)

        self.beliefs = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]
        self.intents = {'A': 1/3,'B': 1/3,'C': 1/3}
        

In [11]:
my_mind = Mind()

assert my_mind.beliefs == [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]
assert my_mind.intents == {'A': 1/3, 'B': 1/3, 'C': 1/3}

print("success")

### Distance Calculation (5 pts) <a id="distance-calc"/>

In order for our model to understand anything, we need to be able to calculate distances between objects on our field. Write a function here to calculate the Euclidean (straight-line) distance between two points p1 and p2 (both tuples of the form (x, y)).

In [6]:
# helper to get distance
def get_distance(p1, p2):
    """Euclidean distance between two (x, y) points."""
    return ((p1[0]-p2[0])**2 + (p1[1]-p2[1])**2)**.5

Let's test that our distance calculation works as expected.

In [110]:
assert get_distance((1,0), (2,0)) == 1
assert get_distance((1,1), (2,2)) == (2)**0.5
assert get_distance((1,0), (1,2)) == 2

print ("success")


success
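
As a cross-check, Python 3.8 and later ship `math.dist`, which computes the same Euclidean distance. The sketch below assumes that Python version; the name `get_distance_builtin` is hypothetical and only used for comparison.

```python
import math

def get_distance_builtin(p1, p2):
    """Euclidean distance between two (x, y) tuples via the standard library."""
    return math.dist(p1, p2)

# A 3-4-5 right triangle makes the result easy to verify by hand.
d = get_distance_builtin((0, 0), (3, 4))
```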


### Getting Points Within A Range (10 pts) <a id="within-range"/>

Now, our next step is to find the resources that Mark can see when he is at some location. Implement the function below, which, given a Mind object and a location, returns the locations of all the resources that Mark can see. A resource is visible if it is at most 5 squares away in both the x and y directions.

In [8]:
def within_range(mind, position):
    """ 
    Given a mind object and location, outputs the (x,y) coordinates of resources near the location
    
    Input: mind - a mind object which contains location of all the resources on map
           position - current position on map
    Output: locs - a list of locations (in x,y) available from current position
    """
    # return resource positions that are visible from the current position
    locs = []
    for loc in mind.world:
        if abs(position[0]-loc[0])<=5 and abs(position[1]-loc[1])<=5:
            locs.append(loc)
    return locs
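
The visibility rule can also be read as a Chebyshev distance check: a resource is visible when the larger of its x and y offsets is at most 5. The sketch below is a standalone illustration of that reading; `visible_resources` and its `view_range` parameter are assumed names, and the resource coordinates are copied from the Mind class above.

```python
def visible_resources(resource_locs, position, view_range=5):
    """Return the resource locations whose x and y offsets are both within view_range."""
    return [
        loc for loc in resource_locs
        if max(abs(position[0] - loc[0]), abs(position[1] - loc[1])) <= view_range
    ]

resource_locs = [(10, 0), (0, 9), (6, 10)]
near_start = visible_resources(resource_locs, (0, 0))   # nothing is in view yet
near_middle = visible_resources(resource_locs, (3, 9))  # two resources in view
```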


In [9]:
my_mind = Mind()

assert within_range(my_mind, (8,0)) == [(10, 0)]
assert within_range(my_mind, (0,0)) == []
assert (6,10) in within_range(my_mind, (3,9)) and (0,9) in within_range(my_mind, (3,9))
print("success")

## The Mind Model

Next, we need to write the functions that use the mind model to infer intent.


### Updating Beliefs (20 pts) <a id="updating-beliefs"/>

In this section, we'll work on updating what our BToM thinks that Mark believes.

The function below is passed a mind instance and the current state. We should update our probability distribution of what we think Mark believes given this new information.

Here are the steps you should follow:
- Obtain all locations of resources that are within range of our state (use a function you made!)
- For each of those locations:
  - Get the resource that is at that location in the actual world
  - Go through all 6 possible worlds, keeping track of which are consistent and which are inconsistent with the resource being in that location.
  - For each of the consistent worlds, multiply the previous belief in that world by 0.9 over the number of consistent worlds.
  - Similarly, for the inconsistent worlds, multiply the previous belief in that world by 0.1 over the number of inconsistent worlds.
  - Normalize the new beliefs

Hint: If the position of the resource we just discovered matches some possible state of the world, Mark is also much more likely to think that that is the true state of the world.
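
The steps above can be sketched as a standalone function that works on plain lists rather than a Mind object. This is a minimal illustration that follows the listed steps literally; `update_beliefs_sketch`, `slot`, and `resource` are assumed names, where `slot` is the index of the observed resource location.

```python
def update_beliefs_sketch(beliefs, worlds, slot, resource):
    """Reweight beliefs after observing `resource` at location index `slot`."""
    # Worlds that place the observed resource in the observed slot are consistent.
    consistent = [k for k, w in enumerate(worlds) if w[slot] == resource]
    inconsistent = [k for k in range(len(worlds)) if k not in consistent]

    updated = list(beliefs)
    for k in consistent:
        updated[k] *= 0.9 / len(consistent)
    for k in inconsistent:
        updated[k] *= 0.1 / len(inconsistent)

    # Normalize so the beliefs sum to 1 again.
    total = sum(updated)
    return [b / total for b in updated]

worlds = ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
beliefs = [1 / 6] * 6
# Mark sees resource A at the first resource location (slot 0).
new_beliefs = update_beliefs_sketch(beliefs, worlds, slot=0, resource='A')
```

Starting from uniform beliefs, the two worlds that place A in the first slot end up with about 0.45 each, and the other four with about 0.025 each.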



In [12]:
def beliefs_update(self, state):
    """
    Update beliefs about which world Mark believes he is in.

    Written in method style (note the self parameter): besides the
    attributes initialized above, it relies on self.within_range,
    self.world_loc (resource coordinates), and self.world_state
    (resource labels).
    """
    # resources that are visible from the current state
    near_locations = self.within_range(state)

    for loc in near_locations:
        slot = self.world_loc.index(loc)
        resource = self.world_state[slot]
        print("observed resource %s" % resource)

        # split the possible worlds by whether they agree with the observation
        consistent = []
        inconsistent = []
        for j, world in enumerate(self.beliefs_worlds):
            if world[slot] == resource:
                consistent.append(j)
            else:
                inconsistent.append(j)

        sum_consistent = sum(self.beliefs[j] for j in consistent)
        sum_inconsistent = sum(self.beliefs[j] for j in inconsistent)

        # redistribute beliefs: 0.9 of the mass to consistent worlds, 0.1 to the rest
        for j in range(len(self.beliefs)):
            if j in consistent:
                self.beliefs[j] = 0.9 * self.beliefs[j] / sum_consistent
            else:
                self.beliefs[j] = 0.1 * self.beliefs[j] / sum_inconsistent

        # normalize
        total = sum(self.beliefs)
        self.beliefs = [b / total for b in self.beliefs]
        print("new beliefs %s " % self.beliefs)


In [14]:
from nose.tools import assert_equal

# TODO update these tests!!!!

my_mind = Mind()

update_beliefs(my_mind, (10,0), 'A')
assert_equal(my_mind.beliefs[0], 0.5)
update_beliefs(my_mind, (0,9), 'B')
assert_equal(my_mind.beliefs[0], 0.8125)

print ("success")


ModuleNotFoundError: No module named 'nose'

### Defining Transition and Actions (20 pts) <a id="transition-matrix"/>

In addition to inferring Mark's intent, our model treats Mark's mind as a planning problem: we'd like to know what action Mark wants to take next. The full solution also involves value iteration and extracting the best policy (the best action to take in each state), but here we'll define only some of the functions needed to get that policy.

We'll implement a function, transition, that, given a state and an action, returns a list of (result state, probability) pairs.

Hint: We included two helper functions: get_next_state, which gives the next state that would be reached by taking some action, and actions, which gets the set of actions possible in a state. Use them to help implement transition.

In [15]:
def get_next_state(self, state, action):
    """
    Helper function. Given state and action, gives next state that would
    be reached. Returns None if off map.
    """
    next_state = None
    x = state[0]
    y = state[1]

    if action == 'right':
        if 0<=(x+1)<self.map_length and 0<=y<self.map_length:
            next_state = (x+1,y)
    elif action == 'left':
        if 0<=(x-1)<self.map_length and 0<=y<self.map_length:
            next_state = (x-1,y)
    elif action == 'up':
        if 0<=x<self.map_length and 0<=(y-1)<self.map_length:
            next_state = (x,y-1)
    elif action == 'down':
        if 0<=x<self.map_length and 0<=(y+1)<self.map_length:
            next_state = (x,y+1)

    return next_state


def actions(self, state):
    "Set of actions that can be performed in this state."
    actions = []
    x = state[0]
    y = state[1]

    if 0<=(x+1)<self.map_length and 0<=y<self.map_length:
        actions.append('right')
    if 0<=(x-1)<self.map_length and 0<=y<self.map_length:
        actions.append('left')
    if 0<=x<self.map_length and 0<=(y-1)<self.map_length:
        actions.append('up')
    if 0<=x<self.map_length and 0<=(y+1)<self.map_length:
        actions.append('down')

    return actions

    
def transition(self, state, action):
    """
    Transition model.  From a state and an action, return a list
    of (result-state, probability) pairs.
    """
    actions = self.actions(state)
    pairs = []
    next_state = self.get_next_state(state, action)

    if next_state is not None:
        # intended move succeeds with probability 0.9; the remaining 0.1
        # is split evenly among the other valid moves
        pairs.append((next_state, 0.9))
        remaining = len(actions) - 1
        prob = 0.1 / remaining
        for a in actions:
            if a != action:
                pairs.append((self.get_next_state(state, a), prob))
    else:
        prob = 1.0 / len(actions)
        for a in actions:
            pairs.append((self.get_next_state(state, a), prob))

    return pairs
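
To see what the transition model produces, here is a compact standalone sketch of the same idea that does not need a Mind object. The names `MOVES`, `step`, and `transition_sketch` are assumptions made for this illustration; the grid size and the convention that 'up' decreases y are taken from the code above.

```python
MAP_LENGTH = 15  # grid size taken from the Mind class
MOVES = {'right': (1, 0), 'left': (-1, 0), 'up': (0, -1), 'down': (0, 1)}

def step(state, action):
    """Return the neighbouring cell for an action, or None if it leaves the grid."""
    dx, dy = MOVES[action]
    x, y = state[0] + dx, state[1] + dy
    if 0 <= x < MAP_LENGTH and 0 <= y < MAP_LENGTH:
        return (x, y)
    return None

def transition_sketch(state, action):
    """List (next_state, probability) pairs: 0.9 for the intended move, 0.1 shared by the rest."""
    valid = [a for a in MOVES if step(state, a) is not None]
    if action not in valid:
        # The intended move is off the map: fall back to a uniform choice.
        return [(step(state, a), 1 / len(valid)) for a in valid]
    others = [a for a in valid if a != action]
    pairs = [(step(state, action), 0.9)]
    pairs += [(step(state, a), 0.1 / len(others)) for a in others]
    return pairs

corner = transition_sketch((0, 0), 'right')   # intended move is valid
blocked = transition_sketch((0, 0), 'left')   # intended move leaves the grid
```

From the corner (0, 0) only 'right' and 'down' are valid, so `corner` puts 0.9 on (1, 0) and 0.1 on (0, 1), while `blocked` splits the probability evenly between those two cells.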


In [116]:
from nose.tools import assert_equal

my_mind = Mind()

# TODO also need to update these tests
my_mind.position = (1,0)
update_transition_matrix(my_mind)
assert_equal(my_mind.transition_matrix, [0.44756872175704115, 0.2177541879876714, 0.3346770902552874])
my_mind.prev_position = (1,0)
my_mind.position = (1,1)

update_transition_matrix(my_mind)
assert_equal(my_mind.transition_matrix,[0.19990528833005725, 0.4109589772800559, 0.38913573438988686])


print ("success")


[1.0, -0.0553851381374173, 0.48156390219165246]
[-0.0553851381374173, 0.9931273898388682, 0.8847097465119482]
success


### Inferring Intent (20 pts) <a id="inferring-intent"/>

Last thing! We need to put it together and infer the intent of Mark using the functions we've defined!

In this function, we're passed in a Mind object. Using it, we should get the total probability (normalized) that Mark's intent is to get a particular resource.

Steps:
- Based on the action Mark takes next, get the next state
- For each of the resource positions in the actual world layout, get the change in Mark's distance to that position between the state before and the state after his action. Note: some of these may be negative.
- For each of the resources, calculate a score that takes into account the belief distribution over all the different worlds: combine how much closer Mark moved to the resource's position in each world with the belief in that world (to simplify, you can multiply the distance change by the belief and sum over worlds).
- Update the old intents by adding this score, discounted by a factor of 0.3. Ensure that the values do not exceed 1 or go below 0.
- Finally, normalize the new intents.

More specifically, you should mutate mind.intents and update it with the new intents! We've already implemented the step for receiving an observation and updating other properties, but now you must finish it with intents_update.
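
The steps can be sketched as a pure function on plain data, which may help when checking the arithmetic by hand. This is an illustration under stated assumptions, not the reference solution: `update_intents_sketch` and its parameters are hypothetical names, it uses `math.dist` (Python 3.8+) for distances, and keeping the old intents when every value is clamped to zero is a choice made here to avoid dividing by zero.

```python
import math

def update_intents_sketch(intents, beliefs, worlds, resource_locs,
                          state, next_state, discount=0.3):
    """Shift intent toward resources Mark moved closer to, weighted by belief in each world."""
    # Positive when the move brought Mark closer to that location, negative otherwise.
    gains = [math.dist(loc, state) - math.dist(loc, next_state)
             for loc in resource_locs]

    scores = {r: 0.0 for r in intents}
    for world, belief in zip(worlds, beliefs):
        for slot, resource in enumerate(world):
            scores[resource] += gains[slot] * belief

    # Add the discounted score, clamp to [0, 1], then normalize.
    clamped = {r: min(max(intents[r] + discount * scores[r], 0.0), 1.0)
               for r in intents}
    total = sum(clamped.values())
    if total == 0:
        # Assumption: keep the old intents if every value was clamped to zero.
        return dict(intents)
    return {r: v / total for r, v in clamped.items()}

worlds = ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
beliefs = [0.45, 0.45, 0.025, 0.025, 0.025, 0.025]  # Mark has probably seen A
resource_locs = [(10, 0), (0, 9), (6, 10)]
intents = {'A': 1 / 3, 'B': 1 / 3, 'C': 1 / 3}
# Mark steps right from (0, 0), moving toward the first resource location.
new_intents = update_intents_sketch(intents, beliefs, worlds, resource_locs,
                                    state=(0, 0), next_state=(1, 0))
```

With these beliefs, stepping toward the first resource location raises the intent for A above B and C. With uniform beliefs the same move would leave the intents unchanged, because every resource is equally likely to sit at each location.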


In [117]:
def intents_update(self, action):
    """
    Update intents.

    beliefs: [Pr(ABC), Pr(ACB), Pr(BAC), Pr(BCA), Pr(CAB), Pr(CBA)]
    state: current location on the grid
    action: action he takes
    """

    next_state = self.get_next_state(self.state, action)

    # difference in distance that this action takes us, or how much closer or farther
    # we get from the 3 resource locations
    dists = []

    for resource_pos in self.world_loc:
        dist1 = self.get_distance(resource_pos, self.state)
        dist2 = self.get_distance(resource_pos, next_state)
        diff_dist = dist1-dist2
        dists.append(diff_dist)

    print (dists)

    new_intents_scores = {'A': 0, 'B': 0, 'C': 0}

    for i in range(len(self.beliefs_worlds)):
        world = self.beliefs_worlds[i]
        belief = self.beliefs[i]
        # slot k of a world string corresponds to resource location k
        for k in range(3):
            r = world[k]
            new_intents_scores[r] += dists[k] * belief

    new_intents = {}
    factor = 0.3
    sum_new_intents = 0
    for r in self.intents:
        new_intents[r] = max(self.intents[r] + factor * new_intents_scores[r], 0)
        sum_new_intents += new_intents[r]

    self.intents = {'A': new_intents['A']/sum_new_intents, 'B': new_intents['B']/sum_new_intents, 'C': new_intents['C']/sum_new_intents}



In [118]:
def receive_observation(self, action):
    "Receive observation, update model, and get updated intents."
    print ("------")
    self.state = self.get_next_state(self.state, action)
    self.beliefs_update(self.state)
    U = self.value_iteration()
    # print (U)
    policy = self.best_policy(U)
    print (self.state)
    self.intents_update(action)
    # print ('beliefs: ', self.beliefs)
    print ('intents: ', self.intents)

In [18]:
from nose.tools import assert_equal

my_mind = Mind()

# TODO update this too!!!

# receive_observation(my_mind,1, 0)
# receive_observation(my_mind,2, 0)
# receive_observation(my_mind,3, 0)
# receive_observation(my_mind,4, 0)
# receive_observation(my_mind,5, 0)
# receive_observation(my_mind,6, 0)
# receive_observation(my_mind,5, 0)
# receive_observation(my_mind,4, 0)
# receive_observation(my_mind,3, 0)

# #{'A': 0.035648663780556554, 'B': 0.48217566810972184, 'C': 0.48217566810972157}

# assert_equal(abs(my_mind.intents['A'] - 0.035648663780556554) < 0.001, True)

# assert_equal(abs(my_mind.intents['B'] - 0.48217566810972184) < 0.001, True)

# assert_equal(abs(my_mind.intents['C'] - 0.48217566810972157) < 0.001, True)

# print ("successs")

ModuleNotFoundError: No module named 'nose'

## Analysis of Bayesian Theory of Mind (10 pts) <a id="analysis"/>

In the box below, highlight some of the advantages and disadvantages of Bayesian Theory of Mind. This is open ended! What makes it different from other approaches, and why does that make it better? Are there any drawbacks you can think of?