In [1]:
import numpy as np

# Haunted Mansion

Welcome to the haunted mansion, a 3-dimensional array that an agent will learn to navigate. The agent starts at position (0,0,0) and tries to exit at the opposite corner, (size-1,size-1,size-1). I will start with a relatively small size and adjust it if needed.

### The agent will encounter a few things throughout its journey through the haunted mansion. 
- It will encounter monsters sporadically
    - There will be "hot spots", perhaps 3x3x3 areas, where monsters are likely to spawn; the center of each holds the most treasure.
    
- It will encounter treasure throughout its adventure
    - Treasure is the agent's *value* metric. The immediate reward comes from the steps it takes.
    

### Two policies
1. We'll try to design a policy where the agent wants to get *out* of the haunted mansion as quickly as it can. 

2. We'll also try a "hero" policy where it collects as much treasure as it can and gains strength by slaying monsters.

### Strength
The agent has a strength that determines whether it wins or dies against a given monster. Monsters spawn probabilistically within a "hot spot" and have strengths of their own. If the agent wins (i.e., agent strength > monster strength), it gains strength. If the two are equally matched, the outcome is a coin flip.
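
The fight rule above can be sketched as a small function. This is only an illustration: the name `resolve_fight`, the strength gain of 1 per win, and the reading that a weaker agent always dies are assumptions, not something the notebook has fixed yet.

```python
import random

def resolve_fight(agent_strength, monster_strength, rng=random):
    """Return (agent_survives, new_agent_strength) for one encounter."""
    if agent_strength > monster_strength:
        won = True
    elif agent_strength == monster_strength:
        # equally matched: coin flip
        won = rng.random() < 0.5
    else:
        # assumption: a weaker agent always loses (and dies)
        won = False

    # assumption: each win adds exactly 1 strength
    gain = 1 if won else 0
    return won, agent_strength + gain

survived, strength = resolve_fight(5, 3)
print(survived, strength)
```

Passing a seeded `random.Random` as `rng` makes the coin flip reproducible when testing a policy.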

### Treasure 
Treasure will be the value function of the environment. There will be more treasure available inside of the monster zones. 

In [72]:
# Let's first set up the environment and the agent

# importing scipy.stats for normal distribution for monster spawning probabilities
import scipy.stats

# indices into the last axis of the map: [treasure, monster strength]
tre = 0
mon = 1

class Environment: 
    def __init__(self, size):
        self.hotspot_prob_cache3x3 = self.gen_hotspot_prob_matrix(3)
        self.hotspot_prob_cache5x5 = self.gen_hotspot_prob_matrix(5)
        self.map = self.generate_map(size, hotspots=[(size//2, size//2, size//2, 3)])
        
    def generate_map(self, map_size, hotspots):
        # last axis holds [treasure, monster strength];
        # a NaN strength means there is no monster in that cell
        env = np.empty(shape=(map_size, map_size, map_size, 2))
        env[..., tre] = 0
        env[..., mon] = np.nan

        # each hotspot is (x, y, z, size) and must fit entirely inside the map
        for (x, y, z, s) in hotspots:
            r = s // 2
            env[x-r:x+r+1, y-r:y+r+1, z-r:z+r+1] = self.populate_hotspot((x, y, z), s)

        return env
            
            
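    def populate_hotspot(self, loc, hotspot_size):
        # pick the cached probability matrix that matches this hotspot's size
        if hotspot_size == 3:
            probs = self.hotspot_prob_cache3x3
        elif hotspot_size == 5:
            probs = self.hotspot_prob_cache5x5
        else:
            probs = self.gen_hotspot_prob_matrix(hotspot_size)

        hotspot = np.empty(shape=(hotspot_size, hotspot_size, hotspot_size, 2))

        for i in range(hotspot_size):
            for j in range(hotspot_size):
                for k in range(hotspot_size):
                    # treasure scales with the probability of a monster
                    treasure = probs[i,j,k] * 10
                    # note: every cell currently gets a monster; the spawn
                    # probability only affects the amount of treasure
                    monster = np.random.randint(8)+1       # random strength 1 to 8
                    hotspot[i,j,k] = [treasure, monster]

        return hotspot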
        
        
    
    # the distance from the center of the hotspot determines
    # how likely a monster is to spawn there - there is always
    # a monster in the center
    def gen_hotspot_prob_matrix(self, size):
        
        arr = np.zeros((size,size,size))
        mid = (size//2, size//2, size//2)
        xh, yh, zh = mid
        
        # generate probabilities at given relative locations
        for i in range(size):
            for j in range(size):
                for k in range(size):
                    dist = np.sqrt((i-xh)**2 + (j-yh)**2 + (k-zh)**2)
                    gen_prob = scipy.stats.norm(0, size/2).cdf(-dist)
                    arr[i,j,k] = gen_prob
        
        # always a monster in the center!
        arr[mid] = 1
                    
        return arr
        


In [75]:
env = Environment(6)

# here's the probability of a monster spawning in a given 3x3x3 hotspot
# note there will always be a monster in the middle of the hotspot
# (this is where the most treasure is!)
print(env.hotspot_prob_cache3x3)

[[[0.12410654 0.17288929 0.12410654]
  [0.17288929 0.25249254 0.17288929]
  [0.12410654 0.17288929 0.12410654]]

 [[0.17288929 0.25249254 0.17288929]
  [0.25249254 1.         0.25249254]
  [0.17288929 0.25249254 0.17288929]]

 [[0.12410654 0.17288929 0.12410654]
  [0.17288929 0.25249254 0.17288929]
  [0.12410654 0.17288929 0.12410654]]]
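
The matrix above gives spawn probabilities, while `populate_hotspot` places a monster in every hotspot cell. Below is a minimal, standard-library sketch of how those probabilities could decide whether a monster appears at all. The helper names `spawn_probability` and `sample_monsters` are hypothetical, and the normal CDF is written with `math.erf` rather than `scipy.stats`, which should give the same values as the printed matrix.

```python
import math
import random

def spawn_probability(offset, size):
    """Spawn probability at an (i, j, k) offset from the hotspot center."""
    dist = math.sqrt(sum(c * c for c in offset))
    if dist == 0:
        return 1.0  # always a monster in the center
    sigma = size / 2
    # normal CDF (mean 0, std sigma) evaluated at -dist, via the error function
    return 0.5 * (1.0 + math.erf(-dist / (sigma * math.sqrt(2))))

def sample_monsters(size, rng):
    """Map each offset to a monster strength (1 to 8), or None if nothing spawned."""
    r = size // 2
    monsters = {}
    for i in range(-r, r + 1):
        for j in range(-r, r + 1):
            for k in range(-r, r + 1):
                spawned = rng.random() < spawn_probability((i, j, k), size)
                monsters[(i, j, k)] = rng.randint(1, 8) if spawned else None
    return monsters

monsters = sample_monsters(3, random.Random(0))
print(sum(m is not None for m in monsters.values()), "monsters spawned")
```

With this approach most cells of a 3x3x3 hotspot would usually be empty, so the center (probability 1) remains the only guaranteed fight.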


In [77]:
# and here we can see the "hot spot" on a small map
# the numbers are the strengths of the monsters in that area
print(env.map[:,:,:,1])

[[[nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan nan nan nan nan]]

 [[nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan nan nan nan nan]]

 [[nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan  3.  4.  5. nan]
  [nan nan  2.  4.  1. nan]
  [nan nan  6.  2.  6. nan]
  [nan nan nan nan nan nan]]

 [[nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan  1.  2.  4. nan]
  [nan nan  5.  3.  3. nan]
  [nan nan  4.  4.  8. nan]
  [nan nan nan nan nan nan]]

 [[nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan  5.  8.  3. nan]
  [nan nan  6.  3.  1. nan]
  [nan nan  7.  5.  7. nan]
  [nan nan nan nan nan nan]]

 [[nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan nan nan nan nan]
  [nan nan nan nan nan nan]]]

## Agent
Now we have to design an agent.

An agent has actions that respond to the environment. The actions here will be up, down, left, right, forward, and backward.

In [None]:
class Agent:
    def __init__(self):
        self.strength = 1
        self.treasure = 0.0
        self.actions = ['U', 'D', 'L', 'R', 'F', 'B']
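

One way the six action labels could translate into movement on the 3D map is sketched below. The axis assigned to each letter, the `MOVES` table, the `step` helper, and the choice to stay in place when a move would leave the map are all assumptions made for illustration.

```python
# assumption: L/R move along x, F/B along y, U/D along z
MOVES = {
    'U': (0, 0, 1),
    'D': (0, 0, -1),
    'L': (-1, 0, 0),
    'R': (1, 0, 0),
    'F': (0, 1, 0),
    'B': (0, -1, 0),
}

def step(position, action, size):
    """Return the new (x, y, z) after taking `action` on a size^3 map."""
    dx, dy, dz = MOVES[action]
    x, y, z = position
    new_position = (x + dx, y + dy, z + dz)
    # assumption: walking into a wall leaves the agent where it was
    if all(0 <= c < size for c in new_position):
        return new_position
    return position

position = (0, 0, 0)
for action in ['R', 'F', 'U', 'L', 'L']:
    position = step(position, action, size=6)
print(position)
```

Keeping the move table separate from the `Agent` class makes it easy to change the axis convention later without touching the policy code.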