# Probabilistic Search and Destroy

Authors:

- Rohan Rele (rsr132)
- Aakash Raman (abr103)
- Alex Eng (ame136)
- Adarsh Patel (aap237)

This project was completed for Professor Wes Cowan's Fall 2019 offering of the CS 520: Intro to Artificial Intelligence course, taught at Rutgers University, New Brunswick.

# Problem Representation

In this project, we consider a two-dimensional map of cells in which one cell is randomly designated as the target. The location of the target is not known to any solving agent. Therefore, the problem is to devise an agent which can effectively query the landscape of cells, contribute towards its knowledge base based on observations, and ultimately find the target in the **minimal number of queries.**

The knowledge base itself will contain probabilistic knowledge, i.e. 

$$\text{Belief}[\text{Cell}_i] = P(\text{Target in Cell}_i|  \text{Observations through time } t)$$ 

For every cell, this is the probability that a given cell contains the target given the existing knowledge base. Initially, as the agent has no prior knowledge about the map, the belief for each cell is $\frac{1}{dim^2}$.

Each cell also contains a terrain type which corresponds to the probability that a query will return a false negative, i.e.

$$P(\text{Target not found in Cell}_i | \text{Target is in Cell}_i)$$

which is $0.1$ for **flat** terrain cells, $0.3$ for **hilly** terrain cells, $0.7$ for **forested** terrain cells, and $0.9$ for cells whose terrain is a maze of **caves.**

We assume that for any given map, each cell is assigned the flat terrain type with probability $0.2$, the hilly terrain type with probability $0.3$, the forested terrain type with probability $0.3$, and the caves terrain type with probability $0.2$.
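As a rough illustration of the terrain model above, the following sketch samples terrain types with the stated probabilities and computes how many searches a cell needs, on average, before the target is detected when it really is there. The names `FALSE_NEGATIVE`, `TERRAIN_PRIORS`, `sample_terrain`, and `expected_searches` are illustrative assumptions and do not appear in the project code. The expected count is the mean of a geometric distribution with success probability $1 - P(\text{false negative})$.

```python
import random

# False negative rates per terrain, as given in the text (hypothetical name).
FALSE_NEGATIVE = {"flat": 0.1, "hilly": 0.3, "forest": 0.7, "caves": 0.9}

# Terrain assignment probabilities, as given in the text (hypothetical name).
TERRAIN_PRIORS = {"flat": 0.2, "hilly": 0.3, "forest": 0.3, "caves": 0.2}


def sample_terrain(rng):
    """Draw one terrain name according to TERRAIN_PRIORS."""
    names = list(TERRAIN_PRIORS)
    weights = [TERRAIN_PRIORS[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def expected_searches(terrain):
    """Mean number of searches until detection, given the target is in the cell."""
    return 1.0 / (1.0 - FALSE_NEGATIVE[terrain])


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = [sample_terrain(rng) for _ in range(10000)]
for name in TERRAIN_PRIORS:
    print(name, sample.count(name) / len(sample), expected_searches(name))
```

Under this model, a cave cell that holds the target needs about ten searches on average before the target is detected, while a flat cell needs only about 1.1.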

## Landscape

We implement the landscape as a class, which has the following fields:

- `dim` (int): the dimension of the $dim$ by $dim$ map
- `landscape` (2D list) of `landCell` objects, each of which tracks:
    - `target` (int): `PRESENT = 1` if this cell is the target, or `ABSENT = 0` otherwise
    - `terrain` (float): the false negative probability for this cell's terrain type: `FLAT = 0.1`, `HILLY = 0.3`, `FOREST = 0.7`, or `MAZE = 0.9`
- `target_x` (int): the x-coordinate of the target cell
- `target_y` (int): the y-coordinate of the target cell

A landscape is initialized with non-target cells that are assigned terrain types based on the probabilities previously described. It then randomly selects one cell to be the target.

In [None]:
class landscape:

    dim = 0
    landscape = [[]]
    target_x = 0
    target_y = 0

    def __init__(self, dim):
        self.dim = dim
        self.landscape = [[landCell() for _ in range(self.dim)] for _ in range(self.dim)]

        target_x = random.randint(0, dim - 1)
        target_y = random.randint(0, dim - 1)
        self.landscape[target_x][target_y].target = PRESENT
        self.target_x = target_x
        self.target_y = target_y

For more implementation details, see `Landscape.py`.

The `landCell` object is also defined in a class.

In [None]:
class landCell:

    def __init__(self):
        x = random.randint(1, 100)
        self.target = ABSENT
        if x <= 20:
            self.terrain = FLAT
        elif 20 < x <= 50:
            self.terrain = HILLY
        elif 50 < x <= 80:
            self.terrain = FOREST
        else:
            self.terrain = MAZE

    def getTerrain(self):
        return self.terrain

    def isTarget(self):
        return (self.target==PRESENT)

For more implementation details, see `Cell.py`.

For example, an initialized $dim = 50$ landscape may look like this:

![Blank Landscape](./imgs/landscape_blankTest.png)

where the target is located at (40, 48).

## Agent

We also implement the agent as a class, which has the following fields:

- `knowledge` (2D list) of `agentCell` objects, each of which tracks:
    - `belief` (float): the probability that a given cell contains the target, as described above; initially $1/{dim}^2$
    - `status` (boolean): either `VISITED = True` or `UNVISITED = False` depending on whether or not the cell has been queried previously; initially `False`
- `rule` (int): either 1 or 2, corresponding to the two probability rules described below
- `num_actions` (int): the number of actions, whether queries or movements (in the later case of a movement-restricted agent), executed so far; initially 0
- `i` (int): x-coordinate of first cell to query; if -1, then select one at random
- `j` (int): y-coordinate of first cell to query; if -1, then select one at random

In [None]:
class agent:
    num_actions = 0

    def __init__(self, landscape, rule, start_i=-1, start_j=-1):
        self.ls = landscape
        d = self.ls.dim

        if rule == 1 or rule == 2:
            self.rule = rule
        else:
            print("Invalid rule, set to 1 by default")
            self.rule = 1

        self.knowledge = [[agentCell() for j in range(d)] for i in range(d)]
        for i in range(d):
            for j in range(d):
                self.knowledge[i][j].setBelief(1/(d**2))
        
        self.i = random.randint(0, self.ls.dim-1) if start_i == -1 else start_i
        self.j = random.randint(0, self.ls.dim-1) if start_j == -1 else start_j

For more implementation details, see `Agent.py`.

The `agentCell` object is also defined in a class.

In [None]:
class agentCell:

    def __init__(self):

        self.belief = 0
        self.status = UNVISITED

    def getBelief(self):
        return self.belief

    def getStatus(self):
        return self.status

    def setBelief(self,belief):
        self.belief = belief

    def setStatus(self,status):
        self.status = status

For more implementation details, see `Cell.py`.

For example, an initialized agent knowledge base with $dim = 50$ landscape may look like this:

![Blank Beliefs](./imgs/belief_blankTest.png)

where each cell has initial belief $\frac{1}{50^2} = 0.0004$.

The `agent` class also has the following methods (non-exhaustive list):

- `searchCell(cell)` (boolean): query a `landCell` object. If it is not the target, return `False`. If it is the target, then only return `True` with probability $p = 1 - P(\text{false negative})$, where the false negative probability depends on that cell's terrain type as described previously. Otherwise, return `False`. 

In [None]:
def searchCell(self,cell):
    if not cell.isTarget():
        return False
    else:
        p = 1 - cell.getTerrain()
        if random.uniform(0, 1) < p:
            return True
        else:
            return False

- `getVisited()` (list): return a list of all (x,y) coordinates which the agent has already queried at a given point in time

In [None]:
def getVisited(self):
    n = self.ls.dim
    coords = []
    for x in range(n):
        for y in range(n):
            if self.knowledge[x][y].getStatus():
                coords.append((x,y))
    return coords

# Updating the belief state

We require a method to update the belief state given the results of a query. There are two cases: 

1. A query of a cell found the target, in which case the belief for this cell is set to 1, and the beliefs for all other cells are set to 0.

2. A query of a cell did not find the target, in which case the belief for this cell must be adjusted considering the probability that the query returned a false negative.

The latter case considers the probability 

$$P(\text{Target in Cell}_i | \text{Observations}_t \land \text{Failure in Cell}_j)$$

and relies on the probabilistic knowledge base.

### Probabilistic intuition

Let $H := \{\text{Target in Cell}_i\}$ and $E := \{\text{Target not found if we queried every cell}\}$. 

Then we want $P(H|E)$, or the probability that the target is in a cell given we have not found the target in any of our queries. We would like to compute this upon a failed query of a cell and accept this quantity as that cell's new belief.

By **Bayes' theorem,** this quantity is:

$$P(H|E) = \frac{P(E|H)P(H)}{P(E)}$$

Observe that $P(E|H)$ is the probability that the target is not found given the target is in the cell, which is exactly the false negative probability described above per terrain type. And $P(H)$ is exactly the agent's belief for that cell in the previous time step.

One can see that observing a failed query for a given cell will decrease our belief that this cell contains the target, but this decrease is scaled by the possibility of false negatives.

$P(E)$ is calculated with the following function: 

$$P(E) = \sum_{i \text{ visited}} P(\text{Target not found in Cell}_i | \text{Target is in Cell}_i) \cdot \text{Belief}[\text{Cell}_i] + \sum_{j \text{ unvisited}} \frac{1}{dim^2}$$

That is, each visited cell contributes the probability that it contains the target and its query returned a false negative, and each unvisited cell contributes its initial prior of $\frac{1}{dim^2}$. In either situation, the queries made so far would not have found the target.

Finally, once a queried cell is updated, we must update the rest of the knowledge base. 

Let $R_i = |{\text{new belief of Cell}_i} - {\text{old belief of Cell}_i}|$, or the difference between the new and old beliefs of the queried cell.

Then for all remaining cells $j$, use the following update formula: 

$$\text{Belief}^{t+1}_j = \text{Belief}^t_j + \frac{\text{Belief}^t_j * R_i}{1 - R_i}$$

This, in a sense, scales the previous belief by how much our query impacted the belief of the queried cell. One can see how failed queries will increase the beliefs of all other cells per iteration, although this increase may be marginal.
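For comparison, the textbook form of this update can be sketched on a flat list of beliefs. This is not the update used in `updateBelief`; it is the standard Bayesian posterior after a failed search, which divides every term by the probability of the failure so that the beliefs keep summing to 1. The function name `bayes_update_on_failure` and the list-based representation are assumptions made for illustration.

```python
def bayes_update_on_failure(beliefs, false_neg, searched):
    """Return posterior beliefs after a failed search of cell `searched`.

    beliefs   -- list of prior probabilities that sum to 1
    false_neg -- list of per-cell false negative probabilities
    searched  -- index of the cell that was just searched without success
    """
    # P(failure) = 1 - P(target in searched cell) * P(detect | target there)
    p_fail = 1.0 - beliefs[searched] * (1.0 - false_neg[searched])
    posterior = []
    for idx, prior in enumerate(beliefs):
        # A failure is certain if the target is elsewhere, and happens with
        # the false negative probability if the target is in the searched cell.
        likelihood = false_neg[idx] if idx == searched else 1.0
        posterior.append(likelihood * prior / p_fail)
    return posterior


beliefs = [0.25, 0.25, 0.25, 0.25]
false_neg = [0.1, 0.3, 0.7, 0.9]
updated = bayes_update_on_failure(beliefs, false_neg, searched=0)
print(updated, sum(updated))
```

In this toy example the searched flat cell's belief drops sharply while the other three beliefs rise by the same factor, which matches the qualitative behavior described above.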

### Implementation

Based on the intuition above, the implementation of a belief update is:

In [None]:
def updateBelief(self,x,y):
    #P(H|E) = P(E|H)P(H)/P(E)
    #H: Target in cell
    #E: Target not found
    curr_belief = self.knowledge[x][y].getBelief()
    num = self.ls.landscape[x][y].getTerrain()*curr_belief
    denom = self.probNotFound()
    remainder = abs(curr_belief - (num/denom))
    self.knowledge[x][y].setBelief(num/denom)
    for i in range(self.ls.dim):
        for j in range(self.ls.dim):
            if i == x and j == y:
                continue
            else:
                temp = self.knowledge[i][j].getBelief()
                self.knowledge[i][j].setBelief(temp + (temp*remainder)/(1-remainder))
    return self.knowledge

This method computes a new belief for the specified cell, updates the beliefs for the rest of the map as described earlier, and returns the updated knowledge base.

**Note:** We do not update the knowledge base when the target is found (case 1 above), as this is trivial: if the target is found, the program will terminate anyway.

The above method relies on the function `probNotFound`, which computes $P(E)$ as described above.

In [None]:
def probNotFound(self):
    n = self.ls.dim
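    coords = self.getVisited()
    # Unvisited cells each contribute their initial prior of 1/n^2
    res = (n**2 - len(coords))/(n**2)
    # Visited cells contribute P(false negative) * current belief
    for (x, y) in coords:
        res += self.ls.landscape[x][y].getTerrain()*self.knowledge[x][y].getBelief()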
    return res

For more implementation details, see `Agent.py`.

# Agent search strategies

Armed with this `updateBelief` function, we need to define how the agent will choose cells to query in order to search maps. We consider two rules for which cell the agent should query next:

1. Query the cell with the highest belief, i.e. the probability that **the target is in that cell.**

2. Query the cell with the highest probability that **the target will be found in such a query.**

We implement both probability rules, and then use either of them to implement the agent's search algorithm.

## Rule 1: Prioritize the probability that the target is in a given cell

This is based on the same $P(H|E)$ computed above. Upon each failed query, we update the entire knowledge base of beliefs as described above, and then we visit the cell with the highest belief. Its implementation is described above via `updateBelief`.

## Rule 2: Prioritize the probability that the target will be found, if a given cell is searched

### Probabilistic intuition

Observe that this probability is different from $\text{Belief}_i$. It must consider the impact of the terrain's interference with queries, i.e. potential false negatives. We want the following probability:

$$P(F) := P(\text{Target found in Cell}_i | \text{Observations}_t)$$

which is equal to:

$$(1 - P(\text{Target not found in Cell}_i | \text{Target is in Cell}_i)) * P(\text{Target is in Cell}_i)$$

Note that $P(\text{Target is in Cell}_i)$ is exactly $P(H)$ from before, and $P(\text{Target not found in Cell}_i | \text{Target is in Cell}_i)$ is determined by the terrain type of $\text{Cell}_i$, so we can easily compute $P(F)$.
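To see how the two rules can disagree, consider the following small sketch. The three cells and their beliefs are made-up values chosen for illustration, and the helper names `rule_one_score` and `rule_two_score` are assumptions that do not appear in the project code.

```python
# Each candidate cell: (label, belief, false negative rate). Values are made up.
cells = [
    ("cave cell", 0.5, 0.9),
    ("flat cell", 0.3, 0.1),
    ("hilly cell", 0.2, 0.3),
]


def rule_one_score(belief, false_neg):
    """Rule 1 score: the belief that the target is in the cell."""
    return belief


def rule_two_score(belief, false_neg):
    """Rule 2 score: belief times the probability of detecting the target."""
    return (1.0 - false_neg) * belief


best_rule_one = max(cells, key=lambda c: rule_one_score(c[1], c[2]))
best_rule_two = max(cells, key=lambda c: rule_two_score(c[1], c[2]))
print(best_rule_one[0])  # cave cell
print(best_rule_two[0])  # flat cell
```

Rule 1 picks the cave cell because it has the highest belief, while Rule 2 picks the flat cell because a search there is far more likely to actually reveal the target.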

### Implementation

The probability that the target will be found at a given cell if it is searched is computed by:

In [None]:
def probFound(self,x,y):
        return (1-self.ls.landscape[x][y].getTerrain())*self.knowledge[x][y].getBelief()

## Agent search implementation for both rules

We drive the above probability-calculating and belief update functions with the below methods in the `agent` class:

In [None]:
def getMaxLikCell(self, start_i, start_j):
    if self.rule == 1:
        #get max i for P(Target in Cell i)
        belief = np.array([[self.knowledge[i][j].getBelief() for j in range(self.ls.dim)] for i in range(self.ls.dim)])
        return np.unravel_index(belief.argmax(),belief.shape)
    elif self.rule == 2:
        #get max i for P(Target FOUND in Cell i)
        belief = [[self.knowledge[i][j].getBelief()*(1-self.ls.landscape[i][j].getTerrain()) for j in range(self.ls.dim)] for i in range(self.ls.dim)]
        belief = np.array(belief)
        return np.unravel_index(belief.argmax(),belief.shape)

This method will find the cell with maximum probability for either rule:

1. Return the cell coordinates with the highest Rule 1 probability, i.e. belief. For arbitrary $\text{Cell}_j$, this is given by:

$$\text{argmax}_j \quad P_\text{Rule1}(j)$$

2. Return the cell coordinates with the highest Rule 2 probability, i.e. the equation in `probFound` above. For arbitrary $\text{Cell}_j$, this is given by:

$$\text{argmax}_j \quad P_\text{Rule2}(j)$$

In either case, ties between multiple maximum-probability cells are broken arbitrarily.

Finally, the driver code which will repeat belief updates and getting the maximum-probability cell is:

In [None]:
def findTarget(self):
    i = self.i; j = self.j
    self.num_actions += 1
    while not self.searchCell(self.ls.landscape[i][j]):
        self.knowledge = self.updateBelief(i,j)
        next_i,next_j = self.getMaxLikCell(i,j)
        self.num_actions += 1
        i,j = next_i,next_j
    return (i,j)

This method will use the `i` and `j` fields of the `agent` class to start the search, corresponding to either randomized or user-inputted coordinates for the first cell to query. Then, for subsequent iterations, it will query the cell with the highest-probability (based on either rule), update all beliefs, and increment the number of actions incurred along the way. 

It terminates once the target is found. At termination, the agent will have recorded the number of actions taken to find the target.

## Performance comparison between both rules

For all comparisons listed below between the two probability rules prioritized in `getMaxLikCell`, we run trials on map(s) of $dim = 50$.

### Repeated trials over the same landscape

For a fixed landscape, we run $n=200$ trials of Rule 1 agents and Rule 2 agents solving the same map, and we record the number of actions taken for each agent and trial. 

Note that for each trial, we start both agents' queries at the *same* cell, selected at random. Per trial, we also reset the location of the target to a new and different location, chosen uniformly at random via the following code:

In [None]:
def resetTarget(self):
    x = self.target_x; y = self.target_y
    self.landscape[x][y].target = ABSENT
    
    new_targ_coords = list(set([(x,y) for x in range(self.dim) for y in range(self.dim)]).difference({(x,y)}))
    
    new_target = random.choice(new_targ_coords)
    self.target_x = new_target[0]
    self.target_y = new_target[1]
    self.landscape[self.target_x][self.target_y].target = PRESENT

Here, we use the following randomly-generated landscape, where the target is **initially** at (41, 36).

![Fixed Landscape Rule One v. Two Trials](./imgs/landscape_ruleOneTwoComparison.png)

The comparison data generated is as follows:

In [1]:
import pandas as pd
fixed_ruleOneTwoComp_df = pd.read_csv('./data/fixed_map_comparison_ruleOneTwo.csv')[['RuleOne', 'RuleTwo']]
fixed_ruleOneTwoComp_df.transpose()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,190,191,192,193,194,195,196,197,198,199
RuleOne,4663,630,17309,4837,14650,10832,1801,26230,4480,2937,...,6816,2131,1591,10574,19533,1684,18856,896,21249,12430
RuleTwo,1877,129,1397,11290,1525,1009,369,7965,18998,29344,...,3951,436,318,750,607,6279,445,17597,4003,2309


The above trial-by-trial data on the number of searches required for the agents (using rules 1 or 2) to find the target can be visualized via the following scatter plot:

![Rules 1-2 Comparison by Trials, Scatter](./imgs/fixed_map_comparison_ruleOneTwoScatter.png)

It appears that the agent using Rule 1 requires a higher number of searches to find the target, but this pattern is not immediately clear. To clarify it, we consider the quantity:

$$\text{Diff} = \text{Number of searches}_{\text{Rule1}} - \text{Number of searches}_{\text{Rule2}}$$

![Rules 1-2 Difference by Trials, Plot](./imgs/fixed_map_comparison_ruleOneTwoDiff.png)

![Rules 1-2 Difference by Trials, Box](./imgs/fixed_map_comparison_ruleOneTwoDiffBox.png)

The above images show that the difference between the number of searches required following rules 1 and 2 has high spread. The following 1-variable statistics describe the distribution:

In [2]:
fixedmap_1varstats = pd.read_csv('./data/fixedmap_1varstats.csv', index_col=0)
fixedmap_1varstats

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
diff,200.0,1647.5,9229.267409,-32161.0,-2413.5,870.5,4922.0,36092.0


All in all, for a fixed map, it appears that **the agent using Rule 2** outperforms the agent using Rule 1, requiring on average **1648 fewer searches to find the target.** However, the large variance visualized above strongly motivates repeated trials over multiple randomly-generated maps in order to see if this pattern holds in general.
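The difference statistics can be recomputed along the following lines. This sketch uses only the ten trials (0 through 9) that are visible in the truncated table above, so its numbers will not match the full 200-trial summary. The helper name `diff_summary` is an assumption, and the standard library `statistics` module stands in for pandas here; `statistics.stdev` is the sample standard deviation, the same convention pandas uses in `describe`.

```python
import statistics

# Trials 0-9 only, copied from the truncated table above.
rule_one = [4663, 630, 17309, 4837, 14650, 10832, 1801, 26230, 4480, 2937]
rule_two = [1877, 129, 1397, 11290, 1525, 1009, 369, 7965, 18998, 29344]


def diff_summary(first, second):
    """Summarize the per-trial difference first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    return {
        "count": len(diffs),
        "mean": statistics.mean(diffs),
        "std": statistics.stdev(diffs),  # sample standard deviation
        "min": min(diffs),
        "median": statistics.median(diffs),
        "max": max(diffs),
    }


summary = diff_summary(rule_one, rule_two)
for key, value in summary.items():
    print(key, value)
```

A positive mean difference indicates that Rule 1 needed more searches than Rule 2 on average over the trials considered.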

### Repeated trials over multiple landscapes

We conduct similar trials to compare both agents for $N = 50$ distinct and randomly generated maps. For each map and agent, we record the average number of actions taken to find the target over $n = 30$ trials. As above, we start both agents at the same initial queried cell (at random) and reset the target to a random new location in between each trial.

In [7]:
multi_ruleOneTwoComp_df = pd.read_csv('./data/multimap_comparison_ruleOneTwo.csv')[['RuleOne', 'RuleTwo']]
multi_ruleOneTwoComp_df.transpose()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49
RuleOne,6239.666667,8238.2,7995.2,4642.6,5201.333333,6096.2,5929.8,3793.733333,6416.133333,4689.8,...,4645.466667,5928.066667,6189.0,3903.866667,6836.933333,4990.0,4923.266667,7069.466667,8294.866667,7375.866667
RuleTwo,3687.333333,7271.0,6089.666667,4928.133333,8382.6,6516.933333,5877.8,4493.466667,7009.933333,3859.933333,...,5404.133333,6142.333333,4896.2,6497.6,7608.733333,5126.333333,5827.266667,7152.266667,9738.533333,7034.2


![Rules 1-2 Comparison by Trials, Multi, Scatter](./imgs/multi_map_comparison_ruleOneTwoScatter.png)

![Rules 1-2 Difference by Trials, Multi, Plot](./imgs/multi_map_comparison_ruleOneTwoDiff.png)

![Rules 1-2 Difference by Trials, Multi, Box](./imgs/multi_map_comparison_ruleOneTwoDiffBox.png)

As before, the difference distribution shown above has quite a large spread. The distribution can be described by the following one-variable descriptive statistics:

In [8]:
multimap_1varstats = pd.read_csv('./data/multimap_1varstats.csv', index_col=0)
multimap_1varstats

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
diff,50.0,259.573333,1722.672219,-3181.266667,-787.8,26.233333,1290.0,4715.533333


It appears that over multiple maps, **rule two** still outperforms rule one, requiring on average **260 fewer searches to find the target.** 

### Intuition

This makes intuitive sense, because rule two prioritizes a more realistic goal: *finding* the target versus trying to determine where the target is. In a sense, the two rules are related, but rule two takes more directly into account the possibility of false negatives in a query. Instead of prioritizing where the target is, but possibly still getting a false negative in response, the rule two-prioritizing agent focuses on the most revealing queries. This is more "intelligent" in the sense that the agent's queries are more effective in driving belief updates, and this yields a lower number of searches required to find the target.

Of course, these data do not amount to a decisive statistical victory for rule two. Accounting for multiple maps has reduced the standard deviation of the difference distribution from 9229 to 1723, but a few outliers still have a pronounced effect on the distributions described above. Across multiple maps, rule two at best outperforms rule one by 4716 searches, and at worst it is outperformed by rule one by 3181 searches; in that sense, rule two's best case is more extreme than rule one's.

But even though we hold maps fixed over repeated trials, variations in the data are naturally likely a result of:

- Randomly selecting a first cell to query
- Probabilistic revealing of false negatives

As expected from the problem statement, every trial for the same map will vary in terms of how the agent will solve, which sequence of cells to query, etc. because the environment is probabilistic.

# Restricted agent movement

Now, we consider an agent for which movement between one queried cell to the next **matters.** 

That is, we consider movement of one cell in any direction to be an action that the agent takes, just like querying is considered as one action. For example, if an agent has just queried (0,0) and then wishes to query (5,5), it will need to navigate 5 cells down and 5 cells right, meaning 10 total cells traveled across. Then, the next query at (5,5) counts as an action, so the total number of additional actions, assuming the query of (0,0) has already been counted, is $10 + 1 = 11$.

Note that we are using the **Manhattan distance** to consider travel costs, and not the Euclidean distance, because we assume the agent can only travel rectilinearly, i.e. up, down, right, or left.

## Rule 3: Run with Rule 2, but also incur action costs for agent travel

As described above, we define a **Rule 3** agent to be exactly the same as a Rule 2 agent, not a Rule 1 agent, as we found Rule 2 to perform better. However, we additionally modify the incrementing of the number of actions required as follows:

In addition to counting individual cell queries as one action each, also calculate the **Manhattan distance** between subsequently queried cells:

$$\text{dist}((x_1, y_1), (x_2, y_2)) = | x_1 - x_2 | + | y_1 - y_2 |$$

Then, multiply this quantity by one action per distance unit and add this to the total number of actions incremented at each time step.
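The action accounting described above can be sketched as follows. The function names `manhattan`, `action_cost`, and `total_actions` are assumptions introduced for illustration; the project itself folds this logic into `findTarget`.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def action_cost(current, nxt):
    """Travel steps from `current` to `nxt`, plus one action for the query."""
    return manhattan(current, nxt) + 1


def total_actions(queried):
    """Total actions for a sequence of queried cells.

    The first query costs one action (no travel); every later query costs
    the travel distance from the previous cell plus one.
    """
    if not queried:
        return 0
    total = 1
    for current, nxt in zip(queried, queried[1:]):
        total += action_cost(current, nxt)
    return total


print(action_cost((0, 0), (5, 5)))  # 11, matching the example in the text
print(total_actions([(0, 0), (5, 5), (5, 6)]))  # 1 + 11 + 2 = 14
```

Re-querying the same cell costs exactly one action under this accounting, since the travel distance is zero.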

The implementation simply adds on a case (which also works for rule 4 below) for additional incrementation for the `findTarget` method in the `agent` described previously: 

In [None]:
def findTarget(self):
    # ... previous code, which also increments by 1 for queries
    if self.rule == 3 or self.rule == 4:
        self.num_actions += abs(i - next_i) + abs(j - next_j)
    # ...

and adds on a case in `getMaxLikCell` which is identical to the case for rule 2:

In [None]:
def getMaxLikCell(self, start_i, start_j):
    # ... previous code for rules 1, 2
    elif self.rule == 3:
        belief = [[self.knowledge[i][j].getBelief()*(1-self.ls.landscape[i][j].getTerrain()) for j in range(self.ls.dim)] for i in range(self.ls.dim)]
        belief = np.array(belief)
        return np.unravel_index(belief.argmax(),belief.shape)
    # ...

Naively, running a rule 3 agent on maps will solve them just as a rule 2 agent would, except it would incur much higher values for the number of actions required to find the target. That is, rule 3 is an unintelligent approach to solving this distance-based problem.

To mitigate this performance hit, we need to adapt our agent's priorities to account for the necessary travel distance as well.

## Rule 4: Prioritize cells both on Rule 2 and travel costs

### Intuition

Instead of using the Rule 1 or Rule 2 probabilities to prioritize the next cell queried, we instead devise a new heuristic which depends only on the cell's Rule 2 probability (of finding the target) as well as the travel cost of reaching that cell from the current one.

For simplicity of notation, let the Rule 2 probability for any cell $i$ be denoted $p_i$. 

If the most-recently queried cell is $\text{Cell}_i$, then let $m_{ij}$ be the total number of actions required to query any other $\text{Cell}_j$ where we **do not require** $j \neq i$. This is equal to Manhattan distance between $\text{Cell}_i$ and  $\text{Cell}_j$ plus 1 for the cost of querying $\text{Cell}_j$.

Next, let $k_t$ be the number of actions at time $t$ to end game, i.e. the number of queries and/or movements left in the game prior to finding the target. Then, the probability of not finding the target prior to end game, if we query $\text{Cell}_j$ next after $\text{Cell}_i$, is given by:

$$P(\text{find target only at end game}) = (1-p_j)^{\frac{k_t}{m_{ij}}}$$


We intuitively want to minimize this probability at each time step such that we can find the target in fewer than $k_t$ steps, i.e. before end game. To accomplish this, at each time step and regardless of the value of $k_t$, we choose the next cell to query after $\text{Cell}_i$ as:

$$\text{argmin}_j \quad \{ h(i,j) = (1-p_j)^{\frac{1}{m_{ij}}} \}$$

That is, we choose the $\text{Cell}_j$ such that we minimize the heuristic $h(i,j)$, which relates the rule 2 probability and the distance between $\text{Cell}_i$, $\text{Cell}_j$ in an **exponential relationship.**

That way, very high distances from the current cell and very low rule 2 probabilities penalize the heuristic value by increasing it, and vice versa, which is exactly what we wish our rational rule 4 agent to do.

Implementation-wise, this is nothing more than an alternative heuristic to rules 1 and 2 to prioritize cells to query at each time step, except we wish to **minimize** this heuristic, unlike the probabilities in rules 1 and 2, which we sought to maximize.
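A minimal sketch of this selection step on a toy grid is shown here. The function names `rule_four_heuristic` and `next_cell`, and the 3 by 3 grid of Rule 2 probabilities, are made up for illustration and are not taken from the project code.

```python
def rule_four_heuristic(p_found, distance):
    """h = (1 - p) ** (1 / m), where m = travel distance + 1 for the query."""
    return (1.0 - p_found) ** (1.0 / (distance + 1))


def next_cell(current, p_found_grid):
    """Return the (row, col) that minimizes the heuristic from `current`."""
    best, best_h = None, float("inf")
    for r, row in enumerate(p_found_grid):
        for c, p in enumerate(row):
            dist = abs(current[0] - r) + abs(current[1] - c)
            h = rule_four_heuristic(p, dist)
            if h < best_h:
                best, best_h = (r, c), h
    return best


# Made-up Rule 2 probabilities on a 3x3 grid.
grid = [
    [0.05, 0.10, 0.02],
    [0.08, 0.01, 0.03],
    [0.02, 0.04, 0.30],
]
print(next_cell((0, 0), grid))  # (2, 2)
```

Here the far corner wins despite being four steps away, because its Rule 2 probability is large enough to offset the travel cost; with a smaller probability there, the agent would instead stay near its current cell.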

### Implementation

The rule 4 agent simply adds on a case for `getMaxLikCell` for rule 4 to calculate the argmin of the exact heuristic described above, and then it proceeds as usual:

In [None]:
def getMaxLikCell(self, start_i, start_j):
    # ... previous code for rules 1-3
    else:
        belief = [[math.pow(1 - self.knowledge[i][j].getBelief()*(1-self.ls.landscape[i][j].getTerrain()), 1/(1 + abs(start_i - i) + abs(start_j - j)) ) \
                for j in range(self.ls.dim)] for i in range(self.ls.dim)]
        belief = np.array(belief)
        return np.unravel_index(belief.argmin(),belief.shape)

**Note:** Although the method is named `getMaxLikCell`, for rule 4 it returns the argmin of the heuristic rather than the argmax of a likelihood (as for rules 1 and 2). We keep the same method name for compatibility with the driver `findTarget` method.

**Note 2:** Recall that for rule 3, we added a case in `findTarget` for both rules 3 and 4 to additionally increment the number of agent actions by the Manhattan distance travelled. This holds here as well.

## Performance comparison between rules 3 and 4

When comparing agents using rules 1 and 2, we found that the multi-map comparison controlled for variance better than the fixed-map comparison. We also found it useful to consider the difference in number of actions taken to find the target, i.e.

$$\text{Diff} = \text{Number of searches}_{\text{Rule3}} - \text{Number of searches}_{\text{Rule4}}$$

We repeat this particular trial and comparison methodology here for agents using rules 3 and 4, generating $N = 20$ distinct $dim=50$ maps and obtaining the average (over $n=10$ trials per map) number of actions taken to find the target for both rules.

As before, we start both agents at the same initial queried cell (at random) and reset the target to a random new location in between each trial.

In [5]:
multi_ruleThreeFourComp_df = pd.read_csv('./data/multimap_comparison_ruleThreeFour.csv')[['RuleThree', 'RuleFour']]
multi_ruleThreeFourComp_df.transpose()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
RuleThree,29329.1,36727.9,32621.4,33314.4,46540.8,49841.5,33185.1,73028.2,39707.6,41723.4,30311.6,26156.6,33825.5,62614.2,49114.8,42106.8,29039.9,47265.5,48582.7,40435.2
RuleFour,12272.3,10802.8,14107.1,8385.6,24289.0,9705.1,6275.2,12444.3,6366.4,6893.0,11876.3,11679.4,4315.3,5042.9,9460.1,14121.0,9060.1,5975.7,7064.3,9834.5


![Rules 3-4 Comparison by Trials, Multi, Scatter](./imgs/multi_map_comparison_ruleThreeFourScatter.png)

![Rules 3-4 Difference by Trials, Multi, Plot](./imgs/multi_map_comparison_ruleThreeFourDiff.png)

![Rules 3-4 Difference by Trials, Multi, Box](./imgs/multi_map_comparison_ruleThreeFourDiffBox.png)

In [4]:
multimap_1varstats_ruleThreeFour = pd.read_csv('./data/multimap_1varstats_ruleThreeFour.csv', index_col=0)
multimap_1varstats_ruleThreeFour.transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
diff,20.0,31275.09,12658.964923,14477.2,21683.8,28748.0,39775.125,60583.9


Encouragingly, we see that **Rule 4 strictly outperforms Rule 3**, i.e. naively running with Rule 2, by a performance improvement of, on average, **31,275 fewer actions taken.** This includes costs of both queries and traveling.

The distribution of the performance improvement does still have a large spread, with improvements ranging from 14,477 fewer to as many as 60,584 fewer actions taken to find the target. However, the standard deviation is significantly lower than the mean, and in any case, all trials showed improvement over the baseline rule 3 agent.

## Further improvements via utility

Our strategy with the Rule 4 agent leverages the theory of utility in the sense that it balances negative utility from traveling (which raises the $h(i,j)$ value described earlier) with positive utility from potentially finding targets at cells (which lowers the $h(i,j)$ value described earlier). 

With the exponential weighting on the traveling cost, one could say our heuristic is more sensitive to changes in the traveling cost than it is to changes in the rule 2 probability. However, one could also say this **prioritizes immediate rewards over future rewards.**

That is, what about scenarios in which the agent might sacrifice some immediate utility by traveling further for higher rule 2 probability-cells instead of querying closer cells with lower rule 2 probabilities? What if there were a cluster of many cells with rule 2 probability 0.6 all 5 travel units away, and one cell with rule 2 probability 0.9 that was 15 travel units away? The former case would have heuristic $(1-0.6)^{\frac{1}{6}} = 0.8584$, and the latter case would have heuristic $(1-0.9)^{\frac{1}{16}} = 0.8660$. Assuming we only care about these two possibilities, our agent would choose to travel closer for the immediate but lower potential reward, instead of risking a long travel path for a higher potential reward.
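The arithmetic in this example can be checked with a few lines of Python. The helper `h` is a hypothetical restatement of the rule 4 heuristic, and the break-even computation is an added observation rather than something from the original experiments.

```python
import math


def h(p_found, distance):
    """Rule 4 heuristic: (1 - p) ** (1 / (distance + 1))."""
    return (1.0 - p_found) ** (1.0 / (distance + 1))


near = h(0.6, 5)   # cluster cell: probability 0.6, 5 travel units away
far = h(0.9, 15)   # single cell: probability 0.9, 15 travel units away
print(round(near, 4), round(far, 4))  # 0.8584 0.866

# The far cell would win if its total cost m (travel + 1) satisfied
# (1 - 0.9) ** (1 / m) < near, i.e. m < ln(0.1) / ln(near).
break_even_m = math.log(0.1) / math.log(near)
print(break_even_m)
```

The break-even cost comes out just above 15 actions, so the high-probability cell would be preferred if it were 14 travel units away instead of 15. This shows how close the two options are under this heuristic.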

Our rule 4 agent does not account for utility of clusters or total utilities of cells traveled along a path to reach the cell queried. Further improvements might build atop the 'naive' utility function in rule 4 in order to account for these factors.

# A drunk man

This old joke sheds light on a major problem with many algorithmic strategies for this search and destroy problem. The biggest difficulty in this assignment was that we could never be perfectly sure of a “true negative”: everything was based on probabilities. That said, some terrains had lower false negative probabilities than others. In order from lowest to highest false negative probability: flat, hilly, forested, maze of caves. When we supplied these probabilities to our inference model based on Bayes’ Theorem, the false negative probability for flat terrain was much smaller than that for a maze of caves. The intuition behind algorithms like these is to search for the target in flat or hilly cells before forested or maze-of-caves cells, since the probability of a false negative in the latter is much higher.

After collecting our own data on the Rule 1 vs Rule 2 comparison, we were able to (jokingly) justify the drunk man's decision. After running the two implementations against one another, we found that on average, both across multiple landscapes and on a fixed landscape, Rule 2 outperforms Rule 1 by at least a few hundred searches. In our implementation, it seems better to search cells whose terrain has a low false negative rate before searching cells that merely have a high probability of containing the target, regardless of their terrain.

There were cases where Rule 1 outperformed Rule 2 on certain terrains, but even in its worst case, Rule 1 did not seem to perform terribly worse than Rule 2. Still, across both best and worst cases, Rule 2 seemed to be the winning algorithm in terms of success rate and efficiency.

It also becomes clear that both rules will always have outliers that require either an extremely high or an incredibly low number of searches. Even though running on random maps rather than a fixed map seemed to lower the standard deviation significantly (i.e. 8418 searches vs 1722 searches), this is still a rather high standard deviation in itself. The variance of a search and destroy algorithm of this sort is always going to be rather large.

The joke demonstrates how a drunk man (for this problem, seemingly a low-efficiency AI) believes that it’s better to search where it’s easier to find a target (under the streetlight) rather than where the target is more likely to be (the parking lot). According to our findings on Rule 1 vs Rule 2 efficiency, it almost seems as if the drunk man has a point. Obviously, in the joke, the drunk man is wrong. But when searching for a target with false negatives, it may in fact be better to search places where there is a higher probability of finding the target before searching places where there is a higher probability of containing the target.

# A moving target

Now, consider a target which is not static in that:

1. Upon every failed query, the target moves to one of its neighbors at random.
2. Upon moving to one of its neighbors, some sensor returns an observation of terrain type ("FLAT", "HILLY", etc.). However, the sensor is broken, so it returns some type that the target's new cell is **not,** at random.

For example, if we query a cell and do not find the target, and we get the new observation "HILLY", then we know that the target is now in a cell which is either "FLAT", "FOREST", or "MAZE".
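
The example above can be written as a small likelihood function. This is a minimal sketch that assumes the broken sensor picks uniformly among the three wrong terrain types. The string labels and the function names `possible_terrains` and `observation_likelihood` are hypothetical stand-ins, not names from our `Landscape` class.

```python
TERRAINS = ("FLAT", "HILLY", "FOREST", "MAZE")  # hypothetical string labels

def possible_terrains(observed):
    """Terrain types the target's new cell may still have, given the report."""
    return [t for t in TERRAINS if t != observed]

def observation_likelihood(observed, cell_terrain):
    """P(sensor reports `observed` | the target's new cell has `cell_terrain`)."""
    if observed == cell_terrain:
        return 0.0  # the broken sensor never reports the true terrain
    # Assumption: each of the wrong types is reported with equal probability
    return 1.0 / (len(TERRAINS) - 1)

print(possible_terrains("HILLY"))  # ['FLAT', 'FOREST', 'MAZE']
```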

## Implementation

Due to the object-oriented nature of our `Landscape` class, the target can easily be moved.

The code below moves the target to one of its neighbors at random, and returns, at random, a terrain type that the target's new cell is not.

In [None]:
def moveTarget(self):
    # Assumes `random` is imported and `dirs` (the neighbor offsets) is defined at module level
    x, y = self.target_x, self.target_y
    self.landscape[x][y].target = ABSENT

    # Collect the in-bounds neighbors of the current target cell
    nbrs = []
    for dx, dy in dirs:
        if 0 <= x + dx < self.dim and 0 <= y + dy < self.dim:
            nbrs.append((x + dx, y + dy))

    # Move the target to a uniformly random neighbor
    self.target_x, self.target_y = random.choice(nbrs)
    self.landscape[self.target_x][self.target_y].target = PRESENT

    # Report, uniformly at random, one terrain type the new cell is NOT
    terrains = {FLAT, HILLY, FOREST, MAZE}
    new_terrain = {self.landscape[self.target_x][self.target_y].terrain}
    return random.choice(list(terrains.difference(new_terrain)))

## Strategy

With a moving target, the first problem that arises is that we have no definite knowledge of where the target will move after each iteration. In our stationary target search, the target must stay in one cell of the grid, so the search eventually ends; ignoring false negatives, the worst case takes $dim \times dim$ searches. In a moving target scenario, this is not the case: there is a possibility that the target will never be found.

To deal with this problem, we are given the information from the tracker. The tracker tells us with 100% certainty one terrain type that the target is not in, and each of the other three terrain types is equally consistent with the report (the sensor returns each wrong type with probability 1/3). With this information we can rule out roughly 1/4 of the cells (not always exactly 1/4, since it depends on the random allocation of terrain according to the given probabilities). In addition, we know something about where the target moved, since it only moves to one of its neighbors.
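
One way to apply the tracker report to the belief matrix is sketched below: zero out every cell whose terrain matches the reported type, then renormalize the remaining beliefs so they sum to 1. The function name `apply_tracker`, the string terrain labels, and the list-of-lists layout are assumptions made for illustration, not our actual implementation.

```python
def apply_tracker(belief, terrain, observed):
    """Zero the belief of every cell whose terrain matches the report, then renormalize."""
    dim = len(belief)
    filtered = [[0.0 if terrain[i][j] == observed else belief[i][j]
                 for j in range(dim)] for i in range(dim)]
    total = sum(map(sum, filtered))
    if total == 0:
        raise ValueError("observation rules out every cell with nonzero belief")
    return [[b / total for b in row] for row in filtered]

# Toy 2x2 map with one cell of each terrain and a uniform prior
terrain = [["FLAT", "HILLY"],
           ["FOREST", "MAZE"]]
belief = [[0.25, 0.25],
          [0.25, 0.25]]

updated = apply_tracker(belief, terrain, "HILLY")
# The HILLY cell drops to 0 and the other three cells share the belief equally
```

On this toy map, exactly one quarter of the cells is eliminated. On a real map the eliminated fraction depends on how many cells were assigned the reported terrain.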

At the base step, we have a matrix with probability 1/(total cells) in each cell. After making our first random guess, we update the table according to our original strategy, i.e. using Bayes’ Theorem. We then use the tracker's report to remove the cells of the reported terrain, since we know for certain they do not contain the target. This leaves a reduced cell matrix, from which we choose the cell with the highest probability of containing the target.

After our first guess reports a failed search, the target has moved to a neighbor of its previous cell. Since we are not sure where the target was, we cannot know exactly which neighbors it could have moved to. What we can do is look at the neighbors of the cell we just queried and lower some of their probabilities, depending on whether or not they can be reached from the opposite side. For example, if the queried cell neighbors a corner cell, we can lower the probability of both the queried cell and that neighbor, since there is no other way of reaching the corner cell other than through the suspected target cell.
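
One standard way to formalize this neighbor reasoning is a prediction step in the style of a hidden Markov model filter: every cell passes its belief in equal shares to its in-bounds neighbors. The sketch below is an illustration of that technique rather than the method we implemented. It assumes a 4-connected neighborhood, and the names `DIRS`, `neighbors`, and `predict_after_move` are hypothetical. The input is the belief matrix after the failed query has already been accounted for.

```python
DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed 4-connected neighborhood

def neighbors(i, j, dim):
    """In-bounds neighbors of cell (i, j) on a dim x dim grid."""
    return [(i + di, j + dj) for di, dj in DIRS
            if 0 <= i + di < dim and 0 <= j + dj < dim]

def predict_after_move(belief):
    """Spread each cell's belief uniformly over the cells the target could move to."""
    dim = len(belief)
    moved = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(dim):
            nbrs = neighbors(i, j, dim)
            for ni, nj in nbrs:
                moved[ni][nj] += belief[i][j] / len(nbrs)
    return moved

# All belief starts on the corner cell (0, 0) of a 3x3 grid
belief = [[1.0, 0.0, 0.0],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
moved = predict_after_move(belief)
# Under the 4-connected assumption, the belief splits between (0, 1) and (1, 0)
```

Because the target must move after a failed query, the cell it occupied ends up with no belief from itself, which is why `moved[0][0]` is 0 in this example.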

Much of this information is useful in the sense that we are better off with it than without it, and we can leverage it to find the target faster. However, we are not sure how much the tracker information really helps compared to not having it. The tracker is of limited use because we can never truly be sure where the target is, so we must make many assumptions.

## A moving target AND an agent with restricted movement