<center><h1>Grad Project #3</h1></center>

In this final project for 16.413 / 6.4132, we are going to look at the search and rescue domain using symbolic representations. We are going to look at both planning using PDDL and inference using PDDL. This project isn't going to focus on implementing specific algorithms, however -- we're going to use two libraries: `pyperplan` and `sympy`. (You'll remember `sympy` from problem set 8.)

The project is structured as follows: 
* [0. Credit for Contributors]
* [1. Planning in Search and Rescue](!start)
    * 1.1 Search and Rescue Warmup 1
    * 1.2 Search and Rescue Warmup 2
    * 1.3 Search and Rescue PDDL Planner
* [2. Inference from observations](!inference)
    * 2.1 Inferring unknown values
    * 2.2 Belief update
* [3. Putting It Together](!integrated)
    * 3.1 Safe but not so smart
    * 3.2 Safe and smart
    * 3.3 Reckless
* [4. Analysis](!analysis)

## <a id="contributors">0. Credit for Contributors</a>

List the various students, lecture notes, or online resources that helped you complete this project:

Ex: I worked with Bob on the inference.

<div class="alert alert-info">
Write your answer in the cell below this one.
</div>

In [None]:
--> *(double click on this cell to delete this text and type your answer here)*

## Imports and Utilities

In [None]:
!pip install pyperplan
!pip install sympy

In [2]:
import os
import time
import copy
import numpy as np
import pdb
from sympy import Symbol, And, Or, satisfiable
from pyperplan.pddl.parser import Parser
from pyperplan import grounding, planner
import tempfile


class State:
    """States have the following attributes:

    "robot": A (row, col) representing the robot's loc.
    "hospital": A (row, col) representing the hospital's loc.
    "carrying": The str name of a person being carried,
      or None, if no person is being carried.
    "people": A dict mapping str people names to (row, col)
      locs. If a person is being carried, they do not
      appear in this dict.
    "state_map": A numpy array of str 'C', 'F', 'S', and 'W',
      where 'C' represents free space, 'F' represents fire,
      'S' represents smoke, and 'W' represents an obstacle(wall).
      The robot may safely enter any cell that is clear ('C')
      or contains smoke ('S').
    """

    def __init__(self,
                 robot=None,
                 hospital=None,
                 carrying=None,
                 people=None,
                 state_map=None):
        default_state_map = np.array([['C', 'C', 'C', 'C', 'C', 'C', 'C'],
                                      ['C', 'W', 'W', 'C', 'C', 'W', 'W'],
                                      ['C', 'C', 'C', 'C', 'C', 'C', 'C'],
                                      ['C', 'C', 'W', 'C', 'C', 'C', 'C'],
                                      ['C', 'C', 'W', 'C', 'W', 'C', 'C'],
                                      ['C', 'C', 'C', 'C', 'C', 'W', 'C'],
                                      ['C', 'W', 'C', 'C', 'W', 'C', 'C']],
                                     dtype=str)
        default_robot = (0, 0)  # top left corner
        default_hospital = (6, 6)  # bottom right corner
        default_carrying = None
        default_people = {
            "p1": (4, 0),
            "p2": (6, 0),
            "p3": (0, 6),
            "p4": (3, 3)
        }
        self.state_map = state_map if state_map is not None else default_state_map
        self.robot = robot if robot is not None else default_robot
        self.hospital = hospital if hospital is not None else default_hospital
        self.carrying = carrying if carrying is not None else default_carrying
        self.people = people if people is not None else default_people

    def get_safe_grid(self):
        """
        "safe_grid": A grid map of boolean values where `True`
        indicates the locations the robot is allowed to move into.

        Clear and smoke grid cells are safe to enter.
        """
        safe_grid = np.logical_or(self.state_map == "C", self.state_map == "S")
        return safe_grid

    def render(self, msg=None):
        height, width = self.state_map.shape
        state_arr = np.full((height, width), "  ", dtype=object)
        state_arr[self.state_map == 'W'] = "##"
        state_arr[self.state_map == 'F'] = "XX"
        state_arr[self.state_map == 'S'] = "||"
        state_arr[self.state_map == 'U'] = "??"
        state_arr[self.hospital] = "Ho"
        state_arr[self.robot] = "Ro"
        # Draw the people not at the hospital
        for person, loc in self.people.items():
            if loc == self.hospital:
                continue
            elif loc == self.robot:
                person = "R" + person[-1]
            state_arr[loc] = person
        # Add padding
        padded_state_arr = np.full((height + 2, width + 2), "##", dtype=object)
        padded_state_arr[1:-1, 1:-1] = state_arr
        state_arr = padded_state_arr
        carrying_str = f"Carrying: {self.carrying}"
        # Print
        if msg:
            print(msg)
        for row in state_arr:
            print(''.join(row))
        print(carrying_str)
        print()

    def copy(self):
        state_copy = copy.copy(self)
        state_copy.state_map = self.state_map.copy()  # copy the numpy array
        state_copy.people = self.people.copy()
        return state_copy


class SearchAndRescueProblem:
    """Defines a search and rescue (SAR) problem.

    In search and rescue, a robot must navigate to, pick up, and
    drop off people that are in need of help.

    Actions are strs. The following actions are defined:
      "up" / "down" / "left" / "right" : Moves the robot. The
        robot cannot move into obstacles or off the map.
      "pickup-{person}": If the robot is at the person, and if
        the robot is not already carrying someone, picks them up.
      "dropoff": If the robot is carrying a person, they are
        dropped off at the robot's current location.
      "look...": later we'll allow these actions, but they
        have no effect on the state.

    This structure serves as a container for a transition model
    "get_next_state(state, action)", an observation model "get_observation(state)"
    and an action model "get_legal_actions(state)"

    Example usage:
      problem = SearchAndRescueProblem()
      state = State()
      state.render()
      action = "down"
      next_state = problem.get_next_state(state, action)
      next_state.render()
    """

    def __init__(self):
        self.action_deltas = {
            "up": (-1, 0),
            "down": (1, 0),
            "left": (0, -1),
            "right": (0, 1),
        }

    @staticmethod
    def is_valid_location(loc_r, loc_c, state, verbose=False):
        if not (0 <= loc_r < state.state_map.shape[0] and
                0 <= loc_c < state.state_map.shape[1]):
            if verbose:
                print(
                    "WARNING: attempted to move out of bounds, action has no effect."
                )
            return False
        if not state.get_safe_grid()[loc_r, loc_c]:
            if verbose:
                print(
                    "WARNING: attempted to move into an obstacle/unsafe region, action has no effect."
                )
            return False
        return True

    @staticmethod
    def get_legal_actions(state):
        legal_actions = ["up", "down", "left", "right", "dropoff"]
        for person in state.people:
            legal_actions.append(f"pickup-{person}")
        return legal_actions

    def get_next_state(self, state, action, verbose=False):
        legal_actions = self.get_legal_actions(state)
        if action not in legal_actions and not action.startswith('look'):
            raise ValueError(
                f"Unrecognized action {action}. Actions must be one of: {legal_actions}"
            )

        if action in ["up", "down", "left", "right"]:
            dr, dc = self.action_deltas[action]
            r, c = state.robot
            if not self.is_valid_location(
                    r + dr, c + dc, state, verbose=verbose):
                if verbose:
                    print(f"Action {action} is invalid in {state}.")
                return state, False
            new_state = state.copy()
            new_state.robot = (r + dr, c + dc)
            return new_state, True

        elif action.startswith("pickup"):
            person = action.split("-")[1]
            if state.carrying is not None:
                if verbose:
                    print(
                        "WARNING: attempted to pick up a person while already carrying someone, action has no effect."
                    )
                return state, False
            if person not in state.people or (state.people[person] !=
                                              state.robot):
                if verbose:
                    print(
                        "WARNING: attempted to pick up a person not at the robot location, action has no effect."
                    )
                return state, False
            new_state = state.copy()
            del new_state.people[person]
            new_state.carrying = person
            return new_state, True

        elif action == "dropoff":
            if state.carrying is None:
                if verbose:
                    print(
                        "WARNING: attempted to dropoff while not carrying anyone, action has no effect."
                    )
                return state, False
            person = state.carrying
            new_state = state.copy()
            new_state.carrying = None
            new_state.people[person] = state.robot
            return new_state, True

        elif action.startswith('look'):
            return state, True

        else:
            raise KeyError

    def get_observation(self, state):
        """Return the states of the robot's cell and the adjacent (non-wall) grid squares."""
        height, width = state.state_map.shape
        deltas = self.action_deltas
        r, c = state.robot
        observation = {(r, c): state.state_map[r, c]}
        for direction, (dr, dc) in deltas.items():
            nr = r + dr
            nc = c + dc
            if not (0 <= nr < height and 0 <= nc < width):
                continue
            if state.state_map[nr, nc] == "W":
                continue
            observation[(nr, nc)] = state.state_map[nr, nc]
        return observation


def execute_plan(problem, plan, state):
    for action in plan:
        state.render(msg=f'execute_plan: {action}')
        # Resulting state
        state, valid = problem.get_next_state(state, action)
        assert valid, f'Attempted to execute invalid action {action!r}'
    state.render(msg='execute_plan: Final state')
    return state


def agent_loop(problem, initial_state, policy, initial_belief, max_steps=200):
    """See MP01 introduction."""
    state = initial_state
    state.render(msg='initial state')
    belief = initial_belief
    belief.render(msg='initial belief')
    # An initial observation
    observation = problem.get_observation(state)
    print('Initial observation', observation)
    # Update the belief, first with transition, then with observation
    belief = belief.update(problem, observation)
    belief.render(msg='new belief')
    for step in range(max_steps):
        action = policy(belief)
        if action in ('*Success*', '*Failure*'):
            print('Terminate with', action)
            return action, state, belief
        # Resulting state
        state, valid = problem.get_next_state(state, action)
        assert valid, 'Attempted to execute invalid action'
        # Get observation of grid squares around the robot
        observation = problem.get_observation(state)
        # Update the belief, first with transition, then with observation
        belief = belief.update(problem, observation, action)
        print('agent_loop: step', step, 'action', action, 'observation',
              observation)
        state.render(msg='new state')
        belief.render(msg='new belief')
    return '*Failure*', state, belief


def get_num_delivered(state):
    """Returns the number of people located in the hospital."""
    num_delivered = 0
    for loc in state.people.values():
        if loc == state.hospital:
            num_delivered += 1
    return num_delivered


def execute_count_num_delivered(problem, state, plan):
    """Execute a plan for search and rescue and count the number of people
    delivered.

    Args:
      problem: A SearchAndRescueProblem
      state: A State, the initial state from which the plan is executed.
      plan: A list of action strs, see SearchAndRescueProblem.

    Returns:
      num_delivered: int
    """
    state = execute_plan(problem=problem, plan=plan, state=state)
    return get_num_delivered(state)


def run_planning(domain_pddl_str,
                 problem_pddl_str,
                 search_alg_name,
                 heuristic=None):
    """Plan a sequence of actions to solve the given PDDL problem.

    This function is a lightweight wrapper around pyperplan.

    Args:
      domain_pddl_str: A str, the contents of a domain.pddl file.
      problem_pddl_str: A str, the contents of a problem.pddl file.
      search_alg_name: A str, the name of a search algorithm in
        pyperplan. Options: astar, wastar, gbf, bfs, ehs, ids, sat.
      heuristic: Either a str naming a heuristic in pyperplan
        (options: blind, hadd, hmax, hsa, hff, lmcut, landmark),
        or a pyperplan `Heuristic` class. See:
        https://github.com/aibasel/pyperplan/blob/main/doc/documentation.md#implementing-new-heuristics
        If None, the search algorithm is run without a heuristic.

    Returns:
      plan: A list of actions; each action is a pyperplan Operator.
    """
    # Parsing the PDDL
    domain_file = tempfile.NamedTemporaryFile(delete=False, dir='.')
    problem_file = tempfile.NamedTemporaryFile(delete=False, dir='.')
    with open(domain_file.name, 'w') as f:
        f.write(domain_pddl_str)
    with open(problem_file.name, 'w') as f:
        f.write(problem_pddl_str)
    parser = Parser(domain_file.name, problem_file.name)
    domain = parser.parse_domain()
    problem = parser.parse_problem(domain)
    os.remove(domain_file.name)
    os.remove(problem_file.name)

    # Ground the PDDL
    task = grounding.ground(problem)

    # Get the search alg
    search_alg = planner.SEARCHES[search_alg_name]

    if heuristic is None:
        return search_alg(task)

    if isinstance(heuristic, str):
        # Get the heuristic from pyperplan
        heuristic_initialized = planner.HEURISTICS[heuristic](task)
    else:
        # Use customized heuristic
        heuristic_initialized = heuristic(task)

    # Run planning
    return search_alg(task, heuristic_initialized)


# Test Cases

# First problem
P1_B0 = np.array([["U", "U", "U", "U", "U"], ["U", "U", "U", "U", "U"],
                  ["U", "U", "U", "U", "U"], ["U", "U", "U", "U", "U"],
                  ["U", "U", "U", "U", "U"], ["U", "U", "U", "U", "U"]])

P1_B1 = np.array([["C", "S", "C", "C", "C"], ["S", "U", "U", "U", "U"],
                  ["S", "U", "U", "U", "U"], ["S", "U", "U", "U", "U"],
                  ["C", "U", "U", "U", "U"], ["C", "C", "C", "C", "C"]])

P1_G0 = np.array([["C", "S", "C", "C", "C"], ["S", "F", "S", "C", "C"],
                  ["S", "F", "S", "S", "S"], ["S", "F", "F", "F", "F"],
                  ["C", "S", "S", "S", "S"], ["C", "C", "C", "C", "C"]])

# Second problem
P2_B1 = np.array([["C", "S", "C", "C", "C"], ["S", "U", "U", "C", "U"],
                  ["S", "U", "U", "C", "U"], ["S", "U", "U", "U", "U"],
                  ["C", "U", "U", "C", "U"], ["C", "C", "C", "C", "C"]])

P2_G0 = np.array([["C", "S", "C", "C", "C"], ["S", "F", "S", "C", "C"],
                  ["S", "F", "S", "C", "S"], ["S", "F", "F", "S", "F"],
                  ["C", "S", "S", "C", "S"], ["C", "C", "C", "C", "C"]])


def test_policy(belief_map, true_map, problem, policy):
    """Test a policy on a SearchAndRescue problem.

    Args:
        belief_map: A numpy array specifying the belief map
        true_map:   A numpy array specifying the state map
        problem:    A SearchAndRescueProblem instance
        policy:     A policy returned by a policy making fn.
                    e.g. make_planner_policy(problem, planner)
    """
    height, width = true_map.shape
    bottom, right = height - 1, width - 1
    robot = (0, right)
    hospital = (bottom, right)
    people = {'pp': (bottom, right - 1)}  # Peter Parker
    carrying = None
    # Environment state
    env_state = State(robot=robot,
                      hospital=hospital,
                      people=people,
                      carrying=carrying,
                      state_map=true_map)
    # Initial belief: built from belief_map, which may contain unknown ('U') cells
    b0 = BeliefState(robot=robot,
                     hospital=hospital,
                     people=people,
                     carrying=carrying,
                     state_map=belief_map)
    # Do it
    return agent_loop(problem, env_state, policy, b0)





# 1. Planning in Search and Rescue 

Recall our "search and rescue" robot who is charged with navigating a sometimes dangerous grid to find and help people in need. A lot of the problems we will look at here will look familiar from the previous projects and some of the homeworks, but there are some key differences. Unlike in project 1, we will restrict ourselves to a deterministic problem domain; we will however look at both fully observed and partially observed cases.

Let us first focus on a single planning problem in the search-and-rescue domain, illustrated below:

![search and rescue problem](sar_problem.png)

This problem features four "people" (bears) with names p1, p2, p3, and p4. A robot, initialized in the top left corner, should navigate to each person, pick them up one by one, and deliver them to the hospital (bottom right).

* We always know the locations of the people, the robot, and the hospital. 
* Some locations may have walls in them.  You know about all the walls in advance, as well.
* There is also fire! And smoke! But, initially, you may not know which locations have fire and/or smoke. Nonetheless, you need to move around in the domain, make observations, do belief updates, make plans, and rescue bears.
* The way the environment works, whenever the robot enters a grid cell, it observes the true environment state of all the neighboring cells.  Each environment cell contains exactly one of:  wall ('W'), fire ('F'), smoke ('S') or nothing (clear, 'C').
* The robot may safely enter any cell that is clear ('C') or contains smoke ('S').
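
As a rough illustration of the observation rule in the list above, here is a minimal sketch on a toy grid stored as plain Python lists. The helper name `observe_neighbors` and the grid `toy_grid` are hypothetical and exist only for this example; the sketch is similar in spirit to the `get_observation` method in the utilities above, which also reports the robot's own cell and skips walls.

```python
def observe_neighbors(grid, loc):
    """Return {(row, col): value} for loc and its in-bounds, non-wall 4-neighbors."""
    r, c = loc
    height, width = len(grid), len(grid[0])
    observation = {(r, c): grid[r][c]}
    # Diagonals are not neighbors: only up, down, left, right.
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < height and 0 <= nc < width and grid[nr][nc] != "W":
            observation[(nr, nc)] = grid[nr][nc]
    return observation


# Hypothetical 2x3 map: clear, smoke, fire / wall, clear, smoke
toy_grid = [["C", "S", "F"],
            ["W", "C", "S"]]
print(observe_neighbors(toy_grid, (0, 0)))  # {(0, 0): 'C', (0, 1): 'S'}
```

From the top-left corner the robot sees its own cell and the smoky cell to its right; the cell below is a wall, so it is left out of the observation.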

We will first consider planning to navigate to, pick up, and drop off people at a hospital, and we are going to ask you to put this all together into a planning and execution system!

Take a look at the code. `State` is a class with the following attributes:

* "state_map": a 2D numpy array of characters 'W', 'F', 'S', 'C'.
* "robot": A (row, col) representing the robot's loc.
* "hospital": A (row, col) representing the hospital's loc.
* "carrying": The str name of a person being carried, or None, if no person is being carried.
* "people": A dict mapping str people names to (row, col) locs. If a person is being carried, they do not appear in this dict.

States have a couple of useful methods: `render` prints a representation of the state and `copy` does what you would expect. `get_safe_grid` returns a boolean numpy array where True represents a safe space (not fire or wall).

Actions are strs. The following actions are defined:
* "up" / "down" / "left" / "right" : Moves the robot. The robot cannot move into obstacles or off the map.
* "pickup-{person}": If the robot is at the person, and if the robot is not already carrying someone, picks them up.
* "dropoff": If the robot is carrying a person, they are dropped off at the robot's current location.  *Allow for there being multiple dropoff locations, even though we will only dropoff at hospitals in this example.*
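
To make the movement semantics concrete, the sketch below applies the four move actions to a (row, col) location on a toy grid of plain Python lists. The names `ACTION_DELTAS`, `SAFE`, and `try_move` are hypothetical stand-ins chosen for this example; the actual transition model is `SearchAndRescueProblem.get_next_state`.

```python
# Row/column offsets for each move action (row 0 is the top of the map).
ACTION_DELTAS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SAFE = {"C", "S"}  # clear and smoke cells may be entered


def try_move(grid, loc, action):
    """Return (new_loc, moved); the robot stays put if the move is off-map or unsafe."""
    dr, dc = ACTION_DELTAS[action]
    r, c = loc[0] + dr, loc[1] + dc
    in_bounds = 0 <= r < len(grid) and 0 <= c < len(grid[0])
    if in_bounds and grid[r][c] in SAFE:
        return (r, c), True
    return loc, False


toy_grid = [["C", "S", "F"],
            ["W", "C", "S"]]
print(try_move(toy_grid, (0, 0), "right"))  # ((0, 1), True)
print(try_move(toy_grid, (0, 0), "down"))   # ((0, 0), False): wall below
print(try_move(toy_grid, (0, 0), "up"))     # ((0, 0), False): off the map
```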

Please now take a moment to read the docstring for `SearchAndRescueProblem` to make sure that you understand the state and action spaces.

Finally, we're going to use a Python PDDL planner called [`pyperplan`](https://github.com/aibasel/pyperplan) to find our plans.

Let's familiarize ourselves with State and SearchAndRescueProblem.

### 1.1 Search and Rescue Warmup 1

Let's make sure we can access fields of the SearchAndRescue State. Please write a function to check if a row and col have an obstacle in a SearchAndRescue State.

For reference, our solution is **1** line(s) of code.

In [3]:
def sar_warmup1(sar_state, row, col):
    """Check if a row and col have an obstacle in a SearchAndRescueProblem
    state.

    Args:
      sar_state: A SearchAndRescue State.
      row: An int.
      col: An int.

    Returns:
      has_obstacle: True if (row, col) has an obstacle(wall) in sar_state.
    """
    raise NotImplementedError() 

### 1.2 Search and Rescue Warmup 2

Let's make sure we know how to encode a plan. Please write a function that returns a hand-coded list of actions that will deliver person 'p1' (in the image above) to the hospital location. (You'll need to work out the plan for yourself -- don't use `pyperplan` yet!)

For reference, our solution is **1** line(s) of code.

In [4]:
def sar_warmup2():
    """Hand-code a list of actions that will deliver person 'p1' to the
    hospital location.

    Returns:
      actions: A list of str actions that will take person p1 to the hospital location.
    """
    raise NotImplementedError() 

### Tests

In [None]:
def sar_warmup_test2():
    problem = SearchAndRescueProblem()
    plan = sar_warmup2()
    state = execute_plan(problem, plan, State())
    assert state.people["p1"] == (6, 6)

sar_warmup_test2()

print('Tests passed.')

### 1.3 Search and Rescue PDDL Planner


Now that you're warmed up, let's try making a planner to solve a SearchAndRescueProblem!

The core function in this planner class is 'get_plan'. This function needs to do the following:
1. Create PDDL domain and problem strings for search and rescue. The operators should work for any grid size, obstacles, people locations, and hospital location.
2. Invoke `run_planning` using the given `search_algo` search algorithm with the `heuristic` heuristic.
3. Convert the output of run_planning (pyperplan Operators) into actions that can be executed, via `execute_plan`.
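
For step 3, it may help to see the kind of string conversion involved. The sketch below assumes grounded operator names look like `(move-robot l0-0 l1-0 down)`, which is the shape the provided `parse_plan` method expects; plain strings stand in for the `name` attribute of pyperplan Operators, and `name_to_action` is a hypothetical helper, not part of the provided code.

```python
def name_to_action(op_name):
    """Map a grounded operator name string to a SearchAndRescueProblem action str."""
    tokens = op_name.strip("()").split()
    if tokens[0] == "move-robot":
        return tokens[-1]             # the direction is the last argument
    if tokens[0] == "pickup-person":
        return f"pickup-{tokens[1]}"  # the person is the first argument
    if tokens[0] == "dropoff-person":
        return "dropoff"
    raise ValueError(f"Unrecognized operator: {op_name}")


# Assumed example names, written by hand rather than produced by a planner.
names = ["(move-robot l0-0 l1-0 down)",
         "(pickup-person p1 l4-0)",
         "(dropoff-person p1 l6-6)"]
print([name_to_action(n) for n in names])  # ['down', 'pickup-p1', 'dropoff']
```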

We have given you most of the structure of the needed functions, but you will need to look (carefully!) through the provided python to find the `TODO` sections that you need to complete. 

For reference, 'get_plan' takes ~1-2 seconds to run with our implementation if using 'gbf' search and 'hff' heuristic. To get credit on gradescope, make sure that your function finishes in <10 seconds.

**Notes**:
* In this problem, you will need to construct somewhat complicated strings.  We *strongly* encourage you to read about [Python-3 f-strings](https://www.digitalocean.com/community/tutorials/how-to-use-f-strings-to-create-strings-in-python-3) which make this process much easier than the alternatives.
* You may find `state.render()` useful for debugging.
* We also highly recommend printing out the domain and problem after they have been created, and copying them into [editor.planning.domains](http://editor.planning.domains) to check whether it's possible to find a plan. This editor can be helpful for syntax checking.
* We also recommend writing careful test cases for yourself --- it's really easy to forget preconditions or effects. When Nick was debugging this, he forgot to make sure that the robot was at the location of the person, so the plans were (confusingly) super-short! 
* The image above with the robot and the bears is a faithful depiction of the initial state. For example, the initial locations of the people are: "p1": (4, 0), "p2": (6, 0), "p3": (0, 6), "p4": (3, 3).
* One part of this problem that may be initially counterintuitive is the way that we'll represent locations in PDDL. In the problem, a location is a tuple of integers. PDDL does not support such representations -- everything needs to be just an object with a string name.
So to represent a location like (3, 5), we will make a string "l3-5" (where the first character there is a lowercase L), and we'll create an object with that name, of type "location". We will also need a way to encode the fact that the robot can only move between adjacent locations in the grid.
In Python, we can compare the numeric values of locations like (3, 5) and (3, 6) to see if they are neighbors. But in PDDL, all we have are the objects with string names, and we need to encode everything in terms of predicates. So, we will create a predicate `(conn ?v0 - location ?v1 - location ?v2 - direction)`, which says that location `?v0` is connected to location `?v1` in direction `?v2`. For example, `(conn l3-5 l3-6 right)` might appear in the initial state. We can then use these `conn` predicates in the preconditions of a `move` operator to encode the fact that the robot can only move between adjacent locations.
* We do not recommend modelling the hospital explicitly with special objects / types / predicates. Instead, the goal should be to deliver all people to the hospital, that is, `l6-6`.
In words, the goal should be "p1 is at l6-6 and p2 is at l6-6 and p3 is at l6-6 and p4 is at l6-6."
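
The location-naming and `conn` conventions from the notes above can be sketched with f-strings as follows. The helper names `loc_name` and `conn_atoms` are hypothetical and used only for illustration; the provided `get_init_strs` method builds the same kind of atoms inline.

```python
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def loc_name(r, c):
    """Name the grid location (r, c) as a PDDL object, e.g. (3, 5) -> 'l3-5'."""
    return f"l{r}-{c}"


def conn_atoms(height, width):
    """Build one (conn from to direction) atom per ordered pair of adjacent cells."""
    atoms = []
    for r in range(height):
        for c in range(width):
            for direction, (dr, dc) in DIRECTIONS.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < height and 0 <= nc < width:
                    atoms.append(
                        f"(conn {loc_name(r, c)} {loc_name(nr, nc)} {direction})")
    return atoms


atoms = conn_atoms(2, 2)
print(loc_name(3, 5))  # l3-5
print(len(atoms))      # 8: each of the 4 cells has 2 in-bounds neighbors
print(atoms[0])        # (conn l0-0 l1-0 down)
```

Note that adjacency is encoded in both directions: `(conn l0-0 l0-1 right)` and `(conn l0-1 l0-0 left)` are separate atoms.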

In [6]:
class SearchAndRescuePlanner:
    """A planner for a search and rescue problem.

    The core function in this class is 'get_plan'
    This function does the following:
        1. Create PDDL domain and problem strings for search and rescue. The operators should work for any grid size, obstacles, people locations, and hospital location.
        2. Invoke `run_planning` using the given `search_algo` search algorithm with the `heuristic` heuristic.
        3. Convert the output of run_planning (pyperplan Operators) into actions
           that can be given to the SearchAndRescueProblem.

    Example Usage:
        problem = SearchAndRescueProblem()
        state = State()

        planner = SearchAndRescuePlanner(search_algo='astar', heuristic='lmcut')
        plan, plan_time = planner.get_plan(state)
        state = execute_plan(problem, plan, state)

    'get_plan' Returns:
        plan: A list of actions; each action is a str, see SearchAndRescueProblem.
        plan_time: Total planning time(sec) used for plan searching.

    For reference, 'get_plan' takes ~1-2 seconds to run with our implementation if using 'gbf' search and 'hff' heuristic.
    """

    def __init__(self, search_algo='astar', heuristic='lmcut'):
        self.search_algo = search_algo
        self.heuristic = heuristic

    def generate_domain_pddl(self,
                             domain_name,
                             added_operators='',
                             added_predicates=''):
        # <<< TODO: fill in missing parts in the PDDL domain below >>>
        predicates_str = """(conn ?v0 - location ?v1 - location ?v2 - direction)
        (is-clear ?v0 - location)
        ; TODO: write more here
        
        """
        
        # <<< TODO: fill in missing parts in the PDDL domain below >>>
        operators_str = """(:action move-robot
    :parameters (?from - location ?to - location ?dir - direction)
    :precondition (and
      (conn ?from ?to ?dir)
      ; TODO: write more here
      
    )
    :effect (and
      ; TODO: write more here
      
    )
  )
  (:action pickup-person
    :parameters (?person - person ?loc - location)
    :precondition (and
      ; TODO: write more here
     
    )
    :effect (and
      ; TODO: write more here
     
    )
  )
  (:action dropoff-person
    :parameters (?person - person ?loc - location)
    :precondition (and
      ; TODO: write more here
      
    )
    :effect (and
      ; TODO: write more here
      
    )
  )"""

        domain_pddl = f"""(define (domain {domain_name})
    (:requirements :typing)
    (:types person location direction)
    (:constants
      down - direction
      left - direction
      right - direction
      up - direction
    )
    (:predicates
      {predicates_str}
      {added_predicates}
    )
    {operators_str}
    {added_operators}
)"""
        return domain_pddl

    def get_plan(self, state):
        search_algo, heuristic = self.search_algo, self.heuristic
        domain_name, added_predicate, added_operator = self.update_pddl_domain()
        domain_pddl = self.generate_domain_pddl(
            domain_name,
            added_operators=added_operator,
            added_predicates=added_predicate)
        # Create objects str
        obj_str = self.get_obj_strs(state)

        # Create init str
        init_str = self.get_init_strs(state)

        # Create goal str
        goal_str = self.get_goal_strs(state)

        problem_pddl = f"""(define (problem searchandrescue) (:domain {domain_name})
      (:objects
      {obj_str}
      )
      (:init
      {init_str}
      )
      (:goal (and {goal_str}))
    )"""

        start_time = time.time()
        plan = run_planning(domain_pddl, problem_pddl, search_algo, heuristic)
        time_elapsed = time.time() - start_time
        if plan is None:
            print("Failed to find a plan.")
            return None, time_elapsed

        # Convert operators to actions
        actions = self.parse_plan(plan)
        return actions, time_elapsed

    def get_obj_strs(self, state):
        height, width = state.state_map.shape
        objects_strs = [f"{person} - person" for person in state.people]
        # <<< TODO: add object strs for locations >>>
       
        
        if state.carrying is not None:
            objects_strs.append(f"{state.carrying} - person")
        objects_str = " ".join(objects_strs)
        return objects_str

    def get_init_strs(self, state):
        height, width = state.state_map.shape
        robot_r, robot_c = state.robot
        init_strs = [f"(robot-at l{robot_r}-{robot_c})"]
        for person, (r, c) in state.people.items():
            init_strs.append(f"(person-at {person} l{r}-{c})")
        if state.carrying is not None:
            init_strs.append(f"(carrying {state.carrying})")
        else:
            init_strs.append("(handsfree)")
            
        deltas = {
            "up": (-1, 0),
            "down": (1, 0),
            "left": (0, -1),
            "right": (0, 1),
        }
        
        safe_grid = state.get_safe_grid()
        for r in range(height):
            for c in range(width):
                # Here we're going to add one (conn ...) atom for every pair
                # of adjacent locations.
                for direction, (dr, dc) in deltas.items():
                    if not (0 <= r + dr < height and 0 <= c + dc < width):
                        continue
                    # For example, if r == 0, c == 0, dr == 0, dc == 1, then
                    # this line adds the atom (conn l0-0 l0-1 right).
                    init_strs.append(
                        f"(conn l{r}-{c} l{r + dr}-{c + dc} {direction})")
                # <<< TODO: add more init strs >>>
                

        init_str = " ".join(init_strs)
        return init_str

    def get_goal_strs(self, state):
        goal_strs = []
        hospital_r, hospital_c = state.hospital
        # <<< TODO: add goal strs >>>

        

        if state.carrying is not None:
            # <<< TODO: add goal strs >>>
            
            pass
        goal_str = " ".join(goal_strs)
        return goal_str

    def update_pddl_domain(self):
        domain_name = 'searchandrescue'
        added_predicate = ''
        added_operator = ''
        return domain_name, added_predicate, added_operator

    def parse_plan(self, plan):
        actions = []
        for op in plan:
            if "move-robot" in op.name:
                _, direction = op.name[:-1].rsplit(" ", 1)
                action = direction
            elif "pickup-person" in op.name:
                _, person, _ = op.name.split(" ")
                action = f"pickup-{person}"
            else:
                assert "dropoff-person" in op.name
                action = "dropoff"
            actions.append(action)
        return actions
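
To make the string handling in `parse_plan` concrete, here is a small standalone sketch. It assumes grounded operator names look like `(move-robot l0-0 l0-1 right)` and `(pickup-person p1 l0-1)`; those argument lists are inferred from the splitting logic above rather than confirmed against the domain file, and `FakeOp`, `parse_op_name`, and the person name `p1` are hypothetical stand-ins.

```python
from collections import namedtuple

# Hypothetical stand-in for a planner operator; only .name is used here.
FakeOp = namedtuple("FakeOp", ["name"])


def parse_op_name(name):
    """Mirror the string handling in parse_plan for one operator name."""
    if "move-robot" in name:
        # Drop the trailing ")" and keep the last token: the direction.
        _, direction = name[:-1].rsplit(" ", 1)
        return direction
    if "pickup-person" in name:
        # Three space-separated tokens; the middle one is the person.
        _, person, _ = name.split(" ")
        return f"pickup-{person}"
    assert "dropoff-person" in name
    return "dropoff"


# Assumed operator name formats, for illustration only.
example_plan = [
    FakeOp("(move-robot l0-0 l0-1 right)"),
    FakeOp("(pickup-person p1 l0-1)"),
    FakeOp("(dropoff-person p1 l0-1)"),
]
example_actions = [parse_op_name(op.name) for op in example_plan]
```

If you add a new operator later in the project, the same pattern applies: decide which token of the operator name carries the information the simulator needs.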

### Tests

In [None]:
def sar_test():
    problem = SearchAndRescueProblem()
    planner = SearchAndRescuePlanner(search_algo="gbf", heuristic="hff")
    state = State()
    plan, plan_time = planner.get_plan(state)
    assert execute_count_num_delivered(problem=problem, state=state,
                                       plan=plan) == 4

sar_test()

print('Tests passed.')

# <a id="inference">2. Inference from observations</a>


Now, let's look at the inference problem. This is similar to what we saw in problem set 8. We will consider several problems with varying grid sizes and different sets of observations. For example, consider the grid below:
```
# Fire, Unknown, Clear, Smoke, Wall
GRID0 = np.array([
  ["F", "U", "C"],
  ["W", "C", "U"],
  ["U", "U", "C"]
], dtype=object)
```
This grid has 9 locations and 5 observations: there is fire in the top left, a wall directly below it, and the center, top-right, and bottom-right locations are all known to be clear of smoke and fire.

We will assume the following axioms:
1. Each location has exactly one of {smoke, fire, clear, wall}.

2. There is smoke at a location only if there is a fire in at least one of the adjacent (above, below, left, right) locations. Diagonals are not adjacent!

3. There is smoke _or_ fire at a location if there is a fire in at least one of the adjacent locations, unless that location is known to be a wall ('W').

Take a moment to run your human inference engine: which unknown values in the grid above can be determined?
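
As a warm-up for encoding the axioms, here is a minimal sketch of axiom 1 ("exactly one value per location") in `sympy`. The naming scheme in `cell_symbol` and the helper `exactly_one` are assumptions made for illustration; your own encoding may use different symbols or structure.

```python
from itertools import combinations

from sympy import And, Not, Or, Symbol, satisfiable

VALUES = ["S", "F", "C", "W"]  # smoke, fire, clear, wall


def cell_symbol(r, c, value):
    """Hypothetical naming scheme: one Boolean symbol per (cell, value)."""
    return Symbol(f"{value}_{r}_{c}")


def exactly_one(r, c):
    """Axiom 1 for one cell: at least one value, and no two at once."""
    syms = [cell_symbol(r, c, v) for v in VALUES]
    at_least_one = Or(*syms)
    at_most_one = And(*[Not(And(a, b)) for a, b in combinations(syms, 2)])
    return And(at_least_one, at_most_one)


# Cell (0, 0) is observed to be on fire. Together with axiom 1, this
# leaves no model in which the same cell also contains smoke.
kb = And(exactly_one(0, 0), cell_symbol(0, 0, "F"))
model = satisfiable(kb)
```

Here `satisfiable` returns a satisfying assignment when one exists and `False` otherwise, so adding `cell_symbol(0, 0, "S")` to `kb` makes it unsatisfiable.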


### 2.1 Inferring unknown values 

Please write a program that takes a grid as input and infers unknown values.

Your program should output a new grid with all determinable unknown values replaced with the inferred value. If an unknown value cannot be determined, it should be left unknown.

**Your program should use sympy.**
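
One common way to decide whether a value is determined is proof by refutation: a knowledge base entails a proposition exactly when the knowledge base together with the negation of that proposition is unsatisfiable. The sketch below shows the idea on a tiny hand-built example. The helper `entails` and the symbol names are assumptions for illustration, not part of the provided utilities.

```python
from sympy import And, Implies, Not, Or, Symbol, satisfiable


def entails(kb, query):
    """Return True if every model of kb also satisfies query."""
    # kb entails query iff (kb AND NOT query) has no satisfying assignment.
    return satisfiable(And(kb, Not(query))) is False


# Hypothetical symbols for a 1x2 strip of cells: fire on the left forces
# smoke or fire on the right (axiom 3), and the right cell is known not
# to be on fire.
fire_left = Symbol("F_0_0")
smoke_right = Symbol("S_0_1")
fire_right = Symbol("F_0_1")

kb = And(
    fire_left,
    Implies(fire_left, Or(smoke_right, fire_right)),
    Not(fire_right),
)
right_is_smoke = entails(kb, smoke_right)
```

A value stays unknown when neither it nor its negation is entailed, which is why some cells in the expected outputs remain "U".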

In [8]:
def infer_unknown_values(grid):
    """Fill in any unknown values in the grid that can be inferred.

    Args:
      grid: A list of lists of "F", "U", "S", "W", or "C".
    Returns:
      inferred_grid: A copy of grid with some unknown values replaced.

    Example:
      >> grid = [
      >>   ["F", "U", "C"],
      >>   ["W", "C", "U"],
      >>   ["U", "U", "C"]
      >> ]
      >> infer_unknown_values(grid)
      [["F", "S", "C"],
       ["W", "C", "C"],
       ["U", "U", "C"]]
    """
    raise NotImplementedError() 

### Tests

In [None]:

assert infer_unknown_values([["U", "F"]]) == [["U", "F"]]


assert infer_unknown_values([["F", "U", "C"], ["S", "C", "U"], ["U", "U", "C"]]) == [["F", "S", "C"], ["S", "C", "C"], ["U", "U", "C"]]


assert infer_unknown_values([["U", "C", "C"], ["S", "C", "U"], ["U", "U", "C"]]) == [["C", "C", "C"], ["S", "C", "C"], ["F", "S", "C"]]


assert infer_unknown_values([["U", "S", "C", "U"], ["U", "U", "C", "U"], ["U", "S", "C", "U"]]) == [["F", "S", "C", "C"], ["U", "U", "C", "C"], ["F", "S", "C", "C"]]


assert infer_unknown_values([["U", "U", "C", "U", "U", "U", "U", "U"], ["C", "U", "U", "U", "U", "U", "U", "U"], ["U", "U", "U", "U", "U", "U", "U", "U"], ["U", "U", "U", "U", "U", "U", "C", "C"], ["U", "U", "U", "U", "U", "U", "C", "C"], ["U", "C", "U", "U", "U", "U", "U", "U"], ["U", "U", "U", "F", "U", "U", "U", "U"], ["U", "U", "U", "U", "U", "U", "U", "U"]]) == [["C", "C", "C", "U", "U", "U", "U", "U"], ["C", "U", "U", "U", "U", "U", "U", "U"], ["U", "U", "U", "U", "U", "U", "U", "U"], ["U", "U", "U", "U", "U", "U", "C", "C"], ["U", "U", "U", "U", "U", "U", "C", "C"], ["U", "C", "U", "U", "U", "U", "U", "U"], ["U", "U", "U", "F", "U", "U", "U", "U"], ["U", "U", "U", "U", "U", "U", "U", "U"]]


assert infer_unknown_values([["C", "U", "C", "U", "U", "C", "U"], ["U", "W", "W", "U", "C", "W", "W"], ["U", "F", "U", "U", "U", "F", "U"], ["C", "S", "W", "C", "U", "U", "U"], ["U", "U", "W", "U", "W", "U", "U"], ["C", "C", "U", "C", "U", "W", "U"], ["U", "W", "C", "U", "W", "U", "C"]]) == [["C", "C", "C", "C", "C", "C", "C"], ["C", "W", "W", "C", "C", "W", "W"], ["S", "F", "U", "U", "S", "F", "U"], ["C", "S", "W", "C", "U", "U", "U"], ["C", "C", "W", "C", "W", "U", "U"], ["C", "C", "C", "C", "C", "W", "U"], ["C", "W", "C", "C", "W", "C", "C"]]


assert infer_unknown_values([["C", "U", "C", "U", "U", "C", "U"], ["U", "W", "W", "U", "C", "W", "W"], ["U", "F", "U", "U", "U", "F", "U"], ["C", "S", "W", "C", "U", "F", "U"], ["U", "U", "W", "U", "W", "U", "U"], ["C", "C", "U", "C", "U", "W", "F"], ["U", "W", "C", "U", "W", "U", "U"]]) == [["C", "C", "C", "C", "C", "C", "C"], ["C", "W", "W", "C", "C", "W", "W"], ["S", "F", "U", "U", "S", "F", "U"], ["C", "S", "W", "C", "S", "F", "U"], ["C", "C", "W", "C", "W", "U", "U"], ["C", "C", "C", "C", "C", "W", "F"], ["C", "W", "C", "C", "W", "U", "U"]]
print('Tests passed.')

### 2.2 Belief update

Now let us use our ability to infer unknown values to finish the implementation of the update method for BeliefState.

Make sure to look at the utilities defined at the top of this notebook, although you may not need to use all of them.
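
The observation step of the update can be pictured as writing observed entries into a copy of the belief map. The sketch below assumes that `obs` maps `(row, col)` tuples to single-character entries, as the `{loc: entry, ...}` description in the `update` docstring suggests. The helper name `apply_observation` is hypothetical, and the sketch leaves out both the action transition and the inference step.

```python
import numpy as np


def apply_observation(state_map, obs):
    """Return a copy of state_map with observed entries written in.

    Assumes obs maps (row, col) tuples to single-character entries.
    """
    new_map = state_map.copy()
    for (r, c), entry in obs.items():
        new_map[r, c] = entry
    return new_map


# A fully unknown 2x3 belief map, then two observations near the corner.
belief_map = np.full((2, 3), "U")
updated_map = apply_observation(belief_map, {(0, 0): "C", (0, 1): "S"})
```

Working on a copy keeps the original belief unchanged, which matches the tests above where `update` returns a new belief.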

In [10]:
class BeliefState(State):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        if "state_map" not in kwargs:
            self.state_map = np.array([['U', 'U', 'U', 'U', 'U', 'U', 'U'],
                                       ['U', 'W', 'W', 'U', 'U', 'W', 'W'],
                                       ['U', 'U', 'U', 'U', 'U', 'U', 'U'],
                                       ['U', 'U', 'W', 'U', 'U', 'U', 'U'],
                                       ['U', 'U', 'W', 'U', 'W', 'U', 'U'],
                                       ['U', 'U', 'U', 'U', 'U', 'W', 'U'],
                                       ['U', 'W', 'U', 'U', 'W', 'U', 'U']],
                                      dtype=str)

    def update(self, problem, obs, action=None):
        """
        problem: SearchAndRescueProblem instance
        obs: {loc: entry, loc: entry,...}
        action: string or None

        # <<< TODO: >>>
            1. Do transition from action (if any)
            2. Update from observation
            3. Do inference
        """
        raise NotImplementedError() 

    def get_optimistic_state(self):
        """Returns a copy of the belief with a completed map in which Unknowns
        are assumed to be Clear."""
        new_state = self.copy()
        new_state.state_map[self.state_map == 'U'] = 'C'
        return new_state

    def get_careful_state(self):
        """Returns a copy of the belief.

        Unknown states will not be treated as safe, see get_safe_grid.
        """
        return self.copy()

### Tests

In [None]:
def beliefupdate_test1():
    state_map = np.array([["C", "S", "C", "C", "C"], ["S", "F", "S", "C", "C"],
                          ["S", "F", "S", "S", "S"], ["S", "F", "F", "F", "F"],
                          ["C", "S", "S", "S", "S"], ["C", "C", "C", "C", "C"]])
    beliefstate_map = np.array([["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"]])
    problem = SearchAndRescueProblem()
    state = State(state_map=state_map)
    bel = BeliefState(state_map=beliefstate_map)
    observation = problem.get_observation(state)
    new_bel = bel.update(problem, observation)
    assert new_bel.robot == (0, 0)
    assert new_bel.state_map.tolist() == [['C', 'S', 'U', 'U', 'U'],
                                          ['S', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U']]

beliefupdate_test1()


def beliefupdate_test2():
    state_map = np.array([["C", "S", "C", "C", "C"], ["S", "F", "S", "C", "C"],
                          ["S", "F", "S", "S", "S"], ["S", "F", "F", "F", "F"],
                          ["C", "S", "S", "S", "S"], ["C", "C", "C", "C", "C"]])
    beliefstate_map = np.array([["U", "U", "U", "U", "U"],
                                ["S", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"]])
    problem = SearchAndRescueProblem()
    state = State(state_map=state_map)
    bel = BeliefState(state_map=beliefstate_map)

    new_state, _ = problem.get_next_state(state, 'down')
    observation = problem.get_observation(new_state)
    new_bel = bel.update(problem, observation, 'down')
    assert new_bel.robot == (1, 0)
    assert new_bel.state_map.tolist() == [['C', 'S', 'U', 'U', 'U'],
                                          ['S', 'F', 'U', 'U', 'U'],
                                          ['S', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U']]

beliefupdate_test2()

print('Tests passed.')

# <a id="integrated">3. Putting It Together</a>

It's time to bring the planning and the inference together! Look at the procedure `agent_loop`, which takes as input:

* "initial_state": a true state of the environment, which includes a state map as well as the current locations of the hospital, the robot, and the people, and whether the robot is carrying someone.
* "initial_belief": an initial belief state, which is like an environment state except that the state map may include the character 'U', indicating that the contents of a location are unknown.
* "policy": a procedure that takes in a belief state and returns the next action to take. The action can be one of the original actions, '*Success*' (the goal has been achieved and all the people are at the hospital), or '*Failure*' (the agent is certain that the goal is impossible to achieve).
* "max_steps": the total number of steps to run the simulation, to avoid infinite loops.

We are going to ask you to write five different policies for this domain.

1. Safe but not so smart
1. Safe and smart
1. Reckless
1. Safe and smart if possible, else reckless
1. Looks before it leaps

### 3.1 Safe but not so smart

The agent is scared and just runs directly toward the hospital. However, it at least takes observations into account as it moves.

Let's make an agent that:

* Updates the belief state based on every observation using propositional inference (implement the `belief.update` method that gets called in `agent_loop`).

* Executes a policy that, given the currently updated belief, checks to see whether it can move safely into a square that is closer (in Manhattan distance) to the hospital.  If so, it returns the action that would move it closer.  If it reaches the hospital, it returns `*Success*`.  If it cannot safely make a move (due to walls or fire) closer to the hospital, it waves its arms in anguish and returns `*Failure*`.
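
The move-selection rule in the second bullet can be sketched on a bare grid, independent of the notebook's classes. This is only an illustration: `greedy_step` is a hypothetical helper, and it assumes the safety information is available as a 2D boolean array (the actual return format of `get_safe_grid` is not shown here).

```python
import numpy as np

DELTAS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def manhattan(a, b):
    """Manhattan distance between two (row, col) locations."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_step(robot, hospital, safe_grid):
    """Pick a safe move that strictly reduces the distance to the hospital.

    safe_grid is assumed to be a 2D boolean array that is True where a
    cell is known to be safe to enter.
    """
    if robot == hospital:
        return "*Success*"
    height, width = safe_grid.shape
    current = manhattan(robot, hospital)
    for action, (dr, dc) in DELTAS.items():
        r, c = robot[0] + dr, robot[1] + dc
        if not (0 <= r < height and 0 <= c < width):
            continue  # Moving off the grid is never an option.
        if safe_grid[r, c] and manhattan((r, c), hospital) < current:
            return action
    return "*Failure*"


safe = np.ones((2, 2), dtype=bool)
first_move = greedy_step((0, 0), (1, 1), safe)
```

Because the rule never accepts a move that keeps or increases the distance, it can give up even when a longer safe detour exists, which is the "not so smart" part.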


In [13]:
def make_greedy_policy(problem):
    def policy(belief):
        """Returns an action string or '*Failure*' or '*Success*'."""
        # TODO: complete
       

    # return the policy function
    return policy

### Tests

In [None]:
def policy_test1():
    problem = SearchAndRescueProblem()
    policy = make_greedy_policy(problem)

    # Empty map
    state = State()
    state.state_map[:, :] = 'C'
    bel = BeliefState()
    bel.state_map[:, :] = 'C'

    s_or_f, final_state, final_bel = agent_loop(problem, state, policy, bel)
    print('Final robot location', final_state.robot)
    assert s_or_f == '*Success*' and final_state.robot == final_state.hospital

policy_test1()

def policy_test2():
    problem = SearchAndRescueProblem()

    # Use default map
    state = State()
    bel = BeliefState()

    policy = make_greedy_policy(problem)
    s_or_f, final_state, final_bel = agent_loop(problem, state, policy, bel)
    r, c = final_state.robot
    hr, hc = final_state.hospital
    distance = abs(hr - r) + abs(hc - c)
    print('Final robot location', final_state.robot)
    print('Final distance =', distance)
    assert distance < 12

policy_test2()

print('Tests passed.')

### 3.2 Safe and smart

Let's try to make an agent that is both safe and a little smarter (although probably still a bit too conservative).

We will continue to run the belief update method that you implemented above for the rest of these cases, so we only need to worry about the policy.  

* The first time the policy is called to generate an action, it should formulate, in PDDL, a planning problem to find a complete plan for moving people to the hospital that only traverses squares that are known not to have fire or walls, and returns the first step.

* On subsequent calls to the policy, it should just return the next step of the plan.

Just as we did in Section 1, we want you to use `pyperplan`.
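
The plan-once-then-replay behavior described above can be sketched with a closure that remembers the plan between calls. This is a generic illustration under stated assumptions: `make_plan_follower` is a hypothetical name, and `planner` is assumed to take a belief and return a list of action strings, or `None` when no plan exists. It does not build the PDDL problem or restrict the plan to known-safe squares.

```python
def make_plan_follower(planner):
    """Build a policy that plans on the first call and then replays the plan.

    planner is assumed to take a belief and return a list of action
    strings, or None when no plan exists.
    """
    status = {"plan": None, "step": 0}

    def policy(belief):
        if status["plan"] is None:
            # First call: plan once and remember the result.
            plan = planner(belief)
            if plan is None:
                return "*Failure*"
            status["plan"] = plan
            status["step"] = 0
        if status["step"] >= len(status["plan"]):
            # Every step of the plan has already been handed out.
            return "*Success*"
        action = status["plan"][status["step"]]
        status["step"] += 1
        return action

    return policy


# Toy usage with a planner that ignores its input.
follower = make_plan_follower(lambda belief: ["down", "right"])
trace = [follower(None) for _ in range(3)]
```

Keeping the memory in a dictionary captured by the closure lets the inner function update it without `nonlocal` declarations.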

In [16]:
def make_planner_policy(problem, planner):
    # Keep memory of plan and which step we're on
    status = {'plan': None, 'step': None}

    def policy(belief):
        """Returns an action string or '*Failure*' or '*Success*'."""
        raise NotImplementedError() 
    
    # return the policy function
    return policy

### Tests

In [None]:
def sar_policy_test():
    problem = SearchAndRescueProblem()
    base_planner = SearchAndRescuePlanner(search_algo="gbf", heuristic="hff")

    def planner(state):
        plan, plan_time = base_planner.get_plan(state)
        return plan

    policy = make_planner_policy(problem, planner)
    state = State()
    # Observable
    bel = BeliefState(state_map=state.state_map)
    s_or_f, final_state, final_bel = agent_loop(problem, state, policy, bel)
    assert get_num_delivered(final_state) == 4

sar_policy_test()

print('Tests passed.')

### 3.3 Reckless

For this and the remaining questions, we're not going to autograde your solutions, and we're not providing test cases. We've given you boxes to hold your implementation, but you will need to define your own test cases, and in Section 4 we'll ask you to do some analysis.

First, let's try to make our agent more aggressive but still not step into the fire!   As we saw in 3.1, even if we make a plan that might cause us to move into the fire, just before we are about to take an action, we can know for sure whether there is fire in the square we are about to move into.   So, let's make an (almost) reckless replanning agent.

* The first time the policy is called to generate an action, it should formulate in PDDL a planning problem to find a very optimistic  plan for moving people to the hospital that only traverses squares that are **not known to have** fire or walls, and returns the first step.

* On subsequent calls to the policy, if the next step in the plan is safe to execute given the updated belief, it should return that action.  Otherwise, it should make a new plan!
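
The execute-or-replan rule in the bullets above can be sketched generically as follows. Both callables are assumptions made for illustration: `planner(belief)` returns a list of action strings or `None`, and `is_safe(belief, action)` reports whether the next action is safe under the current belief. The sketch also assumes that the first step of a fresh plan is safe, since the planner sees the updated belief.

```python
def make_replanning_policy(planner, is_safe):
    """Follow the current plan, replanning when the next step is unsafe.

    planner(belief) and is_safe(belief, action) are hypothetical callables.
    """
    status = {"plan": None, "step": 0}

    def replan(belief):
        status["plan"] = planner(belief)
        status["step"] = 0

    def policy(belief):
        if status["plan"] is None:
            replan(belief)
        if status["plan"] is None:
            return "*Failure*"
        if status["step"] >= len(status["plan"]):
            return "*Success*"
        action = status["plan"][status["step"]]
        if not is_safe(belief, action):
            # The next step is now known to be unsafe: discard the plan.
            replan(belief)
            if status["plan"] is None:
                return "*Failure*"
            if not status["plan"]:
                return "*Success*"
            action = status["plan"][0]
        status["step"] += 1
        return action

    return policy
```

The safety check happens just before each step, so the optimistic plan is only trusted one move at a time.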

In [20]:
def make_reckless_policy(problem, planner):
    # Keep memory of plan and which step we're on
    status = {'plan': None, 'step': None}

    def policy(belief):
        """Returns an action string or '*Failure*' or '*Success*'."""
        raise NotImplementedError() 
    
    # return the policy function
    return policy

Use the code below to run your policy and provide an execution trace. 

In [None]:
problem = SearchAndRescueProblem()
base_planner = SearchAndRescuePlanner(search_algo="gbf", heuristic="hff")

def reckless_planner(state):
    plan, plan_time = base_planner.get_plan(state)
    return plan

policy = make_reckless_policy(problem, reckless_planner)
state = State()
# Observable
bel = BeliefState(state_map=state.state_map)
s_or_f, final_state, final_bel = agent_loop(problem, state, policy, bel)

### 3.4 Safe and smart if possible, else reckless

Try to construct a new policy that combines the best aspects of the "safe and smart" and "reckless" strategies.

In [22]:
def make_hybrid_policy(problem, planner):
    # Keep memory of plan and which step we're on
    status = {'plan': None, 'step': None}

    def policy(belief):
        """Returns an action string or '*Failure*' or '*Success*'."""
        raise NotImplementedError() 
    
    # return the policy function
    return policy

Use the code below to run your policy and provide an execution trace. 

In [None]:
problem = SearchAndRescueProblem()
base_planner = SearchAndRescuePlanner(search_algo="gbf", heuristic="hff")

def hybrid_planner(state):
    plan, plan_time = base_planner.get_plan(state)
    return plan

policy = make_hybrid_policy(problem, hybrid_planner)
state = State()
# Observable
bel = BeliefState(state_map=state.state_map)
s_or_f, final_state, final_bel = agent_loop(problem, state, policy, bel)

### 3.5 Looks before it leaps

Another way to approach this problem is to plan in belief space.  That sounds fancy, but actually can be relatively simple to do:

* In your PDDL formulation, instead of just having a fluent `(is-clear ?loc)`, we'll have two fluents:  `(is-clear ?loc)` and `(is-unknown ?loc)`.

* The precondition for moving into a square should still be `(is-clear ?loc)`.

* You can add an operator that explicitly "looks" at a neighboring square;  we'll assume that it's optimistic and so if you look at a square that was previously unknown, it is now known to be clear.

* Note that you can define a subclass of `SearchAndRescuePlanner` and redefine the `update_pddl_domain` method to add to the previous PDDL domain definition.  You'll also need to change the `parse_plan` method to handle the new action and the `get_init_strs` method to handle the new facts.

* Make a replanning policy that plans in this belief-space formulation and executes its plan as long as it's safe, and replans otherwise.
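
As a starting point for the "look" operator, here is one possible shape for the PDDL text, held in Python strings. The predicate names `robot-at`, `conn`, `is-clear`, and `is-unknown` come from this notebook. The operator name, the parameter names, and the use of untyped parameters are assumptions, as is the way these strings would be returned from `update_pddl_domain`. Check them against the existing domain definition before relying on them.

```python
# Hypothetical PDDL text for a "look" operator. Untyped parameters are an
# assumption; the existing domain may require type annotations.
LOOK_OPERATOR = """
(:action look
    :parameters (?from ?to ?dir)
    :precondition (and (robot-at ?from)
                       (conn ?from ?to ?dir)
                       (is-unknown ?to))
    :effect (and (is-clear ?to)
                 (not (is-unknown ?to)))
)
"""

# Hypothetical declaration for the extra predicate.
LOOK_PREDICATE = "(is-unknown ?loc)"


def parens_balanced(text):
    """Cheap sanity check that a PDDL fragment has matching parentheses."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

The effect encodes the optimistic assumption from the bullets above: looking at an unknown square marks it clear in the planning model, while the real observation arrives only at execution time.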

In [55]:
def make_belief_space_policy(problem, planner):
    # Keep memory of plan and which step we're on
    status = {'plan': None, 'step': None}

    def policy(belief):
        """Returns an action string or '*Failure*' or '*Success*'."""
        raise NotImplementedError() 
    
    # return the policy function
    return policy

Use the code below to run your policy and provide an execution trace. 

In [None]:
problem = SearchAndRescueProblem()
base_planner = SearchAndRescuePlanner(search_algo="gbf", heuristic="hff")

def belief_planner(state):
    plan, plan_time = base_planner.get_plan(state)
    return plan

policy = make_belief_space_policy(problem, belief_planner)
state = State()
# Observable
bel = BeliefState(state_map=state.state_map)
s_or_f, final_state, final_bel = agent_loop(problem, state, policy, bel)

# <a id="analysis">4. Analysis</a>

In the following questions, please provide textual answers in the provided boxes. 

* **First-order Logic:** In what way would having access to first-order logic have been helpful in this problem?

<div class="alert alert-info">
Write your answer in the cell below this one.
</div>

--> *(double click on this cell to delete this text and type your answer here)*

* **Heuristics:** The heuristics used in `pyperplan` are *domain-independent*; we can use them for Search and Rescue, for blocks world, etc.  An alternative strategy would be to hand-specify a *domain-specific* heuristic. What would a good domain-specific heuristic for search and rescue look like?

<div class="alert alert-info">
Write your answer in the cell below this one.
</div>

--> *(double click on this cell to delete this text and type your answer here)*

* **Look first:** We do one observation before choosing our first action.  Give an example scenario where omitting this step and using the reckless or two-phase planner would make an error.

<div class="alert alert-info">
Write your answer in the cell below this one.
</div>

--> *(double click on this cell to delete this text and type your answer here)*

* **Replanning:**  Currently, for the policies in 3.1-3.4, we replan whenever executing the next step would be unsafe. That might not be the best replanning strategy. Describe another strategy, give a concrete example of where it would do something differently than the current strategy,
and say what the general trade-offs would be between that one and the current one.

--> *(double click on this cell to delete this text and type your answer here)*

* **Planner Analysis:** Using the `test_policy` function, run each of the following policies:
    * The safe-and-smart planner
    * The reckless planner
    * The belief-space (looks before it leaps) planner

  in each of the following three scenarios:
    * belief_map = P1_B0, true_map = P1_G0
    * belief_map = P1_B1, true_map = P1_G0
    * belief_map = P2_B1, true_map = P2_G0

(These belief maps and true maps are all defined in the utility code at the top of this notebook.) 

In your answer below, don't count "look" as a step.
* How many steps does each policy take to solve each problem?
* Which method takes the fewest steps summed over all three problems?
* Which one would you choose if planning is very expensive compared to execution?
* Which one would you choose if execution is very expensive compared to planning? (In this case, what additional modifications might you make to your method?)

--> *(double click on this cell to delete this text and type your answer here)*

# Submission 

Your final submission to Gradescope should include this notebook, completed with your example runs and your completed text answers.

## Feedback

If you have any feedback for us, please complete [this form](https://forms.gle/COMPLETE ME)!