<a href="https://colab.research.google.com/github/gtbook/robotics/blob/main/S30_vacuum_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%pip install -q -U gtbook




In [2]:
import numpy as np
import plotly.express as px
import gtsam

from gtbook.discrete import Variables
from gtbook.display import show, pretty


# A Robot Vacuum Cleaner

> A vacuum cleaning robot can be used to model reasoning over time about probabilistic actions in a discrete state space.

<img src="Figures3/S30-iRobot_vacuuming_robot-08.jpg" alt="Splash image with a robot that vaguely looks like a vacuum" width="60%" align=center style="vertical-align:middle;margin:10px 0px">

The second robot we will discuss is a vacuuming robot. We assume that this robot is equipped with software that can perform navigation, motion planning, and motion control. This assumption allows us to focus on high-level problems (e.g., deciding which room to clean next) without worrying for now about low-level details (e.g., planning specific paths
to cover a particular room, or navigating through a doorway).

There are a number of differences between this robot and the trash sorting robot of the previous chapter.
First, the effects of actions depend on the current world state; if a robot is in the living room and moves to
its left, it will arrive at a different location than if it had started in the office.
Second, the actions executed by the vacuum cleaning robot have uncertain effects.
This is quite different from the actions of the trash sorting robot, which achieved its goals deterministically,
regardless of the current state (the "move object to the metal bin" action moves an object to the metal
bin, regardless of the category of the object, and with 100% reliability).
Third, because the effects of actions depend on state, achieving goals in the future will depend
on the actions the robot executes now (since current actions affect future states).
Therefore, this robot must consider how the world state evolves with the passing of time.

In this chapter, we will learn about probabilistic outcomes of actions, which we can model with conditional probability distributions, just like we did with the measurements in the previous chapter.
We will learn how to propagate uncertainty forward in time for specific sequences of actions,
and how to generate sample trajectories from the corresponding probability distributions.
For our vacuum robot, states correspond to rooms in the house and trajectories
correspond to the robot moving from room to room.
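
As a rough illustration of these two ideas, the sketch below pushes a distribution over rooms forward through a fixed action sequence and then samples one trajectory. Everything in it is an assumption made for illustration: the three-room layout, the action names `"L"` and `"R"`, and all probabilities are made up, and plain NumPy is used rather than any particular library's API.

```python
import numpy as np

# Hypothetical three-room world; names and probabilities are illustrative only.
rooms = ["Living Room", "Hallway", "Office"]

# T[a][i, j] = P(next room = j | current room = i, action = a); each row sums to 1.
T = {
    "L": np.array([[1.0, 0.0, 0.0],
                   [0.8, 0.2, 0.0],
                   [0.0, 0.8, 0.2]]),
    "R": np.array([[0.2, 0.8, 0.0],
                   [0.0, 0.2, 0.8],
                   [0.0, 0.0, 1.0]]),
}

def propagate(belief, actions):
    """Push a distribution over rooms forward through a sequence of actions."""
    for a in actions:
        belief = belief @ T[a]  # sum over the current room
    return belief

def sample_trajectory(start, actions, rng):
    """Sample one room-to-room trajectory for a given action sequence."""
    states = [start]
    for a in actions:
        # The row of T[a] for the current room is the distribution over next rooms.
        states.append(int(rng.choice(len(rooms), p=T[a][states[-1]])))
    return states

belief0 = np.array([1.0, 0.0, 0.0])  # the robot starts in the living room
belief2 = propagate(belief0, ["R", "R"])
trajectory = sample_trajectory(0, ["R", "R"], np.random.default_rng(0))

print(dict(zip(rooms, belief2.round(3))))
print([rooms[s] for s in trajectory])
```

Note that the same action uses a different row of the table depending on the starting room, which is how the state dependence of action effects shows up in this representation.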

Sensing will be quite limited in this chapter. 
We will use a simple (discrete) light sensor.
However, because sensor measurements depend on state, and because state depends on
the sequence of actions that has been executed, perception becomes a more interesting
problem.
While our trash sorting robot relied on simple MLE or MAP estimation using the current sensor reading,
our vacuum cleaning robot will combine knowledge about the history of its actions (which have uncertain
effects) with the sequence of sensor measurements.
We will solve this perception problem using **Hidden Markov Models (HMMs)**.
An HMM is a probabilistic model for sensing over time.
While HMMs initially gained popularity in the field of speech recognition,
we will use them to find the most probable sequence of states given a sequence of measurements and actions.
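
To make "most probable sequence of states" concrete, here is a minimal sketch of the standard Viterbi recursion written in plain NumPy. The three rooms, the single action `"R"`, the two-valued light sensor (`"dark"`/`"light"`), and all probabilities are hypothetical values chosen for illustration, not the model developed in this chapter.

```python
import numpy as np

# Hypothetical model; all names and numbers are illustrative only.
rooms = ["Living Room", "Hallway", "Office"]
# T[a][i, j] = P(next room = j | current room = i, action = a)
T = {"R": np.array([[0.2, 0.8, 0.0],
                    [0.0, 0.2, 0.8],
                    [0.0, 0.0, 1.0]])}
# Z[i, z] = P(light reading = z | room = i), columns ordered (dark, light)
Z = np.array([[0.1, 0.9],
              [0.8, 0.2],
              [0.3, 0.7]])
readings = {"dark": 0, "light": 1}

def viterbi(prior, actions, measurements):
    """Most probable room sequence.

    measurements[k] is observed in state k; actions[k] moves state k to k+1.
    """
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
        logT = {a: np.log(M) for a, M in T.items()}
        logZ = np.log(Z)
        score = np.log(prior) + logZ[:, readings[measurements[0]]]
    back = []
    for a, z in zip(actions, measurements[1:]):
        cand = score[:, None] + logT[a]   # cand[i, j]: best path ending in i, then i -> j
        back.append(cand.argmax(axis=0))  # best predecessor for each j
        score = cand.max(axis=0) + logZ[:, readings[z]]
    # Backtrack from the best final state.
    path = [int(score.argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

prior = np.full(3, 1 / 3)
path = viterbi(prior, ["R", "R"], ["light", "dark", "light"])
print([rooms[s] for s in path])
```

Working in log space avoids numerical underflow on long sequences; transitions with zero probability become `-inf` and are never selected.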

Planning is also more interesting for our vacuum cleaning robot.
Instead of choosing a single action to minimize an immediate cost,
we must reason about sequences of actions that occur over time.
To do this, we will introduce **Markov Decision Processes (MDPs)**. 
An MDP adds the notion of *reward*, which will allow us to reason about optimal actions.
We will even be able to deduce an optimal *policy*, i.e., a recipe for what to do in each state
to maximize the aggregate reward over time.
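
One common way to compute such a policy is value iteration, sketched below in plain NumPy. The three-room model, the reward of 10 for arriving in the office, and the discount factor `gamma = 0.9` are all assumptions made for this example; in particular, discounting is just one possible way to aggregate reward over time.

```python
import numpy as np

# Hypothetical MDP; all names and numbers are illustrative only.
rooms = ["Living Room", "Hallway", "Office"]
actions = ["L", "R"]
# T[a, i, j] = P(next room = j | current room = i, action = a)
T = np.array([
    [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.0, 0.8, 0.2]],  # "L"
    [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.0, 0.0, 1.0]],  # "R"
])
R = np.array([0.0, 0.0, 10.0])  # assumed reward for arriving in each room
gamma = 0.9                     # assumed discount factor

def value_iteration(T, R, gamma, tol=1e-9):
    """Return the (approximately) optimal value function and a greedy policy."""
    V = np.zeros(T.shape[1])
    while True:
        # Q[a, i] = sum_j T[a, i, j] * (R[j] + gamma * V[j])
        Q = T @ (R + gamma * V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

V, policy = value_iteration(T, R, gamma)
for room, a, v in zip(rooms, policy, V):
    print(f"{room}: take action {actions[a]} (value {v:.2f})")
```

The returned `policy` array is exactly the kind of recipe described above: one action index per state.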

Finally, we will introduce the notion of reinforcement learning, where we will estimate the parameters of an MDP from data gathered during the robot's normal operation.
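
As one simple illustration of estimating MDP parameters from experience, the sketch below estimates transition probabilities by counting observed `(state, action, next_state)` triples and normalizing. The helper `estimate_transitions`, the logged experience, and the uniform fallback for unvisited state-action pairs are all hypothetical choices made for this example; reinforcement learning methods in general need not estimate the model explicitly.

```python
import numpy as np

def estimate_transitions(experience, n_states, n_actions):
    """Estimate T[a, s, s'] from logged (s, a, s') triples by relative frequency."""
    counts = np.zeros((n_actions, n_states, n_states))
    for s, a, s_next in experience:
        counts[a, s, s_next] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # State-action pairs that were never tried fall back to a uniform guess.
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)

# Made-up log: states and actions are integer indices.
experience = [(0, 1, 1), (0, 1, 1), (0, 1, 0), (0, 1, 1), (1, 1, 2)]
T_hat = estimate_transitions(experience, n_states=3, n_actions=2)
print(T_hat[1, 0])  # estimated distribution over next states for state 0, action 1
```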