<a href="https://colab.research.google.com/github/gtbook/robotics/blob/main/S11_intro_state.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

%pip install -U -q gtbook



# Representing State

> Choosing the right representation for state is key to effective reasoning about the effects of actions in the world.

<img src="Figures1/S11-Robot_menagerie-09.jpg" alt="Splash image with robot pondering state" width="40%" align=center style="vertical-align:middle;margin:10px 0px">

In order to reason about the world and about its own actions in the
world, a robot requires some representation of both itself and
the world that it inhabits. For any specific robotic system, the
system designer must decide what to represent, and which
representational scheme should be used. Furthermore, different
representation schemes might be used for different aspects of a
particular problem. For example, if a robot housekeeper is charged with
doing the laundry, reasoning about moving from the bedroom to the
laundry room versus reasoning about folding the laundry would require
fundamentally different types of representations. For the former, a
high-level description of the layout of rooms in the house might
suffice, while for the latter, the robot might need to model articles of
clothing as nonrigid, deformable objects. 
In the chapters that follow,
we will explore a variety of representational schemes, ranging from
high-level, discrete abstractions to low-level continuous
representations.



## Representing the World State

The robot’s information about its environment is generally referred to
as the *world state*. For a chess-playing robot, this might include a
complete list of the positions of all chess pieces on the board. For a
house-keeping robot, the world state might include a map of the house,
locations of furniture, and locations of various household objects. In
both cases, the world state excludes many details about the world that
are not directly relevant to the robot’s objectives. For example, a
house-keeping robot might not need to know the colors of the walls, the
thermostat setting for the bedroom, or the titles of books on a shelf.
The key idea of *world state* is that it should include the information
necessary for the robot to understand the environment well enough to
successfully perform its tasks.

How to represent the world state depends on the kind of information that
is required. High-level, symbolic representations are often sufficient
for the purpose of constructing general robot plans. A classical example
is the STRIPS system for robot planning in a simple *blocks world*.
Figure
<a href="#fig:blocks-world" data-reference-type="ref" data-reference="fig:blocks-world">1</a>
shows a simple example. In STRIPS, the state of the world is represented
by symbolic relations, such as those shown in Figure
<a href="#fig:blocks-world" data-reference-type="ref" data-reference="fig:blocks-world">1</a>b.
This simple set of relations tells us that Blocks A and C are resting on
the table, that Block B rests on the top of Block A, and that nothing
rests atop either Block B or Block C. Equipped with such a description,
STRIPS can create plans, for example, to place Block A atop Block C by
picking up Block B and placing it on the table, then picking up Block A
and placing it atop Block C. This plan excludes any geometric
description of how to perform the task, but it provides a high-level
description of the sub-tasks that the robot must execute to accomplish
its end goal. In later chapters, we will see how this kind of planning
works in detail.
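
The symbolic description above can be sketched in a few lines of Python. This is only a minimal illustration, not the actual STRIPS system: the predicate names `On` and `Clear` and the helper `move` are assumptions made for this example, chosen to mirror the relations described in the text.

```python
# Symbolic world state for the blocks-world scene described in the text.
# The predicate names "On" and "Clear" are illustrative assumptions,
# not necessarily the exact relations used by STRIPS or in the figure.
state = {
    ("On", "A", "Table"),
    ("On", "C", "Table"),
    ("On", "B", "A"),
    ("Clear", "B"),
    ("Clear", "C"),
}


def move(state, block, source, target):
    """Return the state that results from moving `block` from `source` onto `target`."""
    # Preconditions: the block is on the source and nothing rests on it;
    # the target is either the table or a block with nothing on it.
    assert ("On", block, source) in state and ("Clear", block) in state
    assert target == "Table" or ("Clear", target) in state
    new_state = set(state)
    new_state.remove(("On", block, source))
    new_state.add(("On", block, target))
    if source != "Table":
        new_state.add(("Clear", source))      # the source block is now uncovered
    if target != "Table":
        new_state.remove(("Clear", target))   # the target block is now covered
    return new_state


# The plan from the text: put B on the table, then put A on C.
after_first = move(state, "B", "A", "Table")
final_state = move(after_first, "A", "Table", "C")
```

Note that the state contains no coordinates at all: the plan is checked and executed purely by manipulating symbols.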

<figure>
<img src="https://github.com/gtbook/robotics/blob/main/Figures1/blocks-world.jpg?raw=1" title="fig:" id="fig:blocks-world" style="width:12.5cm" alt=""/>
<figcaption>An example from the Blocks World. (a) A simple blocks world scene. (b) The symbolic description of the world state.
</figcaption>
</figure>

This kind of high-level, qualitative state description may be useful
for task-level planning, but because it fails to capture any of the
geometric aspects of the environment, it is insufficient once the
robot begins to actually move in, and interact with, its environment.
For mobile robots, it is often sufficient to use a discrete grid to
represent which parts of the environment contain objects, and thus
cannot be traversed by the robot. Because such objects impede the
ability of the robot to move freely, we typically refer to these as
*obstacles*. The prototypical path planning problem in robotics is to
find a path for the robot from its initial location to a specified goal
location, while avoiding collision with any obstacles in the
environment. Figure
<a href="#fig:chap1-occupancy-grid" data-reference-type="ref" data-reference="fig:chap1-occupancy-grid">2</a>(a)
shows an example of an *occupancy grid*, a grid-based map that
explicitly indicates which grid cells are occupied by obstacles. If we
assume that the robot is able to move to any adjacent empty grid cell
that is directly above, below, left, or right of its current location,
planning can be accomplished using graph search methods. Figure
<a href="#fig:chap1-occupancy-grid" data-reference-type="ref" data-reference="fig:chap1-occupancy-grid">2</a>(b)
illustrates a path in the occupancy grid. In later chapters we will
describe how to build an occupancy-grid map from sensor data, how the
robot can determine its location in the map, how graph search algorithms
can be used to construct a path from start to goal, and how the robot
can execute this plan to navigate in its environment.
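
As a rough sketch of how graph search applies to an occupancy grid, the following uses breadth-first search over free cells with the four moves described above. The grid is a made-up example (not the map in the figure), and `shortest_path` is a hypothetical helper name; the search methods actually used in later chapters may differ.

```python
from collections import deque

# A small occupancy grid: 1 marks an obstacle cell, 0 marks a free cell.
# This particular map is a made-up example, not the one in the figure.
grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]


def shortest_path(grid, start, goal):
    """Breadth-first search over free cells using up/down/left/right moves."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through the parents to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # the goal cannot be reached


path = shortest_path(grid, (0, 0), (3, 3))
```

Cells are addressed as `(row, column)` pairs, and the function returns `None` when obstacles separate the start from the goal.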

<figure>
<div class="row">
  <div class="column">
    <img src="https://github.com/gtbook/robotics/blob/main/Figures1/occupancy-grid.jpg?raw=1" title="fig:" id="fig:chap1-occupancy-grid" style="width:5.5cm" alt="" />
    <p class="center">(a)</p>
  </div>
  <div class="column">
    <img src="https://github.com/gtbook/robotics/blob/main/Figures1/occupancy-grid-plan.jpg?raw=1" title="fig:" id="fig:chap1-occupancy-grid" style="width:5.5cm" alt="" />
    <p class="center">(b)</p>
  </div>
</div>
<figcaption>Using an occupancy grid to represent the environment of a mobile robot. (a) Shaded cells are occupied by obstacles. (b) A plan to move from the initial to goal cell in the occupancy grid. 
</figcaption>
</figure>

In some cases, for example if the task requires manipulating objects in the
environment, a more precise geometric description of the state may be
required. Suppose, for example, that a chess-playing robot wishes to
move its king. In this case, for the robot to grasp the king, its
precise location must be known. There are several ways to represent this
kind of geometric information, but the most common is to define a
Cartesian coordinate frame that is *rigidly attached* to the king. This
merely means that the relationship between the king and this coordinate
frame is fixed, and does not change when the king is moved. One such
possible assignment is shown in Figure
<a href="#fig:chap1-king-frame" data-reference-type="ref" data-reference="fig:chap1-king-frame">3</a>.
In order to grasp the king, the robot would perform appropriate
geometric computations to bring its fingers to specific positions
relative to this coordinate frame. This requires, of course, knowing the
precise position and orientation of the king’s coordinate frame relative
to the robot, information that can be obtained using the robot’s
sensors. In later chapters, we will describe in detail the geometric
computations required for this kind of task, as well as how sensors can
be used to determine relevant geometric aspects of the world, including
the positions and orientations of objects in the robot’s work space.
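
To make the role of the attached frame concrete, here is a minimal sketch of one such geometric computation: expressing a point known in the king's frame in the robot's frame. It assumes, purely for simplicity, that the king stands upright so that only a translation and a rotation about the z-axis are involved; the function name and all numbers are hypothetical.

```python
import math


def king_to_robot(point_in_king, king_position, king_yaw):
    """Express a point given in the king's frame in the robot's frame.

    Assumes the king stands upright, so that its frame differs from the
    robot frame only by a translation `king_position` and a rotation
    `king_yaw` (radians) about the shared z-axis.
    """
    x, y, z = point_in_king
    tx, ty, tz = king_position
    c, s = math.cos(king_yaw), math.sin(king_yaw)
    # Rotate about z, then translate.
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)


# Hypothetical numbers: the origin of the king's frame sits at
# (0.30, 0.10, 0.09) m in the robot frame, and the king is turned by 90 degrees.
grasp_point_in_king = (0.0, 0.01, -0.04)   # fixed, no matter where the king is
grasp_point_in_robot = king_to_robot(
    grasp_point_in_king, (0.30, 0.10, 0.09), math.pi / 2
)
```

The grasp point is constant in the king's frame because the frame is rigidly attached; only the sensed position and orientation of the frame change when the piece moves.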

<figure>
<div class="row">
  <div class="column">
    <img src="https://github.com/gtbook/robotics/blob/main/Figures1/king-frame.jpg?raw=1" title="fig:" id="fig:chap1-king-frame" style="height:9cm" alt="" />
    <p class="center">(a)</p>
  </div>
  <div class="column">
    <img src="https://github.com/gtbook/robotics/blob/main/Figures1/king-frame-rotated.jpg?raw=1" title="fig:" id="fig:chap1-king-frame" style="height:9cm" alt="" />
    <p class="center">(b)</p>
  </div>
</div>
<figcaption>A chess piece with an attached Cartesian coordinate frame. (a) The frame origin is located at the center of the cross, the z-axis is aligned with the main axis of the body, and the y-axis lies in the plane containing the cross. The x-axis completes a right-handed coordinate frame. (b) The frame is rigidly attached to the chess piece, and moves when the piece moves.
</figcaption>
</figure>

It is often the case that robots must interact with moving objects
(e.g., parts on a conveyor belt, other robots). For these situations, it
is often advantageous to explicitly include the notion of time in the
representation. For example, a robot that plays table tennis should be
able to estimate not only the instantaneous position of the ball at any
moment in time, but also the trajectory of the ball, thus enabling the
prediction of where the ball will be at future moments in time. In this
case, if we represent the coordinates of the ball by a vector
$x \in \mathbb{R}^3$, we make explicit the dependence on time by writing
$x(t)$. Furthermore, if we are interested also in the velocity of the
ball, we write $\dot{x}(t) = \frac{d}{ dt} x(t)$ to denote the time
derivative of the ball’s position. Under the laws of Newtonian physics,
the position and velocity of the ball at any moment in time completely
determine the future trajectory of the ball (assuming no effects of
wind, etc.). In the vocabulary of physics, the *state* of a system is a
collection of information sufficient to determine the entire future
evolution of the system behavior. For this reason, in physics, one
refers to the pair $(x,\dot{x})$ as the state of the ball. Our use of
the term *state* is somewhat more general than that used to describe
physical systems; however the intuition behind both terms is essentially
the same.

As an example, Figure
<a href="#fig:chap1-projectile" data-reference-type="ref" data-reference="fig:chap1-projectile">4</a>
illustrates basic projectile motion. If the position and velocity of the
ball are both known at either time $t_0$ or at time $t_1$, then it is
possible to predict the position and velocity of the ball at any future
moment in time. Although this example is fairly simple, in many robotics
applications the system’s state is represented using position and
velocity. This idea applies not only to various moving objects in the
environment, but even to the motion of the robot itself. This is the
case for robot arms, whose motion depends on torques generated by
motors, and for unmanned air vehicles (UAVs) such as quadrotors, whose
motion depends on aerodynamic forces generated by spinning propellers.

<figure>
<img src="https://github.com/gtbook/robotics/blob/main/Figures1/projectile.jpg?raw=1" title="fig:" id="fig:chap1-projectile" style="width:12.5cm" alt="" />
<figcaption>The motion of a ball follows the arc of a parabola. Given the position and velocity at any moment in time, it is possible to exactly predict the position and velocity at any future moment in time.
</figcaption>
</figure>

It is often the case that discrete-time representations are preferable
to using continuous time. This is because many algorithms used in
robotics rely on numerical methods to compute solutions. This should not
be surprising, since computer algorithms are by their nature
discrete-time entities. In this case, we use the notation $x_t$ to
denote the value of the state $x$ at time instant $t$. Often, we can
compute exact discrete-time system representations by integrating an
appropriate description of the system dynamics, such as

$$x_{t+1} = x_t + \int_t^{t + 1} \dot{x}(\tau) \, d\tau$$
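
As a worked instance, consider the ball from the projectile example, and assume that it is subject only to a constant gravitational acceleration $g \in \mathbb{R}^3$ and that the time step has unit length. Under these assumptions the velocity during the interval is $\dot{x}(\tau) = \dot{x}_t + g(\tau - t)$, and the integral can be evaluated exactly:

$$x_{t+1} = x_t + \int_t^{t+1} \left( \dot{x}_t + g(\tau - t) \right) d\tau = x_t + \dot{x}_t + \frac{1}{2} g, \qquad \dot{x}_{t+1} = \dot{x}_t + g$$

For more complicated dynamics a closed form of this kind may not exist, in which case the integral would typically be approximated numerically.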

It is often the case that the robot does not have access to a complete
and correct model of its environment. For example, if sensors are used
to determine the world state, there will invariably be errors and
uncertainties associated with the sensor measurements. There are a variety
of ways that one can deal with such uncertainties. We can include
uncertainty explicitly in our representations, e.g., using tools from
probability theory, or we can incorporate uncertainty into our model of
the robot’s actions in the world. We will consider both of these options
in later chapters.
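
As a small illustration of the first option, the sketch below keeps a probability that a single grid cell is occupied and revises it with Bayes' rule after each noisy sensor reading. The sensor model numbers and the function name are hypothetical, chosen only to make the arithmetic concrete.

```python
def update_occupancy(prior, p_hit_given_occupied, p_hit_given_free, hit):
    """Bayes update of the probability that a grid cell is occupied."""
    if hit:
        like_occ, like_free = p_hit_given_occupied, p_hit_given_free
    else:
        like_occ, like_free = 1 - p_hit_given_occupied, 1 - p_hit_given_free
    numerator = like_occ * prior
    return numerator / (numerator + like_free * (1 - prior))


# Hypothetical sensor model: it detects an obstacle 90% of the time when one
# is present, and reports a false detection 20% of the time when the cell is free.
belief = 0.5                                   # no prior knowledge about the cell
belief = update_occupancy(belief, 0.9, 0.2, hit=True)
belief_after_two = update_occupancy(belief, 0.9, 0.2, hit=True)
```

Instead of a binary occupied/free label, the world state now stores a number between 0 and 1 for the cell, and repeated consistent readings push that number toward certainty.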


## Representing the Robot's State

While the robot is, technically, an object within the world, it enjoys
the special status of being able to act in the world to effect changes.
Furthermore, the robot has direct control over its own actions, unlike
obstacles or other actors in the world, over which the robot has, at
best, indirect control. Therefore, rather than merely incorporate
information about the robot into the world state, we typically represent
the robot state separately, using representations that are specifically
developed for modeling the robot’s geometry, dynamics, and manipulation
capabilities.

The most basic information about a robot’s state is merely a description
of the robot’s location in its environment. Four examples are shown in
Figure
<a href="#fig:chap1-four-robots" data-reference-type="ref" data-reference="fig:chap1-four-robots">5</a>.
For a vacuum cleaning robot, Figure
<a href="#fig:chap1-four-robots" data-reference-type="ref" data-reference="fig:chap1-four-robots">5</a>a,
this could be a set of $x,y$ coordinates of the robot’s centroid with
respect to a floor plan of the house. For a robot arm, Figure
<a href="#fig:chap1-four-robots" data-reference-type="ref" data-reference="fig:chap1-four-robots">5</a>b,
we might specify a vector of joint angles. For a UAV, Figure
<a href="#fig:chap1-four-robots" data-reference-type="ref" data-reference="fig:chap1-four-robots">5</a>c,
we might specify the $x,y,z$ coordinates of the vehicle, along with its
orientation. For a humanoid robot, Figure
<a href="#fig:chap1-four-robots" data-reference-type="ref" data-reference="fig:chap1-four-robots">5</a>d,
we might specify the position and orientation of the torso’s centroid,
along with a vector of angles for each of the robot’s joints. In
robotics, all of these correspond to the general notion of a robot’s
*configuration*. More precisely, a configuration $q$ of the robot
provides a complete specification of the location of every point on the
robot with respect to a reference frame. The set of all configurations
is called the *configuration space*, and is denoted by $\mathcal{Q}$.

<figure>
<img src="https://github.com/gtbook/robotics/blob/main/Figures1/four-robot-cspaces.jpg?raw=1" title="fig:" id="fig:chap1-four-robots" style="width:12.5cm" alt="" />
<figcaption>Configurations for four different robots. (a) A vacuum cleaning robot whose configuration is specified by x,y coordinates in the world frame. (b) A two-link robot arm, whose configuration is specified by its two joint angles (<i>&theta;<sub>1</sub>,&theta;<sub>2</sub></i>). (c) A quadrotor, whose configuration is specified by three position parameters x,y,z and three orientation parameters <i>&phi;,&theta;,&psi;</i>. (d) A humanoid robot whose configuration is specified by the position and orientation of the body-attached frame <i>x,y,z,&phi;,&theta;,&psi;</i> and the joint angles in the legs <i>&theta;<sub>1</sub>, &theta;<sub>2</sub>, &theta;<sub>3</sub>, &theta;<sub>4</sub></i>.
</figcaption>
</figure>

A common way to define a robot’s configuration is to rigidly attach a
coordinate frame to each component of the robot that can move, and to
then specify the position and orientation of each of these frames. Of
the examples given above, we can consider the vacuum cleaning robot and
the UAV to be single rigid objects. For the vacuum cleaning robot,
assuming that the orientation of the robot is not of concern, we would
have $q = (x,y)$; for a UAV we might use $q = (x,y,z,\phi,\theta,\psi)$,
where the latter three parameters specify the orientation of the three
axes of the UAV’s body-attached coordinate frame with respect to a
reference frame.

For many robots, motion of the individual components is constrained by
the design of the mechanical system. For example, the motion of any link
in a robot arm is determined by the rotation of a single motor that
connects this link to the previous link. In such cases, a single
parameter (in this case a joint angle) is sufficient to specify the
configuration of the link, and the configuration of the entire robot arm
can be specified by $q = (\theta_1, \dots , \theta_n)$, in the case of
an arm with $n$ revolute joints (i.e., each joint is driven by a
rotating motor). More complex mechanisms require a combination of these
methods. For example, the configuration of a humanoid robot might be
represented by
$q = (x,y,z,\phi,\theta,\psi,\theta_1, \dots , \theta_n)$, in which the
first six parameters specify the position and orientation of a
body-attached frame (e.g., located at the centroid of the torso), and
$\theta_1, \dots, \theta_n$ specify the angles for the individual
joints of the robot. In later chapters, we will investigate the
configuration spaces for a variety of robots, from simple wheeled mobile
robots to more complex mobile manipulators composed of moving platforms
with an attached manipulator arm.
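
To see how a vector of joint angles determines where the points of a robot are, consider a planar two-link arm with revolute joints. The sketch below computes the positions of the elbow and the tip from $q = (\theta_1, \theta_2)$; the link lengths, the convention that $\theta_2$ is measured relative to the first link, and the function name are assumptions made for this example.

```python
import math


def two_link_points(q, link_lengths=(1.0, 0.5)):
    """Return the elbow and tip positions of a planar two-link arm.

    `q = (theta1, theta2)` are joint angles in radians; theta2 is measured
    relative to the first link. The link lengths are illustrative assumptions.
    """
    theta1, theta2 = q
    l1, l2 = link_lengths
    elbow = (l1 * math.cos(theta1), l1 * math.sin(theta1))
    tip = (
        elbow[0] + l2 * math.cos(theta1 + theta2),
        elbow[1] + l2 * math.sin(theta1 + theta2),
    )
    return elbow, tip


# First link pointing straight up, second link bent back to horizontal.
elbow, tip = two_link_points((math.pi / 2, -math.pi / 2))
```

Every other point on either link lies on a segment between the base, the elbow, and the tip, so two numbers are enough to locate the whole arm.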

The configuration of a robot answers the question of where the robot is
at a specific instant in time. If we wish instead to describe the motion
of a robot, we must consider the configuration to be time varying, and
in this case both the configuration and its time derivative (a velocity)
are relevant. As was the case above for moving objects, we often package
the configuration and its time derivative into a single vector

$$x(t) = \left[ \begin{array}{c} q(t) \\ \dot{q}(t) \end{array}\right]$$

In many disciplines related to robotics, $x$ is referred to as the
state. This is particularly true in the areas of dynamical systems and
control theory. In this text, we will maintain a more general use of the
term *state*, but when relevant, we will adopt discipline-appropriate
terminology (e.g., when describing how to control a robot’s motion).
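
In code, packaging the configuration and its time derivative often amounts to simple concatenation. The following is a minimal sketch with a hypothetical helper `make_state` and made-up values for a two-joint arm.

```python
def make_state(q, q_dot):
    """Stack a configuration and its time derivative into one state vector."""
    if len(q) != len(q_dot):
        raise ValueError("q and q_dot must have the same number of entries")
    return list(q) + list(q_dot)


# Hypothetical two-joint arm: joint angles (rad) and joint velocities (rad/s).
x = make_state(q=(0.3, -1.2), q_dot=(0.0, 0.5))

# The two halves can be recovered by slicing.
n = len(x) // 2
q, q_dot = x[:n], x[n:]
```

The state of a robot with an $n$-dimensional configuration therefore has $2n$ entries.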

In many applications, position and velocity provide a sufficiently
detailed description of robot motion. This is not true, however, when we
must explicitly consider forces that affect the robot’s motion. For
example, we can essentially regard a vacuum cleaning robot as a device
that responds to position and velocity commands: we issue a command to
the robot to move to a certain position at a certain velocity, and the
robot has no difficulty in executing this command. There are, however,
numerous applications in which simple geometric descriptions of robot
motion are not adequate. Consider for example the case of a quadrotor
that maneuvers by exploiting aerodynamic forces, or a humanoid robot
whose locomotion depends on interaction forces between its feet and the
ground. In these cases, we typically describe the motion in terms of
position, velocity, and acceleration, i.e., in terms of $x$ and
$\dot{x}$, or, making the configuration and its derivatives explicit,
in terms of $q$, $\dot{q}$, and $\ddot{q}$.
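
The role of acceleration can be illustrated with a deliberately simplified sketch of a quadrotor's vertical motion. It assumes motion along the z-axis only, a total thrust pointing straight up, no drag, and a simple Euler integration step; the function name, mass, and time step are hypothetical.

```python
G = 9.81  # gravitational acceleration, m/s^2


def vertical_step(z, z_dot, thrust, mass, dt):
    """One Euler step of a quadrotor's vertical motion under thrust and gravity.

    A simplified, hypothetical model: motion along z only, total propeller
    thrust pointing straight up, no drag.
    """
    z_ddot = thrust / mass - G          # Newton's second law gives the acceleration
    return z + z_dot * dt, z_dot + z_ddot * dt, z_ddot


mass = 0.5  # kg, made-up value

# With thrust exactly balancing weight, a hovering vehicle stays in place.
hover = vertical_step(1.0, 0.0, thrust=mass * G, mass=mass, dt=0.01)

# Doubling the thrust produces an upward acceleration and a growing velocity.
climb = vertical_step(1.0, 0.0, thrust=2 * mass * G, mass=mass, dt=0.01)
```

Here the command is a force rather than a position or velocity, so the acceleration $\ddot{q}$ has to be computed before the state can be advanced.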