<a href="https://colab.research.google.com/github/gtbook/robotics/blob/main/S25_sorter_decision_theory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%pip install -q -U gtbook

Note: you may need to restart the kernel to use updated packages.


In [2]:
from gtbook.discrete import Variables
import numpy as np
import pandas as pd
import gtsam
import plotly.express as px
try:
    import google.colab
except ImportError:
    import plotly.io as pio
    pio.renderers.default = "png"


# Decision Theory
> TODO(tweet)


Planning is super simple in this problem.  At each time instant, we make a decision about what to do, and do it.
There's no coupling between actions, no dependence on the current action for ensuring success at future stages.
Thus, planning reduces to nothing more than simple decision making: at this moment, based on the best information
I have, what single action should I execute? We then repeat this each time a new item of trash arrives.


## Evaluating the Cost of Applying an Action

> When the world is uncertain, choosing actions is a game of chance. Probability provides some useful tools. The concept of *Expectation* is key.

Uncertainty can arise from many sources.
For the moment, consider the extreme case in which our trash sorting robot is not equipped with any sensing capabilities.
How can this robot make a decision about how to act in the world? 
Without sensing, the best information that is available is the prior knowledge the robot has about the world.
In our case, this knowledge is encoded as the prior probability $P(C)$ over the categories of trash.
Looking back at the table of priors in the previous sections, we see that paper occurs about 30% of the time
and that cardboard occurs about 20% of the time, meaning that the paper bin is the appropriate destination
about 50% of the time.
We could adopt a simple decision rule for selecting actions:
*Choose the action that maximizes the prior probability of making the right choice.*
This would result in the robot always putting trash in the paper bin.
If we do this, we expect that the robot will do the right thing around 50% of the time. This isn't great, but it's better than any other action, given the typical distribution of categories of trash.
This approach, however, takes no account of the cost of wrong actions, which can result in significant problems.
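
The rule above can be sketched in a few lines of NumPy. This is only an illustration: it assumes the prior and cost values used later in this section, and it assumes that a zero entry in the cost table marks the correct bin for a category. The name `p_correct` is introduced here for illustration.

```python
import numpy as np

actions = ["glass bin", "metal bin", "paper bin", "nop"]
# Prior over [cardboard, paper, can, scrap metal, bottle], as used in this section.
prior = np.array([0.20, 0.30, 0.25, 0.20, 0.05])
# Cost table: rows are actions, columns are categories.
cost = np.array([[2, 2, 4, 6, 0],
                 [1, 1, 0, 0, 2],
                 [0, 0, 5, 10, 3],
                 [1, 1, 1, 1, 1]])

# Assumption: an action is "right" for a category when its cost is zero.
# Summing the prior over those categories gives the probability of being right.
p_correct = (cost == 0).astype(float) @ prior

best_action = actions[int(np.argmax(p_correct))]
print(dict(zip(actions, p_correct.round(2))), best_action)
```

Under these assumptions the paper bin comes out on top with probability 0.5, while *nop* is never the right choice because it has no zero-cost entry.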

One way to account for the costs of actions would be to apply an action that minimizes the worst-case cost.
This provides a quantitative upper bound on how badly things could go.
We can express this formally as
$$a^* = \arg \min_{a_i} \max_{c \in \Omega} \mathrm{cost}(a_i,c).$$
From the cost table introduced earlier (and reproduced below), we see that this approach leads
to always executing the *nop* action, since the worst-case cost for this action is 1,
while the worst-case costs for the other three actions are 6, 2, and 10.
This approach, however, makes no sense at all. It merely reduces our trash sorting
system to a conveyor belt that allows all items of trash to pass through, unsorted.
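
As a quick check of the worst-case rule, the following sketch takes the row-wise maximum of the cost table used in this section and then picks the action with the smallest maximum. The variable names `worst_case` and `minimax_action` are introduced here for illustration.

```python
import numpy as np

actions = ["glass bin", "metal bin", "paper bin", "nop"]
# Cost table: rows are actions, columns are
# [cardboard, paper, can, scrap metal, bottle].
cost = np.array([[2, 2, 4, 6, 0],
                 [1, 1, 0, 0, 2],
                 [0, 0, 5, 10, 3],
                 [1, 1, 1, 1, 1]])

# Inner max: the worst cost each action can incur over all categories.
worst_case = cost.max(axis=1)

# Outer argmin: the action whose worst case is least bad.
minimax_action = actions[int(np.argmin(worst_case))]
print(worst_case, minimax_action)
```

Note that the prior never enters this computation, which is one way to see why the rule ignores how often each category actually occurs.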

What is the right way to approach this problem? The ideal decision would always minimize the cost of executing the action, but because our knowledge of the world is uncertain (captured in the prior probability distribution),
it is impossible to know which action this would be.
In such cases, the concept of *expectation* from probability theory provides a principled way to reason about decisions.

The idea of expected cost is this: if we were to perform the action many times, what would we expect the average cost over those many actions to be?
Let us denote the cost of applying action $a$ to trash category $c$ by $\mathrm{cost}(a,c)$. The expected value of the cost of applying action $a$
is simply the weighted average of the costs $\mathrm{cost}(a,c)$, where the weights are exactly the prior probabilities assigned to the categories $c$:

$$E[\mathrm{cost}(a, C)] = \sum_{c \in \Omega} \mathrm{cost}(a,c) P(c)$$


In the equation above, the notation $E[\mathrm{cost}(a, C)]$ denotes the expected cost of executing the action $a$, with the expectation being taken with respect to the randomly occurring trash category.
We use upper case $C$ to indicate that the category is a random quantity, and that the expectation
should be computed with respect to the probability distribution on categories
(i.e., the priors given in the previous section).
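
The "many repetitions" reading of expectation can be checked numerically. The sketch below is a simple Monte Carlo simulation, not part of the original notebook: it draws a large number of trash categories from the prior used in this section, averages the cost each action would have incurred, and compares that with the weighted sum. The sample size and seed are arbitrary choices.

```python
import numpy as np

# Prior over [cardboard, paper, can, scrap metal, bottle], as used in this section.
prior = np.array([0.20, 0.30, 0.25, 0.20, 0.05])
# Cost table: rows are [glass bin, metal bin, paper bin, nop].
cost = np.array([[2, 2, 4, 6, 0],
                 [1, 1, 0, 0, 2],
                 [0, 0, 5, 10, 3],
                 [1, 1, 1, 1, 1]])

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
num_items = 200_000             # arbitrary, just "many"

# Sample category indices according to the prior.
samples = rng.choice(len(prior), size=num_items, p=prior)

# Average cost per action if that action were applied to every sampled item.
average_cost = cost[:, samples].mean(axis=1)

# Weighted average from the definition of expectation.
expected_cost = cost @ prior

print("simulated average:", average_cost.round(2))
print("weighted average: ", expected_cost.round(2))
```

The two rows should agree closely, and the agreement tightens as `num_items` grows.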





In [3]:
# as before, in S12:
categories = ["cardboard", "paper", "can", "scrap metal", "bottle"]
actions = ["glass bin", "metal bin", "paper bin", "nop"]
cost = np.array([[2,  2,  4,  6,  0],
                 [1,  1,  0,  0,  2],
                 [0,  0,  5, 10,  3],
                 [1,  1,  1,  1,  1]])
pd.DataFrame(cost, index=actions, columns=categories)

| |cardboard|paper|can|scrap metal|bottle|
|:-:|:-:|:-:|:-:|:-:|:-:|
|glass bin|2|2|4|6|0|
|metal bin|1|1|0|0|2|
|paper bin|0|0|5|10|3|
|nop|1|1|1|1|1|


In [4]:
variables = Variables()
Category = variables.discrete("Category", categories)
category_prior = gtsam.DiscretePrior(Category, "200/300/250/200/50")
category_prior

 *P(0)*:

|0|value|
|:-:|:-:|
|0|0.2|
|1|0.3|
|2|0.25|
|3|0.2|
|4|0.05|



With this, we can calculate the expected cost of taking action "put in glass bin":

$E[\mathrm{cost}(a_1, C)] = 2 \times 0.2 + 2 \times 0.3 + 4 \times 0.25 + 6 \times 0.2 + 0 \times 0.05 = 3.2$

If we apply this to each action, we arrive at the following table of expected costs for the actions:

In [5]:
cost @ category_prior.pmf()

array([3.2, 0.6, 3.4, 1. ])


Using the table above, it is clear that the optimal action given only the prior distribution on
categories is to always place the trash item in the metal bin.
If we wish to improve upon this, it will be necessary for the robot to somehow improve
its knowledge of the world state.  This can be done by using sensors to measure various
properties of the world, and then drawing inferences about the world state using these measurements.
We now turn our attention to this problem.

## Minimum cost
A posterior probability over the possible classes does not, by itself, tell us what action to take. With decision theory, we can associate a cost with making the right or wrong decision, and then choose the action that minimizes the expected cost under the posterior:

$$a^*(Z=z) = \arg \min_a \sum_c \mathrm{cost}(a,c) P(C=c|Z=z)$$

In Python, we just calculate the expected cost, now using the posterior instead of the prior, and take the argmin using `np.argmin`:

In [6]:
posterior = np.array([0.2, 0.3, 0.1, 0.1, 0.3])
expected_cost = cost @ posterior
optimal_action = np.argmin(expected_cost)
print(f"expected_cost={expected_cost}, optimal action={optimal_action}")

expected_cost=[2.  1.1 2.4 1. ], optimal action=3
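
To make the result easier to read, the same computation can be wrapped in a small helper that returns the action name rather than its index. This is a sketch only: `choose_action` is a hypothetical helper, and the second posterior is made-up data standing in for a measurement that points strongly to a bottle.

```python
import numpy as np

actions = ["glass bin", "metal bin", "paper bin", "nop"]
# Cost table: rows are actions, columns are
# [cardboard, paper, can, scrap metal, bottle].
cost = np.array([[2, 2, 4, 6, 0],
                 [1, 1, 0, 0, 2],
                 [0, 0, 5, 10, 3],
                 [1, 1, 1, 1, 1]])

def choose_action(posterior):
    """Return the minimum-expected-cost action name and all expected costs."""
    expected_cost = cost @ np.asarray(posterior, dtype=float)
    return actions[int(np.argmin(expected_cost))], expected_cost

# The posterior from the cell above: no category stands out.
name, _ = choose_action([0.2, 0.3, 0.1, 0.1, 0.3])

# Hypothetical posterior that is confident the item is a bottle.
confident_name, _ = choose_action([0.02, 0.03, 0.02, 0.03, 0.90])

print(name, confident_name)
```

With the ambiguous posterior the cheapest option is to do nothing, whereas a confident posterior makes the glass bin the minimum-cost choice.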
