# An Introduction to Decision Theory

This chapter of the course presents a practical example (math and code included) and introduces the elements of decision theory, all of which are needed to practice it. Most readers will be familiar with situations where the correct choice is not known or obvious. A rational decision-maker has a number of actions available in such a situation, and depending on what is really going on (the state of nature), the decision-maker experiences a certain amount of gain or loss. They may then perform experiments to inform their decisions and devise a strategy that is optimal for them.

## States of Nature, Actions, and Loss

Imagine an electrical contractor who has to decide how to wire new houses. They can choose 15-, 20-, or 30-amp wire. The higher the capacity of the wire, the more it costs the contractor. However, if the family moving into the new house ever exceeds the rating of the wire, the wiring will burn out and the contractor will have to bear the cost of rewiring the house. Families constantly complaining about the work would also damage the contractor's relationship with their business partners.

Taking this information, they construct a table with the loss that their business would experience depending on their actions and the peak amperage of a particular family. Columns represent the set of actions available to the electrical contractor, $A$. Rows represent the set of possible states of nature that a particular family can have, $\theta$.

#### Table 1: Loss Given State of Nature and Action Taken
| | | Actions | |
|:-----------:|:------------:|:--:|:--:|
| **States of Nature** | 15-amp wire | 20-amp wire | 30-amp wire |
| 15-amp peak | 1 | 2 | 3 |
| 20-amp peak | 5 | 2 | 3 |
| 30-amp peak | 7 | 6 | 3 |

In [1]:
import numpy as np
loss = np.array([
    [1, 2, 3],
    [5, 2, 3],
    [7, 6, 3]
])

## The Experiment

After deciding on the cost of each possible outcome, the contractor decides they can reduce the amount of risk they take on by giving families questionnaires. Suppose that the contractor asks each family to read the following question and check exactly one answer.

*How much electricity would you say your family draws on average?*

* 10-amp _____
* 12-amp _____
* 15-amp _____
* 20-amp _____

After doing this for some time, the electrical contractor learns the approximate frequency of each answer given a family's peak amperage.

#### Table 2: Conditional Probability of Questionnaire Answers Given State of Nature
| | | Questionnaire Outcomes |  | |
|:----:|:--------:|:--------:|:--------:|:--------:|
| **States of Nature** | 10-amp answer | 12-amp answer | 15-amp answer | 20-amp answer |
| 15-amp peak | 1/2 | 1/2 | 0 | 0 |
| 20-amp peak | 0 | 1/2 | 1/2 | 0 |
| 30-amp peak | 0 | 0 | 1/3 | 2/3 |

Notice that each row is a conditional distribution that sums to 1: the probability of a family giving each answer given their state of nature.

In [2]:
# We call this the "frequency of response table"
x_given_theta = np.array([
    [1/2, 1/2, 0,   0],
    [0,   1/2, 1/2, 0],
    [0,   0,   1/3, 2/3]
]) # P(x|theta): rows are states of nature, columns are questionnaire answers
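
The claim that each row of Table 2 is a probability distribution can be checked numerically. The following is a minimal sketch; the names `row_sums` and `rows_are_distributions` are introduced here for illustration only.

```python
import numpy as np

# Table 2: rows are states of nature, columns are questionnaire answers.
x_given_theta = np.array([
    [1/2, 1/2, 0,   0],
    [0,   1/2, 1/2, 0],
    [0,   0,   1/3, 2/3],
])

# Each row should be non-negative and sum to 1 (up to floating-point error).
row_sums = x_given_theta.sum(axis=1)
rows_are_distributions = bool(
    np.allclose(row_sums, 1.0) and (x_given_theta >= 0).all()
)
print(rows_are_distributions)
```

`np.allclose` is used instead of an exact comparison because 1/3 and 2/3 are not represented exactly in floating point.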

## Strategies, Average Losses, and Expected Loss

The electrical contractor now has a tool for acquiring information, so they can determine the best action given the result of the family's questionnaire. In other words, depending on the family's answer, the contractor can make some statement about the family's expected peak consumption and what they might expect to lose for any action they take. The contractor may select any action, but through experience and preference they will likely settle on a specific *strategy*: a rule that assigns one action to each possible questionnaire result.

In this case, there are 4 possible outcomes to the experiment and 3 possible actions to take for each outcome, which leads to $3^4 = 81$ total possible strategies. The set of strategies will be notated as $S$. Each strategy will be written as $s = (a_i, a_j, a_k, a_l)$, with one action per questionnaire answer. For example, $s_1 = (15,15,20,30)$ is notation for Strategy 1, which is:
* When the family responds with 10-amp average, wire the house with 15-amp wire
* When the family responds with 12-amp average, wire the house with 15-amp wire
* When the family responds with 15-amp average, wire the house with 20-amp wire
* When the family responds with 20-amp average, wire the house with 30-amp wire

In contrast, $s_2 = (30,30,20,15)$ tells us that Strategy 2 is:
* When the family responds with 10-amp average, wire the house with 30-amp wire
* When the family responds with 12-amp average, wire the house with 30-amp wire
* When the family responds with 15-amp average, wire the house with 20-amp wire
* When the family responds with 20-amp average, wire the house with 15-amp wire

We will also include a Strategy 3 $s_3 = (15,20,30,30)$ and leave the reader to interpret the notation as above.
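
Because a strategy assigns one of the three wire ratings to each of the four questionnaire answers, the full set $S$ can be enumerated directly. The sketch below is illustrative only; the names `wires`, `n_answers`, and `all_strategies` are assumptions introduced here, not part of the original notebook.

```python
from itertools import product

wires = (15, 20, 30)   # the three actions (wire ratings in amps)
n_answers = 4          # the four questionnaire answers: 10, 12, 15, 20 amps

# Every strategy is a 4-tuple: one wire choice per questionnaire answer.
all_strategies = list(product(wires, repeat=n_answers))

print(len(all_strategies))                  # number of strategies, 3**4
print((15, 15, 20, 30) in all_strategies)   # Strategy 1 is one of them
```

Only three of these strategies are examined in this chapter; the enumeration simply shows how large the space of candidates is.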

In [3]:
strat1 = np.array([0, 0, 1, 2]) # Strategy 1 where the values represent the index of the actions
strat2 = np.array([2, 2, 1, 0]) # Strategy 2 ...
strat3 = np.array([0, 1, 2, 2]) # Strategy 3 ...

Now that we have rules for acting on results from an experiment, how should we decide which strategy is best?

A start is to compute the average loss that the contractor faces from each state of nature for each of the strategies. For each state of nature and strategy, Table 2 gives the probability of taking each action (the *action probabilities*). For example, if the state of nature is a 20-amp peak and we apply Strategy 1, the family answers 12-amp or 15-amp with probability 0.5 each, so we take the 15-amp wire action with 50 percent probability and the 20-amp wire action with 50 percent probability. To find the average loss, we look up the losses of these two actions for the 20-amp state of nature in Table 1 (5 and 2), multiply them by their respective probabilities (0.5 and 0.5), and sum the products: $5 \cdot 0.5 + 2 \cdot 0.5 = 3.5$.
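
The worked example for the 20-amp state of nature under Strategy 1 can be reproduced step by step. This is a minimal sketch: `theta`, `action_probs`, and `avg_loss` are names introduced here for illustration, and `np.bincount` is just one convenient way to add up the answer probabilities that lead to the same action.

```python
import numpy as np

# Table 1: rows are states of nature, columns are actions (15, 20, 30-amp wire).
loss = np.array([
    [1, 2, 3],
    [5, 2, 3],
    [7, 6, 3],
])

# Table 2: rows are states of nature, columns are questionnaire answers.
x_given_theta = np.array([
    [1/2, 1/2, 0,   0],
    [0,   1/2, 1/2, 0],
    [0,   0,   1/3, 2/3],
])

strat1 = np.array([0, 0, 1, 2])  # Strategy 1 as action indices per answer
theta = 1                        # row index of the 20-amp peak state

# Sum the probabilities of all answers that map to the same action.
action_probs = np.bincount(
    strat1, weights=x_given_theta[theta], minlength=loss.shape[1]
)

# Weight each action's loss by its probability and add them up.
avg_loss = action_probs @ loss[theta]

print(action_probs)
print(avg_loss)
```

The action probabilities come out as 0.5 for 15-amp wire, 0.5 for 20-amp wire, and 0 for 30-amp wire, giving the average loss of 3.5 described above.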

Doing this for each state of nature and strategy gives us Table 3:

#### Table 3: Average Loss of Each Strategy Given State of Nature
| | | Strategies | |
|:---:|:--:|:--:|:--:|
| **States of Nature** | Strategy 1 | Strategy 2 | Strategy 3 |
| 15-amp peak | 1 | 3 | 1.5 |
| 20-amp peak | 3.5 | 2.5 | 2.5 |
| 30-amp peak | 4 | 6.67 | 3 |

In [4]:
avg_losses = []
strats = [strat1, strat2, strat3]
for strat in strats:
    avg_loss = (x_given_theta * loss[:,strat]).sum(axis=1)
    avg_losses.append(avg_loss)
avg_losses = np.array(avg_losses).T
avg_losses

array([[1.        , 3.        , 1.5       ],
       [3.5       , 2.5       , 2.5       ],
       [4.        , 6.66666667, 3.        ]])

Very quickly, we can see that Strategy 2 generates as much or more loss for the contractor than Strategy 3 no matter the state of nature. Therefore, we can say that "$s_3$ dominates $s_2$" and discard $s_2$. Now let's compare Strategies 1 and 3.
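
The dominance comparison can also be written as a small check. The helper `dominates` below is hypothetical (it is not part of the original notebook) and encodes one common definition: a strategy dominates another if it is never worse in any state of nature and strictly better in at least one.

```python
import numpy as np

# Table 3: rows are states of nature, columns are Strategies 1, 2, 3.
avg_losses = np.array([
    [1.0, 3.0,    1.5],
    [3.5, 2.5,    2.5],
    [4.0, 20 / 3, 3.0],
])

def dominates(a, b):
    """Return True if losses `a` are never above `b` and below somewhere."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))

s1, s2, s3 = avg_losses.T  # one column of average losses per strategy

print(dominates(s3, s2))                     # Strategy 3 vs Strategy 2
print(dominates(s3, s1), dominates(s1, s3))  # neither dominates the other
```

Under this definition neither of Strategies 1 and 3 dominates the other, which is why choosing between them needs more than Table 3 alone.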

#### Table 3a: Average Losses of Strategies 1 and 3
| | Strategies | |
|:---:|:--:|:--:|
| **States of Nature** | Strategy 1 | Strategy 3 |
| 15-amp peak | 1 | 1.5 |
| 20-amp peak | 3.5 | 2.5 |
| 30-amp peak | 4 | 3 |

How can we choose between these? The question is difficult if the contractor does not know how frequently each state of nature occurs among their customers. Are they all 15-amp peak families? Are 20 or 30 percent of the families 30-amp families? Supposing the three states of nature are equally likely, the calculations for the average of the average losses (what we will call expected loss) for a strategy look like this:

$$E_{loss}(s_1) = 1 \cdot 1/3 + 3.5 \cdot 1/3 + 4 \cdot 1/3 = 2.83$$
$$E_{loss}(s_3) = 1.5 \cdot 1/3 + 2.5 \cdot 1/3 + 3 \cdot 1/3 = 2.33 $$

so that the expected loss for Strategy 1 is more than the expected loss of Strategy 3. However, if the contractor knew that in their community 90 percent of families were 15-amp families and 10 percent were 20-amp families, the calculations would look like this:

$$E_{loss}(s_1) = 1 \cdot .9 + 3.5 \cdot .1 = 1.25$$
$$E_{loss}(s_3) = 1.5 \cdot .9 + 2.5 \cdot .1 = 1.60 $$

and Strategy 1 would be much more attractive as a result.

In [11]:
freqs1 = np.array([1/3, 1/3, 1/3])
print(f"Strategy 1 Expected Loss: {(avg_losses[:,0] * freqs1).sum() : .2f}")
print(f"Strategy 3 Expected Loss: {(avg_losses[:,2] * freqs1).sum() : .2f}")

Strategy 1 Expected Loss:  2.83
Strategy 3 Expected Loss:  2.33


In [12]:
freqs2 = np.array([.9, .1, 0])
print(f"Strategy 1 Expected Loss: {(avg_losses[:,0] * freqs2).sum() : .2f}")
print(f"Strategy 3 Expected Loss: {(avg_losses[:,2] * freqs2).sum() : .2f}")

Strategy 1 Expected Loss:  1.25
Strategy 3 Expected Loss:  1.60
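
The two calculations above can be folded into a single matrix product that covers every strategy at once. This is a sketch under the same numbers; the dictionary `priors` and its labels are introduced here for illustration only.

```python
import numpy as np

# Table 3: rows are states of nature, columns are Strategies 1, 2, 3.
avg_losses = np.array([
    [1.0, 3.0,    1.5],
    [3.5, 2.5,    2.5],
    [4.0, 20 / 3, 3.0],
])

# Two assumed frequencies of the states of nature (15, 20, 30-amp peak).
priors = {
    "equal": np.array([1/3, 1/3, 1/3]),
    "mostly 15-amp": np.array([0.9, 0.1, 0.0]),
}

# freqs @ avg_losses weights each row by its frequency and sums per strategy.
expected = {name: freqs @ avg_losses for name, freqs in priors.items()}

for name, exp_loss in expected.items():
    best = int(np.argmin(exp_loss)) + 1  # 1-based strategy number
    print(name, np.round(exp_loss, 2), f"lowest: Strategy {best}")
```

Among the three strategies considered here, the lowest expected loss belongs to Strategy 3 under equal frequencies and to Strategy 1 when 15-amp families make up 90 percent of customers.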


Thus, the strategy that the contractor ends up selecting may depend on how frequently each state of nature occurs among their customers. Another way the contractor might select a strategy is the minimax method: minimize the greatest possible loss. In this case, the greatest average loss for Strategy 1 is 4 and the greatest average loss for Strategy 3 is 3. To minimize the greatest loss, the contractor would select Strategy 3.
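
The minimax rule needs no frequencies at all: take the worst average loss of each strategy, then pick the strategy whose worst case is smallest. A minimal sketch over the three strategies from Table 3 follows; `worst_case` and `minimax_strategy` are illustrative names.

```python
import numpy as np

# Table 3: rows are states of nature, columns are Strategies 1, 2, 3.
avg_losses = np.array([
    [1.0, 3.0,    1.5],
    [3.5, 2.5,    2.5],
    [4.0, 20 / 3, 3.0],
])

# Greatest average loss of each strategy across the states of nature.
worst_case = avg_losses.max(axis=0)

# The minimax choice is the strategy with the smallest worst case.
minimax_strategy = int(np.argmin(worst_case)) + 1  # 1-based strategy number

print(np.round(worst_case, 2))
print(f"Minimax choice: Strategy {minimax_strategy}")
```

The worst cases are 4 for Strategy 1, about 6.67 for Strategy 2, and 3 for Strategy 3, so the rule selects Strategy 3.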