# Decision Trees
## Introduction
Decision trees are chronological representations of complex decision processes that can be used to find an optimal decision based on a specific criterion, using a divide-and-conquer approach.

Decision trees have three main graphical components:

- **Decision nodes:** Represent decisions under certainty. Each branch represents a decision alternative $a_n$ and is labeled with a utility or cost as a measure of its performance.
- **Random nodes:** Represent possible outcomes under uncertainty. Each branch corresponds to a possible state and is labeled with a probability $P$.
- **Leaves:** Possible outcomes of the decision process. They are labeled with the performance, that is, the utility or cost value that results from the connected decision nodes and random nodes.

The image below shows these components and their relationships as a baseline example.

![decision tree](img/decision_tree.png "Decision tree showing decisions, states of nature and consequences"){width=50%}

The example shows a single decision node, labeled as 1, with two alternatives, noted as $a_1$ and $a_2$. Each alternative has its own cost ($ca_1$ for $a_1$ and $ca_2$ for $a_2$). There is an uncontrolled variable that can impact our decision, and therefore each of the output branches of the decision node leads to a random node (labeled as 2 and 5) that represents this uncontrolled variable. From each random node, two branches represent the two possible states of the uncontrolled variable, that is, the states of nature. Each branch is labeled with the probability of the corresponding state of nature, $P(s_1)$ and $P(s_2)$. Finally, the leaf nodes are labeled with the utility values of each decision under each state of nature $(u_3, u_4, u_6, u_7)$. Note that the same information can be represented in a pay-off matrix, where each cell is the leaf utility net of the cost of the alternative:

| Alternatives  | State 1      | State 2      |
|---------------|--------------|--------------|
| Probabilities | $P(s_1)$     | $P(s_2)$     |
| $a_1$         | $u_3 - ca_1$ | $u_4 - ca_1$ |
| $a_2$         | $u_6 - ca_2$ | $u_7 - ca_2$ |

However, a decision tree can represent more complex set-ups by adding more nodes and uncontrolled variables, and can even deal with different combinations of uncontrolled variables and decision nodes at each branch. To define a decision tree, it is necessary to apply the following steps:

1. **Structure the problem:** define decision steps, states, objectives and their performance measures. In general, a decision problem can have multiple objectives. Each objective has a performance measure that represents the preferences of the decision maker (e.g., an objective can be to minimize costs, and the performance metric can be a monetary unit such as euros). If there are multiple objectives, it is necessary to take into account the trade-offs among them to obtain a single utility function that represents these preferences. However, for the rest of the unit, we will deal with single-objective decision problems. Utility functions help us quantify the value of each decision. Each alternative can have a positive or negative impact on the value.

2. **Describe consequences:** describe the probability of each possible state of nature. In situations of certainty, each decision will have a positive or negative impact on the value of the decision. In situations of uncertainty or risk, each possible state will have a different probability of occurrence. We need an estimate of these probabilities to quantify the value of each decision.

3. **Assign a value to each decision alternative:** Define the utility of each decision. We need to estimate the utility of each possible outcome of the decision problem.

4. **Decide which of the proposed alternatives is preferred:** This requires *solving* the decision tree, as described in the next section.

## Solving Decision Trees
Decision trees are solved from right to left, that is, from the leaves, back to the initial decision node. At every decision step, it is necessary to quantify the value of each alternative, taking into account the probability of states of nature. Moreover, the values of random nodes and decision nodes are calculated differently.

The value of a random node is calculated using a Bayesian approach, as the weighted expected value across the different states: 

$V_r = \sum_{i=1}^{n}{P(s_i) \cdot V_i}$

where $P(s_i)$ is the probability of state $i$ and $V_i$ is the value at branch or leaf $i$.

The value of a decision node, on the other hand, is calculated as:

$V_n = \max\{V_i - ca_k, V_j - ca_p, \dots\}$

where $V_i$ is the value of the node reached through alternative $a_k$ and $ca_k$ is the cost of that alternative. That is, a rational decision maker would always select the alternative with the highest value, taking into account the cost of each alternative.

With these two simple rules, the value of a tree can be calculated from its leaves, no matter how complex the set-up is and regardless of the number of steps.
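The two rules can be sketched as a short recursive function. This is a minimal illustration rather than a reference implementation: the class names (`Leaf`, `RandomNode`, `DecisionNode`), the helper `best_alternative` and all the numbers are hypothetical, chosen only to show the right-to-left evaluation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    """Terminal outcome with a utility value."""
    value: float


@dataclass
class RandomNode:
    """Random node: each branch is (probability, child)."""
    branches: List[Tuple[float, "Node"]]


@dataclass
class DecisionNode:
    """Decision node: each branch is (label, cost, child)."""
    branches: List[Tuple[str, float, "Node"]]


Node = Union[Leaf, RandomNode, DecisionNode]


def solve(node: Node) -> float:
    """Roll the tree back from the leaves and return the value of `node`."""
    if isinstance(node, Leaf):
        return node.value
    if isinstance(node, RandomNode):
        # Random node rule: sum of P(s_i) * V_i over the states.
        return sum(p * solve(child) for p, child in node.branches)
    # Decision node rule: best alternative after subtracting its cost.
    return max(solve(child) - cost for _, cost, child in node.branches)


def best_alternative(node: DecisionNode) -> str:
    """Return the label of the alternative that attains the maximum."""
    label, _, _ = max(node.branches, key=lambda b: solve(b[2]) - b[1])
    return label


# Hypothetical numbers, chosen only for illustration.
tree = DecisionNode([
    ("a1", 10.0, RandomNode([(0.6, Leaf(100.0)), (0.4, Leaf(20.0))])),
    ("a2", 5.0, RandomNode([(0.6, Leaf(60.0)), (0.4, Leaf(50.0))])),
])

print(f"value of the tree: {solve(tree):.2f}")
print(f"preferred alternative: {best_alternative(tree)}")
```

With these assumed numbers, $a_1$ is worth $0.6 \cdot 100 + 0.4 \cdot 20 - 10 = 58$ and $a_2$ is worth $0.6 \cdot 60 + 0.4 \cdot 50 - 5 = 51$, so the decision node takes the value 58. Because `solve` calls itself on every child, the same function handles trees with any number of nested decision and random nodes.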

## Imperfect Information
Sometimes we need to decide whether to incorporate **imperfect information** into the decision.
Imperfect information is information about uncontrolled variables that we can incorporate into the decision (possibly at a cost) to improve its performance. For instance, coming back to the previous example, imagine that our decision problem is to choose a financial product. To make a better decision, we may purchase a forecast report of the market trends for the following year. The report forecasts a given market trend, but the state is still not known with certainty. It is possible to evaluate the usefulness of incorporating this information into the problem. Applying **Bayes' theorem**, the probabilities of the different states of the random nodes will now be conditioned on the outcome of the imperfect information (in the example, the outcome of the forecast report).


In these cases, it is necessary to revise the probabilities.
After incorporating imperfect information, the probabilities of occurrence of the different states become revised (posterior) probabilities.
The revised probabilities are calculated by applying Bayes' theorem.

Let us assume that there are $n$ different states and that, consequently, the imperfect information may provide additional information about the occurrence of each of the states. The introduction of imperfect information can be modeled first as a decision node that represents the decision of whether to incorporate the imperfect information or not, and then, in the branch where the information is incorporated, as a random node whose outcome $i_i$ is the result that points to state $s_i$. For instance, let us consider a basic example with two alternatives and two possible states.

![basic decision tree](img/basic_decision_tree.png "Basic decision tree showing two alternatives and two states")

Let us now introduce the acquisition of imperfect information about the states. First, the decision maker needs to decide whether to make the decision without the imperfect information or to incorporate it (possibly at a cost). The diagram below introduces this first decision as an additional decision node. Then, since the result of the forecast is not controlled by the decision maker, its two possible results ($i_1$ when it forecasts $s_1$ and $i_2$ when it forecasts $s_2$) are modeled as a random node with two states.

![additional information decision tree](img/additional_information_decision_tree.png "Decision tree with revised probabilities")
  
Note that the probability of state $s_j$ is now conditioned on the event $i_i$ in every branch. We can apply Bayes' theorem to calculate the probability that state $s_j$ occurs given that the imperfect information forecasts $i_i$:

$P(s_j \mid i_i) = \frac{P(s_j) \cdot P(i_i \mid s_j)}{P(i_i)} = \frac{P(s_j) \cdot P(i_i \mid s_j)}{\sum_k P(s_k) \cdot P(i_i \mid s_k)}$

In the two-state example above, this yields:

$P(i_1) = P(s_1) \cdot P(i_1 \mid s_1) + P(s_2) \cdot P(i_1 \mid s_2)$

$P(s_1 \mid i_1) = \frac{P(s_1) \cdot P(i_1 \mid s_1)}{P(i_1)}$

$P(s_2 \mid i_1) = \frac{P(s_2) \cdot P(i_1 \mid s_2)}{P(i_1)}$

$P(i_2) = P(s_1) \cdot P(i_2 \mid s_1) + P(s_2) \cdot P(i_2 \mid s_2)$

$P(s_1 \mid i_2) = \frac{P(s_1) \cdot P(i_2 \mid s_1)}{P(i_2)}$

$P(s_2 \mid i_2) = \frac{P(s_2) \cdot P(i_2 \mid s_2)}{P(i_2)}$


$P(s_1)$ and $P(s_2)$ are the **a priori** probabilities, or the probabilities without further information. The conditional probabilities $P(i_1 \mid s_1)$ and $P(i_1 \mid s_2)$ refer to the probability that information $i_1$ occurs under states $s_1$ and $s_2$, respectively. The same applies to $P(i_2 \mid s_1)$ and $P(i_2 \mid s_2)$. These conditional probabilities can be estimated from a historical record of events.

For instance, connecting with our previous example, let us assume that $s_1$ represents a market rise state and $s_2$ a market fall state, and that $a_1$ and $a_2$ are two financial products. We may purchase a financial report to guide our investment decision, at a cost. The report has a record of correctly predicting a market rise 95% of the time, that is, $P(i_1 \mid s_1) = 0.95$ and consequently $P(i_2 \mid s_1) = 0.05$. The record also tells us that the forecast is right 90% of the time when the market falls, therefore $P(i_2 \mid s_2) = 0.9$ and $P(i_1 \mid s_2) = 0.1$. With these values, we can calculate the revised probabilities and solve the tree to determine whether to incorporate the report or not.
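To make the revision concrete, the sketch below applies Bayes' theorem to the report's track record. It takes $P(i_1 \mid s_1) = 0.95$ and $P(i_2 \mid s_2) = 0.9$ from the text, so that the complementary likelihoods are $P(i_2 \mid s_1) = 0.05$ and $P(i_1 \mid s_2) = 0.1$. The a priori probabilities $P(s_1) = 0.6$ and $P(s_2) = 0.4$ are not given in the text and are assumed here purely for illustration, as is the function name `revise_probabilities`.

```python
def revise_probabilities(priors, likelihoods):
    """Apply Bayes' theorem to every possible result of the information.

    priors[j]         = P(s_j)
    likelihoods[i][j] = P(i_i | s_j)
    Returns (marginals, posteriors), where
    marginals[i]      = P(i_i)
    posteriors[i][j]  = P(s_j | i_i)
    """
    marginals = []
    posteriors = []
    for row in likelihoods:
        # Joint probabilities P(s_j) * P(i_i | s_j) for this result i_i.
        joint = [p * l for p, l in zip(priors, row)]
        total = sum(joint)  # P(i_i), the denominator of Bayes' theorem
        marginals.append(total)
        posteriors.append([x / total for x in joint])
    return marginals, posteriors


priors = [0.6, 0.4]  # ASSUMED a priori probabilities P(s_1), P(s_2)
likelihoods = [
    [0.95, 0.10],  # P(i_1 | s_1), P(i_1 | s_2)
    [0.05, 0.90],  # P(i_2 | s_1), P(i_2 | s_2)
]

marginals, posteriors = revise_probabilities(priors, likelihoods)
for i, (p_i, row) in enumerate(zip(marginals, posteriors), start=1):
    print(f"P(i_{i}) = {p_i:.4f}")
    for j, p in enumerate(row, start=1):
        print(f"  P(s_{j} | i_{i}) = {p:.4f}")
```

Under these assumed priors, $P(i_1) = 0.61$ and $P(i_2) = 0.39$, and the revised probabilities are $P(s_1 \mid i_1) \approx 0.934$ and $P(s_2 \mid i_2) \approx 0.923$. These are the values that would label the branches of the random nodes that follow each result of the report.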

### Value of Perfect Information
The previous section described how to calculate the value of a decision tree after incorporating additional information, which may come at a cost. An obvious question that arises is: how much should the decision maker be willing to pay for additional information? This quantity is known as the Value of Perfect Information. Perfect information is an abstract concept that represents information that would make the occurrence of a state of nature certain, for instance, a forecast report that tells us with total certainty what the market trend will be. Obviously, this type of forecast does not exist in reality, but we can model its effect in the decision tree and solve it to calculate its final value. The **Expected Value with Perfect Information (EVPI)** is the expected final value of the tree given that we had perfect information. It is also an abstract concept, used to assess the value of the decision-making problem with access to perfect a priori information.

For instance, in the example above, perfect information means that the forecast report is always right. We can therefore plug in $P(s_1 \mid i_1) = 1$ and $P(s_2 \mid i_2) = 1$, which also implies $P(s_1 \mid i_2) = 0$ and $P(s_2 \mid i_1) = 0$. This simplifies the tree, and the value of the node with additional information becomes:

$\text{EVPI} = P(s_1) \cdot \max(v_1, v_3) + P(s_2) \cdot \max(v_2, v_4)$

Here $v_1$ and $v_3$ are the values of the two alternatives under state $s_1$, and $v_2$ and $v_4$ are their values under state $s_2$.

Since this is the maximum value of that node provided that we had perfect a priori information, we can compare the Expected Monetary Value (EMV) that we obtain by solving the tree with the report to the EVPI to see how close we are to perfect information. This difference is known as the **Value of Perfect Information (VPI)**:

$\text{VPI} = \text{EVPI} - \text{EMV}$
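The comparison can be sketched numerically as follows. All numbers are hypothetical: the a priori probabilities, the pay-offs $v_1, \dots, v_4$ and the report accuracies (95% when the market rises, 90% when it falls) are assumptions made for illustration, the function names are not taken from any library, and the cost of the report is ignored.

```python
def emv_without_information(priors, payoffs):
    """Best expected value using only the a priori probabilities."""
    return max(sum(p * v for p, v in zip(priors, alt)) for alt in payoffs)


def emv_with_report(priors, likelihoods, payoffs):
    """Expected value of deciding after seeing the imperfect report."""
    total = 0.0
    for row in likelihoods:  # one row per report result i_i
        joint = [p * l for p, l in zip(priors, row)]
        p_i = sum(joint)  # P(i_i)
        posterior = [x / p_i for x in joint]  # P(s_j | i_i)
        # Decision node after the report: pick the best alternative.
        best = max(sum(q * v for q, v in zip(posterior, alt)) for alt in payoffs)
        total += p_i * best
    return total


def value_with_perfect_information(priors, payoffs):
    """EVPI as defined in the text: best pay-off per state, weighted by P(s_j)."""
    return sum(p * max(alt[j] for alt in payoffs) for j, p in enumerate(priors))


priors = [0.6, 0.4]  # ASSUMED P(s_1), P(s_2)
payoffs = [
    [100.0, 20.0],  # ASSUMED v_1, v_2: alternative a_1 under s_1, s_2
    [60.0, 50.0],   # ASSUMED v_3, v_4: alternative a_2 under s_1, s_2
]
likelihoods = [
    [0.95, 0.10],  # P(i_1 | s_1), P(i_1 | s_2)
    [0.05, 0.90],  # P(i_2 | s_1), P(i_2 | s_2)
]

evpi = value_with_perfect_information(priors, payoffs)
emv_report = emv_with_report(priors, likelihoods, payoffs)
emv_prior = emv_without_information(priors, payoffs)

print(f"EVPI            = {evpi:.2f}")
print(f"EMV with report = {emv_report:.2f}")
print(f"EMV, no report  = {emv_prior:.2f}")
print(f"VPI             = {evpi - emv_report:.2f}")
```

With these assumed values the EVPI is 80, the EMV with the report is 77.6 and the EMV without any report is 68, so the report recovers most of the gap to perfect information and the remaining VPI is 2.4.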
