# Decision Trees


## A detour into probability

Because Decision Trees are a method for dealing with *uncertain* choices, we first consider probability theory as a way of modeling uncertainty.

Axioms of probability, as described in the lecture:

1. The probability of any outcome lies in $[0, 1]$ (*boundedness*).
2. The probabilities of all possible outcomes sum to 1 (*normalization*).
3. If $A$ is a subset of $B$, then $P(A) \leq P(B)$ (*monotonicity*).

AFAIK, these stem from [Kolmogorov's axioms of probability](https://en.wikipedia.org/wiki/Probability_axioms), but those are slightly different:

1. *Nonnegativity*: $P(A) \geq 0$.
2. *Normalization*: $P(\Omega) = 1$ ("something must happen").
3. *(Finite) additivity*: if $A \cap B = \emptyset$ ("mutually exclusive"), then $P(A \cup B) = P(A) + P(B)$. (NB: finite additivity can be strengthened to countable additivity)

Under these axioms, *boundedness* and *monotonicity* are consequences.
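
A short sketch of why both follow, using only the three Kolmogorov axioms listed above (the set difference $B \setminus A$ is notation I am adding, it was not used in the lecture):

$$
A \subseteq B \;\Rightarrow\; B = A \cup (B \setminus A), \qquad A \cap (B \setminus A) = \emptyset
$$

$$
P(B) = P(A) + P(B \setminus A) \geq P(A)
$$

$$
A \subseteq \Omega \;\Rightarrow\; 0 \leq P(A) \leq P(\Omega) = 1
$$

The second line is monotonicity (additivity plus nonnegativity of $P(B \setminus A)$); the third applies it with $B = \Omega$ to get boundedness.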

Note: there are other axiomatizations of probability, such as [Cox's theorem](https://en.wikipedia.org/wiki/Cox%27s_theorem), which underpin some Bayesian interpretations of probability (e.g. [Jaynes: probability as extension of Aristotelian logic to uncertain propositions](https://bayes.wustl.edu/)).


**Types of probability** (according to the lecture):
* Classical (e.g. dice)
* Frequency (from data, e.g. frequency of words)
* Subjective (e.g. probability judgments)
* Model-based (e.g. will prices go up this year?)

It is more typical to divide interpretations into:
* *Aleatory* (classical / frequency)
* *Epistemic* (subjective / Bayesian)

On [Stanford Encyclopedia of Philosophy](https://plato.stanford.edu/) see: [Interpretations of Probability](https://plato.stanford.edu/entries/probability-interpret/), [Philosophy of Statistics](https://plato.stanford.edu/entries/statistics/#BayStaLog), [Formal Representations of Belief](https://plato.stanford.edu/entries/formal-belief/).




## Back to Decision Trees

[Decision Trees](https://en.wikipedia.org/wiki/Decision_tree) are useful when there are uncertain contingencies (we don't know what the future state of the world is going to be). We can also use them descriptively (in the *positive* rather than normative sense): to infer other people's (or our own) preferences and beliefs from their choices.

**Example**: buying a train ticket

Dilemma between buying a cheaper ticket for an earlier train that we might miss, or a more expensive ticket for a later train that we will surely catch.

1. Ticket for 3PM: \$200 (but 40\% chance of not making it, and then having to buy the more expensive ticket as well)
2. Ticket for 4PM: \$400 (but 100\% chance of making it)

Buy earlier ticket?

| ![image.png](attachment:image.png) | ![image-2.png](attachment:image-2.png) |
|:---:|:---:|

Here we are dealing with costs, so we choose the branch with the lower expected cost.
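
The comparison can be checked with a few lines of Python. This is a minimal sketch, assuming the 3PM ticket is not refunded if we miss the train and that the replacement is the \$400 ticket; the variable names are mine.

```python
# Expected cost of each ticket choice (figures from the example above).
P_MISS = 0.40        # chance of missing the 3PM train
EARLY_TICKET = 200   # price of the 3PM ticket
LATE_TICKET = 400    # price of the 4PM ticket

# Assumption: if we miss the 3PM train, we pay for the 4PM ticket
# on top of the (non-refunded) 3PM ticket.
cost_early = EARLY_TICKET + P_MISS * LATE_TICKET
cost_late = LATE_TICKET

# With costs, the better branch is the one with the LOWER expected value.
best = "3PM" if cost_early < cost_late else "4PM"
print(f"3PM: {cost_early:.0f}, 4PM: {cost_late:.0f} -> buy the {best} ticket")
```

Under these assumptions the earlier ticket has the lower expected cost (about \$360 against \$400).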

**Example**: deciding whether to apply for a scholarship

* Scholarship: \$5000
* 200 applicants
* 2 page essay
* 10 finalists
* 10 page final essay

Subjective values (finding costs and putting them on the same scale as benefits):
* 2 page essay: \$20
* 10 page essay: \$40

***

**General procedure**:
1. Draw tree
2. Write down payoffs and probabilities
3. Solve backwards

***

![image-4.png](attachment:image-4.png)

![image-5.png](attachment:image-5.png)

Conclusion: write the essay because it has higher expected value (\$3 vs. \$0).
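
The "solve backwards" step can be sketched as a small recursive function. The node encoding below is an ad hoc convention for this sketch, not SilverDecisions' format, and I am assuming every applicant is equally likely to become a finalist (10/200) and every finalist equally likely to win (1/10).

```python
def rollback(node):
    """Return the expected value of a decision tree node, solving backwards.

    Node shapes (hypothetical convention for this sketch):
      a number                         -> terminal payoff
      ("decision", {label: child})     -> take the child with the highest value
      ("chance", [(prob, child), ...]) -> probability-weighted average
      ("cost", amount, child)          -> pay `amount`, then continue to child
    """
    if isinstance(node, (int, float)):
        return node
    kind = node[0]
    if kind == "decision":
        return max(rollback(child) for child in node[1].values())
    if kind == "chance":
        return sum(p * rollback(child) for p, child in node[1])
    if kind == "cost":
        return rollback(node[2]) - node[1]
    raise ValueError(f"unknown node type: {kind}")


# Second stage: as a finalist, write the 10 page essay (valued at $40) or stop.
final_round = ("decision", {
    "write 10 pages": ("cost", 40, ("chance", [(1 / 10, 5000), (9 / 10, 0)])),
    "stop": 0,
})

# First stage: write the 2 page essay (valued at $20) or do not apply.
first_round = ("decision", {
    "write 2 pages": ("cost", 20,
                      ("chance", [(10 / 200, final_round), (190 / 200, 0)])),
    "do not apply": 0,
})

print(rollback(final_round))  # value of being a finalist
print(rollback(first_round))  # value of the whole decision
```

Being a finalist is worth about \$460 under these assumptions, which is what makes the first essay worth about \$3 rather than a loss.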


### Inferring Probabilities

**Example**: friend invests, what do they believe?

A friend tells you about an investment opportunity that pays \$50k on a \$2k investment. She invests; do you? (I.e. what is her belief about the success of the investment?)

![image-6.png](attachment:image-6.png)

If we believe $p > 4\%$, then it is rational to invest. If our friend decides to invest, we can infer she believes $p > 4\%$.
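
The 4\% threshold is the break-even probability. A minimal sketch, assuming the \$50k is the gross amount received on success, nothing is received on failure, and the decision maker is risk neutral (these assumptions are my reading of the tree, and they reproduce the 4\% figure):

```python
INVESTMENT = 2_000   # amount paid up front
PAYOUT = 50_000      # assumed gross payout on success (0 on failure)


def expected_value(p):
    """Expected net value of investing when success has probability p."""
    return p * PAYOUT - INVESTMENT


# Investing beats not investing (value 0) when p * PAYOUT > INVESTMENT.
break_even = INVESTMENT / PAYOUT
print(f"break-even probability: {break_even:.0%}")
```

Any belief above `break_even` makes investing the better branch, which is why observing the choice lets us bound the belief from below.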



### Inferring Payoffs

Inferring payoffs of other people (or even our own implied payoffs) from observed choices.

**Example**: deciding not to take a standby flight

You have a standby ticket to go visit your parents. The airline says there is a one-third chance that you'll make the flight. You decide not to go to the airport and to stay on campus. How much did you want to see your parents?

Note: *standby* is "when a passenger without a seat assignment waits at the gate to see if there is an extra seat after all scheduled passengers have boarded".

![image-7.png](attachment:image-7.png)

The decision not to go implies that the value of seeing your parents ($V$) is smaller than 3 times the cost ($c$) of going to the airport (i.e. $V < 3c$).
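
The inequality follows in one step, assuming the cost $c$ is paid whether or not we get a seat and that staying on campus is worth 0 (my reading of the tree):

$$
E[\text{go}] = \tfrac{1}{3}(V - c) + \tfrac{2}{3}(-c) = \tfrac{V}{3} - c
$$

$$
\text{staying is preferred} \iff \tfrac{V}{3} - c < 0 \iff V < 3c
$$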



### Value of Information

If someone offered us information that would remove the uncertainty, how much should it be worth to us, i.e. what is the [Value of Information](https://en.wikipedia.org/wiki/Value_of_information)? We can answer this using decision tree analysis.

**Example**: deciding whether to buy a car now or wait for a possible cash back offer

* Option A: buy car now
* Option B: rent for \$500 for a month, then buy

There is a 40\% chance that the car company offers \$1000 cash back starting next month. How much would it be worth to know with certainty whether there will be a cash back or not?

***
**General procedure** for calculating the **value of information**:
1. Calculate value without the information
2. Calculate value with the information
3. Calculate the difference

***

Back to the cash back example...

We first solve the decision tree without the information:

![image-8.png](attachment:image-8.png)

According to the decision tree, we should buy now, because renting first has a lower expected value.
Note: the value of the "buy" decision is 0 because it is the baseline against which the other branch is measured.

Now we solve the decision tree where we first receive the information (this adds a chance node at the beginning, because we do not yet know what the information will say):

| ![image-9.png](attachment:image-9.png) | ![image-10.png](attachment:image-10.png) |
|:---:|:---:|

Therefore, with the information, the expected value of our choices is \$200.

To summarize:
* Value with information: \$200
* Value without information: \$0
* Value of Information = \$200 - \$0 = \$200
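
The three-step procedure can be reproduced in code. A minimal sketch, assuming payoffs are measured relative to buying now (value 0) and that the information is perfectly reliable; the function and variable names are mine.

```python
P_CASH_BACK = 0.40   # chance that cash back is offered next month
CASH_BACK = 1_000
RENT = 500


def payoff(action, cash_back_offered):
    """Payoff relative to the baseline of buying now at today's price."""
    if action == "buy now":
        return 0
    # "rent, then buy": pay a month of rent, gain the cash back if offered
    return -RENT + (CASH_BACK if cash_back_offered else 0)


ACTIONS = ("buy now", "rent, then buy")
SCENARIOS = ((True, P_CASH_BACK), (False, 1 - P_CASH_BACK))

# 1. Without information: commit to one action, then average over scenarios.
value_without = max(
    sum(p * payoff(action, offered) for offered, p in SCENARIOS)
    for action in ACTIONS
)

# 2. With information: learn the scenario first, then pick the best action.
value_with = sum(
    p * max(payoff(action, offered) for action in ACTIONS)
    for offered, p in SCENARIOS
)

# 3. The difference is the value of information.
value_of_information = value_with - value_without
print(value_without, value_with, value_of_information)
```

The only change between steps 1 and 2 is the order of `max` and the probability-weighted sum, which is exactly what moving the chance node to the front of the tree does.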
 

## Independent work

**TODO**: Using the open source [SilverDecisions](http://silverdecisions.pl/) software for decision tree analysis, model decisions in "Who Wants to Be a Millionaire?" (E.g., participants sometimes seem to ignore that getting a question right not only increases the current winnings, but also lets them attempt the subsequent questions. What can be inferred about their beliefs from this?)

**Example:** Who Wants to Be a Millionaire?

In this scenario, you are answering the 500,000 question with no lifelines left. You believe you have a 50\% chance of answering the current question correctly and a 10\% chance of knowing the answer to the 1,000,000 question. If you give up, you get 250,000; if you get the question wrong, you get 32,000. Should you answer?

I will use the SilverDecisions app to make the decision tree:

![image-11.png](attachment:image-11.png)

The green path outlines the optimal policy.
If we are 50\% sure of having the right answer (e.g., after eliminating two options), we should answer the question, because the expected value is 291,000, as opposed to the 250,000 we currently have. Even if we think that there is a 0\% chance we will know the answer to the 1,000,000 question (as opposed to the assumed 10\%), we should still answer. If we get the 500,000 question correct and don't know the answer to the 1,000,000 question, then we should not hazard a guess.
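
The same tree can be cross-checked outside SilverDecisions. A minimal sketch of my reading of the model: with probability `p_know_million` we know the final answer for sure; otherwise we choose between walking away with 500,000 and a blind guess. The 25\% guess probability (four options) and the 32,000 fallback on the last question are my assumptions, not figures from the screenshot.

```python
WALK_AWAY_NOW = 250_000   # prize if we give up on the 500,000 question
WRONG_ANSWER = 32_000     # prize after a wrong answer
P_500K = 0.50             # chance of answering the 500,000 question correctly


def value_of_answering(p_know_million, p_guess=0.25):
    """Expected winnings from answering the 500,000 question."""
    # Blind guess on the last question (p_guess is an assumption).
    guess = p_guess * 1_000_000 + (1 - p_guess) * WRONG_ANSWER
    # Decision node: if we don't know the answer, walk away or guess.
    dont_know = max(500_000, guess)
    # Chance node: do we know the 1,000,000 answer?
    after_correct = (p_know_million * 1_000_000
                     + (1 - p_know_million) * dont_know)
    # Chance node: do we get the 500,000 question right?
    return P_500K * after_correct + (1 - P_500K) * WRONG_ANSWER


for p in (0.10, 0.0):
    ev = value_of_answering(p)
    print(p, round(ev), "answer" if ev > WALK_AWAY_NOW else "walk away")
```

Under these assumptions the sketch agrees with the tree: about 291,000 at 10\%, still above 250,000 at 0\%, and a blind guess on the last question is worth less than the 500,000 already secured.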



**TODO**: Check Dennis Lindley's ["Making Decisions"](https://www.wiley.com/en-us/Making+Decisions%2C+2nd+Edition-p-9780471908081) book (it should be very clear, judging by ["Understanding Uncertainty"](https://www.wiley.com/en-gb/Understanding+Uncertainty%2C+Revised+Edition-p-9781118650127)).
