# Introduction to Utility and Influence Diagrams

## CSCI E-82A
### Stephen Elston

In this lesson we will start our study of **planning methods**. By planning we mean methods by which an **agent** uses a **model of the environment** to find an **optimal sequence of decisions** when faced with **uncertainty**.  

A number of planning methods have been developed, starting in the 1940's. These early planning theories were intended to optimize the production and use of industrial resources during the Second World War.

In this lesson we will use an extended version of the probabilistic graph theory we have been studying to compute optimal decisions. Specifically, we will focus on the following topics:

1. **Utility theory:** which allows us to **quantify the value of a system state**. 
2. **Influence diagrams:** which are an extension of the **representation** we have used for Bayesian graphical models. 
3. **Inference:** to compute the **sequence of optimal decisions**. 

**Suggested readings:** The following reading is an optional supplement to the material presented here:
- Barber, Sections 7.1, 7.2, 7.3, 7.4, or
- Russell and Norvig, third edition, Chapter 16, or
- Kochenderfer, Sections 3.1 and 3.2.



## Planning, Agents and the Environment

A schematic diagram of our planning **agent** and its interaction with the **environment** is shown in the figure below. 

<img src="img/AgentEnvironment.JPG" alt="Drawing" style="width:500px; height:300px"/>
<center> **Interaction of agent and environment** </center>

This model of the interaction between an intelligent agent and its environment applies to many machine intelligence problems. There are three points of interaction between the agent and the environment, and these interactions define the boundary between the two.

1. The **state** of the environment is **observed** by the agent. The **sensors** used to observe state are part of the environment and not the agent. 
2. The agent causes **actions** in the environment. The **actuators** which carry out the agent's commands are part of the environment and not the agent. 
3. Rewards provide the agent **feedback** on the **value** or **utility** of the state of the environment. The reward is generated in the environment and not the agent. 

Notice that there is a strict division between the agent and the environment. The agent provides the intelligence. All other activities occur in the environment. This division is necessary so that intelligence is separate from any activity in the environment. 
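The division described above can be sketched as two small classes. This is only an illustration: the class names, the action labels, the reward values, and the probability `p_late` are hypothetical and do not come from the lesson. The point is that the environment owns the state, the sensors, the actuators, and the reward, while the agent only maps observations to actions.

```python
import random


class BusEnvironment:
    """Toy environment: owns the state, the sensors, the actuators, and the reward."""

    def __init__(self, p_late=0.7, seed=0):
        self.p_late = p_late              # hypothetical chance that the bus runs late
        self.rng = random.Random(seed)

    def observe(self):
        # Sensor reading handed to the agent.
        return {"p_late": self.p_late}

    def step(self, action):
        # The actuators carry out the action; the environment then emits the reward.
        bus_is_late = self.rng.random() < self.p_late
        if action == "early_bus":
            # A little time is wasted if the bus turned out to be punctual.
            return 0.0 if bus_is_late else -1.0
        # Taking the usual bus is costly when it runs late.
        return -10.0 if bus_is_late else 0.0


class Agent:
    """The agent only maps observations to actions; it never touches the state."""

    def act(self, observation):
        return "early_bus" if observation["p_late"] > 0.5 else "usual_bus"


env = BusEnvironment()
agent = Agent()
action = agent.act(env.observe())
reward = env.step(action)
print(action, reward)
```

Notice that the agent never calls anything other than `observe` and `step`, which mirrors the three points of interaction listed above.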

Consider an example of an intelligent agent, your prefrontal cortex, and your environment. In this case you must perform a number of tasks: start at home, do your job, get paid (your reward), and return home. This plan requires a great number of decisions and interactions with the environment, including:

1. Your prefrontal cortex must decide when to leave. This agent commands some actuators, your fingers, to tap your phone, and your sensors, the optic nerves, observe the bus schedule. Based on past experience, the agent knows there is a high probability the bus will run late. To avoid the negative reward of arriving late, an earlier bus is chosen. 
2. As you leave the house, the agent uses the actuators of the hands and fingers and the sensors of the optic nerves to ensure you have your keys. 
3. Many more steps are involved in getting to work. Another set of tasks is required to perform your job, perhaps eat some lunch with a friend, and return home. 
4. Ultimately, your prefrontal cortex uses sensors and actuators to examine a bank account and see the change of state when you get paid. 

In the above scenario notice that the agent performs no action and collects no state information directly. Rather, this agent controls the actuators and uses sensors to find the state of the environment and receive the reward.

## Basics of Utility Theory

To make optimal decisions an intelligent agent must have a way to measure the value of the outcomes. Creating functions to measure the value of outcomes is the domain of **utility theory**. 

Let's think about a simple example. Let's say that you go to a charity dinner and you buy a raffle ticket to win a prize worth \$1,000. There are 100 tickets and each ticket costs \$100. Your joy of supporting this important charity is worth 200 to you. What is your utility of buying one ticket?

$$U(1) = -p(buy) * cost + p(feeling) * value + p(win) * 1000 \\
= - 1.0 * 100 + 1.0 * 200 + 0.01 * 1000 \\
= 110$$

From the foregoing example you can likely see that the general form of a utility function can be expressed:

$$U(S) = \sum_{s} p(s)\ u(s)\\
\text{where}\\
p(s) = \text{probability of state } s\\
u(s) = \text{utility of state } s $$

Let's continue with the example. There is no reason that a utility function should have linear scaling. To understand this consider what your utility will be if you buy two raffle tickets. Your cost is now \$200 for the tickets, and you have doubled your chance of winning the prize. If your joy stayed at 200, your utility would be only $-200 + 200 + 20 = 20$. But, perhaps not. Your joy at helping the charity might be 400, making the utility of your larger donation: 

$$U(2) = - 1.0 * 2 * 100 + 1.0 * 400 + 0.01 * 2 * 1000 \\
= 220$$
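The two raffle calculations can be checked with a few lines of Python. This is a minimal sketch that follows the formulas above term by term; the function name `raffle_utility` and its parameter names are ours, chosen only for illustration.

```python
def raffle_utility(n_tickets, joy, ticket_cost=100, prize=1000, total_tickets=100):
    """Utility of buying n_tickets, following the raffle example term by term."""
    p_win = n_tickets / total_tickets
    # Cost and joy are certain (probability 1.0); the prize is uncertain.
    return -1.0 * n_tickets * ticket_cost + 1.0 * joy + p_win * prize


u_one = raffle_utility(1, joy=200)          # one ticket
u_two_same_joy = raffle_utility(2, joy=200) # two tickets, joy unchanged
u_two_more_joy = raffle_utility(2, joy=400) # two tickets, joy doubled
print(u_one, u_two_same_joy, u_two_more_joy)
```

The gap between the second and third values shows that the result is driven by how you value the donation, not by the amount of money alone.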

From the above, you can see that **amount of money does not equal utility**. To understand this concept consider the following situation. In his youth, your instructor took many long backpacking trips. The person who organized the food for one 5-day hike did a poor job. Meals were minimal, and by the last day, there was no food left at all. We arrived in Aspen, Colorado in mid-afternoon quite hungry. We found a popular hamburger restaurant and spent the very last bit of money to buy a hamburger. The utility of that hamburger was much greater than the price paid. Then as now, Aspen is a playground of the wealthy. It is very likely that the utility of a hamburger to these wealthy customers was considerably less than to a young man who had not had enough to eat for several days!  

## Actions and Expected Utility

By itself, a utility function does not tell us anything about the results of **actions** the agent might take. The **expected utility** of an action is the sum, over the possible next states, of the utility of each state multiplied by the probability of reaching that state given the observation and the action:


$$E \big[ U(a\ |\ o) \big] = \sum_{s'} p(s'\ |\ a,o)\ U(s') $$

For a planning problem we want to find the **optimal action**, the action $a$ that maximizes the expected utility:

$$argmax_a E \big[ U(a\ |\ o) \big] = argmax_a \sum_{s'} p(s'\ |\ a,o)\ U(s') $$
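As a minimal sketch of this calculation, the expected utility of each action is a matrix-vector product, and the optimal action is the row with the largest result. All of the numbers, state names, and action names below are hypothetical and are used only to illustrate the formula for a single fixed observation $o$.

```python
import numpy as np

# Hypothetical numbers for illustration only.
states = ["good", "fair", "bad"]
utility = np.array([10.0, 2.0, -5.0])      # U(s') for each next state

# p(s' | a, o) for one fixed observation o; one row per action.
actions = ["cautious", "bold"]
p_next = np.array([[0.5, 0.4, 0.1],
                   [0.7, 0.0, 0.3]])
assert np.allclose(p_next.sum(axis=1), 1.0)  # each row is a distribution

# E[U(a | o)] = sum over s' of p(s' | a, o) U(s'), one value per action.
expected_utility = p_next @ utility
best_action = actions[int(np.argmax(expected_utility))]
print(dict(zip(actions, expected_utility)), best_action)
```

With a different observation the rows of `p_next` would change, and the best action might change with them.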

While simple conceptually, directly applying the above formulation to solve for the optimal action can be difficult beyond the simplest problems. Small problems can be represented as a **decision tree**, which allows a relatively direct solution. However, this representation quickly becomes hard to comprehend and computationally expensive as the problem grows. We will not go further down this path. You can see some more detail along with some examples in Section 7.2 in Barber. 

## Influence Diagram

In previous lessons we have investigated the use of Bayesian networks as a representation of probability distributions and their independencies. We can extend this representation to become **influence diagrams**.   

A representation for a decision process must preserve **causality**. A decision cannot be made until previous decisions have been made and the resulting state is observed. Bayesian networks represent causality, or the influence of one set of variables on others. 

There are two additional elements that must be added to Bayesian networks to transform them to influence diagrams:
1. Decision nodes, which have no distribution. In effect, the decision nodes are like switches which initiate actions in the environment. We illustrate decision nodes as rectangles. 
2. Utility nodes, which measure the value of the states of the environment. We illustrate utility nodes as diamonds. 

We also need three types of directed edges to specify influence diagrams. 
- Edges that **propagate belief**, or conditional information as we used before. 
- **Informational edges** which propagate information that is not related to a distribution or belief.
- **Functional edges** which end in utility nodes which propagate the information needed for the utility calculation. 

Let's look at a few simple cases that occur in influence diagrams. As illustrated in the diagram below, a random variable in an influence diagram can be dependent on both other random variables and decisions. We have already spent considerable time on dependencies and independencies of random variables. But consider what happens when a decision is imposed. The decision will force the probabilities of some states to 0. That is, a decision allows some states but not others to occur. 

<img src="img/RandomVariable.JPG" alt="Drawing" style="width:300px; height:100px"/>
<center> **Dependent random variable with decision** </center>

In the above diagram notice that the edge between the decision and the random variable is shown as an **information link**. This is because decisions have no distributions associated with them. 

The diagram below shows how a utility node can be dependent on both random variables and decision nodes. We have already discussed how a probability distribution is used to compute utility. A decision will fix the set of states which are possible and therefore the total utility. 

<img src="img/Utility.JPG" alt="Drawing" style="width:300px; height:100px"/>
<center> **Utility with decision and random variable** </center>

In the above case, the diagram contains two functional edges, one carrying distribution information and the other carrying decision information. 
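One way to make the node and edge types concrete is to store a small diagram as plain Python data. The sketch below combines the two cases above into one hypothetical three-node diagram. The names `nodes`, `edges`, `edge_kind`, and `check_diagram` are ours, and treating an edge that ends in a decision node as informational is an assumption that this lesson does not spell out.

```python
# Node kinds for a hypothetical three-node diagram.
nodes = {
    "X": "chance",      # random variable, has a distribution
    "D": "decision",    # decision node, has no distribution
    "U": "utility",     # utility node
}
edges = [("D", "X"), ("X", "U"), ("D", "U")]   # (parent, child) pairs


def edge_kind(parent, child):
    """Classify an edge from the kinds of the nodes it connects."""
    if nodes[child] == "utility":
        return "functional"          # ends in a utility node
    if nodes[parent] == "decision" or nodes[child] == "decision":
        return "informational"       # no distribution is carried (assumption for child case)
    return "belief"                  # chance node to chance node


def check_diagram():
    """Utility nodes only receive information, so they must have no children."""
    for parent, _ in edges:
        if nodes[parent] == "utility":
            raise ValueError(f"utility node {parent} cannot have children")
    return {edge: edge_kind(*edge) for edge in edges}


edge_kinds = check_diagram()
print(edge_kinds)
```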

## Consistency and Partial Ordering

As has already been mentioned, causality is important in influence diagrams. This property is known as **causal consistency**. Influence must be in causal order. This organization is called **partial ordering**. 

Let's say that we have a series of variables, $\chi_i$, separated by decision variables, $D_i$. The variable $\chi_i$ must be observed before the decision $D_i$ can be made. The variable $\chi_{i+1}$ cannot be observed before decision $D_i$ is made. We represent the partial ordering with the precedence symbol, $\prec$. We represent this situation as:

$$\chi_1 \prec D_1 \prec \chi_2 \prec D_2 \ldots \chi_n \prec D_n$$

## Inference on Influence Diagrams

In previous lessons, we explored a number of widely used exact inference methods for Bayesian belief networks. It turns out that many of the inference methods for Bayesian networks, like variable elimination, belief propagation, and the junction tree algorithm, can be applied to the case of influence diagrams as well. 

Here, we will only touch on the variable elimination method. Variable elimination for influence diagrams follows a similar process as for Bayesian networks. The difference is that we proceed in the reverse of the partial ordering.   

Given the partial ordering of a set of random variables $x_t$ and decision variables $d_t$ we can write the joint distribution over all $T$ steps:

$$p(x_{1:T}, d_{1:T}) = \prod_{t = 1}^T p(x_t\ |\ x_{t-1}, d_{1:t})$$

Multiplying through by the utility, $u(x_{1:T}, d_{1:T})$, we alternate sums over the random variables with maximizations over the decisions, following the partial ordering:  

$$max_{d_1} \sum_{x_1} \cdots max_{d_{T}} \sum_{x_{T}} \prod_{t = 1}^T p(x_t\ |\ x_{t-1}, d_{1:t})\ u(x_{1:T}, d_{1:T})$$

We can rearrange the sums so that we can eliminate variables, resulting in a new marginal utility variable, $\widetilde{u}(x_{1:T-1}, d_{1:T-1})$:

$$ max_{d_1} \sum_{x_1} \cdots max_{d_{T-1}} \sum_{x_{T-1}} \prod_{t = 1}^{T-1} p(x_t\ |\ x_{t-1}, d_{1:t})\ max_{d_{T}} \sum_{x_{T}} p(x_T\ |\ x_{T-1}, d_{1:T})\ u(x_{1:T}, d_{1:T}) \\
= max_{d_1} \sum_{x_1} \cdots max_{d_{T-1}} \sum_{x_{T-1}} \prod_{t = 1}^{T-1} p(x_t\ |\ x_{t-1}, d_{1:t})\ \widetilde{u}(x_{1:T-1}, d_{1:T-1})$$

There is one simplification we can apply to variable elimination on influence diagrams. We can take advantage of the fact that early decisions are **independent** of later decisions, since the later decisions are not yet known. Thus, in many cases we can simply sum the marginal utilities.
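The backward elimination can be sketched for a tiny two-step problem. Everything in this example is hypothetical: the states and decisions are binary, the probability tables are made up, the utility depends only on the final state, and $x_2$ is assumed to depend only on $x_1$ and $d_2$ to keep the arrays small. The brute-force check at the end enumerates every policy and should agree with the elimination result.

```python
import itertools
import numpy as np

# Hypothetical two-step problem: binary x1, x2 and binary d1, d2.
p_x1 = np.array([[0.8, 0.2],             # p(x1 | d1), indexed [d1, x1]
                 [0.4, 0.6]])
p_x2 = np.array([[[0.9, 0.1],            # p(x2 | x1, d2), indexed [x1, d2, x2]
                  [0.5, 0.5]],
                 [[0.3, 0.7],
                  [0.6, 0.4]]])
u = np.array([10.0, -4.0])               # utility, a function of x2 only

# Step 1: last in the partial ordering. Sum out x2, then maximize over d2.
eu_d2 = p_x2 @ u                         # expected utility, indexed [x1, d2]
u_tilde = eu_d2.max(axis=1)              # marginal utility, indexed [x1]
best_d2 = eu_d2.argmax(axis=1)           # best d2 for each observed x1

# Step 2: sum out x1, then maximize over d1.
eu_d1 = p_x1 @ u_tilde                   # expected utility, indexed [d1]
best_d1 = int(eu_d1.argmax())

# Brute-force check: a policy is a choice of d1 plus a choice of d2 for each x1.
brute_force = max(
    sum(p_x1[d1, x1] * p_x2[x1, d2_for_x1[x1], x2] * u[x2]
        for x1 in (0, 1) for x2 in (0, 1))
    for d1 in (0, 1)
    for d2_for_x1 in itertools.product((0, 1), repeat=2)
)
print(best_d1, best_d2, eu_d1.max(), brute_force)
```

The elimination needs one small maximization per step, while the brute-force search grows with the number of policies, which is the reason for working backward through the partial ordering.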

> **Note:** You can find some more information on how to apply belief propagation and the junction tree algorithm to influence diagrams in Sections 7.4.1 and 7.4.2 of Barber. 

## A Computational Example

Let's work through a computational example to make these concepts more concrete. Consider a delivery robot which must make decisions under uncertainty. The robot is required to make on-time deliveries or a credit must be given to the customer to compensate for the late arrival of the order. The utility for the robot is:

| On time | Slightly late | Very late|
|-----|----|----|
|25|0.5|-25|

The influence diagram for the delivery robot's intended trip is shown in the figure below:

<img src="img/Bridge.JPG" alt="Drawing" style="width:500px; height:300px"/>
<center> **Influence Diagram for the Delivery Robot** </center>

The partial ordering of this problem is:

$$leave\ early \prec U1 \prec p(traffic\ delay\ |\ traffic) \prec take\ bridge \prec p(bridge\ delay\ |\ bridge\ open) \prec U2$$

The Utility for arrival times, U2, has already been stated. 

In this simple example, there are two decisions:
1. The robot must determine if it should leave early or not. Leaving early has lower utility since the robot is not available to perform other deliveries. 
2. Toward the end of its journey, the robot must decide whether to take a direct route over a drawbridge or a longer route with no bridge. However, with some nonzero probability the bridge may open for an extended period of time to allow marine traffic to pass. Notice that this decision is dependent on how early or late the robot is at the decision time. 

The CPD below gives the robot's arrival time, conditioned on its delay at the bridge decision time. There are six possible cases for the three states of arrival time. The first two cases correspond to the decision to take the alternate route. The other four cases are conditionally dependent on the state of the bridge. The CPD looks like this:

| Arrival |  alternate route, no traffic delay | alternate route, traffic delay | bridge closed - no traffic delay | bridge closed- traffic delay | bridge open - no traffic delay | bridge open - traffic delay |   
|----|----|----|----|----|----|----|
| on time | 0.6 | 0.5 | 0.90 | 0.6 | 0.0 | 0.00 |
| slightly late | 0.3 | 0.3 | 0.05 | 0.2 | 0.05 | 0.05 |  
| very late | 0.1 | 0.2 | 0.05 | 0.2 | 0.95 | 0.95 |

If the decision is made to avoid the bridge, the state of the bridge does not matter. Therefore we assign a probability of 1.0 to no bridge delay in these cases. The bridge is open about 10% of the time, so the CPD for the bridge looks like this:

|  | alternate route, no traffic delay | alternate route, traffic delay | bridge closed - no traffic delay | bridge closed- traffic delay | bridge open - no traffic delay | bridge open - traffic delay |   
|----|----|----|----|----|----|----|
| p(bridge state) | 1.0 | 1.0 | 0.9 | 0.9 | 0.1 | 0.1 |

Now we have all the information we need to perform the first set of variable eliminations. The variables include:
- The decision to take the bridge.
- The CPD of arrival time given the state of the bridge, the decision to take the bridge, and whether the robot is late at the decision time. 
- The probability that the bridge is closed (no bridge delay). 
- The utility for arrival time states. 

The variable elimination will reduce these variables to the marginal utility of the best route for each case at the decision point: running late or not. The code comments in the cell below explain the details of the calculation. 

> **Note** The code in this example is intended for illustration purposes rather than production performance and quality. Apply this approach at your own risk! 

In [1]:
import numpy as np
import pandas as pd
col_names = ['alt-no', 'alt-del', 'closed-no', 'closed-del', 'open-no', 'open-del']
row_names = ['On time', 'Slightly late', 'Very late']

## payoff for on-time, slightly late, and very late delivery
U2 = [25, 0.5, -25] 
## Probability of arrival time
## Three rows = on time, slightly late, very late
delay2 = np.array([[0.6, 0.5, 0.90, 0.6, 0.0, 0.00],
                   [0.3, 0.3, 0.05, 0.2, 0.05, 0.05],
                   [0.1, 0.2, 0.05, 0.2, 0.95, 0.95]])
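## Probability of the bridge state named in each column
## (1.0 for the alternate route, where the bridge state does not matter)
bridge_up = np.array([1.0, 1.0, 0.9, 0.9, 0.1, 0.1])

## Compute joint probability of arrival time and bridge state 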
bridge_delay = np.multiply(delay2, bridge_up)
print('Probabilities of delay by routes')
print(pd.DataFrame(bridge_delay, columns = col_names, index = row_names))

## Compute utility
print('\nUtilities by routes and delay')
bridge_utility = np.transpose(np.multiply(np.transpose(bridge_delay), U2))
print(pd.DataFrame(bridge_utility, columns = col_names, index = row_names))

## Marginal utilities for the 6 cases.
marginal_bridge_utility = np.sum(bridge_utility, axis = 0)
print('\nThe marginal utility for by route and early or late at decsion point')
print(pd.DataFrame(marginal_bridge_utility, index = col_names).transpose())

## We need the maximum utility for on time and delayed arrival at the decision point. 
## First, collect the utilities of the alternate route and the bridge for on-time and late arrival at the decision point.
## For the bridge, this means summing the utilities over the bridge states (open, closed) for each arrival time. 
total_bridge_utility = np.array([marginal_bridge_utility[0], 
                                 marginal_bridge_utility[1],
                                 np.sum(marginal_bridge_utility[[2,4]]),
                                 np.sum(marginal_bridge_utility[[3,5]])])
print('\n The total utility for each option')
print(pd.DataFrame(total_bridge_utility, index = ['alt-no', 'alt-del', 'bridge-no', 'bridge-del']).transpose())

## Finally, get the max for the options (bridge, alternate) for on time or late arrival at decision point.
max_bridge_utility = np.array([np.max(total_bridge_utility[[0,2]]), np.max(total_bridge_utility[[1,3]])])
print('\nMaximum utilities for on time and late at decision point')
print(max_bridge_utility)

Probabilities of delay by routes
               alt-no  alt-del  closed-no  closed-del  open-no  open-del
On time           0.6      0.5      0.810        0.54    0.000     0.000
Slightly late     0.3      0.3      0.045        0.18    0.005     0.005
Very late         0.1      0.2      0.045        0.18    0.095     0.095

Utilities by routes and delay
               alt-no  alt-del  closed-no  closed-del  open-no  open-del
On time         15.00    12.50    20.2500       13.50   0.0000    0.0000
Slightly late    0.15     0.15     0.0225        0.09   0.0025    0.0025
Very late       -2.50    -5.00    -1.1250       -4.50  -2.3750   -2.3750

The marginal utility for by route and early or late at decsion point
   alt-no  alt-del  closed-no  closed-del  open-no  open-del
0   12.65     7.65    19.1475        9.09  -2.3725   -2.3725

 The total utility for each option
   alt-no  alt-del  bridge-no  bridge-del
0   12.65     7.65     16.775      6.7175

Maximum utilities for on time and late 

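These two utility values are the maxima for early and late arrival at the bridge decision point. From the totals above, if the robot is not delayed the bridge (16.775) beats the alternate route (12.65), so the maximum utility is to take the bridge. However, if the robot arrives late at the decision point the alternate route (7.65) beats the bridge (6.7175), so the maximum utility is to take the alternate route. 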
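As a cross-check, the same elimination can be written with one array per route, summing out the bridge state before taking the expectation of the utility. This is a sketch rather than a replacement for the cell above; the array names `alt`, `closed`, `opened`, and `bridge` are ours, and the numbers are copied from the tables in this section.

```python
import numpy as np

U2 = np.array([25, 0.5, -25])      # on time, slightly late, very late
p_closed = 0.9                     # the bridge is open about 10% of the time

# p(arrival | route, bridge state); columns: no traffic delay, traffic delay
alt = np.array([[0.6, 0.5], [0.3, 0.3], [0.1, 0.2]])
closed = np.array([[0.90, 0.6], [0.05, 0.2], [0.05, 0.2]])
opened = np.array([[0.00, 0.00], [0.05, 0.05], [0.95, 0.95]])

# Sum out the bridge state to get p(arrival | take bridge, traffic delay).
bridge = p_closed * closed + (1 - p_closed) * opened

# Expected utility of each route for each delay state at the decision point.
eu = np.vstack([U2 @ alt, U2 @ bridge])          # rows: alternate, bridge
best_route = np.array(["alternate", "bridge"])[eu.argmax(axis=0)]
max_eu = eu.max(axis=0)
print(eu, best_route, max_eu)
```

Writing the bridge state as an explicit mixture makes it easier to see that the two bridge columns of each delay state belong to a single decision.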

We are ready for the second variable elimination. In this case we want to compute the marginal utility of leaving early or not. The variables to be eliminated are:

- The decision to leave early.
- The CPD of traffic delay given the decision to leave early or not.
- The probability of heavy traffic.

We want to compute the marginal probability of early and late arrival at the bridge for two possible decisions, leaving early or leaving on time.

The CPD of arriving at the bridge decision point early or late is:    

|  | leave early, light traffic | leave early, heavy traffic | leave on time, light traffic | leave on time, heavy traffic |
|----|----|----|----|----|
|Arrive early | 0.8 | 0.7 | 0.6 | 0.3 |
|Arrive late | 0.2 | 0.3 | 0.4 | 0.7 |

The probability distribution of light and heavy traffic is:    

| | leave early, light traffic | leave early, heavy traffic | leave on time, light traffic | leave on time, heavy traffic |
|----|----|----|----|----|
|p(traffic) | 0.5 | 0.5 | 0.5 | 0.5 |

The comments in the code cell below explain the details of the variable elimination process.

In [2]:
## Columns are grouped by departure decision: the first two are leaving early, the last two leaving on time
col_names2 = ['early-light','early-heavy','on-time-light','on-time-heavy']
row_names2 = ['on-time', 'late']

## Probability of the traffic state (light or heavy) named in each column
traffic = np.array([0.5, 0.5, 0.5, 0.5])
## Probability of arrival at the bridge decision point given departure time and traffic
traffic_delay = np.array([[0.8, 0.7, 0.6, 0.3],
                          [0.2, 0.3, 0.4, 0.7]])
print('Probabilities of arrival by traffic delay')
print(pd.DataFrame(traffic_delay, columns = col_names2, index = row_names2))

## Compute joint probabilities of arrival and traffic state 
traffic_arrivals = np.multiply(traffic_delay, traffic)
print('\nProbabilities of arrivals given traffic')
print(pd.DataFrame(traffic_arrivals, columns = col_names2, index = row_names2))

## Marginal probability of early or late at bridge decision for leaving early
## (sum over light and heavy traffic, the first two columns)
prob_early = np.sum(traffic_arrivals[:,:2], axis = 1)
print('\nMarginal probability of early or late at bridge decision for early leaving')
print(prob_early)

## Marginal probability of early or late at bridge decision for leaving on time
## (sum over light and heavy traffic, the last two columns)
prob_on_time = np.sum(traffic_arrivals[:,2:], axis = 1)
print('\nMarginal probability of early or late at bridge decision for on time leaving')
print(prob_on_time)

Probabilities of arrival by traffic delay
         early-light  early-heavy  on-time-light  on-time-heavy
on-time          0.8          0.7            0.6            0.3
late             0.2          0.3            0.4            0.7

Probabilities of arrivals given traffic
         early-light  early-heavy  on-time-light  on-time-heavy
on-time          0.4         0.35            0.3           0.15
late             0.1         0.15            0.2           0.35

Marginal probability of early or late at bridge decision for early leaving
[0.75 0.25]

Marginal probability of early or late at bridge decision for on time leaving
[0.45 0.55]


Now we need to put this all together. We have the marginal distributions of the arrival at the bridge decision point for early and on-time leaving. We also have the marginal utility for the choice of the bridge or the alternative.

We can take advantage of the fact that the utility of time of leaving is independent of arrival at the customer. Thus, we can compute the total utility for both possibilities and find the maximum. 

The utility of leaving early has the following table: 

| | leave early | leave on time |
|----|----|----|
|Utility | -5 | 0 |

The details of the calculation are outlined in the comments in the code cell below.  

In [3]:
## Utility of leaving early or on time
U1 = [-5.0, 0] 

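## Let's put this all together 
## Weight the best bridge-decision utilities by the arrival probabilities for leaving early.
## Adding U1 = [-5.0, 0] elementwise charges the cost of leaving early to the first entry,
## so the total expected utility for leaving early is the sum printed below.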
overall_utility = np.multiply(max_bridge_utility, prob_early) + U1
print('Expected utility for early leaving')
print(overall_utility)
print('\nTotal utility for early leaving')
print(np.sum(overall_utility))

## Same calculation for leaving on time, which has utility 0 and so needs no U1 term
overall_utility = np.multiply(max_bridge_utility, prob_on_time) 
print('\nExpected utility for leaving on time')
print(overall_utility)
print('\nTotal utility for on time leaving')
print(np.sum(overall_utility))

Expected utility for early leaving
[7.58125 1.9125 ]

Total utility for early leaving
9.49375

Expected utility for leaving on time
[7.54875 4.2075 ]

Total utility for on time leaving
11.756250000000001


In summary, the delivery robot should leave on time and take the bridge if it is on time at the decision point. However, if the robot finds itself running late, the alternative route will maximize utility. These options achieve the maximum expected utility for each decision. 

But, what would change if the bridge had a toll? The toll would reduce the expected utility of the bridge route. But, if the toll is 2 or less, the bridge would still be the better route if the robot is not running late, since the bridge's advantage over the alternate route at that point is 16.775 - 12.65 = 4.125. 
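The effect of a toll can be explored with a short sketch that reuses the expected utilities and arrival probabilities printed in the cells above. The function `plan` and the dictionary names are ours. We also assume that the toll is paid whenever the bridge route is chosen, whether or not the bridge is open, and that it is measured in the same units as the utility.

```python
import numpy as np

# Values from the worked example; index 0 = on time at the decision point, 1 = late.
eu_alternate = np.array([12.65, 7.65])
eu_bridge = np.array([16.775, 6.7175])
p_arrival = {"leave early": np.array([0.75, 0.25]),
             "leave on time": np.array([0.45, 0.55])}
u_leave = {"leave early": -5.0, "leave on time": 0.0}


def plan(toll=0.0):
    """Best bridge choice per arrival state, and total utility per departure decision."""
    bridge_after_toll = eu_bridge - toll          # assumption: toll always paid on the bridge route
    take_bridge = bridge_after_toll > eu_alternate
    best = np.maximum(bridge_after_toll, eu_alternate)
    totals = {name: float(p @ best) + u_leave[name] for name, p in p_arrival.items()}
    return take_bridge, totals


# Largest toll for which the bridge is still preferred when the robot is on time.
break_even_toll = eu_bridge[0] - eu_alternate[0]
print(break_even_toll)
for toll in (0.0, 2.0, 5.0):
    print(toll, *plan(toll))
```

With no toll the totals reproduce the results above. Under these assumptions the bridge stays the better choice for an on-time robot until the toll reaches the break-even value.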

#### Copyright 2018, Stephen F Elston. All rights reserved. 