# Uncertainty in Project Planning



## The Future is Uncertain
...segue from NPV lesson.



### Importance of Distributions: the Flaw of Averages
A point design for the most probable case is insufficient:

 - Ignores uncertainty, neglects possible risks and upsides.
 - Incorrectly predicts performance: performance at expected conditions is not expected performance.

Concretely, consider uncertain inputs $x$ (a random variable) and performance $f(x)$ for some system model $f$.
Performance of a point design at expected conditions: $f(E[x])$.
Expected performance: $E[f(x)]$.
But, in general,

$$E[f(x)] \neq f(E[x])$$
Equality is guaranteed only when $f$ is linear (more precisely, affine) in $x$.

Simple counterexample: Consider a system with two inputs, $x_1, x_2 \overset{iid}{\sim} \mathcal{N}_1(0,1)$. The system has a single performance metric $y$ modeled as $y = f(x_1, x_2) = x_1^2 + x_2^2$. The expected values of the inputs are $E[x_1] = E[x_2] = 0$, so the "performance at expected conditions" is $y = f(0,0) = 0$. However, $y \sim \chi^2_2$ and $E[y] = 2$ (equivalently, $E[x_i^2] = \mathrm{Var}(x_i) + E[x_i]^2 = 1$ for each input). Thus the "expected performance" is $2$. For this system, the "performance at expected conditions" is different from the "expected performance", illustrating the Flaw of Averages.
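
The counterexample can be checked numerically. The following is a minimal Monte Carlo sketch using only the Python standard library; the function names, sample count, and seed are illustrative choices, not part of the original notes.

```python
import random
import statistics


def f(x1, x2):
    """System model from the counterexample: y = x1^2 + x2^2."""
    return x1**2 + x2**2


def flaw_of_averages(n_samples=100_000, seed=0):
    """Return Monte Carlo estimates of (f(E[x]), E[f(x)])."""
    rng = random.Random(seed)  # fixed seed (arbitrary) for repeatability
    samples = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_samples)]

    # Performance at expected conditions: evaluate f at the sample means.
    mean_x1 = statistics.fmean(x1 for x1, _ in samples)
    mean_x2 = statistics.fmean(x2 for _, x2 in samples)
    perf_at_expected = f(mean_x1, mean_x2)

    # Expected performance: average f over the samples.
    expected_perf = statistics.fmean(f(x1, x2) for x1, x2 in samples)
    return perf_at_expected, expected_perf


perf_at_expected, expected_perf = flaw_of_averages()
print(f"f(E[x]) ~ {perf_at_expected:.4f}")  # should be close to 0
print(f"E[f(x)] ~ {expected_perf:.4f}")     # should be close to 2
```

With enough samples the first estimate should approach $0$ and the second should approach $2$, matching the analytical result.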

[Drake Equation example](http://www.jodrellbank.manchester.ac.uk/media/eps/jodrell-bank-centre-for-astrophysics/news-and-events/2017/uksrn-slides/Anders-Sandberg---Dissolving-Fermi-Paradox-UKSRN.pdf)

## Estimating the Distribution of Future Possibilities

### de Neufville's Procedure:

#### 1. Identify Important Factors

 - Performance objectives (outputs)
 - Performance drivers (inputs)

Subtleties: timescale, level of aggregation/specificity
Modeling gets harder (superlinearly) with number of uncertain performance drivers - limit model to what's really useful (maybe sensitivity analysis to screen...)

#### 2. Analyze historical trends
How have the performance drivers varied with time in the recent past?

Subtleties:

 - Trends can look different if we fit to different time windows.
 - Data quality (e.g. systematic biases, reporting incentives).
 - Inconsistent definitions
 
#### 3. Identify trend-breakers
Unexpected, low-probability events often disrupt business as usual, and many systems are not robust to these events. See also Nassim Taleb's Black Swan Theory.

It's hard to accurately and meaningfully express the probability of trend-breakers. Instead, do a scenario analysis:
 1. Brainstorm possible trend-breaker scenarios.
 2. Configure the model to each scenario, run.
 3. Look for a design which is robust across all/many scenarios.

Hard part is brainstorming scenarios, but this is a valuable exercise. See also [murphyjitsu](https://mindlevelup.wordpress.com/mindlevelup-the-book/planning-101/):

> Say that you’re currently making a plan. As you go to act on it, you see a future version of yourself teleports in front of you saying, "No! Don’t do it! You’ll fail!"
> Then, your future self disappears.
> Given that info, what seems like the most probable way that things can go wrong?"

Example trend breakers for a new electricity plant in Spain:

1. *Unreliable Energy Markets* - War in the Middle East disrupts the supply of natural gas. LNG prices rise.
2. *Plentiful Nuclear Power* - France and Germany invest in many new nuclear power plants. Demand for LNG and EU CO2 permits drops as a result, causing the prices of these goods to fall.
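
The scenario-analysis steps can be sketched in code using the two trend-breakers above. Everything numeric here is a made-up placeholder: the price multipliers, the two candidate designs (`gas_only` and `dual_fuel`), and the toy profit model are hypothetical. "Robust" is interpreted as "best worst-case profit", which is only one of several reasonable criteria.

```python
# Scenario settings: multipliers on baseline gas and CO2 permit prices.
# All values are illustrative placeholders, not data from the source.
SCENARIOS = {
    "business_as_usual": {"gas_price": 1.0, "co2_price": 1.0},
    "unreliable_energy_markets": {"gas_price": 2.0, "co2_price": 1.0},
    "plentiful_nuclear_power": {"gas_price": 0.7, "co2_price": 0.5},
}

# Hypothetical candidate designs, in arbitrary money units per year.
DESIGNS = {
    "gas_only": {"revenue": 100.0, "fixed_cost": 20.0, "gas_use": 40.0, "co2": 20.0},
    "dual_fuel": {"revenue": 100.0, "fixed_cost": 45.0, "gas_use": 25.0, "co2": 15.0},
}


def annual_profit(design, scenario):
    """Toy system model: revenue minus fixed, fuel, and permit costs."""
    return (
        design["revenue"]
        - design["fixed_cost"]
        - design["gas_use"] * scenario["gas_price"]
        - design["co2"] * scenario["co2_price"]
    )


def evaluate_all(designs, scenarios):
    """Step 2: configure the model for each scenario and run it."""
    return {
        d_name: {s_name: annual_profit(d, s) for s_name, s in scenarios.items()}
        for d_name, d in designs.items()
    }


def most_robust(results):
    """Step 3: pick the design with the best worst-case profit (maximin)."""
    return max(results, key=lambda d_name: min(results[d_name].values()))


results = evaluate_all(DESIGNS, SCENARIOS)
for d_name, by_scenario in results.items():
    print(d_name, by_scenario)
print("most robust design:", most_robust(results))
```

With these placeholder numbers, `gas_only` does better when business continues as usual, while `dual_fuel` loses less when gas prices spike; which one counts as "robust" depends on the criterion chosen.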

#### 4. Establish forecast (in)accuracy
"to pretend a point forecast is precise is deceitful" - de Neufville

Best approach: compare past forecasts with actual outcomes.

Sometimes past forecasts are unavailable; in this case, run the new forecasting model on various windows of past data (i.e. [backtesting](https://en.wikipedia.org/wiki/Backtesting)).

Subtlety: use a reserved validation dataset - don't evaluate the model on the data it was fit to!
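
A minimal backtesting sketch, assuming the forecasting model is a least-squares linear trend fit to a rolling window. The function names, the window and horizon lengths, and the synthetic series are illustrative assumptions. Each forecast is scored only on points after its fitting window, so the model is never evaluated on the data it was fit to.

```python
def fit_linear_trend(ts, ys):
    """Ordinary least-squares line through (ts, ys); returns (slope, intercept)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def backtest(series, window, horizon):
    """Fit on each past window, forecast `horizon` steps ahead, record errors.

    `series[t]` is the observed value at time step t = 0, 1, 2, ...
    Returns one error curve (actual minus forecast, per step ahead)
    for every forecast origin that has enough data before and after it.
    """
    curves = []
    for origin in range(window - 1, len(series) - horizon):
        ts = list(range(origin - window + 1, origin + 1))
        ys = series[origin - window + 1 : origin + 1]
        slope, intercept = fit_linear_trend(ts, ys)
        errors = [
            series[origin + h] - (slope * (origin + h) + intercept)
            for h in range(1, horizon + 1)
        ]
        curves.append(errors)
    return curves


# Synthetic, slightly accelerating series (placeholder data, not real records).
series = [100.0 + 2.0 * t + 0.1 * t * t for t in range(30)]
error_curves = backtest(series, window=10, horizon=5)

for h in range(5):
    mae = sum(abs(curve[h]) for curve in error_curves) / len(error_curves)
    print(f"{h + 1} steps ahead: mean absolute error = {mae:.2f}")
```

For this accelerating series, the linear trend under-predicts and the error grows with the forecast horizon, which is the kind of pattern backtesting is meant to reveal.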

#### 5. Build a dynamic model
We need a model which:

1. Predicts the performance of the system over time for a given choice of input parameters.
2. Captures the estimated distribution of the input parameters.

Desirable model attributes:

  - *Causal* - The model is not merely based on correlations, rather there is some domain knowledge which convinces us we have causality pointed the right way. Silly counterexample: a fire damage control model which assumes that the extent of fire damage is proportional to the number of firefighters responding to the fire.
  - *Socio-technical* - Includes social, economic and regulatory aspects, as well as engineering considerations.
  - *Simple* - The model should be simple enough that most stakeholders can understand it, at least at a high level. This is important for building trust and adoption.

We will typically use the model in a Monte-Carlo simulation to estimate the distribution of the performance parameters.

### Hospital Example
Designing an expansion for the maternity ward of an English hospital, ca. 2008.

#### 1. Identify Important Factors
Performance outputs:
  - Quality of clinical outcomes
  - Pleasantness of the experience for individual patients
  - Financial viability of operating the ward

Input: Number of births delivered at the hospital.

This example will focus just on modeling the input uncertainty, not the links from inputs to outputs.

#### 2. Analyze historical trends
Hospital birth records only go back 13 years. However, there is a strong link between hospital births and county births, and the county birth records go back further.

Trends: gradual increase over 30 years, more rapid increase in last 10 years.

Looking deeper - birth rate depends on two underlying variables: female population age 15-44 and fertility. County population has been steadily increasing. Fertility had been decreasing, but started increasing in 2001 due to influx of immigrants, who typically have more children than native-born English.

#### 3. Identify trend-breakers
Example trend-breaking scenario:
> A shift in UK immigration policy causes the foreign-born population to shrink. This reduces the average fertility rate, which returns to the pre-2001 declining trend.

#### 4. Establish forecast (in)accuracy
No past forecasts are available, so estimate accuracy by backtesting the current forecasting model.

Note: the forecasting model centered at time $t$ is a linear trend fit to the data in the time window $[t-10, t]$.

#### 5. Build a dynamic model
Only doing the "Capture the estimated distribution of the input parameters" part.

Get a set of error vs. time curves from backtesting.

Approach to sampling from the estimated distribution of future birth rates:
  1. Predict the "nominal case" from the 10-year linear trend.
  2. Pick one of the error vs. time curves at random.
  3. Add the error to the nominal prediction.
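
The sampling procedure above can be sketched as follows. The nominal forecast and the error curves are placeholder numbers standing in for the 10-year trend extrapolation and the backtesting output; they are not hospital data, and the names are hypothetical.

```python
import random

# Step 1 (placeholder): nominal births per year for 1..5 years ahead,
# as would come from extrapolating the 10-year linear trend.
nominal_forecast = [5050.0, 5100.0, 5150.0, 5200.0, 5250.0]

# Placeholder error-vs-time curves (actual minus forecast) from backtesting.
error_curves = [
    [20.0, 45.0, 80.0, 110.0, 150.0],
    [-15.0, -40.0, -60.0, -95.0, -120.0],
    [5.0, -10.0, 15.0, -20.0, 30.0],
    [30.0, 70.0, 90.0, 140.0, 200.0],
]


def sample_future(nominal, curves, rng):
    """Steps 2 and 3: pick one error curve at random, add it to the nominal case."""
    errors = rng.choice(curves)
    return [n + e for n, e in zip(nominal, errors)]


rng = random.Random(0)  # fixed seed (arbitrary) for repeatability
samples = [sample_future(nominal_forecast, error_curves, rng) for _ in range(1000)]

# Summarize the spread of possible birth counts in the final forecast year.
final_year = sorted(s[-1] for s in samples)
print("final year min:   ", final_year[0])
print("final year median:", final_year[len(final_year) // 2])
print("final year max:   ", final_year[-1])
```

Drawing a whole error curve, rather than independent errors per year, keeps the year-to-year correlation seen in the backtests. In a full Monte Carlo study, each sampled trajectory would then be fed through the system model to estimate the distribution of the performance outputs.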


## Flexibility as Response to Uncertainty
### Traditional approach is flawed
Requirements / waterfall process does not handle uncertainty well. Requirements are usually set to the expected conditions, which falls prey to the Flaw of Averages.

Waterfall process does not allow for flexibility in altering the design as more is learned about the problem.

### Solutions: Robustness and Flexibility
*Robustness* - Designing a single system configuration which can handle most of the expected range of inputs. Downside: can lead to wasteful overdesign. e.g. design a maternity ward which is large enough for the 90th percentile future birth rates.

*Flexibility* - Design a system which can be reconfigured later as inputs change. This allows us to be more sure the added capacity is needed before spending on it. e.g. design a maternity ward for 40th percentile birth rates, but make provisions to easily expand the building in the future if needed.
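
A toy cost comparison of the two strategies, under loudly stated assumptions: the demand distribution, unit costs, and the up-front premium for expandability are all invented placeholders, and the model ignores discounting, construction lead time, and the cost of unmet demand above the robust design's capacity. It is a sketch of the reasoning, not a valuation method from the source.

```python
import random

rng = random.Random(0)  # fixed seed (arbitrary) for repeatability

# Placeholder distribution of future births per year (NOT from the source).
demand = sorted(rng.gauss(5000.0, 500.0) for _ in range(20_000))


def percentile(sorted_values, q):
    """Empirical q-quantile of an already sorted list (simple index rule)."""
    idx = min(int(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


# Hypothetical cost parameters, in arbitrary money units per unit of capacity.
BUILD_COST = 1.0      # capacity built up front
EXPAND_COST = 1.3     # capacity added later (assumed more expensive)
FLEX_PREMIUM = 100.0  # up-front cost of provisions that make expansion possible

# Robust design: build for the 90th percentile immediately.
robust_capacity = percentile(demand, 0.90)
robust_cost = BUILD_COST * robust_capacity

# Flexible design: build for the 40th percentile, expand only if needed.
flexible_capacity = percentile(demand, 0.40)


def flexible_cost_for(d):
    """Cost of the flexible design if realized demand turns out to be d."""
    shortfall = max(0.0, d - flexible_capacity)
    return BUILD_COST * flexible_capacity + FLEX_PREMIUM + EXPAND_COST * shortfall


flexible_cost = sum(flexible_cost_for(d) for d in demand) / len(demand)

print(f"robust:   capacity {robust_capacity:.0f}, cost {robust_cost:.0f}")
print(f"flexible: capacity {flexible_capacity:.0f}, expected cost {flexible_cost:.0f}")
```

Under these placeholder numbers the flexible design has the lower expected cost, because expansion money is spent only in the futures where demand actually materializes. A different premium or expansion cost could reverse the result.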

segue to flexibility in next lecture...


# References
[1] de Neufville, Richard and Scholtes, Stefan. *Flexibility in Engineering Design*. Cambridge, MA: MIT Press, 2011.
