# Reading Activity 1: Introduction to Predictive Modeling

## Objectives
+ To define predictive modeling.
+ To introduce the idea of structural causal models and their graphical representation.
+ To tell the difference between aleatory and epistemic uncertainties.

## Predictive Modeling

> Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don't know we don't know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones. Donald Rumsfeld, United States Secretary of Defense, [DoD news briefing, February 12, 2002](https://archive.defense.gov/Transcripts/Transcript.aspx?TranscriptID=2636).

*Predictive modeling* is the process of describing our state of knowledge about known unknowns in order to make informed decisions.
This is the scope of this class.

Unfortunately, there is no automated way of turning unknown unknowns into known unknowns.
This is currently done manually, as it requires common sense and human intuition.
Automating this process seems to require the ability to perform induction on open-ended problems, and it may require general artificial intelligence.

## Structural Causal Models

A *causal model* is a model that attempts to capture the mechanisms that govern a given phenomenon.
We will use the language of *structural causal models* (SCM), developed by the computer scientist Judea Pearl, to formalize the concept.
A structural causal model is a collection of three things:
+ A set of endogenous variables. These are the variables that our model is trying to explain.
+ A set of exogenous variables. These are variables that the model needs but does not try to explain.
+ A set of functions that assign a value to each endogenous variable based on the values of the other variables in the model.

Most physical and engineering models are causal models.

### Example: Asthma model (J. Pearl)

Suppose that we are trying to study the causal relationships between a treatment $X$ and lung function $Y$ for individuals who suffer from asthma.
However, it is plausible that $Y$ also depends on the air pollution levels $Z$.
The final ingredient is the function that connects $X$ and $Z$ to $Y$:
$$
Y = f(X, Z).
$$
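To make the pieces of the SCM concrete, here is a minimal sketch of the asthma model in Python. The linear form of `f`, its coefficients, and the distributions used for $X$ and $Z$ are all assumptions made up for illustration; they do not come from Pearl's example or from any data.

```python
import random


def f(x, z):
    """Hypothetical mechanism for lung function Y.

    The linear form and the coefficients are invented for illustration.
    """
    return 60.0 + 15.0 * x - 0.2 * z


def sample_scm(rng):
    """Draw one individual from the toy asthma SCM."""
    x = rng.choice([0, 1])       # X: treatment (0 = untreated, 1 = treated)
    z = rng.uniform(0.0, 100.0)  # Z: air pollution level (arbitrary units)
    y = f(x, z)                  # Y is determined by its direct causes X and Z
    return {"X": x, "Z": z, "Y": y}


rng = random.Random(0)  # fixed seed so the draws are reproducible
samples = [sample_scm(rng) for _ in range(5)]
for s in samples:
    print(s)
```

In this sketch, $X$ and $Z$ are drawn from outside the model, and $Y$ is the only variable the model explains through a function.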

### Graphical representation of causal models
Every SCM corresponds to a *graphical causal model*.
These are usually *directed acyclic graphs* (DAGs).
They can be read directly from the SCM: draw an arrow into each variable from every variable that appears as an argument of its function.
Let's look at an example.

### Example: Asthma model - Graphical causal model
Here, each variable is represented by a node.
The node at the beginning of an arrow is a direct cause of the node at the end of the arrow.

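```python
# Requires the `graphviz` Python package and the Graphviz executables
# (e.g., `dot`) on your system's PATH.
from graphviz import Digraph

g = Digraph('Asthma')
g.node('X', label='X (treatment)')
g.node('Y', label='Y (lung function)')
g.node('Z', label='Z (air pollution)')
g.edge('X', 'Y')  # the treatment is a direct cause of lung function
g.edge('Z', 'Y')  # air pollution is a direct cause of lung function
# g.render('asthma_graph', format='png')  # Uncomment this line to save the figure
g  # In a notebook, the last expression displays the graph
```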

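The step from an SCM to its graph can also be sketched without any plotting library. The snippet below is an illustrative sketch, not part of the original notebook: the `parents` dictionary is a hypothetical encoding that lists, for each variable, the arguments of its function. It uses the standard-library `graphlib` module (Python 3.9 or later) to check that the result is acyclic.

```python
from graphlib import CycleError, TopologicalSorter

# Each variable is mapped to the arguments of its structural function.
# X and Z have no parents in this model.
parents = {"X": [], "Z": [], "Y": ["X", "Z"]}

# One directed edge per (direct cause, effect) pair.
edges = [(cause, effect) for effect, causes in parents.items() for cause in causes]

# TopologicalSorter expects node -> predecessors, which is what `parents` stores.
try:
    order = list(TopologicalSorter(parents).static_order())
    is_dag = True
except CycleError:
    order, is_dag = [], False

print(edges)
print(is_dag, order)
```

A valid topological order lists every cause before its effects, so `Y` comes after both `X` and `Z`.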

## Types of Uncertainty
In general, we are uncertain about something if we don't know much about it.
In particular, we can be uncertain about:
+ the value of a model parameter;
+ the initial conditions of an ordinary differential equation;
+ the boundary conditions of a partial differential equation;
+ the value of an experimental measurement we are about to perform;
+ the mathematical form of a model;
+ etc.

Uncertainty may be *aleatory* or *epistemic*.

+ Aleatory uncertainty is associated with inherent system randomness. 
+ Epistemic uncertainty is associated with lack of knowledge.

There is a long philosophical debate about this distinction.
We are going to ignore it.
The instructor's view is that common sense and **probability theory are sufficient to describe both uncertainties**.
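
As a rough illustration of how probability can describe both kinds of uncertainty, consider flipping a coin with an unknown bias. This sketch is an assumption-laden toy, not part of the course material: the "true" bias `true_theta`, the uniform prior, and the helper `posterior_summary` are all hypothetical choices. Our uncertainty about the bias is epistemic and shrinks as we collect flips. The outcome of the next flip stays uncertain no matter how many flips we observe, which is the aleatory part.

```python
import math
import random


def posterior_summary(heads, tails):
    """Mean and standard deviation of the Beta(1 + heads, 1 + tails) posterior
    over the coin bias, obtained from a uniform prior."""
    a, b = 1 + heads, 1 + tails
    mean = a / (a + b)
    std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, std


rng = random.Random(0)  # fixed seed so the simulation is reproducible
true_theta = 0.3        # hypothetical "true" bias, unknown to the modeler

results = {}
for n in (10, 100, 10000):
    heads = sum(rng.random() < true_theta for _ in range(n))
    mean, std = posterior_summary(heads, n - heads)
    # std: spread of our belief about the bias (epistemic); shrinks as n grows.
    # mean * (1 - mean): variance of the next flip under the posterior
    # predictive; it does not go to zero with more data.
    results[n] = (std, mean * (1 - mean))
    print(n, round(std, 4), round(mean * (1 - mean), 4))
```

The first printed column after `n` should decrease as `n` grows, while the second should settle near a nonzero value.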