# Uncertainty
---

We very rarely know anything for sure; there is almost always some uncertainty. Probability gives us a way to reason about it.

## Probability
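Each possible outcome is a "possible world", $\omega$, and $\Omega$ is the set of all possible worlds. When we roll a die, there are 6 possible worlds. The probability of a world is written $P(\omega)$, and it satisfies: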
- $0\leq P(\omega)\leq1$
- $\sum\limits_{\omega\in\Omega} {P(\omega)} = 1$
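
As a quick check of the two rules above, here is a minimal sketch for a single die. It assumes the die is fair (the notes do not say so), and the names `die`, `all_in_range`, and `total` are only illustrative.

```python
from fractions import Fraction

# Six possible worlds for one die, assumed fair: each has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Rule 1: every probability lies between 0 and 1.
all_in_range = all(0 <= p <= 1 for p in die.values())

# Rule 2: the probabilities of all possible worlds sum to 1.
total = sum(die.values())

print(all_in_range, total)  # True 1
```

Using `Fraction` keeps the sum exact instead of relying on floating-point rounding.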


## Conditional Probability
**Unconditional Probability**: Degree of belief in a proposition in the absence of any other evidence.

**Conditional Probability**: Degree of belief in a proposition given that some knowledge has already been revealed. $P(a|b)$.

$P(a|b) = \frac{P(a\land b)}{P(b)}$

$P(a\land b) = P(b)\times P(a|b)$
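
To make the formula concrete, the sketch below enumerates the 36 worlds of two dice and computes $P(a|b)$ where $b$ is "the first die shows 6" and $a$ is "the sum is 12". The dice example and the assumption that both dice are fair are illustrative choices, not part of the notes.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely worlds for two dice (assumed fair).
worlds = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event = fraction of worlds in which it holds."""
    return Fraction(sum(1 for w in worlds if event(w)), len(worlds))

def b(w):
    return w[0] == 6                       # first die shows 6

def a_and_b(w):
    return w[0] == 6 and sum(w) == 12      # first die shows 6 and the sum is 12

p_b = prob(b)
p_a_and_b = prob(a_and_b)
p_a_given_b = p_a_and_b / p_b              # P(a|b) = P(a and b) / P(b)

print(p_b, p_a_and_b, p_a_given_b)  # 1/6 1/36 1/6
```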

## Random Variables

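**Random Variable**: A variable in probability theory with a domain of possible values it can take on. For example, $\textit{Flight}$ can take the values $\textit{on time}$, $\textit{delayed}$, or $\textit{cancelled}$.

**Probability Distribution**: The probability of each value a random variable can take on: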

$P(\textit{Flight = on time}) = 0.6$

$P(\textit{Flight = delayed}) = 0.3$

$P(\textit{Flight = cancelled}) = 0.1$

is also represented as:

$\mathbf{P}(\textit{Flight}) = \left<0.6, 0.3, 0.1\right>$

**Independence**: The knowledge that one event occurs does not affect the probability of the other event. $P(a\land b) = P(a)\times P(b)$

## Bayes' Rule

$P(b|a) = \frac{P(b)\times P(a|b)}{P(a)}$

Given clouds in the morning, what's the probability of rain in the afternoon? Suppose I know that 80% of rainy afternoons start with cloudy mornings, that 40% of days have cloudy mornings, and that 10% of days have rainy afternoons.

$P(rain|clouds) = \frac{P(clouds|rain)\times P(rain)}{P(clouds)}$

$= \frac{0.8\times0.1}{0.4} = 0.2$.
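
The same arithmetic as a small sketch. The function name `bayes` and its parameter names are hypothetical; the three numbers come from the example above.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Bayes' rule: P(b|a) = P(a|b) * P(b) / P(a)."""
    return p_a_given_b * p_b / p_a

# P(clouds|rain) = 0.8, P(rain) = 0.1, P(clouds) = 0.4
p_rain_given_clouds = bayes(p_a_given_b=0.8, p_b=0.1, p_a=0.4)

print(round(p_rain_given_clouds, 3))  # 0.2
```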

Knowing $P(\textit{cloudy mornings}|\textit{rainy afternoons})$ means that we can calculate $P(\textit{rainy afternoons}|\textit{cloudy mornings})$.

More generally, knowing $P(\textit{visible effect}|\textit{unknown cause})$ means that we can calculate $P(\textit{unknown cause}|\textit{visible effect})$.

## Joint Probability

![img.png](img.png)

In [None]:
from IPython.display import Image

Image("img.png")

What is $\mathbf{P}(C|R)$? (The whole vector, $P(C|R)$ and $P(\neg C|R)$).

$=\left<\frac{P(C,R)}{P(R)}, \frac{P(\neg C,R)}{P(R)}\right>$

or sometimes:

$\alpha\left<P(C,R), P(\neg C,R)\right>$

$=\alpha\left<0.08, 0.02\right>$.

The entries must sum to $1$, and $0.08 + 0.02 = 0.1$, so $\alpha = 10$ and the result is $\left<0.8, 0.2\right>$.
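
Normalization is easy to express in code. The sketch below uses a hypothetical helper `normalize` that returns both $\alpha$ and the normalized vector for the two joint values used here.

```python
def normalize(values):
    """Scale non-negative values so that they sum to 1; also return alpha."""
    alpha = 1 / sum(values)
    return alpha, [alpha * v for v in values]

# Joint values P(C, R) and P(not C, R) from the example.
alpha, distribution = normalize([0.08, 0.02])

print(round(alpha, 6), [round(p, 6) for p in distribution])  # 10.0 [0.8, 0.2]
```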

## Probability Rules

- $P(\neg a) = 1 - P(a)$
- **Inclusion Exclusion Formula**: $P(a\lor b) = P(a) + P(b) - P(a\land b)$
- **Marginalization**: $P(a) = P(a, b) + P(a, \neg b)$
- **Marginalization**: $P(X = x_i) = \sum\limits_j {P(X = x_i, Y = y_j)}$
- **Conditioning**: $P(a) = P(a|b)\times P(b) + P(a|\neg b)\times P(\neg b)$
- **Conditioning**: $P(X = x_i) = \sum\limits_j {P(X = x_i | Y = y_j)\times P(Y = y_j)}$
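
Marginalization and conditioning should give the same answer, which the sketch below checks on a small joint table for clouds and rain. The two entries with rain ($0.08$ and $0.02$) are the ones used earlier; the two entries without rain are assumed values chosen so that the table sums to 1. All names are illustrative.

```python
# Joint distribution over (Clouds, Rain).
JOINT = {
    ("cloud", "rain"): 0.08,
    ("cloud", "no rain"): 0.32,       # ASSUMED value
    ("no cloud", "rain"): 0.02,
    ("no cloud", "no rain"): 0.58,    # ASSUMED value
}

def p_clouds(c):
    """Marginalization: sum the joint over every value of Rain."""
    return sum(p for (ci, _), p in JOINT.items() if ci == c)

def p_rain(r):
    """Marginalization: sum the joint over every value of Clouds."""
    return sum(p for (_, ri), p in JOINT.items() if ri == r)

def p_clouds_given_rain(c, r):
    """Conditional probability P(Clouds = c | Rain = r)."""
    return JOINT[(c, r)] / p_rain(r)

by_marginalization = p_clouds("cloud")
by_conditioning = sum(
    p_clouds_given_rain("cloud", r) * p_rain(r) for r in ("rain", "no rain")
)

print(round(by_marginalization, 3), round(by_conditioning, 3))  # 0.4 0.4
```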

## Bayesian Networks

**Bayesian network**: Data structure that represents the dependencies among random variables.

- Directed graphs
- Each node represents a random variable
- An arrow from $X$ to $Y$ means $X$ is a parent of $Y$.
- Each node $X$ has a probability distribution of $\mathbf{P}(X|Parents(X))$

Let's imagine I have an appointment out of town, and I'm taking a train to get there. Getting there on time depends on the train. The train, in turn, depends on rain, on track maintenance, and so on.

In [None]:
Image("img_1.png")

Rain has no arrows pointing at it, so its distribution is unconditional. It might look like:

| None 	| Light 	| Heavy 	|
|------	|-------	|-------	|
| 0.7  	| 0.2   	| 0.1   	|

Next we have Maintenance. The heavier the rain, the less likely we have maintenance.

| $\mathbf{R}$ 	| Yes   	| No    	|
|--------------	|-------	|-------	|
| None         	| $0.4$ 	| $0.6$ 	|
| Light        	| $0.2$ 	| $0.8$ 	|
| Heavy        	| $0.1$ 	| $0.9$ 	|

Now let's look at Train: on time or delayed. This depends on the two nodes pointing towards it, Rain and Maintenance.

We can construct a larger probability distribution:

In [None]:
Image("img_2.png")

Finally, we have the $appointment$: did we attend or miss? It is influenced by track $maintenance$ and $rain$, but only through the train, which we've already encoded. We only want to encode direct relationships. Once we know the train info, $maintenance$ and $rain$ won't give us any more information.

| $\mathbf{T}$ 	| attend 	| miss  	|
|--------------	|--------	|-------	|
| on time      	| $0.9$  	| $0.1$ 	|
| delayed      	| $0.6$  	| $0.4$ 	|

What if I want to know $P(light)$? This is a single value I already have access to.

What about $P(light, no)$? Calculate this as $P(light\land no) = P(light)\times P(no|light)$.

$P(light, no, delayed) = P(light)\times P(no|light)\times P(delayed|light, no)$

$P(light, no, delayed, miss) = P(light)\times P(no|light) \times P(delayed|light, no)\times P(miss|delayed)$
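
The product above can be read straight off the network's tables. In the sketch below, the Rain, Maintenance, and Appointment tables are the ones shown in these notes; the Train table only appears as an image, so its numbers here are assumed placeholder values. The dictionary layout and the function name `joint` are also illustrative.

```python
RAIN = {"none": 0.7, "light": 0.2, "heavy": 0.1}

MAINTENANCE = {  # P(Maintenance | Rain)
    "none": {"yes": 0.4, "no": 0.6},
    "light": {"yes": 0.2, "no": 0.8},
    "heavy": {"yes": 0.1, "no": 0.9},
}

TRAIN = {  # P(Train | Rain, Maintenance): ASSUMED values, the real table is in the figure
    ("none", "yes"): {"on time": 0.8, "delayed": 0.2},
    ("none", "no"): {"on time": 0.9, "delayed": 0.1},
    ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
    ("light", "no"): {"on time": 0.7, "delayed": 0.3},
    ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
    ("heavy", "no"): {"on time": 0.5, "delayed": 0.5},
}

APPOINTMENT = {  # P(Appointment | Train)
    "on time": {"attend": 0.9, "miss": 0.1},
    "delayed": {"attend": 0.6, "miss": 0.4},
}

def joint(rain, maintenance, train, appointment):
    """Multiply each node's probability given the values of its parents."""
    return (
        RAIN[rain]
        * MAINTENANCE[rain][maintenance]
        * TRAIN[(rain, maintenance)][train]
        * APPOINTMENT[train][appointment]
    )

p = joint("light", "no", "delayed", "miss")
print(round(p, 4))  # 0.0192 with the assumed Train entry of 0.3
```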

How can we get new pieces of info? Can we draw new conclusions? Can we figure out the probabilities of variables taking on values?

### Inference
- Query $X$: Variable for which to compute distribution.
- Evidence variables $\mathbf{E}$: observed variables for event $e$.
- Hidden variables $Y$: Non-evidence and non-query variables.

- Goal is to calculate $\mathbf{P}(X|e)$.

We want to find $\mathbf{P}(Appointment|light,no)$. The hidden variable is $train$. We know that conditional probability is proportional to the joint probability.

$=\alpha\mathbf{P}(Appointment, light, no)$

$=\alpha\left[\mathbf{P}(Appointment, light, no, \textit{on time}) + \mathbf{P}(Appointment, light, no, delayed)\right]$

### Inference by Enumeration

$$\mathbf{P}(X|e) = \alpha\mathbf{P}(X, e) = \alpha\sum\limits_y {\mathbf{P}(X, e, y)}$$
where

- $X$ is the query variable
- $e$ is the evidence
- $y$ ranges over values of hidden variables
- $\alpha$ normalizes the result
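
As a rough sketch of inference by enumeration, the code below computes $\mathbf{P}(Appointment|light,no)$ by summing over the hidden variable Train and then normalizing. The row of the Train table for light rain and no maintenance is an assumed value (the table is only shown as an image), and all names are illustrative.

```python
P_LIGHT = 0.2            # P(Rain = light)
P_NO_GIVEN_LIGHT = 0.8   # P(Maintenance = no | Rain = light)

# P(Train | Rain = light, Maintenance = no): ASSUMED values.
TRAIN_GIVEN_LIGHT_NO = {"on time": 0.7, "delayed": 0.3}

APPOINTMENT = {  # P(Appointment | Train)
    "on time": {"attend": 0.9, "miss": 0.1},
    "delayed": {"attend": 0.6, "miss": 0.4},
}

def enumerate_appointment():
    """Return P(Appointment | light, no) by summing out Train."""
    unnormalized = {}
    for appointment in ("attend", "miss"):
        # Sum the joint probability over every value y of the hidden variable.
        unnormalized[appointment] = sum(
            P_LIGHT * P_NO_GIVEN_LIGHT * p_train * APPOINTMENT[train][appointment]
            for train, p_train in TRAIN_GIVEN_LIGHT_NO.items()
        )
    alpha = 1 / sum(unnormalized.values())  # normalizing constant
    return {value: alpha * p for value, p in unnormalized.items()}

posterior = enumerate_appointment()
print({value: round(p, 2) for value, p in posterior.items()})
# {'attend': 0.81, 'miss': 0.19} under the assumed Train row
```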

It's pretty annoying to do as a human, but we can program AI to do this. There are a lot of Python libraries for Bayesian networks.

> See bayesnet/*.py

This is one way to do it, but it's not efficient. There are ways to optimize, but still, as we gain variables, it's going to be a lot of work.

We don't always need exact inference. Sometimes **approximate inference** is good enough.

## Sampling

We're going to take a sample of all variables inside the Bayesian network.

| None 	| Light 	| Heavy 	|
|------	|-------	|-------	|
| 0.7  	| 0.2   	| 0.1   	|

Using a random number generator, we'll randomly pick 1 of these 3 values. Maybe, for example, we pick $None$.

Now that we've observed that $\mathbf R$ is $None$, we'll randomly sample from the top row of this table:

| $\mathbf{R}$ 	| Yes   	| No    	|
|--------------	|-------	|-------	|
| None         	| $0.4$ 	| $0.6$ 	|
| Light        	| $0.2$ 	| $0.8$ 	|
| Heavy        	| $0.1$ 	| $0.9$ 	|

Perhaps we observe $Yes$.

For the $Train$ distribution, we'll randomly sample from the row where $\mathbf R$ is $None$ and $\mathbf M$ is $Yes$.

Finally, we'll do the same to check if we attended the appointment.

This becomes powerful when we repeat it many times.

For $P(Train = \textit{ on time})$, we could find the exact probability. But, we could just sample it and get close. We don't need to be right 100% of the time.

In [None]:
Image("img_3.png")

In $6/8$ samples, we are on time.

We calculated the unconditional probability $P(Train = \textit{ on time})$. But sometimes, we'll have conditions.

$P(Rain = \textit{light} | Train = \textit{on time})$

What we might do for this is again look at all our samples. But now we know the train is on time, so the two samples where the train is delayed are excluded. Looking at the remaining data, we can estimate this probability as $2/6$.
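
Here is a minimal sketch of both estimates: sampling the whole network many times, then discarding the samples that contradict the evidence (often called rejection sampling). The Train table uses assumed placeholder values because the real table is only shown as an image; the helper names `draw` and `sample`, the seed, and the sample count are arbitrary choices.

```python
import random

RAIN = {"none": 0.7, "light": 0.2, "heavy": 0.1}
MAINTENANCE = {  # P(Maintenance | Rain)
    "none": {"yes": 0.4, "no": 0.6},
    "light": {"yes": 0.2, "no": 0.8},
    "heavy": {"yes": 0.1, "no": 0.9},
}
TRAIN = {  # P(Train | Rain, Maintenance): ASSUMED values
    ("none", "yes"): {"on time": 0.8, "delayed": 0.2},
    ("none", "no"): {"on time": 0.9, "delayed": 0.1},
    ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
    ("light", "no"): {"on time": 0.7, "delayed": 0.3},
    ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
    ("heavy", "no"): {"on time": 0.5, "delayed": 0.5},
}
APPOINTMENT = {  # P(Appointment | Train)
    "on time": {"attend": 0.9, "miss": 0.1},
    "delayed": {"attend": 0.6, "miss": 0.4},
}

def draw(distribution, rng):
    """Pick one value from a {value: probability} table."""
    return rng.choices(list(distribution), weights=list(distribution.values()))[0]

def sample(rng):
    """Sample every variable, parents before children."""
    rain = draw(RAIN, rng)
    maintenance = draw(MAINTENANCE[rain], rng)
    train = draw(TRAIN[(rain, maintenance)], rng)
    appointment = draw(APPOINTMENT[train], rng)
    return {"rain": rain, "maintenance": maintenance,
            "train": train, "appointment": appointment}

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample(rng) for _ in range(10_000)]

# Unconditional estimate: fraction of all samples where the train is on time.
on_time = [s for s in samples if s["train"] == "on time"]
p_on_time = len(on_time) / len(samples)

# Conditional estimate: reject samples where the train is delayed.
p_light_given_on_time = sum(s["rain"] == "light" for s in on_time) / len(on_time)

print(round(p_on_time, 2), round(p_light_given_on_time, 2))
```

With the assumed tables, the exact values are about $0.787$ and $0.173$, so the estimates should land close to those.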

> See bayesnet/sample.py

The issue with this approach shows up when the evidence is rare: most samples contradict the evidence and get rejected, so a lot of the sampling work is wasted.

### Likelihood Weighting

- Start by fixing the values for evidence variables.
- Sample the non-evidence variables using conditional probabilities in the Bayesian Network.
- Weight each sample by its **likelihood**: the probability of all of the evidence.

$P(Rain = \textit{light} | Train = \textit{on time})$

Start by fixing the evidence variable: every sample already has $\mathbf{T} = \textit{on time}$.

Then, we'll randomly sample Rain. We'll randomly sample Maintenance. When we get to Train, we'll just skip over it. We'll move on to Appointment.

Now, how do we weight it? We ask: what's the probability of the train being on time given the values sampled for the other variables? If our train table gives $0.6$ for those values, this sample has a weight of $0.6$.
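
The procedure can be sketched as follows for $P(Rain = \textit{light} | Train = \textit{on time})$. The Train table again uses assumed placeholder values, and the names `draw` and `weighted_sample` are illustrative. Appointment is left out because the query does not involve it.

```python
import random

RAIN = {"none": 0.7, "light": 0.2, "heavy": 0.1}
MAINTENANCE = {  # P(Maintenance | Rain)
    "none": {"yes": 0.4, "no": 0.6},
    "light": {"yes": 0.2, "no": 0.8},
    "heavy": {"yes": 0.1, "no": 0.9},
}
TRAIN = {  # P(Train | Rain, Maintenance): ASSUMED values
    ("none", "yes"): {"on time": 0.8, "delayed": 0.2},
    ("none", "no"): {"on time": 0.9, "delayed": 0.1},
    ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
    ("light", "no"): {"on time": 0.7, "delayed": 0.3},
    ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
    ("heavy", "no"): {"on time": 0.5, "delayed": 0.5},
}

def draw(distribution, rng):
    """Pick one value from a {value: probability} table."""
    return rng.choices(list(distribution), weights=list(distribution.values()))[0]

def weighted_sample(rng, train="on time"):
    """Sample the non-evidence variables and weight by the evidence likelihood."""
    rain = draw(RAIN, rng)
    maintenance = draw(MAINTENANCE[rain], rng)
    # Train is evidence: do not sample it, use its probability as the weight.
    weight = TRAIN[(rain, maintenance)][train]
    return rain, maintenance, weight

rng = random.Random(0)  # fixed seed so the run is repeatable
total_weight = 0.0
light_weight = 0.0
for _ in range(10_000):
    rain, _maintenance, weight = weighted_sample(rng)
    total_weight += weight
    if rain == "light":
        light_weight += weight

# Weighted fraction of samples in which Rain = light.
estimate = light_weight / total_weight
print(round(estimate, 2))
```

No sample is thrown away here: samples that make the evidence unlikely simply count for less.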

## Markov Models

Often, our variables change over time. Weather is a great example.

Now, we'll have another random variable for every single time step:

$X_t = $ the weather at time $t$.

But, there's an incredible amount of data that can go in to this over a long time.

So, we'll make some simplifying assumptions. If they're close to accurate, the model should still be useful.

**Markov Assumption**: The assumption that the current state depends on only a finite fixed number of previous states.

The current day's weather depends just on the last $n$ days' weather.

**Markov chain**: A sequence of random variables which follow the Markov assumption.

In [None]:
Image("img_4.png")

If it's sunny today, then it's more likely to be sunny tomorrow.

This matrix is called the **transition model**. We can sample from it with a procedure similar to the one we used for Bayesian networks.

We might form the following Markov chain:

In [None]:
Image("img_5.png")

We can answer many questions with this model.
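
A minimal sketch of sampling such a chain is shown below. The transition model and the starting distribution are only given as images in these notes, so the numbers here are assumed placeholders, and the names `START`, `TRANSITION`, and `sample_chain` are illustrative.

```python
import random

START = {"sun": 0.5, "rain": 0.5}  # ASSUMED starting distribution

TRANSITION = {  # P(X_{t+1} | X_t): ASSUMED values
    "sun": {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def sample_chain(length, rng):
    """Sample a sequence of states, each depending only on the previous one."""
    state = rng.choices(list(START), weights=list(START.values()))[0]
    chain = [state]
    for _ in range(length - 1):
        next_distribution = TRANSITION[state]
        state = rng.choices(
            list(next_distribution), weights=list(next_distribution.values())
        )[0]
        chain.append(state)
    return chain

chain = sample_chain(50, random.Random(0))
print(chain[:10])
```

Each new state is drawn from the row of the transition model selected by the current state, which is exactly the Markov assumption with one previous state.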

> See chain/*.py

Markov models depend on knowing each individual state. We often don't know the exact states, though.

Often we only have sensor readings or other indirect observations, so we can't be 100% sure what's actually going on.

### Sensor Models

Given these audio waves, can you tell what the words spoken actually are?

In [None]:
Image("img_6.png")

For us, the hidden state is the weather, and our observation is whether or not employees are bringing umbrellas. Using that information, we want to predict if it's sunny or rainy.

## Hidden Markov Model

**Hidden Markov Model**: A Markov model for a system with hidden states that generate some observed event.

In [None]:
Image("img_7.png")

This is often called the **sensor model**, or emission model.

**Sensor Markov assumption**: The assumption that the evidence variable depends only on the corresponding state.

In [None]:
Image("img_8.png")

Instead of just having 1 chain, we have 2. Each hidden state produces an emission: the observation it generated. ($E_i$ stands for emission.)

Common tasks we'll see:

In [None]:
Image("img_9.png")

We're usually interested in the most likely explanation.
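
One standard way to compute the most likely explanation is the Viterbi algorithm, sketched below for the umbrella example. The notes do not say which algorithm is used, and the start, transition, and emission numbers are assumed placeholders because the real tables are only shown as images. All names are illustrative.

```python
START = {"sun": 0.5, "rain": 0.5}  # ASSUMED starting distribution

TRANSITION = {  # P(X_{t+1} | X_t): ASSUMED values
    "sun": {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.3, "rain": 0.7},
}

EMISSION = {  # P(E_t | X_t): ASSUMED values
    "sun": {"umbrella": 0.2, "no umbrella": 0.8},
    "rain": {"umbrella": 0.9, "no umbrella": 0.1},
}

def most_likely_states(observations):
    """Viterbi: most likely hidden state sequence for the observations."""
    # best[s] = probability of the best path so far that ends in state s.
    best = {s: START[s] * EMISSION[s][observations[0]] for s in START}
    back_pointers = []  # one {state: best previous state} dict per later step
    for observation in observations[1:]:
        previous, best, pointers = best, {}, {}
        for s in START:
            # Best previous state to arrive from when moving into s.
            r = max(START, key=lambda q: previous[q] * TRANSITION[q][s])
            pointers[s] = r
            best[s] = previous[r] * TRANSITION[r][s] * EMISSION[s][observation]
        back_pointers.append(pointers)

    # Walk backwards from the most probable final state.
    state = max(best, key=best.get)
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

path = most_likely_states(["umbrella", "umbrella", "no umbrella"])
print(path)  # ['rain', 'rain', 'sun'] with the assumed tables
```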

> See hmm/*.py