$$
\def\nn{\nonumber}
\def\PD#1#2#3{\dfrac{\partial^{#1} #2}{\partial #3^{#1}}}
\def\eq#1{\begin{align}#1\end{align}}
\def\eqnum#1{\begin{align}#1\end{align}}
\def\dd{\text{d}}
\def\DE#1#2#3{\dfrac{\dd^{#1} #2}{\dd #3^{#1}}}
\def\bmaths#1{\boxed{{#1}}}
\def\color#1{}
\def\excolor{}
\def\large{}
\def\black{}
$$

# Probability

## Aims

* Introduce the fundamental concepts of probability theory
* Show how to make simple probability calculations
* Apply probability theory to engineering problems.


## Introduction

Probability theory is concerned with assigning and estimating the probabilities of future random events. While probability theory cannot predict the exact outcome of the next event, for example the outcome of a single coin flip, it can tell us something about the long-term behaviour of events, such as the 50% chance of heads (or tails) that emerges when we average the results of many coin flips. This characterisation of long-term behaviour is very useful in engineering for the planning and design of infrastructure systems, and for the design of experiments.

## Definition of probability

* An *experiment* is a situation where a random phenomenon is observed.
* The *sample space* $\Omega$ contains all possible outcomes of an experiment.
* An *event* is defined as a subset of the sample space.

A probability measure is a function *P*, defined on the events of the sample space $\Omega$, which assigns to each event a value between 0 and 1. In particular, the probability that some outcome in the sample space occurs equals one, i.e.:
$$\begin{equation}
P\left( \Omega\right) =1
\end{equation}$$

Examples of a sample space include:
* Toss of a fair coin: $\Omega=\left\lbrace Head, Tail\right\rbrace$ 
* Roll of a die: $\Omega=\left\lbrace 1, 2, 3, 4, 5, 6\right\rbrace $
* Annual rainfall in Oxford: $\Omega=\left\lbrace R\geq0mm\right\rbrace $
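
To make these definitions concrete, the sketch below represents the sample space of a die roll as a Python set and treats events as subsets of it. It assumes that every outcome is equally likely (true for a fair die, but not part of the general definition), so that the hypothetical helper `prob` can define the probability measure as the fraction of outcomes contained in an event.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
omega = frozenset({1, 2, 3, 4, 5, 6})

def prob(event, sample_space=omega):
    """Probability of an event (a subset of the sample space),
    assuming every outcome is equally likely."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

even = {2, 4, 6}
print(prob(even))    # 1/2
print(prob(omega))   # 1, i.e. P(Omega) = 1
print(prob(set()))   # 0, the impossible event
```

Exact fractions are used instead of floats so that results such as $P(\Omega)=1$ hold exactly.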


## Rules for calculating probabilities

Consider two events $A$ and $B$ that are both subsets of the same sample space $\Omega$. Using set theory, we can describe the general relationships that might exist between the two events:
* Union: observing A or B, or both $\left( A\cup B\right)$, where $\cup$ symbolises _union_
* Intersection: observing both A and B $\left( A \cap B\right)$, where $\cap$ symbolises _intersection_
* Complementary event: observing an event that _is not_ $A$ $\left( A^{C}\right)$
* Disjoint: A and B have no elements in common; for example, 'heads in first toss' and 'tails in first toss' are two disjoint events as they cannot both take place.

Each of these four definitions is illustrated using Venn diagrams in the following Figure:
![](Figures/venn-diagrams_crop.png)
```
Upper left: Union of A and B (all of the red area); Lower left: Intersection of A and B; Upper right: Complementary event of A; Lower right: A and B are disjoint events.
```

Denoting the probabilities of the events $A$ and $B$ by $P(A)>0$ and $P(B)>0$, the following rules for calculating probabilities apply:

### Additive rule

$$\begin{equation}
P\left( A \cup B\right) = P\left( A\right) + P\left( B\right) -P\left( A\cap B\right)\qquad(1)
\end{equation}$$

If $A$ and $B$ are disjoint events, i.e. the occurrence of $A$ precludes the occurrence of $B$, then $P\left( A\cap B\right)=0$ and the probability of $A$ or $B$ reduces to a simple sum:

$$\begin{equation}
P\left( A \cup B\right) = P\left( A\right) + P\left( B\right)
\end{equation}$$

### Complementary event

$$\begin{equation}
P\left( A^{C}\right)=1-P\left( A\right)  
\end{equation}$$
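
The additive rule, its disjoint special case and the complement rule can be checked by direct counting on a fair die. This is a minimal sketch that again assumes equally likely outcomes; the events `A`, `B` and `C` are arbitrary illustrative choices.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # one roll of a fair die

def prob(event):
    # Equally likely outcomes: P(E) = |E| / |Omega|
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}      # "even number"
B = {4, 5, 6}      # "four or more"
C = {1}            # "a one", disjoint from A

# Additive rule, Eq.(1): P(A u B) = P(A) + P(B) - P(A n B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
# Disjoint events: the intersection is empty, so the correction term vanishes
assert A & C == set()
assert prob(A | C) == prob(A) + prob(C)
# Complementary event: A^C contains the outcomes of Omega that are not in A
A_c = omega - A
assert prob(A_c) == 1 - prob(A)

print(prob(A | B), prob(A | C), prob(A_c))   # 2/3 2/3 1/2
```

Python's set operators map directly onto the notation: `|` is $\cup$, `&` is $\cap$ and `omega - A` is $A^{C}$.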

### Multiplicative rule

$$\begin{equation}
P\left( A\cap B\right) = P\left( A|B\right) P\left( B\right) 
\end{equation}$$

Here $P(A|B)$ is a conditional probability, to be understood as *'the probability of A given that B has occurred'*. If $A$ and $B$ are independent events, then the probability of $A$ given $B$ is just the probability of $A$, $P\left( A|B\right) =P\left( A\right)$, in which case the equation above reduces to

$$\begin{equation}
P\left( A\cap B\right) = P\left( A\right)  P\left( B\right)
\end{equation}$$

For example, the outcome of the second coin toss does not depend on the outcome of the first coin toss, and therefore the two events are independent. In contrast, the total volume of river runoff in one year depends on the total amount of rainfall that occurred in that same year, and therefore runoff and rainfall cannot be considered independent events. 
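
Rearranging the multiplicative rule gives $P(A|B)=P(A\cap B)/P(B)$ for $P(B)>0$, which the sketch below uses to test independence on two tosses of a fair coin. It assumes all four outcomes are equally likely; the event `D` ("two heads") is a hypothetical example of an event that does depend on the first toss.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT
omega = set(product("HT", repeat=2))

def prob(event):
    # Equally likely outcomes
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    # P(A|B) = P(A n B) / P(B), rearranged from the multiplicative rule
    return prob(a & b) / prob(b)

A = {w for w in omega if w[0] == "H"}        # heads on the first toss
B = {w for w in omega if w[1] == "H"}        # heads on the second toss
D = {w for w in omega if w.count("H") == 2}  # two heads in total

# Independent events: P(A|B) = P(A), so P(A n B) = P(A) P(B)
print(cond_prob(A, B), prob(A & B))   # 1/2 1/4
# Dependent events: knowing A occurred changes the probability of D
print(prob(D), cond_prob(D, A))       # 1/4 1/2
```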

## Random variables

A random variable is a function from the sample space $\Omega$ to the real numbers. Since the outcomes in $\Omega$ are random, the value taken by the random variable is also random, and it is described by a probability distribution. Random variables are denoted by upper case italic letters such as $X$ and $Y$.
  
We distinguish between two types of random variables: discrete and continuous random variables.

* Discrete random variable: can take only a countable set of values, such as $0, 1, 2, 3, \ldots$.
    - $X$ = number of tails in 10 tosses of a fair coin. Possible values $X=0, 1, 2, 3, \ldots , 10$.
    - $X$ = number of planes arriving in Heathrow Airport per hour. Possible values $X=0, 1, 2, \ldots$ 


* Continuous random variable: can take on a continuum of values such as any value in the interval $-\infty$ to $\infty$, or from $0$ to $\infty$.
    - $X$ = the annual rainfall in Oxford. Possible values include all values of $X\geq 0$mm.
    - $X$ = waiting time until the next number 18 bus arrives at Bath station. Possible values include all values of $X\geq 0$ seconds.
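
The idea that a random variable is a function from $\Omega$ to the real numbers can be made explicit for the first discrete example. The sketch below enumerates all $2^{10}$ equally likely sequences of 10 fair coin tosses, applies a function `X` (a name chosen for illustration) that counts the tails, and tabulates the probability of each possible value.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n_tosses = 10
# Sample space: all 2**10 equally likely sequences of heads (H) and tails (T)
omega = list(product("HT", repeat=n_tosses))

def X(outcome):
    # The random variable maps each outcome to a real number: the number of tails
    return outcome.count("T")

counts = Counter(X(w) for w in omega)
pmf = {x: Fraction(c, len(omega)) for x, c in sorted(counts.items())}

print(sorted(pmf))          # [0, 1, 2, ..., 10]
print(pmf[5])               # 63/256, the most likely value
print(sum(pmf.values()))    # 1
```

Full enumeration is only practical for small sample spaces; it is used here because it mirrors the definition directly.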



## Frequency distributions

A random variable is characterised by a particular frequency distribution, $f(x)$, which is a mathematical model representing the long-term probabilistic behaviour of the phenomenon described by the random variable. There are many different probability distributions, and the choice depends on the type of problem and the data being analysed. For example, different types of distributions are normally used for problems involving discrete and continuous random variables. Common to all probability distributions is a function that assigns probabilities to the possible outcomes of the random variable. For a discrete random variable this is the probability mass function, which gives $P(X=x)$ directly. For a continuous random variable it is the probability density function (pdf), $f(x)$. Because the probability that a continuous random variable takes any single exact value is zero, the pdf instead gives the probability that $X$ falls in a small interval of width $\dd x$ just above $x$:

$$\begin{equation}
P\left( x< X\leq x+\dd x\right) \approx f\left( x\right)\dd x 
\end{equation}$$

In accordance with the definition of the probability measure, $P(\Omega)=1$, integrating a pdf over the entire sample space should equal one (remember the probability of all possible events equals 1), i.e.

$$\begin{equation}
\int_{-\infty}^{\infty}f\left( x\right)\dd x =1
\end{equation}$$

From the pdf, the cumulative distribution function (cdf) can be defined as
$$\begin{equation}
F(x)=P\left( X\leq x\right) = \int_{-\infty}^{x}f\left( t\right) \dd t\qquad(2)
\end{equation} $$

where $t$ is a dummy variable of integration. An example of the pdf and cdf for a continuous random variable is shown in the example box below.
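
Eq.(2) can also be evaluated numerically when no closed form is at hand. The sketch below approximates the integral with the composite trapezoidal rule (one choice among many quadrature rules); the function name `cdf_numeric` and the finite `lower` limit standing in for $-\infty$ are assumptions for illustration. As a test case it uses the exponential pdf with $\lambda=1$ from the example box below.

```python
import math

def cdf_numeric(pdf, x, lower, n=10_000):
    """Approximate F(x), the integral of pdf(t) dt from `lower` to x (Eq.(2)),
    with the composite trapezoidal rule. `lower` stands in for -infinity and
    must be at or below the smallest value the random variable can take."""
    h = (x - lower) / n
    total = 0.5 * (pdf(lower) + pdf(x))
    for i in range(1, n):
        total += pdf(lower + i * h)
    return total * h

def exp_pdf(t, lam=1.0):
    # Exponential pdf with lambda = 1; zero for negative t
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

print(round(cdf_numeric(exp_pdf, 1.5, lower=0.0), 4))   # 0.7769
print(round(cdf_numeric(exp_pdf, 50.0, lower=0.0), 4))  # 1.0, the total area
```

The second call checks numerically that the pdf integrates to one over (practically) the whole sample space.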



#### The exponential distribution, pdf and cdf


A particularly simple continuous distribution is the one-parameter exponential distribution which has a pdf defined as

$$f(x) = \left\{
\begin{array}{lr}
\lambda\exp \left( -\lambda x\right)  & : x \geq 0\\
0 & : x < 0
\end{array}
\right.
$$

where $\lambda>0$ is a model parameter. The exponential distribution is often used for describing the random nature of waiting times, such as the waiting time between bus arrivals, or the waiting time between major flood or earthquake events. The cdf for the exponential distribution is obtained by integrating the pdf as per Eq.(2), i.e.

$$F(x) = \left\{
\begin{array}{lr}
1-\exp \left( -\lambda x\right)  & : x \geq 0\\
0 & : x < 0
\end{array}
\right.
$$
                
The two graphs below show the relationship between the pdf and the cdf for an exponential distribution with $\lambda=1$. 

![](Figures/Exponential_pdf_cdf.png)
                
For $x^{*}=1.5$, the probability that $X \leq x^{*}$ is represented by the grey area in the top figure (the area under the pdf) and is calculated by evaluating the cdf at $x^{*}$, which for $\lambda=1$ gives

$$P( X \leq x^{*})=F(x^{*})=1-\exp(-x^{*}) = 1-\exp(-1.5) = 0.78$$
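
The long-term interpretation of this number can be checked by simulation: if many waiting times are drawn from the exponential distribution with $\lambda=1$, roughly 78% of them should be at most $1.5$. The sketch below assumes Python's `random.Random.expovariate`, whose argument is the rate $\lambda$; the seed and sample size are arbitrary choices.

```python
import math
import random

def exp_cdf(x, lam=1.0):
    # Closed-form cdf of the exponential distribution
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

x_star = 1.5
print(round(exp_cdf(x_star), 2))   # 0.78

# Long-run check: simulate many waiting times and count the fraction <= x*
rng = random.Random(42)            # fixed seed for reproducibility
n = 100_000
samples = [rng.expovariate(1.0) for _ in range(n)]
frac = sum(s <= x_star for s in samples) / n
print(frac)                        # close to F(x*), i.e. about 0.78
```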



Thus, each value of the cdf in the lower graph equals the corresponding area (grey area) under the pdf in the top graph.


## The Normal distribution

The normal distribution is one of the most commonly used frequency distributions. It was published by the German mathematician Johann Carl Friedrich Gauss (1777-1855) in 1809, and is therefore sometimes also known as the Gaussian distribution. The pdf of the distribution is given by

$$\begin{equation}
f\left( x\right) =\frac{1}{\sigma \sqrt{2\pi}}\exp\left(-\dfrac{(x-\mu)^{2}}{2\sigma^{2}}\right)
\end{equation}$$

where $\mu$ and $\sigma$ are model parameters representing the mean and standard deviation, respectively. If $X$ is a normally distributed random variable with mean $\mu$ and standard deviation $\sigma$, this is often written as $X\sim N(\mu,\sigma^{2})$. The pdf for the normal distribution is shown in the Figure below for three different combinations of the model parameters $\mu$ and $\sigma$.


![](Figures/Normal_notes_pdf.png)
```
Probability density function (pdf) for three normal distributions: $N(0,1)$ (blue); $N(1,1)$ (red); $N(0,2)$ (green);
```


No closed-form expression for the cdf of the normal distribution exists in terms of elementary functions, and as a result calculations involving the normal distribution rely on look-up tables or on the numerical methods built into most standard engineering tools, such as EXCEL, Matlab and R.
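
One such numerical route expresses the normal cdf through the error function, $F(x)=\tfrac{1}{2}\left[1+\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$, which Python's standard library evaluates as `math.erf`. The helper name `normal_cdf` below is chosen for illustration; the result is cross-checked against `statistics.NormalDist` (available from Python 3.8).

```python
import math
from statistics import NormalDist

def normal_cdf(x, mu=0.0, sigma=1.0):
    # The normal cdf has no elementary closed form, but it can be written
    # in terms of the error function erf, which math.erf evaluates numerically
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(round(normal_cdf(1.26), 4))    # 0.8962
print(round(normal_cdf(-2.27), 4))   # 0.0116
# Cross-check against the standard library implementation
print(math.isclose(normal_cdf(1.26), NormalDist().cdf(1.26), rel_tol=1e-9))  # True
```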

A useful feature of the normal distribution is that a linear transformation of a normally distributed random variable is itself normally distributed (more generally, so is any linear combination of independent normally distributed random variables). If $X\sim N(\mu,\sigma^{2})$, then the variable $Y=aX+b$ is a normally distributed random variable, $Y\sim N(a\mu+b,a^{2}\sigma^{2})$, where $a\neq 0$ and $b$ are constants.

This ability to scale normally distributed random variables is very useful.  Consider a random variable $X\sim N(\mu,\sigma^{2})$. If we are interested in finding the probability $P(X\leq x_{0})$, then we can make the following transformation

$$\begin{equation}
P(X\leq x_{0}) = P\left( \frac{X-\mu}{\sigma}\leq \frac{x_{0}-\mu}{\sigma}\right) =P\left( Z\leq \frac{x_{0}-\mu}{\sigma}\right) 
\end{equation}$$

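Applying this result with $a=1/\sigma$ and $b=-\mu/\sigma$ shows that the transformed random variable $Z=\dfrac{X-\mu}{\sigma}$ is normally distributed with mean $\mu/\sigma-\mu/\sigma=0$ and variance $\sigma^{2}/\sigma^{2}=1$, i.e. $Z\sim N(0,1)$, which is known as the standard normal distribution.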
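
Both results can be checked with Python's `statistics.NormalDist`. In the sketch below the values of `mu`, `sigma`, `x0`, `a` and `b` are arbitrary illustrative choices; the simulation only checks the mean, standard deviation and one cdf value of $Y=aX+b$, which is consistent with, but does not prove, normality.

```python
import random
from statistics import NormalDist, fmean, stdev

mu, sigma, x0 = 5.0, 2.0, 8.0   # illustrative values only

# P(X <= x0) evaluated directly, and via the standardised variable Z ~ N(0, 1)
p_direct = NormalDist(mu, sigma).cdf(x0)
p_standard = NormalDist(0.0, 1.0).cdf((x0 - mu) / sigma)
print(round(p_direct, 4), round(p_standard, 4))   # 0.9332 0.9332

# Linear transformation Y = aX + b should follow N(a*mu + b, a^2 sigma^2)
a, b = 3.0, 1.0
rng = random.Random(1)
ys = [a * rng.gauss(mu, sigma) + b for _ in range(100_000)]
target = NormalDist(a * mu + b, abs(a) * sigma)   # N(16, 6^2)
print(fmean(ys), stdev(ys))                       # close to 16 and 6

# Fraction of simulated Y at or below 22, compared with the target cdf
frac = sum(y <= 22.0 for y in ys) / len(ys)
print(frac, target.cdf(22.0))                     # both close to 0.84
```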

### Estimating model parameters

The two parameters $\mu$ and $\sigma$ in the normal distribution represent the mean and standard deviation, respectively. When a number of observations of the random variable $X$ are available, estimates of $\mu$ and $\sigma$ can be obtained through the *method of moments*, which sets each model parameter equal to the corresponding sample statistic computed from the data. For a series of $n$ data points $x_{1}, x_{2}, \ldots, x_{n}$, the sample mean, $\bar{x}$, and sample standard deviation, $s$, are defined as

$$\begin{equation}
\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_{i}\qquad(3)
\end{equation}$$

$$\begin{equation}
s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left( x_{i}-\bar{x}\right) ^{2}}\qquad(4)
\end{equation}$$

The two parameters of the normal distribution can now be estimated by equating the sample mean $\bar{x}$ to the mean of the normal distribution, and the sample standard deviation to the standard deviation of the normal distribution, i.e. $\mu=\bar{x}$ and $\sigma=s$. An example of using the normal distribution to analyse data of relevance to civil engineering is shown in the example box below.
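
Eqs.(3) and (4) translate directly into code. The sketch below implements them by hand on a short list of hypothetical observations (not real data) and confirms that Python's `statistics.stdev`, which also divides by $n-1$, gives the same standard deviation.

```python
import math
import statistics

def sample_mean(xs):
    # Eq.(3)
    return sum(xs) / len(xs)

def sample_std(xs):
    # Eq.(4): note the n - 1 in the denominator
    n = len(xs)
    xbar = sample_mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

# Hypothetical observations, for illustration only
data = [1.0, 2.0, 3.0, 4.0, 5.0]

# Method of moments: mu = x_bar and sigma = s
mu_hat, sigma_hat = sample_mean(data), sample_std(data)
print(mu_hat, round(sigma_hat, 4))   # 3.0 1.5811

# statistics.stdev also uses n - 1, so the two agree
assert math.isclose(sigma_hat, statistics.stdev(data))
```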



#### The normal distribution and annual rainfall in Oxford
                
Observations of annual rainfall totals (in mm) from a rain gauge in Oxford are available for the period 1853 to 2013, i.e. a total of 161 years. The sample mean and standard deviation, estimated using Eqs.(3) and (4), are $\bar{x}=657.0$mm and $s=113.4$mm. The Figure below shows a histogram of the annual rainfall totals. The y-axis has been scaled by dividing the number of observations in each interval by the total number of observations and by the interval length (50mm in this case), so that the total area of the histogram equals 1 and it can be compared directly with a pdf. The pdf of the normal distribution (blue line) with the $\mu$ and $\sigma$ parameter values estimated above is also shown.


![](Figures/Oxford_hg_normal2.png)
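
The histogram scaling described above can be sketched as follows. The function name `density_histogram` and the six rainfall values are hypothetical, chosen only to show that dividing each count by $n$ times the interval length makes the bar areas sum to one.

```python
import math

def density_histogram(xs, start, width, n_bins):
    """Bar heights = count / (n * width), so the total bar area is 1
    when every observation falls inside the bins."""
    counts = [0] * n_bins
    for x in xs:
        k = math.floor((x - start) / width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return [c / (len(xs) * width) for c in counts]

# Hypothetical annual totals in mm, NOT the Oxford record
xs = [612.0, 655.5, 701.2, 540.3, 689.9, 598.7]
width = 50.0
heights = density_histogram(xs, start=500.0, width=width, n_bins=5)

area = sum(h * width for h in heights)
print(round(area, 10))   # 1.0
```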

The figure shows good agreement between the shape of the histogram (red bars) and the fitted normal distribution (blue line). It is therefore reasonable to assume that the annual rainfall totals in Oxford can be described by a normal distribution. More formal tests can be made to further strengthen this conclusion, but these are beyond the scope of this note. Having established a probabilistic model, we can now answer questions about the long-term behaviour of annual rainfall totals in Oxford. For example, what is the probability of observing less than 400mm of rain in any one year?

$$\begin{align*}
P\left( X\leq 400mm\right) &= P\left( \frac{X-\mu}{\sigma} \leq \frac{400mm-\mu}{\sigma}\right) \\
&=P\left( Z\leq \frac{400mm-657.0mm}{113.4mm}\right) \\
&=P\left( Z\leq -2.27\right) = 0.0116
\end{align*}$$
               
where $Z$ follows the standard normal distribution, $Z\sim N(0,1)$. Tabulated values of the standard normal distribution can be found in most statistical textbooks, or can be obtained using the *NORMDIST* function in EXCEL as shown in the figure below.

![](Figures/EXCEL_Normdist.png)
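
The same calculation can be sketched with Python's `statistics.NormalDist`, using the estimated parameters $\mu=657.0$mm and $\sigma=113.4$mm; in a spreadsheet, `NORMDIST(400, 657, 113.4, TRUE)` should return the unrounded value. The variable name `rain` is chosen for illustration.

```python
from statistics import NormalDist

rain = NormalDist(mu=657.0, sigma=113.4)   # fitted model for annual rainfall (mm)

z = (400.0 - rain.mean) / rain.stdev
print(round(z, 2))                               # -2.27
print(round(NormalDist().cdf(round(z, 2)), 4))   # 0.0116, with z rounded as in the text
print(round(rain.cdf(400.0), 4))                 # 0.0117, with the unrounded z
```

The small difference between the last two values comes only from rounding $z$ to two decimals before looking it up in a table.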

A slightly more complicated question is: what is the probability of observing annual rainfall between 400mm and 800mm in any one year?

$$\begin{align*}
P\left(400mm \leq X \leq 800mm\right) &= P\left( \frac{400mm-\mu}{\sigma} \leq \frac{X-\mu}{\sigma} \leq \frac{800mm-\mu}{\sigma}\right) \\
&=P\left( \frac{400mm-657.0mm}{113.4mm} \leq Z \leq \frac{800mm-657.0mm}{113.4mm}\right) \\
&=P\left( -2.27 \leq Z \leq 1.26\right) \\
&=P\left( Z\leq 1.26\right)-P\left( Z\leq -2.27\right) \\
&=0.8962 - 0.0116 = 0.8846
\end{align*}$$
                
where the two probabilities related to the standard normal distribution can be obtained from either statistical tables or from numerical software such as EXCEL in the same way as shown above.
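
Because $P(a\leq X\leq b)=F(b)-F(a)$ for a continuous random variable, the interval probability reduces to two cdf evaluations. This sketch again assumes `statistics.NormalDist` with the fitted parameters; here the unrounded calculation agrees with the tabulated result to four decimals.

```python
from statistics import NormalDist

rain = NormalDist(mu=657.0, sigma=113.4)   # fitted model for annual rainfall (mm)

# P(400 <= X <= 800) = F(800) - F(400)
p = rain.cdf(800.0) - rain.cdf(400.0)
print(round(p, 4))   # 0.8846
```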

