# Examples: M248 Book A

-----

## Preamble

A collection of examples based on the topics covered by **Book A** of **M248: Analysing data**.

### Contents

#### 1. Probability distributions

1. [Probability density functions](#1.1-Probability-density-functions)
2. Probability mass functions

#### 2. Standard discrete distributions

1. [Binomial distribution](#2.1-Binomial-distribution)
2. [Discrete uniform distribution](#2.2-Discrete-uniform-distribution)
3. [Geometric distribution](#2.3-Geometric-distribution)
4. [Poisson distribution](#2.4-Poisson-distribution)

#### 3. Standard continuous distributions

1. Continuous uniform distribution
2. Exponential distribution

#### 4. Bernoulli and Poisson processes

1. Bernoulli process
2. [Poisson process](#4.2-Poisson-process)

#### 5. Population quantiles

1. Population quantile of a continuous distribution
2. Population quantile of a discrete distribution

-----

In [1]:
from scipy import stats
from scipy.integrate import quad

## 1 Probability distributions

-----

### 1.1 Probability density functions

-----

#### Example 1.1.1

Suppose that a random variable $X$ has range $(0,1)$ and that its p.d.f. is given by

$$f(x) = \frac{3}{4} (x^{2} + 1), \hspace{3mm} x \in (0,1).$$

What is the value of $P(1/4 \leq X \leq 1/2)$?

$$
P(1/4 \leq X \leq 1/2)
  = \int_{1/4}^{1/2} \frac{3}{4} (x^{2} + 1) \> dx
  = \cdots
$$

In [2]:
# define a new function
def pdf(x: float):
    return 0.75 * ((x ** 2) + 1)


# integrate the function using quad
quad(func=pdf, a=0.25, b=0.5)[0]

0.21484375000000003

#### Example 1.1.2

Suppose that a random variable $X$ has range $(0,4)$ and that its p.d.f. is given by

$$f(x) = \frac{3}{40} (\sqrt{x} + 2), \hspace{3mm} x \in (0,4).$$

What is the value of $P(X \geq 1)$?

$$
P(X \geq 1) = P(1 \leq X \leq 4)
  = \int_{1}^{4} \frac{3}{40} (\sqrt{x} + 2) \> dx
  = \cdots
$$

In [3]:
# define a new function
def pdf(x: float):
    return (3/40) * ((x ** (0.5)) + 2)


# integrate the function using quad
quad(func=pdf, a=1, b=4)[0]

0.8

#### Example 1.1.3

Suppose that a continuous random variable $X$ has range $(-1,1)$. The following is a function $f(x)$ of $x$:

$$f(x) = 1 - x^{2}.$$

Is $f(x)$ a valid p.d.f. for $X$?

Check the two properties of a valid p.d.f.

(1) $\int_{-1}^{1} f(x) \> dx = 1$

(2) $f(x) \geq 0$ for all $x \in (-1,1)$

In [4]:
# define a new function
def pdf(x: float):
    return 1 - (x ** 2)


# integrate the function using quad
quad(func=pdf, a=-1, b=1)[0]  # select 0 index

1.3333333333333335

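The function $f(x) = 1 - x^{2}$ is positive on $(-1,1)$, so property (2) holds, but the integral is $4/3 \neq 1$, so property (1) fails.
Therefore $f(x)$ is not a valid p.d.f.
But does a normalising constant, $k$, exist, such that $k \> f(x)$ would be a valid p.d.f.?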

Solve the following for $k$

$$
\begin{aligned}
    1 &= k \int_{-1}^{1} (1 - x^{2}) \> dx \\
      &= k \bigg( \frac{4}{3} \bigg) \\
    k &= \frac{3}{4}.
\end{aligned}
$$

Hence, $f(x)$ is not a valid p.d.f., but $\frac{3}{4} f(x)$ is valid.
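The two checks and the normalising constant can also be confirmed numerically. This is a minimal sketch: the names `f`, `area`, `k`, `grid` and `scaled_area` are illustrative, and the non-negativity test is only a spot check on a grid of points rather than a proof.

```python
from scipy.integrate import quad


def f(x: float) -> float:
    """Candidate function from Example 1.1.3."""
    return 1 - x ** 2


# property (1): total area under f over the range (-1, 1)
area, _ = quad(f, -1, 1)

# normalising constant that rescales the area to 1
k = 1 / area

# property (2): spot check on 101 evenly spaced points in [-1, 1]
grid = [-1 + i / 50 for i in range(101)]
is_nonnegative = all(f(x) >= 0 for x in grid)

# area under k * f(x), which should be 1 if k is the right constant
scaled_area, _ = quad(lambda x: k * f(x), -1, 1)

print(round(area, 4), round(k, 4), is_nonnegative, round(scaled_area, 4))
```

The printed area and constant should agree with $4/3$ and $3/4$ from the algebra above, up to numerical integration error.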

#### Example 1.1.4

Suppose that a continuous random variable $X$ has range $(0,1)$ and that its p.d.f. is given by

$$f(x) = \frac{3}{4} (x^{2} + 1), \hspace{3mm} x \in (0,1).$$

What is the c.d.f. associated with $X$?

$$
\begin{aligned}
F(x) = \int_{a}^{x} f(y) \> dy = \int_{0}^{x} \frac{3}{4} (y^{2} + 1) \> dy &= \frac{3}{4} \int_{0}^{x} (y^{2} + 1) \> dy \\
  &= \frac{3}{4} \bigg[ \frac{1}{3} y^{3} + y \bigg]_{0}^{x} \\
  &= \frac{3}{4} \bigg\{ \frac{1}{3} x^{3} + x - \bigg( 0 + 0 \bigg) \bigg\} \\
  &= \frac{1}{4} ( x^{3} + 3x ).
\end{aligned}
$$
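One way to sanity-check a derived c.d.f. is to compare it with numerical integration of the p.d.f. at a few points. The sketch below assumes the closed form $F(x) = \frac{1}{4}(x^{3} + 3x)$ on $(0,1)$; the function names `pdf` and `cdf` and the chosen test points are illustrative.

```python
from scipy.integrate import quad


def pdf(y: float) -> float:
    """p.d.f. from Example 1.1.4."""
    return 0.75 * (y ** 2 + 1)


def cdf(x: float) -> float:
    """Closed-form c.d.f., valid for 0 < x < 1."""
    return 0.25 * (x ** 3 + 3 * x)


# compare the closed form with the integral of the p.d.f. from 0 to x
checks = []
for x in (0.25, 0.5, 0.75, 1.0):
    numeric, _ = quad(pdf, 0, x)
    checks.append((x, cdf(x), numeric))

# the c.d.f. also gives interval probabilities, e.g. P(1/4 <= X <= 1/2)
prob = cdf(0.5) - cdf(0.25)

for x, closed, numeric in checks:
    print(x, round(closed, 6), round(numeric, 6))
print(round(prob, 6))
```

The last value should match the result of Example 1.1.1, since both examples use the same p.d.f.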

### 1.2 Probability mass functions

-----

## 2 Standard discrete distributions

-----

### 2.1 Binomial distribution

-----

#### Example 2.1.1

Suppose that a discrete random variable $X$ is distributed $X \sim B(15, 0.25)$.

Calculate

**(a)** $P(X = 5)$

**(b)** $P(X \leq 7)$

**(c)** $P(2 \leq X < 9)$

**(d)** $E(X)$

**(e)** $S(X)$

In [30]:
# declare the distribution
b = stats.binom(n=15, p=0.25)

**(a)** The p.m.f. for a **binomial distribution** is

$$
p(x) = \binom{n}{x} p^{x} (1-p)^{n-x}.
$$

So for $P(X=5) = p(5) = \cdots$

In [31]:
b.pmf(k=5)

0.16514598112553383

**(b)** The probability $P(X \leq x) = F(x)$ for a **binomial distribution** is 

$$
F(x) = \sum_{k=0}^{x} \binom{n}{k} p^{k} (1-p)^{n-k}.
$$

So for $P(X \leq 7) = F(7) = \cdots$

In [32]:
b.cdf(x=7)

0.9827001616358757

**(c)** The probability $P(2 \leq X < 9) = P(2 \leq X \leq 8) = F(8) - F(1) = \cdots$

In [33]:
round(b.cdf(x=8) - b.cdf(x=1), 3)

0.916
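As a cross-check on part (c), the event $2 \leq X < 9$ covers the integer values $2, 3, \ldots, 8$, so the probability can also be found by summing the p.m.f. directly, which is the same as $F(8) - F(1)$. The sketch below is illustrative only; `binom_pmf` is a hypothetical helper written from the p.m.f. formula, not part of the original notebook.

```python
from math import comb

from scipy import stats

n, p = 15, 0.25
b = stats.binom(n=n, p=p)


def binom_pmf(k: int) -> float:
    """Binomial p.m.f. written out from the formula."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# sum the p.m.f. over the values 2, 3, ..., 8
direct = sum(binom_pmf(k) for k in range(2, 9))

# the same probability using the c.d.f.
via_cdf = b.cdf(8) - b.cdf(1)

print(round(direct, 3), round(via_cdf, 3))
```

Listing the values covered by the event before reaching for the c.d.f. is a simple guard against off-by-one errors with strict and non-strict inequalities.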

**(d)** The expected value of a **binomial distribution**, $E(X)$, is

$$
E(X) = np = \cdots
$$

In [34]:
b.mean()

3.75

**(e)** The standard deviation of a binomial distribution, $S(X)$, is

$$
S(X) = \sqrt{V(X)} = \sqrt{np(1-p)} = \cdots
$$

In [35]:
b.std()

1.6770509831248424

### 2.2 Discrete uniform distribution

-----

#### Example 2.2.1

Suppose that $X$ has a discrete uniform distribution, where the range of $X$ is $101, 102, \ldots, 500$.

Calculate

**(a)** $P(X = 109)$

**(b)** $P(X > 246)$

**(c)** $P(361 < X < 420)$

**(d)** $E(X)$

**(e)** $S(X)$

In [60]:
# declare the distribution
u = stats.randint(low=101, high=501)

**(a)** The p.m.f. for a **discrete uniform distribution** with range $m, m+1, \ldots, n$ (here $m = 101$ and $n = 500$) is

$$
p(x) = \frac{1}{n - m + 1}.
$$

So for $P(X=109) = p(109) = \cdots$

In [61]:
u.pmf(k=109)

0.0025

**(b)** The probability $P(X \leq x) = F(x)$ for a **discrete uniform distribution** is 

$$
F(x) = \frac{x-m+1}{n-m+1}.
$$

So for $P(X > 246) = 1 - F(246) = \cdots$

In [62]:
round(1 - u.cdf(x=246), 3)

0.635

**(c)** The probability $P(361 < X < 420) = F(419) - F(361) = \cdots$

In [64]:
round(u.cdf(x=419) - u.cdf(x=361), 3)

0.145

**(d)** The expected value for a **discrete uniform distribution**, $E(X)$, is

$$
E(X) = \frac{n+m}{2} = \cdots
$$

In [65]:
u.mean()

300.5

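**(e)** The variance for a **discrete uniform distribution**, $V(X)$, is

$$
V(X) = \frac{1}{12} \> (n-m)(n-m+2) = \frac{1}{12} \> (399)(401) = 13333.25,
$$

so the standard deviation is $S(X) = \sqrt{V(X)} = \cdots$

In [66]:
round(u.std(), 2)

115.47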
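Because the range is finite, the results for this example can also be checked by enumerating the values $101, 102, \ldots, 500$ directly. This is a rough sketch using plain Python, with illustrative variable names; it does not rely on `scipy`.

```python
m, n = 101, 500
values = range(m, n + 1)
count = len(values)  # number of equally likely values

# mean and variance by direct enumeration
mean = sum(values) / count
var = sum((v - mean) ** 2 for v in values) / count

# the same quantities from the closed-form expressions
mean_formula = (n + m) / 2
var_formula = (n - m) * (n - m + 2) / 12

# P(361 < X < 420): count the values strictly between 361 and 420
prob_c = sum(1 for v in values if 361 < v < 420) / count

print(mean, var, prob_c)
```

Counting the values that satisfy an inequality is a useful way to confirm which arguments to pass to the c.d.f.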

### 2.3 Geometric distribution

-----

#### Example 2.3.1

Suppose that a discrete random variable $X$ is distributed $X \sim G(0.36)$.

Calculate

**(a)** $P(X = 3)$

**(b)** $P(X \leq 4)$

**(c)** $P(1 < X < 5)$

**(d)** $E(X)$

**(e)** $V(X)$

In [36]:
# declare the distribution
g = stats.geom(0.36)

**(a)** The p.m.f. for a **geometric distribution** is

$$
p(x) = (1 - p)^{x-1} \> p, \hspace{3mm} x = 1, 2, \ldots
$$

So for $P(X=3) = p(3) = \cdots$

In [37]:
g.pmf(k=3)

0.147456

**(b)** The probability $P(X \leq x) = F(x)$ for a **geometric distribution** is 

$$
F(x) = 1 - (1-p)^{x}.
$$

So for $P(X \leq 4) = F(4) = \cdots$

In [38]:
g.cdf(x=4)

0.8322278399999999

**(c)** The probability $P(1 < X < 5) = P(2 \leq X \leq 4) = F(4) - F(1) = \cdots$

In [39]:
round(g.cdf(x=4) - g.cdf(x=1), 3)

0.472

**(d)** The expected value for a **geometric distribution**, $E(X)$, is

$$
E(X) = \frac{1}{p} = \cdots
$$

In [40]:
g.mean()

2.7777777777777777

**(e)** The variance for a **geometric distribution**, $V(X)$, is

$$
V(X) = \frac{1-p}{p^{2}} = \cdots
$$

In [41]:
g.var()

4.938271604938272
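The geometric formulas can be written out by hand and compared with `stats.geom`, which also takes the range of $X$ to start at $1$. The helper names `geom_pmf` and `geom_cdf` below are hypothetical and exist only for this comparison.

```python
from scipy import stats

p = 0.36
g = stats.geom(p)


def geom_pmf(x: int) -> float:
    """P(X = x) for a geometric distribution on 1, 2, ..."""
    return (1 - p) ** (x - 1) * p


def geom_cdf(x: int) -> float:
    """P(X <= x) for a geometric distribution on 1, 2, ..."""
    return 1 - (1 - p) ** x


# P(1 < X < 5) covers the values 2, 3 and 4
interval = sum(geom_pmf(x) for x in range(2, 5))

# mean and variance from the closed forms
mean = 1 / p
var = (1 - p) / p ** 2

print(round(geom_pmf(3), 6), round(geom_cdf(4), 6), round(interval, 6))
```

Summing the p.m.f. over $2, 3, 4$ gives the same probability as $F(4) - F(1)$.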

### 2.4 Poisson distribution

-----

#### Example 2.4.1

Suppose that a discrete random variable $X$ is distributed $X \sim \text{Poisson}(3.3)$.

Calculate

**(a)** $P(X = 5)$

**(b)** $P(X \leq 2)$

**(c)** $P(X > 4)$

Note, the mean, variance, and standard deviation of a Poisson distribution with parameter $\lambda$ are

$$E(X) = \lambda; \hspace{2mm} V(X) = \lambda; \hspace{2mm} S(X) = \sqrt{\lambda}.$$

In [67]:
pois = stats.poisson(mu=3.3)

**(a)** The p.m.f. of a **Poisson distribution** is

$$
p(x) = e^{-\lambda} \bigg( \frac{\lambda^{x}}{x!} \bigg)
$$

So for $P(X=5) = p(5) = \cdots$

In [68]:
pois.pmf(k=5)

0.12028643761102643

**(b)** The probability $P(X \leq x) = F(x)$ of a **Poisson distribution** is 

$$
F(x) = e^{-\lambda} \sum_{k=0}^{x} \frac{\lambda^{k}}{k!}.
$$

So for $P(X \leq 2) = F(2) = \cdots$

In [69]:
pois.cdf(x=2)

0.3594264663250839

**(c)** The probability $P(X > 4) = 1 - F(4) = \cdots$

In [70]:
1 - pois.cdf(x=4)

0.23740962432666435
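Upper-tail probabilities such as $P(X > 4)$ can also be read from the survival function, `sf`, which `scipy` defines as $1 - F(x)$. The sketch below also evaluates the mean, variance and standard deviation quoted in the note at the start of this example; the variable names are illustrative.

```python
from math import sqrt

from scipy import stats

lam = 3.3
pois = stats.poisson(mu=lam)

# P(X > 4) via the survival function, sf(x) = 1 - cdf(x)
upper_tail = pois.sf(4)

# mean, variance and standard deviation of Poisson(3.3)
mean, var, std = pois.mean(), pois.var(), pois.std()

print(round(upper_tail, 3), round(mean, 3), round(var, 3), round(std, 3))
```

Using `sf` avoids writing the subtraction by hand and can be slightly more accurate when the tail probability is very small.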

## 3 Standard continuous distributions

-----

### 3.1 Continuous uniform distribution

-----

### 3.2 Exponential distribution

-----

## 4 Bernoulli and Poisson processes

-----

### 4.1 Bernoulli process

-----

### 4.2 Poisson process

-----

#### Example 4.2.1 (June 2018)

The arrival of incoming emails at my office computer during working hours may be modelled by a Poisson process. On average, five emails arrive per hour.

**(a)** Calculate the probability that exactly three emails arrive in an hour.

**(b)** Calculate the probability that fewer than three emails arrive in an hour.

**(c)** Calculate the probability that the interval between two successive incoming emails is less than ten minutes.

**(d)** State the distribution that models the receipt of emails between 1.30pm and 4.30pm on a typical day.


In [25]:
# number of events
x = stats.poisson(mu=5)

# waiting time between events
t = stats.expon(loc=0, scale = 1/5)

**(a)** The probability exactly three emails will arrive in an hour is

$$
p(3) = e^{-5} \bigg( \frac{5^{3}}{3!} \bigg) = \cdots
$$

In [26]:
round(x.pmf(k=3), 3)

0.14

**(b)** The probability fewer than three emails arrive in an hour is $F(2)$, so

$$
F(2) = e^{-5} \sum_{k=0}^{2} \frac{5^{k}}{k!} = \cdots
$$

In [27]:
round(x.cdf(x=2), 3)

0.125

**(c)** Let the continuous random variable $T$ represent the waiting time (in hours) between emails, where $T \sim M(5)$.
Ten minutes is equivalent to $1/6$ hours, so $P(T < 1/6)$ is

$$
F\bigg( \frac{1}{6} \bigg) = 1 - e^{-5 \times \frac{1}{6}}
$$

In [28]:
round(t.cdf(x=1/6), 3)

0.565

**(d)** If $X$ is a random variable that models the number of emails received in 1 hour, where $X \sim \text{Poisson}(5)$, then let $Y$ be a random variable that models the number of emails received in $t$ hours, so $Y \sim \text{Poisson}(5t)$.

If we set $t=3$ (hours), then $Y \sim \text{Poisson}(15)$.
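The scaling of the Poisson parameter with the length of the interval can be illustrated with a small simulation: adding three independent hourly counts, each $\text{Poisson}(5)$, should behave like a single $\text{Poisson}(15)$ count. This is only a sketch; the sample size and the seed are arbitrary choices, not values from the exam question.

```python
from scipy import stats

rate_per_hour = 5
hours = 3  # 1.30pm to 4.30pm

# model for the number of emails in the three-hour window
y = stats.poisson(mu=rate_per_hour * hours)

# simulate 100,000 afternoons as three independent hourly counts each
hourly = stats.poisson(mu=rate_per_hour).rvs(size=(100_000, hours), random_state=0)
totals = hourly.sum(axis=1)

# the simulated mean and variance should both be close to 15
sim_mean = totals.mean()
sim_var = totals.var()

print(y.mean(), y.var(), round(sim_mean, 2), round(sim_var, 2))
```

The simulated values differ slightly from $15$ because of sampling variation.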

#### Example 4.2.2

Suppose that alpha particles are emitted from a radioactive source at random at an average rate of $0.5$ every second.
Assume that the emissions of alpha particles may be modelled by a Poisson process.

Calculate the probability that

**(a)** there are no emissions of alpha particles in a five-second interval.

**(b)** the waiting time between emissions of alpha particles exceeds ten seconds.

**(a)** Alpha particle emissions may be modelled by a Poisson process with rate $\lambda = 0.5$ per second. 
So, $X$, the number of emissions in a five-second interval, has a Poisson distribution with parameter

$$
\lambda t = 0.5(5) = 2.5.
$$

The probability that there are no alpha particle emissions in a five-second interval will be

$$
p(0) = e^{-2.5} \bigg( \frac{2.5^{0}}{0!} \bigg) = \cdots
$$

In [18]:
# number of events
x = stats.poisson(mu=2.5)

round(x.pmf(k=0), 3)

0.082

**(b)** The waiting time (in seconds) between successive alpha particle emissions, $T$, has an exponential distribution with parameter $\lambda = 0.5$.
So, the probability that the waiting time between alpha particle emissions exceeds ten seconds is given by

$$
P(T > 10) = 1 - P(T < 10) = 1 - F(10) = 1 - \{1 - e^{-5}\}= \cdots
$$

In [24]:
# waiting time between events
t = stats.expon(loc=0, scale = 1/0.5)

round(1 - t.cdf(x=10), 3)

0.007
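Parts (a) and (b) are two views of the same idea: the waiting time exceeds $t$ seconds exactly when there are no emissions in an interval of length $t$, so both probabilities equal $e^{-\lambda t}$. The following sketch checks this for $t = 10$; the variable names are illustrative.

```python
from math import exp

from scipy import stats

rate = 0.5    # emissions per second
seconds = 10  # length of the interval

# waiting time between emissions, T ~ M(0.5)
t = stats.expon(loc=0, scale=1 / rate)

# number of emissions in a ten-second interval, Poisson(0.5 * 10)
x = stats.poisson(mu=rate * seconds)

wait_exceeds = t.sf(seconds)         # P(T > 10)
no_events = x.pmf(0)                 # P(no emissions in 10 seconds)
closed_form = exp(-rate * seconds)   # e^{-5}

print(round(wait_exceeds, 3), round(no_events, 3), round(closed_form, 3))
```

All three values should agree, which is a quick consistency check when moving between the Poisson and exponential descriptions of a Poisson process.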

## 5 Population quantiles

-----

### 5.1 Population quantile of a continuous distribution

### 5.2 Population quantile of a discrete distribution