# Examples: M248 Book A

-----

A collection of examples based on the topics covered by **Book A** of **M248: Analysing data**.

## Contents

### 1. General probability distributions

1. Probability mass function
2. [Probability density function](#1.1-Probability-density-functions)

### 2. Standard discrete distributions

1. Bernoulli distribution
2. [Binomial distribution](#2.1-Binomial-distribution)
3. Discrete uniform distribution
4. Geometric distribution
5. Poisson distribution

### 3. Standard continuous distributions

1. Continuous uniform distribution
2. Exponential distribution

### 4. Bernoulli and Poisson processes

1. Bernoulli process
2. [Poisson process](#4.2-Poisson-process)

### 5. Population quantiles

1. Population quantile of a continuous distribution
2. Population quantile of a discrete distribution

-----

In [1]:
from scipy import stats
from scipy.integrate import quad

## 1 Probability distributions

-----

### 1.1 Probability density functions

**Example 1.1.1.** Suppose that a random variable $X$ has range $(0,1)$ and that its p.d.f. is given by

$$f(x) = \frac{3}{4} (x^{2} + 1), \hspace{3mm} x \in (0,1).$$

What is the value of $P(1/4 \leq X \leq 1/2)$?

The probability is the solution to

$$
P(1/4 \leq X \leq 1/2)
  = \int_{1/4}^{1/2} f(x) \> dx
  = \int_{1/4}^{1/2} \frac{3}{4} (x^{2} + 1) \> dx
  = \cdots
$$

In [2]:
# define a new function
def pdf(x: float):
    return 0.75 * ((x ** 2) + 1)


# integrate the function using quad
quad(func=pdf, a=0.25, b=0.5)[0]  # select 0 index

0.21484375000000003

**Example 1.1.2.** Suppose that a random variable $X$ has range $(0,4)$ and that its p.d.f. is given by

$$f(x) = \frac{3}{40} (\sqrt{x} + 2), \hspace{3mm} x \in (0,4).$$

What is the value of $P(X \geq 1)$?

The probability is the solution to

$$
P(X \geq 1)
  = \int_{1}^{4} f(x) \> dx
  = \int_{1}^{4} \frac{3}{40} (\sqrt{x} + 2) \> dx
  = \cdots
$$

In [3]:
# define a new function
def pdf(x: float):
    return (3/40) * ((x ** (0.5)) + 2)


# integrate the function using quad
quad(func=pdf, a=1, b=4)[0]  # select 0 index

0.8

**Example 1.1.3.** Suppose that a continuous variable $X$ has range $(-1,1)$. The following is a function $f(x)$ of $x$:

$$f(x) = 1 - x^{2}.$$

Is $f(x)$ a valid p.d.f. for $x$?

Check the properties of a valid p.d.f.

(1) $\int_{-1}^{1} f(x) \> dx = 1$

(2) $f(x) \geq 0$ for all $x \in (-1,1)$


In [4]:
# define a new function
def pdf(x: float):
    return 1 - (x ** 2)


# integrate the function using quad
quad(func=pdf, a=-1, b=1)[0]  # select 0 index

1.3333333333333335

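Property (2) holds, since $1 - x^{2} \geq 0$ on $(-1,1)$, but the integral is $4/3$ rather than $1$, so property (1) fails. Therefore $f(x)$ is not a valid p.d.f.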

But does a normalising constant, $k$, exist, such that $k \> f(x)$ would be a valid p.d.f.?

Solve the following for $k$

$$
\begin{aligned}
    1 &= k \int_{-1}^{1} (1 - x^{2}) \> dx \\
      &= k \bigg( \frac{4}{3} \bigg) \\
    k &= \frac{3}{4}.
\end{aligned}
$$

**Example 1.1.4.** Suppose that a continuous variable $X$ can only take values in the range $1$ to $4$. The following is a function $f(x)$ of $x$:

$$f(x) = (x - 7)^{2}.$$

Is $f(x)$ a valid p.d.f. for $x$?

Check the properties of a valid p.d.f.

(1) $\int_{1}^{4} f(x) \> dx = 1$

(2) $f(x) \geq 0$ for all $x \in (1,4)$

In [5]:
# define a new function
def pdf(x: float):
    return (x - 7) ** 2


# integrate the function using quad
quad(func=pdf, a=1, b=4)[0]  # select 0 index

63.0

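Property (2) holds, since $(x - 7)^{2} \geq 0$ for all $x$, but the integral is $63$ rather than $1$, so property (1) fails. Therefore $f(x)$ is not a valid p.d.f.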

But does a normalising constant, $k$, exist, such that $k \> f(x)$ would be a valid p.d.f.?

Solve the following for $k$

$$
\begin{aligned}
    1 &= k \int_{1}^{4} (x - 7)^{2} \> dx \\
      &= 63k \\
    k &= \frac{1}{63}.
\end{aligned}
$$
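Both properties can also be checked numerically. The following is a minimal sketch and not part of the original notebook: the helper `is_valid_pdf`, its grid size and its tolerance are illustrative assumptions, and the grid check of non-negativity is a sanity check rather than a proof.

```python
import numpy as np
from scipy.integrate import quad


def is_valid_pdf(f, a: float, b: float, tol: float = 1e-8) -> bool:
    """Numerically check the two p.d.f. properties on the interval (a, b)."""
    # property (1): the function integrates to 1 over the range
    area = quad(f, a, b)[0]
    # property (2): the function is non-negative (checked on a grid only)
    grid = np.linspace(a, b, 1001)
    non_negative = all(f(x) >= 0 for x in grid)
    return abs(area - 1) < tol and non_negative


# Example 1.1.3, before and after scaling by k = 3/4
print(is_valid_pdf(lambda x: 1 - x ** 2, -1, 1))
print(is_valid_pdf(lambda x: 0.75 * (1 - x ** 2), -1, 1))

# Example 1.1.4, after scaling by k = 1/63
print(is_valid_pdf(lambda x: (x - 7) ** 2 / 63, 1, 4))
```

Only the scaled versions should pass, which agrees with the normalising constants found in Examples 1.1.3 and 1.1.4.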

**Example 1.1.5.** Suppose that a continuous variable $X$ can only take values in the range $0$ to $1$. The following is a function $f(x)$ of $x$:

$$f(x) = \frac{3}{4} (x^{2} + 1), \hspace{3mm} x \in (0,1).$$

What is the c.d.f. associated with $X$?

The c.d.f. of $X$, $F(x)$, is

$$
\begin{aligned}
F(x) = \int_{a}^{x} f(y) \> dy = \int_{0}^{x} \frac{3}{4} (y^{2} + 1) \> dy &= \frac{3}{4} \int_{0}^{x} (y^{2} + 1) \> dy \\
  &= \frac{3}{4} \bigg[ \frac{1}{3} y^{3} + y \bigg]_{0}^{x} \\
  &= \frac{3}{4} \bigg\{ \frac{1}{3} x^{3} + x - \bigg( 0 + 0 \bigg) \bigg\} \\
  &= \frac{1}{4} ( x^{3} + 3x ).
\end{aligned}
$$
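The closed form can be sanity-checked against numerical integration of the p.d.f. This is a sketch under the assumption that agreement at a handful of points in $(0,1]$ is an adequate check; the choice of points is arbitrary.

```python
from scipy.integrate import quad


# p.d.f. from Example 1.1.5
def pdf(y: float) -> float:
    return 0.75 * ((y ** 2) + 1)


# closed-form c.d.f. derived above
def cdf(x: float) -> float:
    return 0.25 * ((x ** 3) + (3 * x))


# compare the closed form with the numerical integral from 0 to x
for x in (0.25, 0.5, 0.75, 1.0):
    print(x, cdf(x), quad(pdf, 0, x)[0])
```

The two columns should agree to within floating-point error, and $F(1) = 1$, as required of a c.d.f. at the upper end of the range.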

**Example 1.1.6.** Suppose that a continuous variable $X$ with range $(0,3)$ has c.d.f. given by

$$F(x) = \frac{1}{27} x^{3}, \hspace{3mm} x \in (0,3).$$

What is $P(1 < X < 2)$?

The probability $P(1 < X < 2) = F(2) - F(1)$, so

$$
F(2) - F(1) = \frac{1}{27} (2)^{3} - \frac{1}{27} (1)^{3} = \cdots
$$

In [6]:
def cdf(x: float):
    return (1 / 27) * (x ** 3)

In [7]:
cdf(x=2) - cdf(x=1)

0.25925925925925924
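The same probability can be cross-checked from the p.d.f. Differentiating the c.d.f. gives $f(x) = F'(x) = x^{2}/9$ on $(0,3)$; this derivative is worked out here and is not stated in the example itself. A minimal sketch:

```python
from scipy.integrate import quad


# p.d.f. obtained by differentiating F(x) = x^3 / 27
def pdf(x: float) -> float:
    return (x ** 2) / 9


# integrate the p.d.f. over (1, 2)
prob = quad(pdf, 1, 2)[0]
print(prob)
```

The result should match $F(2) - F(1) = 7/27$ up to floating-point error.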

**Example 1.1.7.** Suppose that a continuous variable $X$ with range $(0,1)$ has c.d.f. given by

$$F(x) = \frac{1}{5} x^{2} (2x+3), \hspace{3mm} x \in (0,1).$$

What are $P(X < 1/2)$ and $P(X \geq 1/4)$?

The probability $P(X < 1/2) = F(1/2)$, so

$$
F(1/2) = \frac{1}{5} \bigg(\frac{1}{2}\bigg)^{2} \bigg( 2 \bigg(\frac{1}{2}\bigg) + 3 \bigg) = \cdots
$$

In [8]:
def cdf(x: float):
    return (0.2) * (x ** 2) * ((2 * x) + 3)

In [9]:
cdf(x=0.5)

0.2

The probability $P(X \geq 1/4) = 1 - F(1/4)$, so

$$
1 - F(1/4) = 1 - \bigg\{ \frac{1}{5} \bigg(\frac{1}{4}\bigg)^{2} \bigg( 2 \bigg(\frac{1}{4}\bigg) + 3 \bigg) \bigg\} = \cdots
$$

In [10]:
1 - cdf(x=0.25)

0.95625

### 1.2 Probability mass functions

## 2 Standard discrete distributions

-----

### 2.1 Binomial distribution

**Example 2.1.1.** Suppose that a discrete random variable $X$ is distributed $X \sim B(15, 0.25)$.

Calculate

**a.** $P(X = 5)$

**b.** $P(X \leq 7)$

**c.** $P(2 \leq X <9)$

**d.** $E(X)$

**e.** $S(X)$

In [12]:
# declare the distribution
x = stats.binom(n=15, p=0.25)

**a.** The probability $P(X = 5)$ is

$$
p(5) = \binom{15}{5} (0.25)^{5} (0.75)^{10} = \cdots
$$

In [13]:
x.pmf(k=5)

0.16514598112553383

**b.** The probability $P(X \leq 7)$ is

$$
F(7) = \sum_{k=0}^{7} \binom{15}{k} (0.25)^{k} (0.75)^{15-k} = \cdots
$$

In [14]:
x.cdf(x=7)

0.9827001616358757

**c.** Since $X$ is discrete, $P(2 \leq X < 9) = P(2 \leq X \leq 8) = F(8) - F(1)$, so

$$
F(8) - F(1) =
\sum_{i=0}^{8} \binom{15}{i} (0.25)^{i} (0.75)^{15-i} -
\sum_{j=0}^{1} \binom{15}{j} (0.25)^{j} (0.75)^{15-j}
 = \cdots
$$

In [15]:
round(x.cdf(x=8) - x.cdf(x=1), 4)

0.9156
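As a cross-check, $P(2 \leq X < 9)$ can also be obtained by summing the p.m.f. over $k = 2, \ldots, 8$ directly, which avoids any off-by-one slip when choosing the c.d.f. arguments. A minimal sketch, redeclaring the distribution so that the cell is self-contained:

```python
from scipy import stats

# declare the distribution
x = stats.binom(n=15, p=0.25)

# sum the p.m.f. over the values 2, 3, ..., 8
prob = sum(x.pmf(k) for k in range(2, 9))
print(round(prob, 4))
```

For a discrete distribution this sum equals $F(8) - F(1)$, because $F(1)$ removes exactly the values $0$ and $1$.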

**d.** The expected value of $X$ is

$$
E(X) = np = 15(0.25) = \cdots
$$

In [16]:
x.mean()

3.75

**e.** The standard deviation of $X$ is

$$
S(X) = \sqrt{np(1-p)} = \sqrt{15(0.25)(0.75)} = \cdots
$$

In [17]:
x.std()

1.6770509831248424

**Example 2.1.2.**

### 2.2 Discrete uniform distribution

### 2.3 Geometric distribution

### 2.4 Poisson distribution

## 3 Standard continuous distributions

-----

### 3.1 Continuous uniform distribution

### 3.2 Exponential distribution

## 4 Bernoulli and Poisson processes

-----

### 4.1 Bernoulli process

### 4.2 Poisson process

**Example 4.2.1 (June 2018).** 
The arrival of incoming emails at my office computer during working hours may be modelled by a Poisson process. On average, five emails arrive per hour.

Calculate the following probabilities

**a.** Exactly three emails arrive in an hour.

**b.** Fewer than three emails arrive in an hour.

**c.** The interval between two successive incoming emails is less than ten minutes.

and then

**d.** State the distribution that models the number of emails arriving between 1.30 p.m. and 4.30 p.m.

Give all answers to 3 d.p.

In [22]:
# rate of events
L = 5
# number of events
x = stats.poisson(mu=L)
# waiting time between events
t = stats.expon(loc=0, scale=1/L)

**a.** The probability exactly three emails will arrive in an hour is

$$
p(3) = e^{-5} \bigg( \frac{5^{3}}{3!} \bigg) = \cdots
$$

In [23]:
round(x.pmf(k=3), 3)

0.14

**b.** The probability fewer than three emails arrive in an hour is $F(2)$, so

$$
F(2) = e^{-5} \sum_{k=0}^{2} \frac{5^{k}}{k!} = \cdots
$$

In [26]:
round(x.cdf(x=2), 3)

0.125

**c.** Let the continuous random variable $T$ represent the waiting time (in hours) between emails, where $T \sim M(5)$.
Ten minutes is equivalent to $1/6$ hours, so $P(T < 1/6)$ is

$$
F\bigg( \frac{1}{6} \bigg) = 1 - e^{-5 \times \frac{1}{6}}
$$

In [28]:
round(t.cdf(x=1/6), 3)

0.565

**d.** If $X$ is a random variable that models the number of emails received in 1 hour, where $X \sim \text{Poisson}(5)$, then let $Y$ be a random variable that models the number of emails received in $t$ hours, so $Y \sim \text{Poisson}(5t)$.

If we set $t=3$ (hours), then $Y \sim \text{Poisson}(15)$.
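The scaled distribution can be declared in the same way as the hourly one. The sketch below uses the illustrative names `rate`, `hours` and `y`, which do not appear in the original cells, and confirms that the mean and variance both equal $5t = 15$.

```python
from scipy import stats

rate = 5   # emails per hour
hours = 3  # 1.30 p.m. to 4.30 p.m.

# number of emails arriving in the three-hour window
y = stats.poisson(mu=rate * hours)

# for a Poisson distribution the mean and variance are both equal to mu
print(y.mean(), y.var())
```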

## 5 Population quantiles

-----

### 5.1 Population quantile of a continuous distribution

### 5.2 Population quantile of a discrete distribution