## Summary

Perform a one-sample $z$-test of a population mean and a two-sample $z$-test of the difference between two population means.

Data on the mean pass rate across all UK test centres during the period from April 2014 to March 2015 were obtained and analysed using an approximate normal model.
(The data were taken from the Open University, which did not provide the primary source.)

Two two-sided $z$-tests were performed:

1. A one-sample $z$-test of the null hypothesis that the mean total pass rate for the UK practical driving test in 2014/15 was the same as that in 2013/14 (given as 47.1%).[^1]
2. A two-sample $z$-test of the null hypothesis that the mean total pass rate of females for the UK practical driving test in 2014/15 was the same as that of males.[^1]

Normality of the three samples (total, female and male pass rates) was checked using normal probability plots.[^1]

General workflow:

1. Load the data
1. Describe the data
1. Plot the data
1. Get an interval estimate
1. Check the normality of the data
1. Perform the hypothesis test

These topics were covered in M248, Units 8 and 9.

## Dependencies

In [None]:
import pandas as pd
from scipy import stats as st
from statsmodels.stats import weightstats as ws
from matplotlib import pyplot as plt
import seaborn as sns

In [None]:
sns.set_theme()

## Constants

In [None]:
URL = ('https://raw.githubusercontent.com/ljk233/laughingrook-datasets'
       + '/main/uk_prac_driving_tests/pass_rates.csv')

## Main

### Load data

In [None]:
pass_rates = pd.read_csv(URL)
pass_rates.info()

### Test 1: Was the mean total pass rate in 2014/15 equal to that in 2013/14?

Here we test the hypotheses:

$$
H_{0}: \mu_{2014} = \mu_{2013};
\hspace{3mm} H_{1}: \mu_{2014} \ne \mu_{2013},
$$

where $\mu_{2013} = 47.1\%$.

Describe the total pass rate.

In [None]:
pass_rates['total'].describe()

Initialise an instance of `DescrStatsW`.

In [None]:
d = ws.DescrStatsW(pass_rates['total'])

Plot the distribution of total pass rates in 2014/15.

In [None]:
_g = sns.displot(
    x=d.data,
    kind='hist',
    kde=True,
    stat='density',
    aspect=2
)

Return an interval estimate of the mean total pass rate.

In [None]:
pd.Series(data=d.zconfint_mean(), index=['lcb', 'ucb']).round(6)
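
As a rough cross-check on the interval above, the textbook $z$-interval $\bar{x} \pm z_{1-\alpha/2} \, s / \sqrt{n}$ can be computed by hand using only the standard library. This is a minimal sketch: the function name `z_confint_mean` and the `rates` values are hypothetical and are not taken from the dataset, and $\alpha = 0.05$ is assumed to match the default of `zconfint_mean`.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_confint_mean(sample, alpha=0.05):
    """Return an approximate (1 - alpha) z-interval for the population mean."""
    n = len(sample)
    xbar = mean(sample)
    # Estimated standard error of the mean, using the sample standard deviation.
    ese = stdev(sample) / sqrt(n)
    # Standard normal quantile, roughly 1.96 when alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return xbar - z * ese, xbar + z * ese


# Hypothetical pass rates (%), for illustration only.
rates = [45.2, 51.0, 48.3, 39.9, 55.1, 47.6, 50.4, 43.8]
lcb, ucb = z_confint_mean(rates)
print(round(lcb, 6), round(ucb, 6))
```

The interval is symmetric about the sample mean, and it narrows as $n$ grows or as the sample standard deviation shrinks.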

Check the normality of the data.

In [None]:
_f, _ax = plt.subplots(figsize=(11.8, 6))
_res = st.probplot(x=d.data, plot=_ax)
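
To make explicit what a normal probability plot shows, the sketch below pairs the ordered sample with standard normal quantiles and measures how close the points lie to a straight line. It is an illustration under stated assumptions rather than a reimplementation of `probplot`: the plotting positions $(i - 0.5)/n$ are a common simplification (SciPy documents Filliben's estimate for its quantiles), and the helper names and `rates` values are hypothetical.

```python
from statistics import NormalDist, mean


def normal_plot_points(sample):
    """Return (theoretical quantiles, ordered sample) for a normal probability plot."""
    n = len(sample)
    ordered = sorted(sample)
    # Simplified plotting positions (i - 0.5) / n; an assumption, not SciPy's formula.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return quantiles, ordered


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical pass rates (%), for illustration only.
rates = [45.2, 51.0, 48.3, 39.9, 55.1, 47.6, 50.4, 43.8]
quantiles, ordered = normal_plot_points(rates)
r = pearson_r(quantiles, ordered)
print(round(r, 4))
```

A correlation close to 1 means the points fall near a straight line, which is what supports a normal model.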

Perform the one-sample $z$-test.

In [None]:
pd.Series(data=d.ztest_mean(value=47.1), index=['zstat', 'pvalue']).round(6)
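
The test statistic behind this result is $z = (\bar{x} - \mu_{0}) / (s / \sqrt{n})$, with a two-sided $p$-value of $2(1 - \Phi(|z|))$. The sketch below works through that arithmetic with the standard library only. The function name `z_test_mean` and the `rates` values are hypothetical; only the null value 47.1 comes from the text above.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_test_mean(sample, mu0):
    """Return (z statistic, two-sided p-value) for H0: population mean == mu0."""
    n = len(sample)
    # Estimated standard error of the mean.
    ese = stdev(sample) / sqrt(n)
    z = (mean(sample) - mu0) / ese
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical pass rates (%), for illustration only.
rates = [45.2, 51.0, 48.3, 39.9, 55.1, 47.6, 50.4, 43.8]
zstat, pvalue = z_test_mean(rates, mu0=47.1)
print(round(zstat, 6), round(pvalue, 6))
```

A small $p$-value would count as evidence against $H_{0}$, while a large one means the sample is consistent with a mean of 47.1%.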

### Test 2: Was the mean pass rate of females equal to that of males?

Here we test the hypotheses:

$$
H_{0}: \mu_{f} = \mu_{m};
\hspace{3mm} H_{1}: \mu_{f} \ne \mu_{m}.
$$

Describe the data.

In [None]:
pass_rates[['female', 'male']].describe().T

Initialise an instance of `CompareMeans`.

In [None]:
cm = ws.CompareMeans(
    ws.DescrStatsW(pass_rates['female']),
    ws.DescrStatsW(pass_rates['male'])
)

Return interval estimates of the mean female and male pass rates.

In [None]:
pd.DataFrame(
    data=[cm.d1.zconfint_mean(), cm.d2.zconfint_mean()],
    columns=['lcb', 'ucb'],
    index=['female', 'male']
)

Plot the distributions of the pass rates.

In [None]:
_g = sns.displot(
    data=pass_rates[['female', 'male']].melt(),
    x='value',
    kind='hist',
    col='variable',
    hue='variable',
    legend=False,
    kde=True,
    stat='density'
)

Check the normality of both samples.

In [None]:
_f, _axs = plt.subplots(1, 2, figsize=(11.8, 6), sharey=True)
st.probplot(x=cm.d1.data, plot=_axs[0])
st.probplot(x=cm.d2.data, plot=_axs[1])
_f.suptitle('Probability Plots', fontsize=16)
_axs[0].set_title('female')
_axs[1].set_title('male')
plt.show()

Perform the two-sample $z$-test.

In [None]:
pd.Series(data=cm.ztest_ind(), index=['zstat', 'pvalue']).round(6)
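
The standard error of the difference can be estimated in two ways, and the choice affects the statistic. statsmodels documents `usevar='pooled'` as the default for `ztest_ind`, whereas the two-sample $z$-test is often written with the unpooled estimate $\sqrt{s_{1}^{2}/n_{1} + s_{2}^{2}/n_{2}}$. The sketch below compares the two by hand; the function name `z_test_two_means` and both samples are hypothetical and are not taken from the dataset.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def z_test_two_means(x, y, pooled=True):
    """Return (z statistic, two-sided p-value) for H0: mean(x) == mean(y)."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)  # sample variances (divisor n - 1)
    if pooled:
        # Pooled variance, weighted by degrees of freedom.
        vp = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        ese = sqrt(vp * (1 / n1 + 1 / n2))
    else:
        # Unpooled estimate: each sample keeps its own variance.
        ese = sqrt(v1 / n1 + v2 / n2)
    z = (mean(x) - mean(y)) / ese
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical pass rates (%), for illustration only.
female = [44.1, 47.9, 41.3, 50.2, 45.6, 43.0]
male = [50.3, 53.8, 47.2, 55.9, 51.4, 49.0]
z_pooled, p_pooled = z_test_two_means(female, male, pooled=True)
z_unpooled, p_unpooled = z_test_two_means(female, male, pooled=False)
print(round(z_pooled, 6), round(z_unpooled, 6))
```

When the two samples have the same size, the pooled and unpooled standard errors coincide, so the choice only matters for unequal sample sizes.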

## Footnotes

[^1]: API references: [statsmodels.stats.weightstats.DescrStatsW](https://www.statsmodels.org/devel/generated/statsmodels.stats.weightstats.DescrStatsW.html); [statsmodels.stats.weightstats.CompareMeans](https://www.statsmodels.org/devel/generated/statsmodels.stats.weightstats.CompareMeans.html); [scipy.stats.probplot](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.probplot.html)