# Grade Inflation: Analysis

## Meta

## Plan

1. Sample the DataFrame.
2. Calculate $\mu_{X}$ and $\sigma_{X}$.
3. Declare the sampling distribution of the mean.
4. Calculate the 95% and 99% confidence intervals for $\mu_{X}$.
5. Calculate $\mu_{F}$.
6. Test whether $\mu_{F}$ lies inside each confidence interval.

-----

In [1]:
from scipy import stats

In [2]:
# restore df_population, previously saved with the IPython %store magic
%store -r df_population
df_population.head()

| | UKPRN | PROVIDER | COUNT_2014 | COUNT_2018 | FIRSTS_2014 | FIRSTS_2018 | PROP 2014 | PROP 2018 |
|---|---|---|---|---|---|---|---|---|
| 0 | 10007811 | Bishop Grosseteste University | 500 | 455 | 60 | 100 | 0.12 | 0.21978 |
| 5 | 10001883 | De Montfort University | 4005 | 6070 | 865 | 1835 | 0.21598 | 0.302306 |
| 10 | 10004113 | Loughborough University | 2875 | 3300 | 740 | 960 | 0.257391 | 0.290909 |
| 15 | 10004797 | The Nottingham Trent University | 5185 | 6305 | 1110 | 1100 | 0.214079 | 0.174465 |
| 20 | 10007796 | The University of Leicester | 2880 | 3415 | 535 | 815 | 0.185764 | 0.238653 |


## Sample the data

In [3]:
# Bernoulli sampling: flag each row for inclusion with probability 0.33
# (roughly a 33% sample; the exact sample size is random)
df_population["SAMPLE"] = stats.bernoulli.rvs(p=0.33, size=df_population["UKPRN"].size)

In [4]:
# select only successful trials
df_sample = df_population.query('SAMPLE == 1')
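The cells above draw a Bernoulli (also called Poisson) sample: each row is kept independently with probability 0.33, so the sample size is itself random. A minimal sketch contrasting this with simple random sampling via `pandas.DataFrame.sample`, which returns exactly `round(frac * N)` rows, is below. The 100-row frame and the seeds are hypothetical stand-ins, not the notebook's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_population: 100 made-up provider IDs.
df = pd.DataFrame({"UKPRN": np.arange(100)})

# Bernoulli sampling (as in the notebook): keep each row independently
# with probability 0.33, so the number of rows kept varies between runs.
rng = np.random.default_rng(seed=0)
bernoulli_sample = df[rng.random(len(df)) < 0.33]

# Simple random sampling without replacement: exactly round(0.33 * 100) = 33 rows.
srs_sample = df.sample(frac=0.33, random_state=0)

print(len(bernoulli_sample), len(srs_sample))
```

Seeding either draw makes it reproducible; the notebook's `stats.bernoulli.rvs` call has no `random_state`, so re-running it yields a different sample and slightly different results.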

In [5]:
# get the sample size
n = df_sample["UKPRN"].size

In [6]:
# preview the sample
df_sample.head()

Unnamed: 0,UKPRN,PROVIDER,COUNT_2014,COUNT_2018,FIRSTS_2014,FIRSTS_2018,PROP 2014,PROP 2018,SAMPLE
0,10007811,Bishop Grosseteste University,500,455,60,100,0.12,0.21978,1
10,10004113,Loughborough University,2875,3300,740,960,0.257391,0.290909,1
30,10007138,The University of Northampton,2810,2670,485,620,0.172598,0.23221,1
65,10007789,The University of East Anglia,2870,3600,840,1135,0.292683,0.315278,1
80,10007147,University of Hertfordshire,4470,4755,875,1150,0.195749,0.241851,1


## Sampling Distribution of the Mean

Let $X$ be the proportion of undergraduate (UG) degrees awarded by a UK higher education institution (HEI) in 2014 that were First-class.

The population mean $\mu_{X}$ is

In [7]:
mean_2014 = round(df_population["PROP 2014"].mean(), 3)
mean_2014

0.189

and the population standard deviation $\sigma_{X}$ is

In [8]:
std_2014 = round(df_population["PROP 2014"].std(), 4)
std_2014

0.086
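`pandas.Series.std` defaults to `ddof=1`, i.e. the sample standard deviation (dividing by $N - 1$). Because `df_population` is treated here as the whole population, `ddof=0` (dividing by $N$) arguably matches the definition of $\sigma_{X}$ more closely; the two differ by a factor of $\sqrt{N/(N-1)}$, which is negligible for large $N$. A minimal illustration on the five 2014 proportions shown in the preview above:

```python
import numpy as np
import pandas as pd

# The five "PROP 2014" values from the df_population.head() preview
# (a tiny subset, used only to illustrate the ddof behaviour).
prop_2014 = pd.Series([0.12, 0.21598, 0.257391, 0.214079, 0.185764])

sample_sd = prop_2014.std()            # pandas default: ddof=1, divides by N - 1
population_sd = prop_2014.std(ddof=0)  # divides by N, matching NumPy's np.std default

N = len(prop_2014)
# The two estimates differ by a factor of sqrt(N / (N - 1))
print(sample_sd, population_sd, population_sd * np.sqrt(N / (N - 1)))
```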

The sample size $n$ is

In [9]:
n

54

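By the **Central Limit Theorem**, for large $n$ the mean $\bar{X}$ of $n$ independent random variables, each with mean $\mu_{X}$ and standard deviation $\sigma_{X}$, is approximately normally distributed:

$$
\bar{X} \sim N \bigg( \mu_{X}, \frac{\sigma_{X}}{\sqrt{n}} \bigg),
$$

where, as with the `scale` argument of `scipy.stats.norm`, the second parameter is the standard deviation rather than the variance.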
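The normal approximation can be checked empirically. The sketch below is illustrative only: it draws a hypothetical, skewed population from a Beta(2, 8) distribution (not the notebook's data), takes many i.i.d. samples of size $n = 54$, and compares the spread of their means with $\sigma/\sqrt{n}$. Sampling with replacement is an assumption chosen to match the "independent random variables" of the theorem; the notebook itself samples without replacement from a finite population.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical, right-skewed "population" of proportions (illustration only).
population = rng.beta(a=2, b=8, size=5_000)
mu, sigma = population.mean(), population.std()  # population parameters (ddof=0)

n = 54           # same sample size as the notebook
reps = 10_000    # number of repeated samples

# Draw `reps` independent samples of size n with replacement (i.i.d. draws)
samples = rng.choice(population, size=(reps, n), replace=True)
sample_means = samples.mean(axis=1)

print(f"mean of sample means: {sample_means.mean():.4f}  vs  mu = {mu:.4f}")
print(f"sd of sample means:   {sample_means.std():.4f}  vs  sigma/sqrt(n) = {sigma / np.sqrt(n):.4f}")
```

Even though the population is skewed, the sample means centre on $\mu$ with a spread close to $\sigma/\sqrt{n}$.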

The mean of this sampling distribution is $\mu_{X}$:

In [10]:
mean_2014

0.189

Its standard deviation, $\sigma_{X}/\sqrt{n}$, is the standard error of the mean:

In [11]:
std_err = round(std_2014 / (n ** 0.5), 4)
std_err

0.0117
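As a hand check of the value above, using the rounded inputs $\sigma_{X} = 0.086$ and $n = 54$:

$$
\frac{\sigma_{X}}{\sqrt{n}} = \frac{0.086}{\sqrt{54}} \approx \frac{0.086}{7.348} \approx 0.0117.
$$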

Therefore, the sample mean $\bar{X}$ is approximately distributed as

In [12]:
print(f"N({mean_2014}, {std_err})")

N(0.189, 0.0117)


In [13]:
# declare the sampling distribution of the mean
x = stats.norm(loc=mean_2014, scale=std_err)

## Calculate 95% and 99% confidence intervals for $\mu_{X}$

The 95% and 99% confidence intervals for $\mu_{X}$, i.e. the ranges covering the central 95% and 99% of the sampling distribution of the mean, are:

In [14]:
# confidence level passed positionally: newer SciPy renamed the `alpha` keyword to `confidence`
lower, upper = x.interval(0.95)
ci_95 = [round(lower, 3), round(upper, 3)]
ci_95

[0.166, 0.212]

In [15]:
lower, upper = x.interval(0.99)
ci_99 = [round(lower, 3), round(upper, 3)]
ci_99

[0.159, 0.219]
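Each interval is the mean plus or minus a two-sided normal critical value times the standard error, $\mu_{X} \pm z^{*}\,\sigma_{X}/\sqrt{n}$, with $z^{*} \approx 1.960$ for 95% and $z^{*} \approx 2.576$ for 99%. The sketch below rebuilds the intervals from the rounded values printed above and cross-checks them against `scipy.stats.norm(...).interval`; the helper `central_interval` is hypothetical, not part of SciPy.

```python
from scipy import stats

mean_2014 = 0.189   # mu_X, from In [7]
std_err = 0.0117    # standard error, from In [11]

def central_interval(mu, se, confidence):
    """Return (lower, upper) = mu -/+ z * se, with z the two-sided critical value."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 1.960 for 95%, 2.576 for 99%
    return mu - z * se, mu + z * se

for level in (0.95, 0.99):
    lo, hi = central_interval(mean_2014, std_err, level)
    # Cross-check against SciPy's frozen normal distribution
    ref_lo, ref_hi = stats.norm(loc=mean_2014, scale=std_err).interval(level)
    print(level, [round(lo, 3), round(hi, 3)], [round(ref_lo, 3), round(ref_hi, 3)])
```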

## Testing the hypothesis

Let $F$ be the proportion of UG degrees awarded by a UK HEI in 2018 that were First-class.
Using the sample drawn above, the sample mean $\mu_{F}$ is:

In [16]:
mean_2018 = round(df_sample["PROP 2018"].mean(), 3)
mean_2018

0.246

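Under this sampling distribution, the two-sided probability of drawing a sample mean at least as far from $\mu_{X}$ as $\mu_{F}$ is

$$
P\big(|\bar{X} - \mu_{X}| \geq |\mu_{F} - \mu_{X}|\big) = 2\big(1 - P(\bar{X} \leq \mu_{F})\big),
$$

where the right-hand side holds because $\mu_{F} > \mu_{X}$: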

In [17]:
# two-sided probability; sf(x) equals 1 - cdf(x) but is more accurate in the far tail
p_value = round(2 * x.sf(mean_2018), 6)
p_value

1e-06
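The same result can be expressed as a $z$-score, the distance between $\mu_{F}$ and $\mu_{X}$ in units of the standard error. A small sketch reusing the rounded values printed above; it calls `scipy.stats.norm.sf` (the survival function, $1 - \text{CDF}$) on the standard normal.

```python
from scipy import stats

mean_2014 = 0.189   # mu_X, from In [7]
std_err = 0.0117    # standard error, from In [11]
mean_2018 = 0.246   # mu_F, from In [16]

# Distance of mu_F from mu_X, measured in standard errors (roughly 4.87)
z = (mean_2018 - mean_2014) / std_err

# Two-sided probability of a sample mean at least this far from mu_X
p_value = 2 * stats.norm.sf(abs(z))

print(f"z = {z:.2f}, p = {p_value:.2e}")
```

A $z$-score near 4.9 lies far beyond both critical values (about 1.96 and 2.58), which is why the rounded probability above is so small.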

Is $\mu_{F}$ inside the 95% CI?

In [18]:
print("Test outcome:", ci_95[0] <= mean_2018 <= ci_95[1])

Test outcome: False


Is $\mu_{F}$ inside the 99% CI?

In [19]:
print("Test outcome:", ci_99[0] <= mean_2018 <= ci_99[1])

Test outcome: False
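The two membership checks are the interval form of the two-sided probability computed earlier: $\mu_{F}$ falls outside the central interval at level $c$ exactly when that probability is below $1 - c$. A minimal sketch confirming that both views agree at 95% and 99%, using the rounded values printed above and unrounded interval endpoints:

```python
from scipy import stats

mean_2014, std_err, mean_2018 = 0.189, 0.0117, 0.246  # values printed in the notebook

x = stats.norm(loc=mean_2014, scale=std_err)
p_value = 2 * x.sf(mean_2018)  # two-sided; valid because mean_2018 > mean_2014

results = {}
for level in (0.95, 0.99):
    lo, hi = x.interval(level)
    outside = not (lo <= mean_2018 <= hi)  # interval view
    reject = p_value < 1 - level           # probability view
    results[level] = (outside, reject)
    print(f"{level:.0%}: outside CI = {outside}, p < {1 - level:.2f} = {reject}")
```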
