Question 1: Suppose we have two independent samples $X_1,\dots,X_m$ and $Y_1,\dots,Y_n$ drawn from two distributions, one with mean $\mu_1$ and the other with mean $\mu_2$.

Let $\delta=\mu_1 - \mu_2$ so that $H_0$ is $\delta=0$ and $H_1$ is $\delta\neq 0$.  

Our estimator of $\delta$ is $\hat{\delta} = \bar{X} - \bar{Y}$.

a) Compute the Wald test statistic $W=\frac{\hat\delta-\delta_0}{\hat{\rm{se}}}$, where $\delta_0=0$ is the value of $\delta$ under $H_0$.

In [2]:
import numpy as np

mu_x = 0
sigma_x = 1
m = 30
dist_x = np.random.normal(mu_x, sigma_x, m)

mu_y = 0.2
sigma_y = 1
n = 20
dist_y = np.random.normal(mu_y, sigma_y, n)

b) On the board we computed the $p$-value of the Wald test. Compute this $p$-value for the observed value of the test statistic $W$ from part a).
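
The quantities in parts a) and b) can be written out as follows. This is a sketch under the assumption that the two samples are independent; $s_X^2$ and $s_Y^2$ denote the sample variances, $m$ and $n$ the sizes of the $X$ and $Y$ samples (as in the code), $w$ the observed value of $W$, and $\Phi$ the standard normal cdf.

$$\hat{\rm{se}} = \sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}}, \qquad W = \frac{\bar{X} - \bar{Y} - 0}{\hat{\rm{se}}}, \qquad p = 2\,\Phi(-|w|)$$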

In [13]:
X_bar = np.mean(dist_x)
Y_bar = np.mean(dist_y)

s_x_sq = 1/(m-1) * np.sum([(x-X_bar)**2 for x in dist_x])
s_y_sq = 1/(n-1) * np.sum([(y-Y_bar)**2 for y in dist_y])

delta_hat = X_bar - Y_bar
se = np.sqrt(s_x_sq/m + s_y_sq/n)

W = delta_hat/se
W

-0.902908199705452

In [15]:
from scipy.stats import norm
p = 2*(norm.cdf(-abs(W)))
p

0.3665746184185954
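
Since the same computation is repeated in the later parts, it may help to wrap it in a function. The sketch below is one possible way to do that; the name `wald_two_sample` and the parameter `delta_0` are illustrative and not part of the original notebook, and the seed is arbitrary.

```python
import numpy as np
from scipy.stats import norm


def wald_two_sample(x, y, delta_0=0.0):
    """Return the Wald statistic and two-sided p-value for H0: mu_x - mu_y = delta_0.

    Hypothetical helper: assumes the two samples are independent.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Point estimate of the difference in means
    delta_hat = x.mean() - y.mean()

    # Plug-in standard error using the unbiased sample variances (ddof=1)
    se = np.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)

    w = (delta_hat - delta_0) / se
    p_value = 2 * norm.cdf(-abs(w))
    return w, p_value


# Usage with the same settings as the cells above (seed chosen arbitrarily)
rng = np.random.default_rng(1)
w, p_value = wald_two_sample(rng.normal(0, 1, 30), rng.normal(0.2, 1, 20))
print(w, p_value)
```

Swapping the two samples flips the sign of the statistic but leaves the two-sided $p$-value unchanged.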

c) What happens to our $p$-value as we move the second distribution from $\mu_2=-10$ to $\mu_2=10$ and recompute it? Plot a graph. 

In [16]:
import matplotlib.pyplot as plt

mu_ys = range(-10, 11)  # include the endpoint mu_2 = 10
p_vals = []

for mu_y in mu_ys:
    mu_x = 0
    sigma_x = 1
    m = 30
    dist_x = np.random.normal(mu_x, sigma_x, m)

    sigma_y = 1
    n = 20
    dist_y = np.random.normal(mu_y, sigma_y, n)

    X_bar = np.mean(dist_x)
    Y_bar = np.mean(dist_y)

    s_x_sq = 1/(m-1) * np.sum([(x-X_bar)**2 for x in dist_x])
    s_y_sq = 1/(n-1) * np.sum([(y-Y_bar)**2 for y in dist_y])

    delta_hat = X_bar - Y_bar
    se = np.sqrt(s_x_sq/m + s_y_sq/n)

    W = delta_hat/se
    p = 2*(norm.cdf(-abs(W)))
    p_vals.append(p)

# Plot the p-value against mu_2
plt.plot(mu_ys, p_vals, marker="o")
plt.xlabel(r"$\mu_2$")
plt.ylabel("p-value")
plt.show()

Question 2: Under the null hypothesis, can we see that the $p$-values of the Wald test are $\rm{Uniform}(0,1)$? 

In [17]:
Ws = []
ps = []
for _ in range(10000):
    mu_x = 0
    sigma_x = 1
    m = 30
    dist_x = np.random.normal(mu_x, sigma_x, m)

    mu_y = 0  # equal means, so the data are generated under H0
    sigma_y = 1
    n = 20
    dist_y = np.random.normal(mu_y, sigma_y, n)

    X_bar = np.mean(dist_x)
    Y_bar = np.mean(dist_y)

    s_x_sq = 1/(m-1) * np.sum([(x-X_bar)**2 for x in dist_x])
    s_y_sq = 1/(n-1) * np.sum([(y-Y_bar)**2 for y in dist_y])

    delta_hat = X_bar - Y_bar
    se = np.sqrt(s_x_sq/m + s_y_sq/n)

    W = delta_hat/se
    Ws.append(W)
    p = 2*(norm.cdf(-abs(W)))
    ps.append(p)
    
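# Plot: for data generated under H0, the histogram of p-values should be roughly flat on [0, 1]
import matplotlib.pyplot as plt

plt.hist(ps, bins=20, density=True)
plt.xlabel("p-value")
plt.ylabel("density")
plt.show()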
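
Besides eyeballing a histogram, uniformity can be checked numerically: if $p \sim \rm{Uniform}(0,1)$, then $P(p \le \alpha) = \alpha$ for every $\alpha$. The sketch below is a vectorized variant of the simulation (not the notebook's own loop) with both means set to 0 so that $H_0$ holds. The names `reps` and `p_null` and the seed are illustrative choices. Because the standard error is estimated and the normal approximation is used with small samples, the empirical rates may sit slightly above the nominal $\alpha$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)  # arbitrary seed for reproducibility
reps, m, n = 10_000, 30, 20

# Both samples have mean 0, so H0 holds in every replication
x = rng.normal(0, 1, size=(reps, m))
y = rng.normal(0, 1, size=(reps, n))

# One Wald statistic and p-value per replication (row)
se = np.sqrt(x.var(axis=1, ddof=1) / m + y.var(axis=1, ddof=1) / n)
w = (x.mean(axis=1) - y.mean(axis=1)) / se
p_null = 2 * norm.cdf(-np.abs(w))

# Empirical P(p <= alpha) next to the nominal alpha
for alpha in (0.01, 0.05, 0.10, 0.50):
    print(alpha, np.mean(p_null <= alpha))
```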

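Question 3: In general, can we see that the cdf of a continuous distribution, evaluated at a draw from that distribution, is uniformly distributed? That is, if $X$ has continuous cdf $F$, is $F(X) \sim \rm{Uniform}(0,1)$?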
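
One way to explore this numerically is sketched below. The choice of an exponential distribution with scale 2, the sample size, and the seed are arbitrary assumptions made for illustration; any continuous distribution with a known cdf would serve.

```python
import numpy as np
from scipy.stats import expon, kstest

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

# Draw from a continuous distribution (Exponential with scale 2, an arbitrary choice)
x = rng.exponential(scale=2.0, size=10_000)

# Apply the distribution's own cdf to each draw
u = expon.cdf(x, scale=2.0)

# Compare u with Uniform(0, 1): quartiles should be near 0.25, 0.5, 0.75,
# and the Kolmogorov-Smirnov distance to the uniform cdf should be small
quartiles = np.quantile(u, [0.25, 0.5, 0.75])
ks_stat = kstest(u, "uniform").statistic
print(quartiles, ks_stat)
```

A histogram of `u` gives the same picture visually. The same idea underlies Question 2: the $p$-value is a function of the cdf of the test statistic under $H_0$.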