# Exercises from Think Stats, 2nd Edition

http://thinkstats2.com

Copyright 2016 Allen B. Downey

MIT License: https://opensource.org/licenses/MIT


In [1]:
from __future__ import print_function, division

%matplotlib inline

import numpy as np
import pandas as pd

import random

import thinkstats2
import thinkplot
from normal import Normal

## Analytic methods

If we know the parameters of the sampling distribution, we can compute confidence intervals and p-values analytically, which is faster than resampling.
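As a rough illustration of what "analytically" means here, the sketch below computes a confidence interval and a one-sided p-value directly from an estimate and its standard error. It assumes the sampling distribution is normal and uses `statistics.NormalDist` from the Python 3.8+ standard library rather than the `Normal` class imported above. The function name `analytic_summary` and the example numbers (estimate 1.0, standard error 0.5) are hypothetical.

```python
from statistics import NormalDist

def analytic_summary(estimate, stderr, null_value=0.0, level=0.90):
    """Return (confidence interval, one-sided p-value).

    Assumes the sampling distribution of the estimate is normal with
    mean `estimate` and standard deviation `stderr`.
    """
    dist = NormalDist(mu=estimate, sigma=stderr)
    tail = (1 - level) / 2
    ci = (dist.inv_cdf(tail), dist.inv_cdf(1 - tail))
    # One-sided p-value: the probability mass on the far side of the
    # null value, whichever side that turns out to be.
    below = dist.cdf(null_value)
    p_value = min(below, 1 - below)
    return ci, p_value

# Hypothetical inputs, not taken from the exercise.
ci, p_value = analytic_summary(estimate=1.0, stderr=0.5)
print('CI', ci)
print('p-value', p_value)
```

No random draws are involved: both quantities come from evaluating the normal CDF and its inverse, which is why this approach is cheap compared with resampling.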

## Exercises

Exercise: In a recent paper, Stein et al. investigate the effects of an intervention intended to mitigate gender-stereotypical task allocation within student engineering teams.

Before and after the intervention, students responded to a survey that asked them to rate their contribution to each aspect of class projects on a 7-point scale.

Before the intervention, male students reported higher scores for the programming aspect of the project than female students; on average men reported a score of 3.57 with standard error 0.28. Women reported 1.91, on average, with standard error 0.32.

Compute the sampling distribution of the gender gap (the difference in means), and test whether it is statistically significant. Because you are given standard errors for the estimated means, you don’t need to know the sample size to figure out the sampling distributions.

After the intervention, the gender gap was smaller: the average score for men was 3.44 (SE 0.16); the average score for women was 3.18 (SE 0.16). Again, compute the sampling distribution of the gender gap and test it.

Finally, estimate the change in gender gap; what is the sampling distribution of this change, and is it statistically significant?
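The reason the standard errors are enough is that, for independent normal estimates, the difference is also normal: its mean is the difference of the means and its variance is the sum of the variances. The sketch below applies that rule to the numbers in the exercise. The helper `difference` is a hypothetical name, and treating all four estimates as independent is an assumption; the before and after surveys may cover the same students, in which case the true standard error of the change could differ.

```python
import math

def difference(mean_a, se_a, mean_b, se_b):
    """Mean and standard error of (a - b), assuming a and b are independent."""
    # Variances add, so the standard errors combine in quadrature.
    return mean_a - mean_b, math.hypot(se_a, se_b)

# Gap defined as female minus male, using the values quoted in the exercise.
gap_before = difference(1.91, 0.32, 3.57, 0.28)
gap_after = difference(3.18, 0.16, 3.44, 0.16)

# Change in the gap: after minus before.
change = difference(*gap_after, *gap_before)

print('gap before (mean, stderr)', gap_before)
print('gap after  (mean, stderr)', gap_after)
print('change     (mean, stderr)', change)
```

Each result is a (mean, standard error) pair that fully describes the corresponding normal sampling distribution, so the sample sizes never enter the calculation.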

In [2]:
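# Normal takes (mu, sigma2): the second argument is a variance,
# so each reported standard error is squared.
male_before = Normal(3.57, 0.28**2)
male_after = Normal(3.44, 0.16**2)

female_before = Normal(1.91, 0.32**2)
female_after = Normal(3.18, 0.16**2)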

In [3]:
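# The gap is female minus male, so a negative mean means men scored higher.
diff_before = female_before - male_before
# One-sided p-value: probability mass of the sampling distribution above 0.
print('mean, p-value', diff_before.mu, 1-diff_before.Prob(0))
# The 5th and 95th percentiles bound a 90% confidence interval.
print('CI', diff_before.Percentile(5), diff_before.Percentile(95))
print('stderr', diff_before.sigma)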

mean, p-value -1.66 4.7309532320793934e-05
CI -2.3594013558039837 -0.9605986441960161
stderr 0.425205832509386


In [4]:
diff_after = female_after - male_after
print('mean, p-value', diff_after.mu, 1-diff_after.Prob(0))
print('CI', diff_after.Percentile(5), diff_after.Percentile(95))
print('stderr', diff_after.sigma)

mean, p-value -0.2599999999999998 0.1252679872068192
CI -0.6321878891765356 0.11218788917653583
stderr 0.2262741699796952


In [5]:
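# Change in the gap (after minus before); a positive mean means the gap narrowed.
diff = diff_after - diff_before
# The mean is positive here, so the one-sided p-value is the mass below 0.
print('mean, p-value', diff.mu, diff.Prob(0))
print('CI', diff.Percentile(5), diff.Percentile(95))
print('stderr', diff.sigma)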

mean, p-value 1.4000000000000001 0.0018269483689769925
CI 0.6077335793117721 2.192266420688228
stderr 0.48166378315169184


In [None]:
# 1) Gender gap before the intervention was 1.66 points (one-sided p-value 5e-5, significant).
# 2) Gender gap after the intervention was 0.26 points (p-value 0.13, not significant).
# 3) Change in the gender gap was 1.4 points (p-value 0.002, significant).
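One way to sanity-check the analytic result for the change in the gap is a small simulation. The sketch below draws each of the four reported means from a normal distribution with the given standard error and looks at the resulting spread of the change. It uses only the standard library (Python 3.8+). The function name `simulate_change`, the seed, and the number of draws are arbitrary choices, and the four estimates are again assumed to be independent.

```python
import random
import statistics

def simulate_change(n_draws=100_000, seed=17):
    """Simulate the change in the gender gap (after minus before)."""
    rng = random.Random(seed)
    changes = []
    for _ in range(n_draws):
        # Draw each estimated mean from its assumed sampling distribution.
        male_before = rng.gauss(3.57, 0.28)
        female_before = rng.gauss(1.91, 0.32)
        male_after = rng.gauss(3.44, 0.16)
        female_after = rng.gauss(3.18, 0.16)
        gap_before = female_before - male_before
        gap_after = female_after - male_after
        changes.append(gap_after - gap_before)
    return changes

changes = simulate_change()
sim_mean = statistics.fmean(changes)
sim_sd = statistics.stdev(changes)
# Fraction of simulated changes below 0, comparable to diff.Prob(0).
sim_p = sum(c < 0 for c in changes) / len(changes)

print('mean', sim_mean)
print('stderr', sim_sd)
print('p-value', sim_p)
```

If the analytic calculation is right, the simulated values should land close to the mean of 1.4, standard error of about 0.48, and p-value of about 0.002 printed above, with small differences due to sampling noise.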