In [9]:
import statsmodels.api as sm
import numpy as np
import pandas as pd
import scipy.stats.distributions as dist

# One Population Proportion
Research Question

In previous years, 52% of parents believed that electronics and social media were the cause of their teenager’s lack of sleep. Do more parents today believe that their teenager’s lack of sleep is caused by electronics and social media?

Population: Parents with a teenager (age 13-18)
Parameter of Interest: p
Null Hypothesis: p = 0.52
Alternative Hypothesis: p > 0.52 (note that this is a one-sided test)

1018 Parents

56% believe that their teenager’s lack of sleep is caused by electronics and social media.


In [16]:
n = 1018
pnull = 0.52
phat = 0.56
sm.stats.proportions_ztest(phat * n, n, pnull, alternative='larger')

(2.571067795759113, 0.005069273865860533)

Our Z-test statistic is 2.5711 and the p-value is 0.0051. Because the p-value is less than alpha (0.05), we have enough evidence to reject the Null Hypothesis in favor of the Alternative Hypothesis: more than 52% of parents today believe that their teenager’s lack of sleep is caused by electronics and social media.
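By default, `proportions_ztest` estimates the standard error from the sample proportion. As a cross-check, the statistic can also be computed by hand with the standard error taken under the null hypothesis, which is the usual textbook form of the one-proportion z-test. The sketch below uses only the Python standard library; `one_proportion_ztest` and its `use_null_variance` flag are hypothetical names introduced for illustration, not part of statsmodels.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_ztest(phat, n, pnull, use_null_variance=True):
    """Upper-tail z-test for a single proportion (hypothetical helper)."""
    # Choose which proportion feeds the standard error.
    p_for_se = pnull if use_null_variance else phat
    se = sqrt(p_for_se * (1 - p_for_se) / n)
    z = (phat - pnull) / se
    # One-sided p-value for the alternative p > pnull.
    pvalue = 1 - NormalDist().cdf(z)
    return z, pvalue


# Standard error under the null hypothesis (p = 0.52).
z_null, p_null = one_proportion_ztest(0.56, 1018, 0.52)

# Standard error from the sample proportion, as in the cell above.
z_sample, p_sample = one_proportion_ztest(0.56, 1018, 0.52, use_null_variance=False)

print("null-based SE:  ", z_null, p_null)
print("sample-based SE:", z_sample, p_sample)
```

The two statistics differ slightly, but both p-values fall below 0.05, so the decision is the same either way.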

# Difference in Population Proportions
Research Question

Is there a significant difference between the population proportions of parents of black children and parents of Hispanic children who report that their child has had some swimming lessons?

Populations: All parents of black children age 6-18 and all parents of Hispanic children age 6-18
Parameter of Interest: p1 - p2, where p1 is the proportion for parents of black children and p2 is the proportion for parents of Hispanic children
Null Hypothesis: p1 - p2 = 0
Alternative Hypothesis: p1 - p2 ≠ 0

91 out of 247 (36.8%) sampled parents of black children report that their child has had some swimming lessons.

120 out of 308 (38.9%) sampled parents of Hispanic children report that their child has had some swimming lessons.

In [7]:
# Sample sizes
n1 = 247
n2 = 308

p1_hat = 91 / 247
p2_hat = 120 / 308
p_hat_total = (120 + 91) / (247 + 308)
best_est = p1_hat - p2_hat

Z_test_stats = best_est / np.sqrt(p_hat_total * (1 - p_hat_total) * (1/n1 + 1/n2))
Z_test_stats

-0.5110545335044571

In [12]:
# Calculate the p-value
pvalue = 2*dist.norm.cdf(-np.abs(Z_test_stats))  # doubled because the test is two-tailed
print("Computed P-value is", pvalue)

Computed P-value is 0.6093128715165157


The p-value (0.609) is greater than 0.05, so we do not have enough evidence to reject the Null Hypothesis.
We cannot conclude that there is a significant difference between the population proportions of parents of black children and parents of Hispanic children who report that their child has had some swimming lessons.
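The pooled two-proportion calculation above can be wrapped in a small reusable function. This is a minimal sketch using only the standard library; `two_proportion_ztest` is a hypothetical helper name, and it assumes a two-sided alternative with a pooled standard error, matching the cells above.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided pooled z-test for p1 - p2 = 0 (hypothetical helper)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled proportion under the null hypothesis p1 = p2.
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two tails: double the lower-tail area beyond -|z|.
    pvalue = 2 * NormalDist().cdf(-abs(z))
    return z, pvalue


z, pvalue = two_proportion_ztest(91, 247, 120, 308)
print(z, pvalue)
```

Called with the counts from the swimming-lessons samples, it should reproduce the test statistic and p-value computed step by step above.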

# One Population Mean
Research Question

Is the average cartwheel distance (in inches) for adults more than 80 inches?
Population: All adults
Parameter of Interest: μ, population mean cartwheel distance.
Null Hypothesis: μ = 80 
Alternative Hypothesis: μ > 80
Sample: 25 adults
Sample mean: x̄ = 82.48
Sample standard deviation: s = 15.06

In [13]:
df = pd.read_csv("C:/Users/eli/Desktop/Cartwheeldata.csv")
df.head()

Unnamed: 0,ID,Age,Gender,GenderGroup,Glasses,GlassesGroup,Height,Wingspan,CWDistance,Complete,CompleteGroup,Score
0,1,56,F,1,Y,1,62.0,61.0,79,Y,1,7
1,2,26,F,1,Y,1,62.0,60.0,70,Y,1,8
2,3,33,F,1,Y,1,66.0,64.0,85,Y,1,7
3,4,39,F,1,N,0,64.0,63.0,87,Y,1,10
4,5,27,M,2,N,0,73.0,75.0,72,N,0,4


In [14]:
n = len(df)
mean = df["CWDistance"].mean()
sd = df["CWDistance"].std()
(n, mean, sd)

(25, 82.48, 15.058552387264855)

In [15]:
sm.stats.ztest(df["CWDistance"], value = 80, alternative = "larger")

(0.8234523266982029, 0.20512540845395266)

The p-value (0.205) is greater than 0.05, so we do not have enough evidence to reject the Null Hypothesis. We cannot conclude that the average cartwheel distance (in inches) for adults is more than 80 inches.
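The statistic returned by `sm.stats.ztest` can be checked by hand from the summary values printed earlier: the sample mean minus the hypothesized mean, divided by the standard error s / sqrt(n). The sketch below uses only the standard library and hard-codes the summary statistics from the output above. With only 25 observations a t-test is often preferred over a z-test; that comparison is not computed here.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics copied from the notebook output above.
n = 25
xbar = 82.48
s = 15.058552387264855
mu0 = 80  # hypothesized mean under the null

# Standard error of the sample mean.
se = s / sqrt(n)

# z statistic and upper-tail p-value for the alternative mu > 80.
z = (xbar - mu0) / se
pvalue = 1 - NormalDist().cdf(z)

print(z, pvalue)
```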

# Difference in Population Means
Research Question

Considering adults in the NHANES data, do males have a significantly higher mean Body Mass Index than females?

Population: Adults in the NHANES data.
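Parameter of Interest: μ1 − μ2, the difference in mean Body Mass Index, where μ1 is the mean for females and μ2 is the mean for males.
Null Hypothesis: μ1 = μ2
Alternative Hypothesis: μ1 ≠ μ2 (two-sided test)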

2976 Female Adults
x̄1 = 29.94
s1 = 7.75

2759 Male Adults
x̄2 = 28.78
s2 = 6.25

x̄1 − x̄2 = 1.16

In [17]:
url = "C:/Users/eli/Desktop/nhanes_2015_2016.csv"
da = pd.read_csv(url)
da.head()

Unnamed: 0,SEQN,ALQ101,ALQ110,ALQ130,SMQ020,RIAGENDR,RIDAGEYR,RIDRETH1,DMDCITZN,DMDEDUC2,...,BPXSY2,BPXDI2,BMXWT,BMXHT,BMXBMI,BMXLEG,BMXARML,BMXARMC,BMXWAIST,HIQ210
0,83732,1.0,,1.0,1,1,62,3,1.0,5.0,...,124.0,64.0,94.8,184.5,27.8,43.3,43.6,35.9,101.1,2.0
1,83733,1.0,,6.0,1,1,53,3,2.0,3.0,...,140.0,88.0,90.4,171.4,30.8,38.0,40.0,33.2,107.9,
2,83734,1.0,,,1,1,78,3,1.0,3.0,...,132.0,44.0,83.4,170.1,28.8,35.6,37.0,31.0,116.5,2.0
3,83735,2.0,1.0,1.0,2,2,56,3,1.0,5.0,...,134.0,68.0,109.8,160.9,42.4,38.5,37.7,38.3,110.1,2.0
4,83736,2.0,1.0,1.0,2,2,42,4,1.0,4.0,...,114.0,54.0,55.2,164.9,20.3,37.4,36.0,27.2,80.4,2.0


In [18]:
# RIAGENDR is coded 1 = male, 2 = female
females = da[da["RIAGENDR"] == 2]
male = da[da["RIAGENDR"] == 1]

In [19]:
n1 = len(females)
mu1 = females["BMXBMI"].mean()
sd1 = females["BMXBMI"].std()

(n1, mu1, sd1)

(2976, 29.939945652173996, 7.75331880954568)

In [20]:
n2 = len(male)
mu2 = male["BMXBMI"].mean()
sd2 = male["BMXBMI"].std()

(n2, mu2, sd2)

(2759, 28.778072111846985, 6.252567616801485)

In [21]:
sm.stats.ztest(females["BMXBMI"].dropna(), male["BMXBMI"].dropna())

(6.1755933531383205, 6.591544431126401e-10)

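The p-value is extremely small (about 6.6e-10), so we reject the Null Hypothesis and conclude that mean Body Mass Index differs between females and males.
Note the direction of the difference: the test statistic is positive and the sample mean for females (29.94) is higher than the sample mean for males (28.78). The data therefore do not support the claim that males have a higher mean Body Mass Index than females; the evidence points the other way.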
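To make the direction of the result explicit, the test can be approximated from the summary statistics printed above. This is a rough sketch assuming a pooled variance estimate; the group sizes used here are the row counts from `len(...)`, which may include rows with missing `BMXBMI`, so the statistic is close to but not necessarily identical to the `sm.stats.ztest` output. Variable names such as `p_males_higher` are introduced for illustration only.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics copied from the notebook outputs above.
n_f, mean_f, sd_f = 2976, 29.939945652173996, 7.75331880954568
n_m, mean_m, sd_m = 2759, 28.778072111846985, 6.252567616801485

# Pooled variance and standard error of the difference in means.
pooled_var = ((n_f - 1) * sd_f**2 + (n_m - 1) * sd_m**2) / (n_f + n_m - 2)
se = sqrt(pooled_var * (1 / n_f + 1 / n_m))

# z statistic for (female mean - male mean).
z = (mean_f - mean_m) / se

# Two-sided p-value: is there any difference?
p_two_sided = 2 * NormalDist().cdf(-abs(z))

# One-sided p-value for the research question "male mean > female mean".
# The statistic for (male - female) is -z, so take the upper tail at -z.
p_males_higher = 1 - NormalDist().cdf(-z)

print(z, p_two_sided, p_males_higher)
```

A tiny two-sided p-value together with a one-sided p-value near 1 indicates that the means differ, but in the direction opposite to the one asked about.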