# Statistical Significance

In [1]:
import math
import scipy.stats as st

# Problem Definition

The dean of Univ. X receives a report showing that students at that university get an average of 6.90 hours of sleep per night, compared to the national college average of 7.02 hours. Should the dean worry about the report? That is, is the result statistically significant?

1. Alternative hypothesis: The average amount of sleep by students at Univ. X (6.90 hours) is below the national average for college students (7.02 hours).

2. Null hypothesis: The average amount of sleep by students at Univ. X is NOT below the national average for college students.

3. Other data:

    3.1 Standard deviation Univ X -> 0.84

    3.2 Sample size Univ X -> 202


Significance level (alpha), i.e. the reference p-value: 0.05

* Adapted from https://towardsdatascience.com/statistical-significance-hypothesis-testing-the-normal-curve-and-p-values-93274fa32687

In [2]:
college_mean_sleep_hours = 7.02
univ_x_mean_sleep_hours = 6.90
std = 0.84
sample_size = 202

alpha = 0.05

In [3]:
# z-score of the sample mean: its distance from the national mean, measured in standard errors
z_score = (univ_x_mean_sleep_hours - college_mean_sleep_hours)/(std/math.sqrt(sample_size))

In [4]:
z_score

-2.0303814862216862
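
The value above can be checked by hand. As a sketch, write $\bar{x}$ for the Univ. X sample mean, $\mu_0$ for the national mean, $s$ for the standard deviation, and $n$ for the sample size (this notation is introduced here for illustration and is not part of the original report). The cell above computes the one-sample z statistic, treating the reported standard deviation as if it were the population value:

$$
z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{6.90 - 7.02}{0.84 / \sqrt{202}} \approx \frac{-0.12}{0.0591} \approx -2.03
$$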

In [5]:
# One-tailed (left) p-value: P(Z <= z_score) under the standard normal distribution
p_value = st.norm.cdf(z_score)

In [6]:
p_value

0.021158888530490534

As the p-value obtained (0.021) is less than the chosen significance level alpha (0.05), we can reject the null hypothesis.
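
The same decision can also be reached by comparing the z-score with a critical value instead of comparing p-values. The sketch below assumes the same one-tailed (left) test; the names `z_critical` and `reject_null` are illustrative and do not appear in the original notebook.

```python
import math
import scipy.stats as st

college_mean_sleep_hours = 7.02
univ_x_mean_sleep_hours = 6.90
std = 0.84
sample_size = 202
alpha = 0.05

# Same z-score as in the cells above
z_score = (univ_x_mean_sleep_hours - college_mean_sleep_hours) / (std / math.sqrt(sample_size))

# Left-tail critical value: the z below which only alpha of the standard normal lies
z_critical = st.norm.ppf(alpha)

# Rejecting when z_score < z_critical is equivalent to p_value < alpha
reject_null = z_score < z_critical
print("z-score: {:.4f}, critical z: {:.4f}, reject null: {}".format(z_score, z_critical, reject_null))
```

Because `st.norm.ppf` is the inverse of `st.norm.cdf`, both formulations should always agree for this test.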

This means that, if the null hypothesis were true, there would be only about a 2.1% chance of observing a sample mean of 6.90 hours or lower purely through random sampling variation.
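
One way to make the meaning of the p-value concrete is to simulate it. The sketch below assumes, as the z-test itself does, that under the null hypothesis the sample mean is normally distributed around the national mean with standard error `std / sqrt(sample_size)`. The seed, the number of simulations, and the variable names are arbitrary choices made for illustration.

```python
import math
import numpy as np

college_mean_sleep_hours = 7.02
univ_x_mean_sleep_hours = 6.90
std = 0.84
sample_size = 202

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n_simulations = 200_000

# Draw sample means as they would occur if the null hypothesis were true
standard_error = std / math.sqrt(sample_size)
simulated_means = rng.normal(loc=college_mean_sleep_hours, scale=standard_error, size=n_simulations)

# Fraction of simulated studies at least as extreme as the one observed
simulated_p_value = float(np.mean(simulated_means <= univ_x_mean_sleep_hours))
print("Simulated p-value: {:.4f}".format(simulated_p_value))
```

The simulated fraction should land close to the analytical p-value of about 0.021, up to Monte Carlo noise.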

There is therefore a statistically significant association between being a student at Univ. X and sleeping less than the national average, but NOT causation! This study alone does not let anyone argue that going to Univ. X causes a decrease in sleep.

# Experiments Varying Input Data (Standard Deviation and Sample Size)

In [7]:
college_mean_sleep_hours = [7.02, 7.02, 7.02, 7.02, 7.02, 7.02, 7.02, 7.02]
univ_x_mean_sleep_hours = [6.90, 6.90, 6.90, 6.90, 6.90, 6.90, 6.90, 6.90]
std = [0.84, 0.84, 1.0, 1.0, 1.05, 1.05, 2.0, 2.0]
sample_size = [202, 2020, 202, 2020, 202, 2020, 202, 2020]

n_of_experiments = len(college_mean_sleep_hours)

In [8]:
for exp_i in range(n_of_experiments):
    print("========= Exp {} =========".format(exp_i))
    # Same one-tailed z-test as above, applied to each (std, sample_size) combination
    z_score = (univ_x_mean_sleep_hours[exp_i] - college_mean_sleep_hours[exp_i])/(std[exp_i]/math.sqrt(sample_size[exp_i]))
    p_value = st.norm.cdf(z_score)
    print("The p-value is {} for a z-score of {}".format(p_value, z_score))
    print("Can we reject the null hypothesis? {}".format("Yes" if p_value < alpha else "No"))

========= Exp 0 =========
The p-value is 0.021158888530490534 for a z-score of -2.0303814862216862
Can we reject the null hypothesis? Yes
========= Exp 1 =========
The p-value is 6.785575208355853e-11 for a z-score of -6.42063001549831
Can we reject the null hypothesis? Yes
========= Exp 2 =========
The p-value is 0.04404870094441166 for a z-score of -1.7055204484262165
Can we reject the null hypothesis? Yes
========= Exp 3 =========
The p-value is 3.4582031853739905e-08 for a z-score of -5.393329213018581
Can we reject the null hypothesis? Yes
========= Exp 4 =========
The p-value is 0.05215534517706249 for a z-score of -1.6243051889773488
Can we reject the null hypothesis? No
========= Exp 5 =========
The p-value is 1.3994818872331327e-07 for a z-score of -5.136504012398647
Can we reject the null hypothesis? Yes
========= Exp 6 =========
The p-value is 0.19689614289492519 for a z-score of -0.8527602242131083
Can we reject the null hypothesis? No
========= Exp 7 =========
The p-value is 0.003501888634477499 for a z-score of -2.6966646065092905
Can we reject the null hypothesis? Yes
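
The results above show that, for a fixed difference in means, a larger standard deviation needs a larger sample before the null hypothesis can be rejected. As a rough sketch, the z-score formula can be inverted to estimate the smallest sample size that reaches significance for a given standard deviation. The helper `min_sample_size` is hypothetical (it is not part of the original notebook) and assumes the same one-tailed z-test and the same 0.12-hour difference.

```python
import math
import scipy.stats as st

def min_sample_size(std, mean_diff=7.02 - 6.90, alpha=0.05):
    """Smallest n for which the one-tailed z-test gives p < alpha (hypothetical helper)."""
    z_critical = abs(st.norm.ppf(alpha))
    # Rejecting requires mean_diff / (std / sqrt(n)) > z_critical,
    # i.e. n > (z_critical * std / mean_diff) ** 2
    return math.floor((z_critical * std / mean_diff) ** 2) + 1

for std_value in [0.84, 1.0, 1.05, 2.0]:
    print("std = {}: at least {} students needed".format(std_value, min_sample_size(std_value)))
```

This is consistent with the experiments: with 202 students the null hypothesis is rejected for standard deviations of 0.84 and 1.0, but not for 1.05 or 2.0, while 2020 students are enough in every case.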
