## Tests - ANOVA, Kruskal-Wallis, and Mann-Whitney U

ANOVA: Tests whether the means of two or more independent samples are significantly different.

Assumptions

- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample are normally distributed.
- Observations in each sample have the same variance.

Interpretation

- H0: the means of the samples are equal.
- H1: one or more of the means of the samples are unequal.

In [6]:
from scipy.stats import f_oneway
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
stat, p = f_oneway(data1, data2, data3)
print('stat:', round(stat, 3))
print('p:', round(p, 3))
if p > 0.05:
    print('Probably the same mean')
else:
    print('Probably different means')


stat: 0.096
p: 0.908
Probably the same mean
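
To show where the statistic reported by `f_oneway` comes from, the following sketch recomputes it from the between-group and within-group sums of squares. The helper name `one_way_anova` is made up for this illustration and is not part of SciPy; the data are the same three samples used above.

```python
import numpy as np
from scipy import stats

def one_way_anova(*groups):
    """Illustrative one-way ANOVA: returns (F statistic, p-value)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                          # number of groups
    n_total = sum(g.size for g in groups)    # total number of observations
    grand_mean = np.concatenate(groups).mean()

    # Variation of the group means around the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper-tail probability of the F distribution
    p_value = stats.f.sf(f_stat, df_between, df_within)
    return f_stat, p_value

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]

f_stat, p_value = one_way_anova(data1, data2, data3)
print('stat:', round(f_stat, 3))
print('p:', round(p_value, 3))
```

A small F means the group means vary little relative to the spread inside each group, which is why the p-value here is large.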


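F-test: Tests whether two independent samples have equal variances, using the ratio of their sample variances. The p-value computed here is one-sided (upper tail), so it tests whether the variance of the first sample is greater than that of the second.

In [7]: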
x = [18, 19, 22, 25, 27, 28, 41, 45, 51, 55]
y = [14, 15, 15, 17, 18, 22, 25, 25, 27, 34]

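import numpy as np
import scipy.stats  # import the submodule explicitly; older SciPy versions do not load it on a bare `import scipy`

# define F-test function (one-sided: H1 is var(x) > var(y))
def f_test(x, y):
    x = np.array(x)
    y = np.array(y)
    f = np.var(x, ddof=1) / np.var(y, ddof=1)  # ratio of sample variances
    dfn = x.size - 1  # numerator degrees of freedom
    dfd = y.size - 1  # denominator degrees of freedom
    p = 1 - scipy.stats.f.cdf(f, dfn, dfd)  # upper-tail p-value
    return f, p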

f_test(x, y)

(4.387122002085505, 0.01912653593238578)

Kruskal-Wallis H-test: Tests the null hypothesis that the population medians of all of the groups are equal. It is a non-parametric version of ANOVA. The test works on two or more independent samples, which may have different sizes. Note that rejecting the null hypothesis does not indicate which of the groups differ; post hoc comparisons between groups are required to determine that.

In [8]:
from scipy import stats
x = [1, 3, 5, 7, 9]
y = [2, 4, 6, 8, 10]

print(stats.kruskal(x, y))


KruskalResult(statistic=0.2727272727272734, pvalue=0.6015081344405895)


In [9]:
x = [1, 1, 1]
y = [2, 2, 2]
z = [2, 2]
stats.kruskal(x, y, z)

KruskalResult(statistic=7.0, pvalue=0.0301973834223185)
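
The H statistic can also be rebuilt from ranks, which makes clear why the test depends only on the ordering of the values. The sketch below assumes the usual rank-sum formula with a tie correction; `kruskal_h` is a hypothetical helper name, not a SciPy function.

```python
import numpy as np
from scipy import stats

def kruskal_h(*groups):
    """Illustrative Kruskal-Wallis H with tie correction: returns (H, p-value)."""
    sizes = [len(g) for g in groups]
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n = pooled.size
    ranks = stats.rankdata(pooled)  # tied values receive their average rank

    # Sum of (rank sum squared / group size) over the groups
    total = 0.0
    start = 0
    for size in sizes:
        rank_sum = ranks[start:start + size].sum()
        total += rank_sum ** 2 / size
        start += size
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie group
    _, counts = np.unique(pooled, return_counts=True)
    correction = 1.0 - (counts ** 3 - counts).sum() / (n ** 3 - n)
    h = h / correction

    # Under H0, H is approximately chi-squared with (groups - 1) degrees of freedom
    p = stats.chi2.sf(h, len(groups) - 1)
    return h, p

h_no_ties, p_no_ties = kruskal_h([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
h_ties, p_ties = kruskal_h([1, 1, 1], [2, 2, 2], [2, 2])
print(h_no_ties, p_no_ties)
print(h_ties, p_ties)
```

Both calls should agree with the `stats.kruskal` results shown above, up to floating-point rounding.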

In [10]:
from scipy import stats

group1 = [7, 14, 14, 13, 12, 9, 6, 14, 12, 8]
group2 = [15, 17, 13, 15, 15, 13, 9, 12, 10, 8]
group3 = [6, 8, 8, 9, 5, 14, 13, 8, 10, 9]
print(stats.kruskal(group1, group2, group3))

KruskalResult(statistic=6.287801578353988, pvalue=0.043114289703508814)
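
Because a significant Kruskal-Wallis result does not say which groups differ, a post hoc step is needed. One simple option, sketched here as an assumption rather than a prescribed method, is to run pairwise Mann-Whitney U tests and apply a Bonferroni correction; Dunn's test is another common choice. The names `groups`, `pairs`, and `results` are introduced only for this example.

```python
from itertools import combinations
from scipy import stats

groups = {
    'group1': [7, 14, 14, 13, 12, 9, 6, 14, 12, 8],
    'group2': [15, 17, 13, 15, 15, 13, 9, 12, 10, 8],
    'group3': [6, 8, 8, 9, 5, 14, 13, 8, 10, 9],
}

alpha = 0.05
pairs = list(combinations(groups, 2))  # every unordered pair of group names
results = []
for a, b in pairs:
    res = stats.mannwhitneyu(groups[a], groups[b], alternative='two-sided')
    # Bonferroni: multiply each p-value by the number of comparisons, cap at 1
    p_adj = min(res.pvalue * len(pairs), 1.0)
    results.append((a, b, res.statistic, p_adj))
    verdict = 'differ' if p_adj < alpha else 'no evidence of a difference'
    print(a, 'vs', b, '| U =', res.statistic, '| adjusted p =', round(p_adj, 4), '|', verdict)
```

The Bonferroni correction is conservative, so with only ten observations per group it is possible for the overall test to be significant while no single pair is.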


## Research Question

In previous years, 52% of parents believed that electronics and social media were the cause of their teenager’s lack of sleep. Do more parents today believe that their teenager’s lack of sleep is caused by electronics and social media?

- Population: Parents with a teenager (age 13-18)
- Parameter of interest: p
- Null hypothesis: p = 0.52
- Alternative hypothesis: p > 0.52 (note that this is a one-sided test)
- Data: 1018 parents; 56% believe that their teenager’s lack of sleep is caused by electronics and social media.
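
The text does not say which test to apply to this question. A minimal sketch, assuming a one-sample z-test for a proportion (normal approximation, standard error computed under the null hypothesis), could look like this; the variable names are chosen for illustration.

```python
import math
from scipy.stats import norm

p0 = 0.52     # proportion under the null hypothesis
p_hat = 0.56  # observed sample proportion
n = 1018      # number of parents in the sample

# Standard error of the sample proportion, assuming H0 is true
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# One-sided (upper-tail) p-value, matching the alternative p > 0.52
p_value = norm.sf(z)

print('z:', round(z, 3))
print('p:', round(p_value, 4))
```

Under these assumptions z is about 2.55 and the one-sided p-value is about 0.005, so at the 0.05 level the null hypothesis p = 0.52 would be rejected.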


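Mann-Whitney U test: Tests whether the distributions of two independent samples are equal. It is a non-parametric alternative to the two-sample t-test.

In [11]: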
group1 = [20, 23, 21, 25, 18, 17, 18, 24, 20, 24, 23, 19]
group2 = [24, 25, 21, 22, 23, 18, 17, 28, 24, 27, 21, 23]

from scipy import stats

stats.mannwhitneyu(group1, group2, alternative='two-sided')

MannwhitneyuResult(statistic=50.0, pvalue=0.21138945901258455)

The test statistic is 50.0 and the corresponding two-sided p-value is 0.2114.

In this example, where each group holds mpg (miles per gallon) measurements, the Mann-Whitney U test uses the following null and alternative hypotheses:

H0: The mpg distribution is the same in the two groups

HA: The mpg distribution differs between the two groups

Since the p-value (0.2114) is not less than 0.05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that the mpg differs between the two groups.
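
The U statistic itself is easy to recompute from ranks. The sketch below pools both groups, ranks the values (ties get their average rank), and derives U for each group; `u1` and `u2` are names introduced for this illustration.

```python
import numpy as np
from scipy.stats import rankdata

group1 = [20, 23, 21, 25, 18, 17, 18, 24, 20, 24, 23, 19]
group2 = [24, 25, 21, 22, 23, 18, 17, 28, 24, 27, 21, 23]

n1, n2 = len(group1), len(group2)
ranks = rankdata(np.concatenate([group1, group2]))  # ranks in the pooled sample

r1 = ranks[:n1].sum()            # rank sum of group1
u1 = r1 - n1 * (n1 + 1) / 2      # U statistic for group1
u2 = n1 * n2 - u1                # U statistic for group2

print('U1:', u1)  # 50.0
print('U2:', u2)  # 94.0
```

U1 counts the pairs in which a group1 value exceeds a group2 value, with ties counted as one half, so U1 + U2 always equals n1 * n2 (here 144).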