# Parametric Statistical Significance Tests

Parametric statistical tests assume that a data sample was drawn from a specific population distribution. The term most often refers to tests that assume a Gaussian distribution. Because data so commonly fit this distribution, parametric statistical methods are widely used.

In general, each test calculates a test statistic that must be interpreted with some background in statistics and a deeper knowledge of the statistical test itself. Tests also return a p-value that can be used to interpret the result of the test. The p-value is the probability of observing a test statistic at least as extreme as the one computed from the samples, given the base assumption (null hypothesis) that the two samples were drawn from a population with the same distribution.

   * $p$ <= alpha: reject null hypothesis, different distribution
   * $p$ > alpha: fail to reject null hypothesis, same distribution
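
The role of alpha in the decision rule above can be illustrated with a small simulation. This is a minimal sketch under assumed settings (the seed, the number of trials, and the sample size are arbitrary choices, not part of the original notebook): when both samples really come from the same Gaussian, a t-test should reject the null hypothesis in roughly a fraction alpha of the trials.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)  # seed chosen only for reproducibility (assumption)
alpha = 0.05
n_trials, n_obs = 2000, 30      # illustrative sizes (assumption)

# Both samples come from the same Gaussian, so the null hypothesis is true
a = rng.normal(50, 5, size=(n_trials, n_obs))
b = rng.normal(50, 5, size=(n_trials, n_obs))

# One independent-samples t-test per row
_, p_values = ttest_ind(a, b, axis=1)

# Fraction of trials in which we would (wrongly) reject H0
false_positive_rate = np.mean(p_values <= alpha)
print('fraction of trials with p <= alpha: %.3f' % false_positive_rate)
```

The printed fraction should land close to alpha, which is why alpha is often described as the accepted rate of false rejections.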


# Contents
   * **Student's t-Test (2 continuous)**
   * **Paired Student's t-Test (2 continuous, pre & post)**
   * **ANOVA (>2 categorical, continuous)**
   * **Chi-square (2 categorical)**

## Test data
Let's generate some test data.

In [1]:
from numpy.random import seed, randn
import numpy as np
seed(1)

# generate two sets of univariate observations
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

# summarize
print('data1: mean=%.3f stdv=%.3f' % (np.mean(data1), np.std(data1)))
print('data2: mean=%.3f stdv=%.3f' % (np.mean(data2), np.std(data2)))

data1: mean=50.303 stdv=4.426
data2: mean=51.764 stdv=4.660
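
Since the tests in this notebook assume Gaussian data, it can be useful to check that assumption before running them. The following is a hedged sketch using `scipy.stats.shapiro` (the Shapiro-Wilk test); the sample here is regenerated with an assumed seed and is not the same array as `data1` above.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)              # assumed seed, for illustration only
sample = 5 * rng.standard_normal(100) + 50  # same recipe as data1: N(50, 5)

# H0 of the Shapiro-Wilk test: the sample was drawn from a Gaussian distribution
stat, p = shapiro(sample)
print('Statistics=%.3f, p=%.3f' % (stat, p))

alpha = 0.05
if p > alpha:
    print('No evidence against normality (fail to reject H0)')
else:
    print('Sample does not look Gaussian (reject H0)')
```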


## Student's t-Test (2 continuous)

The Student’s t-test is a statistical hypothesis test of whether two independent data samples, each known to have a Gaussian distribution, come from the same Gaussian distribution.

> One of the most commonly used t tests is the independent samples t test. You use this test when you want to compare the means of two independent samples on a given variable.

The assumption or null hypothesis of the test is that the means of two populations are equal. A rejection of this hypothesis indicates that there is sufficient evidence that the means of the populations are different, and in turn that the distributions are not equal.

   * Fail to reject $H_0$: Sample distributions are equal.
   * Reject $H_0$: Sample distributions are not equal.

[Scipy t test doc](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html)

In [2]:
from scipy.stats import ttest_ind
stat, p = ttest_ind(data1, data2)  #stat = t-statistic
print('Statistics = %.3f, p = %.3f'%(stat,p))

alpha = 0.05
if p > alpha:
    print('fail to reject H0')
else:
    print('reject H0')

Statistics = -2.262, p = 0.025
reject H0
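
To make the t-statistic less of a black box, the sketch below recomputes it by hand using the pooled-variance formula that `ttest_ind` uses by default (`equal_var=True`) and compares the result with SciPy. The arrays `a` and `b` are freshly generated stand-ins (an assumption), not the notebook's `data1` and `data2`.

```python
import numpy as np
from scipy.stats import ttest_ind, t as t_dist

rng = np.random.default_rng(1)  # assumed seed, for illustration only
a = 5 * rng.standard_normal(100) + 50
b = 5 * rng.standard_normal(100) + 51

n1, n2 = len(a), len(b)
var1, var2 = a.var(ddof=1), b.var(ddof=1)  # unbiased sample variances

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# t = difference in means / standard error of that difference
t_manual = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
dof = n1 + n2 - 2
p_manual = 2 * t_dist.sf(abs(t_manual), dof)

t_scipy, p_scipy = ttest_ind(a, b)
print('manual: t=%.3f, p=%.3f' % (t_manual, p_manual))
print('scipy:  t=%.3f, p=%.3f' % (t_scipy, p_scipy))
```

If the two populations cannot be assumed to share a variance, passing `equal_var=False` to `ttest_ind` runs Welch's t-test instead.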


## Paired Student's t-test

The data samples may represent two independent measures or evaluations of the same object. These data samples are repeated or dependent and are referred to as paired samples or repeated measures. Because the samples are not independent, we cannot use the Student’s t-test. Instead, we must use a modified version of the test that corrects for the fact that the data samples are dependent, called the paired Student’s t-test.

The test is simplified because it no longer has to account for variation between subjects: the observations are made in pairs, for example before and after a treatment on the same subject or subjects. The default assumption, or null hypothesis of the test, is that there is no difference in the means between the samples. Rejecting the null hypothesis indicates that there is enough evidence that the population means are different.

Although the samples (data1 & data2) are independent, not paired, we can pretend for the sake of the demonstration that the observations are paired and calculate the statistic.

In [3]:
from scipy.stats import ttest_rel
seed(1)

# compare samples
stat, p = ttest_rel(data1, data2)
print('Statistics=%.3f, p=%.3f' % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
    print('Same distributions (fail to reject H0)')
else:
    print('Different distributions (reject H0)')

Statistics=-2.372, p=0.020
Different distributions (reject H0)


The interpretation of the result suggests that the samples have different means and therefore different distributions.
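
One way to see what the paired test does is to note that it is equivalent to a one-sample t-test on the per-pair differences, tested against a mean of zero. The sketch below uses made-up before/after measurements (the names `before` and `after`, the seed, and the effect size are all assumptions for illustration).

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

rng = np.random.default_rng(1)  # assumed seed, for illustration only

# Hypothetical paired data: the same 100 subjects measured twice
before = 5 * rng.standard_normal(100) + 50
after = before + rng.normal(1.0, 2.0, size=100)

# Paired t-test on the two related samples
stat_rel, p_rel = ttest_rel(before, after)

# One-sample t-test on the differences, H0: mean difference is 0
stat_one, p_one = ttest_1samp(before - after, 0.0)

print('ttest_rel:   stat=%.3f, p=%.3f' % (stat_rel, p_rel))
print('ttest_1samp: stat=%.3f, p=%.3f' % (stat_one, p_one))
```

Both calls report the same statistic and p-value, which shows that only the within-pair differences matter to the paired test.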

## Analysis of Variance Test (ANOVA, >2 categorical and a continuous)

ANOVA can be used when we have multiple independent data samples. We could run a t-test on each pair of samples, but this is onerous if we are only interested in whether all samples have the same distribution or not. The null hypothesis of ANOVA is that the means across 2 or more groups are equal. If the evidence suggests that this is not the case, the null hypothesis is rejected and at least one data sample has a different distribution.

   * Fail to reject $H_0$: All sample distributions are equal
   * Reject $H_0$: One or more sample distributions are not equal

Importantly, the test can only comment on whether all samples are the same or not; **it cannot quantify which samples differ or by how much.**

**Difference between ANOVA and t-test**

A t-test compares the means of two groups. ANOVA compares the means of two or more groups by comparing the variance between the groups with the variance within them.

*One way ANOVA*: You have a group of individuals randomly split into smaller groups and completing different tasks. For example, you might be studying the effects of tea on weight loss and form three groups: green tea, black tea, and no tea.

*Two way ANOVA*: For example, you might want to find out if there is an interaction between income and gender for anxiety level at job interviews. The anxiety level is the outcome, or the variable that can be measured. Gender and income are the two categorical variables. These categorical variables are also the independent variables, which are called factors in a two-way ANOVA.

In [4]:
from scipy.stats import f_oneway

# generate three independent samples
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 50
data3 = 5 * randn(100) + 52
# compare samples
stat, p = f_oneway(data1, data2, data3)
print('Statistics=%.3f, p=%.3f' % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
    print('Same distributions (fail to reject H0)')
else:
    print('Different distributions (reject H0)')

Statistics=3.655, p=0.027
Different distributions (reject H0)


The interpretation of the p-value correctly rejects the null hypothesis indicating that one or more sample means differ.

However, what if we want to dig further and find out which group differs significantly from the others? For that, we have to perform a post hoc test such as Tukey's range test. Let's use a different example to see the post hoc test in action.

## One way ANOVA with post hoc

[source](http://cleverowl.uk/2015/07/01/using-one-way-anova-and-tukeys-test-to-compare-data-sets/)

In [5]:
from scipy.stats import f_oneway
data = np.rec.array([
('Pat', 5),
('Pat', 4),
('Pat', 4),
('Pat', 3),
('Pat', 9),
('Pat', 4),
('Jack', 4),
('Jack', 8),
('Jack', 7),
('Jack', 5),
('Jack', 1),
('Jack', 5),
('Alex', 9),
('Alex', 8),
('Alex', 8),
('Alex', 10),
('Alex', 5),
('Alex', 10)], dtype = [('Archer','|U5'),('Score', '<i8')])

In [6]:
data # the score for each player, pat jack alex

rec.array([('Pat',  5), ('Pat',  4), ('Pat',  4), ('Pat',  3),
           ('Pat',  9), ('Pat',  4), ('Jack',  4), ('Jack',  8),
           ('Jack',  7), ('Jack',  5), ('Jack',  1), ('Jack',  5),
           ('Alex',  9), ('Alex',  8), ('Alex',  8), ('Alex', 10),
           ('Alex',  5), ('Alex', 10)],
          dtype=[('Archer', '<U5'), ('Score', '<i8')])

In [7]:
f, p = f_oneway(data[data['Archer'] == 'Pat'].Score,
                data[data['Archer'] == 'Jack'].Score,
                data[data['Archer'] == 'Alex'].Score)

print('One-way ANOVA')
print('=============')

print('F value:', f)
print('P value:', p, '\n')

One-way ANOVA
F value: 4.999999999999998
P value: 0.021683749320078414 



As 0.02≤0.05 we reject the null hypothesis and we conclude that at least one of the means is different from at least one other population mean (i.e. not all archers perform equally).
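
The F value reported above can be reproduced by hand, which shows how ANOVA uses variances to compare means. This is a sketch of the standard one-way ANOVA computation using the same archery scores; the variable names are my own.

```python
import numpy as np
from scipy.stats import f_oneway, f as f_dist

# Same scores as in the archery example above
groups = {
    'Pat':  np.array([5, 4, 4, 3, 9, 4]),
    'Jack': np.array([4, 8, 7, 5, 1, 5]),
    'Alex': np.array([9, 8, 8, 10, 5, 10]),
}
samples = list(groups.values())
all_scores = np.concatenate(samples)
grand_mean = all_scores.mean()

k = len(samples)          # number of groups
n_total = all_scores.size  # total number of observations

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
# Within-group sum of squares: spread of the scores around their own group mean
ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within
p_manual = f_dist.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = f_oneway(*samples)
print('manual: F=%.3f, p=%.4f' % (f_manual, p_manual))
print('scipy:  F=%.3f, p=%.4f' % (f_scipy, p_scipy))
```

A large F means the group means are spread out more than the within-group noise would explain, which is the evidence against equal means.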

### Tukey's range test
Tukey's range test is a common method for post hoc analysis after one-way ANOVA. It compares all possible pairs of group means and identifies the pairs whose difference is larger than would be expected by chance, while controlling the family-wise error rate (FWER).

In [8]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd, MultiComparison

mc = MultiComparison(data['Score'], data['Archer'])
result = mc.tukeyhsd()

print(result)
print(mc.groupsunique)

Multiple Comparison of Means - Tukey HSD, FWER=0.05 
group1 group2 meandiff p-adj   lower   upper  reject
----------------------------------------------------
  Alex   Jack  -3.3333 0.0435 -6.5755 -0.0911   True
  Alex    Pat     -3.5 0.0337 -6.7422 -0.2578   True
  Jack    Pat  -0.1667    0.9 -3.4089  3.0755  False
----------------------------------------------------
['Alex' 'Jack' 'Pat']


The results above reveal that Alex (group 0) significantly differs from the other two archers. The `reject` column tells us that there is significant evidence to reject the null hypothesis for the pairs Alex-Jack (0-1) and Alex-Pat (0-2), but not for Jack-Pat (1-2).

### Two-way ANOVA using anova_lm() from statsmodels

A botanist wants to know whether or not plant growth is influenced by sunlight exposure and watering frequency. She plants 30 seeds and lets them grow for two months under different conditions for sunlight exposure and watering frequency. After two months, she records the height of each plant, in inches.

In [16]:
import pandas as pd
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8]})

df.head()

|   | water | sun | height |
|---|-------|-----|--------|
| 0 | daily | low | 6 |
| 1 | daily | low | 6 |
| 2 | daily | low | 6 |
| 3 | daily | low | 5 |
| 4 | daily | low | 6 |


In [17]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

#perform two-way ANOVA
model = ols('height ~ C(water) + C(sun) + C(water):C(sun)', data=df).fit()
sm.stats.anova_lm(model, typ=2)

|   | sum_sq | df | F | PR(>F) |
|---|--------|----|---|--------|
| C(water) | 8.533333 | 1.0 | 16.0 | 0.000527 |
| C(sun) | 24.866667 | 2.0 | 23.3125 | 2e-06 |
| C(water):C(sun) | 2.466667 | 2.0 | 2.3125 | 0.120667 |
| Residual | 12.8 | 24.0 |  |  |


Since the p-values for water and sun are both less than .05, both factors have a statistically significant effect on plant height. The p-value for the interaction term (0.120667) is greater than .05, so there is no significant evidence of an interaction between watering frequency and sunlight exposure. Although the ANOVA results tell us that watering frequency and sunlight exposure each have a statistically significant effect on plant height, we would need to perform post-hoc tests to determine exactly how different levels of water and sunlight affect plant height.
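
Before running formal post-hoc tests, a table of mean heights per combination of factors gives a quick descriptive view of the direction of the effects. The sketch below is only a summary of the same data with pandas, not a post-hoc test; the name `cell_means` is my own.

```python
import numpy as np
import pandas as pd

# Same plant data as in the two-way ANOVA example above
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8]})

# Mean height for every (water, sun) combination: rows are water, columns are sun
cell_means = df.groupby(['water', 'sun'])['height'].mean().unstack('sun')
print(cell_means)

# Marginal means for each factor on its own
print(df.groupby('water')['height'].mean())
print(df.groupby('sun')['height'].mean())
```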

## Chi-square Test (data from 2 categorical)

Pearson’s chi-squared test is a statistical hypothesis test for independence between categorical variables.

   * Pairs of categorical variables can be summarized using a contingency table.
   * The chi-squared test can compare an observed contingency table to an expected table and determine if the categorical variables are independent.
   
The Chi-Squared test is a statistical hypothesis test that assumes (the null hypothesis) that the observed frequencies for a categorical variable match the expected frequencies for the categorical variable. It first calculates the expected frequencies for the groups, then determines whether the actual division of the groups, called the observed frequencies, matches the expected frequencies.

> When observed frequency is far from the expected frequency, the corresponding term in the sum is large; when the two are close, this term is small. Large values of X^2 indicate that observed and expected frequencies are far apart. Small values of X^2 mean the opposite: observeds are close to expecteds. So X^2 does give a measure of the distance between observed and expected frequencies.

   * if p-value <= alpha: significant result, reject null hypothesis ($H_0$), dependent.
   * if p-value > alpha: not significant result, fail to reject null hypothesis ($H_0$), independent.
   
degrees of freedom = (rows - 1) * (cols - 1)
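
The expected frequencies, the statistic, and the degrees of freedom described above can be computed by hand and checked against `chi2_contingency`. This is a minimal sketch using a 2 x 3 table of counts; note that `chi2_contingency` applies Yates' continuity correction by default only when there is a single degree of freedom (a 2 x 2 table), so the plain formula matches here.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Observed counts: rows are one categorical variable, columns the other
observed = np.array([[10, 20, 30],
                     [6, 9, 17]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
total = observed.sum()

# Expected count under independence: row total * column total / grand total
expected_manual = row_totals * col_totals / total

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
stat_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Degrees of freedom: (rows - 1) * (cols - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2.sf(stat_manual, dof_manual)

stat, p, dof, expected = chi2_contingency(observed)
print('manual: stat=%.3f, dof=%d, p=%.3f' % (stat_manual, dof_manual, p_manual))
print('scipy:  stat=%.3f, dof=%d, p=%.3f' % (stat, dof, p))
```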

[source](https://machinelearningmastery.com/chi-squared-test-for-machine-learning/)<br>
*Statistics in Plain English*, 3rd edition, page 164 (quick glance at results)

In [9]:
# assume the rows represent sex (male, female) and the columns represent subjects (psychology, english, biology)
table = [[10,20,30],
         [6,9,17]]
print(table)

[[10, 20, 30], [6, 9, 17]]


In [10]:
from scipy.stats import chi2_contingency, chi2

stat, p, dof, expected = chi2_contingency(table)

# interpret test-statistic
prob = 0.95
critical = chi2.ppf(prob, dof)
print('probability=%.3f, critical=%.3f, stat=%.3f' % (prob, critical, stat))
if abs(stat) >= critical:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')

probability=0.950, critical=5.991, stat=0.272
Independent (fail to reject H0)


In [11]:
alpha = 1 - prob
print('significance=%.3f, p=%.3f' % (alpha, p))
if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')

significance=0.050, p=0.873
Independent (fail to reject H0)


In [12]:
# the expected frequency
print(expected)

[[10.43478261 18.91304348 30.65217391]
 [ 5.56521739 10.08695652 16.34782609]]


Failing to reject the null hypothesis means there is no significant evidence that gender and choice of major are related; the data are consistent with the two being independent.

**Different numbers for chi-square test**

In [13]:
# columns represent psychology, english, biology
# rows represent sex
table = [[35,50,15],
         [30,25,45]]

stat, p, dof, expected = chi2_contingency(table)

# interpret test-statistic
prob = 0.95
critical = chi2.ppf(prob, dof)
print('probability=%.3f, critical=%.3f, stat=%.3f' % (prob, critical, stat))
if abs(stat) >= critical:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')

probability=0.950, critical=5.991, stat=23.718
Dependent (reject H0)


In [14]:
alpha = 1 - prob
print('significance=%.3f, p=%.3f' % (alpha, p))
if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')

significance=0.050, p=0.000
Dependent (reject H0)


In [33]:
print(expected)

[[32.5 37.5 30. ]
 [32.5 37.5 30. ]]


In this example, we reject the null hypothesis and conclude that the choice of major depends on, or is contingent upon, gender.