
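# <b>Parametric Tests</b>
Parametric tests assume that the data in each group are normally distributed and that the groups have equal variances (homogeneity of variance).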
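
Both assumptions can be checked before running a parametric test. The following is a minimal sketch, assuming hypothetical normally distributed samples and a fixed seed chosen only for reproducibility; it uses the Shapiro-Wilk test for normality and Levene's test for equal variances, which are common choices but not the only ones.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, used only so the sketch is reproducible

# Hypothetical samples: three groups with the same spread and different means
samples = [rng.normal(loc=m, scale=3, size=50) for m in (10, 11, 12)]

# Shapiro-Wilk test, H0: the sample comes from a normal distribution
normality_pvalues = []
for i, s in enumerate(samples, start=1):
    w_stat, p_norm = stats.shapiro(s)
    normality_pvalues.append(p_norm)
    print(f"group{i}: Shapiro-Wilk p = {p_norm:.3f}")

# Levene's test, H0: all groups have equal variance
lev_stat, p_var = stats.levene(*samples)
print(f"Levene p = {p_var:.3f}")
```

A small p-value in either test suggests that the corresponding assumption is doubtful, in which case a non-parametric test may be the safer option.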

##  ANOVA

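ANOVA: one-way analysis of variance.

H0: all groups come from the same distribution (equal means).   H1: the groups do not all come from the same distribution (at least one group mean differs).

If the p-value is greater than α, we fail to reject H0; if it is less than α, we reject H0.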
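
To make the decision rule concrete, the sketch below computes the one-way ANOVA F statistic by hand (between-group mean square divided by within-group mean square) and compares it with `stats.f_oneway`. The data, the seed, and the choice α = 0.05 are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed, for reproducibility only
groups = [rng.normal(loc=m, scale=3, size=50) for m in (10, 11, 12, 13)]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Sum of squares between groups and within groups
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)

alpha = 0.05  # assumed significance level
decision = "reject H0" if p_manual < alpha else "fail to reject H0"
print(f_manual, f_scipy, decision)
```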

In [7]:
import pandas as pd
import numpy as np
import scipy.stats as stats
from statsmodels.stats.multicomp import (pairwise_tukeyhsd,
                                         MultiComparison)

# Create four random groups of data with a mean difference of 1

mu, sigma = 10, 3 # mean and standard deviation
group1 = np.random.normal(mu, sigma, 50)

mu, sigma = 11, 3 # mean and standard deviation
group2 = np.random.normal(mu, sigma, 50)

mu, sigma = 12, 3 # mean and standard deviation
group3 = np.random.normal(mu, sigma, 50)

mu, sigma = 13, 3 # mean and standard deviation
group4 = np.random.normal(mu, sigma, 50)

# Show the results for Anova

F_statistic, pVal = stats.f_oneway(group1, group2, group3, group4)

print ('P value:')
print (pVal)

P value:
1.9305265713143341e-10


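The p-value is on the order of 10^-10, far below α, so we reject H0: at least one group's mean differs from the others. Next we use Tukey's test and the Holm-Bonferroni method to compare the groups pairwise.
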
##  Tukey’s  test

See https://en.wikipedia.org/wiki/Tukey’s_range_test

H0 ： group1_mean - group2_mean = 0      H1 : group1_mean - group2_mean != 0

If the p-value is greater than α, we fail to reject H0; if it is less than α, we reject H0.

In [8]:
df = pd.DataFrame()
df['treatment1'] = group1
df['treatment2'] = group2
df['treatment3'] = group3
df['treatment4'] = group4

# Stack the data (and rename columns):

stacked_data = df.stack().reset_index()
stacked_data = stacked_data.rename(columns={'level_0': 'id',
                                            'level_1': 'treatment',
                                            0:'result'})
# Show the first 8 rows:

print (stacked_data.head(8))

   id   treatment     result
0   0  treatment1  11.543098
1   0  treatment2  12.759410
2   0  treatment3  17.083981
3   0  treatment4  12.117404
4   1  treatment1   8.476924
5   1  treatment2  11.067406
6   1  treatment3  13.800480
7   1  treatment4  10.641495


In [10]:
# Set up the data for comparison (creates a specialised object)
MultiComp = MultiComparison(stacked_data['result'],
                            stacked_data['treatment'])

# Show all pair-wise comparisons:

# Print the comparisons

print(MultiComp.tukeyhsd().summary())

 Multiple Comparison of Means - Tukey HSD,FWER=0.05 
  group1     group2   meandiff  lower  upper  reject
----------------------------------------------------
treatment1 treatment2  1.6106   0.0842 3.137   True 
treatment1 treatment3  2.1447   0.6182 3.6711  True 
treatment1 treatment4  4.2982   2.7718 5.8246  True 
treatment2 treatment3  0.534   -0.9924 2.0605 False 
treatment2 treatment4  2.6876   1.1611 4.214   True 
treatment3 treatment4  2.1535   0.6271 3.6799  True 
----------------------------------------------------


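     meandiff: difference between the two group means
     lower: lower bound of the confidence interval for the difference
     upper: upper bound of the confidence interval for the difference
     reject: whether H0 is rejected, i.e. whether the confidence interval [lower, upper] excludes 0. True means H0 is rejected; False means we fail to reject H0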
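
The relationship between the confidence interval and the `reject` column can be reproduced directly from the printed table. This small sketch copies the `lower` and `upper` values from the output above and marks a pair as rejected when its interval does not contain 0; the variable names are illustrative.

```python
# (group1, group2, lower, upper) copied from the Tukey HSD table above
rows = [
    ("treatment1", "treatment2", 0.0842, 3.1370),
    ("treatment1", "treatment3", 0.6182, 3.6711),
    ("treatment1", "treatment4", 2.7718, 5.8246),
    ("treatment2", "treatment3", -0.9924, 2.0605),
    ("treatment2", "treatment4", 1.1611, 4.2140),
    ("treatment3", "treatment4", 0.6271, 3.6799),
]

rejects = []
for g1, g2, lower, upper in rows:
    # H0 (mean difference = 0) is rejected when 0 lies outside [lower, upper]
    reject = not (lower <= 0.0 <= upper)
    rejects.append(reject)
    print(f"{g1} vs {g2}: reject H0 = {reject}")
```

Only the treatment2 vs treatment3 interval straddles 0, which is why that pair alone has `reject = False`.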

## Holm-Bonferroni Method

See: https://en.wikipedia.org/wiki/Holm%E2%80%93Bonferroni_method

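The pairwise hypotheses are the same as in Tukey's test (H0: the two group means are equal). Instead of using a studentized-range statistic, the Holm-Bonferroni method runs an ordinary test on each pair and then adjusts the p-values step by step to control the family-wise error rate. Note that `stats.ttest_rel` is a paired t-test, which matches observations by row; for independently sampled groups such as the ones generated here, `stats.ttest_ind` would be the more natural choice.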

In [11]:
comp = MultiComp.allpairtest(stats.ttest_rel, method='Holm')
print (comp[0])

Test Multiple Comparison ttest_rel 
FWER=0.05 method=Holm
alphacSidak=0.01, alphacBonf=0.008
  group1     group2     stat   pval  pval_corr reject
-----------------------------------------------------
treatment1 treatment2 -2.5166 0.0152   0.0303   True 
treatment1 treatment3 -3.9663 0.0002   0.001    True 
treatment1 treatment4 -7.6241  0.0      0.0     True 
treatment2 treatment3 -0.8519 0.3984   0.3984  False 
treatment2 treatment4 -4.8897  0.0     0.0001   True 
treatment3 treatment4 -3.4572 0.0011   0.0034   True 
-----------------------------------------------------
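
The `pval_corr` column comes from Holm's step-down adjustment: sort the raw p-values in ascending order, multiply the i-th smallest by (m - i + 1), keep the adjusted values non-decreasing, and cap them at 1. The sketch below applies this to the rounded `pval` values printed above, so its results can differ from the table in the last digit. The helper name `holm_adjust` is hypothetical, not a statsmodels function.

```python
def holm_adjust(pvals):
    """Return Holm-adjusted p-values in the original order (illustrative helper)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[idx])
        running_max = max(running_max, candidate)     # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

# Raw p-values as printed (rounded) in the table above, in row order
raw = [0.0152, 0.0002, 0.0, 0.3984, 0.0, 0.0011]
adj = holm_adjust(raw)

alpha = 0.05
rejects = [p < alpha for p in adj]
print(adj)
print(rejects)
```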


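# <b>Non-parametric Tests</b>
Kruskal-Wallis: a rank-based test for data that are not normally distributed. The hypotheses are analogous to those of ANOVA, but because the test works on ranks it compares the groups' distributions (often summarized by their medians) rather than their means:
    
    H0: all groups come from the same distribution   H1: the groups do not all come from the same distribution (at least one group tends to have larger or smaller values)

If the p-value is greater than α, we fail to reject H0; if it is less than α, we reject H0.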
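
To show what the rank-based approach means in practice, this sketch computes the Kruskal-Wallis H statistic by hand from the pooled ranks and compares it with `stats.kruskal`. It assumes hypothetical continuous data with no ties (so no tie correction is needed) and a fixed seed for reproducibility.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # fixed seed, for reproducibility only
groups = [rng.normal(loc=m, scale=3, size=50) for m in (10, 11, 12, 13)]

# Rank all observations together, then sum the ranks within each group
pooled = np.concatenate(groups)
ranks = stats.rankdata(pooled)
n_total = len(pooled)

rank_term = 0.0
start = 0
for g in groups:
    group_ranks = ranks[start:start + len(g)]
    rank_term += group_ranks.sum() ** 2 / len(g)
    start += len(g)

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), valid when there are no ties
h_manual = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
p_manual = stats.chi2.sf(h_manual, df=len(groups) - 1)

h_scipy, p_scipy = stats.kruskal(*groups)
print(h_manual, h_scipy)
```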
    

In [14]:
h, p = stats.kruskal(group1, group2, group3, group4)

print ('P value of there being a significant difference:')
print (p)

P value of there being a significant difference:
8.938032905604823e-09
