# Frequency tables

$$
\chi^{2}=\sum \frac{(A-E)^{2}}{E}=\sum_{i=1}^{k} \frac{\left(A_{i}-E_{i}\right)^{2}}{E_{i}}=\sum_{i=1}^{k} \frac{\left(A_{i}-n p_{i}\right)^{2}}{n p_{i}} \quad(i=1,2,3, \ldots, k)
$$

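+ Here $A_i$ is the observed frequency at level $i$, $E_i$ is the expected frequency at level $i$, $n$ is the total count, and $p_i$ is the expected probability of level $i$, so that $E_i = n p_i$; $k$ is the number of cells. When $n$ is large, the $\chi^{2}$ statistic approximately follows a chi-square distribution with $k-1$ degrees of freedom, reduced further by the number of parameters estimated from the data when computing $E_i$.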
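As a small illustration of the formula, the sketch below computes the statistic from observed counts $A_i$ and expected probabilities $p_i$. The die-roll counts are made-up values and the helper name `chi_square_stat` is hypothetical; neither comes from the original notebook.

```python
import numpy as np
from scipy import stats

def chi_square_stat(observed, probs):
    """Pearson chi-square statistic: sum((A_i - n*p_i)^2 / (n*p_i))."""
    observed = np.asarray(observed, dtype=float)
    probs = np.asarray(probs, dtype=float)
    expected = observed.sum() * probs        # E_i = n * p_i
    return np.sum((observed - expected) ** 2 / expected)

# Hypothetical example: 60 rolls of a die, tested against a fair die
observed = [8, 12, 9, 11, 10, 10]
probs = np.full(6, 1 / 6)
stat = chi_square_stat(observed, probs)
dof = len(observed) - 1                      # k - 1, no parameters estimated
p_value = stats.chi2.sf(stat, dof)           # upper-tail probability
print(stat, dof, p_value)
```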

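## One-way chi-square test

+ The chi-square test is a widely used hypothesis test for count data. It is a nonparametric test, used mainly to compare two or more sample rates (or proportions) and to analyse the association between two categorical variables. The underlying idea is to measure how well the observed frequencies agree with the expected (theoretical) frequencies, that is, goodness of fit.

+ Its applications in inference on categorical data include: chi-square tests comparing two rates or two proportions; chi-square tests comparing several rates or proportions; and association analysis of categorical data.
+ Null hypothesis $H_0$: the observed distribution equals the expected distribution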

In [33]:
import numpy as np
import scipy.stats as st

data = np.array([10, 6, 5, 4, 5, 3])
np.mean(data)

5.5

In [34]:
st.chisquare(data)

Power_divergenceResult(statistic=5.363636363636364, pvalue=0.37313038594870584)

In [40]:
# same statistic by hand: with equal expected frequencies, every E_i equals the mean
tval = np.sum((data - np.mean(data)) ** 2) / np.mean(data)
tval

5.363636363636363
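The p-value reported by `st.chisquare` can be recovered from the hand-computed statistic. The sketch below assumes the function's default of equal expected frequencies and $k-1$ degrees of freedom.

```python
import numpy as np
from scipy import stats

data = np.array([10, 6, 5, 4, 5, 3])
expected = np.full(len(data), data.mean())   # equal expected counts
stat = np.sum((data - expected) ** 2 / expected)
dof = len(data) - 1                          # k - 1 = 5
p_value = stats.chi2.sf(stat, dof)           # P(chi-square with dof df >= stat)
print(stat, p_value)
```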

## Chi-square test for contingency tables

In [47]:
data = np.array([[43, 9],
            [44, 4]])
data

array([[43,  9],
       [44,  4]])

In [63]:
st.chi2_contingency(data, correction=True)   # with Yates' correction; returns chi2, p, dof, expected
st.chi2_contingency(data, correction=False)  # without correction; returns chi2, p, dof, expected

(1.0724852071005921, 0.300384770390566, 1, array([[45.24,  6.76],
        [41.76,  6.24]]))

(1.7774150400145103, 0.1824670652605479, 1, array([[45.24,  6.76],
        [41.76,  6.24]]))
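The expected counts and both statistics can be reproduced from the table margins. This is a minimal sketch of the textbook formulas ($E_{ij}$ = row total × column total / $n$, and Yates' correction subtracting 0.5 from each $|A - E|$), not a description of SciPy's internals.

```python
import numpy as np
from scipy import stats

data = np.array([[43, 9],
                 [44, 4]])
row_totals = data.sum(axis=1, keepdims=True)
col_totals = data.sum(axis=0, keepdims=True)
n = data.sum()
expected = row_totals * col_totals / n       # E_ij = row_i * col_j / n

chi2_plain = np.sum((data - expected) ** 2 / expected)
# Yates' correction shrinks each |A - E| by 0.5 before squaring
chi2_yates = np.sum((np.abs(data - expected) - 0.5) ** 2 / expected)
dof = (data.shape[0] - 1) * (data.shape[1] - 1)
print(expected)
print(chi2_plain, stats.chi2.sf(chi2_plain, dof))
print(chi2_yates, stats.chi2.sf(chi2_yates, dof))
```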

## Fisher's exact test

In [64]:
obs = np.array([[1, 5],
               [8, 2]])
obs

array([[1, 5],
       [8, 2]])

In [66]:
fisher_result = st.fisher_exact(obs)
fisher_result

(0.05, 0.03496503496503495)

In [68]:
print('\nFISHER --------------------------------------------------------')
print(('The probability of obtaining a distribution at least as extreme '
+ 'as the one that was actually observed, assuming that the null ' +
'hypothesis is true, is: {0:5.3f}.'.format(fisher_result[1])))


FISHER --------------------------------------------------------
The probability of obtaining a distribution at least as extreme as the one that was actually observed, assuming that the null hypothesis is true, is: 0.035.
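The two-sided p-value can also be reproduced from the hypergeometric distribution: with the margins held fixed, the top-left cell is hypergeometric. The sketch below sums the probabilities of all tables that are no more probable than the observed one. The small relative tolerance is an assumption added to guard against floating-point ties.

```python
import numpy as np
from scipy import stats

obs = np.array([[1, 5],
                [8, 2]])
total = obs.sum()                 # 16 observations
row1 = obs[0].sum()               # first-row total: 6
col1 = obs[:, 0].sum()            # first-column total: 9

# With fixed margins, the top-left cell follows a hypergeometric distribution
x = np.arange(0, row1 + 1)
probs = stats.hypergeom.pmf(x, total, col1, row1)
p_obs = stats.hypergeom.pmf(obs[0, 0], total, col1, row1)

# Two-sided p-value: add up every table no more probable than the observed one
p_two_sided = probs[probs <= p_obs * (1 + 1e-7)].sum()
print(p_two_sided)
```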


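## McNemar's test (paired chi-square test)
+ Null hypothesis: the treatments are equivalent, i.e. the proportion with the outcome is the same before and after

||After: present|After: absent|Total|
|:----|:----:|:----:|:----:|
|Before: present| 101  | 121  | 222  |
|Before: absent| 59  | 33  | 92  |
|Total| 160  | 154  | 314  |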

In [88]:
data = np.array([[101, 121], [59, 33]])
data

array([[101, 121],
       [ 59,  33]])

In [89]:
from statsmodels.sandbox.stats.runs import mcnemar
# exact test by default: the statistic is the smaller discordant count, not a chi-square value
stat, p = mcnemar(data)
stat, p

(59, 4.43444926375551e-06)
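McNemar's test uses only the discordant pairs (121 and 59 in the table). The sketch below reproduces the exact binomial result by hand. It also computes the large-sample chi-square version with continuity correction, as an illustration of the textbook formula rather than of the library's internals.

```python
import numpy as np
from scipy import stats

data = np.array([[101, 121],
                 [59, 33]])
b, c = data[0, 1], data[1, 0]      # discordant pairs: 121 and 59

# Exact test: under H0 each discordant pair falls in either cell with probability 0.5
stat_exact = min(b, c)
p_exact = min(1.0, 2 * stats.binom.cdf(stat_exact, b + c, 0.5))

# Large-sample chi-square version with continuity correction, 1 degree of freedom
chi2_stat = (abs(b - c) - 1) ** 2 / (b + c)
p_chi2 = stats.chi2.sf(chi2_stat, 1)
print(stat_exact, p_exact)
print(chi2_stat, p_chi2)
```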

## Cochran's Q test
+ Null hypothesis: there is no difference between the variables

12 subjects succeed (1) or fail (0) on each of 3 tasks.

|Subject|Task 1|Task 2|Task 3|
|:----|:----:|:----:|:----:|
|1|0|1|0|
|2|1|1|0|
|3|1|1|1|
|4|0|0|0|
|5|1|0|0|
|6|0|1|1|
|7|0|0|0|
|8|1|1|0|
|9|0|1|0|
|10|0|1|0|
|11|0|1|0|
|12|0|1|0|

In [90]:
# one row per task, one column per subject
tasks = np.array([[0,1,1,0,1,0,0,1,0,0,0,0],
          [1,1,1,0,0,1,0,1,1,1,1,1],
          [0,0,1,0,0,1,0,0,0,0,0,0]])

In [91]:
from statsmodels.sandbox.stats.runs import cochrans_q
# cochrans_q expects subjects in rows and treatments in columns, hence the transpose
q_stat, p = cochrans_q(tasks.T)
round(q_stat, 4), round(p, 4)

(8.6667, 0.0131)
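Cochran's Q can also be computed directly from the table, laid out with the 12 subjects as rows and the 3 tasks as columns, so that the reference distribution has $k-1 = 2$ degrees of freedom. This is a sketch of the standard formula, not of the statsmodels implementation.

```python
import numpy as np
from scipy import stats

# Rows are the 12 subjects, columns are the 3 tasks (same data as the table)
x = np.array([[0, 1, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0],
              [1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 1, 0],
              [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]])
n_subjects, k = x.shape
col_totals = x.sum(axis=0)         # successes per task
row_totals = x.sum(axis=1)         # successes per subject
total = x.sum()

# Q = (k-1) * (k * sum(C_j^2) - N^2) / (k * N - sum(R_i^2))
q_stat = (k - 1) * (k * np.sum(col_totals ** 2) - total ** 2) \
         / (k * total - np.sum(row_totals ** 2))
p_value = stats.chi2.sf(q_stat, k - 1)   # k - 1 = 2 degrees of freedom
print(q_stat, p_value)
```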

# question

## Tea-tasting experiment

$H_0$: the taster cannot distinguish between the different ways of preparing the milk tea

In [92]:
data = np.array([[3, 1], [1, 3]])
data

array([[3, 1],
       [1, 3]])

In [94]:
st.fisher_exact(data, alternative='greater')

(9.0, 0.24285714285714263)
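The one-sided p-value is a hypergeometric tail probability. The sketch below reads the table as 8 cups, 4 prepared each way, with the taster picking 4 and getting 3 right. That reading is an assumption inferred from the table's margins.

```python
from scipy import stats

# 8 cups, 4 prepared each way; the taster picks 4 and gets 3 right
n_cups, n_true, n_picked, n_correct = 8, 4, 4, 3

# One-sided p-value: probability of 3 or more correct picks by pure guessing
p_greater = stats.hypergeom.sf(n_correct - 1, n_cups, n_true, n_picked)
print(p_greater)
```

By direct counting this is $(\binom{4}{3}\binom{4}{1} + \binom{4}{4}\binom{4}{0}) / \binom{8}{4} = 17/70$.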

## Chi-square contingency table test (1 degree of freedom)

In [106]:
data = np.array([[36, 14], [30, 25]])
data
st.chi2_contingency(data, correction=True)  # Yates' correction

array([[36, 14],
       [30, 25]])

(2.710942466624286, 0.09966209595851808, 1, array([[31.42857143, 18.57142857],
        [34.57142857, 20.42857143]]))

In [107]:
data = np.array([[36, 14], [29, 26]])
data
st.chi2_contingency(data, correction=True)  # Yates' correction
st.chi2_contingency(data, correction=False)

array([[36, 14],
       [29, 26]])

(3.3483435314685317, 0.06727267795922476, 1, array([[30.95238095, 19.04761905],
        [34.04761905, 20.95238095]]))

(4.125104895104895, 0.04225140122445083, 1, array([[30.95238095, 19.04761905],
        [34.04761905, 20.95238095]]))

## One-way chi-square test (more than 1 degree of freedom)

In [108]:
data = np.array([4, 6, 14, 10, 16])

In [109]:
st.chisquare(data)

Power_divergenceResult(statistic=10.4, pvalue=0.03420269940871678)
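An equivalent way to read this result is to compare the statistic with a critical value. The sketch below assumes a 5% significance level, which the notebook does not state.

```python
import numpy as np
from scipy import stats

data = np.array([4, 6, 14, 10, 16])
dof = len(data) - 1                              # 4 degrees of freedom
stat = np.sum((data - data.mean()) ** 2 / data.mean())
critical = stats.chi2.ppf(0.95, dof)             # 5% critical value (assumed level)
print(stat, critical, stat > critical)
```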

## McNemar's test

In [110]:
data = np.array([[19, 1], [6, 14]])
data

array([[19,  1],
       [ 6, 14]])

In [111]:
from statsmodels.sandbox.stats.runs import mcnemar

In [112]:
mcnemar(data)

(1, 0.125)

In [114]:
data = np.array([[20, 0], [6, 14]])
data
mcnemar(data)

array([[20,  0],
       [ 6, 14]])

(0, 0.03125)