# Two-Sample Tests of Proportion - Python

## Fisher's Exact Test

* **Samples:** `2`
* **Response Categories:** `>=2`
* **Exact?:** Yes, use with `N<=200`
* **Reporting:** "Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. Fisher’s exact test indicated a statistically significant association between X and Y (p < .0001)."

In [1]:
import pandas as pd

df = pd.read_csv("data/1F2LBs_multinomial.csv")
df.head(20)

Unnamed: 0,S,X,Y
0,1,a,y
1,2,b,x
2,3,a,x
3,4,b,y
4,5,a,y
5,6,b,x
6,7,a,y
7,8,b,x
8,9,a,y
9,10,b,z


In [2]:
pd.crosstab(df["X"], df["Y"])

X \ Y,x,y,z
a,3,26,1
b,14,9,7


In [5]:
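# Left commented out: in SciPy releases where fisher_exact accepts only 2x2
# tables, this call raises an error on the 2x3 crosstab above.
# from scipy.stats import fisher_exact
# fisher_exact(pd.crosstab(df["X"], df["Y"]))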
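
When `fisher_exact` cannot be applied to a 2x3 table, an exact p-value can still be obtained by enumeration. The following is a minimal sketch, not the notebook's original method: it assumes a two-row table, hardcodes the counts from the crosstab above, and uses a hypothetical helper `fisher_exact_2xk` that sums the probabilities of all tables with the same margins that are no more likely than the observed one (the Fisher-Freeman-Halton extension of Fisher's test).

```python
from itertools import product
from math import comb


def fisher_exact_2xk(table):
    """Exact two-sided p-value for a 2 x k table (hypothetical helper).

    Enumerates every table sharing the observed row and column totals and
    sums the probabilities of those that are no more likely than the
    observed table.
    """
    row_a, row_b = table
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    n_a = sum(row_a)
    denom = comb(sum(col_totals), n_a)

    def numerator(first_row):
        # Multivariate hypergeometric numerator, kept as an exact integer.
        num = 1
        for c, a in zip(col_totals, first_row):
            num *= comb(c, a)
        return num

    observed_num = numerator(row_a)
    tail = 0
    for first_row in product(*(range(c + 1) for c in col_totals)):
        if sum(first_row) != n_a:
            continue  # first-row total must match the observed margin
        num = numerator(first_row)
        if num <= observed_num:
            tail += num
    return tail / denom


# Counts copied from the crosstab above (rows a, b; columns x, y, z).
observed = [[3, 26, 1], [14, 9, 7]]
p_value = fisher_exact_2xk(observed)
print(p_value)
```

Full enumeration is only practical for small tables like this one; for larger tables a Monte Carlo approximation is the usual alternative.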

## G-Test

* **Samples:** `2`
* **Response Categories:** `>=2`
* **Exact?:** No, use with `N>200`
* **Reporting:** "Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A G-test indicated a statistically significant association between X and Y (G(2) = 21.40, p < .0001)."

In [6]:
import pandas as pd

df = pd.read_csv("data/1F2LBs_multinomial.csv")
df.head(20)

Unnamed: 0,S,X,Y
0,1,a,y
1,2,b,x
2,3,a,x
3,4,b,y
4,5,a,y
5,6,b,x
6,7,a,y
7,8,b,x
8,9,a,y
9,10,b,z


In [8]:
xt = pd.crosstab(df["X"], df["Y"])
xt

X \ Y,x,y,z
a,3,26,1
b,14,9,7


In [12]:
from scipy.stats import chi2_contingency

g_stat, p, dof, exp_freq = chi2_contingency(xt, lambda_="log-likelihood")
g_stat, p, dof

(21.402062415325055, 2.252170138338781e-05, 2)
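
To make the statistic less of a black box, the G value can be recomputed by hand as G = 2 Σ O ln(O / E), where E is the expected count under independence. This is a sketch using only the standard library, with the counts hardcoded from the crosstab above; the names `g_manual`, `dof_manual`, and `p_manual` are illustrative. The p-value shortcut relies on the fact that, for exactly 2 degrees of freedom, the chi-squared survival function is exp(-x / 2).

```python
from math import exp, log

# Counts copied from the crosstab above (rows a, b; columns x, y, z).
observed = [[3, 26, 1], [14, 9, 7]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# G = 2 * sum(O * ln(O / E)); empty cells contribute nothing.
g_manual = 2 * sum(
    o * log(o / e)
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
    if o > 0
)
dof_manual = (len(row_totals) - 1) * (len(col_totals) - 1)

# Valid only because dof_manual == 2 here.
p_manual = exp(-g_manual / 2)
print(g_manual, p_manual, dof_manual)
```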

## Two-Sample Pearson Chi-Squared Test

* **Samples:** `2`
* **Response Categories:** `>=2`
* **Exact?:** No, use with `N>200`
* **Reporting:** "Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A two-sample Pearson Chi-Squared test indicated a statistically significant association between X and Y (χ²(2, N = 60) = 19.87, p < .0001)."

In [10]:
import pandas as pd

df = pd.read_csv("data/1F2LBs_multinomial.csv")
df.head(20)

Unnamed: 0,S,X,Y
0,1,a,y
1,2,b,x
2,3,a,x
3,4,b,y
4,5,a,y
5,6,b,x
6,7,a,y
7,8,b,x
8,9,a,y
9,10,b,z


In [11]:
xt = pd.crosstab(df["X"], df["Y"])
xt

X \ Y,x,y,z
a,3,26,1
b,14,9,7


In [13]:
from scipy.stats import chi2_contingency

# With the default lambda_, chi2_contingency returns the Pearson chi-squared statistic.
chi2_stat, p, dof, exp_freq = chi2_contingency(xt)
chi2_stat, p, dof

(19.87478991596639, 4.8333050401877814e-05, 2)
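
As a cross-check on the output above, the Pearson statistic can be recomputed by hand as χ² = Σ (O − E)² / E. This is a sketch using only the standard library, with the counts hardcoded from the crosstab; the names `chi2_manual`, `dof_manual`, and `p_manual` are illustrative. As with any chi-squared variable on exactly 2 degrees of freedom, the p-value reduces to exp(-x / 2).

```python
from math import exp

# Counts copied from the crosstab above (rows a, b; columns x, y, z).
observed = [[3, 26, 1], [14, 9, 7]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-squared: sum of (O - E)^2 / E over all cells.
chi2_manual = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof_manual = (len(row_totals) - 1) * (len(col_totals) - 1)

# Valid only because dof_manual == 2 here.
p_manual = exp(-chi2_manual / 2)
print(n, round(chi2_manual, 2), p_manual, dof_manual)
```

The rounded statistic and the total N printed here are the values that belong in the reporting sentence.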