# Th·ª±c hi·ªán 9 lo·∫°i ki·ªÉm ƒë·ªãnh cho b·∫£ng ch√©o

**BS. L√™ Ng·ªçc Kh·∫£ Nhi**

Trong b√†i n√†y, Nhi t·∫°o ra 1 c√¥ng c·ª• ph√¢n t√≠ch b·∫£ng ch√©o, cho ph√©p th·ª±c hi·ªán ƒë·ªìng th·ªùi 6 lo·∫°i ki·ªÉm ƒë·ªãnh theo ph∆∞∆°ng ph√°p ph·ªï qu√°t l√† Power divergence statistic v√† goodness of fit test, do 2 t√°c gi·∫£ Noel Cressie v√† Timothy R. C. Read l·∫≠p ra v√†o nƒÉm 1984.

Ph∆∞∆°ng ph√°p Power divergence c·ªßa Cressie v√† Read c√≥ tham s·ªë $\lambda$, g·ªìm c√°c gi√° tr·ªã c√≥ th·ªÉ:

+ $\lambda$ = 1 t∆∞∆°ng ƒë∆∞∆°ng v·ªõi ki·ªÉm ƒë·ªãnh Chi squared c·ªï ƒëi·ªÉn c√πa Pearson

+ $\lambda$ = 0 t∆∞∆°ng ƒë∆∞∆°ng v·ªõi G-test (log-likelihood ratio test)

+ $\lambda$ = -0.5 t∆∞∆°ng ƒë∆∞∆°ng v·ªõi tr·ªã s·ªë th·ªëng k√™ c·ªßa Freeman v√† Tukey

+ $\lambda$ = -1 t∆∞∆°ng ƒë∆∞∆°ng v·ªõi ki·ªÉm ƒë·ªãnh log-likelihood ratio hi·ªáu ch·ªânh

+ $\lambda$ = -2 t∆∞∆°ng ƒë∆∞∆°ng v·ªõi tr·ªã s·ªë th·ªëng k√™ c·ªßa Neyman

+ $\lambda$ = 2/3 l√† gi√° tr·ªã ƒë·ªÅ xu·∫•t b·ªüi Cressie v√† Read nƒÉm 1984

Ngo√†i ra, n·∫øu l√† b·∫£ng ch√©o 2x2, ta c√≥ th·ªÉ th·ª±c hi·ªán 3 lo·∫°i ki·ªÉm ƒë·ªãnh ch√≠nh x√°c (Exact tests) bao g·ªìm :

+ Ki·ªÉm ƒë·ªãnh Fisher (Fisher exact test, nƒÉm 1922)

+ Ki·ªÉm ƒë·ªãnh Boschloo (1970) 

+ Ki·ªÉm ƒë·ªãnh Barnard (1947)

In [1]:
warning_status = "ignore"
import warnings
warnings.filterwarnings(warning_status)
with warnings.catch_warnings():
    warnings.filterwarnings(warning_status, category=DeprecationWarning)

import numpy as np
import pandas as pd

from scipy.stats.contingency import expected_freq
from scipy.stats import power_divergence, boschloo_exact, fisher_exact, barnard_exact

from dataclasses import dataclass, field

def check_variable(data: pd.DataFrame, x: str, y:str) -> bool:

    return all(i in data.columns for i in [x, y])

@dataclass
class Contingency_test:

    data: pd.DataFrame = field(init=True)
    x: str = field(init=True)
    y: str = field(init=True)

    def __post_init__(self):
        if not check_variable(self.data, self.x, self.y):
            raise ValueError("x or y not in data columns")

        self.observed = pd.crosstab(self.data[self.x], self.data[self.y])

        self.expected = pd.DataFrame(
            expected_freq(self.observed),
            index=self.observed.index,
            columns=self.observed.columns,
        )

        for xtab, name in zip([self.observed, self.expected], ["Quan s√°t", "Gi·∫£ ƒë·ªãnh"]):

            if (xtab < 5).any(axis=None):
                print(f"L∆∞u √Ω: c√≥ gi√° tr·ªã nh·ªè h∆°n 5 trong b·∫£ng {name}")

        print("B·∫£ng ph√¢n b·ªë t·∫ßn su·∫•t")
        print("-" * 20)
        print(self.observed)

    def power_divergence(self, correction: bool = False) -> pd.DataFrame:
        """
        Cressie-Read power divergence statistic and goodness of fit test.
        """
        dof = float(
            self.expected.size - sum(self.expected.shape) + self.expected.ndim - 1
        )

        n = self.observed.values.sum()

        if dof == 1 and correction:
            self.observed = self.observed + 0.5 * np.sign(self.expected - self.observed)

        ddof = self.observed.size - 1 - dof

        methods = [
            "Pearson Chi2",
            "Cressie-Read",
            "G-test",
            "Freeman-Tukey",
            "mod-log-LR",
            "Neyman",
        ]

        stats = []

        for met, lambda_ in zip(methods, [1.0, 2 / 3, 0.0, -1 / 2, -1.0, -2.0]):

            if dof == 0:
                chi2, p, cramer = 0.0, 1.0, "NA"
            else:
                chi2, p = power_divergence(
                    self.observed, self.expected, ddof=ddof, axis=None, lambda_=lambda_
                )

                cramer = np.sqrt(chi2 / (n * (min(self.expected.shape) - 1)))

            stats.append(
                {
                    "Ph∆∞∆°ng ph√°p": met,
                    "df": dof,
                    "\u03C7" + "\u00b2": chi2,
                    "Cramer's V": cramer,
                    "Gi√° tr·ªã p": p,
                    "Ph·ªß ƒë·ªãnh H0": "C√≥" if p < 0.05 else "Kh√¥ng",
                }
            )

        out = pd.DataFrame(stats)

        return out.style.hide_index()

    def exact_test(self) -> pd.DataFrame:
        """
        Exact tests for 2x2 cross-table
        """
        # verify that the table is 2x2
        if self.observed.shape != (2, 2):
            raise ValueError("Ch·ªâ √°p d·ª•ng cho b·∫£ng ch√©o 2x2")

        methods = ["Boschloo", "Fisher", "Barnard"]
        stats = []

        for i, met in enumerate(methods):

            if i == 0:
                res = boschloo_exact(self.observed)
                stat, p = res.statistic, res.pvalue
            elif i == 1:
                stat, p = fisher_exact(self.observed)
            elif i == 2:
                res = barnard_exact(self.observed)
                stat, p = res.statistic, res.pvalue

            stats.append(
                {
                    "Ph∆∞∆°ng ph√°p": met,
                    "Tr·ªã s·ªë tk": stat,
                    "Gi√° tr·ªã p": p,
                    "Ph·ªß ƒë·ªãnh H0": "C√≥" if p < 0.05 else "Kh√¥ng",
                }
            )

        out = pd.DataFrame(stats)

        return out.style.hide_index()

# Th√≠ d·ª• minh h·ªça:  B·ªô d·ªØ li·ªáu Sinh non v√† Covid-19

In [2]:
df = pd.read_excel('Covid_pretermbirth.xlsx')

df

Unnamed: 0,ID,Tu·ªïi,COVID,Ti·ªÅn s·ª≠ SN,Ti·ªÅn s·ª≠ PT,Tu·ªïi thai,Sinh non,ƒêTƒêTK
0,1,30-35,1,0,0,260,0,0
1,2,30-35,1,0,0,274,0,0
2,3,30-35,1,1,0,268,0,0
3,4,30-35,1,0,0,277,0,1
4,5,30-35,1,0,0,277,0,0
...,...,...,...,...,...,...,...,...
224,225,35-40,0,0,0,244,1,0
225,226,35-40,0,0,0,240,1,0
226,227,30-35,0,0,0,253,1,0
227,228,30-35,0,0,0,251,1,0


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 229 entries, 0 to 228
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   ID          229 non-null    int64 
 1   Tu·ªïi        229 non-null    object
 2   COVID       229 non-null    int64 
 3   Ti·ªÅn s·ª≠ SN  229 non-null    int64 
 4   Ti·ªÅn s·ª≠ PT  229 non-null    int64 
 5   Tu·ªïi thai   229 non-null    int64 
 6   Sinh non    229 non-null    int64 
 7   ƒêTƒêTK       229 non-null    int64 
dtypes: int64(7), object(1)
memory usage: 14.4+ KB


In [4]:
df.columns

Index(['ID', 'Tu·ªïi', 'COVID', 'Ti·ªÅn s·ª≠ SN', 'Ti·ªÅn s·ª≠ PT', 'Tu·ªïi thai',
       'Sinh non', 'ƒêTƒêTK'],
      dtype='object')

In [5]:
test = Contingency_test(df, 'COVID','Sinh non')

B·∫£ng ph√¢n b·ªë t·∫ßn su·∫•t
--------------------
Sinh non   0   1
COVID           
0         95  81
1         32  21


In [6]:
test.power_divergence(correction = True)

Ph∆∞∆°ng ph√°p,df,œá¬≤,Cramer's V,Gi√° tr·ªã p,Ph·ªß ƒë·ªãnh H0
Pearson Chi2,1.0,0.441202,0.043894,0.506543,Kh√¥ng
Cressie-Read,1.0,0.441892,0.043928,0.506211,Kh√¥ng
G-test,1.0,0.443406,0.044003,0.505483,Kh√¥ng
Freeman-Tukey,1.0,0.444661,0.044065,0.504881,Kh√¥ng
mod-log-LR,1.0,0.446018,0.044133,0.504232,Kh√¥ng
Neyman,1.0,0.44905,0.044282,0.502786,Kh√¥ng


In [7]:
test.exact_test()

Ph∆∞∆°ng ph√°p,Tr·ªã s·ªë tk,Gi√° tr·ªã p,Ph·ªß ƒë·ªãnh H0
Boschloo,0.302446,0.516395,Kh√¥ng
Fisher,0.804435,0.528414,Kh√¥ng
Barnard,-0.679029,0.609462,Kh√¥ng


In [8]:
from IPython.display import display

for k in ['COVID', 'Ti·ªÅn s·ª≠ SN', 'Ti·ªÅn s·ª≠ PT', 'ƒêTƒêTK']:
    
    test = Contingency_test(df, k,'Sinh non')
    out1 = test.power_divergence()
    display(out1)
    out2 = test.exact_test()
    display(out2)
    print('-'*50)

B·∫£ng ph√¢n b·ªë t·∫ßn su·∫•t
--------------------
Sinh non   0   1
COVID           
0         95  81
1         32  21


Ph∆∞∆°ng ph√°p,df,œá¬≤,Cramer's V,Gi√° tr·ªã p,Ph·ªß ƒë·ªãnh H0
Pearson Chi2,1.0,0.675448,0.05431,0.411159,Kh√¥ng
Cressie-Read,1.0,0.676795,0.054364,0.410693,Kh√¥ng
G-test,1.0,0.679807,0.054485,0.409653,Kh√¥ng
Freeman-Tukey,1.0,0.682349,0.054587,0.408779,Kh√¥ng
mod-log-LR,1.0,0.685137,0.054698,0.407824,Kh√¥ng
Neyman,1.0,0.691471,0.05495,0.405665,Kh√¥ng


Ph∆∞∆°ng ph√°p,Tr·ªã s·ªë tk,Gi√° tr·ªã p,Ph·ªß ƒë·ªãnh H0
Boschloo,0.253958,0.434415,Kh√¥ng
Fisher,0.769676,0.434737,Kh√¥ng
Barnard,-0.821856,0.420205,Kh√¥ng


--------------------------------------------------
L∆∞u √Ω: c√≥ gi√° tr·ªã nh·ªè h∆°n 5 trong b·∫£ng Quan s√°t
L∆∞u √Ω: c√≥ gi√° tr·ªã nh·ªè h∆°n 5 trong b·∫£ng Gi·∫£ ƒë·ªãnh
B·∫£ng ph√¢n b·ªë t·∫ßn su·∫•t
--------------------
Sinh non      0   1
Ti·ªÅn s·ª≠ SN         
0           125  95
1             2   7


Ph∆∞∆°ng ph√°p,df,œá¬≤,Cramer's V,Gi√° tr·ªã p,Ph·ªß ƒë·ªãnh H0
Pearson Chi2,1.0,4.189359,0.135256,0.040678,C√≥
Cressie-Read,1.0,4.192511,0.135307,0.040603,C√≥
G-test,1.0,4.310916,0.137204,0.037869,C√≥
Freeman-Tukey,1.0,4.510856,0.14035,0.03368,C√≥
mod-log-LR,1.0,4.825607,0.145164,0.02804,C√≥
Neyman,1.0,5.917844,0.160755,0.014988,C√≥


Ph∆∞∆°ng ph√°p,Tr·ªã s·ªë tk,Gi√° tr·ªã p,Ph·ªß ƒë·ªãnh H0
Boschloo,0.043706,0.066852,Kh√¥ng
Fisher,4.605263,0.081649,Kh√¥ng
Barnard,2.046792,0.041546,C√≥


--------------------------------------------------
B·∫£ng ph√¢n b·ªë t·∫ßn su·∫•t
--------------------
Sinh non     0   1
Ti·ªÅn s·ª≠ PT        
0           94  83
1           33  19


Ph∆∞∆°ng ph√°p,df,œá¬≤,Cramer's V,Gi√° tr·ªã p,Ph·ªß ƒë·ªãnh H0
Pearson Chi2,1.0,1.744379,0.087278,0.186585,Kh√¥ng
Cressie-Read,1.0,1.750666,0.087435,0.185793,Kh√¥ng
G-test,1.0,1.765492,0.087804,0.183941,Kh√¥ng
Freeman-Tukey,1.0,1.778644,0.088131,0.182316,Kh√¥ng
mod-log-LR,1.0,1.793604,0.0885,0.180488,Kh√¥ng
Neyman,1.0,1.829218,0.089375,0.17622,Kh√¥ng


Ph∆∞∆°ng ph√°p,Tr·ªã s·ªë tk,Gi√° tr·ªã p,Ph·ªß ƒë·ªãnh H0
Boschloo,0.122303,0.199034,Kh√¥ng
Fisher,0.652063,0.206827,Kh√¥ng
Barnard,-1.320749,0.202961,Kh√¥ng


--------------------------------------------------
B·∫£ng ph√¢n b·ªë t·∫ßn su·∫•t
--------------------
Sinh non    0   1
ƒêTƒêTK            
0         115  85
1          12  17


Ph∆∞∆°ng ph√°p,df,œá¬≤,Cramer's V,Gi√° tr·ªã p,Ph·ªß ƒë·ªãnh H0
Pearson Chi2,1.0,2.664568,0.107869,0.102606,Kh√¥ng
Cressie-Read,1.0,2.655831,0.107692,0.103171,Kh√¥ng
G-test,1.0,2.648715,0.107547,0.103634,Kh√¥ng
Freeman-Tukey,1.0,2.652425,0.107623,0.103392,Kh√¥ng
mod-log-LR,1.0,2.663943,0.107856,0.102646,Kh√¥ng
Neyman,1.0,2.710933,0.108803,0.099663,Kh√¥ng


Ph∆∞∆°ng ph√°p,Tr·ªã s·ªë tk,Gi√° tr·ªã p,Ph·ªß ƒë·ªãnh H0
Boschloo,0.07636,0.122885,Kh√¥ng
Fisher,1.916667,0.113344,Kh√¥ng
Barnard,1.63235,0.109453,Kh√¥ng


--------------------------------------------------


# T√†i li·ªáu tham kh·∫£o

Noel Cressie and Timothy R. C. Read. Multinomial goodness‚Äêof‚Äêfit tests. Journal of the Royal Statistical Society: Series B (Methodological), 1984; 46(3), 440-464.

Freeman, M.F. and J.W. Tukey (1950), Transformations related to the angular and the square root, Ann. Math. Statist. 21, 607-611.

Campbell B. Read. Freeman--Tukey chi-squared goodness-of-fit statistics. Statistics & Probability Letters, 1993, vol. 18, issue 4, 271-278.

J. Neyman and E. S. Pearson, On the use and interpretation of certain test criteria for purposes of statistical inference: Part i, Biometrika 20A (1928) 175‚Äì240.

Xiangpan Ji, Wenqiang Gu, Xin Qian, Hanyu Wei, Chao Zhang. Combined Neyman-Pearson Chi-square: An Improved Approximation to the Poisson-likelihood Chi-square. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 2020; 961:163677

Frank Yates. Contingency Tables Involving Small Numbers and the  ùúí2ùë°ùëíùë†ùë° . Supplement to the Journal of the Royal Statistical Society, 1934; 1: 217-235.


Fisher, R. A. (1922). "On the interpretation of œá2 from contingency tables, and the calculation of P". Journal of the Royal Statistical Society. 85 (1): 87‚Äì94. doi:10.2307/2340521. JSTOR 2340521.

R.D. Boschloo. ‚ÄúRaised conditional level of significance for the 2 x 2-table when testing the equality of two probabilities‚Äù, Statistica Neerlandica, 24(1), 1970

https://en.wikipedia.org/wiki/Fisher%27s_exact_test
https://en.wikipedia.org/wiki/Barnard%27s_test
https://en.wikipedia.org/wiki/Boschloo%27s_test

Barnard, G. A. ‚ÄúSignificance Tests for 2x2 Tables‚Äù. Biometrika. 34.1/2 (1947): 123-138. DOI:dpgkg3