## Summary

Perform a stratified analysis of the results of a stratified case-control study.

The data come from a study of the possible association between alcohol consumption and fatal car accidents in New York (McCarroll and Haddon, 1962).
The data were stratified by marital status, which was believed to be a possible confounder.
The exposure was a blood alcohol level of 100 mg% or greater.
Cases were drivers who were killed in car accidents for which they were considered to be responsible, and controls were selected drivers passing the locations where the accidents of the cases occurred, at the same time of day and on the same day of the week.

The results were as follows.

| Married             | cases (+)   | controls (-) |
| ------------------- | ----------: | -----------: |
| **exposed (+)**     | 4           | 5            |
| **not exposed (-)** | 5           | 103          |

| Not married         | cases (+)   | controls (-) |
| ------------------- | ----------: | -----------: |
| **exposed (+)**     | 10          | 3            |
| **not exposed (-)** | 5           | 43           |

The tables were initialised as two NumPy `ndarray`s, one for each stratum/level.
The analysis was performed using two classes from StatsModels: `StratifiedTable`[^1] and `Table2x2`.[^2]
The results were output as either a Pandas `Series` or a `DataFrame`, depending on the dimensionality of the result.
(This step is optional; it is done to provide a standardised output.)

These topics are covered in M249, Book 1, Part 2.

## Dependencies

In [1]:
import numpy as np
import pandas as pd
from statsmodels import api as sm

## Global constants

These are the results from the study.

In [2]:
MARRIED = np.array([[4, 5], [5, 103]])
NOT_MARRIED = np.array([[10, 3], [5, 43]])

## Main

### Initialise the contingency tables

In [3]:
ctables = sm.stats.StratifiedTable([MARRIED, NOT_MARRIED])
ctables.table

array([[[  4.,  10.],
        [  5.,   3.]],

       [[  5.,   5.],
        [103.,  43.]]])
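
The array above has shape `(2, 2, 2)`: the strata sit on the last axis, so `table[:, :, k]` is the 2×2 table for stratum `k`. The following is a minimal sketch, assuming only NumPy, that reproduces the same layout with `np.dstack`. It illustrates the layout and is not necessarily how StatsModels builds the array internally.

```python
import numpy as np

MARRIED = np.array([[4, 5], [5, 103]])
NOT_MARRIED = np.array([[10, 3], [5, 43]])

# Stack the two 2x2 tables along a new last axis: shape (2, 2, n_strata)
stacked = np.dstack([MARRIED, NOT_MARRIED]).astype(float)

# Slicing on the last axis recovers each stratum's table
married_again = stacked[:, :, 0]
not_married_again = stacked[:, :, 1]
```

Reading the printed array row by row, each innermost pair therefore holds the same cell from the married and the not-married tables.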

### Odds ratios

#### Stratum-specific odds ratios 

In [4]:
res = pd.DataFrame(index=['odds ratio', 'lcb', 'ucb'])
for level, arr in zip(['married', 'not married'], [MARRIED, NOT_MARRIED]):
    ctable = sm.stats.Table2x2(arr)
    # Compute the confidence interval once and unpack its bounds
    lcb, ucb = ctable.oddsratio_confint()
    res[level] = (ctable.oddsratio, lcb, ucb)
res.T

|                 | odds ratio | lcb      | ucb       |
| --------------- | ---------: | -------: | --------: |
| **married**     | 16.48      | 3.354211 | 80.969975 |
| **not married** | 28.666667  | 5.856619 | 140.31607 |
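
As a cross-check, the stratum-specific odds ratios and their intervals can be computed by hand. This is a minimal sketch, assuming the interval is the usual large-sample (Wald) interval on the log odds ratio. The helper name `odds_ratio_ci` is hypothetical and not part of StatsModels.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Return (odds ratio, lower, upper) for the table [[a, b], [c, d]].

    Hypothetical helper: Wald interval on the log odds ratio scale.
    """
    point = (a * d) / (b * c)
    # Large-sample standard error of the log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided standard normal quantile (about 1.96 for a 95% interval)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    log_or = math.log(point)
    return point, math.exp(log_or - z * se), math.exp(log_or + z * se)


married = odds_ratio_ci(4, 5, 5, 103)
not_married = odds_ratio_ci(10, 3, 5, 43)
```

For the married stratum the point estimate is (4 × 103) / (5 × 5) = 16.48.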


#### Crude odds ratio

In [5]:
ctable = sm.stats.Table2x2(MARRIED + NOT_MARRIED)
pd.Series(
    data={
        'point': ctable.oddsratio,
        'lcb': ctable.oddsratio_confint()[0],
        'ucb': ctable.oddsratio_confint()[1]
    },
    name='crude odds ratio'
)

point    25.550000
lcb       8.682174
ucb      75.188827
Name: crude odds ratio, dtype: float64

#### Adjusted odds ratio (Mantel–Haenszel odds ratio)

In [6]:
pd.Series(
    data={
        'point': ctables.oddsratio_pooled,
        'lcb': ctables.oddsratio_pooled_confint()[0],
        'ucb': ctables.oddsratio_pooled_confint()[1]
    },
    name='Mantel-Haenszel odds ratio'
)

point    23.000610
lcb       7.465154
ucb      70.866332
Name: Mantel-Haenszel odds ratio, dtype: float64
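
The point estimate can be reproduced by hand from the Mantel–Haenszel formula, which divides the sum of a·d/n over the strata by the sum of b·c/n, where n is the stratum total. The following is a minimal sketch in plain Python. The function name `mantel_haenszel_or` is hypothetical.

```python
def mantel_haenszel_or(tables):
    """Pooled odds ratio over a list of 2x2 tables [[a, b], [c, d]]."""
    num = 0.0
    den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d   # stratum total
        num += a * d / n    # weighted "concordant" product
        den += b * c / n    # weighted "discordant" product
    return num / den


strata = [
    [[4, 5], [5, 103]],   # married
    [[10, 3], [5, 43]],   # not married
]
or_mh = mantel_haenszel_or(strata)
```

The result lies between the two stratum-specific estimates, as expected of a weighted combination.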

### Test for no association

This is the Mantel–Haenszel test, here with a continuity correction (`correction=True`).

In [7]:
res = ctables.test_null_odds(correction=True)
pd.Series(
    data={'statistic': res.statistic, 'pval': res.pvalue},
    name='test for no association'
).round(5)

statistic    36.60431
pval          0.00000
Name: test for no association, dtype: float64
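
The test statistic can also be computed by hand. In each stratum, the observed count of exposed cases is compared with its expected value under no association, the differences are summed, and the squared sum is divided by the summed hypergeometric variances. This is a minimal sketch, assuming a chi-squared reference distribution with one degree of freedom. The function name `mantel_haenszel_test` is hypothetical.

```python
import math


def mantel_haenszel_test(tables, correction=True):
    """Return (statistic, p-value) for a list of 2x2 tables [[a, b], [c, d]]."""
    sum_a = sum_e = sum_v = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        r1, r2 = a + b, c + d   # row totals
        c1, c2 = a + c, b + d   # column totals
        sum_a += a
        sum_e += r1 * c1 / n                              # expected value of a
        sum_v += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))   # variance of a
    diff = abs(sum_a - sum_e)
    if correction:
        diff -= 0.5   # continuity correction
    statistic = diff ** 2 / sum_v
    # Upper tail of chi-squared(1): P(X > x) = erfc(sqrt(x / 2))
    pvalue = math.erfc(math.sqrt(statistic / 2))
    return statistic, pvalue


strata = [
    [[4, 5], [5, 103]],   # married
    [[10, 3], [5, 43]],   # not married
]
statistic, pvalue = mantel_haenszel_test(strata)
```

The p-value is far smaller than the five decimal places shown above, which is why it prints as 0.00000.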

### Test for homogeneity

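This is Tarone's test, that is, the Breslow–Day test of equal odds ratios across strata with Tarone's adjustment (`adjust=True`).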

In [8]:
res = ctables.test_equal_odds(adjust=True)
pd.Series(
    data={'statistic': res.statistic.round(5), 'pval': res.pvalue.round(5)},
    name='test for homogeneity'
)

statistic    0.23557
pval         0.62742
Name: test for homogeneity, dtype: float64

## References

McCarroll, J.R. and Haddon Jr, W., 1962. A controlled study of fatal automobile accidents in New York City. Journal of Chronic Diseases, 15(8), pp.811-826.

## Footnotes

[^1]: [statsmodels.stats.contingency_tables.StratifiedTable](https://www.statsmodels.org/v0.13.0/generated/statsmodels.stats.contingency_tables.StratifiedTable.html#statsmodels.stats.contingency_tables.StratifiedTable)

[^2]: [statsmodels.stats.contingency_tables.Table2x2](https://www.statsmodels.org/v0.13.0/generated/statsmodels.stats.contingency_tables.Table2x2.html#statsmodels.stats.contingency_tables.Table2x2)