## Summary notes

This notebook analyses the results of a 1-1 matched case-control study.

The data come from a case-control study undertaken to identify some of the risk factors associated with death during the heatwave in Chicago that occurred from 12 July 1995 to 16 July 1995.
Cases were persons aged 24+ years who died between 14 and 17 July 1995 with a cause mentioned on the death certificate that was possibly heat-related. For each case, a matched control of the same age and living in the same neighbourhood was selected.
The risk factor of interest was participation in group activities involving social interaction.
(Semenza, Rubin, Falter, *et al.*, 1996)

The results were as follows.

| Cases (rows) / Controls (columns) | Participated (+) | Did not participate (-) |
| --------------------------------- | ---------------: | ----------------------: |
| **Participated (+)**              | 77               | 63                      |
| **Did not participate (-)**       | 90               | 74                      |
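
To make the layout of the table concrete, the sketch below expands the four counts into one record per matched pair and then cross-tabulates them again. The names `counts`, `pairs`, `table`, `f` and `g` are illustrative only and are not used elsewhere in the notebook; the pair-level records are reconstructed from the published counts, not taken from the original data set.

```python
import numpy as np
import pandas as pd

# Counts of matched pairs, copied from the table above:
# rows = exposure of the case, columns = exposure of the control,
# both in the order (+), (-).
counts = np.array([[77, 63], [90, 74]])
labels = ["+", "-"]

# Expand the counts into one record per matched pair.
pairs = pd.DataFrame(
    [
        (labels[i], labels[j])
        for i in range(2)
        for j in range(2)
        for _ in range(counts[i, j])
    ],
    columns=["case", "control"],
)

# Cross-tabulating the pair records recovers the original table.
table = pd.crosstab(pairs["case"], pairs["control"]).loc[labels, labels]

# The discordant pairs are the off-diagonal cells.
n_pairs = len(pairs)
f = table.loc["+", "-"]  # case participated, control did not
g = table.loc["-", "+"]  # control participated, case did not
```

Each cell counts pairs, not individuals, so the table describes 304 matched pairs in total. Both the odds ratio and McNemar's test below depend only on the two discordant cells, `f` and `g`.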

A function was defined to return the Mantel-Haenszel odds ratio.
The results were initialised as a NumPy `array`.[^2]
A Mantel-Haenszel odds ratio for the association between participation in group activities and dying of heat-related disease was calculated, including a 95% confidence interval estimate.
Finally, McNemar's test[^1] was performed to test the null hypothesis of no association between participation in group activities and dying of heat-related disease.

These topics are covered by M249 Book 1, Part 2.

## Dependencies

In [1]:
import numpy as np
import pandas as pd
from scipy import stats as st
from statsmodels import api as sm
from numpy.typing import ArrayLike

## Constants

These are the results from the study.

In [2]:
OBS = np.array([[77, 63], [90, 74]])

## Functions

In [3]:
def mh_odds_ratio(ctable: ArrayLike, alpha: float = 0.05) -> tuple:
    """Return point and 100(1-alpha)% interval estimates of the
    Mantel-Haenszel odds ratio.

    Pre-conditions:
    - ctable represents the results of a 1-1 matched case-control study
        - shape(ctable) = 2, 2
        - rows represent cases, columns represent controls
        - row 0, col 0 represent (+)
        - row 1, col 1 represent (-)
    - 0 < alpha < 1

    Post-conditions:
    - tuple of float estimates, (point, lcb, ucb)
    """

    # Accept any array-like input, e.g. a nested list
    ctable = np.asarray(ctable)
    # Discordant pairs: f = case (+), control (-); g = case (-), control (+)
    f, g = ctable[0, 1], ctable[1, 0]
    # Half-width of the interval on the log odds ratio scale
    margin = st.norm.ppf(1 - alpha / 2) * np.sqrt(1/f + 1/g)
    return (
        f / g,
        f / g * np.exp(-margin),
        f / g * np.exp(margin)
    )

## Main

### Mantel-Haenszel odds ratio

In [4]:
res = mh_odds_ratio(OBS)
pd.Series(
    data=[res[0], res[1], res[2]],
    index=['point', 'lcb', 'ucb'],
    name='mantel-haenszel odds ratio'
)

point    0.700000
lcb      0.507309
ucb      0.965881
Name: mantel-haenszel odds ratio, dtype: float64
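
As a check on the arithmetic, the estimates above can be reproduced by hand. Writing $f = 63$ for the pairs in which only the case participated and $g = 90$ for the pairs in which only the control participated, the usual large-sample formulae for 1-1 matched data (which `mh_odds_ratio` appears to follow) give

$$
\widehat{OR}_{MH} = \frac{f}{g} = \frac{63}{90} = 0.7,
\qquad
\widehat{\sigma} = \sqrt{\frac{1}{f} + \frac{1}{g}} = \sqrt{\frac{1}{63} + \frac{1}{90}} \approx 0.1643
$$

$$
\left( \widehat{OR}_{MH} \, e^{-z \widehat{\sigma}},\ \widehat{OR}_{MH} \, e^{z \widehat{\sigma}} \right)
\approx \left( 0.7 \, e^{-0.3220},\ 0.7 \, e^{0.3220} \right)
\approx (0.507,\ 0.966),
\qquad z = z_{0.975} \approx 1.96
$$

Here $\widehat{\sigma}$ is the estimated standard error of the log odds ratio. The interval lies wholly below 1, which is what one would expect given the result of McNemar's test at the 5% level.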

### McNemar's test

In [5]:
res = sm.stats.mcnemar(OBS, exact=False)
pd.Series(
    data=[res.statistic, res.pvalue],
    index=['statistic', 'pvalue'],
    name="mcnemar's test"
)

statistic    4.418301
pvalue       0.035555
Name: mcnemar's test, dtype: float64
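
The test statistic can also be reproduced by hand from the two discordant counts. The following is a minimal sketch, assuming that statsmodels applies its default continuity correction when `exact=False` and refers the statistic to a chi-squared distribution with 1 degree of freedom; the variable names are illustrative.

```python
from scipy import stats as st

# Discordant pair counts from the study table.
f, g = 63, 90

# McNemar's statistic with continuity correction.
statistic = (abs(f - g) - 1) ** 2 / (f + g)

# Upper-tail probability of a chi-squared distribution with 1 df.
pvalue = st.chi2.sf(statistic, df=1)

print(f"statistic = {statistic:.6f}, pvalue = {pvalue:.6f}")
```

The printed values should agree with the statistic and p-value reported above to the precision shown, which suggests that the concordant cells (77 and 74) play no part in the test.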

## References

Semenza, J. C., Rubin, C. H., Falter, K. H., Selanikio, J. D., Flanders, W. D., Howe, H. L., & Wilhelm, J. L. (1996). Heat-related deaths during the July 1995 heat wave in Chicago. New England Journal of Medicine, **335**, 84-90.
[https://doi.org/10.1056/NEJM199607113350203](https://doi.org/10.1056/NEJM199607113350203).

[^1]: See [statsmodels.stats.contingency_tables.mcnemar](https://www.statsmodels.org/dev/generated/statsmodels.stats.contingency_tables.mcnemar.html).
[^2]: We tried representing the data with the statsmodels `Table2x2` class, but the `mcnemar` function does not accept a `Table2x2` instance.

In [6]:
%load_ext watermark
%watermark --iversions

scipy      : 1.9.0
pandas     : 1.4.3
sys        : 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
numpy      : 1.23.2
statsmodels: 0.13.2

