## Summary notes

We analyse the results of a cohort study of the possible association between compulsory redundancy and incidents of serious self-inflicted injury (SSI) (Keefe et al., 2002).
The exposure is being made compulsorily redundant,
and the disease is an incident of serious self-inflicted injury.

The study results were as follows.

|                            | SSI (+) | no SSI (-) |
| -------------------------- | ------: | ---------: |
| **made redundant (+)**     | 14      | 1931       |
| **not made redundant (-)** | 4       | 1763       |


The results were initialised as a NumPy `array` and wrapped in a StatsModels `Table2x2`.
Measures of association (odds ratio and relative risk) were calculated, together with their confidence intervals.
A chi-squared test of no association was used to assess the strength of evidence for an association.
For completeness, we rounded off the analysis with Fisher's exact test.

Note that some of the results were output as Pandas `Series` rather than the default return type.
This is optional, and was done only to give the results a more standardised form.

These topics are covered by M249, Book 1, Part 1.

## Dependencies

In [1]:
import numpy as np
import pandas as pd
from scipy import stats as st
from statsmodels import api as sm

## Global constants

These are the results from the study.

In [2]:
OBS = np.array([[14, 1931], [4, 1763]])

## Main

### Initialise the results

In [3]:
ctable = sm.stats.Table2x2(OBS)
print(ctable)

A 2x2 contingency table with counts:
[[  14. 1931.]
 [   4. 1763.]]


### Measures of association

#### Odds ratio

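Return the point estimate and the 95% confidence interval (the default, `alpha=0.05`) for the odds ratio, where `lcb` and `ucb` are the lower and upper confidence bounds.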

In [4]:
pd.Series(
    data={
        'point': ctable.oddsratio,
        'lcb': ctable.oddsratio_confint()[0],
        'ucb': ctable.oddsratio_confint()[1],
    },
    name='odds ratio'
)

point    3.195495
lcb      1.049877
ucb      9.726081
Name: odds ratio, dtype: float64
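
As a cross-check, the odds ratio and its interval can be reproduced by hand. The sketch below assumes the usual large-sample interval on the log scale (log odds ratio plus or minus a normal quantile times its standard error); the variable names are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy import stats as st

# Observed counts: rows = exposure (+, -), columns = SSI (+, -)
obs = np.array([[14, 1931], [4, 1763]])
(a, b), (c, d) = obs

# Point estimate: the cross-product ratio
odds_ratio = (a * d) / (b * c)

# Standard error of log(OR), and the normal quantile for a 95% interval
se_log_or = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = st.norm.ppf(0.975)

# Build the interval on the log scale, then transform back
lcb = np.exp(np.log(odds_ratio) - z * se_log_or)
ucb = np.exp(np.log(odds_ratio) + z * se_log_or)
print(odds_ratio, lcb, ucb)
```

The printed values should agree with the `Table2x2` output above to the displayed precision.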

#### Relative risk

Return the point estimate and the 95% confidence interval (the default, `alpha=0.05`) for the relative risk.

In [5]:
pd.Series(
    data={
        'point': ctable.riskratio,
        'lcb': ctable.riskratio_confint()[0],
        'ucb': ctable.riskratio_confint()[1],
    },
    name='relative risk'
)

point    3.179692
lcb      1.048602
ucb      9.641829
Name: relative risk, dtype: float64
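
The relative risk can be checked in the same way. This is a minimal sketch, assuming the standard large-sample standard error of the log relative risk; the names are again illustrative.

```python
import numpy as np
from scipy import stats as st

# Observed counts: rows = exposure (+, -), columns = SSI (+, -)
obs = np.array([[14, 1931], [4, 1763]])
(a, b), (c, d) = obs

# Risk of SSI in each exposure group
risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
relative_risk = risk_exposed / risk_unexposed

# Standard error of log(RR), and the normal quantile for a 95% interval
se_log_rr = np.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
z = st.norm.ppf(0.975)

# Build the interval on the log scale, then transform back
lcb = np.exp(np.log(relative_risk) - z * se_log_rr)
ucb = np.exp(np.log(relative_risk) + z * se_log_rr)
print(relative_risk, lcb, ucb)
```

Because SSI is rare in both groups, the relative risk and the odds ratio are close to one another, as expected.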

### Chi-squared test for no association

The expected frequencies under the null hypothesis of no association.

In [6]:
ctable.fittedvalues

array([[   9.43157328, 1935.56842672],
       [   8.56842672, 1758.43157328]])
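
For reference, each expected frequency is the product of its row total and column total, divided by the overall total. A short sketch of that calculation with plain NumPy (variable names are illustrative):

```python
import numpy as np

# Observed counts: rows = exposure (+, -), columns = SSI (+, -)
obs = np.array([[14, 1931], [4, 1763]])

row_totals = obs.sum(axis=1)  # totals for made redundant / not made redundant
col_totals = obs.sum(axis=0)  # totals for SSI / no SSI

# Expected count in each cell = row total * column total / overall total
expected = np.outer(row_totals, col_totals) / obs.sum()
print(expected)
```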

The differences between the observed and expected frequencies.

In [7]:
OBS - ctable.fittedvalues

array([[ 4.56842672, -4.56842672],
       [-4.56842672,  4.56842672]])

The contributions to the chi-squared test statistic.

In [8]:
ctable.chi2_contribs

array([[2.21283577, 0.01078263],
       [2.43574736, 0.01186883]])

The results of the chi-squared test.

In [9]:
res = ctable.test_nominal_association()
pd.Series(
    data={'statistic': res.statistic, 'pval': res.pvalue, 'df': int(res.df)},
    name='chi-squared test',
    dtype=object
)

statistic    4.671235
pval         0.030672
df                  1
Name: chi-squared test, dtype: object
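
The test statistic is the sum of the four contributions, each of the form (observed − expected)² / expected, and the p-value is the upper tail of a chi-squared distribution with 1 degree of freedom. The sketch below reproduces this by hand and, as an assumed alternative route, compares it with SciPy's `chi2_contingency` with the continuity correction switched off.

```python
import numpy as np
from scipy import stats as st

# Observed counts: rows = exposure (+, -), columns = SSI (+, -)
obs = np.array([[14, 1931], [4, 1763]])

# Expected counts under no association
expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()

# Cell-wise contributions and the test statistic
contribs = (obs - expected) ** 2 / expected
statistic = contribs.sum()

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df = (obs.shape[0] - 1) * (obs.shape[1] - 1)
pval = st.chi2.sf(statistic, df)

# Cross-check with SciPy, without Yates' continuity correction
chi2_scipy, pval_scipy, df_scipy, _ = st.chi2_contingency(obs, correction=False)
print(statistic, pval, df)
```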

### Fisher's exact test

Fisher's exact test is not needed for this study, given that all expected frequencies are greater than 5, but we show it for completeness.
StatsModels does not provide Fisher's exact test, so we use SciPy instead.

In [10]:
_, pval = st.fisher_exact(ctable.table)
pd.Series(data={'pval': pval}, name="Fisher's exact")

pval    0.033877
Name: Fisher's exact, dtype: float64
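
To show what the two-sided p-value represents, the sketch below rebuilds it from the hypergeometric distribution: with the table margins held fixed, it sums the probabilities of all tables that are no more likely than the observed one. This is an illustrative reconstruction under that definition, not a replacement for `st.fisher_exact`; the small relative tolerance is an assumption used to guard against floating-point ties.

```python
import numpy as np
from scipy import stats as st

# Observed counts: rows = exposure (+, -), columns = SSI (+, -)
obs = np.array([[14, 1931], [4, 1763]])

n_total = obs.sum()         # all workers in the study
n_exposed = obs[0].sum()    # made redundant
n_cases = obs[:, 0].sum()   # SSI incidents

# Possible counts of exposed cases given the fixed margins
lo = max(0, n_exposed + n_cases - n_total)
hi = min(n_exposed, n_cases)
k = np.arange(lo, hi + 1)

# Probability of each possible table, and of the observed table
pmf = st.hypergeom.pmf(k, n_total, n_cases, n_exposed)
p_obs = st.hypergeom.pmf(obs[0, 0], n_total, n_cases, n_exposed)

# Two-sided p-value: total probability of tables no more likely than observed
pval = pmf[pmf <= p_obs * (1 + 1e-7)].sum()
print(pval)
```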

## References

Vera Keefe, Papaarangi Reid, Clint Ormsby, Bridget Robson, Gordon Purdie, Joanne Baxter, Ngäti Kahungunu Iwi Incorporated, Serious health events following involuntary job loss in New Zealand meat processing workers, *International Journal of Epidemiology*, Volume 31, Issue 6, December 2002, Pages 1155–1161, https://doi.org/10.1093/ije/31.6.1155