## Summary notes

This notebook analyses the results of a cohort study on the possible association between compulsory redundancy and incidents of serious self-inflicted injury (SSI) (Keefe et al., 2002).
The exposure is being made compulsorily redundant, and the disease is an incident of serious self-inflicted injury.

The study results were as follows.

|                            | SSI (+) | no SSI (-) |
| -------------------------- | ------: | ---------: |
| **made redundant (+)**     | 14      | 1931       |
| **not made redundant (-)** | 4       | 1763       |

The results were initialised as a NumPy `array`, and the analysis was undertaken using (mainly) StatsModels' `Table2x2`[^1] class and SciPy.
Measures of association[^2] were calculated, including confidence interval estimates.
A chi-squared test of no association was used to test the strength of evidence of an association.
We rounded off the analysis by performing Fisher's exact test.[^3] [^4]

Note that some of the results were output as pandas `Series` rather than using the default return type.
This was done to provide a more standardised results output.

These topics are covered by M249 Book 1, Part 1.

## Dependencies

In [13]:
import numpy as np
import pandas as pd
from scipy import stats as st
from statsmodels import api as sm
%load_ext watermark


## Constants

These are the results from the study.

In [14]:
OBS = np.array([[14, 1931], [4, 1763]])

## Main

### Initialise the contingency table

In [15]:
ctable = sm.stats.Table2x2(OBS)
print(ctable)

A 2x2 contingency table with counts:
[[  14. 1931.]
 [   4. 1763.]]


### Measures of association

Return the point estimate and 95% confidence interval (the default level) for the odds ratio.

In [16]:
pd.Series(
    data=[ctable.oddsratio,
          ctable.oddsratio_confint()[0],
          ctable.oddsratio_confint()[1]],
    index=['point', 'lcb', 'ucb'],
    name='odds ratio'
)

point    3.195495
lcb      1.049877
ucb      9.726081
Name: odds ratio, dtype: float64

Return the point estimate and 95% confidence interval (the default level) for the relative risk.

In [17]:
pd.Series(
    data=[ctable.riskratio,
          ctable.riskratio_confint()[0],
          ctable.riskratio_confint()[1]],
    index=['point', 'lcb', 'ucb'],
    name='relative risk'
)

point    3.179692
lcb      1.048602
ucb      9.641829
Name: relative risk, dtype: float64
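As a cross-check, both measures can be recomputed by hand from the four cell counts. The sketch below uses only the standard library. It assumes the usual large-sample normal approximation on the log scale, which appears to reproduce the figures above. The names `log_scale_interval`, `or_est` and `rr_est` are ours, for illustration only.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Observed counts from the study table.
a, b = 14, 1931  # made redundant: SSI, no SSI
c, d = 4, 1763   # not made redundant: SSI, no SSI

# Critical value for a two-sided 95% interval.
z = NormalDist().inv_cdf(0.975)


def log_scale_interval(point, se_log):
    """Return (point, lower, upper), given the standard error of log(point)."""
    return (point,
            exp(log(point) - z * se_log),
            exp(log(point) + z * se_log))


# Odds ratio: odds of SSI among the exposed over odds among the unexposed.
odds_ratio = (a * d) / (b * c)
se_log_or = sqrt(1/a + 1/b + 1/c + 1/d)

# Relative risk: risk of SSI among the exposed over risk among the unexposed.
relative_risk = (a / (a + b)) / (c / (c + d))
se_log_rr = sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))

or_est = log_scale_interval(odds_ratio, se_log_or)
rr_est = log_scale_interval(relative_risk, se_log_rr)
print([round(x, 4) for x in or_est])
print([round(x, 4) for x in rr_est])
```

Both intervals lie above one, which is consistent with a positive association between redundancy and SSI.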

### Chi-squared test for no association

Return the expected frequencies under the null hypothesis of no association.

In [18]:
ctable.fittedvalues

array([[   9.43157328, 1935.56842672],
       [   8.56842672, 1758.43157328]])

Return the differences between the observed and expected frequencies.

In [19]:
OBS - ctable.fittedvalues

array([[ 4.56842672, -4.56842672],
       [-4.56842672,  4.56842672]])

Return the contributions to the chi-squared test statistic.

In [20]:
ctable.chi2_contribs

array([[2.21283577, 0.01078263],
       [2.43574736, 0.01186883]])
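The three arrays above follow directly from the table margins. As a minimal sketch in plain Python (the variable names are ours), each expected frequency is the row total multiplied by the column total, divided by the overall total, and each contribution is the squared difference divided by the expected frequency.

```python
observed = [[14, 1931], [4, 1763]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency under no association: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Observed minus expected, cell by cell.
differences = [[o - e for o, e in zip(obs_row, exp_row)]
               for obs_row, exp_row in zip(observed, expected)]

# Contribution of each cell to the chi-squared statistic: (O - E)^2 / E.
contribs = [[diff ** 2 / e for diff, e in zip(diff_row, exp_row)]
            for diff_row, exp_row in zip(differences, expected)]

for name, values in [("expected", expected),
                     ("differences", differences),
                     ("contributions", contribs)]:
    print(name, [[round(v, 4) for v in row] for row in values])
```

The two cells in the SSI column account for almost all of the test statistic, because their expected frequencies are small.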

Return the results of the chi-squared test.

We pass the argument `dtype=object` so that the `Series` can hold both `float` and `int` values without converting the degrees of freedom to a `float`.

In [21]:
res = ctable.test_nominal_association()
pd.Series(
    data=[res.statistic, res.pvalue, int(res.df)],
    index=['statistic', 'pvalue', 'df'],
    name='chi-squared test',
    dtype=object
)

statistic    4.671235
pvalue       0.030672
df                  1
Name: chi-squared test, dtype: object
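For a 2x2 table the test statistic and p-value can also be checked without any third-party library. The sketch below makes two assumptions: it uses the shortcut formula for a 2x2 table with no continuity correction, and it relies on the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable, so the upper tail probability can be written with `math.erfc`. This shortcut does not generalise to larger tables.

```python
from math import erfc, sqrt

a, b = 14, 1931  # made redundant: SSI, no SSI
c, d = 4, 1763   # not made redundant: SSI, no SSI
n = a + b + c + d

# Shortcut formula for the chi-squared statistic of a 2x2 table
# (no continuity correction).
statistic = (n * (a * d - b * c) ** 2
             / ((a + b) * (c + d) * (a + c) * (b + d)))

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (2 - 1) * (2 - 1)

# Upper tail of chi-squared(1): P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
pvalue = erfc(sqrt(statistic / 2))

print(round(statistic, 6), round(pvalue, 6), df)
```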

### Fisher's exact test

In [22]:
_, pvalue = st.fisher_exact(ctable.table)
pd.Series(
    data=[pvalue],
    index=['pvalue'],
    name="fisher's exact"
)

pvalue    0.033877
Name: fisher's exact, dtype: float64
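To illustrate what the exact test computes, the sketch below enumerates every table that shares the observed margins and sums the hypergeometric probabilities of those that are no more probable than the observed table. This is one common definition of the two-sided p-value and we assume it here. The function name `table_probability` and the small relative tolerance are our own choices, not part of any library.

```python
from math import comb

a, b = 14, 1931  # made redundant: SSI, no SSI
c, d = 4, 1763   # not made redundant: SSI, no SSI

row1, row2 = a + b, c + d  # exposed and unexposed totals
col1 = a + c               # total number of SSI cases
n = row1 + row2


def table_probability(x):
    """Hypergeometric probability of x exposed cases, with all margins fixed."""
    return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)


# Range of values the top-left cell can take, given the margins.
lowest, highest = max(0, col1 - row2), min(col1, row1)

p_observed = table_probability(a)

# Two-sided p-value: total probability of tables no more probable than the
# observed one (the tolerance guards against floating-point ties).
pvalue = sum(p for p in map(table_probability, range(lowest, highest + 1))
             if p <= p_observed * (1 + 1e-7))

print(round(pvalue, 6))
```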

## References

Vera Keefe, Papaarangi Reid, Clint Ormsby, Bridget Robson, Gordon Purdie, Joanne Baxter, Ngäti Kahungunu Iwi Incorporated, Serious health events following involuntary job loss in New Zealand meat processing workers, *International Journal of Epidemiology*, Volume 31, Issue 6, December 2002, Pages 1155–1161, https://doi.org/10.1093/ije/31.6.1155

[^1]: See [statsmodels.stats.contingency_tables.Table2x2](https://www.statsmodels.org/stable/generated/statsmodels.stats.contingency_tables.Table2x2.html)
[^2]: Odds ratio and relative risk.
[^3]: Technically this is not needed, given all expected values are greater than five, but we include it for completeness.
[^4]: There is no version of Fisher's exact test in StatsModels, so we use SciPy instead. See [scipy.stats.fisher_exact](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.fisher_exact.html)

In [23]:
%watermark --iversions

statsmodels: 0.13.2
scipy      : 1.9.0
numpy      : 1.23.2
sys        : 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
pandas     : 1.4.3

