# Stratified Analysis: Fluoridated Water and Dental Caries

## Summary

## Introduction

## Analysis

```python
from opyn.generic.pandasloader import PandasLoader
from opyn.stats import epidemeology as epyn
from pandas.api.types import CategoricalDtype
```

### Load the data

```python
f = "dentalcaries"
pdloader = PandasLoader()
# pdloader.get_description(f)
```

```python
dat = pdloader.get(f)
dat
```

|   | count | exposure | outcome | level |
|---|---|---|---|---|
| 0 | 5 | fluoridated | caries | age 8 years |
| 1 | 25 | fluoridated | no caries | age 8 years |
| 2 | 8 | not fluoridated | caries | age 8 years |
| 3 | 23 | not fluoridated | no caries | age 8 years |
| 4 | 0 | fluoridated | caries | age 9 years |
| 5 | 17 | fluoridated | no caries | age 9 years |
| 6 | 17 | not fluoridated | caries | age 9 years |
| 7 | 33 | not fluoridated | no caries | age 9 years |
| 8 | 5 | fluoridated | caries | age 10 years |
| 9 | 13 | fluoridated | no caries | age 10 years |


Copy the dataframe so that the original data is left unmodified, then recode the `exposure`, `outcome`, and `level` columns as ordered categorical data.

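```python
# Work on a deep copy so that `dat` itself is never modified
copieddat = dat.copy(deep=True)
# Category order fixes the layout of each 2x2 table:
# fluoridated (reference) before not fluoridated, no caries before caries
copieddat["exposure"] = copieddat["exposure"].astype(
    CategoricalDtype(["fluoridated", "not fluoridated"], ordered=True)
)
copieddat["outcome"] = copieddat["outcome"].astype(
    CategoricalDtype(["no caries", "caries"], ordered=True)
)
# Age strata in chronological, not alphabetical, order
copieddat["level"] = copieddat["level"].astype(
    CategoricalDtype(
        ["age 8 years", "age 9 years", "age 10 years", "age 11-12 years"],
        ordered=True,
    )
)
```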
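The explicit category order is what makes the later sort meaningful. A small pure-Python illustration, using only the labels from the data: sorting by position in the declared list stands in for how an ordered `CategoricalDtype` is sorted, while plain `sorted()` compares strings character by character.

```python
# Category order declared for the `level` column above
LEVEL_ORDER = ["age 8 years", "age 9 years", "age 10 years", "age 11-12 years"]

labels = ["age 10 years", "age 8 years", "age 11-12 years", "age 9 years"]

# Plain string comparison looks at characters: "1" < "8" < "9",
# so the age 10 and age 11-12 strata would come first
lexical = sorted(labels)

# Sorting by position in the declared list mimics an ordered categorical
categorical = sorted(labels, key=LEVEL_ORDER.index)

print(lexical)      # ['age 10 years', 'age 11-12 years', 'age 8 years', 'age 9 years']
print(categorical)  # ['age 8 years', 'age 9 years', 'age 10 years', 'age 11-12 years']

# The outcome column has the same problem: lexically "caries" sorts before
# "no caries", which would swap the two columns of every 2x2 table
print(sorted(["no caries", "caries"]))  # ['caries', 'no caries']
```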

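Sort the dataframe by `level`, `exposure`, and `outcome` so that the counts follow the category order defined above, which is the order the reshape in the next step relies on.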

```python
sorteddat = copieddat.sort_values(by=["level", "exposure", "outcome"])
sorteddat
```

|   | count | exposure | outcome | level |
|---|---|---|---|---|
| 1 | 25 | fluoridated | no caries | age 8 years |
| 0 | 5 | fluoridated | caries | age 8 years |
| 3 | 23 | not fluoridated | no caries | age 8 years |
| 2 | 8 | not fluoridated | caries | age 8 years |
| 5 | 17 | fluoridated | no caries | age 9 years |
| 4 | 0 | fluoridated | caries | age 9 years |
| 7 | 33 | not fluoridated | no caries | age 9 years |
| 6 | 17 | not fluoridated | caries | age 9 years |
| 9 | 13 | fluoridated | no caries | age 10 years |
| 8 | 5 | fluoridated | caries | age 10 years |


Extract the `count` column as a `4x2x2` `ndarray`: one 2x2 table per age stratum, with exposure as rows and outcome as columns.

```python
# axis 0: age stratum, axis 1: exposure, axis 2: outcome
resarr = sorteddat["count"].to_numpy().reshape((4, 2, 2))
resarr
```

```
array([[[25,  5],
        [23,  8]],

       [[17,  0],
        [33, 17]],

       [[13,  5],
        [14, 24]],

       [[16,  5],
        [25, 29]]], dtype=int64)
```

It is this new reshaped `ndarray` that we will pass to the various functions for analysis.
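To make the axis meaning explicit, here is a pure-Python sketch that nests the 16 sorted counts (copied from the array printed above) the way `reshape((4, 2, 2))` does in NumPy's default C order, giving an index of `[stratum][exposure][outcome]`. The helper `reshape_4x2x2` is hypothetical and only for illustration.

```python
# Sorted counts, copied from the printed array (C order: last index fastest)
flat = [25, 5, 23, 8, 17, 0, 33, 17, 13, 5, 14, 24, 16, 5, 25, 29]
STRATA = ["age 8 years", "age 9 years", "age 10 years", "age 11-12 years"]


def reshape_4x2x2(values):
    """Hypothetical helper: nest 16 values as [stratum][exposure][outcome]."""
    assert len(values) == 16
    return [
        [values[4 * s + 2 * e : 4 * s + 2 * e + 2] for e in range(2)]
        for s in range(4)
    ]


tables = reshape_4x2x2(flat)
for name, table in zip(STRATA, tables):
    # rows: fluoridated, not fluoridated; columns: no caries, caries
    print(name, table, "n =", sum(map(sum, table)))
```

Each stratum is then an ordinary 2x2 table, for example `[[25, 5], [23, 8]]` with 61 children for age 8.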

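### Stratum-specific odds ratio

Each 2x2 table has the fluoridated group as the first (reference) row and caries as the second column, so each odds ratio compares the odds of caries in the non-fluoridated group with the odds in the fluoridated group. The age 9 stratum is not shown: its fluoridated/caries cell is 0, so the sample odds ratio is infinite and its standard error is undefined.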

```python
epyn.oddsratio(resarr[0])  # age 8
```

|   | oddsratio | stderr | lower | upper |
|---|---|---|---|---|
| Exposed1 (-) | 1.0 | 0.0 | | |
| Exposed2 (+) | 1.73913 | 0.639123 | 0.496947 | 6.086318 |


```python
epyn.oddsratio(resarr[2])  # age 10
```

|   | oddsratio | stderr | lower | upper |
|---|---|---|---|---|
| Exposed1 (-) | 1.0 | 0.0 | | |
| Exposed2 (+) | 4.457143 | 0.624514 | 1.310596 | 15.158081 |


```python
epyn.oddsratio(resarr[3])  # age 11-12
```

|   | oddsratio | stderr | lower | upper |
|---|---|---|---|---|
| Exposed1 (-) | 1.0 | 0.0 | | |
| Exposed2 (+) | 3.712 | 0.580502 | 1.189826 | 11.580633 |
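The stratum-specific numbers above are consistent with the usual sample odds ratio `ad/bc` and a Woolf (log-scale Wald) confidence interval. A minimal pure-Python sketch of that calculation follows; that `epyn.oddsratio` is implemented exactly this way is an assumption, and `wald_odds_ratio` is a hypothetical name.

```python
import math
from statistics import NormalDist


def wald_odds_ratio(table, level=0.95):
    """Odds ratio ad/bc with a Woolf (Wald, log-scale) confidence interval.

    `table` is [[a, b], [c, d]] with rows fluoridated, not fluoridated and
    columns no caries, caries. Returns None if any cell is zero, because the
    log-scale standard error is then undefined.
    """
    (a, b), (c, d) = table
    if 0 in (a, b, c, d):
        return None
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)       # about 1.96 for 95%
    half_width = z * se
    lower = math.exp(math.log(odds_ratio) - half_width)
    upper = math.exp(math.log(odds_ratio) + half_width)
    return odds_ratio, se, lower, upper


strata = {
    "age 8": [[25, 5], [23, 8]],
    "age 9": [[17, 0], [33, 17]],
    "age 10": [[13, 5], [14, 24]],
    "age 11-12": [[16, 5], [25, 29]],
}
for name, table in strata.items():
    print(name, wald_odds_ratio(table))
```

For age 8 this gives `(25 * 8) / (5 * 23) = 200 / 115`, about 1.739, and the age 9 call returns `None` because of the zero cell.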


### Unadjusted odds ratio

```python
epyn.crude_oddsratio(resarr)
```

|   | oddsratio | stderr | lower | upper |
|---|---|---|---|---|
| Exposed1 (-) | 1.0 | 0.0 | | |
| Exposed2 (+) | 3.886316 | 0.322642 | 2.064926 | 7.314281 |
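The unadjusted estimate ignores age: it is the odds ratio of the single 2x2 table obtained by summing the four strata cell by cell. The pure-Python sketch below reproduces the printed value that way; treating `epyn.crude_oddsratio` as exactly "collapse, then Woolf" is our reading of the output, not documented behaviour. Once collapsed, the zero cell in the age 9 stratum no longer matters.

```python
import math

# One 2x2 table per age stratum: rows fluoridated / not fluoridated,
# columns no caries / caries (same layout as `resarr`)
tables = [
    [[25, 5], [23, 8]],    # age 8
    [[17, 0], [33, 17]],   # age 9
    [[13, 5], [14, 24]],   # age 10
    [[16, 5], [25, 29]],   # age 11-12
]

# Collapse over strata by summing each cell across tables
collapsed = [[sum(t[i][j] for t in tables) for j in range(2)] for i in range(2)]
(a, b), (c, d) = collapsed
crude_or = (a * d) / (b * c)
crude_se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf SE of log(OR)

print(collapsed)                                 # [[71, 15], [95, 78]]
print(round(crude_or, 6), round(crude_se, 6))    # 3.886316 0.322642
```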


### Tarone's test of homogeneity

```python
epyn.test_equalodds(resarr)
```

|   | chisq | pval |
|---|---|---|
| result | 3.960342 | 0.265778 |
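Tarone's test is usually computed as the Breslow-Day statistic, evaluated at the Mantel-Haenszel pooled odds ratio, minus Tarone's correction term, and referred to a chi-square distribution with K - 1 degrees of freedom (here 3). The pure-Python sketch below follows that recipe; in our check it agrees with the printed statistic and p-value to at least three decimal places, but that this is how `epyn.test_equalodds` works internally is an assumption. All function names are hypothetical, and the chi-square tail uses the standard series for the regularized incomplete gamma function rather than a statistics library.

```python
import math

# Same layout as `resarr`: rows fluoridated / not fluoridated,
# columns no caries / caries
tables = [
    [[25, 5], [23, 8]],    # age 8
    [[17, 0], [33, 17]],   # age 9
    [[13, 5], [14, 24]],   # age 10
    [[16, 5], [25, 29]],   # age 11-12
]


def mh_odds_ratio(tables):
    """Mantel-Haenszel pooled odds ratio: sum(ad/n) / sum(bc/n)."""
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in tables)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in tables)
    return num / den


def expected_a(table, psi, tol=1e-10):
    """Expected top-left cell given the table's margins and common odds ratio psi.

    Solves x(n - n1 - m1 + x) = psi(n1 - x)(m1 - x) by bisection; the left side
    minus the right side increases over the feasible range of x.
    """
    (a, b), (c, d) = table
    n1, m1, n = a + b, a + c, a + b + c + d  # row-1 total, column-1 total, size
    lo, hi = max(0, n1 + m1 - n), min(n1, m1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * (n - n1 - m1 + mid) < psi * (n1 - mid) * (m1 - mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def chi2_sf(x, df):
    """Chi-square upper tail via the series for the regularized lower gamma."""
    s, z = df / 2, x / 2
    if z <= 0:
        return 1.0
    term = total = 1 / s
    k = 1
    while term > total * 1e-16:
        term *= z / (s + k)
        total += term
        k += 1
    lower = total * math.exp(s * math.log(z) - z - math.lgamma(s))
    return max(0.0, 1 - lower)


def tarone_test(tables):
    """Breslow-Day statistic with Tarone's correction, K - 1 degrees of freedom."""
    psi = mh_odds_ratio(tables)
    obs, exp_a, var_a = [], [], []
    for table in tables:
        (a, b), (c, d) = table
        n1, m1, n = a + b, a + c, a + b + c + d
        e = expected_a(table, psi)
        v = 1 / (1 / e + 1 / (n1 - e) + 1 / (m1 - e) + 1 / (n - n1 - m1 + e))
        obs.append(a)
        exp_a.append(e)
        var_a.append(v)
    breslow_day = sum((o - e) ** 2 / v for o, e, v in zip(obs, exp_a, var_a))
    correction = (sum(obs) - sum(exp_a)) ** 2 / sum(var_a)
    stat = breslow_day - correction
    return stat, chi2_sf(stat, len(tables) - 1)


print(tarone_test(tables))
```

Bisection is used instead of the closed-form quadratic root so that no root selection is needed; the feasible interval always brackets exactly one solution.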


### Adjusted odds ratio

```python
epyn.adjusted_oddsratio(resarr)
```

|   | oddsratio | stderr | lower | upper |
|---|---|---|---|---|
| result | 4.029689 | 0.34046 | 2.067622 | 7.853659 |
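The adjusted estimate printed above matches the Mantel-Haenszel pooled odds ratio, and its standard error matches the Robins-Breslow-Greenland variance of the log odds ratio. The pure-Python sketch below computes both; whether `epyn.adjusted_oddsratio` uses exactly these formulas is an assumption based on the agreement of the numbers, and `mantel_haenszel` is a hypothetical name. Note that the age 9 stratum, with its zero cell, still contributes to the numerator of the pooled estimate.

```python
import math
from statistics import NormalDist

# Same layout as `resarr`: rows fluoridated / not fluoridated,
# columns no caries / caries
tables = [
    [[25, 5], [23, 8]],    # age 8
    [[17, 0], [33, 17]],   # age 9
    [[13, 5], [14, 24]],   # age 10
    [[16, 5], [25, 29]],   # age 11-12
]


def mantel_haenszel(tables, level=0.95):
    """Mantel-Haenszel pooled OR with the Robins-Breslow-Greenland log-scale SE."""
    r_sum = s_sum = pr = ps_qr = qs = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        r, s = a * d / n, b * c / n      # stratum terms of numerator / denominator
        p, q = (a + d) / n, (b + c) / n  # RBG weights
        r_sum += r
        s_sum += s
        pr += p * r
        ps_qr += p * s + q * r
        qs += q * s
    psi = r_sum / s_sum
    var_log = (pr / (2 * r_sum**2)
               + ps_qr / (2 * r_sum * s_sum)
               + qs / (2 * s_sum**2))
    se = math.sqrt(var_log)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(psi) - z * se)
    upper = math.exp(math.log(psi) + z * se)
    return psi, se, lower, upper


print(mantel_haenszel(tables))
```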


### Test of no association

```python
epyn.test_nullodds(resarr)
```

|   | chisq | pval |
|---|---|---|
| result | 17.725273 | 2.6e-05 |
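The test of no association is consistent with the Cochran-Mantel-Haenszel chi-square with one degree of freedom and no continuity correction: summing, over strata, the observed minus expected top-left cell and dividing its square by the summed hypergeometric variances reproduces the printed statistic (with a 0.5 continuity correction it would drop to about 16.5). A minimal pure-Python sketch, assuming that is what `epyn.test_nullodds` computes; `cmh_test` is a hypothetical name.

```python
import math

# Same layout as `resarr`: rows fluoridated / not fluoridated,
# columns no caries / caries
tables = [
    [[25, 5], [23, 8]],    # age 8
    [[17, 0], [33, 17]],   # age 9
    [[13, 5], [14, 24]],   # age 10
    [[16, 5], [25, 29]],   # age 11-12
]


def cmh_test(tables):
    """Cochran-Mantel-Haenszel chi-square (1 df, no continuity correction)."""
    diff = var = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        r1, r2, c1, c2 = a + b, c + d, a + c, b + d  # row and column totals
        diff += a - r1 * c1 / n                      # observed minus expected
        var += r1 * r2 * c1 * c2 / (n**2 * (n - 1))  # hypergeometric variance
    stat = diff**2 / var
    pval = math.erfc(math.sqrt(stat / 2))  # chi-square upper tail with 1 df
    return stat, pval


print(cmh_test(tables))
```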


## Discussion