# Dose-Response Analysis: Smoking and Lung Cancer

## Summary

## Introduction

## Analysis

In [1]:
from opyn.generic.pandasloader import PandasLoader
from opyn.stats import epidemeology as epyn
from pandas.api.types import CategoricalDtype

### Load the data

In [2]:
f = "smoking2"
pdloader = PandasLoader()
# pdloader.get_description(f)

In [3]:
dat = pdloader.get(f)
dat

Unnamed: 0,count,dose,outcome
0,32,50.0,case
1,13,50.0,control
2,136,37.0,case
3,71,37.0,control
4,196,19.5,case
5,190,19.5,control
6,250,9.5,case
7,293,9.5,control
8,35,2.0,case
9,82,2.0,control


Copy the dataframe so the loaded data is left unmodified, then recode the `outcome` column as ordered categorical data, with `control` ordered before `case`.

In [4]:
copieddat = dat.copy(deep=True)
catoutcomes = CategoricalDtype(categories=["control", "case"], ordered=True)
copieddat["outcome"] = copieddat["outcome"].astype(catoutcomes)

Sort the dataframe by `dose`, then by `outcome`, so that the rows run from the lowest to the highest dose, with `control` before `case` at each dose.

In [5]:
sorteddat = copieddat.sort_values(by=["dose", "outcome"])
sorteddat

Unnamed: 0,count,dose,outcome
9,82,2.0,control
8,35,2.0,case
7,293,9.5,control
6,250,9.5,case
5,190,19.5,control
4,196,19.5,case
3,71,37.0,control
2,136,37.0,case
1,13,50.0,control
0,32,50.0,case


Extract the `count` column as a `5x2` `ndarray`: one row per dose level, with the columns holding the `control` and `case` counts in that order.

In [6]:
resarr = sorteddat["count"].to_numpy().reshape((5, 2))
resarr

array([[ 82,  35],
       [293, 250],
       [190, 196],
       [ 71, 136],
       [ 13,  32]], dtype=int64)

### Dose-specific odds ratio

In [7]:
epyn.oddsratio(resarr)

Unnamed: 0,oddsratio,stderr,lower,upper
Exposed1 (-),1.0,0.0,,
Exposed2 (+),1.999025,0.219498,1.300112,3.073658
Exposed3 (+),2.416842,0.226123,1.551571,3.764652
Exposed4 (+),4.487726,0.249407,2.75252,7.316818
Exposed5 (+),5.767033,0.385927,2.706767,12.287232
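
The odds ratios above can be checked by hand. The sketch below assumes each dose level is compared with the lowest dose as the reference, and that the interval is a 95% Wald interval on the log scale. This is inferred from the reported figures, not taken from the `opyn` source, and the names `counts` and `results` are illustrative only.

```python
import math
from statistics import NormalDist

# Rows are dose levels in ascending order; columns are (control, case),
# matching the layout of the array shown earlier.
counts = [(82, 35), (293, 250), (190, 196), (71, 136), (13, 32)]

ref_control, ref_case = counts[0]  # lowest dose taken as the reference (assumption)
z = NormalDist().inv_cdf(0.975)    # about 1.96, for a 95% interval (assumption)

results = []
for control, case in counts[1:]:
    # Odds of being a case at this dose, relative to the reference dose.
    odds_ratio = (case * ref_control) / (control * ref_case)
    # Standard error of the log odds ratio.
    stderr = math.sqrt(1 / case + 1 / control + 1 / ref_case + 1 / ref_control)
    lower = math.exp(math.log(odds_ratio) - z * stderr)
    upper = math.exp(math.log(odds_ratio) + z * stderr)
    results.append((odds_ratio, stderr, lower, upper))

for level, row in enumerate(results, start=2):
    print(f"Exposed{level}", *(f"{value:.6f}" for value in row))
```

Under these assumptions the printed values agree with the table to the precision shown.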


### Dose-specific odds

In [8]:
epyn.doseexposure_odds(resarr)

Unnamed: 0,odds,log-odds
Exposed1,0.426829,-0.851371
Exposed2,0.853242,-0.158712
Exposed3,1.031579,0.031091
Exposed4,1.915493,0.649975
Exposed5,2.461538,0.900787
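
As a cross-check on the table above, the odds at each dose are the number of cases divided by the number of controls, and the log-odds are their natural logarithm. A minimal sketch, with `counts` and `odds_table` as illustrative names:

```python
import math

# Rows are dose levels in ascending order; columns are (control, case).
counts = [(82, 35), (293, 250), (190, 196), (71, 136), (13, 32)]

# Odds of being a case at each dose level, and their natural log.
odds_table = [(case / control, math.log(case / control)) for control, case in counts]

for level, (odds, log_odds) in enumerate(odds_table, start=1):
    print(f"Exposed{level} {odds:.6f} {log_odds:.6f}")
```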


### Chi-squared test of no linear trend

In [9]:
epyn.chisq_lineartrend(resarr)

Unnamed: 0,chisq,pval
result,43.83024,3.581287e-11
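
The reported p-value is consistent with referring the statistic to a chi-squared distribution with one degree of freedom. That reading is inferred from the two numbers above and is not documented `opyn` behaviour. For one degree of freedom the upper tail has a closed form, so it can be checked with the standard library alone:

```python
import math

chisq = 43.83024  # test statistic reported above

# For a chi-squared variable with 1 degree of freedom,
# P(X > x) = erfc(sqrt(x / 2)).
pval = math.erfc(math.sqrt(chisq / 2))
print(f"{pval:.6e}")
```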


## Discussion