# Measures
This tutorial describes some of the basic measures used in epidemiology. Throughout, we will refer to them as measures of association rather than measures of effect, since we do not assume that these observational estimates have a causal interpretation.

In the following example, we will use a sample data set included with *zEpid*. We will be interested in the association of antiretroviral therapy (``art``) with all-cause mortality (``dead``). This data set will be referred to throughout the remainder of the guide. The specific measures of association we calculate refer to the measure at 45 weeks. We will define ``art`` as $A$, where $A=1$ for treated with ART and $A=0$ for not treated with ART, and ``dead`` as $Y$, where $Y=1$ is died by 45 weeks and $Y=0$ is survived until 45 weeks. $\Pr(.)$ denotes the probability function, with $\Pr(C|D)$ as the conditional probability of $C$ given $D$, and $\widehat{\Pr}(.)$ is the estimated probability.

In [1]:
from zepid import load_sample_data

df = load_sample_data(timevary=False)
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 547 entries, 0 to 546
Data columns (total 9 columns):
id          547 non-null int64
male        547 non-null int64
age0        547 non-null int64
cd40        547 non-null int64
dvl0        547 non-null int64
art         547 non-null int64
dead        517 non-null float64
t           547 non-null float64
cd4_wk45    460 non-null float64
dtypes: float64(3), int64(6)
memory usage: 42.7 KB


As you can see, 30 observations are missing the ``dead`` variable. We will ignore missing data throughout this tutorial. For how to deal with missing data, please refer to the guide on inverse probability of missing weights.

## Risk Ratio
First we will calculate the risk ratio. The risk ratio is defined as

$$\widehat{RR} = \frac{\widehat{\Pr}(Y=1|A=1)}{\widehat{\Pr}(Y=1|A=0)}$$

To calculate this quantity in *zEpid* we will use the following code

In [2]:
from zepid import RiskRatio

rr = RiskRatio()
rr.fit(df, exposure='art', outcome='dead')
rr.summary()  # Prints the summary data

Comparison:0 to 1
+-----+-------+-------+
|     |   D=1 |   D=0 |
| E=1 |    10 |    67 |
+-----+-------+-------+
| E=0 |    77 |   363 |
+-----+-------+-------+ 

                            Risk Ratio                                
        Risk  SD(Risk)  Risk_LCL  Risk_UCL
Ref:0  0.175     0.018     0.139     0.211
1      0.130     0.038     0.055     0.205
----------------------------------------------------------------------
       RiskRatio  SD(RR)  RR_LCL  RR_UCL
Ref:0      1.000     NaN     NaN     NaN
1          0.742   0.313   0.402    1.37
----------------------------------------------------------------------
Missing E:    0
Missing D:    30
Missing E&D:  0


From the corresponding output, we see that the 45-week risk of all-cause mortality among those given ART was 0.74 (95% CL: 0.40, 1.37) times that of those not given ART. In addition, the output provides the two-by-two table, the risks by exposure, and a count of missing data.
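As a check on the definition, the risk ratio can be reproduced by hand from the two-by-two table printed above. The sketch below is plain Python, not a *zEpid* call. The function name `risk_ratio` is ours, and it assumes a Wald-type interval on the log scale with a fixed 1.96 critical value, so the last digit may differ slightly from the library output.

```python
import math

def risk_ratio(a, b, c, d, z=1.96):
    """Risk ratio with a Wald interval computed on the log scale.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    z:    critical value (1.96 assumed for a 95% interval)
    """
    risk1 = a / (a + b)  # estimated Pr(Y=1 | A=1)
    risk0 = c / (c + d)  # estimated Pr(Y=1 | A=0)
    rr = risk1 / risk0
    # Standard error of log(RR)
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lcl = math.exp(math.log(rr) - z * se_log)
    ucl = math.exp(math.log(rr) + z * se_log)
    return rr, lcl, ucl

# Counts taken from the two-by-two table in the summary output
rr_est, rr_lcl, rr_ucl = risk_ratio(a=10, b=67, c=77, d=363)
print(f"RR = {rr_est:.3f} (95% CL: {rr_lcl:.3f}, {rr_ucl:.3f})")
```

The rounded values should agree with the `RiskRatio`, `RR_LCL`, and `RR_UCL` columns shown above.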

## Risk Difference
Similarly, we can calculate the risk difference as 

$$\widehat{RD} = \widehat{\Pr}(Y=1|A=1) - \widehat{\Pr}(Y=1|A=0)$$

To calculate the risk difference, we use the following code

In [3]:
from zepid import RiskDifference

rd = RiskDifference()
rd.fit(df, exposure='art', outcome='dead')
rd.summary()  # Prints the summary data

Comparison:0 to 1
+-----+-------+-------+
|     |   D=1 |   D=0 |
| E=1 |    10 |    67 |
+-----+-------+-------+
| E=0 |    77 |   363 |
+-----+-------+-------+ 

                            Risk Ratio                                
        Risk  SD(Risk)  Risk_LCL  Risk_UCL
Ref:0  0.175     0.018     0.139     0.211
1      0.130     0.038     0.055     0.205
----------------------------------------------------------------------
       RiskDifference  SD(RD)  RD_LCL  RD_UCL
Ref:0           0.000     NaN     NaN     NaN
1              -0.045   0.042  -0.128   0.038
----------------------------------------------------------------------
       RiskDifference    CLD  LowerBound  UpperBound
Ref:0           0.000    NaN         NaN         NaN
1              -0.045  0.166       -0.87        0.13
----------------------------------------------------------------------
Missing E:    0
Missing D:    30
Missing E&D:  0


The 45-week risk of all-cause mortality among those given ART was 4.5 percentage points lower (95% CL: -12.8, 3.8) than among those not treated with ART. Again, we can see the two-by-two table, corresponding risk estimates, and information on missing data.

You may have noticed that `RiskDifference` produces some additional output. Specifically, it generates something called `LowerBound` and `UpperBound`. These are the Fréchet probability bounds. Their width will always be 1, but they are useful conceptually. These bounds contain the true risk difference *without needing the exchangeability assumption*. They do assume no measurement error and causal consistency.
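The point estimate and confidence limits of the risk difference can also be verified by hand. The following is a minimal sketch in plain Python, assuming a Wald interval on the difference scale with a 1.96 critical value. The function name `risk_difference` is ours and is not the *zEpid* implementation. The probability bounds are not reproduced here.

```python
import math

def risk_difference(a, b, c, d, z=1.96):
    """Risk difference with a Wald interval.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    n1, n0 = a + b, c + d
    risk1, risk0 = a / n1, c / n0
    rd = risk1 - risk0
    # Sum of the binomial variances of the two risks
    se = math.sqrt(risk1 * (1 - risk1) / n1 + risk0 * (1 - risk0) / n0)
    return rd, rd - z * se, rd + z * se

# Counts taken from the two-by-two table in the summary output
rd_est, rd_lcl, rd_ucl = risk_difference(a=10, b=67, c=77, d=363)
print(f"RD = {rd_est:.3f} (95% CL: {rd_lcl:.3f}, {rd_ucl:.3f})")
```

The difference between the two limits is the value reported in the `CLD` (confidence limit difference) column.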

## Odds Ratio
The odds ratio is defined as

$$\widehat{OR} = \frac{\frac{\widehat{\Pr}(Y=1|A=1)}{\widehat{\Pr}(Y=0|A=1)}}{\frac{\widehat{\Pr}(Y=1|A=0)}{\widehat{\Pr}(Y=0|A=0)}}$$

To calculate the odds ratio, we use the following code

In [4]:
from zepid import OddsRatio

oddr = OddsRatio()
oddr.fit(df, exposure='art', outcome='dead')
oddr.summary()  # Prints the summary data

Comparison:0 to 1
+-----+-------+-------+
|     |   D=1 |   D=0 |
| E=1 |    10 |    67 |
+-----+-------+-------+
| E=0 |    77 |   363 |
+-----+-------+-------+ 

                           Odds Ratio                                 
       OddsRatio  SD(OR)  OR_LCL  OR_UCL
Ref:0      1.000     NaN     NaN     NaN
1          0.704   0.361   0.346   1.429
----------------------------------------------------------------------
Missing E:    0
Missing D:    30
Missing E&D:  0


The 45-week odds of all-cause mortality among those treated with ART were 0.70 (95% CL: 0.35, 1.43) times those of individuals not treated with ART.
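For the odds ratio, the conditional probabilities in the definition cancel down to the cross-product of the table cells. Here is a short hand calculation in plain Python, assuming the usual Wald interval on the log scale. The function name `odds_ratio` is ours, for illustration only.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio (cross-product ratio) with a Wald interval on the log scale.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR): root of the summed reciprocal cell counts
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lcl = math.exp(math.log(estimate) - z * se_log)
    ucl = math.exp(math.log(estimate) + z * se_log)
    return estimate, lcl, ucl

# Counts taken from the two-by-two table in the summary output
or_est, or_lcl, or_ucl = odds_ratio(a=10, b=67, c=77, d=363)
print(f"OR = {or_est:.3f} (95% CL: {or_lcl:.3f}, {or_ucl:.3f})")
```

Because the outcome is not rare here (risks of 13% and 17.5%), the odds ratio of 0.70 sits somewhat further from the null than the risk ratio of 0.74.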

## Number Needed to Treat
Number needed to treat (NNT) is a specialized measure that is meant to be more interpretable. It differs from the measures above because its interpretation implies a causal effect. For the purposes of this tutorial, we will proceed with the calculation. However, in practice you would need to believe that your association is truly causation. To calculate the NNT, you take the inverse of the risk difference

$$\widehat{NNT} = \widehat{RD}^{-1} = \left(\widehat{\Pr}(Y=1|A=1) - \widehat{\Pr}(Y=1|A=0)\right)^{-1}$$

To calculate the NNT, we use the following code

In [5]:
from zepid import NNT

nnt = NNT()
nnt.fit(df, exposure='art', outcome='dead')
nnt.summary()

                     Number Needed to Treat/Harm                      
Number Needed to Treat:  22.158
----------------------------------------------------------------------
95.0% two-sided CI: 
NNT  7.801 to infinity to NNH  26.368
----------------------------------------------------------------------
Missing E:    0
Missing D:    30
Missing E&D:  0


To prevent one death by 45 weeks, we would need to treat 23 individuals with ART (22.158 rounded up to the next whole person). Notice that this interpretation inherently implies a causal effect. As such, NNT should be restricted to scenarios where you believe the association is actually causation (see the parts of the guide on causal inference).

In the above output, you will note that the confidence interval goes from NNT 7.80 to infinity to NNH (number needed to harm) 26.37. This confidence interval occurs because the confidence interval for the risk difference includes the null value ($RD=0$). *zEpid* produces confidence intervals as advocated by the late Douglas Altman (Altman, DG. BMJ 1998). Infinity occurs because at a risk difference of zero, the NNT becomes $\frac{1}{0}$.
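To see where these numbers come from, the NNT and its interval can be rebuilt from the risk difference and its confidence limits. The sketch below is plain Python under two assumptions of ours: a Wald interval with a 1.96 critical value, and a harmful outcome, so that a negative risk difference is labeled NNT (benefit) and a positive one NNH (harm). It is not the *zEpid* implementation.

```python
import math

# Counts from the two-by-two table: exposed (a, b), unexposed (c, d)
a, b, c, d = 10, 67, 77, 363
risk1, risk0 = a / (a + b), c / (c + d)

rd = risk1 - risk0
se = math.sqrt(risk1 * (1 - risk1) / (a + b) + risk0 * (1 - risk0) / (c + d))
rd_lcl, rd_ucl = rd - 1.96 * se, rd + 1.96 * se

# NNT is the inverse of the absolute risk difference
nnt = 1 / abs(rd)

if rd_lcl < 0 < rd_ucl:
    # Interval for RD includes zero, so the NNT interval passes through infinity
    interval = (f"NNT {1 / abs(rd_lcl):.2f} to infinity "
                f"to NNH {1 / abs(rd_ucl):.2f}")
else:
    lower, upper = sorted((1 / abs(rd_lcl), 1 / abs(rd_ucl)))
    interval = f"{lower:.2f} to {upper:.2f}"

print(f"NNT = {nnt:.3f}; treat {math.ceil(nnt)} people to prevent one death")
print(interval)
```

Each limit of the NNT interval is simply the inverse of the corresponding risk difference limit, which is why an interval that spans $RD=0$ must run through infinity.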

## Incidence Rate Ratio
In the previous measures, the denominator has been some form of person counts. For incidence rates, the denominator becomes the person-time contributed. In this example, we will go through the incidence rate ratio. The incidence rate ratio is defined as 

$$\widehat{IRR} = \frac{\frac{a}{T_1}}{\frac{b}{T_0}}$$

where $a$ is the number of individuals who were given ART and died, $T_1$ is the person-time contributed by individuals treated with ART, $b$ is the number of individuals who were not given ART and died, and $T_0$ is the person-time contributed by individuals not treated with ART. The incidence rate ratio assumes that event times follow an exponential distribution, meaning the hazards are constant over time. This assumption may be more or less reasonable. If it is unreasonable, survival analysis methods like Kaplan-Meier should be used instead.

To calculate the incidence rate ratio, we use the following code

In [6]:
from zepid import IncidenceRateRatio

irr = IncidenceRateRatio()
irr.fit(df, exposure='art', outcome='dead', time='t')
irr.summary()

Comparison:0 to 1
+-----+-------+---------------+
|     |   D=1 |   Person-time |
| E=1 |    10 |       3094.05 |
+-----+-------+---------------+
| E=0 |    77 |      17962.4  |
+-----+-------+---------------+ 

                      Incidence Rate Ratio                            
       IncRate  SD(IncRate)  IncRate_LCL  IncRate_UCL
Ref:0    0.004        0.000        0.003        0.005
1        0.003        0.001        0.001        0.005
----------------------------------------------------------------------
       IncRateRatio  SD(IRR)  IRR_LCL  IRR_UCL
Ref:0         1.000      NaN      NaN      NaN
1             0.754    0.336     0.39    1.457
----------------------------------------------------------------------
Missing E:    0
Missing D:    30
Missing E&D:  0
Missing T:    0


Notice that the incidence rate ratio additionally requires the ``time`` argument, which names the column containing the person-time contributed by person $i$.
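The incidence rate ratio can be checked by hand from the event counts and person-time totals in the table above. This is a minimal plain-Python sketch, assuming a Wald interval on the log scale. The function name `incidence_rate_ratio` is ours, and the person-time totals are the rounded values from the printed output, so small discrepancies are possible.

```python
import math

def incidence_rate_ratio(a, t1, b, t0, z=1.96):
    """Incidence rate ratio with a Wald interval on the log scale.

    a, t1: events and person-time among the exposed
    b, t0: events and person-time among the unexposed
    """
    irr = (a / t1) / (b / t0)
    # Standard error of log(IRR) depends only on the event counts
    se_log = math.sqrt(1 / a + 1 / b)
    lcl = math.exp(math.log(irr) - z * se_log)
    ucl = math.exp(math.log(irr) + z * se_log)
    return irr, lcl, ucl

# Events and person-time as printed in the summary table
irr_est, irr_lcl, irr_ucl = incidence_rate_ratio(a=10, t1=3094.05,
                                                 b=77, t0=17962.4)
print(f"IRR = {irr_est:.3f} (95% CL: {irr_lcl:.3f}, {irr_ucl:.3f})")
```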

## Incidence Rate Difference
Similarly, the incidence rate difference is defined in terms of person-time as well. It is defined as 

$$\widehat{IRD} = \frac{a}{T_1} - \frac{b}{T_0}$$

To calculate the incidence rate difference, we use the following code

In [7]:
from zepid import IncidenceRateDifference

ird = IncidenceRateDifference()
ird.fit(df, exposure='art', outcome='dead', time='t')
ird.summary()

Comparison:0 to 1
+-----+-------+---------------+
|     |   D=1 |   Person-time |
| E=1 |    10 |       3094.05 |
+-----+-------+---------------+
| E=0 |    77 |      17962.4  |
+-----+-------+---------------+ 

                    Incidence Rate Difference                         
       IncRate  SD(IncRate)  IncRate_LCL  IncRate_UCL
Ref:0    0.004        0.000        0.003        0.005
1        0.003        0.001        0.001        0.005
----------------------------------------------------------------------
       IncRateDiff  SD(IRD)  IRD_LCL  IRD_UCL
Ref:0        0.000      NaN      NaN      NaN
1           -0.001    0.001   -0.003    0.001
----------------------------------------------------------------------
Missing E:    0
Missing D:    30
Missing E&D:  0
Missing T:    0
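The incidence rate difference can be checked in the same way from the printed events and person-time. The sketch below is plain Python, assuming a Wald interval on the difference scale. The function name `incidence_rate_difference` is ours, and the inputs are the rounded totals from the output table.

```python
import math

def incidence_rate_difference(a, t1, b, t0, z=1.96):
    """Incidence rate difference with a Wald interval.

    a, t1: events and person-time among the exposed
    b, t0: events and person-time among the unexposed
    """
    ird = a / t1 - b / t0
    # Variance of each rate is events divided by squared person-time
    se = math.sqrt(a / t1 ** 2 + b / t0 ** 2)
    return ird, ird - z * se, ird + z * se

# Events and person-time as printed in the summary table
ird_est, ird_lcl, ird_ucl = incidence_rate_difference(a=10, t1=3094.05,
                                                      b=77, t0=17962.4)
print(f"IRD = {ird_est:.4f} (95% CL: {ird_lcl:.4f}, {ird_ucl:.4f})")
```

Printing four decimal places shows more detail than the three-decimal summary table, where the estimate and both limits round to -0.001, -0.003, and 0.001.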


## Conclusion
In this tutorial I demonstrated the calculation of several common epidemiologic measures: the risk ratio, risk difference, odds ratio, number needed to treat, incidence rate ratio, and incidence rate difference. Please view the other tutorials for more information on functions in *zEpid*.