# Odds Ratios

![](https://i.ytimg.com/vi/wJXaroDs9oo/maxresdefault.jpg)

http://dept.stat.lsa.umich.edu/~kshedden/Python-Workshop/stats_calculations.html

The odds ratio is a measure of dependence between two binary variables. Suppose X and Y are two binary data values, jointly observed on each observed unit. For example, X could be a person's gender (coded 0/1) and Y could be the person's political affiliation (coded 0/1).

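If $n_{ij}$ is the number of units observed with $X=i$ and $Y=j$ (for $i,j=0,1$), then the odds ratio is

$$\frac{n_{00}\, n_{11}}{n_{10}\, n_{01}}.$$

The code below works with the log odds ratio, $\log n_{00} + \log n_{11} - \log n_{01} - \log n_{10}$.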

In [1]:
import numpy as np
from scipy.stats.distributions import norm

In [6]:
N = np.array([[35,23], [20,29]])
## Log odds ratio: log(n00) + log(n11) - log(n01) - log(n10)
LOR = np.log(N[0,0]) + np.log(N[1,1]) - np.log(N[0,1]) - np.log(N[1,0])
print("%.4f" % LOR)

0.7914
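As a quick cross-check, the odds ratio can also be computed directly from the cross-product of the counts and compared with the exponential of the log odds ratio. The sketch below assumes the same 2x2 table of counts as in the example; the names `odds_ratio` and `log_odds_ratio` are illustrative only.

```python
import numpy as np

# Same 2x2 table of counts as in the example: rows index X, columns index Y
N = np.array([[35, 23], [20, 29]])

# Cross-product form of the odds ratio: (n00 * n11) / (n10 * n01)
odds_ratio = (N[0, 0] * N[1, 1]) / (N[1, 0] * N[0, 1])

# Log odds ratio computed as a sum and difference of logs
log_odds_ratio = (np.log(N[0, 0]) + np.log(N[1, 1])
                  - np.log(N[0, 1]) - np.log(N[1, 0]))

# The two routes should agree up to floating-point error
assert np.isclose(np.exp(log_odds_ratio), odds_ratio)
print("odds ratio: %.3f, log odds ratio: %.3f" % (odds_ratio, log_odds_ratio))
```

For this table the odds ratio is greater than 1, so the log odds ratio is positive.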

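The standard error of the estimated log odds ratio is approximately $\sqrt{1/n_{00} + 1/n_{01} + 1/n_{10} + 1/n_{11}}$, which for the table above is: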



In [7]:
SE = np.sqrt(np.sum(1/N.astype(np.float64)))
SE

0.39564181788795205
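The standard error can be turned into an approximate confidence interval. The following is a minimal sketch, assuming the usual normal approximation for the log odds ratio and the same 2x2 table as above; the lowercase names `lor`, `se`, `lcl` and `ucl` are chosen here for illustration.

```python
import numpy as np
from scipy.stats import norm

# Same 2x2 table of counts, stored as floats so that 1 / N is well defined
N = np.array([[35, 23], [20, 29]], dtype=np.float64)

# Log odds ratio and its estimated standard error
lor = np.log(N[0, 0]) + np.log(N[1, 1]) - np.log(N[0, 1]) - np.log(N[1, 0])
se = np.sqrt(np.sum(1 / N))

# Two-sided 95% interval: the multiplier is the 97.5% normal quantile (about 1.96)
z = norm.ppf(0.975)
lcl, ucl = lor - z * se, lor + z * se

print("95%% CI for the log odds ratio: (%.3f, %.3f)" % (lcl, ucl))
# Exponentiating the endpoints gives an interval on the odds ratio scale
print("95%% CI for the odds ratio: (%.3f, %.3f)" % (np.exp(lcl), np.exp(ucl)))
```

Building the interval on the log scale and exponentiating the endpoints keeps the interval for the odds ratio strictly positive.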

Here is a simulation study assessing the performance of the 95% confidence interval based on this standard error. We also look at the bias of the estimated log odds ratio.

In [12]:
## Simulation study for the log odds ratio

from scipy.stats import rv_discrete

## Cell probabilities
P = np.array([[0.3,0.2],[0.1,0.4]])

## The population log odds ratio
PLOR = np.log(P[0,0]) + np.log(P[1,1]) - np.log(P[0,1]) - np.log(P[1,0])

## Sample size
n = 100

## Number of simulated data sets
nrep = 1000

## ravel flattens the 2x2 array row by row, so values 0,1,2,3 correspond to cells (0,0), (0,1), (1,0), (1,1)
m = rv_discrete(values=((0,1,2,3), P.ravel()))

## Generate the data
D = m.rvs(size=(nrep,n))
D

array([[0, 1, 1, ..., 1, 2, 3],
       [3, 0, 1, ..., 3, 0, 3],
       [0, 3, 1, ..., 3, 1, 2],
       ...,
       [3, 3, 1, ..., 3, 3, 3],
       [3, 3, 3, ..., 0, 3, 1],
       [3, 1, 0, ..., 0, 3, 0]])

In [15]:
## Convert to cell counts
Q = np.zeros((nrep,4))
for j in range(4):
    Q[:,j] = (D == j).sum(1)
Q

array([[25., 18., 11., 46.],
       [34., 15., 10., 41.],
       [29., 14., 12., 45.],
       ...,
       [26., 17., 13., 44.],
       [29., 21.,  7., 43.],
       [34., 20.,  6., 40.]])
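The two steps above (drawing individual observations, then tallying them) could also be collapsed into one, since the four cell counts of each data set follow a multinomial distribution. This is an alternative sketch, not the approach used in this notebook; it assumes NumPy's `Generator` API, and the seed and the name `counts` are arbitrary choices made here.

```python
import numpy as np

# Cell probabilities, sample size and number of replications as in the study
P = np.array([[0.3, 0.2], [0.1, 0.4]])
n = 100
nrep = 1000

# Seed chosen arbitrarily so that the sketch is reproducible
rng = np.random.default_rng(0)

# Each row holds the counts for cells (0,0), (0,1), (1,0), (1,1) of one data set
counts = rng.multinomial(n, P.ravel(), size=nrep)

print(counts[:3])
```

Each row of `counts` sums to `n`, and the columns are in the same order as the columns of `Q`.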

In [16]:
## Calculate the log odds ratio
LOR = np.log(Q[:,0]) + np.log(Q[:,3]) - np.log(Q[:,1]) - np.log(Q[:,2])

## The standard error
SE = np.sqrt((1/Q.astype(np.float64)).sum(1))

print( "The mean estimated standard error is %.3f" % SE.mean())
print( "The standard deviation of the estimates is %.3f" % LOR.std())


The mean estimated standard error is 0.473
The standard deviation of the estimates is 0.504


In [17]:
## Approximate 95% confidence intervals (2 is used in place of the normal quantile 1.96)
LCL = LOR - 2*SE
UCL = LOR + 2*SE

## Coverage probability
CP = np.mean((PLOR > LCL) & (PLOR < UCL))

print( "The population LOR is %.2f" % PLOR)
print( "The expected value of the estimated LOR is %.2f" % LOR[np.isfinite(LOR)].mean())
print( "The coverage probability of the 95%% CI is %.3f" % CP)

The population LOR is 1.79
The expected value of the estimated LOR is 1.86
The coverage probability of the 95% CI is 0.950
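Because the coverage probability is itself estimated from a finite number of simulated data sets, it carries some Monte Carlo error. As a rough sketch, assuming the replications are independent so that the number of covering intervals is binomial, that error can be gauged as follows (the names `cp` and `mc_se` are illustrative):

```python
import numpy as np

nrep = 1000   # number of simulated data sets used above
cp = 0.950    # coverage estimate reported above

# Binomial standard error of a proportion estimated from nrep replications
mc_se = np.sqrt(cp * (1 - cp) / nrep)

print("Monte Carlo standard error of the coverage: %.4f" % mc_se)
```

With 1000 replications the standard error is about 0.007, so an estimate of 0.950 pins the true coverage down only to within roughly 0.014 in either direction.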
