-
Notifications
You must be signed in to change notification settings - Fork 86
/
likelihood_ratio_simple_continuous_data.Rmd
127 lines (90 loc) · 6.39 KB
/
likelihood_ratio_simple_continuous_data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
---
title: "The likelihood ratio for continuous data"
author: "Matthew Stephens"
date: 2016-01-07
---
**Last updated:** `r Sys.Date()`
**Code version:** `r system("git log -1 --format='%H'", intern = TRUE)`
```{r chunk-options, include=FALSE}
source("chunk-options.R")
```
# Summary
This document introduces the likelihood ratio for continuous data and models, and explains its connection with discrete models.
# Pre-requisites
Be familiar with the [likelihood ratio for discrete data](likelihood_ratio_simple_models.html)
# Definition
Recall that if models $M_0$ and $M_1$ are fully-specified
model for discrete data $X=x$,
with probability mass functions $p(\cdot | M_0)$ and $p(\cdot | M_1)$, then
the likelihood ratio for $M_1$ vs $M_0$ is defined as
$$LR(M_1,M_0) := p(x | M_1)/p(x | M_0).$$
Now suppose that the data and models are continuous. So instead of a probability *mass* function, each model has a probability
*density* function. Then the likelihood ratio
for $M_1$ vs $M_0$ is usually defined as the ratio of the probability density functions. That is, we have exactly the same expression for the LR,
$$LR(M_1,M_0) := p(x | M_1)/p(x | M_0)$$
but now $p(\cdot | M_1)$ and $p(\cdot | M_0)$ are probability density functions instead of probability mass functions.
# Example
A medical screening test for a disease involves measuring the concentration ($X$) of a protein in the blood. In normal individuals $X$ has a Gamma distribution with mean 1 and shape 2.
In diseased individuals the protein becomes elevated, and $X$ has a Gamma distribution with mean 2 and shape 2. Plotting the probability density functions of these distributions yields:
```{r}
x = seq(0,10,length=100)
plot(x, dgamma(x,scale = 0.5,shape = 2), type="l", xlab="protein concentration")
lines(x, dgamma(x,scale = 1,shape = 2), type="l", col="red")
```
Suppose that for a particular patient we observe $X=4.02$. Then the likelihood ratio for
the model that this patient is from the normal group ($M_n$) vs the model
that the patient is from the diseased group ($M_d$) is ``dgamma(4.02,scale=0.5,shape=2)/dgamma(4.02,scale=1,shape=2)``
which is ``r signif(dgamma(4.02,scale=0.5,shape=2)/dgamma(4.02,scale=1,shape=2),3)``.
That is, the data favour this individual being diseased by a factor of approximately
``r signif(dgamma(4.02,scale=1,shape=2)/dgamma(4.02,scale=0.5,shape=2),2)``.
# Connection with Discrete Models
Often the likelihood ratio for continuous models is simply *defined* as the ratio of the
densities, as above. However, an alternative approach, which can yield greater insight, is instead to *derive* this result as an approximation, from the definition of likelihood ratio for discrete models, as follows.
The first step is to recognize that in practice all observations are actually discrete, because
of finite precision. Sometimes the measurement precision is made explicit, but often
it is implicit in the number of decimal places used to report an observation. For example, in the example above, where we were told that we observed a protein concentration of $X=4.02$, it would be reasonable to think that the measurement precision
is 2 decimal places, and that this observation actually corresponds to "$X$ lies in the interval $[4.015,4.025)$". The probability of this observation, under a continuous model for $X$, is the integral of the probability density function from $4.015$ to $4.025$. In other words, it is$F_X(4.025)-F_X(4.015)$ where $F_X$ denotes the cumulative distribution function for $X$.
With this view, the likelihood for the "observation" $X=4.02$ under $M_n$ is actually
``pgamma(4.025,scale=0.5,shape=2)-pgamma(4.015,scale=0.5,shape=2)`` = ``r pgamma(4.025,scale=0.5,shape=2)-pgamma(4.015,scale=0.5,shape=2)``.
Similarly, the likelihood under $M_d$ is ``r pgamma(4.025,scale=1,shape=2)-pgamma(4.015,scale=1,shape=2)``, and the likelihood ratio is
``r (pgamma(4.025,scale=0.5,shape=2)-pgamma(4.015,scale=0.5,shape=2))/(pgamma(4.025,scale=1,shape=2)-pgamma(4.015,scale=1,shape=2))``.
As you can see, this approach yields a LR that is numerically very close to that obtained using the ratio of the densities, as above. This is not a coincidence! Here is why we should expect this to happen more generally. Suppose we assume that measurement precision is $\epsilon$. So the "observation" $X=x$ really means $X \in [x-\epsilon,x+\epsilon]$.
Then the likelihood for a model $M$, given this observation,
is $\Pr(X \in [x-\epsilon,x+\epsilon] | M)$.
Provided that the density $p(x|M)$ is approximately constant in the region within radius $\epsilon$ around $x$, then this probability is approximately $2\epsilon p(x | M)$. Thus the LR for two models $M_1$ vs $M_0$, is given by
$$LR = \Pr(X \in [x-\epsilon,x+\epsilon] | M_1)/ \Pr(X \in [x-\epsilon,x+\epsilon] | M_0)
\approx 2\epsilon p(x | M_1)/2\epsilon p(x|M_0) = p(x|M_1)/p(x|M_0).$$
# An example where the approximation breaks down
The approximation usually works well, but
here is a simple example to illustrate how the approximation could break down in principle.
Consider observing a single datapoint $X$ and we compare the models that
$M_0: X \sim N(0,\sigma_0)$ vs $M_1: X \sim N(0,\sigma_1)$. Suppose that we observe $X=0.00$, assumed to be correct to the nearest 0.01. So the ``true" LR is
given by
```{r}
trueLR = function(s0,s1){
L0= pnorm(0.005,sd=s0)-pnorm(-0.005,sd=s0)
L1= pnorm(0.005,sd=s1)-pnorm(-0.005,sd=s1)
return(L0/L1)
}
```
and the approximation is given by
```{r}
approxLR = function(s0,s1){
return(dnorm(0,sd=s0)/dnorm(0,sd=s1))
}
```
Now, if $\sigma_0$ and $\sigma_1$ are both not too small the the approximation works fine. For example, for $\sigma_0,\sigma_1 = 0.5,1$ we have the truth and approximation as `r trueLR(0.5,1)` and `r approxLR(0.5,1)`.
But if one of the $\sigma_j$ is small, we have the problem that the density is not approximately constant within the region $[-0.005,0.005]$. For example,
at $\sigma_0,\sigma_1 = 0.001,1$ we have the
truth and approximation as `r trueLR(0.001,1)` and `r approxLR(0.001,1)`.
# Summary
In most cases, the Likelihood ratio for model $M_1$ vs model $M_0$ for a continuous
random variable $X$, given an observation
$X=x$, can we well approximated by
the ratio of the model densities of $X$, evaluated at $x$.
This approximation comes from assuming that the model density functions are
approximately constant within the neighborhood of $x$ that has radius equal to the measurement precision.
## Session information
```{r info}
sessionInfo()
```