-
Notifications
You must be signed in to change notification settings - Fork 2
/
ash-depletion.Rmd
41 lines (30 loc) · 1.5 KB
/
ash-depletion.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
---
title: "ash-depletion"
author: "stephens999"
date: "2018-08-30"
output: workflowr::wflow_html
---
## Introduction
Let $J$ denote the number of classes (eg 4 for DNA data).
Suppose we have observed $J$-vectors of counts $x =(x_1,\dots,x_J)$ and $y=(y_1,\dots,y_J)$, with
$$x|p \sim Mult(n_p,p)$$ and
$$y|q \sim Mult(n_q,q).$$
If $n_p,n_q$ are large it is natural to use a Poisson approximation:
$$x_j \sim Poi(n_p p_j); y_j \sim Poi(n_q q_j)$$
from which we have:
$$x_j | (x_j+y_j) \sim Bin(x_j+y_j, \rho_j) \quad [*]$$
where
$$\rho_j = n_p p_j / (n_p p_j + n_q q_j).$$
Now note that
$$\log[\rho_j/(1-\rho_j)] = n_p p_j / n_q q_j = \log[n_p/n_q] + \log[p_j/q_j] \quad [**]$$
So estimating $\log(p_j/q_j)$ is effectively the same problem as estimating $\log(\rho_j/(1-\rho_j))$.
Now a natural esimate of $\log(\rho_j/(1-\rho_j))$ from [*] is $\log(x_j/y_j)$, but that does not work when either $x_j$ or $y_j$
is 0.
We had exactly this problem in smash (Xing and Stephens).
In that paper (section B.1) we developed a solution, which gives
an estimator for $\log(\rho_j/(1-\rho_j))$ and its standard error.
So the idea is we can use that estimator (subtracting $\log[n_p/n_q]$ as in [**]) as an estimator of $\log[p_j/q_j]$.
We also have standard errors, and can thus shrink these using ashr (estimating the mode using `mode="estimate"`).
This gives us shrunken estimates of $\log[p_j/q_j]$, and note
that the shrinkage will be strongest for those with large se,
which is the ones with small counts (especially 0s!)