Joint probabilities for pair emission? #29

psathyrella · 2014-12-19T20:12:31Z

The basic situation

We have a per-site mutation frequency, f (the fraction of observed sequences that have a mutation at the site), and we want to fill in the 4x4 table of pair emission probabilities in the HMM.

Independent emissions

The simplest way to do this is to make each emission probability a product of two factors (one for each sequence), where each factor is f/3 if that sequence is mutated to a given non-germline base, and 1-f if it matches germline.

- Virtues: it is simple, and it appears empirically to be approximately correct, i.e. log Bayes factors of related (unrelated) sequences are greater (less) than zero.
- But: it ignores the fact that related sequences have correlated mutations.
- It looks like this (symmetric entries omitted for clarity):

| | germline | mutated | mutated | mutated |
|---|---|---|---|---|
| germline | (1-f)(1-f) | f(1-f)/3 | f(1-f)/3 | f(1-f)/3 |
| mutated | | f*f/9 | f*f/9 | f*f/9 |
| mutated | | | f*f/9 | f*f/9 |
| mutated | | | | f*f/9 |
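As a quick check of the table above, here is a minimal sketch that builds the full 4x4 table (including the symmetric entries) from f and a germline base. The function names and the example values (germline "A", f = 0.1) are made up for illustration and are not taken from the actual code.

```python
BASES = "ACGT"


def single_emission(base, germline, f):
    """P(base | germline) for one sequence: 1 - f if unmutated, f / 3 for each other base."""
    return 1.0 - f if base == germline else f / 3.0


def independent_pair_emission(germline, f):
    """Pair emission table as a product of two single-sequence factors."""
    return {
        (b1, b2): single_emission(b1, germline, f) * single_emission(b2, germline, f)
        for b1 in BASES
        for b2 in BASES
    }


# Illustrative values only: germline base "A", mutation frequency 0.1.
table = independent_pair_emission("A", 0.1)
total = sum(table.values())  # the 16 entries together form a probability distribution
```

The entries sum to one because (1-f)^2 + 2f(1-f) + f^2 = 1, which is one sanity check that the f/3 and f*f/9 factors are right.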

Joint emissions

So all we need to do to implement joint emission is fill in the entries of the matrix so that they take into account that if the two sequences are mutated to the same base, they're more likely to be clonally related. Except I haven't worked out a good way to do this. Everything I've tried requires assumptions about branch lengths and tree topology that do not always hold, so empirically the results end up not being that great.

Erick and I talked about this a few months ago. If memory serves, we got as far as him being convinced that it was non-trivial, but we didn't work out how to do it.

This is quite related to #8.


psathyrella · 2014-12-20T01:01:04Z

Parametrize the matrix with a bunch of free parameters and estimate them with Baum-Welch.
- It should be a sum of four matrices, one for each of the four bases.

matsen · 2014-12-20T01:46:31Z

For those of you, such as @vnminin, who might be following along: the idea is that the emission probabilities could be a mixture of two cases, namely that the two sequences are derived from a common mutant from germline, or that they are not. We can just assume that the common mutant from germline has a uniform base (which explains @psathyrella's "sum of four matrices").
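One possible reading of this mixture, sketched under stated assumptions: with weight w both sequences descend from a shared ancestor whose base is uniform over the four bases, and each sequence then differs from that ancestor with probability g. Otherwise they emit independently from germline. Both w and g are hypothetical parameters introduced here for illustration (in the proposal above they would be among the things to estimate), and nothing below is taken from the actual implementation.

```python
BASES = "ACGT"


def emit(base, parent, rate):
    """P(base | parent base): 1 - rate if unchanged, rate / 3 for each other base."""
    return 1.0 - rate if base == parent else rate / 3.0


def mixture_pair_emission(germline, f, w, g):
    """Mix independent emissions from germline with emissions from a shared ancestor.

    w and g are hypothetical parameters (not from the thread): w is the weight of
    the shared-ancestor case, g the mutation probability from ancestor to sequence.
    """
    table = {}
    for b1 in BASES:
        for b2 in BASES:
            independent = emit(b1, germline, f) * emit(b2, germline, f)
            # Sum of four matrices: one per possible ancestor base, each weighted 1/4.
            shared = sum(0.25 * emit(b1, x, g) * emit(b2, x, g) for x in BASES)
            table[(b1, b2)] = (1.0 - w) * independent + w * shared
    return table


# Illustrative values only.
joint = mixture_pair_emission("A", f=0.1, w=0.3, g=0.05)
indep = mixture_pair_emission("A", f=0.1, w=0.0, g=0.05)  # w = 0 recovers independence
```

With w > 0 the entries where both sequences carry the same non-germline base are larger than in the independent table, which is the qualitative behavior wanted. Note that in this sketch the per-sequence mutation frequency is no longer exactly f once w > 0, so the parameters would need to be calibrated against the observed f.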

psathyrella · 2014-12-23T01:38:23Z

Another thing I just realized is that if we stick with independent emissions the k-hmm is pretty easy, while if we do joint emission it'd be a colossal fisterclick.
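A rough sketch of why independence keeps this easy, reading "k-hmm" as an HMM that emits k sequences at once: the emission for k observed bases is just a product of k single-sequence factors, whereas a full joint table would need one entry per k-tuple of bases. The function names and example values are made up for illustration.

```python
import math


def single_emission(base, germline, f):
    """P(base | germline) for one sequence: 1 - f if unmutated, f / 3 for each other base."""
    return 1.0 - f if base == germline else f / 3.0


def independent_log_emission(bases, germline, f):
    """Log emission probability for k observed bases at one site, assuming independence.

    Requires 0 < f < 1 so that every factor is strictly positive.
    """
    return sum(math.log(single_emission(b, germline, f)) for b in bases)


# Illustrative values only: five sequences observed at a site with germline base "A".
k = 5
log_p = independent_log_emission("AACAG", "A", 0.1)
joint_table_entries = 4 ** k  # a full joint table needs one entry per k-tuple of bases
```

The independent version costs k lookups per site, while the joint table grows as 4^k and every entry would have to encode assumptions about how the k sequences are related.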

matsen · 2014-12-23T03:40:26Z

Amen to that.


Frederick "Erick" Matsen, Assistant Member
Fred Hutchinson Cancer Research Center
http://matsen.fhcrc.org/

matsen · 2016-03-17T23:14:31Z

Similar mutations should indicate shared ancestry, but Duncan correctly points out that clonal lineages may not have many shared mutations if the "trunk" is short.

psathyrella · 2016-04-06T20:22:40Z

very similar to #175

psathyrella · 2016-11-04T02:09:16Z

2bae2ea

may turn out to be the better way to handle this

psathyrella added prio:middling clustering labels Dec 20, 2014

psathyrella changed the title ~~Decide how to calculate pair emission probabilities~~ Do we want to use joint probabilities for pair emission? Dec 28, 2014

psathyrella added prio:low and removed clustering prio:middling labels Dec 28, 2014

psathyrella changed the title ~~Do we want to use joint probabilities for pair emission?~~ Joint probabilities for pair emission? May 28, 2015

matsen self-assigned this Mar 17, 2016

psathyrella mentioned this issue Apr 6, 2016

Investigate potential for using shared mutations for clustering #175

Open
