[hail] Fix crash in ld_prune because we weren't imputing missing GTs #7653
danking merged 3 commits into hail-is:master
Conversation
|
Unfortunately, I'm not sure. @jbloom22 or @alexb-3 -- do you remember: if you have a matrix of genotypes and you want to compute the Pearson correlation between variants, what is the right thing to do with missing values? Can you just mean impute for those? I looked at the local prune algorithm, and it looks like the missing values are mean imputed:
val gtMean = gtSum.toDouble / nPresent
val gtSumAll = gtSum + nMissing * gtMean
val gtSumSqAll = gtSumSq + nMissing * gtMean * gtMean
val gtCenteredLengthRec = 1d / math.sqrt(gtSumSqAll - (gtSumAll * gtSumAll / nSamples)) |
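For readers following along, here is a minimal Python sketch of what the quoted Scala arithmetic appears to compute. It is not Hail's code: the function name `standardize_mean_imputed` and the list-with-`None` input format are assumptions made for illustration. The point it shows is that a mean-imputed call becomes exactly 0 after centering, and that the resulting vector has unit length.

```python
import math

def standardize_mean_imputed(gts):
    """Center and scale one variant's calls, mean-imputing missing ones.

    `gts` is a list of 0/1/2 alternate-allele counts, with None for a
    missing call (a hypothetical input format for this sketch).
    """
    present = [g for g in gts if g is not None]
    n_samples = len(gts)
    n_present = len(present)
    n_missing = n_samples - n_present

    gt_sum = sum(present)
    gt_sum_sq = sum(g * g for g in present)

    # Same arithmetic as the quoted snippet, with snake_case names.
    gt_mean = gt_sum / n_present
    gt_sum_all = gt_sum + n_missing * gt_mean
    gt_sum_sq_all = gt_sum_sq + n_missing * gt_mean * gt_mean
    # A variant with no variation makes the radicand zero and fails here;
    # real code would need to filter such variants out first.
    gt_centered_length_rec = 1.0 / math.sqrt(
        gt_sum_sq_all - gt_sum_all * gt_sum_all / n_samples
    )

    # An imputed entry equals the mean, so it is exactly 0 after centering.
    return [
        0.0 if g is None else (g - gt_mean) * gt_centered_length_rec
        for g in gts
    ]

vec = standardize_mean_imputed([0, 1, 2, None, 1])
```

With vectors scaled this way, the dot product of two variants' vectors is the Pearson correlation of their mean-imputed genotypes.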
Yeah, that's why I went with this solution; it seemed consistent. |
|
although, wait, it looks like this also standardizes... |
|
I think this will deflate the value; the better thing would be to omit
terms where either genotype is missing, and then normalize by N_nonmissing.
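To make the "deflate" concern concrete, the sketch below compares the two treatments on a toy pair of variants. It takes one reading of the suggestion above: compute the correlation only over samples where both calls are present (in a plain Pearson formula the N_nonmissing factor cancels). All function names and the data are made up for illustration and are not from Hail.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def mean_impute(gts):
    """Replace None with the mean of the present calls."""
    present = [g for g in gts if g is not None]
    mean = sum(present) / len(present)
    return [mean if g is None else g for g in gts]

def mean_imputed_r(x, y):
    return pearson(mean_impute(x), mean_impute(y))

def pairwise_complete_r(x, y):
    # Keep only samples where both variants have a call.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return pearson([a for a, _ in pairs], [b for _, b in pairs])

# Toy data: the two variants agree perfectly wherever both are called.
x = [0, 1, 2, None, None]
y = [0, 1, 2, 0, 2]

r_complete = pairwise_complete_r(x, y)
r_imputed = mean_imputed_r(x, y)
```

On this toy input the pairwise-complete estimate is 1, while the mean-imputed estimate is smaller, because the imputed entries contribute nothing to the numerator but `y` still varies at those samples.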
|
This is a big redesign -- we use matrix multiplication to compute the correlations in parallel. It looks like the local prune stuff does mean-center and standardize, so I'll change it to match that. That sound OK? |
|
Makes sense, but what about the following: replace missing values with 0
*after* standardization, so you can still use matrix multiplication; the
only extra thing is computing N_nonmissing for each pair.
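A rough sketch of that suggestion, in plain Python rather than Hail's distributed matrices: standardize each variant using its present calls, set missing entries to 0, and take one matrix product for the sums and a second product of the 0/1 presence masks for N_nonmissing. The helper names (`standardize_zero_fill`, `gram`) and the toy data are assumptions for illustration only.

```python
import math

def standardize_zero_fill(gts):
    """Standardize with present calls only; missing entries become 0.

    Returns (z, mask), where mask[i] is 1 if sample i has a call.
    """
    present = [g for g in gts if g is not None]
    n = len(present)
    mean = sum(present) / n
    sd = math.sqrt(sum((g - mean) ** 2 for g in present) / n)  # population sd
    z = [0.0 if g is None else (g - mean) / sd for g in gts]
    mask = [0 if g is None else 1 for g in gts]
    return z, mask

def gram(rows):
    """rows @ rows^T for a small matrix given as a sequence of lists."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]

# Rows are variants, columns are samples; None marks a missing call.
variants = [
    [0, 1, 2, None, 1, 2],
    [0, 1, 2, 2, None, 2],
    [2, 1, 0, 1, 1, None],
]

zs, masks = zip(*(standardize_zero_fill(v) for v in variants))
dot = gram(zs)              # zeros drop every term with a missing call
n_nonmissing = gram(masks)  # samples where both variants are called
k = len(variants)
r = [[dot[i][j] / n_nonmissing[i][j] for j in range(k)] for i in range(k)]
```

Because each variant is standardized with its own mean and standard deviation rather than per pair, the off-diagonal values are estimates that are not guaranteed to lie in [-1, 1]; the diagonal is 1 by construction.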
|
Yep, that's what we do. |
|
bump |
|
I don't feel qualified to look at this change. |
|
I’ll look more closely once I get to the retreat, but first impression is that centering and normalizing are redundant. |
|
Maybe no longer relevant, but zeroing missings *after* centering is
equivalent to using non-missing terms only rather than mean imputing,
provided you then use N_nonmissing for the final normalization.
|
|
I fixed this; it's a much more obvious change (the unfilter comes before the or_else). Should be reviewable now. |
johnc1231 left a comment
Jon Bloom signed off at retreat, and code looks fine
Assigned Jackie because I want to make sure this is correct.