[query] Lower linear SKAT #12637
CHANGELOG: Query-on-Batch now supports `hl.skat(..., logistic=False)`. I also added actual tests for `hl.skat`, which were lost at some point. I am not fully confident in my documentation and comments, because the SKAT paper is terse and unclear. I would really appreciate strong criticism of the documentation and the code comments.
2892148 to 6f5e795
Some notes after a first pass
hail/python/hail/methods/statgen.py
Outdated
4. Multiplying an orthogonal matrix by a vector of independent normal variables produces a new
vector of independent normal variables.
This is only true if you replace "independent" with "i.i.d."
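A quick NumPy check of this point (a sketch with a made-up 2x2 rotation, not code from the PR): the covariance of `U x` is `U Cov(x) U^T`, which stays diagonal for i.i.d. entries but picks up off-diagonal terms when the variances differ.

```python
import numpy as np

# Rotation by 45 degrees: a 2x2 orthogonal matrix (illustrative choice).
theta = np.pi / 4
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Covariance of z = U x is U Cov(x) U^T.
cov_iid = U @ np.diag([1.0, 1.0]) @ U.T       # x i.i.d. N(0, 1)
cov_unequal = U @ np.diag([1.0, 4.0]) @ U.T   # independent, unequal variances

# i.i.d. case: the covariance is still the identity, so z has independent entries.
iid_stays_independent = np.allclose(cov_iid, np.eye(2))

# Unequal-variance case: a nonzero off-diagonal means the entries of z are correlated.
off_diagonal = cov_unequal[0, 1]
```

For jointly normal variables, zero covariance is equivalent to independence, so the nonzero off-diagonal entry is enough to show the original statement fails without the "identically distributed" part.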
hail/python/hail/methods/statgen.py
Outdated
.. math::

\begin{align*}
U \Lambda U^T &= G W G^T \quad\quad U \textrm{ orthonormal, } \Lambda \textrm{ diagonal} \\
U orthogonal
hail/python/hail/methods/statgen.py
Outdated
ht = ht.annotate(
    Q = (((ht.y_residual @ ht.G) * ht.weight) @ ht.G.T) @ ht.y_residual.T
)
More efficient:
((ht.y_residual @ ht.G).map(lambda x: x**2) * ht.weight).sum(0)
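The two expressions should agree because `r G W G^T r^T` is the weighted sum of the squared entries of `r G`. A minimal NumPy sketch of that identity, using made-up data and plain arrays as stand-ins for the Hail ndarray expressions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_variants = 6, 4

# Hypothetical stand-ins for ht.G, ht.weight and ht.y_residual.
G = rng.integers(0, 3, size=(n_samples, n_variants)).astype(float)
w = rng.uniform(0.5, 2.0, size=n_variants)
r = rng.normal(size=n_samples)

# Original form: r G W G^T r^T, built from matrix products.
q_matrix = ((r @ G) * w) @ G.T @ r

# Suggested form: weighted sum of the squared entries of r G.
q_squares = (((r @ G) ** 2) * w).sum(0)
```

The second form skips the product with `G.T` entirely, which is where the saving comes from.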
hail/python/hail/methods/statgen.py
Outdated
#
# We avoid the square root in order to avoid complex numbers.

Q, _ = hl.nd.qr(ht.covmat)
Already computed this
hail/python/hail/methods/statgen.py
Outdated
Q, _ = hl.nd.qr(ht.covmat)
C0 = Q.T @ ht.G
singular_values = hl.nd.svd(ht.G.T @ (ht.G * ht.weight) - C0.T @ (C0 * ht.weight), compute_uv=False)
As discussed in Zulip, I think a better way to compute this is:
R = hl.nd.qr(ht.G - Q @ (Q.T @ ht.G), mode="r")
singular_values = hl.nd.svd(R, compute_uv=False)
eigenvalues = singular_values.map(lambda x: x**2)
(replace `singular_values` with `eigenvalues` below)
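Why this works, as far as I can reconstruct it: `G - Q Q^T G` is `G` with the covariate directions projected out, and its triangular factor `R` satisfies `R^T R = G^T G - C0^T C0`. The NumPy sketch below checks that on random data. It is unweighted (it assumes any weights have already been folded into `G`), and the array names are illustrative rather than taken from the PR.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 8, 2, 3
X = rng.normal(size=(n, k))   # covariates (hypothetical data)
G = rng.normal(size=(n, m))   # stand-in for already-weighted genotypes

Q, _ = np.linalg.qr(X)        # reduced QR: Q is n x k with orthonormal columns
C0 = Q.T @ G

# Direct route: eigenvalues of the m x m matrix G^T G - C0^T C0.
eig_direct = np.sort(np.linalg.eigvalsh(G.T @ G - C0.T @ C0))

# Suggested route: project out the covariates, keep only R, square its singular values.
R = np.linalg.qr(G - Q @ (Q.T @ G), mode="r")
eig_via_qr = np.sort(np.linalg.svd(R, compute_uv=False) ** 2)
```

The QR route never forms the difference of two Gram matrices, which is the step most likely to lose precision to cancellation.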
Comments addressed. Still gotta work on the verbiage to clearly explain the presence of P_0.
03a8191 to a45a1b7
a45a1b7 to ce65d52
ce65d52 to 5bd1f52
@patrick-schultz Alright. This passes tests locally. I'm happy with the docs verbiage. I am eager for your review!
A few docs comments. Everything else looks great!
hail/python/hail/methods/statgen.py
Outdated
.. math::

\begin{align*}
X &: R^{N \times K} \\
Might mention that `X` is covariates. I was momentarily confused since in the args `x` is the genotypes.
hail/python/hail/methods/statgen.py
Outdated
h &\sim N(0, 1) \\
h &= \frac{1}{\widehat{\sigma}} r \\
I don't know what the standard stats idiom is, but these feel backwards to me. I want to read this as "if we define h = ..., then h is distributed as N(0, 1)". The other way sounds like "draw a random normal variate h, then h = this other thing we already computed".
hail/python/hail/methods/statgen.py
Outdated
.. math::

\begin{align*}
U \Lambda U &= B \quad\quad \Lambda \textrm{ orthogonal } U \textrm{ diagonal} \\
Should be `U \Lambda U^T`, Lambda diagonal, U orthogonal
hail/python/hail/methods/statgen.py
Outdated
# B = A A.T
# Q = h.T B h
#
# This is called a "quadratic form". It is a weighted sum of the squares of the entries of h,
Not quite. It's a weighted sum of products of pairs of entries of h (`w_{00} h_0^2 + w_{01} h_0 h_1 + ...`). The eigendecomposition converts it to a sum of squares of normals.
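To make the distinction concrete, here is a small NumPy sketch (random made-up `A` and `h`, not the PR's data) that evaluates the quadratic form once with all its cross terms and once, after an eigendecomposition, as a weighted sum of squares of the rotated vector `z = U^T h`.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 5))
B = A @ A.T                   # real symmetric
h = rng.normal(size=3)

# The quadratic form includes cross terms B[i, j] * h[i] * h[j], not only squares.
q_pairs = sum(B[i, j] * h[i] * h[j] for i in range(3) for j in range(3))

# With B = U diag(lam) U^T and z = U^T h, the same value is a weighted sum of squares.
lam, U = np.linalg.eigh(B)
z = U.T @ h
q_squares = (lam * z ** 2).sum()
```

If `h` has i.i.d. standard normal entries, `z` does too (`U` is orthogonal), which is what turns `Q` into a weighted sum of independent chi-squared variables.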
hail/python/hail/methods/statgen.py
Outdated
# Since B is a real symmetric matrix, U is orthogonal. U and W are not necessarily the same
# matrix but their determinants are +-1 so the squared singular values and eigenvalues differ by
# at most a sign.
U and W are the same, in the sense that eigenvectors / singular vectors are only defined up to sign anyways. But we don't care about the eigenvectors, do we?
The determinant bit is confusing (their determinants are +-1 because they're orthogonal, but that doesn't tell us anything), and the squared singular values are equal to the eigenvalues; no sign ambiguity there. `B` is pos. def., it has positive eigenvalues.
I think you can just drop this paragraph.
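A short NumPy check of the claim that there is no sign ambiguity, assuming `B = A A^T` as in the code comment above (the matrix here is random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 6))
B = A @ A.T

# Squared singular values of A versus eigenvalues of B = A A^T, both sorted ascending.
squared_singular_values = np.sort(np.linalg.svd(A, compute_uv=False) ** 2)
eigenvalues = np.sort(np.linalg.eigvalsh(B))
```

Both arrays hold the same non-negative numbers, so only the ordering convention of the two routines differs.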
).or_error(hl.format('hl._linear_skat: every weight must be positive, in group %s, the weights were: %s',
ht.group, weights_arr))
singular_values = hl.nd.svd(A, compute_uv=False)
# SVD(M) = U S V. U and V are unitary, therefore SVD(k M) = U (k S) V.
Not sure what this is about
In the next line, we scale the singular values by `\sigma^2` instead of multiplying A by `sqrt(\sigma^2)`.
I added a blank line to clarify to which expression the comment applies.
Oh I see. That doesn't have anything to do with unitarity. Scalars commute with all matrices.
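For the record, the equivalence being relied on can be checked numerically. This is a sketch with a made-up matrix and a hypothetical variance estimate `sigma_sq`; it reads "scale the singular values by `\sigma^2`" as scaling the squared singular values, which is my interpretation of the thread rather than something it states.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 3))
sigma_sq = 2.5   # hypothetical positive variance estimate

# Scale after the SVD: multiply the squared singular values by sigma^2.
eigs_scaled_after = sigma_sq * np.linalg.svd(A, compute_uv=False) ** 2

# Scale before the SVD: multiply A by sqrt(sigma^2), then square the singular values.
eigs_scaled_before = np.linalg.svd(np.sqrt(sigma_sq) * A, compute_uv=False) ** 2
```

Scaling afterwards touches only a short vector instead of every entry of `A`.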
hail/python/hail/methods/statgen.py
Outdated
# I *think* the reasoning for taking the complement of the CDF value is:
#
# 1. Q is a measure of variance and thus positive.
#
# 2. We want to know the probability of obtaining a variance even larger ("more extreme")
#
# Ergo, we want to check the right-tail of the distribution.
Exactly
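A Monte Carlo illustration of the right-tail reasoning, under stated assumptions: the eigenvalues and the observed statistic below are invented, and simulation is only a stand-in for whatever exact method the implementation uses to evaluate the CDF of a weighted sum of chi-squared variables.

```python
import numpy as np

rng = np.random.default_rng(5)
eigenvalues = np.array([3.0, 1.5, 0.5])   # hypothetical weights of the chi-squared terms
q_observed = 12.0                         # hypothetical observed statistic

# Simulate Q = sum_i lambda_i * z_i^2 with z_i i.i.d. N(0, 1) under the null.
z = rng.normal(size=(200_000, eigenvalues.size))
q_null = (z ** 2) @ eigenvalues

# The p-value is the complement of the CDF: the chance of a Q larger than the one observed.
cdf_at_q = np.mean(q_null <= q_observed)
p_value = 1.0 - cdf_at_q
```

Because `Q` is non-negative and large values indicate more variance explained by the group, only the right tail counts as "more extreme".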
I addressed every comment except for the one about the SVD(k M) comment; let me know if that makes sense now.
hail/python/hail/methods/statgen.py
Outdated
.. math::

\begin{align*}
U \Lambda U &= B \quad\quad \Lambda \textrm{ diagonal } U \textrm{ orthogonal} \\
Still missing a transpose: U \Lambda U.T
ah oops
fixed that one and a couple other ones where the transpose was missing