
[query] Lower linear SKAT #12637

Merged: 23 commits merged into hail-is:main on Feb 3, 2023
Conversation

danking (Contributor) commented Jan 30, 2023:

CHANGELOG: Query-on-Batch now supports `hl.skat(..., logistic=False)`.

I also added actual tests for `hl.skat`, which were lost at some point.

I am somewhat not confident in my documentation and comments, because the SKAT paper is terse and unclear. I would really appreciate strong criticism of the documentation and the code comments.
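For orientation, here is a minimal, hypothetical invocation of the code path this PR lowers; the dataset and the `gene`, `weight`, and `phenotype` annotations are illustrative assumptions, not part of this PR:

```python
import hail as hl

# Hypothetical dataset with a per-variant `gene` grouping, a per-variant
# `weight`, and a numeric `phenotype` column field.
mt = hl.read_matrix_table('dataset.mt')

# Linear SKAT (logistic=False), now supported on Query-on-Batch.
results = hl.skat(
    key_expr=mt.gene,
    weight_expr=mt.weight,
    y=mt.phenotype,
    x=mt.GT.n_alt_alleles(),
    covariates=[1.0],  # intercept
    logistic=False,
)
results.show()
```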
danking (Author) commented Jan 30, 2023:

These are the rendered docs. They are not included in the actual docs because I do not want people using this function yet.

[Four screenshots of the rendered documentation]

patrick-schultz (Collaborator) left a comment:

Some notes after a first pass

Comment on lines 1580 to 1581:

```
4. Multiplying an orthogonal matrix by a vector of independent normal variables produces a new
   vector of independent normal variables.
```
patrick-schultz (Collaborator):

This is only true if you replace "independent" with "i.i.d."
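For concreteness, a short aside (not part of the original thread) on why "i.i.d." is required: with identical variances the rotated vector has the same scalar covariance, but merely independent entries do not guarantee that.

```latex
\begin{align*}
h \sim N(0, \sigma^2 I),\ U U^T = I
  &\implies U h \sim N(0, \sigma^2 U U^T) = N(0, \sigma^2 I) \\
\operatorname{Cov}(h) = D \textrm{ diagonal but not scalar}
  &\implies \operatorname{Cov}(U h) = U D U^T \textrm{, which need not be diagonal}
\end{align*}
```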

```
.. math::

    \begin{align*}
    U \Lambda U^T &= G W G^T \quad\quad U \textrm{ orthonormal, } \Lambda \textrm{ diagonal} \\
```
patrick-schultz (Collaborator):

U orthogonal

Comment on lines 1839 to 1841:

```python
ht = ht.annotate(
    Q = (((ht.y_residual @ ht.G) * ht.weight) @ ht.G.T) @ ht.y_residual.T
)
```
patrick-schultz (Collaborator):

More efficient:

```python
((ht.y_residual @ ht.G).map(lambda x: x**2) * ht.weight).sum(0)
```
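As a sanity check (a NumPy sketch, not the Hail code itself): the two expressions compute the same scalar, because Q = r G W Gᵀ rᵀ is exactly the weighted sum of the squared entries of the length-M vector r G, so the second multiplication by G can be skipped.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 6, 4                      # samples, variants (illustrative sizes)
r = rng.normal(size=(1, N))      # plays the role of ht.y_residual
G = rng.normal(size=(N, M))      # plays the role of ht.G
w = rng.uniform(1, 2, size=M)    # plays the role of ht.weight

# Original formulation: two multiplications by G.
q_direct = ((((r @ G) * w) @ G.T) @ r.T).item()

# Suggested formulation: square the length-M vector r @ G once.
q_fast = (((r @ G) ** 2) * w).sum()

assert np.isclose(q_direct, q_fast)
```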

```python
#
# We avoid the square root in order to avoid complex numbers.

Q, _ = hl.nd.qr(ht.covmat)
```
patrick-schultz (Collaborator):

Already computed this


```python
Q, _ = hl.nd.qr(ht.covmat)
C0 = Q.T @ ht.G
singular_values = hl.nd.svd(ht.G.T @ (ht.G * ht.weight) - C0.T @ (C0 * ht.weight), compute_uv=False)
```
patrick-schultz (Collaborator):

As discussed in zulip, I think a better way to compute this is:

```python
R = hl.nd.qr(ht.G - Q @ (Q.T @ ht.G), mode="r")
singular_values = hl.nd.svd(R, compute_uv=False)
eigenvalues = singular_values.map(lambda x: x**2)
```

(replace singular_values with eigenvalues below)
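The numerical payoff, as a NumPy sketch (assuming, for illustration, that the weights have already been folded into G): the R factor of the projected matrix has the same singular values as the projected matrix itself, so squaring them yields the eigenvalues of the Gram matrix without ever forming it and squaring the condition number.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 50, 3, 8
G = rng.normal(size=(N, M))   # weighted genotype matrix (assumption)
X = rng.normal(size=(N, K))   # covariates
Q, _ = np.linalg.qr(X)

# Project the covariate column space out of G, then keep only R.
P0G = G - Q @ (Q.T @ G)
R = np.linalg.qr(P0G, mode='r')

# Squared singular values of R == eigenvalues of (P0 G)^T (P0 G).
eigenvalues = np.linalg.svd(R, compute_uv=False) ** 2
reference = np.linalg.eigvalsh(P0G.T @ P0G)[::-1]  # descending order
assert np.allclose(eigenvalues, reference)
```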

danking (Author) commented Jan 31, 2023:

Comments addressed. Still gotta work on the verbiage to clearly explain the presence of P_0.

danking (Author) commented Feb 1, 2023:

@patrick-schultz Alright. This passes tests locally. I'm happy with the docs verbiage. I am eager for your review!

[Two screenshots of the rendered documentation]

patrick-schultz (Collaborator) left a comment:

A few docs comments. Everything else looks great!

```
.. math::

    \begin{align*}
    X &: R^{N \times K} \\
```
patrick-schultz (Collaborator):

Might mention that X is covariates. I was momentarily confused since in the args x is the genotypes.

Comment on lines 1582 to 1583:

```
h &\sim N(0, 1) \\
h &= \frac{1}{\widehat{\sigma}} r \\
```
patrick-schultz (Collaborator):

I don't know what the standard stats idiom is, but these feel backwards to me. I want to read this as "if we define h = ..., then h is distributed as N(0, 1)". The other way sounds like "draw a random normal variate h, then h = this other thing we already computed".

```
.. math::

    \begin{align*}
    U \Lambda U &= B \quad\quad \Lambda \textrm{ orthogonal } U \textrm{ diagonal} \\
```
patrick-schultz (Collaborator):

Should be U \Lambda U^T, Lambda diagonal, U orthogonal

```python
# B = A A.T
# Q = h.T B h
#
# This is called a "quadratic form". It is a weighted sum of the squares of the entries of h,
```
patrick-schultz (Collaborator):

Not quite. It's a weighted sum of products of pairs of entries of h (w_{00} h_0^2 + w_{01} h_0 h_1 + ....). The eigendecomposition converts it to a sum of squares of normals.
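Spelled out (a standard derivation, added here for clarity): diagonalizing B turns the cross terms into a weighted sum of squares of fresh standard normals.

```latex
\begin{align*}
Q = h^T B h &= \sum_{i,j} B_{ij}\, h_i h_j
  \quad\quad \textrm{(products of pairs of entries, e.g. } B_{01} h_0 h_1 \textrm{)} \\
            &= (U^T h)^T \Lambda\, (U^T h) = \sum_i \lambda_i v_i^2,
  \quad\quad v = U^T h \sim N(0, I) \textrm{ since } U \textrm{ is orthogonal}
\end{align*}
```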

Comment on lines 1942 to 1944:

```python
# Since B is a real symmetric matrix, U is orthogonal. U and W are not necessarily the same
# matrix but their determinants are +-1 so the squared singular values and eigenvalues differ by
# at most a sign.
```
patrick-schultz (Collaborator):

U and W are the same, in the sense that eigenvectors / singular vectors are only defined up to sign anyways. But we don't care about the eigenvectors, do we?

The determinant bit is confusing (their determinants are ±1 because they're orthogonal, but that doesn't tell us anything), and the squared singular values are equal to the eigenvalues; there's no sign ambiguity. B is pos. def., so it has positive eigenvalues.

I think you can just drop this paragraph.

```python
).or_error(hl.format('hl._linear_skat: every weight must be positive, in group %s, the weights were: %s',
                     ht.group, weights_arr))
singular_values = hl.nd.svd(A, compute_uv=False)
# SVD(M) = U S V. U and V are unitary, therefore SVD(k M) = U (k S) V.
```
patrick-schultz (Collaborator):

Not sure what this is about

danking (Author):

In the next line, we scale the singular values by \sigma^2 instead of multiplying A by sqrt(\sigma^2).

danking (Author):

I added a blank line to clarify to which expression the comment applies.

patrick-schultz (Collaborator):

Oh I see. That doesn't have anything to do with unitarity. Scalars commute with all matrices.
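A one-line NumPy check of the scalar fact (illustrative only): for positive k, the singular values of kM are k times those of M, so scaling A and scaling its singular values are interchangeable.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 6))
k = 2.5  # any positive scalar, e.g. sigma^2

# kA = U (kS) V^T follows directly from A = U S V^T; no unitarity needed.
assert np.allclose(np.linalg.svd(k * A, compute_uv=False),
                   k * np.linalg.svd(A, compute_uv=False))
```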

Comment on lines 1976 to 1982:

```python
# I *think* the reasoning for taking the complement of the CDF value is:
#
# 1. Q is a measure of variance and thus positive.
#
# 2. We want to know the probability of obtaining a variance even larger ("more extreme")
#
# Ergo, we want to check the right-tail of the distribution.
```
patrick-schultz (Collaborator):

Exactly
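To make the right-tail convention concrete (a hypothetical single-eigenvalue case, not this PR's mixture-of-chi-squares code): if Q ~ λ·χ²₁ under the null, the p-value is the survival function, i.e. the complement of the CDF.

```python
from scipy.stats import chi2

lam = 0.8          # the single eigenvalue (illustrative)
q_observed = 3.5   # the observed test statistic (illustrative)

# P(Q > q_observed) under the null: the right tail, not the left.
p_value = chi2.sf(q_observed / lam, df=1)   # == 1 - chi2.cdf(...)
```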

danking dismissed patrick-schultz’s stale review on February 2, 2023:

I addressed every comment except the one about SVD(k M); let me know if that makes sense now.

```
.. math::

    \begin{align*}
    U \Lambda U &= B \quad\quad \Lambda \textrm{ diagonal } U \textrm{ orthogonal} \\
```
patrick-schultz (Collaborator):

Still missing a transpose: U \Lambda U.T

danking (Author):

ah oops

danking dismissed patrick-schultz’s stale review on February 3, 2023:

fixed that one and a couple of others where the transpose was missing

danking merged commit 743db89 into hail-is:main on Feb 3, 2023.