Collaborator

@karenfeng karenfeng commented Jun 25, 2020

What changes are proposed in this pull request?

Adds the following input validation steps to GloWGR.

  • While creating the block genotype matrix, filter out any variants whose values are uniform, and issue a warning if any rows are dropped. This prevents the error that is currently raised when sig == 0 during assemble_block.
  • In each of the ridge model functions:
    • Raise an error if any values are missing in the covariate or label DataFrames.
    • Issue a warning if the labels are not standardized (mean 0, unit variance).
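
For readers who want a concrete picture of the checks listed above, here is a minimal sketch in plain NumPy and pandas. It is not the code from this pull request: the function names only loosely echo the helpers quoted later in the review, and the bodies, the `tol` parameter, the population standard deviation (`ddof=0`), and the choice of `ValueError` are all assumptions made for illustration. The real filtering step runs during block genotype matrix creation rather than on an in-memory array.

```python
import warnings

import numpy as np
import pandas as pd


def drop_uniform_rows(values: np.ndarray) -> np.ndarray:
    """Drop rows (variants) whose values are all identical; warn if any are dropped.

    Hypothetical helper: comparing max and min avoids floating-point noise
    that a computed standard deviation could introduce for constant rows.
    """
    keep = values.max(axis=1) != values.min(axis=1)
    n_dropped = int((~keep).sum())
    if n_dropped:
        warnings.warn(f"Dropped {n_dropped} variant row(s) with uniform values.")
    return values[keep]


def assert_all_present(df: pd.DataFrame, name: str) -> None:
    """Raise if the DataFrame contains any missing value (assumed error type)."""
    if df.isnull().values.any():
        raise ValueError(f"Missing values are present in the {name} DataFrame.")


def check_standardized(df: pd.DataFrame, name: str, tol: float = 1e-6) -> None:
    """Warn if any column is not centered at 0 with unit variance.

    The tolerance and the use of ddof=0 are assumptions of this sketch.
    """
    means = df.mean().to_numpy()
    stds = df.std(ddof=0).to_numpy()
    centered = np.allclose(means, 0.0, rtol=0.0, atol=tol)
    unit_variance = np.allclose(stds, 1.0, rtol=0.0, atol=tol)
    if not (centered and unit_variance):
        warnings.warn(f"The {name} DataFrame is not standardized to mean 0 and unit variance.")


# Usage: standardize a toy label DataFrame, then run both checks on it.
labeldf = pd.DataFrame({"trait": [1.0, 2.0, 3.0, 4.0]})
standardized = (labeldf - labeldf.mean()) / labeldf.std(ddof=0)
assert_all_present(standardized, "label")
check_standardized(standardized, "label")
```

Warning rather than raising on unstandardized input matches the description above: the model can still run, but the caller is told that the input may not be what the method expects.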

To cut down slightly on runtime, I also wrote out the level 1-reduced GT matrix to use in tests that don't require the original block GT matrix.

How is this patch tested?

  • Unit tests
  • Integration tests
  • Manual tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

codecov bot commented Jun 25, 2020

Codecov Report

Merging #240 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #240   +/-   ##
=======================================
  Coverage   93.75%   93.75%           
=======================================
  Files          90       90           
  Lines        4339     4340    +1     
  Branches      406      379   -27     
=======================================
+ Hits         4068     4069    +1     
  Misses        271      271           
Impacted Files | Coverage Δ
...ckvariantsandsamples/VariantSampleBlockMaker.scala | 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3b8f10b...3a915e9.

@karenfeng karenfeng changed the title from "Perform GlowGR input validation" to "Perform GloWGR input validation" on Jun 25, 2020
…ut-validation

"""
__assert_all_present(labeldf, 'label')
__check_standardized(labeldf, 'label')
__assert_all_present(covdf, 'covariate')
Contributor

We should also check standardization of covariates here.

Collaborator Author

Done, thanks for the review! Can you take a second look?

Contributor

@LelandBarnard LelandBarnard left a comment

Looks good!

Contributor

@henrydavidge henrydavidge left a comment

Had a few comments

Contributor

@henrydavidge henrydavidge left a comment

One nit, LGTM after addressing.

@karenfeng karenfeng merged commit 232f02d into projectglow:master Jun 26, 2020