Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-phenotype logistic regression #5072

Merged
merged 9 commits into from Jan 5, 2019

Conversation

Projects
None yet
3 participants
@bcajes
Copy link
Contributor

commented Jan 4, 2019

First crack at supporting multi phenotype logistic regression. No matrix optimizations, as is implemented in multi phenotype linear regression, but I attempt to follow a similar approach as far as far as API and single call of mapPartitions.

@tpoterba tpoterba self-assigned this Jan 4, 2019

@tpoterba

This comment has been minimized.

Copy link
Collaborator

commented Jan 4, 2019

awesome! will look tomorrow.

@tpoterba
Copy link
Collaborator

left a comment

This is great! Just a few minor comments.

>>> result_ht = hl.logistic_regression_rows(
... test='wald',
... y=[dataset.pheno1, dataset.pheno2], #where pheno1 and pheno2 values are 0, 1, or missing

This comment has been minimized.

Copy link
@tpoterba

tpoterba Jan 4, 2019

Collaborator

style: two spaces before the # and one space after

pheno2], # where pheno...

This comment has been minimized.

Copy link
@bcajes

bcajes Jan 4, 2019

Author Contributor

thanks, fixed

y_field = list(f'__y_{i}' for i in range(len(y))) if y_is_list else "__y"

y_dict = dict(zip(y_field, y)) if y_is_list else {y_field: y}
func = Env.hail().methods.LogisticRegression

This comment has been minimized.

Copy link
@tpoterba

tpoterba Jan 4, 2019

Collaborator

no need to abstract this here - you overloaded the method in scala, which is fine.

This comment has been minimized.

Copy link
@bcajes

bcajes Jan 4, 2019

Author Contributor

My goal here was to preserve much of the original code path for single phenotype logistic regression. My rational is that this function is probably widely used and this approach seems less risky. The alternative looks like it will require more refactoring, which probably should be done after some burn in time.

This comment has been minimized.

Copy link
@tpoterba

tpoterba Jan 4, 2019

Collaborator

I just mean that func is only used once below (is that right?) so it could be inlined there. There aren't two different function names being called like in linear regression.

This is fine though.




class LogisticRegressionTest extends SparkSuite {

This comment has been minimized.

Copy link
@tpoterba

tpoterba Jan 4, 2019

Collaborator

These tests look like they're covered by the Python tests. Is that true?

This comment has been minimized.

Copy link
@bcajes

bcajes Jan 4, 2019

Author Contributor

Yes, end-to-end functionality is covered by the python tests, but adding scala tests helped with quickly isolating lower level issues.

This comment has been minimized.

Copy link
@tpoterba

tpoterba Jan 4, 2019

Collaborator

Ah, got it. I'm happy to add them for now, but we may remove those tests when we refactor the scala side.

The lack of easy debug-ability of the Python tests is the biggest problem 😦

This comment has been minimized.

Copy link
@bcajes

bcajes Jan 4, 2019

Author Contributor

sounds good!

@tpoterba

This comment has been minimized.

Copy link
Collaborator

commented Jan 4, 2019

looks like there's a weird merge commit in here - mind rebasing to clean up the diff before I take another look?

y_field = list(f'__y_{i}' for i in range(len(y))) if y_is_list else "__y"

y_dict = dict(zip(y_field, y)) if y_is_list else {y_field: y}
func = Env.hail().methods.LogisticRegression

This comment has been minimized.

Copy link
@tpoterba

tpoterba Jan 4, 2019

Collaborator

I just mean that func is only used once below (is that right?) so it could be inlined there. There aren't two different function names being called like in linear regression.

This is fine though.

@@ -1,9 +1,26 @@
.PHONY: build build-test \
push push-test \
run-docker run \
test test-local deploy
test test-local deploy clean

This comment has been minimized.

Copy link
@tpoterba

tpoterba Jan 4, 2019

Collaborator

the merge commit seemed to cause problems

@bcajes bcajes force-pushed the bcajes:master branch from e3ea70d to 19f14f9 Jan 4, 2019

rvb.addFields(fullRowType, rv, copiedFieldIndices)
rvb.startArray(_yVecs.cols)
logregAnnotations.foreach(stats => {
//rvb.addFields(_resultSchema.physicalType,rv,) //TODO How to add strcut here?

This comment has been minimized.

Copy link
@tpoterba

tpoterba Jan 5, 2019

Collaborator

oops, sorry, remove this.

That's the last commend and we're ready to merge!

@danking danking merged commit 390742b into hail-is:master Jan 5, 2019

1 check passed

hail-ci-0-1 successful build
Details
@tpoterba

This comment has been minimized.

Copy link
Collaborator

commented Jan 5, 2019

Thanks for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.