Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-phenotype logistic regression #5072

Merged
merged 9 commits into from Jan 5, 2019
Merged

Multi-phenotype logistic regression #5072

merged 9 commits into from Jan 5, 2019

Conversation

@bcajes
Copy link
Contributor

@bcajes bcajes commented Jan 4, 2019

First crack at supporting multi phenotype logistic regression. No matrix optimizations, as is implemented in multi phenotype linear regression, but I attempt to follow a similar approach as far as far as API and single call of mapPartitions.

@tpoterba tpoterba self-assigned this Jan 4, 2019
@tpoterba
Copy link
Collaborator

@tpoterba tpoterba commented Jan 4, 2019

awesome! will look tomorrow.

Copy link
Collaborator

@tpoterba tpoterba left a comment

This is great! Just a few minor comments.

>>> result_ht = hl.logistic_regression_rows(
... test='wald',
... y=[dataset.pheno1, dataset.pheno2], #where pheno1 and pheno2 values are 0, 1, or missing
Copy link
Collaborator

@tpoterba tpoterba Jan 4, 2019

style: two spaces before the # and one space after

pheno2], # where pheno...

Copy link
Contributor Author

@bcajes bcajes Jan 4, 2019

thanks, fixed

y_field = list(f'__y_{i}' for i in range(len(y))) if y_is_list else "__y"

y_dict = dict(zip(y_field, y)) if y_is_list else {y_field: y}
func = Env.hail().methods.LogisticRegression
Copy link
Collaborator

@tpoterba tpoterba Jan 4, 2019

no need to abstract this here - you overloaded the method in scala, which is fine.

Copy link
Contributor Author

@bcajes bcajes Jan 4, 2019

My goal here was to preserve much of the original code path for single phenotype logistic regression. My rational is that this function is probably widely used and this approach seems less risky. The alternative looks like it will require more refactoring, which probably should be done after some burn in time.

Copy link
Collaborator

@tpoterba tpoterba Jan 4, 2019

I just mean that func is only used once below (is that right?) so it could be inlined there. There aren't two different function names being called like in linear regression.

This is fine though.




class LogisticRegressionTest extends SparkSuite {
Copy link
Collaborator

@tpoterba tpoterba Jan 4, 2019

These tests look like they're covered by the Python tests. Is that true?

Copy link
Contributor Author

@bcajes bcajes Jan 4, 2019

Yes, end-to-end functionality is covered by the python tests, but adding scala tests helped with quickly isolating lower level issues.

Copy link
Collaborator

@tpoterba tpoterba Jan 4, 2019

Ah, got it. I'm happy to add them for now, but we may remove those tests when we refactor the scala side.

The lack of easy debug-ability of the Python tests is the biggest problem 😦

Copy link
Contributor Author

@bcajes bcajes Jan 4, 2019

sounds good!

@tpoterba
Copy link
Collaborator

@tpoterba tpoterba commented Jan 4, 2019

looks like there's a weird merge commit in here - mind rebasing to clean up the diff before I take another look?

y_field = list(f'__y_{i}' for i in range(len(y))) if y_is_list else "__y"

y_dict = dict(zip(y_field, y)) if y_is_list else {y_field: y}
func = Env.hail().methods.LogisticRegression
Copy link
Collaborator

@tpoterba tpoterba Jan 4, 2019

I just mean that func is only used once below (is that right?) so it could be inlined there. There aren't two different function names being called like in linear regression.

This is fine though.

batch/Makefile Outdated
@@ -1,9 +1,26 @@
.PHONY: build build-test \
push push-test \
run-docker run \
test test-local deploy
test test-local deploy clean
Copy link
Collaborator

@tpoterba tpoterba Jan 4, 2019

the merge commit seemed to cause problems

rvb.addFields(fullRowType, rv, copiedFieldIndices)
rvb.startArray(_yVecs.cols)
logregAnnotations.foreach(stats => {
//rvb.addFields(_resultSchema.physicalType,rv,) //TODO How to add strcut here?
Copy link
Collaborator

@tpoterba tpoterba Jan 5, 2019

oops, sorry, remove this.

That's the last commend and we're ready to merge!

@danking danking merged commit 390742b into hail-is:master Jan 5, 2019
1 check passed
@tpoterba
Copy link
Collaborator

@tpoterba tpoterba commented Jan 5, 2019

Thanks for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants