Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for importing dosages as call values #5077

Merged
merged 13 commits into from
Jan 14, 2019
Merged

Added support for importing dosages as call values #5077

merged 13 commits into from
Jan 14, 2019

Conversation

tmwong2003
Copy link
Contributor

This PR adds support for optionally loading a dosage values as the call value for genotype loci. This is to support VCF files (such as the ones we use at 23andMe) that have dosages (DS) instead of genotype call information (typically GT).

@tmwong2003 tmwong2003 changed the title Added support for importing dosages into entries Added support for importing dosages as call values Jan 4, 2019
Copy link
Contributor

@tpoterba tpoterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ability to load entry floats as float32 is a great change, but I think all the language about "dosage" is a bit use-case-specific and misleading. My proposal would be to add a parameter to import_vcf called entry_float_type or something, which defaults to hl.tfloat64.

This would get passed in as a config param, and would let us do the following in the headerField calculation:

    case (VCFHeaderLineType.Float, false) => entryFloatType

I don't think this will be a huge change from what you have, and will mostly involve deleting code.

hail/python/test/hail/stats/test_linear_mixed_model.py Outdated Show resolved Hide resolved
@tpoterba tpoterba dismissed their stale review January 8, 2019 23:10

looking again

Copy link
Contributor

@tpoterba tpoterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few more small changes, and good to go.

hail/python/hail/methods/impex.py Outdated Show resolved Hide resolved
hail/python/test/hail/methods/test_impex.py Outdated Show resolved Hide resolved
hail/python/test/hail/methods/test_statgen.py Outdated Show resolved Hide resolved
hail/python/test/hail/methods/test_statgen.py Outdated Show resolved Hide resolved
hail/src/main/scala/is/hail/io/vcf/LoadVCF.scala Outdated Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
@tpoterba tpoterba self-assigned this Jan 9, 2019
@tpoterba
Copy link
Contributor

looks like a compile error in TestUtils.scala.

The rest looks good, will approve when tests pass

@tmwong2003
Copy link
Contributor Author

looks like a compile error in TestUtils.scala.

The rest looks good, will approve when tests pass

I fixed TestUtils.scala; the issue was a missing parameter in a call to the (changed) MatrixVCFReader.

Should I rebase and squash all of my commits into a single commit, or are you OK with merging my branch with the discrete feature commits (i.e., non-"Merge remote-tracking branch" commits)?

Copy link
Contributor

@tpoterba tpoterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me!

@danking danking merged commit 8097463 into hail-is:master Jan 14, 2019
@tmwong2003 tmwong2003 deleted the tmwong2003/merge-dosage-patch branch January 14, 2019 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants