-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"upgrade" adam to forked "0.20.0" #581
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I'd say just wait for/ask for a new ADAM release then managing one though
@@ -41,6 +41,9 @@ case class CalledSomaticAllele(sampleName: SampleName, | |||
.newBuilder | |||
.setAlleles(seqAsJavaList(Seq(GenotypeAllele.Ref, GenotypeAllele.Alt))) | |||
.setSampleId(sampleName) | |||
.setContigName(contigName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird, what was happening before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't know, otomh!
the semantics of the downstream adam and htsjdk models here have changed in this upgrade, so a few things have to specified differently than they were before.
it's also possible that these aren't needed now either; I added them while debugging an issue where variant's INFO fields were coming out blank, which turned out to be due to the sampleId
not being set consistently throughout.
I can dig in further if you want.
@@ -16,16 +16,11 @@ | |||
<name>guacamole: variant caller</name> | |||
|
|||
<properties> | |||
<adam.version>0.19.0</adam.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason you don't like these as properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going with the heuristic that if it's used twice, it should be factored out, otherwise it should just be on the dependency-declaration.
Factoring out versions that only appear once seems more verbose and less clear, to me.
lmk if you disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd agree that it's not worth it to factor out for a single dependency, but once there seems useful, easy to get a read on the major dependencies and their version ... but probably fine as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I don't think any of these qualify as "major dependencies".
The two that do, spark and hadoop, coincidentally (or not!) are used in multiple places.
@@ -16,7 +16,92 @@ | |||
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias"> | |||
##INFO=<ID=VQSLOD,Number=1,Type=Float,Description="Log odds ratio of being a true variant versus being false under the trained gaussian mixture model"> | |||
##INFO=<ID=culprit,Number=1,Type=String,Description="The annotation which was the worst performing in the Gaussian mixture model, likely the reason why the variant was filtered out"> | |||
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT tumor.chr20 | |||
##contig=<ID=1,length=249250621,assembly=GRCh37> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just what is generated now? Otherwise, don't really need all the contigs listed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, it's what is generated now.
if there was a more narrow a way to test this i'd be open to it, but this seems like the most straightforward way to have an end-to-end test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No that makes sense, OOC - did these positions match the test positions before or are they different/whatever was output? It's possible more positions were output than the test positions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This predates this PR, but all 25 positions that used to be checked in SomaticStandardRealDataSuite are here, as well as 9 more.
?? Does that seem right? |
No, coveralls is clearly confused on this PR. It just occurred to me that it's likely to do with my caching Let me either make sure that doesn't happen, or remove that change. In general I think zinc is reliable enough at this point that we could save a lot of time not recompiling everything every build, and only rarely (if ever) having to clear caches; having repros for such zinc bugs as cause us to need to clear caches could be useful anyway. |
travis apparently caps out at 3gb
OK, I think I fixed coveralls. |
There's no "managing" to be done; we have the equivalent of a |
this ports us on to most of the non-trivial changes that have happened in ADAM since 0.19.0, using my forked "0.20.0".
ADAM's VCF-writing now requires
Sample.sampleId
to matchGenotype.sampleId
, and I am using the "sampleName" pulled from readsets.args.Base / ReadSets rather than the loaded-from-datasampleName
onRead
.The latter is now unused so I removed it. It might make sense to keep it around explicitly in some way… seems like what ADAM now does with RecordGroups is the right thing, maybe we'll want to use that if we care about that data again.
This change is