Update to Adam 0.17 #311

merged 1 commit into from Jul 2, 2015
13 changes: 9 additions & 4 deletions pom.xml
@@ -16,7 +16,7 @@
<name>guacamole: variant caller</name>

<properties>
-<adam.version>0.16.0</adam.version>
+<adam.version>0.17.0</adam.version>
<bdg-formats.version>0.4.0</bdg-formats.version>
<java.version>1.8</java.version>
<scala.version>2.10.3</scala.version>
@@ -137,7 +137,7 @@
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
<recompileMode>incremental</recompileMode>
-<useZincServer>true</useZincServer>
+<useZincServer>false</useZincServer>
<args>
<arg>-unchecked</arg>
<arg>-optimise</arg>
@@ -276,19 +276,24 @@
</dependency>
<dependency>
<groupId>org.bdgenomics.adam</groupId>
-<artifactId>adam-core</artifactId>
+<artifactId>adam-core_${scala.version.prefix}</artifactId>
<version>${adam.version}</version>
</dependency>
<dependency>
<groupId>org.bdgenomics.adam</groupId>
-<artifactId>adam-cli</artifactId>
+<artifactId>adam-cli_${scala.version.prefix}</artifactId>
<version>${adam.version}</version>
</dependency>
<dependency>
<groupId>org.bdgenomics.bdg-formats</groupId>
<artifactId>bdg-formats</artifactId>
<version>${bdg-formats.version}</version>
</dependency>
+<dependency>
+<groupId>org.bdgenomics.qc-metrics</groupId>
+<artifactId>qc-metrics-core</artifactId>
+<version>0.0.1-SNAPSHOT</version>
Contributor Author:
Depending on a SNAPSHOT seems poor form, but I don't see the alternative (this code has been split off from core Adam but hasn't had a release yet).

Member:
do we use this directly? what is causing us to need this dep now?

Contributor Author:
See Concordance.scala, which GermlineStandardCaller depends on.

I'm happy to rip out all that code instead if it's unused.

Member:
where are you resolving the qc-metrics snapshot from?

Member:
Contributor Author:
Yep, that's the spot. I didn't configure that—maven knew where to look for it.

Member:
IntelliJ is not understanding where this comes from for me, even though cmdline-Maven finds it

is IJ working for you here?

Contributor Author:
I did run into some issues yesterday where command-line maven & IntelliJ got out of sync. IntelliJ seems to have figured things out since yesterday, though there's nothing specific I can point to that I did which fixed it:
[Screenshot, 2015-07-01 2:59 pm]

Member:
I'm still futzing with IntelliJ and trying to get it to see this dependency; this doesn't have to block this PR but it is a little strange that we're seeing such not-understood behavior.

Member:
Cool, seems like my IntelliJ project had some broken paths from being moved to a different place on my filesystem. I think I'm all sorted now.

+</dependency>
<dependency>
<groupId>colt</groupId>
<artifactId>colt</artifactId>
2 changes: 1 addition & 1 deletion src/main/scala/org/hammerlab/guacamole/Command.scala
@@ -19,7 +19,7 @@
package org.hammerlab.guacamole

import org.apache.spark.{ SparkContext, Logging }
-import org.bdgenomics.adam.cli.{ Args4j, Args4jBase }
+import org.bdgenomics.utils.cli.{ Args4j, Args4jBase }

/**
* Interface for running a command from command line arguments.
6 changes: 3 additions & 3 deletions src/main/scala/org/hammerlab/guacamole/Common.scala
@@ -30,7 +30,7 @@ import org.apache.hadoop.fs.{ FileSystem, Path }
import org.apache.hadoop.mapred.FileAlreadyExistsException
import org.apache.spark.rdd.RDD
import org.apache.spark.{ Logging, SparkConf, SparkContext }
-import org.bdgenomics.adam.cli.{ Args4jBase, ParquetArgs }
+import org.bdgenomics.utils.cli.{ Args4jBase, ParquetArgs }
import org.bdgenomics.adam.rdd.ADAMContext._
import org.bdgenomics.formats.avro.Genotype
import org.hammerlab.guacamole.Concordance.ConcordanceArgs
@@ -126,7 +126,7 @@ object Common extends Logging {
if (path.endsWith(".vcf")) {
sc.loadGenotypes(path)
} else {
-sc.adamLoad(path)
+sc.loadParquet(path)
}
}

@@ -266,7 +266,7 @@ object Common extends Logging {
} else if (outputPath.toLowerCase.endsWith(".vcf")) {
progress("Writing genotypes to VCF file: %s.".format(outputPath))
val sc = subsetGenotypes.sparkContext
-subsetGenotypes.toVariantContext.coalesce(1, shuffle = true).adamVCFSave(outputPath)
+subsetGenotypes.toVariantContext.coalesce(1, shuffle = true).saveAsVcf(outputPath)
} else {
progress("Writing genotypes to: %s.".format(outputPath))
subsetGenotypes.adamParquetSave(
4 changes: 3 additions & 1 deletion src/main/scala/org/hammerlab/guacamole/Concordance.scala
@@ -25,6 +25,7 @@ import org.apache.spark.rdd.RDD
import org.bdgenomics.adam.rdd.ADAMContext._
import org.apache.spark.SparkContext._
import org.bdgenomics.adam.rdd.variation.ConcordanceTable
+import org.bdgenomics.adam.rdd.variation.GenotypeConcordanceRDDFunctions
import org.bdgenomics.adam.rich.RichVariant
import org.bdgenomics.formats.avro.{ GenotypeType, Genotype }

@@ -85,7 +86,8 @@ object Concordance {
val filteredTrueGenotypes = trueGenotypes.filter(relevantVariants)

val sampleName = filteredCalledAlleles.take(1)(0).getSampleId.toString
-val sampleAccuracy = filteredCalledAlleles.concordanceWith(filteredTrueGenotypes).collectAsMap()(sampleName)
+
+val sampleAccuracy = new GenotypeConcordanceRDDFunctions(filteredCalledAlleles).concordanceWith(filteredTrueGenotypes).collectAsMap()(sampleName)
Contributor Author:
@ryan-williams I assume there's a more concise way to express this but I had trouble finding it.
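
One possible route to the shorter call, sketched under assumptions: in Scala, an implicit conversion in scope lets the compiler insert a wrapper automatically, so a call site can use the wrapper's methods without writing `new ...(...)`. Whether ADAM 0.17 ships such a conversion for `GenotypeConcordanceRDDFunctions` is not confirmed here. The sketch is standalone and uses a plain `Seq[String]` in place of an RDD; `ConcordanceFunctions`, `ConcordanceImplicits`, and `ImplicitWrapperDemo` are hypothetical names, not ADAM APIs.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for a wrapper class such as GenotypeConcordanceRDDFunctions.
// A plain Seq[String] plays the role of the RDD so the sketch runs without Spark.
class ConcordanceFunctions(called: Seq[String]) {
  // Counts the called entries that also appear in the truth set.
  def concordanceWith(truth: Seq[String]): Int =
    called.count(c => truth.contains(c))
}

object ConcordanceImplicits {
  // With this conversion imported, the compiler wraps a Seq[String]
  // in ConcordanceFunctions whenever concordanceWith is called on it.
  implicit def seqToConcordanceFunctions(called: Seq[String]): ConcordanceFunctions =
    new ConcordanceFunctions(called)
}

object ImplicitWrapperDemo {
  import ConcordanceImplicits._

  val called: Seq[String] = Seq("chr1:100:A>T", "chr1:200:G>C", "chr2:50:C>A")
  val truth: Seq[String] = Seq("chr1:100:A>T", "chr2:50:C>A", "chr3:10:T>G")

  // Explicit wrapping, the shape used in the changed line above.
  val explicitCount: Int = new ConcordanceFunctions(called).concordanceWith(truth)

  // The same call resolved through the implicit conversion.
  val implicitCount: Int = called.concordanceWith(truth)

  def main(args: Array[String]): Unit =
    println(s"explicit=$explicitCount implicit=$implicitCount")
}
```

If no suitable conversion is exported by the library, defining one locally next to the call site would give the same effect, at the cost of one more implicit in scope.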


// We called AND it was called in truth
val truePositives = sampleAccuracy.total(ConcordanceTable.CALLED, ConcordanceTable.CALLED)
4 changes: 2 additions & 2 deletions src/main/scala/org/hammerlab/guacamole/reads/MappedRead.scala
@@ -97,8 +97,8 @@ case class MappedRead(
})

override def toString(): String =
-"MappedRead(%d, %s, %s, %s)".format(
-start,
+"MappedRead(%s:%d, %s, %s, %s)".format(
+referenceContig, start,
cigar.toString,
mdTagString,
Bases.basesToString(sequence)
@@ -61,7 +61,7 @@ class MappedReadSerializer extends Serializer[MappedRead] with CanSerializeMateP

val matePropertiesOpt = read(kryo, input)

-val cigar = TextCigarCodec.getSingleton.decode(cigarString)
+val cigar = TextCigarCodec.decode(cigarString)
MappedRead(
token,
sequenceArray,
6 changes: 3 additions & 3 deletions src/main/scala/org/hammerlab/guacamole/reads/Read.scala
@@ -124,7 +124,7 @@ object Read extends Logging {
val sequenceArray = sequence.map(_.toByte).toArray
val qualityScoresArray = baseQualityStringToArray(baseQualities, sequenceArray.length)

-val cigar = TextCigarCodec.getSingleton.decode(cigarString)
+val cigar = TextCigarCodec.decode(cigarString)
MappedRead(
token,
sequenceArray,
@@ -356,7 +356,7 @@ object Read extends Logging {
AlignmentRecordField.recordGroupPredictedMedianInsertSize
)

-val adamRecords: RDD[AlignmentRecord] = adamContext.adamLoad(filename, projection = Some(ADAMSpecificProjection))
+val adamRecords: RDD[AlignmentRecord] = adamContext.loadParquet(filename, projection = Some(ADAMSpecificProjection))
val sequenceDictionary = new ADAMSpecificRecordSequenceDictionaryRDDAggregator(adamRecords).adamGetSequenceDictionary()

val reads: RDD[Read] = adamRecords.map(fromADAMRecord(_, token))
@@ -399,7 +399,7 @@ object Read extends Logging {
referenceContig = alignmentRecord.getContig.getContigName.toString.intern(),
alignmentQuality = alignmentRecord.getMapq,
start = alignmentRecord.getStart,
-cigar = TextCigarCodec.getSingleton.decode(alignmentRecord.getCigar.toString),
+cigar = TextCigarCodec.decode(alignmentRecord.getCigar.toString),
mdTagString = alignmentRecord.getMismatchingPositions.toString,
failedVendorQualityChecks = alignmentRecord.getFailedVendorQualityChecks,
isPositiveStrand = !alignmentRecord.getReadNegativeStrand,
@@ -35,7 +35,7 @@ class MappedReadSerializerSuite extends GuacFunSuite with Matchers {
"chr5",
50,
325352323,
-TextCigarCodec.getSingleton.decode(""),
+TextCigarCodec.decode(""),
mdTagString = "11",
false,
isPositiveStrand = true,
@@ -85,7 +85,7 @@ class MappedReadSerializerSuite extends GuacFunSuite with Matchers {
"chr5",
50,
325352323,
-TextCigarCodec.getSingleton.decode(""),
+TextCigarCodec.decode(""),
mdTagString = "11",
false,
isPositiveStrand = true,
@@ -135,7 +135,7 @@ class MappedReadSerializerSuite extends GuacFunSuite with Matchers {
"chr5",
50,
325352323,
-TextCigarCodec.getSingleton.decode(""),
+TextCigarCodec.decode(""),
mdTagString = "11",
false,
isPositiveStrand = true,
@@ -35,7 +35,7 @@ class MappedReadSuite extends GuacFunSuite with Matchers {
"chr5",
50,
325352323,
-TextCigarCodec.getSingleton.decode(""),
+TextCigarCodec.decode(""),
mdTagString = "11",
false,
isPositiveStrand = true,
@@ -86,7 +86,7 @@ class MappedReadSuite extends GuacFunSuite with Matchers {
"chr5",
50,
325352323,
-TextCigarCodec.getSingleton.decode(""),
+TextCigarCodec.decode(""),
mdTagString = "11",
false,
isPositiveStrand = true,
4 changes: 2 additions & 2 deletions src/test/scala/org/hammerlab/guacamole/util/TestUtil.scala
@@ -78,8 +78,8 @@ object TestUtil extends Matchers {
qualityScores.get.map(q => q + 33).map(_.toChar).mkString
} else {
sequence.map(x => '@').mkString
}
}

Read(
sequence,
cigarString = cigar,