gatk-tools-java

Tools for using Picard and GATK with the Google Genomics API.

  • Common classes for getting reads from the GA4GH Genomics API and exposing them as an "Iterable" resource of SAMRecord objects.

  • An implementation of a custom reader that can be plugged into Picard tools to read input data that is specified by a URL and served by the GA4GH API.

  • A set of shell scripts (src/main/scripts) that demonstrate how to run Picard tools with the GA4GH custom reader.

  • Requires htsjdk version 1.128 or later and a recent Picard build (one that includes this commit or later: https://github.com/iliat/picard/commit/ebe987313d799d58b0673351b95d3ca91fed82bf).

  • You can download Picard from http://broadinstitute.github.io/picard/ and build it according to the instructions there.

Build:
To build with ant, run: ant gatk-tools-java-jar

The examples below assume that you have built with ant, which produces dist/gatk-tools-java-1.0.jar, and that the picard folder sits side by side with the gatk-tools-java folder.

The typical command line would look like:

java -jar \
-Dsamjdk.custom_reader=https://www.googleapis.com/genomics,<reader factory class>,<location of gatk-tools-java jar> \
-Dga4gh.client_secrets=<location of client_secrets.json> \
../picard/dist/picard.jar <ToolName> \
INPUT=<input url>

For example:

java -jar \
-Dsamjdk.custom_reader=https://www.googleapis.com/genomics,com.google.cloud.genomics.gatk.htsjdk.GA4GHReaderFactory,\
`pwd`/dist/gatk-tools-java-1.0.jar \
-Dga4gh.client_secrets=client_secrets.json \
../picard/dist/picard.jar ViewSam \
INPUT=https://www.googleapis.com/genomics/v1beta2/readgroupsets/CK256frpGBD44IWHwLP22R4/

The test read group set used in this example was created from ex1_sorted.bam, which can be found in the testdata/ folder.
The data has been uploaded to the cloud project: https://console.developers.google.com/storage/browser/gatk-tools-java/.

The dataset id is 15448427866823121459 and the read group set id is CK256frpGBD44IWHwLP22R4.
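
The samjdk.custom_reader value in the example command appears to be a comma-separated triple: a URL prefix, the reader factory class name, and the path to the jar that contains that class. The following is a minimal Java sketch of how such a value could be split and matched against an INPUT URL. The class CustomReaderSpec and its methods are hypothetical names used only for illustration, and the prefix-matching rule is an assumption; this is not the actual htsjdk parsing code.

```java
/**
 * Illustrative only: a hypothetical holder for the three parts of a
 * samjdk.custom_reader value. Not part of htsjdk or gatk-tools-java.
 */
public class CustomReaderSpec {
    final String urlPrefix;     // e.g. https://www.googleapis.com/genomics
    final String factoryClass;  // fully qualified reader factory class name
    final String jarPath;       // jar that contains the factory class

    CustomReaderSpec(String urlPrefix, String factoryClass, String jarPath) {
        this.urlPrefix = urlPrefix;
        this.factoryClass = factoryClass;
        this.jarPath = jarPath;
    }

    /** Splits "prefix,factoryClass,jarPath" into its three parts. */
    static CustomReaderSpec parse(String propertyValue) {
        String[] parts = propertyValue.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected <url prefix>,<factory class>,<jar path> but got: " + propertyValue);
        }
        return new CustomReaderSpec(parts[0].trim(), parts[1].trim(), parts[2].trim());
    }

    /** Assumption: an input is routed to the custom reader when it starts with the prefix. */
    boolean handles(String inputUrl) {
        return inputUrl.startsWith(urlPrefix);
    }

    public static void main(String[] args) {
        CustomReaderSpec spec = parse(
            "https://www.googleapis.com/genomics,"
            + "com.google.cloud.genomics.gatk.htsjdk.GA4GHReaderFactory,"
            + "dist/gatk-tools-java-1.0.jar");
        String input =
            "https://www.googleapis.com/genomics/v1beta2/readgroupsets/CK256frpGBD44IWHwLP22R4/";
        System.out.println("factory: " + spec.factoryClass);
        System.out.println("jar: " + spec.jarPath);
        System.out.println("handles input: " + spec.handles(input));
    }
}
```

A sketch like this can help when checking a command line by hand: a value with only two comma-separated parts, as opposed to the three used in the example, is rejected immediately.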

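To build with Maven, run mvn compile followed by mvn bundle:bundle.
Note that the Maven build produces gatk-tools-java-1.1-SNAPSHOT.jar rather than the gatk-tools-java-1.0.jar produced by ant, so the jar name in the commands above must be changed accordingly.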

  • For Picard tools that have not yet been instrumented to work with a custom reader, you can use Ga4GHPicardRunner. It is a wrapper around Picard tools that lets their INPUT arguments be ga4gh:// URLs: it fetches the data through the API and sends it to the Picard tool through pipes.
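
The pipe-based approach described for Ga4GHPicardRunner can be illustrated with a small, generic Java sketch: start a child process and stream data into its standard input. This is not the actual Ga4GHPicardRunner code. The class PipeSketch, the method pipeTo, and the use of `wc -l` as a stand-in for a Picard tool are assumptions made for illustration, and the sketch assumes a Unix-like system.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: streams data into a child process through a pipe. */
public class PipeSketch {

    /** Starts the command, copies data to its stdin, and returns its exit code. */
    static int pipeTo(List<String> command, InputStream data)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Let the child write directly to this process's stdout and stderr.
        builder.redirectOutput(ProcessBuilder.Redirect.INHERIT);
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process child = builder.start();

        // Copy the data into the pipe connected to the child's stdin.
        try (OutputStream childStdin = child.getOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = data.read(buffer)) != -1) {
                childStdin.write(buffer, 0, n);
            }
        } // Closing the stream signals end of input to the child.

        return child.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for data fetched through the API (hypothetical content).
        InputStream data = new ByteArrayInputStream(
            "read1\nread2\n".getBytes(StandardCharsets.UTF_8));
        // `wc -l` stands in for a Picard tool that reads from standard input.
        int exitCode = pipeTo(Arrays.asList("wc", "-l"), data);
        System.out.println("child exit code: " + exitCode);
    }
}
```

The point of the pattern is that the child process only needs to read from a stream, so a tool that knows nothing about the API can still consume data that was fetched from it.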