
Wdlupdate #23

Merged
merged 29 commits into master on Jul 23, 2018

Conversation

@bshifaw
Contributor

commented Jun 18, 2018

  • added HaplotypeCaller NIO and disk sizing within the task (a sketch of the disk sizing follows this list)
  • updated the GATK image to 4.0.6.0
  • minor correction to the Picard command in JD
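
As a rough sketch of the disk-sizing change (names such as ExampleTask, input_bam and the 20 GB padding are illustrative, not taken from this PR), a task can derive a default disk request from its input sizes and still let the caller override it:

task ExampleTask {
  File input_bam
  File ref_fasta
  Int? disk_space_gb
  Boolean use_ssd = false

  # Default disk request derived from the input sizes, with padding for
  # outputs; only used when the caller does not supply disk_space_gb.
  Int disk_size = ceil(size(input_bam, "GB") + size(ref_fasta, "GB")) + 20

  command {
    echo "task body goes here"
  }

  runtime {
    disks: "local-disk " + select_first([disk_space_gb, disk_size]) + if use_ssd then " SSD" else " HDD"
  }
}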
bshifaw added 28 commits Apr 1, 2018
set -e
set -o pipefail

samtools view -h -T ${ref_fasta} ${input_cram} |

@EvanTheB

EvanTheB Jul 18, 2018

I am wondering why a cram-to-bam task has been added. I was able to run this WDL successfully (pre-CramToBam) with a CRAM file as input, and HaplotypeCaller seemed to handle it fine.

However, there were some minor numerical differences in the output (I think they were minor, but I am not sure). I asked about this on gatkforums, but I have not had a chance to get to the bottom of the issue.

Besides that point, there is also an issue with how samtools and GATK calculate the NM and MD values. If the CRAM does not contain NM and MD (the default CRAM behaviour), then samtools will use its own method to calculate those values. That method is soon to be standardised in hts-specs, but I'm not sure how GATK plans to resolve the difference.

Lastly, the pipe is unnecessary; you can add -b and -o directly to the first samtools line.
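
For illustration, the single-command form might look like the following minimal task. This is a sketch, not the actual CramToBamTask from this PR; sample_name is an assumed input, there is no runtime/docker section, and the flags should be checked against the samtools version in the image used:

task CramToBamSketch {
  File ref_fasta
  File ref_fasta_index   # .fai expected alongside ref_fasta for CRAM decoding
  File input_cram
  String sample_name

  command {
    set -e
    # -b writes BAM output directly and -o names the output file, so no
    # second samtools invocation (and no pipe) is needed.
    samtools view -b -T ${ref_fasta} -o ${sample_name}.bam ${input_cram}
    samtools index ${sample_name}.bam
  }

  output {
    File output_bam = "${sample_name}.bam"
    File output_bai = "${sample_name}.bam.bai"
  }
}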

@bshifaw

bshifaw Jul 23, 2018

Author Contributor

Our developers recommend using the C implementation of CRAM parsing via samtools rather than the Java implementation in GATK, because the Java version hasn't been validated for accuracy. We've also found that converting the CRAM to BAM before running HaplotypeCaller gives better performance.

Thanks for the feedback. I've directed the developers to this comment and the forum post you mentioned.

@bshifaw bshifaw requested a review from ldgauthier Jul 18, 2018
docker: docker
memory: select_first([machine_mem_gb, 15]) + " GB"
disks: "local-disk " + select_first([disk_space_gb, disk_size]) + if use_ssd then " SSD" else " HDD"
preemptibe: preemptible_attempts

@ldgauthier

ldgauthier Jul 23, 2018

preemptible is spelled wrong. I'm surprised this doesn't generate an error.

@bshifaw

bshifaw Jul 23, 2018

Author Contributor

Good catch. womtool didn't flag it, and the workflow ran fine on FireCloud, so the misspelled key was probably just ignored.
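
For reference, the corrected runtime stanza only changes the spelling of the key:

runtime {
  docker: docker
  memory: select_first([machine_mem_gb, 15]) + " GB"
  disks: "local-disk " + select_first([disk_space_gb, disk_size]) + if use_ssd then " SSD" else " HDD"
  preemptible: preemptible_attempts
}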


"##_COMMENT4": "MISCELLANEOUS PARAMETERS",
"#HaplotypeCallerGvcf_GATK4.HaplotypeCaller.make_gvcf": "True",
"#HaplotypeCallerGvcf_GATK4.HaplotypeCaller.contamination": "(optional) Float?",

@ldgauthier

ldgauthier Jul 23, 2018

Why are some optionals in the front and some in the back?

@bshifaw

bshifaw Jul 23, 2018

Author Contributor

The newer inputs were generated by womtool, which places "(optional)" at the end, while the older entries were generated by wdltool, which places it at the beginning. I'll switch everything to the womtool format to make it uniform.
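
To illustrate with a hypothetical declaration (not copied from this repo's files): for an optional input like the one below, wdltool writes the JSON placeholder as "(optional) Float?", while womtool inputs writes it as "Float? (optional)", which is why regenerating the whole inputs file with womtool makes the placement uniform.

task ExampleOptional {
  Float? contamination   # wdltool placeholder:  "(optional) Float?"
                         # womtool placeholder:  "Float? (optional)"

  command {
    echo ${default="0.0" contamination}
  }
}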

@bshifaw bshifaw merged commit b9bbbdc into master Jul 23, 2018
@github-fish


commented Sep 18, 2018

Hi, I want to say thanks in advance.
I am running this pipeline, with the data provided in its JSON file, on my Pouta Cloud VM (24 vCPUs, 117.2 GB RAM, 900 GB storage, 99 GB swap, CentOS 7).
I got 50 warnings and errors like this (for all 50 CRAMs):
[2018-09-18 02:27:40,18] [warn] Localization via hard link has failed: /home/cloud-user/gatk4-germline-snps-indels/work-log/cromwell-executions/HaplotypeCallerGvcf_GATK4/4369c020-c653-4180-a856-06fd773e1bce/call-HaplotypeCaller/shard-37/inputs/952256031/NA12878_NA12878.bai -> /home/cloud-user/gatk4-germline-snps-indels/work-log/cromwell-executions/HaplotypeCallerGvcf_GATK4/4369c020-c653-4180-a856-06fd773e1bce/call-CramToBamTask/execution/NA12878_NA12878.bai: Operation not permitted
[2018-09-18 02:27:40,21] [error] BackgroundConfigAsyncJobExecutionActor [4369c020HaplotypeCallerGvcf_GATK4.HaplotypeCaller:37:1]: Error attempting to Execute
java.lang.Exception: Failed command instantiation
at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:536)
at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand$(StandardAsyncExecutionActor.scala:471)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.instantiatedCommand$lzycompute(ConfigAsyncJobExecutionActor.scala:193)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.instantiatedCommand(ConfigAsyncJobExecutionActor.scala:193)
at cromwell.backend.standard.StandardAsyncExecutionActor.commandScriptContents(StandardAsyncExecutionActor.scala:265)
at cromwell.backend.standard.StandardAsyncExecutionActor.commandScriptContents$(StandardAsyncExecutionActor.scala:264)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.commandScriptContents(ConfigAsyncJobExecutionActor.scala:193)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.writeScriptContents(SharedFileSystemAsyncJobExecutionActor.scala:141)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.writeScriptContents$(SharedFileSystemAsyncJobExecutionActor.scala:140)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.cromwell$backend$sfs$BackgroundAsyncJobExecutionActor$$super$writeScriptContents(ConfigAsyncJobExecutionActor.scala:193)
at cromwell.backend.sfs.BackgroundAsyncJobExecutionActor.writeScriptContents(BackgroundAsyncJobExecutionActor.scala:12)
at cromwell.backend.sfs.BackgroundAsyncJobExecutionActor.writeScriptContents$(BackgroundAsyncJobExecutionActor.scala:11)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.writeScriptContents(ConfigAsyncJobExecutionActor.scala:193)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute(SharedFileSystemAsyncJobExecutionActor.scala:124)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute$(SharedFileSystemAsyncJobExecutionActor.scala:121)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.execute(ConfigAsyncJobExecutionActor.scala:193)
at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$executeAsync$1(StandardAsyncExecutionActor.scala:599)
at scala.util.Try$.apply(Try.scala:209)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync(StandardAsyncExecutionActor.scala:599)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync$(StandardAsyncExecutionActor.scala:599)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.executeAsync(ConfigAsyncJobExecutionActor.scala:193)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover(StandardAsyncExecutionActor.scala:912)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover$(StandardAsyncExecutionActor.scala:904)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.executeOrRecover(ConfigAsyncJobExecutionActor.scala:193)
at cromwell.backend.async.AsyncBackendJobExecutionActor.$anonfun$robustExecuteOrRecover$1(AsyncBackendJobExecutionActor.scala:65)
at cromwell.core.retry.Retry$.withRetry(Retry.scala:37)
at cromwell.backend.async.AsyncBackendJobExecutionActor.withRetry(AsyncBackendJobExecutionActor.scala:61)
at cromwell.backend.async.AsyncBackendJobExecutionActor.cromwell$backend$async$AsyncBackendJobExecutionActor$$robustExecuteOrRecover(AsyncBackendJobExecutionActor.scala:65)
at cromwell.backend.async.AsyncBackendJobExecutionActor$$anonfun$receive$1.applyOrElse(AsyncBackendJobExecutionActor.scala:88)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:514)
at akka.actor.Actor.aroundReceive$(Actor.scala:512)
at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.aroundReceive(ConfigAsyncJobExecutionActor.scala:193)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:527)
at akka.actor.ActorCell.invoke(ActorCell.scala:496)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: common.exception.AggregatedMessageException: Error(s):
:
Could not localize /home/cloud-user/gatk4-germline-snps-indels/work-log/cromwell-executions/HaplotypeCallerGvcf_GATK4/4369c020-c653-4180-a856-06fd773e1bce/call-CramToBamTask/execution/NA12878_NA12878.bam -> /home/cloud-user/gatk4-germline-snps-indels/work-log/cromwell-executions/HaplotypeCallerGvcf_GATK4/4369c020-c653-4180-a856-06fd773e1bce/call-HaplotypeCaller/shard-37/inputs/952256031/NA12878_NA12878.bam:
/home/cloud-user/gatk4-germline-snps-indels/work-log/cromwell-executions/HaplotypeCallerGvcf_GATK4/4369c020-c653-4180-a856-06fd773e1bce/call-CramToBamTask/execution/NA12878_NA12878.bam doesn't exist
/home/cloud-user/gatk4-germline-snps-indels/work-log/cromwell-executions/HaplotypeCallerGvcf_GATK4/4369c020-c653-4180-a856-06fd773e1bce/call-HaplotypeCaller/shard-37/inputs/952256031/NA12878_NA12878.bam -> /home/cloud-user/gatk4-germline-snps-indels/work-log/cromwell-executions/HaplotypeCallerGvcf_GATK4/4369c020-c653-4180-a856-06fd773e1bce/call-CramToBamTask/execution/NA12878_NA12878.bam: Operation not permitted
/home/cloud-user/gatk4-germline-snps-indels/work-log/cromwell-executions/HaplotypeCallerGvcf_GATK4/4369c020-c653-4180-a856-06fd773e1bce/call-CramToBamTask/execution/NA12878_NA12878.bam -> /home/cloud-user/gatk4-germline-snps-indels/work-log/cromwell-executions/HaplotypeCallerGvcf_GATK4/4369c020-c653-4180-a856-06fd773e1bce/call-HaplotypeCaller/shard-37/inputs/952256031/NA12878_NA12878.bam.tmp: No space left on device
at common.validation.Validation$ValidationTry$.toTry$extension1(Validation.scala:60)
at common.validation.Validation$ValidationTry$.toTry$extension0(Validation.scala:56)
at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:534)
... 42 common frames omitted

I already added my current user to the docker group and ran chmod +x and chmod +w on all input files before running the WDL script.
I don't know how to fix this.
Also, it ran for 5.5 hours before throwing the error. Even when I keep only 3 of the 50 scattered.interval_list files, it still takes a long time to run. Is there a smarter way to reduce the test time for this pipeline?

@github-fish


commented Sep 19, 2018

Reporting a more recent result: the workflow finished successfully, even though it still reports the warning "Localization via hard link has failed". I don't know why it worked the second time. I only changed the hg38_wgs_scattered_calling_intervals.txt file, keeping just the first 3 scattered.interval_list entries, and I started the run from inside the folder that contains the WDL and JSON files. However, I always use absolute paths to run the WDL script, so I don't think that should be the reason.
