[WIP] STAR Workflow v2.0.0 by claymcleod · Pull Request #1 · stjudecloud/workflows

claymcleod · 2019-06-30T21:34:31Z

This PR is an implementation of the new STAR alignment workflow for RNA-seq (v2.0). You can follow the discussion on what features the pipeline will have on the RFC pull request.

claymcleod · 2019-06-30T21:35:02Z

@a-frantz Let's use this repo/branch to develop out the workflow together.

…till sketching)
- Renamed `bio-base` to `bioinformatics-base`.
- Cleans up multiple layers in the `bioinformatics-base` image.
- Move images to their own folders. Still sketching out how these should look.

zaeleus · 2019-07-19T21:29:21Z

+ENV PATH /opt/conda/bin:$PATH
+
+RUN apt-get update && \
+    apt-get upgrade -y && \


Suggested change (remove this line):

apt-get upgrade -y && \

Assume the base image is already up-to-date.

claymcleod · 2019-07-20

Can you explain a little further why this would be a best practice? Just curious from your perspective.

zaeleus · 2019-07-22

It's the responsibility of the base image to maintain and update core packages periodically. Anything else can be updated/installed individually.

zaeleus · 2019-08-07T14:19:53Z

Integer values for memory requirements are in bytes, e.g., 75000 = ~73 KiB. Use a string with a unit instead: "75 GiB".
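
To make the arithmetic above concrete, here is a minimal Python sketch. It is not part of this PR: `to_bytes` and the `UNITS` table are hypothetical names used only for illustration, and the parsing rules are a simplification, not the WDL spec's or any engine's actual parser.

```python
import re

# Multipliers for the unit suffixes handled by this sketch (decimal and binary).
UNITS = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40,
}


def to_bytes(memory):
    """Convert a memory requirement to bytes.

    A bare integer is taken as bytes; a string must be a number followed
    by one of the unit suffixes in UNITS.
    """
    if isinstance(memory, int):
        return memory
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", memory)
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"unrecognized memory value: {memory!r}")
    return int(float(match.group(1)) * UNITS[match.group(2)])


print(to_bytes(75000) / 2**10)  # 73.2421875 (KiB), i.e. the "~73 KiB" above
print(to_bytes("75 GiB"))       # 80530636800 (bytes)
```

The bare integer ends up roughly a million times smaller than the intended 75 GiB, which is why the string form with an explicit unit is the safer choice.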

a-frantz · 2019-08-07T14:34:03Z

> Integer values for memory requirements are in bytes, e.g., 75000 = ~73 KiB. Use a string with a unit instead: "75 GiB".

Just using an integer works everywhere else we've used it. I know on LSF there's a setting in the config file to change default units, would that be a problem if that setting was, say GB instead of bytes or whatever the default value that's currently being used?

zaeleus · 2019-08-07T18:53:08Z

> Just using an integer works everywhere else we've used it. I know on LSF there's a setting in the config file to change default units, would that be a problem if that setting was, say GB instead of bytes or whatever the default value that's currently being used?

The spec defines integers as being parsed as bytes.

In the backend configuration for the job submit command, use the suffixed version of memory.

adthrasher · 2019-08-08T17:06:45Z

> > Just using an integer works everywhere else we've used it. I know on LSF there's a setting in the config file to change default units, would that be a problem if that setting was, say GB instead of bytes or whatever the default value that's currently being used?
>
> The spec defines integers as being parsed as bytes.
>
> In the backend configuration for the job submit command, use the suffixed version of memory.

@a-frantz - I think what @zaeleus is saying is that there is implicit unit conversion going on and it is not clear to an end-user. For example, star.build_db specifies a memory value of "50000" which according to the WDL spec should be in bytes. The LSF conf file doesn't specify units, so the "50000" is passed directly to the bsub command. Our LSF cluster is set to use MB as the default unit (LSF's built-in default unit is KB). So the "50000" value should be bytes according to the WDL spec, but at runtime, this is being interpreted as MB to get 50GB of reserved memory. For correctness and clarity, we should use the string version of memory specification and set the unit being used in the LSF conf.
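
The mismatch described above can be sketched in a few lines of Python. This is an illustration only: `reserved_bytes` and `BYTES_PER_UNIT` are hypothetical names, and the decimal multipliers (1 MB = 10**6 bytes) are an assumption made to keep the numbers round; LSF may count in binary multiples.

```python
# Bytes per unit for each way the bare integer 50000 could be read.
# Decimal multipliers are an assumption made to keep the numbers round.
BYTES_PER_UNIT = {"B": 1, "KB": 10**3, "MB": 10**6}


def reserved_bytes(value, default_unit):
    """Bytes reserved when a bare integer is read in the given default unit."""
    return value * BYTES_PER_UNIT[default_unit]


# The same literal, read three different ways.
for reader, unit in [
    ("WDL spec", "B"),
    ("LSF built-in default", "KB"),
    ("cluster configured for MB", "MB"),
]:
    print(f"{reader}: {reserved_bytes(50000, unit):,} bytes")
```

Only the last reading gives the 50 GB reservation that the task relies on today, and it depends on a cluster setting that the workflow itself never states.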

claymcleod · 2019-09-24T17:24:22Z

This seems like it is getting close to being ready for merging, right?

adthrasher · 2019-09-24T17:26:48Z

> This seems like it is getting close to being ready for merging, right?

Yes, I think it is ready to merge, assuming we want anything arising from the RFC to go in a separate PR.

claymcleod · 2019-09-25T14:55:50Z

> > This seems like it is getting close to being ready for merging, right?
>
> Yes, I think it is ready to merge, assuming we want anything arising from the RFC to go in a separate PR.

Yeah I think that's probably the right approach. Go ahead and merge whenever you are ready.

Getting acquainted with WDL. Sketch of STAR workflow.

5787bb6

claymcleod requested a review from a-frantz June 30, 2019 21:34

claymcleod assigned zaeleus, claymcleod and a-frantz Jun 30, 2019

Draft of star-alignment dockerfile.

f28631c

zaeleus reviewed Jul 1, 2019

Comment thread docker/star-alignment.Dockerfile Outdated

claymcleod and others added 5 commits July 10, 2019 15:57

Syncing of latest STAR dockerfile for Andrew.

0e3190a

Add tasks to star.wdl

8b02f71

Move new tasks in star.wdl to qc.wdl

3fe0ad3

Add star-qc workflow

e98be0f

Refactor star-qc.wdl

427178c

zaeleus reviewed Jul 17, 2019

Comment thread tools/qc.wdl Outdated

Comment thread tools/qc.wdl Outdated

claymcleod assigned adthrasher Jul 17, 2019

claymcleod requested a review from adthrasher July 17, 2019 16:40

adthrasher reviewed Jul 17, 2019

Comment thread tools/qc.wdl Outdated

adthrasher and others added 10 commits July 18, 2019 10:18

Reorganize wdl tools

eb79ac0

Adding samtools tasks

c52cdcc

Use glob() instead of ugly Array[File] hack

f4e48df

Make samtools split fail on unaccounted reads, unless workflow overrides

24cca66

Remove extra bracket

581e685

Updated split function for samtools

f1da9b0

Merging changes

774b7ac

Adding RSeQC infer_experiment call.

47def83

chore(gitignore): Add log files to .gitignore (produced by dive)

ea7c349

zaeleus reviewed Jul 19, 2019

adthrasher added 2 commits July 19, 2019 17:53

Fixing variable references

4b783c1

Merge branch 'rnaseq-workflow' of https://github.com/stjude/sjcloud-workflows into rnaseq-workflow

900f72f

a-frantz and others added 4 commits July 31, 2019 10:41

Use gtf instead of gff for htseq-count

dcb3d41

Make var name changes to make wdltool validate happy

3726440

Add runtime memory parameters to heavy tasks

a2df3b3

Bumping the star alignment memory requirement and setting a limit for STAR.

3d73ed8

adthrasher added 4 commits August 8, 2019 13:29

Adding lsf.conf for Cromwell backend. Setting backend to use MB for memory. Setting tools to specify memory requirements as a string.

381f7c6

Adding fq lib to the Docker image.

41fe74e

Renaming variables from basename for toil compatibility.

52a1587

Installing fq to /usr/local/

552eaad

zaeleus reviewed Aug 10, 2019

Comment thread docker/bioinformatics-base/Dockerfile Outdated

adthrasher added 10 commits August 12, 2019 10:00

Moving fq lib to a separate build layer.

c0ee835

Updating LSF conf to allow singularity wrapper if a docker image is specified.

d31370f

Merge branch 'rnaseq-workflow' of https://github.com/stjude/sjcloud-workflows into rnaseq-workflow

955df89

Adding docker runtime

925bbe1

Adding v0.3.1 tag to fq lib install

f98ad5e

Adding a template configuration file for running on AWS with Cromwell.

67d38ed

Adding documentation to workflows and licensing information.

f749943

Adding additional documentation.

0b9d066

Adding deeptools to Docker image.

fea42d2

Adding bigwig generation step with deeptools.

4aafcac

adthrasher merged commit ceb55de into master Sep 25, 2019

claymcleod deleted the rnaseq-workflow branch September 26, 2019 02:46

adthrasher mentioned this pull request Feb 25, 2025

Hi-C workflow #139

Closed

adthrasher mentioned this pull request Jun 1, 2026

[WIP] refactor: module support #318

Draft

6 tasks

4 participants