[WIP] STAR Workflow v2.0.0 #1
@a-frantz Let's use this repo/branch to develop out the workflow together.
…till sketching) - Renamed `bio-base` to `bioinformatics-base`. - Cleans up multiple layers in the `bioinformatics-base` image. - Move images to their own folders. Still sketching out how these should look.
ENV PATH /opt/conda/bin:$PATH

RUN apt-get update && \
    apt-get upgrade -y && \
Assume the base image is already up-to-date.
Can you explain a little further why this would be a best practice? Just curious from your perspective.
It's the responsibility of the base image to maintain and update core packages periodically. Anything else can be updated/installed individually.
Integer values for memory requirements are in bytes, e.g., 75000 = ~73 KiB. Use a string with a unit instead: "75 GiB".
Just using an integer works everywhere else we've used it. I know on LSF there's a setting in the config file to change the default units; would that be a problem if that setting was, say …
The spec defines that integers are parsed as bytes. In the backend configuration for the job submit command, use the suffixed version of …
@a-frantz - I think what @zaeleus is saying is that there is implicit unit conversion going on, and it is not clear to an end user. For example, star.build_db specifies a memory value of "50000", which according to the WDL spec should be in bytes. The LSF conf file doesn't specify units, so the "50000" is passed directly to the bsub command. Our LSF cluster is set to use MB as the default unit (LSF's built-in default unit is KB). So the "50000" value should be bytes according to the WDL spec, but at runtime it is interpreted as MB, which reserves 50 GB of memory. For correctness and clarity, we should use the string version of the memory specification and set the unit being used in the LSF conf.
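To make the mismatch concrete, here is a minimal Python sketch of the arithmetic described above. The helper names are hypothetical, and it assumes decimal units (1 MB = 10^6 bytes); the conversion the cluster actually applies may use binary units instead.

```python
# Illustrative arithmetic only; the helper names are hypothetical.
KIB = 1024   # bytes per KiB
MB = 10**6   # bytes per MB (assumption: decimal units)
GB = 10**9   # bytes per GB (assumption: decimal units)


def wdl_integer_bytes(value: int) -> int:
    """Per the spec reading quoted above, a bare integer is a byte count."""
    return value


def lsf_default_mb_bytes(value: int) -> int:
    """Assumed cluster behavior: a unitless value is interpreted as MB."""
    return value * MB


requested = 50000  # the value star.build_db passes through unchanged
per_spec = wdl_integer_bytes(requested)
at_runtime = lsf_default_mb_bytes(requested)

print(f"per spec:   {per_spec / KIB:.1f} KiB")
print(f"at runtime: {at_runtime / GB:.0f} GB")
```

The same integer differs by a factor of a million depending on who interprets it, which is why a string with an explicit unit avoids the ambiguity.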
…emory. Setting tools to specify memory requirements as a string.
…orkflows into rnaseq-workflow
This seems like it is getting close to being ready for merging, right?
Yes, I think it is ready to merge, assuming we want anything arising through the RFC to be in a separate PR.
Yeah, I think that's probably the right approach. Go ahead and merge whenever you are ready.
This PR is an implementation of the new STAR alignment workflow for RNA-seq (v2.0). You can follow the discussion on what features the pipeline will have in the RFC pull request.