add modules for sra-human-scrubber #2694
base: master
@rpetit3, it looks like the version naming structure for the database has changed:
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
    'https://depot.galaxyproject.org/singularity/sra-human-scrubber:2.0.0--hdfd78af_0':
    'quay.io/biocontainers/sra-human-scrubber:2.0.0--hdfd78af_0' }"
No input 😱
DBVERSION=\$(curl "https://ftp.ncbi.nlm.nih.gov/sra/dbs/human_filter/current/version.txt")
curl -f "https://ftp.ncbi.nlm.nih.gov/sra/dbs/human_filter/human_filter.db.\${DBVERSION}" -o "\${DBVERSION}.human_filter.db"
This could probably be done outside of a process? I mean, Channel.fromPath("https://ftp.ncbi.nlm.nih.gov/sra/dbs/human_filter/human_filter.db.${DBVERSION}") would achieve much the same thing?
This process is currently looking up what the most recent version is, and downloading that.
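As a rough illustration of the lookup-then-download step described above, a standalone shell sketch might look like the following. The URL layout is copied from the two curl lines in this PR and may no longer match the server if the version naming has changed; the function names `db_url` and `fetch_latest_db` are hypothetical and not part of the module.

```shell
# Base location of the human filter databases, as used in this PR.
BASE_URL="https://ftp.ncbi.nlm.nih.gov/sra/dbs/human_filter"

# Build the download URL for a given database version string.
# Assumption: files are named human_filter.db.<version>, as in the PR.
db_url() {
    printf '%s/human_filter.db.%s\n' "$BASE_URL" "$1"
}

# Look up the current version, download that database, and print the version.
fetch_latest_db() {
    # version.txt may end with a newline, so strip all whitespace.
    db_version="$(curl -fsSL "${BASE_URL}/current/version.txt" | tr -d '[:space:]')" || return 1
    curl -fL "$(db_url "$db_version")" -o "${db_version}.human_filter.db" || return 1
    printf '%s\n' "$db_version"
}
```

Calling `fetch_latest_db` performs both network requests; `db_url` on its own only builds the URL string and can be checked offline.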
I definitely think you can do a GET and make a channel; that would achieve the same thing without requiring a full process.
Yeah, that is true. I'm just trying to get PRs to a point where they can be merged in (or potentially to be closed if they aren't necessary).
One option would be to just add the scrubber module in, and leave the database fetching to be external (potentially via a local module). Otherwise this is going to change the md5sum any time the external database is changed anyway.
input:
tuple val(meta), path(reads)
path db
Suggested change:
- path db
+ tuple val(meta2), path(db)
def VERSION = '2.0.0' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
if (meta.single_end) {
    """
    zcat ${reads} | scrub.sh -d $db | gzip > ${prefix}.scrubbed.fastq.gz

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        sra-human-scrubber: $VERSION
    END_VERSIONS
    """
} else {
    """
    zcat ${reads[0]} | scrub.sh -d $db | gzip > ${prefix}_R1.scrubbed.fastq.gz
    zcat ${reads[1]} | scrub.sh -d $db | gzip > ${prefix}_R2.scrubbed.fastq.gz

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        sra-human-scrubber: $VERSION
        sra-human-scrubber-db: \$DBVERSION
    END_VERSIONS
    """
}
I prefer not to use meta.single_end, although I know I'm in the minority. Instead, I prefer to test the number of FASTQs explicitly.
Suggested change (replaces the block above):
def VERSION = '2.0.0' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
def num_fastq = reads instanceof List ? reads.size() : 1
if (num_fastq == 1) {
    """
    zcat ${reads} | scrub.sh -d $db | gzip > ${prefix}.scrubbed.fastq.gz
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        sra-human-scrubber: $VERSION
    END_VERSIONS
    """
} else {
    // Could handle it better here but you get the idea
    """
    zcat ${reads[0]} | scrub.sh -d $db | gzip > ${prefix}_R1.scrubbed.fastq.gz
    zcat ${reads[1]} | scrub.sh -d $db | gzip > ${prefix}_R2.scrubbed.fastq.gz
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        sra-human-scrubber: $VERSION
        sra-human-scrubber-db: \$DBVERSION
    END_VERSIONS
    """
}
I also find the single-end modules a bit odd; I noticed that while trying to get these old PRs across the line.
PR checklist
Closes #2693
versions.yml
file.label
PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
SRA Human Scrubber uses a database larger than 1 GB, so I went with stub runs for the tests.
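For readers unfamiliar with stub runs, the idea is to create placeholder outputs with the expected names instead of running the tool against the real database. A minimal shell sketch of what such a stub might do is shown below; the helper name `make_stub_outputs` and its arguments are hypothetical, and the output names simply mirror the ones used in the script blocks of this PR.

```shell
# Hypothetical stub helper: write empty but valid gzip files under the
# same names the real scrubbing commands would produce.
make_stub_outputs() {
    prefix="$1"
    layout="$2"   # "single" or "paired"
    if [ "$layout" = "single" ]; then
        : | gzip > "${prefix}.scrubbed.fastq.gz"
    else
        : | gzip > "${prefix}_R1.scrubbed.fastq.gz"
        : | gzip > "${prefix}_R2.scrubbed.fastq.gz"
    fi
}
```

Piping empty input through gzip, rather than using `touch`, keeps the placeholders readable by any downstream step that tries to decompress them.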