-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update module sratools prefetch #305
Conversation
This PR is against the
|
|
subworkflows/nf-core/fastq_download_prefetch_fasterqdump_sratools/tests/main.nf.test.snap
Outdated
Show resolved
Hide resolved
…ols/tests/main.nf.test.snap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, all tests are OK.
Just checking some extra things before merging.
Thanks a lot @suhrig for the work
Hi @maxulysse, can I help you with the additional checks? All I can say is I have run this several times now without any issues. Regards, Sebastian |
# check file integrity using vdb-validate or (when archive contains no checksums) md5sum | ||
vdb-validate !{id} > vdb-validate_result.txt 2>&1 || exit 1 | ||
if grep -q "checksums missing" vdb-validate_result.txt; then | ||
VALID_MD5SUMS=$(curl --silent --fail --location --retry 3 --retry-delay 60 'https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve?filetype=run&acc=!{id}') | ||
LOCAL_MD5SUMS=$(md5sum !{id}/* | cut -f1 -d' ') | ||
if ! grep -q -F -f <(echo "$LOCAL_MD5SUMS") <(echo "$VALID_MD5SUMS"); then | ||
echo "MD5 sum check failed" 1>&2 | ||
exit 1 | ||
fi | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only issue I can see is that we're no longer doing vdb-validate
over !{id.}sralite
.
But I do love the added md5 sum possibility.
Can you keep the sralite
part in this logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is no longer needed. It was needed in old version of sratools. But the latest version places .sralite
files in a subdirectory named after the SRA ID just like regular .sra
files. prefetch
dowloads the dataset which gave rise to this code just fine (SRR1806585
, see issue #162) and vdb-validate
checks it properly. fasterq-dump
still fails at a later step as mentioned in issue #162, but that's because it uses an outdated version of sratools (2.11). I guess it makes sense to update fasterq-dump
as part of this PR as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, then we're good into merging this PR.
I'd rather we don't update fasterq-dump
for now as it would be re-creating other issues: #221
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my side it looks good, but @maxulysse 's comment is a good question, of course.
This PR fixes #285. Downloads using sratools often resulted in corrupt files, because the tool to check the file integrity,
vdb-validate
is not always reliable. This PR updates the modulesratools/prefetch
, which includes a fix which performs a manual MD5 sum check in cases wherevdb-validate
is unreliable.PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).CHANGELOG.md
is updated.