Description of feature
First of all, thanks for the great work. It is a breeze working with this pipeline, with so much QC and other perks like the automatic strandedness detection at hand. However, I recently found a missing piece that I think would come in very handy when processing many samples downloaded from a public repository (as I currently do).
The problem:
My main workflow nowadays is (i) generate a sample list, (ii) download with fetchngs (sratools), (iii) process the data with a modified version of rnaseq (I only wanted the automatic strandedness detection and did my own alignment processing). Especially with public repositories and sratools, it sometimes happens that a non-terminating exception makes the fetchngs process appear to complete normally while the files are actually missing some reads (I encountered several variations of this). This subsequently causes problems downstream in rnaseq, where either FQ_LINT or trimming fails because the read files (especially paired ones) don't match. At first I thought this could be easily solved by simply ignoring FQ_LINT errors, which would safeguard trimming and everything downstream, because in my mind the data flow was FQ_LINT -> TRIM -> everything else. Unfortunately, I found that FQ_LINT does not feed into the trimming processes, so trimming errors still occur on the same samples that fail linting.
Expected behaviour:
ignoring linting errors should safeguard trimming by simply skipping all samples that fail at linting
Observed behaviour:
linting and trimming are completely independent processes fed from the same channel, so even though linting fails, trimming commences on the same sample
Solution:
My solution for this is simply to join the output of FQ_LINT with the input channel of the trimming stage (in subworkflows/nf-core/fastq_qc_trim_filter_setstrandedness/main.nf), like so:
```groovy
// join keeps only samples that emitted a lint result;
// the map drops the lint file from the joined tuple again
ch_filtered_reads
    .join ( FQ_LINT.out.lint )
    .map { it[0..-2] }
    .set { ch_linted_reads }
```
This construct filters out all samples that fail at linting and prevents trimming errors later on. I personally consider this the correct behaviour, and I think it would help others dealing with failing pipelines when processing large numbers of samples.
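To illustrate the filtering effect of that join, here is a minimal sketch in plain Python using dicts in place of Nextflow channels (the sample names are made up; in the actual subworkflow the join key is the meta map):

```python
# Hypothetical sketch: dicts stand in for Nextflow channels keyed by meta.
reads = {
    "SRR_A": ["A_1.fastq.gz", "A_2.fastq.gz"],
    "SRR_B": ["B_1.fastq.gz", "B_2.fastq.gz"],  # assume this sample failed FQ_LINT
    "SRR_C": ["C_1.fastq.gz", "C_2.fastq.gz"],
}
# FQ_LINT only emits for samples whose task succeeded, so SRR_B is absent.
lint = {
    "SRR_A": "A.fq_lint.txt",
    "SRR_C": "C.fq_lint.txt",
}

# Like `.join(FQ_LINT.out.lint)`: keep only keys present in both channels;
# like `.map { it[0..-2] }`: drop the lint file from the result again.
linted_reads = {k: v for k, v in reads.items() if k in lint}

print(sorted(linted_reads))  # → ['SRR_A', 'SRR_C']
```

Because a failed FQ_LINT task (with its error ignored) never emits into `FQ_LINT.out.lint`, the join silently drops that sample and trimming never sees it.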