
make trimming input reliant on fq lint output #1514

@dmalzl


Description of feature

First of all, thanks for the great work. It is a breeze working with this pipeline, with so much QC and other perks like auto strandedness detection at hand. However, I recently found a missing piece that I think would come in very handy when processing many samples downloaded from a public repository (as I currently do).

The problem:
My main workflow nowadays is (i) generate a sample list, (ii) download with fetchngs (sratools), (iii) process the data with a modified version of rnaseq (I only wanted the auto strandedness detection and did my own alignment processing). Especially with public repositories and sratools, it sometimes happens that non-terminating exceptions make the fetchngs process appear to complete normally while the files are actually missing some reads (I encountered a number of different situations). This subsequently causes problems in rnaseq, where either FQ_LINT or trimming fails because the read files (especially paired ones) don't match. At first I thought this could be solved simply by ignoring FQ_LINT errors, which would safeguard trimming and everything downstream, because in my mind the data flow was FQ_LINT -> TRIM -> everything else. Unfortunately, I found that FQ_LINT does not feed into the trimming processes, so trimming fails on the same samples that also fail linting.

Expected behaviour:
Ignoring linting errors should safeguard trimming by simply skipping all samples that fail linting.

Observed behaviour:
Linting and trimming are completely independent processes fed from the same channel, so even when linting fails, trimming still commences on the same sample.

Solution:
My solution is simply to join the output of FQ_LINT with the input channel of the trimming stage, like so (in subworkflows/nf-core/fastq_qc_trim_filter_setstrandedness/main.nf):

  ch_filtered_reads
      .join ( FQ_LINT.out.lint )   // inner join on meta; samples without lint output are dropped
      .map { it[0..-2] }           // discard the lint report, restoring the [ meta, reads ] shape
      .set { ch_linted_reads }

This construct filters out all samples that fail linting and prevents trimming errors later on. I personally consider this the correct behaviour, and I think it would help other people dealing with failing pipelines when processing a large number of samples.
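For illustration, here is a minimal sketch of how the join behaves; the channel contents (sample IDs, file names) are hypothetical, and the join relies on `join` matching on the first element of each tuple (the meta map), which is the default:

```nextflow
// Hypothetical channel contents, for illustration only:
//   ch_filtered_reads:  [ [id:'s1'], s1.fastq.gz ], [ [id:'s2'], s2.fastq.gz ]
//   FQ_LINT.out.lint:   [ [id:'s1'], s1.fq_lint.txt ]   // s2 failed linting, emits nothing
//
// join keeps only metas present in BOTH channels, concatenating the tuples:
//   [ [id:'s1'], s1.fastq.gz, s1.fq_lint.txt ]
//
// so after .map { it[0..-2] } the channel carries only the linted samples
// in the original [ meta, reads ] shape:
//   [ [id:'s1'], s1.fastq.gz ]
ch_filtered_reads
    .join ( FQ_LINT.out.lint )
    .map { it[0..-2] }
    .set { ch_linted_reads }
```

Because a failed FQ_LINT task (with errors ignored) emits no item for that sample, the inner join silently drops it, which is exactly the filtering behaviour described above.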
