Pipeline fails due to trimming related removal of all reads from a sample #825
Comments
Hi @dmalzl! You could try to use this in a config you pass via

That should ignore any failing processes when you

Is there a way we can detect "empty" files like this? So the files still have content, but the sequences are zero length. Are you able to provide an example here for future reference?
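The exact config snippet suggested here was lost in this capture of the thread. In Nextflow, ignoring failing processes is typically done with the `errorStrategy` directive; a minimal sketch of such a custom config (passed to the pipeline with `-c custom.config`) might look like this — note this is an illustration, not the maintainer's original snippet:

```groovy
// custom.config -- sketch only, not the exact snippet from the thread.
// 'ignore' tells Nextflow to record a task failure but keep the
// rest of the pipeline running instead of aborting.
process {
    errorStrategy = 'ignore'
}
```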
Hmm, I think this is worth trying, but then I would have to check all failures manually in the end, which might be a pain. Thanks for the suggestion anyway! For me the fastq file
and could easily be parsed to see what percentage of the reads are thrown out, and then compare this to a threshold.
Can you do
this is the output I get from

The file is not empty due to the gzip header, I guess, but
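The point being made here — a gzipped FASTQ with zero reads is still non-empty on disk because of the gzip header and trailer — can be demonstrated directly. A hedged sketch of a check that inspects the decompressed content instead of the file size (filenames are illustrative):

```shell
#!/bin/sh
# Create an "empty" gzipped FASTQ: non-zero file size, zero content.
gzip -c < /dev/null > empty.fastq.gz

ls -l empty.fastq.gz   # the file size is > 0 (gzip header/trailer)

# Count decompressed bytes; 0 means no reads survived trimming.
nbytes=$(gzip -cd empty.fastq.gz | wc -c)
if [ "$nbytes" -eq 0 ]; then
    echo "empty.fastq.gz contains no reads"
fi
```

This is why a simple file-size check cannot catch samples emptied out by trimming.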
Hmmm... thanks. Ok. Maybe it's easier to parse the Cutadapt logs in that case 👍🏽 So something like this instead.
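The maintainer's actual snippet is not preserved here, but the idea of parsing the trimming logs for the fraction of discarded reads can be sketched as follows. The report line below is modelled on a Trim Galore trimming report; the exact wording and the 90% threshold are assumptions for illustration:

```shell
#!/bin/sh
# Fake trimming-report excerpt for illustration (format assumed,
# modelled on a Trim Galore report line).
cat > trimming_report.txt <<'EOF'
Sequences removed because they became shorter than the length cutoff of 20 bp: 1000000 (100.0%)
EOF

# Extract the percentage of removed reads and compare to a threshold.
pct=$(grep 'shorter than the length cutoff' trimming_report.txt \
      | sed 's/.*(\([0-9.]*\)%).*/\1/')
threshold=90
if [ "$(printf '%.0f' "$pct")" -ge "$threshold" ]; then
    echo "WARNING: ${pct}% of reads removed by trimming"
fi
```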
I think this would be easiest. Thanks for looking into it. Another related question: since I am not really planning on using the output Salmon produces, I was wondering whether there is a way to tell the pipeline to only ignore errors in any Salmon process, something like this:
do you think that will work?
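The selector-based config being proposed is not preserved above. In Nextflow, `errorStrategy` can be scoped to specific processes with a `withName` selector; a sketch of the idea, where the process-name pattern is an assumption that would need to be checked against the pipeline's actual process names:

```groovy
// custom.config -- sketch only; the name pattern below is an
// assumption, not taken from the pipeline source.
process {
    withName: '.*SALMON.*' {
        errorStrategy = 'ignore'
    }
}
```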
Update: seems to work for now
Fixed in drpatelh@96f6988

- Added a new parameter
- Added some logic and a new process that generates a warning in the MultiQC report if samples fail the
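The parameter name and the exact logic are elided above, but the core check the fix describes — does a sample retain enough reads after trimming? — can be sketched in shell. The input FASTQ and the minimum-read threshold below are illustrative assumptions, not the pipeline's implementation:

```shell
#!/bin/sh
# Build a gzipped FASTQ with a single 17 bp read for illustration.
printf '@read1\nACGTACGTACGTACGTA\n+\nIIIIIIIIIIIIIIIII\n' \
    | gzip -c > trimmed.fastq.gz

# FASTQ stores one read per 4 lines, so reads = lines / 4.
nreads=$(( $(gzip -cd trimmed.fastq.gz | wc -l) / 4 ))
min_reads=5   # hypothetical minimum, analogous to the new parameter

if [ "$nreads" -lt "$min_reads" ]; then
    echo "WARNING: only ${nreads} read(s) left after trimming"
fi
```

Samples failing such a check can then be flagged (as in the fix, via a MultiQC warning) rather than being passed downstream to STAR and Salmon.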
Description of the bug
I am using the rnaseq pipeline to process a large number of samples I downloaded from the SRA. Everything ran as expected until the pipeline hit an error reporting that Salmon could not detect any reads aligned to the reference transcriptome.

Upon closer inspection of the sample that caused the error, I found that the reads in this sample are 17 bp long. Adapter and quality trimming with Trim Galore imposes a hard length cutoff on trimmed reads, which by default is 20 bp, thus effectively removing all reads from the sample even before alignment. Consequently, STAR has nothing to align and writes an empty BAM file, which is eventually passed to Salmon and causes the pipeline to fail.

Currently there is no safeguard against this other than removing such samples prior to processing or skipping trimming altogether. Since I do not want to rerun the pipeline from the start, as most of my samples are already aligned, is there another route I can take to make the pipeline finish? I have already thought about changing the Salmon processes a bit, but I think this would just pass the problem on to the next stages of the pipeline. I would therefore suggest catching this edge case in one way or another after trimming, to avoid failures in such situations.
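To make the failure mode described above concrete: with 17 bp reads and a 20 bp minimum, every read is discarded. A sketch of such a hard length filter in awk — this mimics the *effect* of the trimmer's default length cutoff, it is not the tool itself, and the reads below are made up:

```shell
#!/bin/sh
# Two reads: one 17 bp (like the sample described), one 25 bp.
cat > reads.fastq <<'EOF'
@short_read
ACGTACGTACGTACGTA
+
IIIIIIIIIIIIIIIII
@long_read
ACGTACGTACGTACGTACGTACGTA
+
IIIIIIIIIIIIIIIIIIIIIIIII
EOF

# Keep only reads whose sequence is >= 20 bp, as the trimmer's
# hard length cutoff would.
awk 'NR % 4 == 1 { h = $0 }
     NR % 4 == 2 { s = $0 }
     NR % 4 == 3 { p = $0 }
     NR % 4 == 0 { if (length(s) >= 20) print h "\n" s "\n" p "\n" $0 }' \
    reads.fastq > filtered.fastq

grep -c '^@' filtered.fastq   # the 17 bp read is gone
```

If every read in a sample is under the cutoff, the filtered file ends up empty, which is exactly what STAR then receives.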
Command used and terminal output
Relevant files
Any fastq file with reads that are too short (e.g. SRR5277994)
System information
No response