Assertion error in a few processed files #369
I am running cutadapt as part of a DADA2 pipeline to analyze paired-end MiSeq 16S sequencing data. My sequences contain a 0-7 bp heterogeneity spacer, as described in Fadrosh et al. 2014, in addition to the primer sequences. To remove both, I had to match on the primer sequences instead of just cutting a fixed number of bases from each end.
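To illustrate why a fixed-length cut does not work here, this is a minimal sketch of primer-anchored trimming in plain Python. It is not how cutadapt is implemented: it only does exact matching with IUPAC wildcards (no mismatches or indels), and the primer `ACGYT`, the function names, and the `max_spacer` parameter are made up for the example.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a primer with IUPAC wildcards into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_spacer_and_primer(read, primer, max_spacer=7):
    """Remove a 0..max_spacer base spacer plus the primer from the read start.

    Returns the remaining sequence, or None if the primer is not found
    within the allowed window (this includes empty reads).
    """
    pattern = primer_regex(primer)
    # The match must end within max_spacer + len(primer) bases,
    # so it can start at offset max_spacer at the latest.
    match = pattern.search(read, 0, max_spacer + len(primer))
    if match is None:
        return None
    return read[match.end():]


# Hypothetical primer "ACGYT" preceded by a 3-base spacer "TTG"
print(trim_spacer_and_primer("TTGACGCTGGGAAA", "ACGYT"))
```

Because the spacer length differs from read to read, the cut position has to come from where the primer is found, not from a constant offset.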
I am relatively new to working on the command line, but after persistent trial and error, I managed to install cutadapt on WSL Ubuntu 18.04.
Since DADA2 is an R package, I am running it on the latest version of R, which is 3.5.3. Python is also updated to 3.6.7, and cutadapt is version 2.1.
This is the code that I was running in R:
This started out smoothly, and I got the expected output:
(I skipped the output for the second read; it was very similar to this one.)
Things went well until I got this error message:
The code kept running after this and went on to process some more files successfully. Only 5 of the 30 samples were affected, and it looks like they are missing most of their reads. For example, one sample lost over 70% of its reads. This ended up being a pretty lengthy post, but I wasn't sure what to include.
Thanks, I got the sequences.
Note for myself: I’ve been able to pinpoint the problem to an empty read in the input file, combined with using a wildcard in the adapter sequence. This will reproduce the problem:
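Separately from the reproduction, a quick way to check whether a given input file contains empty reads is sketched below. This is only a rough helper, not part of cutadapt: it assumes plain four-line FASTQ records (no wrapped sequences), and the function name and the file name in the usage note are hypothetical.

```python
def find_empty_reads(lines):
    """Return the names of four-line FASTQ records whose sequence is empty.

    `lines` is any iterable of text lines, for example an open file.
    """
    lines = [line.rstrip("\r\n") for line in lines]
    empty = []
    # Step through the input one record (four lines) at a time
    for i in range(0, len(lines) - 3, 4):
        header, sequence = lines[i], lines[i + 1]
        if header.startswith("@") and sequence == "":
            empty.append(header[1:])
    return empty


# Small in-memory example: the second record has an empty sequence
example = [
    "@read1\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "\n", "+\n", "\n",
]
print(find_empty_reads(example))
```

For a real file this could be called as `find_empty_reads(open("sample_R1.fastq"))`, or with `gzip.open(path, "rt")` for compressed input.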