Hello, I was wondering if someone could help me?
I've been trying to adapter trim and merge my dataset using Seqprep, but when I plot the read lengths after merging, I'm missing most of the reads between 40 and 50bp. I can't work out why, or whether I'm doing something wrong!
So: read length plots resemble this:
L120_2.read_lengths.pdf
I'm running SeqPrep as follows:
SeqPrep -f L120_1.qual.fastq -r L120_2_.qual.fastq -1 L120-R1.qual.unmerged.fastq -2 L120-R2.qual.unmerged.fastq -3 L120_NeutCap_2-R1.qual.discarded.fastq -4 L120_NeutCap_2-R2.qual.discarded.fastq -L 30 -q 15 -A AGATCGGAAGAGCACACGTC -B GGAAGAGCGTCGTGTAGGGA -s L120_NeutCap_2.qual.merged.fastq -E L120_NeutCap_2.qual.readable_alignment.txt -o 10
You'll notice that while the first adapter is the standard illumina one, but the second is a modified one, missing the first 5 bp. You can see both adapters present in the file if you grep the sequences (indicated below with [xx])…
Read1 quality trimmed, L120_2 above:
@HISEQ:268:C8TMGANXX:2:1101:1430:1965 1:N:0:NTCGTCGGNCGCAACG CAGGCACTCCCTGGAAACTCTAAGGGGCAGTTCTACTCT[AGATCGGAAGA] + A@B0BGGGGGGGCFGGGGGGGGGGGEGGGGGGGGGGCGG@1E@FGD/CEF
@HISEQ:268:C8TMGANXX:2:1101:1457:1992 1:N:0:TTCGTCGGNCGCAACG CTAGACCGCGAATACACACA[AGATCGGAAGAGCACACGTCTGAACTCCAG] + 33<<BGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGBGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1684:1955 1:N:0:TTCGTCGGCCGCAACG NTGATATGTCCGGAGTGCATCGTATGGCGCTTTCAATGAATTTG[AGATCG] + #3<<@EGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1619:1977 1:N:0:TTCGTCGGCCGCAACG CGGTGCCATCGAGCCTGTTCTGTCTCATAGTGACCCT[AGATCGGAAGAGC] + 33@>@GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1574:1983 1:N:0:TTCGTCGGCCGCAACG CCATCCTAGTGGGGGGAAAT[AGATCGGAAGAGCACACGTCTGAACTCCAA] + <330<E1EFFCGGGGGFGECDGEGGFGBDCDDGEGGGGCD0DDCDG=EBC
Read 2, quality trimmed, for L120_2 above.
@HISEQ:268:C8TMGANXX:2:1101:1430:1965 2:N:0:NTCGTCGGNCGCAACG AGAGTAGAACTGCCCCNNNNAGTTTCCAGGGAGTGCCTG[GGAAGAGCGTC] + BB@BBGGDFGGGGGGG####==EFGDFFGGGGGGGGGGGGEGGGGGGGGF
@HISEQ:268:C8TMGANXX:2:1101:1457:1992 2:N:0:TTCGTCGGNCGCAACG TGTGTGTATTCGCGGTCTATGGAAGAGCGTCGTGTAG[GGAAAGAGTGTCG] + CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1684:1955 2:N:0:TTCGTCGGCCGCAACG CAAATTCATTGAAAGNNNNNTACGATGCACTCCGGACATATCAT[GGAAGA] + CCCCCGGGGGGGGGG#####@=EFGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1619:1977 2:N:0:TTCGTCGGCCGCAACG AGGGTCACTATGAGACAGAACAGGCTCGATGGCACCT[GGAAGAGCGTCGT] + CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1574:1983 2:N:0:TTCGTCGGCCGCAACG ATTTCCCCCCACTAGGATGT[GGAAGAGCGTCGTGTAGGGAAAGAGTGTCG] + BCCCCGGGGGDGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGFG
So I think the adapter sequences are correct, but I can't explain why there's a dip in the read length frequency. Is this a quirk of SeqPrep? Can anyone offer any explanation?
Many thanks!
Hello, I was wondering if someone could help me?
I've been trying to adapter trim and merge my dataset using Seqprep, but when I plot the read lengths after merging, I'm missing most of the reads between 40 and 50bp. I can't work out why, or whether I'm doing something wrong!
So: read length plots resemble this:
L120_2.read_lengths.pdf
I'm running SeqPrep as follows:
SeqPrep -f L120_1.qual.fastq -r L120_2_.qual.fastq -1 L120-R1.qual.unmerged.fastq -2 L120-R2.qual.unmerged.fastq -3 L120_NeutCap_2-R1.qual.discarded.fastq -4 L120_NeutCap_2-R2.qual.discarded.fastq -L 30 -q 15 -A AGATCGGAAGAGCACACGTC -B GGAAGAGCGTCGTGTAGGGA -s L120_NeutCap_2.qual.merged.fastq -E L120_NeutCap_2.qual.readable_alignment.txt -o 10You'll notice that while the first adapter is the standard illumina one, but the second is a modified one, missing the first 5 bp. You can see both adapters present in the file if you grep the sequences (indicated below with [xx])…
Read1 quality trimmed, L120_2 above:
@HISEQ:268:C8TMGANXX:2:1101:1430:1965 1:N:0:NTCGTCGGNCGCAACG CAGGCACTCCCTGGAAACTCTAAGGGGCAGTTCTACTCT[AGATCGGAAGA] + A@B0BGGGGGGGCFGGGGGGGGGGGEGGGGGGGGGGCGG@1E@FGD/CEF@HISEQ:268:C8TMGANXX:2:1101:1457:1992 1:N:0:TTCGTCGGNCGCAACG CTAGACCGCGAATACACACA[AGATCGGAAGAGCACACGTCTGAACTCCAG] + 33<<BGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGBGGGGGGGG@HISEQ:268:C8TMGANXX:2:1101:1684:1955 1:N:0:TTCGTCGGCCGCAACG NTGATATGTCCGGAGTGCATCGTATGGCGCTTTCAATGAATTTG[AGATCG] + #3<<@EGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGG@HISEQ:268:C8TMGANXX:2:1101:1619:1977 1:N:0:TTCGTCGGCCGCAACG CGGTGCCATCGAGCCTGTTCTGTCTCATAGTGACCCT[AGATCGGAAGAGC] + 33@>@GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG@HISEQ:268:C8TMGANXX:2:1101:1574:1983 1:N:0:TTCGTCGGCCGCAACG CCATCCTAGTGGGGGGAAAT[AGATCGGAAGAGCACACGTCTGAACTCCAA] + <330<E1EFFCGGGGGFGECDGEGGFGBDCDDGEGGGGCD0DDCDG=EBCRead 2, quality trimmed, for L120_2 above.
@HISEQ:268:C8TMGANXX:2:1101:1430:1965 2:N:0:NTCGTCGGNCGCAACG AGAGTAGAACTGCCCCNNNNAGTTTCCAGGGAGTGCCTG[GGAAGAGCGTC] + BB@BBGGDFGGGGGGG####==EFGDFFGGGGGGGGGGGGEGGGGGGGGF@HISEQ:268:C8TMGANXX:2:1101:1457:1992 2:N:0:TTCGTCGGNCGCAACG TGTGTGTATTCGCGGTCTATGGAAGAGCGTCGTGTAG[GGAAAGAGTGTCG] + CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG@HISEQ:268:C8TMGANXX:2:1101:1684:1955 2:N:0:TTCGTCGGCCGCAACG CAAATTCATTGAAAGNNNNNTACGATGCACTCCGGACATATCAT[GGAAGA] + CCCCCGGGGGGGGGG#####@=EFGGGGGGGGGGGGGGGGGGGGGGGGGG@HISEQ:268:C8TMGANXX:2:1101:1619:1977 2:N:0:TTCGTCGGCCGCAACG AGGGTCACTATGAGACAGAACAGGCTCGATGGCACCT[GGAAGAGCGTCGT] + CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG@HISEQ:268:C8TMGANXX:2:1101:1574:1983 2:N:0:TTCGTCGGCCGCAACG ATTTCCCCCCACTAGGATGT[GGAAGAGCGTCGTGTAGGGAAAGAGTGTCG] + BCCCCGGGGGDGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGFGSo I think the adapter sequences are correct, but I can't explain why there's a dip in the read length frequency. Is this a quirk of SeqPrep? Can anyone offer any explanation?
Many thanks!