custom databases - manual intervention required to complete code for sequence reads #144

pjbiggs · 2024-01-30T00:09:17Z

Hi there,

I am using SRST2 with a custom database to search for a small variable gene region (~320 bp including flanking sequence) within a set of Campylobacter sp. genomes. I made a small database of unique sequences from a much larger sequence dataset, following the provided instructions. This small dataset has 100 sequences, which cluster to 5 sequences at -c 0.9 with cd-hit-est. I have made the sequence names as simple as possible in case that was the issue. My problem is that the run cannot complete without manual intervention (pressing Ctrl-C) after the line `<mpileup> Set max per-file depth to 8000`, as shown below (I have changed the input file names, but all other code is correct):

```
testOfFlankingBla$ time python2 ~/software/srst2/scripts/srst2.py --input_pe ../flaA_singleTest/SRRxxxxx_1.fastq.gz ../flaA_singleTest/SRRxxxxx_2.fastq.gz --output SRRxxxxx --gene_db ../flankingBlaBit_cdhit.fasta --log
1968887 reads; of these:
1968887 (100.00%) were paired; of these:
1968800 (100.00%) aligned concordantly 0 times
9 (0.00%) aligned concordantly exactly 1 time
78 (0.00%) aligned concordantly >1 times
----
1968800 pairs aligned concordantly 0 times; of these:
0 (0.00%) aligned discordantly 1 time
----
1968800 pairs aligned 0 times concordantly or discordantly; of these:
3937600 mates make up the pairs; of these:
3937577 (100.00%) aligned 0 times
4 (0.00%) aligned exactly 1 time
19 (0.00%) aligned >1 times
0.01% overall alignment rate
[samopen] SAM header is present: 100 sequences.
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
sh: 1: OXC8243__27943: not found
sh: 1: OXC8243__00001: not found

^Csh: 1: NCTC11168__48: not found
sh: 1: NCTC11168__00008: not found
^Csh: 1: ARI2590__39380: not found
sh: 1: ARI2590__00095: not found
^Csh: 1: 8096__00098: not found
sh: 1: 8096__24271: not found
^C
real 14m28.381s
user 1m5.051s
sys 0m3.154s
```
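
The `sh: 1: ...: not found` lines suggest that database sequence names are ending up in the command position of a shell command. One thing worth checking (a guess, not a diagnosis) is whether the headers follow the four-field naming convention that the SRST2 README is assumed here to describe for custom databases, `>[clusterUniqueIdentifier]__[clusterSymbol]__[alleleSymbol]__[alleleUniqueIdentifier]`; names such as `OXC8243__27943` have only two `__`-separated fields. The sketch below lists headers that do not have four fields or that contain characters a shell could treat specially. The function names `check_header` and `check_db`, and the "safe" character set `[A-Za-z0-9_.-]`, are hypothetical choices for this sketch, not part of SRST2.

```bash
#!/bin/sh
# Sketch only. Assumes the SRST2 gene_db header convention
#   >clusterUniqueIdentifier__clusterSymbol__alleleSymbol__alleleUniqueIdentifier
# and (hypothetically) treats any character outside [A-Za-z0-9_.-] as unsafe.

# check_header NAME: prints "ok" or the reason NAME (header without '>') fails
check_header() {
  local id fields
  # Only the part before the first space is the sequence ID
  id=${1%% *}
  case $id in
    *[!A-Za-z0-9_.-]*) echo "unsafe characters"; return ;;
  esac
  # Count '__'-separated fields (awk treats a multi-character FS as a regex)
  fields=$(printf '%s\n' "$id" | awk -F'__' '{ print NF }')
  if [ "$fields" -ne 4 ]; then
    echo "expected 4 fields, found $fields"
  else
    echo "ok"
  fi
}

# check_db FASTA: prints each failing header and its reason; returns 1 if any fail
check_db() {
  local line status bad=0
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      '>'*) ;;           # header line: check it
      *) continue ;;     # sequence line: skip
    esac
    status=$(check_header "${line#>}")
    if [ "$status" != "ok" ]; then
      printf '%s\t%s\n' "${line#>}" "$status"
      bad=1
    fi
  done < "$1"
  return "$bad"
}
```

For example, after sourcing the file, `check_db ../flankingBlaBit_cdhit.fasta` would print each non-conforming header with its reason and return a non-zero status if any are found.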

I let this run continue (~14 minutes) to see whether it was a timing issue (it wasn't). However, I reach the `<mpileup> Set max per-file depth to 8000` line after only about 90 seconds. Because of this hang, automating SRST2 over a folder of Illumina PE read sets is currently not possible, even though I do get output, including a table of hits. Do you have any idea why this is happening, and how to solve it?
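
To run SRST2 unattended over a folder of paired-end reads, a loop like the following could be used. It is a minimal sketch under stated assumptions: mate files are named `SAMPLE_1.fastq.gz`/`SAMPLE_2.fastq.gz` (as in the command above), GNU coreutils `timeout` is installed, and `run_batch`, `sample_name`, `MAX_SECONDS` and its 600-second default are hypothetical. It passes only the srst2.py options already used above. Redirecting standard input from `/dev/null` is a guess based on each Ctrl-C advancing the run: if a child process is waiting on input, it receives end-of-file instead of blocking on the terminal, and `timeout` is a fallback if the run stalls for another reason. This works around the hang rather than fixing its cause.

```bash
#!/bin/sh
# Sketch only. Assumptions (not from the SRST2 docs): mates are named
# SAMPLE_1.fastq.gz / SAMPLE_2.fastq.gz, srst2.py is at the path below
# (override with SRST2=...), and GNU coreutils `timeout` is installed.
SRST2=${SRST2:-$HOME/software/srst2/scripts/srst2.py}
MAX_SECONDS=${MAX_SECONDS:-600}   # hypothetical per-sample limit in seconds

# sample_name R1: strip the directory and the _1.fastq.gz suffix
sample_name() {
  local base=${1##*/}
  printf '%s\n' "${base%_1.fastq.gz}"
}

# run_batch READ_DIR GENE_DB: one srst2.py run per read pair in READ_DIR
run_batch() {
  local r1 r2 sample
  for r1 in "$1"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue            # glob matched nothing
    r2=${r1%_1.fastq.gz}_2.fastq.gz
    if [ ! -e "$r2" ]; then
      echo "skipping $r1: no mate file $r2" >&2
      continue
    fi
    sample=$(sample_name "$r1")
    # stdin from /dev/null: a child waiting on input gets EOF instead of
    # the terminal; timeout stops the run if it stalls anyway.
    timeout "$MAX_SECONDS" python2 "$SRST2" \
      --input_pe "$r1" "$r2" --output "$sample" \
      --gene_db "$2" --log < /dev/null \
      || echo "srst2 failed or timed out for $sample" >&2
  done
}
```

After sourcing the file, `run_batch ../flaA_singleTest ../flankingBlaBit_cdhit.fasta` would start one srst2.py run per read pair, using each sample name as the `--output` prefix.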

Thanks in advance,

Patrick
