custom databases - manual intervention required to complete code for sequence reads #144

Open
pjbiggs opened this issue Jan 30, 2024 · 0 comments

pjbiggs commented Jan 30, 2024

Hi there,

I am using SRST2 with a custom database to search for a small variable gene region (~320 bp with flanking sequence) within a set of Campylobacter sp. genomes. I made a small database of unique sequences from a much larger sequence dataset, following the provided instructions. This small dataset has 100 sequences, which cluster to 5 sequences at c = 0.9 in cd-hit-est. I have made the sequence names as simple as possible in case that was the issue. My problem is that the run cannot complete without manual intervention: after the line <mpileup> Set max per-file depth to 8000, I have to press Ctrl-C for it to finish, as shown below (I have changed the input file names, but everything else is exactly as run):

testOfFlankingBla$ time python2 ~/software/srst2/scripts/srst2.py --input_pe ../flaA_singleTest/SRRxxxxx_1.fastq.gz ../flaA_singleTest/SRRxxxxx_2.fastq.gz --output SRRxxxxx --gene_db ../flankingBlaBit_cdhit.fasta --log
1968887 reads; of these:
1968887 (100.00%) were paired; of these:
1968800 (100.00%) aligned concordantly 0 times
9 (0.00%) aligned concordantly exactly 1 time
78 (0.00%) aligned concordantly >1 times
----
1968800 pairs aligned concordantly 0 times; of these:
0 (0.00%) aligned discordantly 1 time
----
1968800 pairs aligned 0 times concordantly or discordantly; of these:
3937600 mates make up the pairs; of these:
3937577 (100.00%) aligned 0 times
4 (0.00%) aligned exactly 1 time
19 (0.00%) aligned >1 times
0.01% overall alignment rate
[samopen] SAM header is present: 100 sequences.
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
sh: 1: OXC8243__27943: not found
sh: 1: OXC8243__00001: not found

^Csh: 1: NCTC11168__48: not found
sh: 1: NCTC11168__00008: not found
^Csh: 1: ARI2590__39380: not found
sh: 1: ARI2590__00095: not found
^Csh: 1: 8096__00098: not found
sh: 1: 8096__24271: not found
^C
real 14m28.381s
user 1m5.051s
sys 0m3.154s

I let this run continue for about 14 minutes to see whether it was a timing issue (it wasn't); the run reaches the <mpileup> Set max per-file depth to 8000 line after about 90 seconds. Automating this on a folder of Illumina PE sequences is therefore not currently possible. I do get output, including a table of hits. Do you have any idea why this is happening, and how to solve it?
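One check that might narrow this down (a sketch, not a confirmed diagnosis): the `sh: 1: ...: not found` lines mean the shell tried to execute part of a sequence name as a command. That can happen when a name containing a shell-special character such as `;`, `|` or `&` ends up unquoted in a command line. Whether this is what SRST2 does here is an assumption. The helper below is hypothetical and not part of SRST2; it lists the FASTA headers that contain such characters, and the example header in it is invented for illustration.

```python
# Characters a POSIX shell treats specially when unquoted.
# This set is illustrative, not exhaustive; whitespace is deliberately not
# flagged because FASTA headers often carry a description after a space.
SHELL_SPECIAL = set("|&;<>()$`\\\"'*?[]{}!#~")


def find_risky_headers(fasta_lines):
    """Return (line_number, header, offending_chars) for every FASTA header
    line that contains at least one shell-special character."""
    risky = []
    for line_number, line in enumerate(fasta_lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].rstrip("\r\n")
        found = sorted(set(header) & SHELL_SPECIAL)
        if found:
            risky.append((line_number, header, found))
    return risky


# Invented example: the second header contains a ';', so everything after it
# would be read as a separate command if the name reached the shell unquoted.
example = [
    ">0__flank__flank-1__1\n",
    "ACGT\n",
    ">1__flank__OXC8243;OXC8243__27943\n",
    "ACGT\n",
]
result = find_risky_headers(example)
for line_number, header, chars in result:
    print(line_number, header, chars)
```

To run it on the real database, pass an open file handle instead of the list, for example `find_risky_headers(open("../flankingBlaBit_cdhit.fasta"))`. An empty result would suggest the headers themselves are not the cause.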

Thanks in advance,

Patrick
