Crash when trying to create novel allele consensus fasta from ARG database #40

swlong · 2015-06-17T19:05:09Z

Howdy all,

Quick question, I am trying to recover the novel allele sequences from some samples being run against the ARG database included with SRST2. I'm getting the following error when trying to create the allele pileup:

06/17/2015 12:44:06 Generate pileup...
06/17/2015 12:44:06 Running: samtools mpileup -L 1000 -f /home/mresswl/archive/ARGFinder_DB/ARGannot.fasta -Q 20 -q 1 -B data_out/KPN166ec__KPN166ec.ARGannot.sorted.bam
[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000
06/17/2015 12:44:11 Processing SAMtools pileup...
06/17/2015 12:44:34 Scoring alleles...
Traceback (most recent call last):
File "/share/apps/srst2/bin/srst2.py", line 1592, in
main()
File "/share/apps/srst2/bin/srst2.py", line 1548, in main
db_reports, db_results = run_srst2(args,fileSets,args.gene_db,"genes")
File "/share/apps/srst2/bin/srst2.py", line 1102, in run_srst2
db_reports, db_results_list = process_fasta_db(args, fileSets, run_type, db_reports, db_results_list, fasta)
File "/share/apps/srst2/bin/srst2.py", line 1164, in process_fasta_db
unique_gene_symbols, unique_allele_symbols,run_type,ST_db,results,gene_list,db_report,cluster_symbols,max_mismatch)
File "/share/apps/srst2/bin/srst2.py", line 1275, in map_fileSet_to_db
unique_gene_symbols, unique_allele_symbols, pileup_file)
File "/share/apps/srst2/bin/srst2.py", line 859, in parse_scores
allele_pileup_file = create_allele_pileup(top_allele, pileup_file) # XXX Creates a new pileup file for that allele. Not currently cleaned up
File "/share/apps/srst2/bin/srst2.py", line 765, in create_allele_pileup
with open(outpileup, 'w') as allele_pileup:
IOError: [Errno 2] No such file or directory: '77__OqxA_Flq__OqxA__1044.data_out/KPN166ec__KPN166ec.ARGannot.pileup'

So it seems like I'm having a problem creating a file / directory but I'm not certain why that is... the script is able to generate all the other output files it needs to... this is only an error when I try to run with --report_new_consensus. Any help is appreciated. This is happening for every paired end readset I attempt to generate the --report_new_consensus reports for.

Full program output is appended below, in case it is helpful.

06/17/2015 12:39:54 program started
06/17/2015 12:39:54 command line: /share/apps/srst2/bin/srst2.py --input_pe data_in/KPN166ec_1.fastq.gz data_in/KPN166ec_2.fastq.gz --report_new_consensus --gene_db /home/mresswl/archive/ARGFinder_DB/ARGannot.fasta --output data_out/KPN166ec
06/17/2015 12:39:54 Total paired readsets found:1
06/17/2015 12:39:54 Index for /home/mresswl/archive/ARGFinder_DB/ARGannot.fasta is already built...
06/17/2015 12:39:54 Processing database /home/mresswl/archive/ARGFinder_DB/ARGannot.fasta
06/17/2015 12:39:54 Non-unique:55, CmlA_PheCmlA5
06/17/2015 12:39:54 Non-unique:114, CfrA_Phe
06/17/2015 12:39:54 Processing sample KPN166ec
06/17/2015 12:39:54 Output prefix set to: data_out/KPN166ec__KPN166ec.ARGannot
06/17/2015 12:39:54 Aligning reads to index /home/mresswl/archive/ARGFinder_DB/ARGannot.fasta using bowtie2...
06/17/2015 12:39:54 Running: bowtie2 -1 data_in/KPN166ec_1.fastq.gz -2 data_in/KPN166ec_2.fastq.gz -S data_out/KPN166ec__KPN166ec.ARGannot.sam -q --very-sensitive-local --no-unal -a -x /home/mresswl/archive/ARGFinder_DB/ARGannot.fasta
1764353 reads; of these:
1764353 (100.00%) were paired; of these:
1760384 (99.78%) aligned concordantly 0 times
2397 (0.14%) aligned concordantly exactly 1 time
1572 (0.09%) aligned concordantly >1 times
----
1760384 pairs aligned concordantly 0 times; of these:
156 (0.01%) aligned discordantly 1 time
----
1760228 pairs aligned 0 times concordantly or discordantly; of these:
3520456 mates make up the pairs; of these:
3517876 (99.93%) aligned 0 times
1544 (0.04%) aligned exactly 1 time
1036 (0.03%) aligned >1 times
0.31% overall alignment rate
06/17/2015 12:43:47 Processing Bowtie2 output with SAMtools...
06/17/2015 12:43:47 Generate and sort BAM file...
06/17/2015 12:43:47 Running: samtools view -b -o data_out/KPN166ec__KPN166ec.ARGannot.unsorted.bam -q 1 -S data_out/KPN166ec__KPN166ec.ARGannot.sam.mod
[samopen] SAM header is present: 1683 sequences.
06/17/2015 12:43:50 Running: samtools sort data_out/KPN166ec__KPN166ec.ARGannot.unsorted.bam data_out/KPN166ec__KPN166ec.ARGannot.sorted
06/17/2015 12:44:06 Deleting sam and bam files that are not longer needed...
06/17/2015 12:44:06 Deleting data_out/KPN166ec__KPN166ec.ARGannot.sam
06/17/2015 12:44:06 Deleting data_out/KPN166ec__KPN166ec.ARGannot.sam.mod
06/17/2015 12:44:06 Deleting data_out/KPN166ec__KPN166ec.ARGannot.unsorted.bam
06/17/2015 12:44:06 Generate pileup...
06/17/2015 12:44:06 Running: samtools mpileup -L 1000 -f /home/mresswl/archive/ARGFinder_DB/ARGannot.fasta -Q 20 -q 1 -B data_out/KPN166ec__KPN166ec.ARGannot.sorted.bam
[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000
06/17/2015 12:44:11 Processing SAMtools pileup...
06/17/2015 12:44:34 Scoring alleles...
Traceback (most recent call last):
File "/share/apps/srst2/bin/srst2.py", line 1592, in
main()
File "/share/apps/srst2/bin/srst2.py", line 1548, in main
db_reports, db_results = run_srst2(args,fileSets,args.gene_db,"genes")
File "/share/apps/srst2/bin/srst2.py", line 1102, in run_srst2
db_reports, db_results_list = process_fasta_db(args, fileSets, run_type, db_reports, db_results_list, fasta)
File "/share/apps/srst2/bin/srst2.py", line 1164, in process_fasta_db
unique_gene_symbols, unique_allele_symbols,run_type,ST_db,results,gene_list,db_report,cluster_symbols,max_mismatch)
File "/share/apps/srst2/bin/srst2.py", line 1275, in map_fileSet_to_db
unique_gene_symbols, unique_allele_symbols, pileup_file)
File "/share/apps/srst2/bin/srst2.py", line 859, in parse_scores
allele_pileup_file = create_allele_pileup(top_allele, pileup_file) # XXX Creates a new pileup file for that allele. Not currently cleaned up
File "/share/apps/srst2/bin/srst2.py", line 765, in create_allele_pileup
with open(outpileup, 'w') as allele_pileup:
IOError: [Errno 2] No such file or directory: '77__OqxA_Flq__OqxA__1044.data_out/KPN166ec__KPN166ec.ARGannot.pileup'

katholt · 2015-06-18T00:00:11Z

Hi Scott, I'm not sure what's going wrong here.
From the log it seems the problem is that the pileup is not being generated when new alleles are encountered.
I just tried running (v0.1.5) with some local read sets, using --gene_db ARGannot.fasta and --report_new_consensus and had no errors and got expected behaviour: pileups were generated for each case where the best scoring allele was an imprecise match to the database, and consensus was extracted from this and reported in *.new_consensus_alleles.fasta output files.
What version of the code are you using?
I'll try a bit more trouble shooting too...

swlong · 2015-06-18T03:21:35Z

It may be a problem with the relative path and the way the cluster that I'm running it on handles it with respect to my job submission script... I'm going to take a look in the morning and see if that's the issue and if I can fix it. Thanks for taking a look for me, too.

niyunyun · 2015-06-18T18:59:55Z

It is a problem in the python code where it tries to prefix the supplied output on line 765. It works if the supplied output is a filename, but fails if it has directory path. I have edited the code and created a pull request to fix this issue. Thanks!

swlong · 2015-06-18T19:25:05Z

Thanks to Yunyun for sorting this out!

katholt · 2015-11-19T07:53:41Z

Thanks this is now tested and is being merged into the new release

katholt closed this as completed Nov 19, 2015

katholt mentioned this issue Nov 19, 2015

Couple of fixes for #48

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Crash when trying to create novel allele consensus fasta from ARG database #40

Crash when trying to create novel allele consensus fasta from ARG database #40

swlong commented Jun 17, 2015

katholt commented Jun 18, 2015

swlong commented Jun 18, 2015

niyunyun commented Jun 18, 2015

swlong commented Jun 18, 2015

katholt commented Nov 19, 2015

Crash when trying to create novel allele consensus fasta from ARG database #40

Crash when trying to create novel allele consensus fasta from ARG database #40

Comments

swlong commented Jun 17, 2015

katholt commented Jun 18, 2015

swlong commented Jun 18, 2015

niyunyun commented Jun 18, 2015

swlong commented Jun 18, 2015

katholt commented Nov 19, 2015