Using new ARG-ANNOT v3 database #94

Closed
hjb60 opened this issue Sep 28, 2017 · 5 comments
Comments

@hjb60

hjb60 commented Sep 28, 2017

I would like to use the newest version of the ARG-ANNOT database, but when I run the script I do not get the full genes output file. The error file contains the following:

09/20/2017 20:47:42 Processing SAMtools pileup...
Traceback (most recent call last):
  File "/software/pathogen/external/apps/usr/local/Python-2.7.13/bin/srst2", line 11, in <module>
    load_entry_point('srst2==0.2.0', 'console_scripts', 'srst2')()
  File "/software/pathogen/external/apps/usr/local/Python-2.7.13/lib/python2.7/site-packages/srst2/srst2.py", line 1729, in main
    db_reports, db_results = run_srst2(args,fileSets,args.gene_db,"genes")
  File "/software/pathogen/external/apps/usr/local/Python-2.7.13/lib/python2.7/site-packages/srst2/srst2.py", line 1264, in run_srst2
    db_results_list, fasta)
  File "/software/pathogen/external/apps/usr/local/Python-2.7.13/lib/python2.7/site-packages/srst2/srst2.py", line 1327, in process_fasta_db
    results,gene_list, db_report, cluster_symbols, max_mismatch)
  File "/software/pathogen/external/apps/usr/local/Python-2.7.13/lib/python2.7/site-packages/srst2/srst2.py", line 1422, in map_fileSet_to_db
    read_pileup_data(pileup_file, size, args.prob_err)
  File "/software/pathogen/external/apps/usr/local/Python-2.7.13/lib/python2.7/site-packages/srst2/srst2.py", line 337, in read_pileup_data
    allele_size = size[allele]
KeyError: 'AGly-Aac3-IIa:X51534:91-951:861'

Is there some additional formatting I should be doing before trying to use it? Sorry if this is obvious - this area is not my forte!

@katholt
Owner

katholt commented Sep 29, 2017

What command are you attempting to run?

@hjb60
Author

hjb60 commented Oct 2, 2017 via email

@wanyuac
Contributor

wanyuac commented Oct 2, 2017

@hjb60 May I ask where you got resistance.fasta from? The sequence header "AGly-Aac3-IIa:X51534:91-951:861" does not look like the one in the ARG-ANNOT v3 database, "(AGly)Aac3-IIa:X51534:91-951:861", or the one in our curated version of the same database, "203__Aac3-IIa_AGly__Aac3-IIa__882 no;no;Aac3-IIa;AGly;X51534;91-951;861".

SRST2 extracts information from sequence headers, which must follow a specific format like that of the curated header above (please also refer to Generating SRST2-compatible clustered database from raw sequences for more details). A KeyError like the one you posted arises when this requirement is not fulfilled.

You may want to compare your resistance database with the formal release of the ARG-ANNOT v3 database, or try our curated version, which has already been tested on SRST2.
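
The header requirement described above can be checked before running SRST2. The following is a minimal sketch, not SRST2's own parser: it assumes the required sequence ID consists of four non-empty fields separated by double underscores, as in the curated header quoted above, optionally followed by a space and an annotation. The function names `is_srst2_style` and `nonconforming_headers` are hypothetical helpers introduced only for illustration.

```python
def is_srst2_style(header):
    """Return True if the sequence ID has four non-empty '__'-separated fields.

    Assumption: the expected layout matches the curated header quoted in this
    thread (e.g. 203__Aac3-IIa_AGly__Aac3-IIa__882 followed by an annotation).
    """
    parts = header.lstrip(">").split(None, 1)  # ID is the text before the first whitespace
    if not parts:
        return False
    fields = parts[0].split("__")
    return len(fields) == 4 and all(fields)


def nonconforming_headers(fasta_lines):
    """Collect FASTA header lines whose ID does not match the assumed layout."""
    return [
        line.strip()
        for line in fasta_lines
        if line.startswith(">") and not is_srst2_style(line)
    ]


if __name__ == "__main__":
    # Headers taken from this thread; the sequence lines are placeholders.
    example = [
        ">203__Aac3-IIa_AGly__Aac3-IIa__882 no;no;Aac3-IIa;AGly;X51534;91-951;861\n",
        "ACGT\n",
        ">(AGly)Aac3-IIa:X51534:91-951:861\n",
        "ACGT\n",
    ]
    for bad in nonconforming_headers(example):
        print("Header not in the assumed SRST2 layout:", bad)
```

To check a real database, pass an open file handle instead of the example list, for instance `nonconforming_headers(open("resistance.fasta"))`. An empty result only means the headers match the assumed layout, not that SRST2 will necessarily accept the file.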

@hjb60
Author

hjb60 commented Oct 2, 2017 via email

@katholt
Owner

katholt commented Oct 3, 2017

Unless you have a specific reason to do otherwise, I would suggest using our pre-formatted version of this resistance database (ARGannot_r2.fasta), which is in the /data directory.
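
As a rough illustration of the suggestion above, the sketch below assembles an SRST2 command line that points `--gene_db` at the pre-formatted database. It is only a sketch under assumptions: the read file names and output prefix are placeholders, the relative path `data/ARGannot_r2.fasta` assumes the command is run from a checkout of the SRST2 repository, and `build_srst2_command` is a hypothetical helper, not part of SRST2.

```python
import subprocess


def build_srst2_command(reads_1, reads_2, output_prefix,
                        gene_db="data/ARGannot_r2.fasta"):
    """Build an argument list for a paired-end SRST2 gene detection run.

    Assumption: gene_db defaults to the pre-formatted database inside the
    repository's data directory; adjust the path for your installation.
    """
    return [
        "srst2",
        "--input_pe", reads_1, reads_2,  # paired-end read files
        "--output", output_prefix,       # prefix for the output files
        "--log",                         # write a log file
        "--gene_db", gene_db,            # resistance gene database
    ]


if __name__ == "__main__":
    # Placeholder file names; replace them with your own reads.
    cmd = build_srst2_command("strainA_1.fastq.gz", "strainA_2.fastq.gz", "strainA_test")
    print(" ".join(cmd))
    # Uncomment to actually run SRST2 (requires srst2 on the PATH):
    # subprocess.run(cmd, check=True)
```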

@katholt katholt closed this as completed Oct 3, 2017