Parsing of external gene calls provided by `--external-gene-calls` #374
This is for version 2.0.0rc3 (83dac84).
When I try to run
The gene calls seem to be ok, similar to the example given in the help section, with lengths divisible by 3. The relevant code is in dbops.py:
sequence = contig_sequences[contig_name][gene_call['start']:gene_call['stop']]
I assume this is a problem since gene coordinates are mostly 1-based.
Can you please take a look?
Thank you very much for trying out the rc3! I am hoping to release the new stable version very soon, but clearly there are things to address.
The reason you get an error at line 1259 is probably because of the
I will make sure it is clear in the documentation, but anvi'o follows a string indexing convention identical to the way one does it in Python or C (so you should change your input file, rather than the code, to make it work with what you have right now).
I.e., for a gene call that is like this
                 1         2         3
nt pos: 12345678901234567890123456789012
seq:    NNNATGNNNNNNNNNNNNNNNNNTAGAAAAAA
           |______ gene X _______|
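As a quick sanity check of the convention, here is a minimal sketch that applies the same kind of slice as the `dbops.py` line quoted earlier to the example sequence. The variable names are only illustrative and are not taken from the anvi'o code base.

```python
# Sequence copied from the diagram; gene X covers nt positions 4 to 26
# when counted the 1-based, inclusive way.
seq = "NNNATGNNNNNNNNNNNNNNNNNTAGAAAAAA"
start_1based, stop_1based = 4, 26

# Python-style coordinates: 0-based start, exclusive stop.
start, stop = start_1based - 1, stop_1based

# Same kind of slice as contig_sequences[contig_name][start:stop].
gene = seq[start:stop]

# Passing the 1-based start unchanged drops the first nucleotide.
shifted = seq[start_1based:stop_1based]

print(start, stop, gene)
print(shifted)
```

With these inputs the first slice begins at `ATG` and ends at `TAG`, while the second one begins one nucleotide late, which is the kind of mismatch a 1-based input file would produce.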
I am not sure whether it is common to start from
I asked @tdelmont many times, but the computer scientist inside him made him insist on the 0-indexed slicing of strings.
Jul 1, 2016: added a commit.
Jul 1, 2016: changed the title from "external gene calls appear to be improperly parsed" to "Parsing of external gene calls provided by `--external-gene-calls`".
The default behavior is now clarified in the tutorial: http://merenlab.org/2016/06/22/anvio-tutorial-v2/#external-gene-calls
Thanks for letting me know! I am glad it is sorted out.
That is a fair point. If I hear one more complaint about this, I will change the default behavior and issue a public apology :)
No no no no, no need to change it, or apologize for it :) 99.99% of people won't use an external caller. I only did it because I was being impatient and just ran prodigal on a chunked version of my contigs. I'm very happy that you provide a way of adding these gene calls to the contig db.
I just want to make one comment here! I agree that it is fine to leave it 0-indexed, but I want to ask about the discrepancy between the start and stop positions. Most gene callers would call your example gene from positions 4-26. When I was writing a program to convert RAST output to an anvi'o gene table, I just subtracted 1 from both to get a 0-index for both start and stop positions. But this didn't work because, in reality, anvi'o is looking for a 0-indexed 5' and a 1-indexed 3' (at least relative to what most gene callers would give you), that is, 3-26. Why didn't I say 0-indexed start and 1-indexed stop? Because in the forward direction the start is 0-indexed, and in the reverse direction the stop is 0-indexed. A bit confusing for a lay person, I think. Thoughts?
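One way to sidestep the forward/reverse confusion is to think in terms of the lower and higher coordinate rather than start and stop. The sketch below is a hypothetical helper, not part of anvi'o or RAST. It assumes the source gene caller reports 1-based inclusive positions and lists the higher position first for reverse-strand genes; the `f`/`r` direction labels are also an assumption.

```python
def to_python_slice(pos_a, pos_b):
    """Convert two 1-based inclusive positions into (start, stop, direction).

    The returned start and stop can be used directly as seq[start:stop].
    Only the lower coordinate is shifted by one, whatever the strand.
    """
    low, high = min(pos_a, pos_b), max(pos_a, pos_b)
    direction = "f" if pos_a <= pos_b else "r"
    return low - 1, high, direction


# The example gene from the thread, forward and reverse.
forward = to_python_slice(4, 26)
reverse = to_python_slice(26, 4)

# Subtracting 1 from both positions loses the last nucleotide.
naive = (4 - 1, 26 - 1)

print(forward, reverse, naive)
```

Under those assumptions both orientations give 3 and 26, matching the 3-26 noted above, whereas subtracting 1 from both positions gives 3 and 25 and a gene that is one nucleotide short.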