Unranked targets when building custom DB #35
Comments
Yes, exactly.
Unfortunately, there is no direct way. I think the best way would be to generate a list of all targets using
and check if the columns of the lineages are 0 (which means no valid TaxonId).
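To illustrate the check described above, here is a minimal sketch. It assumes the target list has been saved as a tab-separated table with a header row, the target name in the first column, and one TaxonId per rank in the remaining columns. That layout, the function name `find_unranked_targets`, and the sample data are assumptions for illustration, not MetaCache's documented output format, so adjust the column handling to whatever the real listing looks like.

```python
import csv
import io


def find_unranked_targets(table_text, name_column=0):
    """Return names of targets whose lineage columns are all 0.

    Assumed layout (hypothetical): tab-separated, one header row,
    target name in `name_column`, every other column a TaxonId.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    next(reader, None)  # skip the header row
    unranked = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        lineage = [cell.strip() for i, cell in enumerate(row) if i != name_column]
        # A target counts as unranked only if every lineage column is 0.
        if lineage and all(cell == "0" for cell in lineage):
            unranked.append(row[name_column])
    return unranked


# Made-up sample data, only to show the expected shape of the input.
sample = (
    "target\tspecies\tgenus\tfamily\n"
    "seqA\t562\t561\t543\n"
    "seqB\t0\t0\t0\n"
    "seqC\t0\t561\t543\n"
)
print(find_unranked_targets(sample))
```

Here only seqB is reported, because seqC still has a valid TaxonId at the genus and family ranks.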
Hi, Thanks for suggesting Is this the expected operation of MetaCache, and should I take care to remove any characters before an initial underscore from accessions >18 characters when providing a mapping file to Thanks,
Hi, accession numbers are a total mess. We use a regex to identify NCBI-style accession or accession.version sequence identifiers. For some reason that I don't remember, we only allow the letter part to be 7 characters long (including the underscore). If you want a super quick fix, go to the file "src/sequence_io.cpp" line 471 and replace the regex André
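To show how a length limit on the letter part can change what gets extracted, here is a small sketch. The pattern below is a hypothetical stand-in, not the regex in src/sequence_io.cpp: it assumes a prefix of uppercase letters and underscores (at most `max_prefix_len` characters), at least five digits, and an optional `.version` suffix. The function names and the example identifiers are made up for illustration.

```python
import re


def make_accession_pattern(max_prefix_len):
    """Build a hypothetical accession pattern (NOT MetaCache's actual regex).

    Letter part: 1..max_prefix_len uppercase letters or underscores,
    then 5 or more digits, then an optional '.version'.
    """
    return re.compile(r"[A-Z_]{1,%d}[0-9]{5,}(?:\.[0-9]+)?" % max_prefix_len)


def extract_accession(header, max_prefix_len=7):
    """Return the first substring of `header` that matches, or None."""
    match = make_accession_pattern(max_prefix_len).search(header)
    return match.group(0) if match else None


# A typical NCBI-style identifier fits within the 7-character letter part.
print(extract_accession("NC_000913.3 Escherichia coli"))

# A made-up custom identifier with a 10-character letter part: the search
# skips leading characters until the remaining prefix fits the limit.
print(extract_accession("MYGENOMEX_00012345"))

# Raising the limit keeps the identifier intact.
print(extract_accession("MYGENOMEX_00012345", max_prefix_len=12))
```

With the 7-character limit the second call returns only a suffix of the identifier, so it no longer equals the accession written in a mapping file. That is the kind of mismatch that can leave a target without a TaxonId.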
Hi, Thanks. I can look into implementing the same regex to ensure consistency. Why is modifying accessions necessary? I'd like to add my own genomes, which don't necessarily have NCBI-style accessions. This gets a bit more complicated if I have to account for changes that MetaCache might make to them. Thanks,
This should be solved by the changes in version v.2.4.2.
Hi,
I'm building a custom DB from a large set of genome files. I'm indicating the TaxonId of each sequence using the NCBI-style accession2taxid tab-separated files. When I build the DB, it appears some sequences are not being given a rank. Specifically, the output of metacache build indicates 262383 targets remain unranked. Does this mean that MetaCache identified and placed sequences in the DB that do not have a taxonomic assignment (i.e. associated TaxonId)? Is there any way to determine which sequences remain unranked?
Thanks,
Donovan