Sequence names starting with "Sp_" #65
Comments
Thank you, Stephen. There is a bug with the command you used: the column name of the accession column was treated as one of the ranks, which messed up all the ranks. I've fixed it but haven't released it yet. Please use the binary here:
But that does not seem to be the issue you ran into. Can you paste some data to reproduce it?
I figured out what happened. Please wait a few minutes.
Fixed. The old default regular expression
Also fixed the command to create a taxdump from MGV data.
Fixed! ... An unrelated question I was hoping you could answer: how should I format the input file for sequences that are unclassified at a given rank? Can I use "unclassified" or an empty string "", or do I need to include the parent taxon, e.g. "unclassified_proteobacteria"?
Just leave it blank (empty string ""); the accession will point to the closest node above the node in
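For anyone preparing a similar input file, here is a minimal sketch of the "leave it blank" advice. The column names (`id`, `phylum`, `class`, `genus`) and the accession `genome_1` are made up for illustration and are not taken from the thread. The only point is that an unclassified rank becomes an empty field between tabs rather than a placeholder string.

```python
# Hypothetical rank columns; use whatever ranks the real table has.
RANKS = ["phylum", "class", "genus"]

def format_row(accession, lineage):
    """Build one TSV line, writing "" for every rank missing from lineage."""
    fields = [accession] + [lineage.get(rank) or "" for rank in RANKS]
    return "\t".join(fields)

# A genome classified only down to phylum: class and genus stay empty.
header = "\t".join(["id"] + RANKS)
row = format_row("genome_1", {"phylum": "Proteobacteria"})
table = header + "\n" + row + "\n"
print(repr(row))
```

The row keeps all of its tab separators, so the number of columns still matches the header even when trailing ranks are empty.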
I built a taxdump using a custom taxonomy with the command:
taxonkit create-taxdump genome_taxonomy.tsv -A 1 -O out --force
A few of the accessions in genome_taxonomy.tsv start with "Sp_", and I noticed this prefix was removed in the taxid.map output file, causing some issues. I'll find a workaround, but thought you might want to know.
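A quick way to spot affected accessions is to compare the first column of the input table with the first column of taxid.map. The sketch below assumes both files are tab-separated, that the input has a header row, and that taxid.map lists the accession in its first column. The sample data is invented to mimic the symptom and is not from the real files.

```python
def read_first_column(text, has_header=False):
    """Return the first tab-separated field of every non-empty line."""
    values = [line.split("\t")[0] for line in text.splitlines() if line.strip()]
    return values[1:] if has_header else values

def find_altered_accessions(taxonomy_text, taxid_map_text):
    """List input accessions that do not appear verbatim in taxid.map."""
    expected = read_first_column(taxonomy_text, has_header=True)
    written = set(read_first_column(taxid_map_text))
    return [acc for acc in expected if acc not in written]

# Invented sample: "Sp_001" lost its prefix in the map, "GCA_002" did not.
taxonomy_tsv = "id\tphylum\tgenus\nSp_001\tP1\tG1\nGCA_002\tP1\tG2\n"
taxid_map = "001\t111\nGCA_002\t222\n"

altered = find_altered_accessions(taxonomy_tsv, taxid_map)
print(altered)  # ['Sp_001']
```

To run it on real output, pass the contents of genome_taxonomy.tsv and of the taxid.map file in the output directory (for example via `pathlib.Path(...).read_text()`) instead of the sample strings.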