build-ganon unable to read taxonomy file(s) #282
Comments
There are two problems. First, the file order in

I will try to better document the taxonomy file ordering and skip empty lines when parsing the taxonomy in the next release.
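Until empty lines are skipped by the parser, one possible workaround is to strip them from the taxonomy files before building. This is only a minimal sketch: the function name `strip_empty_lines` and the file names in the usage note are hypothetical, and it treats lines containing only whitespace as empty.

```python
def strip_empty_lines(src: str, dst: str) -> int:
    """Copy `src` to `dst` without blank lines; return how many were dropped."""
    dropped = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if line.strip():
                # Keep every non-blank line exactly as it was written.
                fout.write(line)
            else:
                # Whitespace-only or empty line: skip it.
                dropped += 1
    return dropped
```

For example, `strip_empty_lines("prot_nodes.dmp", "prot_nodes.clean.dmp")` would write a cleaned copy that can be passed to the build instead of the original.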
I also noticed that this small example will not build properly. There is no assembly accession information in the file names, which is expected by default. The easiest approach in this case is to generate a file linking each file to its taxonomic target and use it with
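A file linking each input file to a taxonomic node can be generated with a few lines of Python. This is a sketch under assumptions, not the maintainer's method: it assumes a tab-separated, three-column layout (file, target, node) as described on the custom databases documentation page, with the file name itself used as the target. The helper name `write_input_table`, the output name, and the taxids shown (NCBI's 2697049 for SARS-CoV-2 and 727 for Haemophilus influenzae) are illustrative and must match nodes present in the taxonomy files actually supplied.

```python
import csv

def write_input_table(rows, out_path):
    """Write (file, target, node) tuples as a tab-separated file without a header."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for file_path, target, node in rows:
            writer.writerow([file_path, target, node])

# Illustrative rows: the file name doubles as the target (an assumption),
# and the third column is the taxonomic node the file is assigned to.
rows = [
    ("sarscov2.fasta", "sarscov2.fasta", "2697049"),
    ("haemophilus_influenzae.fna.gz", "haemophilus_influenzae.fna.gz", "727"),
]

if __name__ == "__main__":
    write_input_table(rows, "ganon_input.tsv")  # hypothetical output name
```

Writing the table with the `csv` module rather than string joins keeps the delimiter consistent, which matters because a miscounted column is exactly what the column-count errors later in the thread complain about.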
A longer explanation, since this already came up before in #277:

Before version 2.0.0, you could use one or more files in any format and with

That's why the creating the

I'm working on a solution for this in ganon2 to bring back the old functionality.
Thanks very much @pirovc for the detailed explanations! I will attempt to update the nf-core module and dataset accordingly (where the original failures came from). I'll let you close this issue when you're ready (e.g. if you want to keep it open until you've updated the documentation).
@pirovc
I also realise I'm not really following what the

From here: https://pirovc.github.io/ganon/custom_databases/#non-standardcustom-accessions I understand I can have 3 or 5 columns:
I don't think I'm following the terminology of 'target' and 'specialization' based on the information on that page...I al

Following your example above of
I get the following error:
Which is confusing: it needs 5, but I've given 3, but there are too many columns? I also get the same error when I specify a similar file, but with the sequence accession ID here:
If I extend to what I presume is a 5 column file:
Then I get
The test files from the issues above
I see that it's a bit confusing, the
Specialization (cols. 4 and 5) are only needed if you want to create a specialized taxonomic level with a custom name, with the option to group files under this node. Example:
When
As for the number of columns: they are all optional, with the following behavior:
Most issues are fixed in v2.1.0. The empty-line bug will be fixed later in multitax.
Great, thank you very much @pirovc!
I saw the bioconda recipe hasn't been updated yet, so I didn't continue today. I'm travelling for a month from next week, so I will have to provide feedback (if any is needed!) then :)
From nf-core/createtaxdb
(ignore the 'prot' in the nodes/names.dmp files, it is standard taxonomy :) ).
The relevant files:
sarscov2 = https://raw.githubusercontent.com/nf-core/test-datasets/createtaxdb/data/fasta/sarscov2.fasta
influenzae = https://raw.githubusercontent.com/nf-core/test-datasets/createtaxdb/data/fasta/haemophilus_influenzae.fna.gz
nodesdmp = 'https://raw.githubusercontent.com/nf-core/test-datasets/createtaxdb/data/taxonomy/prot_nodes.dmp'
namesdmp = 'https://raw.githubusercontent.com/nf-core/test-datasets/createtaxdb/data/taxonomy/prot_names.dmp'