UniVec_Core build failed - skipping entries without valid taxonomic nodes #277
Comments
I've also been having some issues building the RefSeq mitochondrion, plasmid, and plastid databases.
mitochondrion:
1 valid file(s) [--input-extension fna.gz, --input-recursive] found in /ourdisk/hpc/hofmanlmamr/karsav1511/auto_archive_notyet/tape_2copies/RefSeq/ganonRefSeqLim3/refs/mitochondrion
Downloading and parsing ncbi taxonomy
Parsing sequences from --input (1 files)
Retrieving sequence information from NCBI e-utils
Validating taxonomy
Downloading and parsing auxiliary files for genome size estimation
Estimating genome sizes
Building index (raptor)
raptor layout
raptor build
Error code: -6
Plasmid and plastid:
10 valid file(s) [--input-extension fna.gz, --input-recursive] found in /ourdisk/hpc/hofmanlmamr/karsav1511/auto_archive_notyet/tape_2copies/RefSeq/ganonRefSeqLim3/refs/plasmid
Downloading and parsing ncbi taxonomy
Parsing --input (10 files)
Downloading assembly_summary files
Parsing assembly_summary files
Validating taxonomy
ERROR: Unable to match taxonomy to targets
I made an input TSV file for the plasmid and plastid files to fix the taxonomy problem, but now the build fails with the same error as the mitochondrion build (I now use --restart in all the build commands).
Downloading and parsing ncbi taxonomy
Parsing --input-file /ourdisk/hpc/hofmanlmamr/karsav1511/auto_archive_notyet/tape_2copies/RefSeq/ganonRefSeqLim3/refs/plastid/plastid_ganon_input_file.tsv
Validating taxonomy
Downloading and parsing auxiliary files for genome size estimation
Estimating genome sizes
Building index (raptor)
Error code: -6
Regarding the UniVec database, the examples in the documentation are indeed outdated. Use the following commands to build it:
echo -e "UniVec_Core.fasta\tUniVec_Core\t81077" > UniVec_Core_ganon_input_file.tsv
ganon build-custom --input-file UniVec_Core_ganon_input_file.tsv --db-prefix UniVec_Core --level leaves
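Before building, it may help to sanity-check the input file. The following is a minimal sketch, assuming the three-column, tab-separated layout used in the command above (file, target, taxonomic node) and numeric NCBI taxonomy IDs. The function name `check_ganon_tsv` is hypothetical and not part of ganon, and passing this check does not guarantee that ganon can match the taxonomy.

```shell
# Hypothetical helper (not part of ganon): reports lines of a tab-separated
# input file that do not have exactly three non-empty fields, or whose third
# field is not numeric (assumption: NCBI taxonomy IDs are used as nodes).
check_ganon_tsv() {
    awk -F'\t' '
        NF != 3 || $1 == "" || $2 == "" || $3 !~ /^[0-9]+$/ {
            printf "line %d: malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }   # exit status 1 if any malformed line was seen
    ' "$1"
}
```

For example, `check_ganon_tsv UniVec_Core_ganon_input_file.tsv && echo ok` would print `ok` only when every line is well formed. One thing this catches is an `echo -e` that did not expand `\t` into real tab characters.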
The same goes for the plasmid, plastid and mitochondrion. Use the following after downloading the files:
mkdir sequences
zcat plasmid.* plastid.* mitochondrion.* | awk '$0 ~ ">" {accver=(substr($1,2)); print accver}{print $0 > "sequences/"accver".fna"}' | ganon-get-seq-info.sh -e -i - | awk '{print "sequences/"$1".fna\t"$1"\t"$3}' > ppm.tsv
ganon build-custom --input-file ppm.tsv --db-prefix ppm --level species --threads 20
rm -rf sequences
You could also build them separately, just change the
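The first awk step in the pipeline above writes each sequence to its own file and prints the accession for the next stage. A standalone sketch of just that step, so it can be tried on a small FASTA without network access, could look like the following. The function name `split_fasta` and its argument are hypothetical and not part of ganon. Unlike the one-liner, this version matches `>` only at the start of a line, parenthesizes nothing ambiguous after the redirection, and closes each output file before opening the next, which some awk implementations need when there are many sequences.

```shell
# Sketch: split a multi-FASTA stream (read from stdin) into one file per
# sequence, named after the first header token (accession.version).
# split_fasta is a hypothetical helper, not part of ganon.
split_fasta() {
    outdir=$1
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            if (out != "") close(out)   # avoid running out of open files
            accver = substr($1, 2)      # header token without the leading ">"
            out = dir "/" accver ".fna"
            print accver                # emit the accession on stdout
        }
        out != "" { print $0 > out }    # write header and sequence lines to the current file
    '
}
```

Usage would be along the lines of `zcat plasmid.* | split_fasta sequences`, with the printed accessions piped on to the next stage as in the original command.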
Fixed in v2.1.0 #285, this works now:
Documentation updated in v2.0.1 #281
Hi, I'm trying to build a UniVec_Core database. I keep running into the "Unable to match taxonomy to targets" error. I also tried using taxID 28384 for all the UniVec_Core sequences, just to see if that would work (same issue).
I downloaded the fasta file and then made an input file before running the below build command:
grep -o '^>[^ ]*' Univec_Core.fasta | sed 's/^>//' | awk '{print "Univec_Core.fasta\t"$1"\t81077"}' > Univec_Core_ganon_input_file.tsv
ganon build-custom -t 20 -n $db/refs/"${cat}"/"${cat}"_ganon_input_file.tsv -d $db/"${cat}"_k19 --level species -k 19 -w 31 -s 4 -v hibf -p 0.001
Error file:
Downloading and parsing ncbi taxonomy
Parsing --input-file /ourdisk/hpc/hofmanlmamr/karsav1511/auto_archive_notyet/tape_2copies/RefSeq/ganonRefSeqLim3/refs/Univec_Core/Univec_Core_ganon_input_file.tsv
Validating taxonomy
ERROR: Unable to match taxonomy to targets
Total elapsed time: 19.32 seconds.