fixes ganon v1.6.0 (#251)
* docs, fix test sets

* genome_updater v0.6.2, small fixes
pirovc committed May 10, 2023
1 parent 4472217 commit 941874c
Showing 10 changed files with 12 additions and 12 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -31,7 +31,7 @@ before_install:
- eval "${MATRIX_EVAL}"
- python3 -m pip install --upgrade pip
- python3 -m pip install "pandas>=1.1.0"
- - python3 -m pip install "multitax>=1.2.1"
+ - python3 -m pip install "multitax>=1.3.1"
- if [ "$BUILD_TYPE" == "Coverage" ]; then
python3 -m pip install coverage;
fi
4 changes: 2 additions & 2 deletions docs/default_databases.md
@@ -41,8 +41,8 @@ NCBI RefSeq and GenBank repositories are common resources to obtain reference se
|---|---|---|---|
| Complete | 1595845 | | <details><summary>cmd</summary>`ganon build --source genbank --organism-group archaea bacteria fungi viral --threads 48 --db-prefix abfv_gb`</details> |
| One assembly per species | 99505 | 91 - 420 | <details><summary>cmd</summary>`ganon build --source genbank --organism-group archaea bacteria fungi viral --threads 48 --genome-updater "-A 'species:1'" --db-prefix abfv_gb_t1s`</details> |
- | Complete genomes (higher quality) | 92917 | 24 - | <details><summary>cmd</summary>`ganon build --source genbank --organism-group archaea bacteria fungi viral --threads 48 --complete-genomes --db-prefix abfv_gb_cg`</details> |
- | One assembly per species of complete genomes | 34497 | 10 - | <details><summary>cmd</summary>`ganon build --source genbank --organism-group archaea bacteria fungi viral --threads 48 --complete-genomes "-A 'species:1'" --db-prefix abfv_gb_cg_t1s`</details> |
+ | Complete genomes (higher quality) | 92917 | 24 - 132 | <details><summary>cmd</summary>`ganon build --source genbank --organism-group archaea bacteria fungi viral --threads 48 --complete-genomes --db-prefix abfv_gb_cg`</details> |
+ | One assembly per species of complete genomes | 34497 | 10 - 34 | <details><summary>cmd</summary>`ganon build --source genbank --organism-group archaea bacteria fungi viral --threads 48 --complete-genomes "-A 'species:1'" --db-prefix abfv_gb_cg_t1s`</details> |

\* Size (GB) is the final size of the database and the approximate amount of RAM necessary to build it (calculated with default parameters). The two values represent databases built with and without the `--hibf` parameter, respectively. The trade-offs between those two modes are explained [here](#hibf).

2 changes: 1 addition & 1 deletion libs/genome_updater
2 changes: 1 addition & 1 deletion setup.py
@@ -17,7 +17,7 @@ def read(filename):
url="https://www.github.com/pirovc/ganon",
license='MIT',
author="Vitor C. Piro",
- description="ganon classifies short DNA sequences against large sets of genomic reference sequences efficiently",
+ description="ganon classifies DNA sequences against large sets of genomic reference sequences efficiently",
long_description=read("README.md"),
package_dir={'': 'src'},
packages=["ganon"],
2 changes: 1 addition & 1 deletion src/ganon/config.py
@@ -397,7 +397,7 @@ def validate(self):
elif check_file(db_prefix + ".ibf"):
ibf = True
else:
- print_log("File not found: " + prefix + ".ibf/.hibf" )
+ print_log("File not found: " + db_prefix + ".ibf/.hibf" )
return False

if check_file(db_prefix + ".tax"):
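
The `config.py` hunk above swaps `prefix` for `db_prefix` in the error branch. A minimal sketch of that lookup follows, assuming the branch before `.ibf` checks for `.hibf` (the hunk does not show it) and that `prefix` was not defined in that scope. `check_file`, `print_log` and `find_filter` here are hypothetical stand-ins, not ganon's actual helpers.

```python
import os


def check_file(path):
    # Hypothetical stand-in: True if the file exists and is non-empty.
    return os.path.isfile(path) and os.path.getsize(path) > 0


def print_log(message):
    # Hypothetical stand-in for the project's logging helper.
    print(message)


def find_filter(db_prefix):
    """Return "hibf" or "ibf" for the filter file found, or None if neither exists."""
    if check_file(db_prefix + ".hibf"):
        return "hibf"
    elif check_file(db_prefix + ".ibf"):
        return "ibf"
    # With an undefined name such as `prefix` on this line, a missing database
    # would raise NameError instead of logging the message below.
    print_log("File not found: " + db_prefix + ".ibf/.hibf")
    return None
```

Calling `find_filter("abfv_gb")` with no matching files logs the message and returns `None`, which is the path the fix makes reachable.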
2 changes: 2 additions & 0 deletions tests/ganon/data/build/releases/latest/MD5SUM.txt
@@ -0,0 +1,2 @@
+ 07b534765d6b7d3e4d8bf67f549a5d66 build/releases/latest/ar53_taxonomy.tsv.gz
+ 70a673d332f60af1cf68e34d09a56816 build/releases/latest/bac120_taxonomy.tsv.gz
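
A file in this `md5sum` layout (hash, whitespace, relative path) can be checked with a few lines of Python. This is only a sketch of such a check, not code from the repository; `md5_of` and `verify_md5sums` are hypothetical names, and paths are assumed to be relative to a base directory supplied by the caller.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    # Hash the file in chunks so large archives are not loaded at once.
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sums(md5_file, base_dir="."):
    """Return the listed paths whose file is missing or whose MD5 differs."""
    failed = []
    for line in Path(md5_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.lstrip("*")  # md5sum marks binary mode with '*'
        target = Path(base_dir) / rel_path
        if not target.is_file() or md5_of(target) != expected.lower():
            failed.append(rel_path)
    return failed
```

An empty list from `verify_md5sums("MD5SUM.txt", base_dir)` means every listed file is present and matches its recorded hash.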
2 changes: 0 additions & 2 deletions tests/ganon/data/build/releases/release207/207.0/MD5SUM

This file was deleted.

8 changes: 4 additions & 4 deletions tests/ganon/data/download_test_set_build.sh
@@ -61,14 +61,14 @@ md5sum "${outfld}pub/taxonomy/new_taxdump/new_taxdump.tar.gz" > "${outfld}pub/ta
rm "${outfld}new_taxdump.tar.gz" "${outfld}taxidlineage.dmp" "${outfld}rankedlineage.dmp" "${outfld}pub/taxonomy/new_taxdump/taxidlineage.dmp" "${outfld}pub/taxonomy/new_taxdump/rankedlineage.dmp"

#gtdb
- gtdb_out="${outfld}releases/release207/207.0/"
+ gtdb_out="${outfld}releases/latest/"
mkdir -p "${gtdb_out}"
- gtdb_tax=( "ar53_taxonomy_r207.tsv.gz" "bac120_taxonomy_r207.tsv.gz" )
+ gtdb_tax=( "ar53_taxonomy.tsv.gz" "bac120_taxonomy.tsv.gz" )
for tax in "${gtdb_tax[@]}"; do
- wget --quiet --show-progress --output-document "${outfld}${tax}" "https://data.gtdb.ecogenomic.org/releases/release207/207.0/${tax}"
+ wget --quiet --show-progress --output-document "${outfld}${tax}" "https://data.gtdb.ecogenomic.org/releases/latest/${tax}"
join -1 1 -2 1 <(cut -f 1 "${outfld}accessions_taxids.txt" | sort) <(zcat "${outfld}${tax}" | awk 'BEGIN{FS=OFS="\t"}{print $1,$1,$2}' | sed -r 's/^.{3}//' | sort) -t$'\t' -o "2.2,2.3" | gzip > "${gtdb_out}${tax}"
rm "${outfld}${tax}"
done

- md5sum ${gtdb_out}*.tsv.gz > "${gtdb_out}MD5SUM"
+ md5sum ${gtdb_out}*.tsv.gz > "${gtdb_out}MD5SUM.txt"
rm ${outfld}accessions_taxids.txt
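
The `join`/`awk`/`sed` pipeline in the script keeps only the GTDB taxonomy rows whose accession, with its first three characters (a prefix such as `RS_` or `GB_`) removed, appears in `accessions_taxids.txt`, and writes the original prefixed accession plus the lineage. A rough Python equivalent is sketched below. `filter_gtdb_taxonomy` is a hypothetical name, and unlike `join` on sorted input this version preserves the input row order.

```python
import gzip


def filter_gtdb_taxonomy(taxonomy_gz, accessions, output_gz):
    """Write only the taxonomy rows whose accession is in `accessions`.

    The first column keeps its original 3-character prefix in the output.
    Returns the number of rows written.
    """
    wanted = set(accessions)
    kept = 0
    with gzip.open(taxonomy_gz, "rt") as src, gzip.open(output_gz, "wt") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue
            # Same effect as sed 's/^.{3}//' applied to the join key.
            if fields[0][3:] in wanted:
                dst.write(fields[0] + "\t" + fields[1] + "\n")
                kept += 1
    return kept
```

The accession list would come from the first column of `accessions_taxids.txt`, which the shell version extracts with `cut -f 1`.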
