taXaminer

taXaminer - examine the taxonomic diversity in genome assemblies. Designed to detect and differentiate contamination and horizontal gene transfer.

taXaminer combines a reference-free approach with an alignment-based approach to detect and differentiate contamination and horizontal gene transfer in genome assemblies. It uses a total of 16 intrinsic features to describe the gene set, among them read coverage, sequence composition, gene length, and the size of the scaffold each gene is annotated on (see details here). To identify genes that deviate from the average, a Principal Component Analysis is used to cluster genes with similar features. The taxonomic assignment aims to identify the true taxon of origin for each gene. It is based on the genes' protein sequences, which reduces the need for an exact reference in the database.
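
The idea behind the reference-free part can be illustrated with a small, self-contained sketch. It is not taXaminer's implementation: the three features, the toy values, the power-iteration shortcut for the first principal axis, and the "farthest from the median score" rule are all assumptions chosen to keep the example short and free of dependencies.

```python
import math

def standardize(rows):
    """Scale every feature column to zero mean and unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def first_component(rows, iterations=200):
    """Approximate the first principal axis by power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    vec = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec

# Toy data (made up for illustration): read coverage, GC content, gene length.
genes = {
    "gene_a": (30.0, 0.41, 1200),
    "gene_b": (32.0, 0.40, 1500),
    "gene_c": (29.0, 0.42, 900),
    "gene_d": (31.0, 0.39, 1100),
    "gene_e": (5.0, 0.65, 1000),  # low coverage, unusual composition
}

scaled = standardize(list(genes.values()))
axis = first_component(scaled)

# Project every gene onto the first principal axis.
scores = {name: sum(a * x for a, x in zip(axis, row))
          for name, row in zip(genes, scaled)}

# Assumed outlier rule: the gene whose score lies farthest from the median score.
median = sorted(scores.values())[len(scores) // 2]
outlier = max(scores, key=lambda name: abs(scores[name] - median))
print("Most deviating gene:", outlier)
```

Genes with similar coverage and composition end up close together along the principal axis, while a gene with a different origin tends to sit apart from the bulk.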

The results can be interactively explored in the accompanying dashboard.

Installation

To install taXaminer, use the Python package installer pip. Note: taXaminer is not yet published on PyPI, so you need to clone this repository and give pip the path to the cloned directory.

git clone https://github.com/BIONF/taXaminer.git
pip install ./taXaminer

To install the additional dependencies, use the setup function included in taXaminer. You can install the tools either via conda or locally in a specified directory.

Using conda (installs into the currently active environment):

taxaminer.setup --conda

In a local directory:

taxaminer.setup -o </path/to/tool/directory/>

To download and build the database, use:

taxaminer.setup --db -d </path/to/database/directory/>

To use an existing database instead, run:

taxaminer.setup -d </path/to/existing_database/directory/>
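
For scripted installations, the setup calls above can be assembled programmatically. The following is a minimal sketch that only builds and prints the command lines; the helper name `setup_commands` and its parameters are hypothetical, and it uses only the flags shown above (`--conda`, `-o`, `--db`, `-d`).

```python
import shlex

def setup_commands(tool_dir=None, database_dir=None, build_database=True):
    """Return the taxaminer.setup calls as argument lists (hypothetical helper)."""
    commands = []
    if tool_dir is None:
        # Install the dependencies into the currently active conda environment.
        commands.append(["taxaminer.setup", "--conda"])
    else:
        # Install the dependencies into a local directory.
        commands.append(["taxaminer.setup", "-o", str(tool_dir)])
    if database_dir is not None:
        db_command = ["taxaminer.setup"]
        if build_database:
            db_command.append("--db")  # download and build a new database
        db_command += ["-d", str(database_dir)]
        commands.append(db_command)
    return commands

# Dry run: print the commands instead of executing them.
for command in setup_commands(database_dir="/path/to/database/directory/"):
    print(shlex.join(command))
```

Passing `build_database=False` yields the call that points taXaminer to an existing database.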

Usage

Create a configuration file using the following template and adapt it to fit your data.

fasta_path: "path/to/assembly.fasta" # path to assembly FASTA
gff_path: "path/to/assembly.gff" # path to annotation in GFF3 format
output_path: "path/to/output_directory/" # directory to save results in
taxon_id: "<NCBI taxon ID>" # NCBI Taxon ID of query species

Coverage information is optional. To include it, add the path to a sorted BAM file; otherwise, omit this parameter from the configuration file.

bam_path_1: "path/to/mapping.bam" # path to BAM file

Note: When using multiple coverage sets, duplicate the parameter and increase the number in the suffix (for example, bam_path_2).
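
When many assemblies are processed, the configuration file can be generated rather than edited by hand. This is a minimal sketch, assuming only the keys from the template above; the helper name `build_config` and the BAM file names are hypothetical.

```python
def build_config(fasta, gff, output_dir, taxon_id, bam_files=()):
    """Return the configuration as YAML text using the keys from the template."""
    lines = [
        f'fasta_path: "{fasta}"',
        f'gff_path: "{gff}"',
        f'output_path: "{output_dir}"',
        f'taxon_id: "{taxon_id}"',
    ]
    # One bam_path_<n> entry per coverage set, numbered from 1.
    for number, bam in enumerate(bam_files, start=1):
        lines.append(f'bam_path_{number}: "{bam}"')
    return "\n".join(lines) + "\n"

config_text = build_config(
    "path/to/assembly.fasta",
    "path/to/assembly.gff",
    "path/to/output_directory/",
    "<NCBI taxon ID>",
    bam_files=["path/to/mapping_a.bam", "path/to/mapping_b.bam"],
)
print(config_text)
```

Writing `config_text` to a file such as config.yml gives a configuration with two coverage sets. Leaving `bam_files` empty produces a configuration without coverage information.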

To run taXaminer, call it with the path to the config file, like so:

taxaminer.run <config.yml>
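
The same call can be wrapped in Python, for example to run several configurations in a loop. This is a minimal sketch; the function names and the `dry_run` switch are hypothetical, and the only taXaminer detail it relies on is the `taxaminer.run <config.yml>` call shown above.

```python
import shutil
import subprocess

def build_run_command(config_path):
    """Return the taxaminer.run call for one configuration file."""
    return ["taxaminer.run", str(config_path)]

def run_taxaminer(config_path, dry_run=True):
    """Run taXaminer for one configuration, or only print the command."""
    command = build_run_command(config_path)
    if dry_run:
        print("Would run:", " ".join(command))
        return None
    # Fail early with a clear message if taXaminer is not installed.
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} not found on PATH")
    # check=True raises CalledProcessError if taXaminer exits with an error.
    return subprocess.run(command, check=True)

run_taxaminer("config.yml")  # dry run by default
```

Calling `run_taxaminer("config.yml", dry_run=False)` executes the command and raises an exception if the run fails.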

For details on additional options, see Configuration parameters.

Bugs

Any bug reports, comments or suggestions are highly appreciated. Please open an issue on GitHub or reach out via email.

Contributors

Freya Arthen
Simonida Zehr

License

taXaminer is released under the MIT license.

Contact

Please contact us via email.
