Detection of incorrectly labeled sequences across kingdoms

Conterminator


Detection of contamination in nucleotide and protein sequence sets

Conterminator is an efficient method to detect incorrectly labeled sequences across kingdoms by an exhaustive all-against-all sequence comparison. It is free open-source GPLv3-licensed software for Linux and macOS, and is developed on top of modules provided by MMseqs2.

Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. bioRxiv, doi: 10.1101/2020.01.26.920173 (2020)

Install

Conterminator requires a 64-bit Linux system (check with uname -a | grep x86_64) with at least the SSE4.1 instruction set (check by executing cat /proc/cpuinfo | grep sse4_1).

# SSE4.1
wget https://mmseqs.com/conterminator/conterminator-linux-sse41.tar.gz; tar xvfz conterminator-linux-sse41.tar.gz; export PATH=$(pwd)/conterminator/:$PATH
# AVX2
wget https://mmseqs.com/conterminator/conterminator-linux-avx2.tar.gz; tar xvfz conterminator-linux-avx2.tar.gz; export PATH=$(pwd)/conterminator/:$PATH
# conda
conda install -c bioconda conterminator
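The choice between the two precompiled builds can be scripted. The following is a minimal sketch, not part of Conterminator: the helper name pick_build is hypothetical, and preferring AVX2 over SSE4.1 whenever both are present is an assumption. It only prints the matching download URL, following the URL pattern of the commands above, and downloads nothing.

```shell
#!/usr/bin/env bash
# pick_build is a hypothetical helper (not shipped with Conterminator).
# It prints which precompiled build matches the CPU flags found in a
# cpuinfo-style file (defaults to /proc/cpuinfo on Linux).
pick_build() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    if grep -qw avx2 "$cpuinfo" 2>/dev/null; then
        echo "avx2"          # ASSUMPTION: prefer AVX2 when available
    elif grep -qw sse4_1 "$cpuinfo" 2>/dev/null; then
        echo "sse41"
    else
        echo "unsupported"
    fi
}

build=$(pick_build)
if [ "$build" = "unsupported" ]; then
    echo "No SSE4.1 or AVX2 support detected." >&2
else
    # URL pattern taken from the download commands above.
    echo "https://mmseqs.com/conterminator/conterminator-linux-${build}.tar.gz"
fi
```

The printed URL can then be passed to wget as in the commands above.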

Compile from source

Conterminator can be installed by compiling from source.

git clone --recursive https://github.com/martin-steinegger/conterminator 
mkdir conterminator/build && cd conterminator/build
cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. ..
make -j 4
make install
export PATH=$(pwd)/bin/:$PATH 

Getting started

Conterminator computes ungapped local alignments of all sequences and reports contamination across user-specified taxa. By default this is done at kingdom level.

To process nucleotide sequences use the following command:

conterminator dna example/dna.fas example/dna.mapping result tmp 

Conterminator requires a FASTA input file and a mapping file, which maps FASTA identifiers to NCBI taxon identifiers.
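Before starting a long run it can help to confirm that every FASTA identifier has an entry in the mapping file. The sketch below assumes a layout that this README does not spell out: two whitespace-separated columns (identifier, then NCBI taxon ID), with the FASTA identifier taken as the first word after ">" on each header line. The helper name missing_ids is hypothetical.

```shell
#!/usr/bin/env bash
# missing_ids is a hypothetical helper (not shipped with Conterminator).
# It lists FASTA identifiers that have no entry in the mapping file.
# ASSUMPTION: the mapping file has two whitespace-separated columns
# (identifier, NCBI taxon ID), and the FASTA identifier is the first
# word after '>' on each header line.
missing_ids() {
    local fasta="$1" mapping="$2"
    awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }   # first file: mapping
         /^>/ { id = substr($1, 2)                     # second file: FASTA
                if (!(id in seen)) print id }' \
        "$mapping" "$fasta"
}

# Example usage with the files from the command above, if they are present.
if [ -f example/dna.fas ] && [ -f example/dna.mapping ]; then
    missing_ids example/dna.fas example/dna.mapping
fi
```

An empty output means every identifier is mapped under the assumed layout.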

Protein sequences can be processed as follows:

conterminator protein example/prots.fas example/prots.mapping result tmp  

Important Parameters

--kingdom

This parameter controls across which taxa contamination should be considered. Taxon definitions are separated by commas (,); e.g. to search for contamination between bacteria and human, use --taxon-list 2,9606. More advanced contamination rules can be expressed with the following operators:

! NEGATION 
|| OR  
&& AND 

The default rule is as follows:

2||2157,4751,33208,33090,2759&&!4751&&!33208&&!33090   

This searches for contamination between the following taxa:

2||2157  # Bacteria OR Archaea 
4751     # Fungi
33208    # Metazoa
33090    # Viridiplantae  
2759&&!4751&&!33208&&!33090 # Eukaryota excluding Fungi, Metazoa and Viridiplantae
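To make the rule syntax concrete, here is a small sketch of how such a rule could be read. It is not Conterminator's implementation. It assumes that a taxon ID is true when it appears in a sequence's lineage, that && binds tighter than ||, and that there are no parentheses. The function names term_matches and classify are hypothetical, and the lineages are abbreviated for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the rule syntax; not Conterminator code.

# Succeeds if a term such as '2759&&!4751' holds for a lineage given as a
# space-separated list of NCBI taxon IDs.
# ASSUMPTIONS: an ID is true when it occurs in the lineage, '&&' binds
# tighter than '||', and terms contain no parentheses.
term_matches() {
    local lineage=" $1 " term="$2"
    local clause literal ok
    local -a clauses literals
    read -r -a clauses <<< "${term//||/ }"          # split on '||'
    for clause in "${clauses[@]}"; do
        ok=1
        read -r -a literals <<< "${clause//&&/ }"   # split on '&&'
        for literal in "${literals[@]}"; do
            if [[ "$literal" == '!'* ]]; then
                # Negated ID: the clause fails if the ID is in the lineage.
                if [[ "$lineage" == *" ${literal:1} "* ]]; then ok=0; fi
            else
                # Plain ID: the clause fails if the ID is not in the lineage.
                if [[ "$lineage" != *" $literal "* ]]; then ok=0; fi
            fi
        done
        if [ "$ok" -eq 1 ]; then return 0; fi
    done
    return 1
}

# Prints the first comma-separated term of the rule that matches the lineage,
# or "none" if no term matches.
classify() {
    local lineage="$1" rule="$2" term
    local -a terms
    IFS=',' read -r -a terms <<< "$rule"
    for term in "${terms[@]}"; do
        if term_matches "$lineage" "$term"; then
            echo "$term"
            return 0
        fi
    done
    echo "none"
}

rule='2||2157,4751,33208,33090,2759&&!4751&&!33208&&!33090'
classify "2 562" "$rule"             # Escherichia coli   → 2||2157
classify "2759 33208 9606" "$rule"   # Homo sapiens       → 33208
classify "2759 5833" "$rule"         # Plasmodium falciparum → 2759&&!4751&&!33208&&!33090
```

Under this reading, two aligned sequences whose lineages fall into different terms of the rule would be candidates for cross-kingdom contamination.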