SortMeRNA is a local sequence alignment tool for filtering, mapping and clustering.
The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input a file of reads (FASTA or FASTQ format) and one or more rRNA database files, and sorts the aligned and rejected reads into two files specified by the user. Additional applications include clustering and taxonomy assignment, available through QIIME v1.9.1 (http://qiime.org). SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.
Visit http://bioinfo.lifl.fr/RNA/sortmerna/ for more information.
- Support
- Documentation
- Getting Started
- Compilation
- Tests
- Third-party libraries
- Wrappers and packages
- Taxonomies
- Citation
- Contributors
- References
For questions and comments, please use the SortMeRNA forum.
If you have Doxygen installed, you can generate the documentation by modifying the following lines in `doxygen_configure.txt`:
INPUT = /path/to/sortmerna/include /path/to/sortmerna/src
IMAGE_PATH = /path/to/sortmerna/algorithm
and running the following command:
doxygen doxygen_configure.txt
This command will generate a folder `html` in the directory from which the command was run.
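As a rough sketch of the edit described above, the two lines could also be rewritten from the shell instead of by hand. The helper name `set_doxygen_paths` is hypothetical and not part of SortMeRNA; it assumes the config file already contains `INPUT` and `IMAGE_PATH` lines and that the path contains no `|` or `&` characters.

```bash
# Hypothetical helper: point INPUT and IMAGE_PATH at a SortMeRNA checkout.
# Usage: set_doxygen_paths <config-file> <sortmerna-top-dir>
set_doxygen_paths() {
    local config="$1" top="$2" tmp
    tmp="$(mktemp)" || return 1
    # '|' is the sed delimiter because the paths contain '/'.
    # Only lines that start with the exact tag name followed by '=' are rewritten.
    sed -e "s|^INPUT[[:space:]]*=.*|INPUT = ${top}/include ${top}/src|" \
        -e "s|^IMAGE_PATH[[:space:]]*=.*|IMAGE_PATH = ${top}/algorithm|" \
        "$config" > "$tmp" && mv "$tmp" "$config"
}

# Example (the checkout location is an assumption):
#   set_doxygen_paths doxygen_configure.txt "$HOME/sortmerna"
#   doxygen doxygen_configure.txt
```

Writing to a temporary file and moving it into place avoids `sed -i`, whose syntax differs between GNU and BSD (macOS) sed.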
SortMeRNA can be built and run on Windows, Linux, and Mac.
There are 3 methods to install SortMeRNA:
1. GitHub repository development version (master branch)
   - Installation instructions
2. GitHub releases (tar balls, zip)
   - Installation instructions Linux
   - Installation instructions Mac OS
   - Installation instructions Windows OS
3. BioInfo releases (tar balls including compiled binaries)
Option (3) is the simplest, as it provides pre-compiled binaries for various operating systems.
CMake is used for generating the build files and should be installed prior to the build. CMake distributions are available for all major operating systems. Please visit the CMake project website for download and installation instructions.
We tested the build on Ubuntu 16.04 LTS Xenial with GCC 5.4.0.
(1) Install GCC if not already installed. SortMeRNA is C++14 compliant, so GCC needs to be fairly recent; e.g. 5.4.0 works OK.
```bash
gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
```
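The version requirement above can be checked in a script rather than by eye. The following is only a sketch: `version_at_least` is a hypothetical helper, and the minimum of 5.0.0 is an assumption based on the text saying that 5.4.0 works, not a limit stated by the SortMeRNA authors.

```bash
# Hypothetical helper: return 0 if dotted version $1 is >= dotted version $2.
# Fields are compared numerically, left to right; missing fields count as 0.
version_at_least() {
    local IFS=.
    local -a have=($1) want=($2)
    local i h w
    for i in 0 1 2; do
        h="${have[i]:-0}"
        w="${want[i]:-0}"
        # 10# forces base 10 so a field such as "08" is not read as octal.
        if (( 10#$h > 10#$w )); then return 0; fi
        if (( 10#$h < 10#$w )); then return 1; fi
    done
    return 0
}

# Example (the 5.0.0 threshold is an assumption):
#   version_at_least "$(gcc -dumpversion)" 5.0.0 || echo "GCC may be too old for C++14" >&2
```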
(2) Install pre-requisites (CMake, Git, Zlib, RocksDB, RapidJson)
```
sudo apt update
sudo apt install cmake
sudo apt install git
# On Ubuntu the libraries are installed through their development packages
sudo apt install zlib1g-dev
sudo apt install librocksdb-dev
sudo apt install rapidjson-dev
```
(3) Clone the Git repository
```
git clone https://github.com/biocore/sortmerna.git
```
(4) Generate the build files using CMake:
```bash
mkdir -p $SMR_HOME/build/Release
pushd $SMR_HOME/build/Release
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DEXTRA_CXX_FLAGS_RELEASE="-pthread" ../..
```
NOTE: `$SMR_HOME` is the top directory where the SortMeRNA code (e.g. the Git repo) is located.
The above commands perform the necessary system and dependency checks and generate the Makefile.
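Since the build commands rely on `$SMR_HOME` being set, it can help to check the variable before running CMake. This is a minimal sketch: `is_smr_home` is a hypothetical helper, and it assumes the top-level `CMakeLists.txt` (the file that `cmake ../..` reads) is a good enough marker of the source tree.

```bash
# Hypothetical helper: return 0 if the directory looks like the top of a
# SortMeRNA checkout, i.e. it is non-empty and contains CMakeLists.txt.
is_smr_home() {
    [ -n "$1" ] && [ -f "$1/CMakeLists.txt" ]
}

# Example (the clone location is an assumption):
#   export SMR_HOME="$HOME/sortmerna"
#   is_smr_home "$SMR_HOME" || echo "SMR_HOME does not point at the source tree" >&2
```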
(5) Compile and build executables:
```bash
make
```
The binaries are created in `$SMR_HOME/build/Release/src/indexdb` and `$SMR_HOME/build/Release/src/sortmerna`.
Simply add the build binaries to the PATH, e.g.:
export PATH="$SMR_HOME/build/Release/src/indexdb:$SMR_HOME/build/Release/src/sortmerna:$PATH"
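After updating PATH, a quick check can confirm that the shell finds the new binaries. The sketch below uses a hypothetical helper, `check_on_path`; the executable names in the usage comment are assumptions taken from the build directory names, so adjust them to whatever the build produced.

```bash
# Hypothetical helper: report which of the named executables are on PATH.
# Returns 0 only if every name was found.
check_on_path() {
    local missing=0 name
    for name in "$@"; do
        if command -v "$name" >/dev/null 2>&1; then
            echo "found:   $name -> $(command -v "$name")"
        else
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (executable names are assumptions):
#   check_on_path sortmerna indexdb
```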
We tested the build on macOS 10.13 High Sierra (64-bit). We recommend Homebrew, an excellent package manager for Mac [1], which has all the latest packages required to build SortMeRNA. The build can be performed using either Clang or GCC.
(1) Install Homebrew:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" [1]
brew --version
brew help
(2) Install pre-requisites (CMake, Git, Zlib, RocksDB, RapidJson)
brew install cmake
brew install git
brew install zlib
brew install rocksdb
brew install rapidjson
(3) Clone the Git repository
git clone https://github.com/biocore/sortmerna.git
(4) Generate the build files:
mkdir -p $SMR_HOME/build/Release
pushd $SMR_HOME/build/Release
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DEXTRA_CXX_FLAGS_RELEASE="-pthread" ../..
-- The CXX compiler identification is AppleClang 9.0.0.9000039
-- The C compiler identification is AppleClang 9.0.0.9000039
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
CMAKE_CXX_COMPILER_ID = AppleClang
CMAKE_CONFIGURATION_TYPES =
CMAKE_CXX_FLAGS_RELEASE: -O3 -DNDEBUG
EXTRA_CXX_FLAGS_RELEASE: -pthread
Cloning into 'concurrentqueue'...
Checking out files: 100% (1613/1613), done.
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/bc/sortmerna/build/Release
Note: `$SMR_HOME` is the top directory where the SortMeRNA code (e.g. the Git repo) is located.
CMake performs the necessary system and dependency checks and generates the Makefile.
(5) Compile and build executables:
make
The binaries are created in `$SMR_HOME/build/Release/src/indexdb` and `$SMR_HOME/build/Release/src/sortmerna`.
Simply add the build binaries to the PATH, e.g.:
export PATH="$SMR_HOME/build/Release/src/indexdb:$SMR_HOME/build/Release/src/sortmerna:$PATH"
(1) Check if you have Clang installed:
clang --version
(2a) If Clang is installed, set your compiler to Clang:
export CC=clang
export CXX=clang++
(2b) If Clang is not installed, see Clang for Mac OS for installation instructions.
(1) Check if you have GCC installed:
gcc --version
(2a) If GCC is installed, set your compiler to GCC:
export CC=gcc-mp-5.4
export CXX=g++-mp-5.4
(2b) If GCC is not installed, it can be installed through Homebrew or MacPorts.
brew tap homebrew/versions
brew install [flags] gcc54
To list the available flags:
brew options gcc54
Installing Xcode (free through the App Store) and Xcode command line tools will automatically install the latest version of Clang supported with Xcode.
After installing Xcode, the Xcode command line tools may be installed via:
Xcode -> Preferences -> Downloads
Under "Components", click to install "Command Line Tools"
MS Visual Studio Community edition and CMake for Windows are required for building SortMeRNA.
We tested the build using `Visual Studio 15 2017 Win64` and `Visual Studio 14 2015 Win64`.
(1) Download and Install VS Community edition from Visual Studio community website
(2) Install CMake
CMake can be installed using either the Windows installer or binaries from an archive. Download the binary distributions from the CMake download page.
If you choose the portable binaries (not the installer), e.g. cmake-3.11.0-rc1-win64-x64.zip, just download and extract the archive into a directory of your choice, e.g.:
C:\libs\cmake-3.11.0-rc1-win64-x64\
bin\
doc\
man\
share\
The `bin` directory above contains `cmake.exe` and `cmake-gui.exe`. Add the `bin` directory to your PATH: start `cmd` and run
set PATH=C:\libs\cmake-3.11.0-rc1-win64-x64\bin;%PATH%
cmake --version
cmake-gui
(3) Install Git for Windows
Download a binary distribution, either the portable one or the installer, from the Git for Windows website.
The portable distribution is a self-extracting archive that can be installed in a directory of your choice, e.g.:
C:\libs\git-2.16.2-64\
bin\
cmd\
dev\
etc\
mingw64\
tmp\
usr\
You can use either `bash.exe` or the native Windows `cmd.exe`.
If you choose to work with CMD, add the following to your path:
set GIT_HOME=C:\libs\git-2.16.2-64
set PATH=%GIT_HOME%\bin;%GIT_HOME%\usr\bin;%GIT_HOME%\mingw64\bin;%PATH%
git --version
(4) Clone the Git repository
git clone https://github.com/biocore/sortmerna.git
(5) Prepare the build files:
On Windows we recommend using the `cmake-gui` utility.
Either navigate to the CMake installation directory (using Windows Explorer) and double-click `cmake-gui`, or launch it from the command line as shown below:
set PATH=C:\libs\cmake-3.11.0-rc1-win64-x64\bin;%PATH%
cmake-gui
In the CMake GUI:
- click the `Browse source` button and navigate to the directory where the SortMeRNA sources are located (SMR_HOME)
- click `Browse Build` and navigate to the directory where the binaries are to be built, e.g. `%SMR_HOME%\build`
- at the prompt select the Generator from the list, e.g. "Visual Studio 15 2017 Win64"
- click `Configure`
- set the following variables:
  - ZLIB_INCLUDE_DIR=%SMR_HOME%/3rdparty/zlib
  - ZLIB_LIB_DEBUG=%SMR_HOME%/3rdparty/zlib/build/Debug
  - ZLIB_LIB_RELEASE=%SMR_HOME%/3rdparty/zlib/build/Release
  - ROCKSDB_INCLUDE_DIR=%SMR_HOME%/3rdparty/rocksdb/include
  - ROCKSDB_LIB_DEBUG=%SMR_HOME%/3rdparty/rocksdb/build/Debug
  - ROCKSDB_LIB_RELEASE=%SMR_HOME%/3rdparty/rocksdb/build/Release
- click `Configure` again
- click `Generate` if all variables were set OK (no red background)
`Generate` writes the VS project files to the `%SMR_HOME%\build\` directory.
`%SMR_HOME%` is the top directory where the SortMeRNA source distribution (e.g. the Git repo) is installed.
(6) Configure and build the Zlib library
When the cmake-gui `Configure` step is run, it downloads the required third-party source packages into the `%SMR_HOME%\3rdparty\` directory.
In cmake-gui:
- click `Browse Source...` and select `%SMR_HOME%\3rdparty\zlib\`
- click `Browse Build...` and select `%SMR_HOME%\3rdparty\zlib\build\` (confirm to create the `build` directory if it does not already exist)
- click `Configure` and set the required variables or accept the defaults
- click `Generate`
In Visual Studio:
- `File -> Open -> Project/Solution` and select `%SMR_HOME%\3rdparty\zlib\build\zlib.sln`
- in Solution Explorer right-click `ALL_BUILD` and select `build` from the drop-down menu
(7) Configure and build the RocksDB library
In cmake-gui:
- click `Browse Source...` and select `%SMR_HOME%\3rdparty\rocksdb\`
- click `Browse Build...` and select `%SMR_HOME%\3rdparty\rocksdb\build\` (confirm to create the `build` directory if it does not already exist)
- click `Configure` and set the following variables:
  - Ungrouped Entries
    - PORTABLE (checkbox)
    - GIT_EXECUTABLE (select the path to `git.exe`, e.g. `C:/libs/git-2.16.2-64/bin/git.exe`)
  - WITH
    - WITH_MD_LIBRARY
    - WITH_ZLIB
  - Accept defaults for the rest
- click `Generate`
In Visual Studio:
- `File -> Open -> Project/Solution` and select `%SMR_HOME%\3rdparty\rocksdb\build\rocksdb.sln`
- in Solution Explorer right-click `ALL_BUILD` and select `build` from the drop-down menu
(8) Build SortMeRNA
In Visual Studio:
- `File -> Open -> Project/Solution` and open `%SMR_HOME%\build\sortmerna.sln`
- select the desired build type: `Release | Debug | RelWithDebInfo | MinSizeRel`
- in Solution Explorer right-click `ALL_BUILD` and select `build` from the drop-down menu
Depending on the build type, the binaries are generated in `%SMR_HOME%\build\src\sortmerna\Release` (or `Debug | RelWithDebInfo | MinSizeRel`).
(9) Add sortmerna executables to PATH
set PATH=%SMR_HOME%\build\src\indexdb\Release;%SMR_HOME%\build\src\sortmerna\Release;%PATH%
Python code for running the tests is provided in $SMR_HOME/tests (%SMR_HOME%\tests) and requires Python 3.5 or higher.
The tests can be run with the following commands:
python ./tests/test_sortmerna.py
python ./tests/test_sortmerna_zlib.py
Make sure the `data` folder is in the same directory as `test_sortmerna.py`.
Users require scikit-bio 0.5.0 to run the tests.
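The two prerequisites above can be verified before launching the test scripts. This is only a sketch: `check_test_prereqs` is a hypothetical helper, it assumes scikit-bio is imported as `skbio`, and it does not check the scikit-bio version.

```bash
# Hypothetical helper: check the test prerequisites.
# Returns 0 if both hold, 1 if the interpreter is missing or older than 3.5,
# 2 if scikit-bio cannot be imported.
check_test_prereqs() {
    local py="${1:-python}"   # interpreter name; pass python3 if that is what you use
    if ! "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 5) else 1)' 2>/dev/null; then
        echo "Python 3.5 or higher is required (tried: $py)" >&2
        return 1
    fi
    if ! "$py" -c 'import skbio' 2>/dev/null; then
        echo "scikit-bio is not importable from $py" >&2
        return 2
    fi
}

# Example:
#   check_test_prereqs python3 && python3 ./tests/test_sortmerna.py
```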
Various features in SortMeRNA are dependent on third-party libraries, including:
- ALP: computes statistical parameters for Gumbel distribution (K and Lambda)
- CMPH: C Minimal Perfect Hashing Library
- Zlib: reading compressed Reads files
- RocksDB: storage for SortMeRNA alignment results
- RapidJson: serialization of Reads objects to store in RocksDB
- Concurrent Queue: Lockless buffer for Reads accessed from multiple processing threads
Thanks to Björn Grüning and Nicola Soranzo, an up-to-date Galaxy wrapper exists for SortMeRNA. Please visit Björn's GitHub page for installation instructions.
Thanks to the Debian Med team, SortMeRNA 2.0 is now a package in Debian. Thanks to Andreas Tille for the sortmerna and indexdb_rna man pages (version 2.0). These have been updated for 2.1 in the master repository.
Thanks to Ben Woodcroft for adding SortMeRNA 2.1 to GNU Guix, find the package here.
SortMeRNA 2.0 can be used in QIIME's pick_closed_reference_otus.py, pick_open_reference_otus.py and assign_taxonomy.py scripts.
Note: At the moment, only 2.0 is compatible with QIIME.
The archive `rRNA_databases/silva_ids_acc_tax.tar.gz` contains SILVA taxonomy strings (extracted from the XML file generated by ARB) for each of the reference sequences in the representative databases. Each file has three tab-separated columns: the reference sequence ID, the accession number, and the taxonomy.
If you use SortMeRNA, please cite: Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.
See AUTHORS for a list of contributors to this project.