Installation

Jose Manuel Martí edited this page Jan 14, 2023 · 25 revisions

Contents

Step-by-step installation

  1. Basic requirements
  2. Getting the code
  3. Getting the databases
  4. Testing (optional step)
  5. Ready!
  6. Interactive visualization of Recentrifuge results
  7. Numeric results for downstream applications

Notes

Step-by-step installation

1. Basic requirements

Python 3.6 or higher is required, but the minimum recommended version is Python 3.8, and versions up to Python 3.11 are supported. If you need help with the installation and setup of Python, please consult Python Setup and Usage. Apart from modules in the Python Standard Library, Recentrifuge requires only biopython and, optionally:

  • pandas, for exporting results to CSV or TSV as extra files, or for testing Recentrifuge.
  • openpyxl, additionally required by pandas to export results in Excel format.
  • matplotlib and xlrd, needed in addition to the previous packages for comprehensive testing of the Recentrifuge package.
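
The requirements above can be checked from the Python interpreter you plan to use. The following is a minimal sketch, not part of Recentrifuge: the helper names (`missing`, `check_requirements`) are hypothetical, and the only non-obvious assumption is the package-to-module mapping (biopython is imported as `Bio`). It deliberately avoids f-strings so that it still runs, and reports the problem, on interpreters older than 3.6 (3.4 or later).

```python
import sys
from importlib.util import find_spec

# Version thresholds as stated on this page.
MIN_VERSION = (3, 6)
RECOMMENDED_VERSION = (3, 8)

# PyPI name -> import name (biopython is imported as "Bio").
REQUIRED = {"biopython": "Bio"}
OPTIONAL = {
    "pandas": "pandas",          # CSV/TSV export and testing
    "openpyxl": "openpyxl",      # Excel export through pandas
    "matplotlib": "matplotlib",  # comprehensive testing
    "xlrd": "xlrd",              # comprehensive testing
}


def missing(packages):
    """Return the PyPI names whose import name cannot be located."""
    return [pypi for pypi, module in packages.items() if find_spec(module) is None]


def check_requirements(version=None):
    """Summarize how an interpreter version compares with the stated requirements."""
    version = tuple(version or sys.version_info[:2])
    return {
        "python_ok": version >= MIN_VERSION,
        "python_recommended": version >= RECOMMENDED_VERSION,
        "missing_required": missing(REQUIRED),
        "missing_optional": missing(OPTIONAL),
    }


if __name__ == "__main__":
    # No f-strings here, so the report also prints on Python < 3.6.
    for key, value in check_requirements().items():
        print("{}: {}".format(key, value))
```

Save it under any name (for example, the hypothetical `check_rcf_requirements.py`) and run it with the interpreter you intend to use for Recentrifuge.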

Installing Recentrifuge inside a virtual environment is not mandatory but, in some circumstances, it can avoid installation issues. If you need them, detailed instructions are available in Preparing a virtualenv for Recentrifuge, under Notes.

2. Getting the code

  • Option 0: Install using the conda package.

  • Option 1: Get and install the Recentrifuge PyPI package:
    $ pip install recentrifuge

  • Option 2: Clone the Recentrifuge repository on GitHub and get the required (and recommended) dependencies:
    $ git clone https://github.com/khyox/recentrifuge.git
    $ pip install biopython
    • To test Recentrifuge or to be able to export results to CSV, TSV, or Excel, you will also need some additional packages, which you can install easily with pip:
    $ pip install numpy openpyxl xlrd matplotlib pandas

Should you need help installing pandas or openpyxl, please check the pandas installation instructions or the openpyxl installation instructions.
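
After installing, you may want to confirm which of these distributions are present and at which version. A minimal sketch follows; it relies on `importlib.metadata`, which is only available from Python 3.8 (the minimum recommended version), and the helper name `installed_version` is hypothetical. With Option 2, Recentrifuge itself is not installed as a package, so `recentrifuge` is expected to show as not installed in that case.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distributions named on this page; only recentrifuge and biopython are required.
DISTRIBUTIONS = ("recentrifuge", "biopython", "pandas", "openpyxl",
                 "numpy", "xlrd", "matplotlib")


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for dist in DISTRIBUTIONS:
        print(f"{dist:13} {installed_version(dist) or 'not installed'}")
```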

3. Getting the databases

In the cloning directory, execute retaxdump. It downloads the required local databases from the NCBI servers and unpacks them into the taxdump subdirectory. For the importance of keeping the NCBI database updated, please check this. For LMAT plasmids support, please read Support for LMAT plasmids classification under Notes.
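
To confirm that retaxdump finished correctly, you can check that the taxdump subdirectory holds the two core files of the NCBI taxdump archive, nodes.dmp (the taxonomy tree) and names.dmp (the taxon names), and take a quick look at the tree. This is a sketch under stated assumptions: the helper names are hypothetical, only the first two fields of nodes.dmp (taxid and parent taxid, separated by `\t|\t`) are read, and it is not how Recentrifuge itself loads the taxonomy.

```python
from pathlib import Path

# Core files of the NCBI taxdump archive: taxonomy tree and taxon names.
TAXDUMP_FILES = ("nodes.dmp", "names.dmp")


def missing_taxdump_files(taxdump_dir="taxdump"):
    """Return the expected taxdump files that are absent from taxdump_dir."""
    base = Path(taxdump_dir)
    return [name for name in TAXDUMP_FILES if not (base / name).is_file()]


def load_parents(nodes_path):
    r"""Map each taxid to its parent taxid, reading an NCBI nodes.dmp file.

    Fields are separated by "\t|\t"; only the first two (taxid and
    parent taxid) are kept.
    """
    parents = {}
    with open(nodes_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split("\t|\t")
            if len(fields) >= 2:
                parents[int(fields[0])] = int(fields[1])
    return parents


if __name__ == "__main__":
    absent = missing_taxdump_files()
    if absent:
        print("Missing from taxdump/:", ", ".join(absent))
    else:
        parents = load_parents(Path("taxdump") / "nodes.dmp")
        print(f"{len(parents)} taxa loaded")
```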

4. Testing (optional step)

Recentrifuge development is tested by an automatic continuous integration system (see Recentrifuge's Travis CI page for details). Comprehensive instructions for testing and validating your Recentrifuge installation are available here.

5. Ready!

At this point, Recentrifuge is ready to analyze your samples:

6. Interactive visualization of Recentrifuge results

Open the HTML file generated by Recentrifuge with any JavaScript-enabled browser; Firefox or Chrome is recommended.
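
If you prefer to open the report from a script, Python's standard `webbrowser` module can do it. In this minimal sketch the file name `report.html` is a placeholder for the HTML file your run produced, and `html_uri` is a hypothetical helper; which browser opens depends on your system defaults.

```python
import webbrowser
from pathlib import Path


def html_uri(html_path):
    """Return a file:// URI for a local HTML file, made absolute first."""
    return Path(html_path).resolve().as_uri()


if __name__ == "__main__":
    # Placeholder file name: use the HTML file produced by your Recentrifuge run.
    webbrowser.open(html_uri("report.html"))
```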

7. Numeric results for downstream applications

Recentrifuge can generate extra CSV/TSV files, or an Excel file with several sheets, containing statistics and detailed numeric results useful for downstream applications.
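
As an example of downstream use, the following standard-library sketch loads one of the extra CSV/TSV files into a list of dictionaries keyed by the header row. The function name and the extension-based delimiter choice are assumptions, no particular column names are assumed, and `results.tsv` is a placeholder file name.

```python
import csv
from pathlib import Path


def read_table(path):
    """Read a CSV or TSV file into a list of dicts keyed by the header row.

    The delimiter is chosen from the extension: tab for .tsv, comma otherwise.
    """
    path = Path(path)
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


if __name__ == "__main__":
    # Placeholder file name: use one of the extra files from your own run.
    rows = read_table("results.tsv")
    print(f"{len(rows)} rows; columns: {list(rows[0]) if rows else []}")
```

With pandas installed, `pandas.read_csv(path, sep="\t")` returns a DataFrame instead, and `pandas.read_excel(path, sheet_name=None)` reads every sheet of the Excel file into a dict of DataFrames (openpyxl is needed for .xlsx files).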

Notes

Python version

Support for Python 3.6 will be dropped very soon. Support for Python 3.7 will be dropped soon after it reaches its end of life in 2023.

Python versions below 3.6 are not supported, as Recentrifuge uses syntax features introduced in Python 3.6, such as variable annotations (PEP 526) and formatted string literals (PEP 498). Type hints were introduced in Python 3.5 (PEP 484), but it was with Python 3.6, and its syntax for variable annotations, that they reached maturity. Powerful tools for static type analysis in Python have evolved along with these standards, and the development of Recentrifuge includes checks with pylint and mypy. Code whose aim is to perform a robust comparative metagenomic analysis is a very good candidate for robust coding.
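
The two Python 3.6 syntax features mentioned above fit in a few lines. This toy snippet is illustrative only and is not Recentrifuge code; the taxid/name pairs are standard NCBI entries. On Python 3.5 or earlier the whole file is rejected with a SyntaxError before any line runs, and mypy can check its annotations statically.

```python
from typing import Dict, List

# PEP 526 (Python 3.6+): annotations on variables, not only on functions.
taxids: List[int] = [1, 2, 9606]
names: Dict[int, str] = {1: "root", 2: "Bacteria", 9606: "Homo sapiens"}
count: int  # annotation only: records the type but binds no value


def describe(taxid: int) -> str:
    # PEP 498 (Python 3.6+): formatted string literals (f-strings).
    return f"taxid {taxid}: {names.get(taxid, 'unknown')}"


for taxid in taxids:
    print(describe(taxid))
```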

Support for LMAT plasmids classification

One of the most interesting, but still little-known, features of the LMAT software is its ability to properly classify over 4000 plasmids. These plasmids are assigned taxonomic ids (taxids) outside the NCBI taxonomy. Recentrifuge supports this extended classification, but it requires the LMAT-provided file plasmid.names.txt, located in the same directory as the NCBI nodes information files. This location is controlled by the flag -n/--nodespath.

If Recentrifuge finds the plasmid.names.txt file, it parses the plasmid taxids and names using ad hoc regular expressions in order to present the user with a meaningful name for each of the very diverse plasmids. Recentrifuge also checks that every plasmid in the file is compatible with the NCBI taxonomy in use, and only the plasmids passing this check are added. Further details here.
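
The location rule above, and the kind of filtering it describes, can be sketched as follows. Only `find_plasmid_file` reflects the documented behavior (plasmid.names.txt is looked up in the directory given by -n/--nodespath). `compatible_plasmids` is a hypothetical illustration: it assumes the plasmid entries were already parsed into (plasmid taxid, parent taxid, name) tuples and takes "compatible" to mean that the parent taxid exists in the NCBI tree and the plasmid taxid does not clash with an NCBI one. Recentrifuge's actual parsing and checks may differ.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple

PLASMID_FILE = "plasmid.names.txt"


def find_plasmid_file(nodespath: str) -> Optional[Path]:
    """Return the LMAT plasmid file if it sits in the nodes directory, else None."""
    candidate = Path(nodespath) / PLASMID_FILE
    return candidate if candidate.is_file() else None


def compatible_plasmids(
    plasmids: Iterable[Tuple[int, int, str]], parents: Dict[int, int]
) -> Dict[int, str]:
    """HYPOTHETICAL filter: keep plasmids that can hang from the NCBI tree.

    Assumes (plasmid_taxid, parent_taxid, name) tuples and treats as
    compatible those whose parent is an NCBI taxid and whose own taxid
    is not already used by NCBI. Not Recentrifuge's code.
    """
    return {
        taxid: name
        for taxid, parent, name in plasmids
        if parent in parents and taxid not in parents
    }
```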

Preparing a virtualenv for Recentrifuge

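Before you can follow the step-by-step instructions below, you need pyenv installed on your system, together with its pyenv-virtualenv plugin, which provides the pyenv virtualenv and pyenv activate commands. If you don't have them installed on your computer, please see the pyenv installation instructions.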

If you already have it on your system, please update pyenv:

> pyenv update

Get the list of available Python versions, filtered to the 3.8 to 3.11 series:

> pyenv install --list | grep -E "^ *3\.(8|9|10|11)\."

Install the desired Python version (for example, 3.11.1):

> pyenv install 3.11.1

Check versions to be sure that Python 3.11.1 is now available:

> pyenv versions

Create and check the new virtual environment:

> pyenv virtualenv 3.11.1 rcf_3.11.1
> pyenv virtualenvs

Activate the virtual environment and update pip:

> pyenv activate rcf_3.11.1
> pip install --upgrade pip
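
Once the environment is activated, you can verify that `python` now runs inside it. In this minimal sketch, saved under any name you like, the helper name `in_virtualenv` is hypothetical; it relies on the fact that venv and virtualenv 20 or later leave `sys.base_prefix` pointing at the base interpreter while `sys.prefix` points at the environment, whereas older virtualenv releases set `sys.real_prefix` instead.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv or virtualenv environment."""
    # Legacy virtualenv (< 20) sets sys.real_prefix; venv and newer
    # virtualenv make sys.prefix differ from sys.base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} at {sys.prefix}")
    print("virtual environment active:", in_virtualenv())
```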

Now you can easily proceed to install Recentrifuge (see Option 1 under Getting the code) in your new virtual environment.