Installation

Guanliang MENG edited this page Dec 13, 2023 · 184 revisions

The most straightforward method to install MitoZ is by utilizing Apptainer/Singularity and Docker (or Udocker) MitoZ images. These approaches can help you circumvent various potential installation issues. Additionally, the Conda-Pack version could prove beneficial. Unlike the typical Conda installation method, the Conda-Pack tar file contains all the necessary dependencies, eliminating the need to download them separately from the website. This feature is handy if you have limited network connectivity. It's important to note that while Conda is a valuable tool, it may NOT always resolve all dependency problems, which is the reason I have provided various alternative installation methods here.

Here are some tips to ensure a successful installation:

  1. After completing the installation, it is recommended to run the test dataset first. This step helps verify that the installation was successful and that no issues are present. You can find instructions for running the test dataset here.

  2. If one installation method fails, don't hesitate to try alternative approaches. Experimenting with different methods increases the chances of finding a suitable installation option.

  3. If you or your servers prefer not to use Singularity/Docker or encounter difficulties with the Conda installation method, consider attempting the Conda-Pack installation first. This alternative could be a viable solution in such cases. You can find detailed instructions for the Conda-Pack installation here.

1. Docker

1.1 Install Docker

Any platform (e.g. Linux, Mac or Windows) on which Docker is able to run should be able to run MitoZ via the MitoZ Docker image. This also applies to Singularity.

Please refer to https://docs.docker.com/.

1.2 Download the MitoZ image

$ docker pull guanliangmeng/mitoz:3.6
# or 
$ docker pull guanliangmeng/mitoz:3.4
  • With the Docker image, you don't need to install the etetoolkit (NCBI Taxonomy) database yourself; everything is already packaged into the image.

  • In Docker, an image is different from a container. When you run an image, Docker creates a container from that image, and what actually runs is the newly created container. Multiple containers can be created from the same image and run at the same time, each with a unique container ID (use docker ps to check running containers). This is why we usually add the --rm option: it deletes the container once the analysis is done, while the original image stays in place.

1.3 Run the container

$PWD is an environment variable of your current shell. Its value is the absolute path of your current directory, so it changes automatically whenever you change to another directory.

In your working directory (i.e. $PWD, where the fastq files should be located), execute:

# Go to your directory where your raw data (fastq) files are located
$ cd /your/working/directory/
$ ls 
  sample1.R1.fq.gz sample1.R2.fq.gz

# You can check the value of your current $PWD
$ echo $PWD

$ docker run -v $PWD:$PWD -w $PWD --rm guanliangmeng/mitoz:3.6 mitoz -h
$ docker run -v $PWD:$PWD -w $PWD --rm guanliangmeng/mitoz:3.6 mitoz-tools -h

# For example:
$ docker run -v $PWD:$PWD -w $PWD --rm guanliangmeng/mitoz:3.6  mitoz all --fq1 $PWD/sample1.R1.fq.gz --fq2 $PWD/sample1.R2.fq.gz ...

The -v $PWD:$PWD option mounts your current host directory at the same path ($PWD) inside the Docker container. This is the only way to access the files under your host's $PWD from within the container. Files outside that directory, including the targets of soft links (and possibly hard links) that point outside it, cannot be accessed from within the container.

Multiple -v options can be used at the same time. For example, if your fastq files are in /pool/data/ and you are NOT currently in that directory, you can make them accessible within the Docker container like this:

$ docker run -v $PWD:$PWD -v /pool/data/:/pool/data/ -w $PWD --rm guanliangmeng/mitoz:3.6 mitoz -h
# for example:
$ docker run -v $PWD:$PWD -v /pool/data/:/pool/data/ -w $PWD --rm guanliangmeng/mitoz:3.6 mitoz all --fq1 /pool/data/sample1.R1.fq.gz --fq2 /pool/data/sample1.R2.fq.gz ...
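The mounting rule above can be sketched as a small shell helper. This is only an illustration, not part of MitoZ: the function name mitoz_docker_prefix is hypothetical, it assumes absolute input paths without spaces, and it only prints the command prefix so that you can review it before running anything.

```shell
# Hypothetical helper: print (do not run) a `docker run` prefix that mounts
# the current directory plus the parent directory of every input file.
# Assumption: all paths are absolute and contain no spaces.
mitoz_docker_prefix() {
    _mounts="-v $PWD:$PWD"
    for _f in "$@"; do
        _dir=$(dirname "$_f")
        case " $_mounts " in
            *" -v $_dir:$_dir "*) ;;                    # already mounted
            *) _mounts="$_mounts -v $_dir:$_dir" ;;     # add a new mount
        esac
    done
    echo "docker run $_mounts -w $PWD --rm guanliangmeng/mitoz:3.6"
}

# Print the prefix for two fastq files stored outside the working directory.
mitoz_docker_prefix /pool/data/sample1.R1.fq.gz /pool/data/sample1.R2.fq.gz
```

Append the MitoZ command (e.g. mitoz all --fq1 ... --fq2 ...) to the printed prefix. Each directory is mounted only once, even when several input files share it.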

Known bugs:

For some reason, the default shell in the MitoZ 3.5 Docker image is NOT bash, which causes the tRNA gene annotations to be missing (e.g. https://github.com/linzhi2013/MitoZ/issues/187). Please use MitoZ 3.6, 3.4, or 2.3 instead.

Workaround: use the 1.4 methods below and do something before running MitoZ.

I will rebuild the image asap.

1.4 You can also shell into the container

In your host working directory (i.e. $PWD, which should contain the fastq files themselves, NOT soft links pointing to other directories!), shell into the container:

$ cd /your/working/directory/
$ echo $PWD
$ docker run -v $PWD:$PWD -w $PWD --rm -it guanliangmeng/mitoz:3.4
# for mitoz 3.6, use this:
$ docker run -v $PWD:$PWD -w $PWD --rm -it guanliangmeng/mitoz:3.6 /bin/bash

To learn more about Docker usage, please go to https://docs.docker.com/.

1.5 Installation location within the Docker image

With the Docker image, MitoZ (version 3.4) is installed at /app/anaconda/bin/mitoz and /app/anaconda/lib/python3.9/site-packages/mitoz:

$ docker run -it -v  $PWD:$PWD -w $PWD --rm guanliangmeng/mitoz:3.4
root@cb99de738f74:/Users/gmeng# ls -lhrt /app/anaconda/lib/python3.9/site-packages/mitoz
total 48K
-rw-rw-r-- 2 root root   12 Jun 10 08:54 __init__.py
-rw-rw-r-- 2 root root 3.2K Jun 10 08:54 MitoZ.py
drwxr-xr-x 4 root root 4.0K Jul  1 13:47 annotate
drwxr-xr-x 3 root root 4.0K Jul  1 13:47 utility
drwxr-xr-x 4 root root 4.0K Jul  1 13:47 assemble
drwxr-xr-x 3 root root 4.0K Jul  1 13:47 all
drwxr-xr-x 3 root root 4.0K Jul  1 13:47 visualize
drwxr-xr-x 7 root root 4.0K Jul  1 13:47 tools
drwxr-xr-x 6 root root 4.0K Jul  1 13:47 profiles
drwxr-xr-x 4 root root 4.0K Jul  1 13:47 findmitoscaf
drwxr-xr-x 3 root root 4.0K Jul  1 13:47 filter
drwxr-xr-x 2 root root 4.0K Jul  1 13:47 __pycache__

Or you can find it out by yourself:

$ docker run -it -v  $PWD:$PWD -w $PWD --rm guanliangmeng/mitoz:3.6

# Now we enter the docker container:

root@67f2dbb26f08:/# alias ll='ls -lhtr'

root@67f2dbb26f08:/# which mitoz
/usr/local/bin/mitoz

root@67f2dbb26f08:/# which mitoz-tools
/usr/local/bin/mitoz-tools

root@67f2dbb26f08:/# ll /usr/local/lib/python3.9/site-packages/mitoz
total 48K
-rw-rw-r--  1 root root   12 Jan  6 10:09 __init__.py
-rw-rw-r--  1 root root 3.6K Jan  6 10:09 MitoZ.py
drwxr-xr-x  3 root root 4.0K Jan  6 10:16 visualize
drwxr-xr-x  3 root root 4.0K Jan  6 10:16 utility
drwxr-xr-x 12 root root 4.0K Jan  6 10:16 tools
drwxr-xr-x  6 root root 4.0K Jan  6 10:16 profiles
drwxr-xr-x  4 root root 4.0K Jan  6 10:16 findmitoscaf
drwxr-xr-x  3 root root 4.0K Jan  6 10:16 filter
drwxr-xr-x  4 root root 4.0K Jan  6 10:16 assemble
drwxr-xr-x  4 root root 4.0K Jan  6 10:16 annotate
drwxr-xr-x  3 root root 4.0K Jan  6 10:16 all
drwxr-xr-x  2 root root 4.0K Jan  6 10:16 __pycache__

And MitoZ's database is at:

root@cb99de738f74:/Users/gmeng# ls -lhrt /app/anaconda/lib/python3.9/site-packages/mitoz/profiles/
total 16K
-rw-rw-r-- 2 root root    0 Jun 10 08:54 __init__.py
drwxr-xr-x 2 root root 4.0K Jul  1 13:47 rRNA_CM
drwxr-xr-x 2 root root 4.0K Jul  1 13:47 MT_database
drwxr-xr-x 2 root root 4.0K Jul  1 13:47 CDS_HMM
drwxr-xr-x 2 root root 4.0K Jul  1 13:47 __pycache__

If you want to copy this database out of the Docker image, do:

$ cd ~
$ mkdir mitoz_custom_db
$ docker run -v $PWD:$PWD -w $PWD --rm -it guanliangmeng/mitoz:3.4 
root@cb99de738f74:/Users/gmeng# cp -a /app/anaconda/lib/python3.9/site-packages/mitoz/profiles mitoz_custom_db
root@cb99de738f74:/Users/gmeng# exit

# This way, the 'profiles' directory is copied to the  'mitoz_custom_db' of your host machine.

# Later, if you want to use the '--profiles_dir' option, you need to use Docker's '-v' option
# to map this host's 'mitoz_custom_db' directory into the Docker container via

$ docker run -v $PWD:$PWD -v ~/mitoz_custom_db:/mitoz_custom_db/ -w $PWD --rm -it guanliangmeng/mitoz:3.4
# Then within the Docker container:
root@cb99de738f74:/Users/gmeng# mitoz --profiles_dir /mitoz_custom_db/profiles <other options>

See also https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ%27s-database.

2. Udocker

Unlike Singularity and Docker, Udocker does not require root/sudo privileges to install or run!

2.1 Installation of Udocker

https://github.com/indigo-dc/udocker

For example,

$ mkdir /home/gmeng/soft/
$ cd /home/gmeng/soft/
$ wget https://github.com/indigo-dc/udocker/releases/download/v1.3.1/udocker-1.3.1.tar.gz
$ tar zxvf udocker-1.3.1.tar.gz
$ export PATH=`pwd`/udocker:$PATH 
$ which udocker
/home/gmeng/soft/udocker/udocker
$ udocker install

To make this permanent, add the udocker directory to your PATH environment variable in ~/.bashrc:

$ echo 'export PATH="/home/gmeng/soft/udocker/:$PATH"' >>~/.bashrc
$ source ~/.bashrc

Keep in mind that Udocker installs dependencies (and images) into your ~/.udocker/ directory. If your HOME directory has limited space, you can move this directory to another place and then use the ln -s command to link it back to your HOME directory.
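The move-and-link step could look like the sketch below. The helper name relocate_with_symlink and the destination path in the usage comment are hypothetical; use an absolute destination path so that the symlink resolves from anywhere.

```shell
# Hypothetical helper: move a directory to a disk with more space and leave
# a symlink behind at the original location.
relocate_with_symlink() {
    _src=$1
    _dest=$2
    if [ ! -d "$_src" ] || [ -L "$_src" ]; then
        echo "skip: $_src is missing or already a symlink" >&2
        return 1
    fi
    if [ -e "$_dest" ]; then
        echo "refusing to overwrite $_dest" >&2
        return 1
    fi
    # Move first, then link the old location to the new one.
    mv "$_src" "$_dest" && ln -s "$_dest" "$_src"
}

# Example (placeholder destination):
# relocate_with_symlink "$HOME/.udocker" /other/place/udocker
```

The two guards keep the helper from running twice on the same directory or clobbering an existing destination.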

Go to https://github.com/indigo-dc/udocker for more details.

2.2 Download the MitoZ container

$ udocker pull guanliangmeng/mitoz:3.4
# or
$ udocker pull guanliangmeng/mitoz:3.6

$ udocker images
  REPOSITORY
  guanliangmeng/mitoz:3.4 

The usage of Udocker is similar to Docker: simply replace the docker command with udocker. Please refer to the Docker section above.

Go to https://indigo-dc.github.io/udocker/user_manual.html for more details about Udocker usage.

3. Apptainer/Singularity

3.1 Install Apptainer/Singularity

See https://www.sylabs.io/docs/ or https://apptainer.org/ for instructions to install Apptainer/Singularity.

Apptainer was formerly known as Singularity and is now a part of the Linux Foundation. See https://github.com/apptainer/apptainer.

Any platform (e.g. Linux, Mac or Windows) on which Singularity is able to run should be able to run MitoZ via the MitoZ Singularity image. This also applies to Docker.

For the installation of Singularity on Mac or Windows, please refer to https://docs.sylabs.io/guides/3.2/user-guide/installation.html#install-on-windows-or-mac.

Note: according to the official documentation (Oct. 2019), Singularity must be installed with root privileges. For a non-root installation, please refer to https://docs.sylabs.io/guides/3.6/admin-guide/user_namespace.html#unprivileged-installations. It has some requirements, though, so you should ask your IT administrator for help.

Also, Singularity installed via conda (e.g. conda install -c bioconda singularity) may not work (at least when installed as a normal user)!

How about Singularity on Mac and Windows?

MitoZ only runs on Linux systems, although some of its functions can now run on Mac or Windows.

Why would we want to run MitoZ on Mac or Windows? There are three main reasons: (1) with the two new de novo assemblers and small datasets, it is now theoretically possible to perform mitogenome assembly on a Mac or Windows machine with 16GB or 32GB RAM; (2) only the mitoz all and mitoz assemble commands actually need much memory, whereas all the other commands (mitoz filter/findmitoscaf/annotate/visualize or mitoz-tools) need very little memory and can therefore run on an ordinary Mac or Windows machine (e.g. with 8GB RAM), and sometimes you do not want to upload the data for these analyses to a Linux server; (3) MitoZ can now be installed on Mac via Conda (some assemblers might not work, though).

3.2 Download the MitoZ image

You can download a pre-built Singularity (https://sylabs.io/) image from https://www.dropbox.com/sh/mqjqn656x41q2wb/AAD02t_kUCjNHbBgCeYpEM88a?dl=0 (only for version 3.4) or from https://pan.baidu.com/s/1YIULJ9H3BeWKcIZdMZpcuw?pwd=7r9d (extraction code: 7r9d) (MitoZ version 3.2 and newer versions).

Alternatively, and more easily, you can pull the image directly from Docker Hub (so you can get the latest version).

$ singularity pull MitoZ_v3.6.sif docker://guanliangmeng/mitoz:3.6
# or
$ singularity pull MitoZ_v3.4.sif docker://guanliangmeng/mitoz:3.4
  • FYI: when I tried to run MitoZ_v3.5.sif on an Ubuntu system within Parallels Desktop on macOS (M1 chip), I got an error saying that the image's architecture (amd64) could not run on the host's (arm64).

  • After downloading MitoZ, you still need to install the etetoolkit (NCBI Taxonomy) database, especially when the automatic installation does not work for you. See section 7, The Etetoolkit database, below.
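To catch the architecture mismatch mentioned above before pulling a large image, a quick check along these lines may help. It is a sketch only: docker_arch is a hypothetical helper, and the mapping covers just the two architectures named above.

```shell
# Hypothetical helper: map `uname -m` output to the architecture names used
# for container images (only amd64 and arm64 are handled here).
docker_arch() {
    case "$1" in
        x86_64|amd64)  echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        *)             echo unknown ;;
    esac
}

host_arch=$(docker_arch "$(uname -m)")
if [ "$host_arch" != amd64 ]; then
    echo "host is $host_arch; an amd64 image may not run natively here" >&2
fi
```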

Within the Singularity image, MitoZ is installed at /app/anaconda/bin/mitoz and /app/anaconda/lib/python3.9/site-packages/mitoz. MitoZ's annotation database is at /app/anaconda/lib/python3.9/site-packages/mitoz/profiles.

3.3 Usage 1

$ /path/to/MitoZ_v3.4.sif -h
# For example, to use the `all` subcommand:
$ /path/to/MitoZ_v3.4.sif all -h

# but for MitoZ 3.6, use this:
$ /path/to/MitoZ_v3.6.sif mitoz -h
$ /path/to/MitoZ_v3.6.sif mitoz all -h
$ /path/to/MitoZ_v3.6.sif mitoz-tools -h

# or 
$ singularity run /path/to/MitoZ_v3.4.sif -h
$ singularity run /path/to/MitoZ_v3.4.sif all -h

# but for MitoZ 3.6, use this:
$ singularity run /path/to/MitoZ_v3.6.sif mitoz -h
$ singularity run /path/to/MitoZ_v3.6.sif mitoz all -h
$ singularity run /path/to/MitoZ_v3.6.sif mitoz-tools -h

However, if you want to use the mitoz-tools command, or if you pull the Singularity image from the docker hub, you need to do it this way:

$ singularity exec /path/to/MitoZ_v3.4.sif mitoz
$ singularity exec /path/to/MitoZ_v3.4.sif mitoz-tools

# To use the `all` command of MitoZ with the `exec` command, do this:
$ singularity exec /path/to/MitoZ_v3.4.sif mitoz all -h

# but for MitoZ 3.6, keep using the 'run' command:
$ singularity run /path/to/MitoZ_v3.6.sif mitoz all -h
$ singularity run /path/to/MitoZ_v3.6.sif mitoz-tools -h
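Because the invocation differs between image versions, a tiny wrapper can keep scripts readable. The sketch below only encodes the two cases shown above (exec for 3.4, run for 3.6). The function name mitoz_sif_prefix is hypothetical, and other versions are rejected rather than guessed.

```shell
# Hypothetical helper: print the command prefix for a given MitoZ image
# version, following the usage above: 3.4 via `exec`, 3.6 via `run`.
mitoz_sif_prefix() {
    _version=$1
    _sif=$2
    case "$_version" in
        3.4) echo "singularity exec $_sif mitoz" ;;
        3.6) echo "singularity run $_sif mitoz" ;;
        *)   echo "unsupported version: $_version" >&2; return 1 ;;
    esac
}

# Print the prefix for a 3.6 image, then append e.g. `all -h` when running it.
mitoz_sif_prefix 3.6 /path/to/MitoZ_v3.6.sif
```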

As with Docker, directories must be mounted explicitly to be visible inside a Singularity container. Singularity uses the --bind option for this, instead of the -v option used by Docker.

By default, Singularity automatically mounts the $PWD and $HOME directories into the Singularity container.

Multiple --bind options can be used at the same time. For example, if your fastq files are in /pool/data/ and you are NOT currently in that directory, you can make them accessible within the container like this:

$ singularity exec --bind /pool/data/ /path/to/MitoZ_v3.4.sif mitoz all -h

You can also 'shell' into the container, as shown by Usage 2 below.

Warning: You will run into errors if your fastq files (--fq1, --fq2) are soft links pointing to other directories and you do not explicitly bind those directories to the container, because neither Docker nor Singularity can access such files. The best way to solve the problem is like this:

$ singularity exec --bind /pool/data/ /path/to/MitoZ_v3.4.sif mitoz all --fq1 /pool/data/sample.R1.fq.gz --fq2 /pool/data/sample.R2.fq.gz 
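When the fastq files are soft links, the directories to bind are those of the link targets, not of the links. One way to work them out is sketched below. The helper name singularity_bind_args is hypothetical, it relies on readlink -f (as provided by GNU coreutils on Linux), and it assumes paths without spaces.

```shell
# Hypothetical helper: resolve each input file to its real location and print
# one `--bind DIR` per distinct directory that holds a real file.
# Assumptions: `readlink -f` is available and paths contain no spaces.
singularity_bind_args() {
    _args=""
    for _f in "$@"; do
        _dir=$(dirname "$(readlink -f "$_f")")
        case " $_args " in
            *" --bind $_dir "*) ;;                  # directory already listed
            *) _args="$_args --bind $_dir" ;;
        esac
    done
    echo "${_args# }"
}

# Usage (not executed here):
# singularity exec $(singularity_bind_args sample.R1.fq.gz sample.R2.fq.gz) \
#     /path/to/MitoZ_v3.4.sif mitoz all --fq1 sample.R1.fq.gz --fq2 sample.R2.fq.gz
```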

3.4 Usage 2

$ mkdir -p /my/workdir/projectID
$ cd /my/workdir/projectID

# The below command assumes your fastq files are located under the `/my/workdir/projectID` directory,
# so within Singularity's shell, you can access these fastq files directly.
$ singularity shell /path/to/MitoZ_v3.4.sif
# After logging into the container, it is just like being on another Linux machine,
# so you can use the `mitoz` command directly:
Singularity> which mitoz
/app/anaconda/bin/mitoz
Singularity> mitoz -h
Singularity> mitoz-tools -h
#
# After you finish the analysis, use the `exit` command to exit the container:
Singularity> exit

# However, if your fastq files are located elsewhere, say `/pool/data/`,
# you need to bind that path into the container with the `--bind` option so that MitoZ can access them:
$ cd /my/workdir/projectID
$ singularity shell --bind /pool/data/  /path/to/MitoZ_v3.4.sif

3.5 Copy MitoZ's database out of the Singularity container

Do this only if you want to customize your PCG annotation database, see https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ%27s-database for more details.

$ mkdir ~/mitoz_custom_db/
$ singularity shell /path/to/MitoZ_v3.4.sif
Singularity> cp -r /app/anaconda/lib/python3.9/site-packages/mitoz/profiles ~/mitoz_custom_db
Singularity> exit
# Now the `profiles` are under the `~/mitoz_custom_db` of your host machine and you can modify them to create your custom database. 

4. Conda-Pack

Installing MitoZ via Conda often runs into missing Perl module problems. If you can use neither the Singularity images nor the Docker images, you can try this Conda-Pack version. This method is also useful if your server cannot access the Internet (but in that case you still need to install the Etetoolkit taxonomy database yourself).

Here I packaged the whole conda environment (including all files) into a file named mitoz3.6.tar.gz using the Conda-Pack tool (https://conda.github.io/conda-pack/).

I created this environment on a Linux machine, thus it should also work on another Linux machine.

First, download the mitoz3.6.tar.gz file from Dropbox (https://www.dropbox.com/sh/x0xn8of73fub1p7/AAA9RCZe9k-rN2WstUn5cUKia?dl=0), or from Baidu Netdisk (open https://pan.baidu.com/s/1uNLIF1SNrkBJp9EoCMoTDQ?pwd=cqz6 and find version 3.6).

Next,

# Choose a directory on your machine for the installation of MitoZ, e.g. '~/soft/mitoz3.6'
$ mkdir -p ~/soft/mitoz3.6

# then unpack mitoz3.6.tar.gz into this target directory 
$ tar -xzf /path/to/downloaded/mitoz3.6.tar.gz -C ~/soft/mitoz3.6

# Activate the environment
$ source ~/soft/mitoz3.6/bin/activate

# Cleanup prefixes from the active environment.
# Note that this command can also be run without activating the environment
# as long as some version of Python is already installed on the machine.
(mitoz3.6) $ conda-unpack

# At this point the environment is exactly as if you installed it here
# using conda directly. All scripts should work fine.
(mitoz3.6) $ mitoz -h
(mitoz3.6) $ mitoz-tools -h

# Deactivate the environment to remove it from your path when you finish the MitoZ analysis
(mitoz3.6) $ source ~/soft/mitoz3.6/bin/deactivate

Please refer to https://conda.github.io/conda-pack/ for more details.

Now you can go to install the Etetoolkit database

5. Conda

5.1 Installation of Conda and Mamba

Firstly, install Miniconda (https://docs.conda.io/en/latest/miniconda.html) (recommended) or Anaconda (https://www.anaconda.com/products/distribution#Downloads) :

$ curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ sh Miniconda3-latest-Linux-x86_64.sh
# setup channels
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge

$ conda install mamba -n base -c conda-forge  # "mamba" is much much faster than the "conda" command!

5.2 Installation of MitoZ

The Conda version of MitoZ is currently fully functional only on Linux.

$ mamba clean -y -a # in case something unknown interferes with our installation
$ conda clean -y -a # in case something unknown interferes with our installation

# It is a good idea to install MitoZ into an independent environment, i.e. 'mitozEnv' here!
$ mamba create -n mitozEnv -c bioconda -c conda-forge mitoz=3.6 # It's recommended to specify the version you want to install!

# Tips: If the above command failed, try this instead:
$ mamba create -n mitozEnv -c bioconda -c conda-forge python=3.8 mitoz=3.6 # It's recommended to specify the version you want to install!


# Note: 
# 1. You can use any other name instead of 'mitozEnv' as the environment name, e.g. 'mitoz3.6', 
#    so you can do 'mamba create -n mitoz3.6 -c bioconda -c conda-forge mitoz=3.6'.
#    Personally, I prefer this way, so you can directly see which version of MitoZ you are using by the environment name.
#    But for the convenience of this tutorial, I will keep using the name 'mitozEnv'.  
#
# 2. You can also install MitoZ to a specific path, 
#    like 'mamba create -p /share/pool/guanliang/soft/mitoz3.6 -c bioconda mitoz=3.6',
#    and then use 'source activate /share/pool/guanliang/soft/mitoz3.6' to activate the environment. 


$ source activate mitozEnv   # you can also use "mamba" or "conda" in place of "source" in this command.

$ circos --modules # check if all Perl modules required by circos are installed. Some modules could still be missing (don't know why conda did not fix them automatically). Similar problems can be seen at https://github.com/bioconda/bioconda-recipes/issues/9830

# Now we are ready to go:
$ mitoz # all subcommands are within this command now!

$ mitoz-tools # some useful tools for mitochondrial genome analysis

Now you can go to install the Etetoolkit database

5.3 Location of the installation

If you want to find the path where MitoZ is installed, execute:

$ conda env list
# conda environments:
#
base                  *  /home/guanliang/soft/miniconda3
mitozEnv                 /home/guanliang/soft/miniconda3/envs/mitozEnv

The exact path for me is: /home/guanliang/soft/miniconda3/envs/mitozEnv/lib/python3.7/site-packages/mitoz. For example, this is the path for MitoZ's database:

$ ll /home/guanliang/soft/miniconda3/envs/mitozEnv/lib/python3.7/site-packages/mitoz/profiles
total 16K
-rw-rw-r-- 2 guanliang    0 May 12 06:47 __init__.py
drwxrwxr-x 2 guanliang 4.0K May 24 16:06 CDS_HMM
drwxrwxr-x 2 guanliang 4.0K May 24 16:06 rRNA_CM
drwxrwxr-x 2 guanliang 4.0K May 24 16:06 __pycache__
drwxrwxr-x 2 guanliang 4.0K May 24 17:36 MT_database

See also Extending MitoZ's database.

5.4 Problems

Make sure that you own the conda/mamba installation you are using. I ran into a lot of trouble when I used another user's conda command. In this case, you can follow the instructions in section 5.1 and install your own Miniconda/Anaconda.

5.4.1 mamba installation

$ conda install mamba -n base -c conda-forge
Collecting package metadata (current_repodata.json): done
Solving environment: / 
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - defaults/linux-64::python-language-server==0.34.1=py38_0
  - defaults/noarch::python-jsonrpc-server==0.3.4=py_1                                                  
\ failed with initial frozen solve. Retrying with flexible solve.

Possible solutions:

$ mamba clean -y -a # in case something unknown interferes with our installation
$ conda clean -y -a # in case something unknown interferes with our installation

and try again.

Or, install the mamba into a separate environment:

$ conda create -n mambaEnv -c conda-forge mamba
# and then use the `mamba` command within this env:
$ conda activate mambaEnv

Finally, you can try to install a new Miniconda (https://docs.conda.io/en/latest/miniconda.html) (recommended) or Anaconda (https://www.anaconda.com/products/distribution#Downloads) at a totally different place.

You can use Google to find the solution that works for you.

Or, you can simply keep using the conda command (just replace mamba with conda) to install MitoZ, although this might take extra time.

5.4.2 Circos Missing Perl Modules

After running the mamba create -n mitozEnv -c bioconda mitoz command, you should check whether any Perl modules required by Circos are missing. Sometimes they are, and I do not know the exact reason.

$ source activate mitozEnv
$ circos --modules # check if all Perl modules required by circos are installed. Some modules could still be missing (don't know why conda did not fix them automatically). Similar problems can be seen at https://github.com/bioconda/bioconda-recipes/issues/9830

# For me, the Perl modules "GD" and "GD::Polyline" were missing (although conda said they have been installed already when I ran 'conda install perl-gd'), I fixed them by running the following three commands:
$ mamba install -c conda-forge pkg-config
$ mamba install -c anaconda gcc_linux-64
$ cpanm GD
# I will try to fix the Circos problem in Bioconda's MitoZ recipe file, but for the moment, please use the above solution, or try the "mitozEnv.yaml" solution below.
# You can use the "cpanm" command to install other missing Perl modules if necessary. 
  • A user proposed this solution: https://github.com/linzhi2013/MitoZ/issues/152 for the missing GD module problem. You can test this solution and see if it works, and then leave some comments at https://github.com/linzhi2013/MitoZ/issues/152, so the other users and I can know if this is a universal solution. Thanks a lot for helping to improve the software!

  • Another user reported that after she installed a new Conda, the problem got solved without changing the Perl version (i.e. using 5.26).

Finally, if the above methods don't work for you, don't waste your time on them. Use the Singularity images or Docker images instead.

6. Source codes

6.1 Installation of Conda and Mamba

Firstly, install Miniconda (https://docs.conda.io/en/latest/miniconda.html) (recommended) or Anaconda (https://www.anaconda.com/products/distribution#Downloads) :

$ curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ sh Miniconda3-latest-Linux-x86_64.sh
# setup channels
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge

$ conda install mamba -n base -c conda-forge  # "mamba" is much much faster than the "conda" command!

6.2 Installation of dependencies

The first way:

$ mamba clean -y -a # in case something unknown interferes with our installation
$ conda clean -y -a # in case something unknown interferes with our installation
$ mamba env create -n mitozEnv -f https://github.com/linzhi2013/MitoZ/releases/download/3.6/mitozEnv.yml
$ conda activate mitozEnv

# Note: 
# 1. You can use any other name instead of 'mitozEnv' as the environment name, e.g. 'mitoz3.6', 
#    so you can do 'mamba env create -n mitoz3.6 -f https://github.com/linzhi2013/MitoZ/releases/download/3.6/mitozEnv.yml'
#
# 2. You can also install MitoZ to a specific path, 
#    like 'mamba env create -p /share/pool/guanliang/soft/mitoz3.6 -f https://github.com/linzhi2013/MitoZ/releases/download/3.6/mitozEnv.yml',
#    and then use 'source activate /share/pool/guanliang/soft/mitoz3.6' to activate the environment. 

Then go to section 6.3 below.

The second way:

$ mamba env create -f https://github.com/linzhi2013/MitoZ/releases/download/3.6/mitoz3.6.environment.yml
# which will create an environment named 'mitoz3.6' in your system, and MitoZ has also been installed in it!

# Tips:
# If the above command failed, try
$ conda config --set channel_priority flexible
# and then
$ mamba env create -f https://github.com/linzhi2013/MitoZ/releases/download/3.6/mitoz3.6.environment.yml

# To activate the environment,
$ conda activate mitoz3.6

# If 'conda activate mitoz3.6' does not work for you, you can 
$ conda env list
# to list the path of the environment, for example, mine is '/home/gmeng/.conda/envs/mybase/envs/mitoz3.6'
# and then I will do 
$ source activate /home/gmeng/.conda/envs/mybase/envs/mitoz3.6
# to activate the environment.

If you are using the second way, you can skip section 6.3 below.

6.3 Installation of MitoZ

# Next, please download the newest version of MitoZ source code from https://github.com/linzhi2013/MitoZ/releases/
$ pip install ./mitoz-3.6.tar.gz 
# or 
$ tar -zxvf mitoz-3.6.tar.gz
$ cd mitoz-3.6
$ python3 setup.py install

# Finally, check
$ circos --modules # check if all Perl modules required by circos are installed. Some modules could still be missing (don't know why conda did not fix them automatically). Similar problems can be seen at https://github.com/bioconda/bioconda-recipes/issues/9830

Now you can go to install the Etetoolkit database

6.4 Location of the installation

If you want to find the path where MitoZ is installed, execute:

$ conda env list
# conda environments:
#
base                  *  /home/guanliang/soft/miniconda3
mitozEnv                 /home/guanliang/soft/miniconda3/envs/mitozEnv

The exact path for me is: /home/guanliang/soft/miniconda3/envs/mitozEnv/lib/python3.7/site-packages/mitoz.

The newest version may not always be available on Bioconda, because it takes time for the Bioconda team to incorporate a new version of the software into the channel. In addition, the Bioconda website often does not show the latest available versions or builds. In this case, you may want to check https://anaconda.org/bioconda/mitoz/files or use the second way described in section 6.2 for installation.

7. The Etetoolkit database

  • After the installation of MitoZ, you still need to install the etetoolkit (NCBI Taxonomy) database, especially when the automatic installation does not work for you.

  • Warning: it is reported that a broken etetoolkit (NCBI Taxonomy) database would result in some PCGs not annotated (https://github.com/linzhi2013/MitoZ/issues/89), or MitoZ getting "Error" during the run (e.g. during the findmitoscaf step). Thus, please make sure this database works well before running MitoZ.

  • It is recommended to run the test dataset before applying MitoZ to your own samples, just to make sure your installation is okay. See 8. Running the test dataset.

  • Make sure your HOME directory has more than 700 MB of space available. Otherwise, you may get an error like sqlite3.OperationalError: disk I/O error. To solve the problem, do this first:

    1. Create a directory somewhere else that has enough space left:
    $ mkdir /other/place/myetetoolkit
    2. Remove the directory ~/.etetoolkit created by ete3 before (if any):
    $ rm -rf ~/.etetoolkit
    3. Link your new directory to the HOME directory:
    $ ln -s /other/place/myetetoolkit ~/.etetoolkit
    4. Follow the instructions below.
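Whether the HOME directory has enough room can be checked up front. The sketch below uses the 700 MB figure mentioned above. The helper name has_free_mb is hypothetical, and it relies on the POSIX output format of df -Pk (available space in the fourth column).

```shell
# Hypothetical helper: succeed if the filesystem holding directory $1 has at
# least $2 MB available, according to `df -Pk` (sizes in 1024-byte blocks).
has_free_mb() {
    _avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$_avail_kb" -ge $(( $2 * 1024 )) ]
}

if ! has_free_mb "$HOME" 700; then
    echo "Less than 700 MB free in $HOME; relocate ~/.etetoolkit first" >&2
fi
```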

7.1 Installation of the etetoolkit database

Unless you installed MitoZ via the Docker method, you still need to install the etetoolkit database.

First, try:

$ conda activate mitozEnv

$ python3
Python 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ete3 import NCBITaxa
>>> ncbi = NCBITaxa()
>>> exit()

If you are using the Singularity image, you need to shell into the container first:

$ singularity shell /path/to/MitoZ_v3.4.sif
Singularity> python3
Python 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ete3 import NCBITaxa
>>> ncbi = NCBITaxa()
>>> exit()

Singularity> exit

Now verify the database:

$ conda activate mitozEnv
# or shell into the singularity container:
# $ singularity shell /path/to/MitoZ_v3.4.sif

$ python3
>>> from ete3 import NCBITaxa
>>> a = NCBITaxa()
>>> a.get_name_translator(["Arthropoda"])
{'Arthropoda': [6656]}

If the above works for you, you are finished and can go to 8. Running the test dataset. Otherwise, please read the instructions below.
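Before starting Python, a cheap file-level check can tell you whether a database file exists at all. This is a rough sketch and does not replace the query above: the helper name etetoolkit_db_present is hypothetical, and the 100 MB default threshold is an assumption chosen only because a complete taxa.sqlite is several hundred MB.

```shell
# Hypothetical helper: succeed if $1/taxa.sqlite exists and is larger than
# $2 MB (default 100 MB, an assumed lower bound for a complete database).
etetoolkit_db_present() {
    _db="$1/taxa.sqlite"
    _min_mb=${2:-100}
    [ -f "$_db" ] || return 1
    _size_bytes=$(( $(wc -c < "$_db") ))
    [ "$_size_bytes" -gt $(( _min_mb * 1024 * 1024 )) ]
}

# Example:
# etetoolkit_db_present "$HOME/.etetoolkit" || echo "database missing or too small" >&2
```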

If you have trouble downloading and installing the Etetoolkit database, you can download the taxdump.tar.gz file or my pre-built database from https://www.dropbox.com/sh/mqjqn656x41q2wb/AAD02t_kUCjNHbBgCeYpEM88a?dl=0 or from https://pan.baidu.com/s/1YIULJ9H3BeWKcIZdMZpcuw?pwd=7r9d (extraction code: 7r9d).

Then execute:

$ conda activate mitozEnv

$ python3
Python 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ete3 import NCBITaxa
>>> ncbi = NCBITaxa(taxdump_file='/path/to/downloaded/taxdump.tar.gz')
Loading node names...
2424313 names loaded.
277227 synonyms loaded.
Loading nodes...
2424313 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /Users/gmeng/.etetoolkit/taxa.sqlite ...
 2424000 generating entries...
Uploading to /Users/gmeng/.etetoolkit/taxa.sqlite

Inserting synonyms:      275000
Inserting taxid merges:  65000
Inserting taxids:       2420000
>>> exit()

$ ls -lhrt ~/.etetoolkit/
total 1171272
-rw-r--r--  1 gmeng  staff    12M Jun  2 11:35 taxa.sqlite.traverse.pkl
-rw-r--r--  1 gmeng  staff   558M Jun  2 11:36 taxa.sqlite
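The ls listing above can be complemented by a check that taxa.sqlite is actually a readable SQLite file. The following is a sketch under assumptions: describe_taxa_db is a hypothetical helper, it only uses the Python standard library, and it does not assume any particular table layout; it simply lists whatever tables it finds.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def describe_taxa_db(db_path):
    """Report whether db_path exists, its size in MB, and its SQLite tables."""
    path = Path(db_path).expanduser()
    info = {"exists": path.is_file(), "size_mb": 0.0, "tables": []}
    if not info["exists"]:
        return info
    info["size_mb"] = path.stat().st_size / (1024 * 1024)
    # Open read-only so that the check can never modify the database.
    uri = path.resolve().as_uri() + "?mode=ro"
    try:
        with closing(sqlite3.connect(uri, uri=True)) as conn:
            rows = conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            ).fetchall()
        info["tables"] = [name for (name,) in rows]
    except sqlite3.DatabaseError:
        pass  # not a valid SQLite file, e.g. after an interrupted build
    return info


if __name__ == "__main__":
    # Default location shown in the output above.
    print(describe_taxa_db("~/.etetoolkit/taxa.sqlite"))
```

An existing file with an empty table list suggests that the build was interrupted and should be repeated.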

However, you may get an error like this:

>>> from ete3 import NCBITaxa
>>> ncbi = NCBITaxa(taxdump_file='taxdump.tar.gz')
Loading node names...
2424313 names loaded.
277265 synonyms loaded.
Loading nodes...
2424313 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /home/guanliang/.etetoolkit/taxa.sqlite ...
 2424000 generating entries...
Uploading to /home/guanliang/.etetoolkit/taxa.sqlite

Inserting synonyms:      75000 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/app/anaconda/lib/python3.9/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 106, in __init__
    self.update_taxonomy_database(taxdump_file)
  File "/app/anaconda/lib/python3.9/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 131, in update_taxonomy_database
    update_db(self.dbfile, taxdump_file)
  File "/app/anaconda/lib/python3.9/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 760, in update_db
    upload_data(dbfile)
  File "/app/anaconda/lib/python3.9/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 802, in upload_data
    db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
sqlite3.IntegrityError: UNIQUE constraint failed: synonym.spname, synonym.taxid

This error is caused by a bug in the ETE 3.1.1 package. You ran into it because you installed MitoZ (<= 3.3) via the conda or mamba commands: in the early builds on BioConda I wrongly specified ete3=3.1.1, whereas it should have been ete3>=3.1.2.
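To confirm which ete3 version an environment carries, a short check along the following lines may help. It is a sketch, not part of MitoZ: the function names are hypothetical, the version parser is deliberately simple, and the 3.1.2 threshold is taken from the note above.

```python
from importlib import metadata


def parse_version(text):
    """Turn '3.1.2' into (3, 1, 2); anything after the leading digits is ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def ete3_needs_upgrade(version_text):
    """True if the version is older than 3.1.2."""
    return parse_version(version_text) < (3, 1, 2)


def installed_ete3_version():
    """Return the installed ete3 version string, or None if ete3 is absent."""
    try:
        return metadata.version("ete3")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_ete3_version()
    if version is None:
        print("ete3 is not installed in this environment")
    elif ete3_needs_upgrade(version):
        print(f"ete3 {version}: upgrade suggested")
    else:
        print(f"ete3 {version}: OK")
```

Run it with the Python interpreter of the environment you want to inspect, for example after conda activate mitozEnv.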

If you hit this error, you can download my pre-built version of the etetoolkit database (filename: etetoolkit.tgz) from https://www.dropbox.com/sh/mqjqn656x41q2wb/AAD02t_kUCjNHbBgCeYpEM88a?dl=0 or from https://pan.baidu.com/s/1YIULJ9H3BeWKcIZdMZpcuw?pwd=7r9d (extraction code: 7r9d), and then unpack it into your home directory:

$ mv /path/to/etetoolkit.tgz ~
$ cd ~
$ rm -rf ~/.etetoolkit
$ tar -zxvf etetoolkit.tgz

Alternatively, you can upgrade MitoZ or ete3 in one of the following ways:

  • via the mamba env create -n mitozEnv -f mitozEnv.yaml method (see the beginning of this page)
  • via the mamba create -n mitozEnv -c bioconda mitoz=3.4 command (see the beginning of this page)
  • leave the mitozEnv environment (conda deactivate), then install ete3 in the 'base' environment via mamba install -c conda-forge "ete3>=3.1.2" (the quotes stop the shell from treating > as an output redirection). Then use the Python and ete3 from this 'base' environment to create the etetoolkit database by following the steps at the beginning of this section.

7.2 Upgrading the local database

To upgrade the etetoolkit database, follow the ETE tutorial (http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html#upgrading-the-local-database):

$ python3
Python 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ete3 import NCBITaxa
>>> ncbi = NCBITaxa()
>>> ncbi.update_taxonomy_database()
>>> exit()

Alternatively, download the latest file from https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz and rebuild the database from it:

$ wget -c https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz

$ python3
Python 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ete3 import NCBITaxa
>>> ncbi = NCBITaxa(taxdump_file='/path/to/downloaded/taxdump.tar.gz')
>>> exit()
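This page does not say how often the database should be upgraded. As one possible aid for deciding, the age of the local file can be checked before running either procedure above. The helper name database_age_days is hypothetical, and the path is the default location shown in the earlier output.

```python
import time
from pathlib import Path


def database_age_days(db_path, now=None):
    """Days since db_path was last modified, or None if it does not exist."""
    path = Path(db_path).expanduser()
    if not path.is_file():
        return None
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) / 86400


if __name__ == "__main__":
    age = database_age_days("~/.etetoolkit/taxa.sqlite")
    if age is None:
        print("No local database found")
    else:
        print(f"Local database is {age:.0f} days old")
```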

8. About the thread number and data size

  • There is a bug in the --data_size_for_mt_assembly option: --data_size_for_mt_assembly 5 actually extracts 50 Gb of data. So if you want only 5 Gb of data, use --data_size_for_mt_assembly 0.5 instead!

  • For MitoAssemble (--assembler mitoassemble), 8 to 16 threads and 2 to 8 Gbp of fastq data are sufficient, for example --thread_number 8 or --thread_number 12. A larger thread number can require a lot of RAM (e.g. 150 GB) for the assembly step.

    • More data does not necessarily mean a better mitogenome.
    • More threads do not necessarily mean a faster run.
  • For Megahit (--assembler megahit),

    • In a test with 5 Gbp of data, 4 threads and --memory 20, Megahit actually used up to 32 GB of RAM.

    • In a test with 15 Gbp of fastq data, 16 threads and --memory 50, it used around 50 GB of RAM, so you can use more data and threads with Megahit.

    • While memory usage with more data (e.g. 15 Gbp) does not seem to be a big problem for Megahit, using more data does take more time, so it is recommended to use less data to save time.

    • You can also increase --memory to save time if your servers have enough RAM.

  • For Spades (--assembler spades), I did not record the RAM usage; it may be similar to Megahit.
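The factor-of-ten behaviour of --data_size_for_mt_assembly noted in the first bullet can be captured in a tiny conversion helper. This is only an illustration: data_size_option_value is a hypothetical name, and the code assumes that the behaviour described above (a value of 5 extracting 50 Gb) applies to the MitoZ version you are running.

```python
def data_size_option_value(desired_gb):
    """Value to pass so that roughly `desired_gb` Gb of data are extracted.

    Assumes the option extracts ten times the number it is given,
    so the desired size is divided by 10.
    """
    if desired_gb <= 0:
        raise ValueError("desired_gb must be positive")
    return f"{desired_gb / 10:g}"


for gb in (2, 5, 8):
    print(f"{gb} Gb -> --data_size_for_mt_assembly {data_size_option_value(gb)}")
```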

How to check how much RAM MitoZ uses?

You can use top or htop (recommended; https://anaconda.org/conda-forge/htop) to check how many resources MitoZ uses if you are running it on your own server, or the qstat command if you are using an SGE cluster.
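Besides watching top or htop, the peak memory of a finished run can be read back from the operating system. The sketch below is one way to do this on Linux or macOS with the Python standard library. run_and_report_peak_rss is a hypothetical helper, not a MitoZ feature.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run cmd (a list of arguments), wait for it, and return peak RSS in MB.

    The value comes from RUSAGE_CHILDREN, which reports the largest single
    child or descendant process that has been waited for so far, not the
    sum over all processes. Calling this repeatedly in one session therefore
    returns the largest value seen so far.
    """
    subprocess.run(cmd, check=True)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss / divisor


if __name__ == "__main__":
    # Harmless demonstration command; replace it with your own argument list.
    peak = run_and_report_peak_rss([sys.executable, "-c", "print('done')"])
    print(f"Peak RSS of the largest child process: {peak:.1f} MB")
```

Because only the largest single process is reported, the figure is a lower bound on the total memory used when several processes run at the same time.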

9. Running the test dataset

Before applying MitoZ to your own samples, it is important to run MitoZ on the test dataset.

$ mkdir ~/test
$ cd ~/test

$ wget -c https://raw.githubusercontent.com/linzhi2013/MitoZ/master/test/test.R1.fq.gz 
$ wget -c https://raw.githubusercontent.com/linzhi2013/MitoZ/master/test/test.R2.fq.gz

$ conda activate mitozEnv
$ mitoz all  \
--outprefix test \
--thread_number 4 \
--clade Chordata \
--genetic_code 2 \
--species_name "Homo sapiens" \
--fq1 test.R1.fq.gz \
--fq2 test.R2.fq.gz \
--fastq_read_length 151 \
--data_size_for_mt_assembly 3,0 \
--assembler megahit \
--kmers_megahit 71 99 \
--memory 50 \
--requiring_taxa Chordata

The above command takes around 12 minutes and 1.1 GB RAM to finish.
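If the test run fails early, one thing worth ruling out is a truncated or mismatched download. The following sketch counts the reads in each gzipped FASTQ file and checks that both files of a pair agree. The function names are hypothetical, and the code assumes the usual four-lines-per-record FASTQ layout.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a gzipped FASTQ file, assuming four lines per record.

    A truncated or non-gzip file makes gzip raise an error while reading.
    """
    lines = 0
    with gzip.open(path, "rb") as handle:
        for _ in handle:
            lines += 1
    if lines % 4 != 0:
        raise ValueError(f"{path}: {lines} lines is not a multiple of 4")
    return lines // 4


def check_read_pair(fq1, fq2):
    """Return the shared read count, or raise if the two files disagree."""
    n1, n2 = count_fastq_reads(fq1), count_fastq_reads(fq2)
    if n1 != n2:
        raise ValueError(f"read counts differ: {fq1}={n1}, {fq2}={n2}")
    return n1


# Example, run from the ~/test directory created above:
#   print(check_read_pair("test.R1.fq.gz", "test.R2.fq.gz"))
```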

You can then analyze your samples by following the Tutorial.
