TOGA troubleshooting

TOGA's configure script automatically tries to install all dependencies. If you encounter error messages related to either of the following two dependencies, see the corresponding section below for help.

  1. XGBoost
  2. Nextflow
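Before digging into either section, it can help to confirm which of the two dependencies is actually missing. The following is a minimal sketch. It assumes XGBoost is importable as the Python module xgboost and that Nextflow is invoked as an executable named nextflow on $PATH. The helper name check_dependencies is hypothetical and not part of TOGA.

```python
import importlib.util
import shutil


def check_dependencies() -> dict:
    """Report whether TOGA's two troublesome dependencies can be found.

    Hypothetical diagnostic helper; it is not part of TOGA itself.
    """
    return {
        # XGBoost is a Python package: check that the module can be located.
        "xgboost": importlib.util.find_spec("xgboost") is not None,
        # Nextflow is a command-line tool: check that it is on $PATH.
        "nextflow": shutil.which("nextflow") is not None,
    }


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```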

Note that earlier TOGA versions used BerkeleyDB; it has since been replaced by HDF5.

XGBoost

Sometimes installing XGBoost with pip fails with a message like:

Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/genome/scratch/tmp/pip-install-g6qbjl5j/xgboost/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /genome/scratch/tmp/pip-record-4dhjvr_9/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /genome/scratch/tmp/pip-install-g6qbjl5j/xgboost/

One solution is to compile XGBoost from source, as explained here:

https://xgboost.readthedocs.io/en/latest/build.html

Please note that building XGBoost requires CMake >= 3.13.
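Before compiling, you can check whether the cmake on your $PATH meets that minimum. The following is a minimal sketch. It assumes cmake prints a first line of the form "cmake version X.Y.Z", and it takes the 3.13 threshold from the note above. The function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_CMAKE = (3, 13)  # minimum CMake version stated above for building XGBoost


def parse_cmake_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def cmake_is_recent_enough() -> bool:
    """Return True if the cmake found on $PATH reports at least MIN_CMAKE."""
    if shutil.which("cmake") is None:
        return False  # no cmake installed at all
    output = subprocess.run(
        ["cmake", "--version"], capture_output=True, text=True
    ).stdout
    version = parse_cmake_version(output)
    # Compare only (major, minor) against the minimum.
    return version is not None and version[:2] >= MIN_CMAKE
```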

Nextflow

There are several Nextflow-related issues you might encounter. Installation problems are most likely related to the Java version. The simplest solution is to install Nextflow using conda:

conda install -c bioconda nextflow

This automatically adds the nextflow executable to $PATH. Alternatively, you can install Java with the following command:

sudo apt install openjdk-8-jre-headless

Then install Nextflow with its curl | bash installer:

curl -s https://get.nextflow.io | bash
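Installation problems usually come down to the Java version, so it helps to know which version is actually installed. The supported Java versions depend on the Nextflow release, so check the Nextflow documentation for your release's exact requirement. The following is a minimal sketch that reads the major version from `java -version`. It assumes the usual output format, where Java 8 and older report "1.8.0_..." and Java 9 and newer report "17.0.2" or just "21". The function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(text: str) -> Optional[int]:
    """Return the Java major version from `java -version` output.

    Java 8 and earlier use the legacy "1.x" scheme (e.g. "1.8.0_292" -> 8);
    Java 9 and later put the major version first (e.g. "17.0.2" -> 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', text)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # legacy scheme: the second number is the major
    return int(first)


def installed_java_major() -> Optional[int]:
    """Run `java -version` (it prints to stderr) and parse the result."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)
```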

Nextflow might also show the following error message:

Can't open cache DB: /lustre/projects/project-xxx/.nextflow/cache/a80d212d-5a68-42b0-a8a5-d92665bdc492/db

Nextflow needs to be executed in a shared file system that supports file locks.
Alternatively you can run it in a local directory and specify the shared work
directory by using by `-w` command line option.

This means Nextflow cannot write its temporary files and logs to the current directory, because the file system there does not support file locks. You may be able to find a solution together with your system administrator. However, there can be valid reasons for disabling file locks. As a workaround, you can do the following:

  1. Find a directory outside the cluster file system that you can write to, such as /home/$username or /tmp/$username. Inside it, create a dedicated subdirectory, for example "nextflow_temp". Nextflow writes quite a lot of hidden files, so keeping them in their own directory is sensible.
  2. When you call toga.py, add the --nd flag followed by that directory, for example --nd /home/user/nextflow_temp

TOGA will then call Nextflow from the specified directory.
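To choose a directory for step 1, you can probe whether file locks actually work there. The following is a minimal sketch. It assumes that a successful POSIX record lock (Python's fcntl.lockf, which wraps fcntl()) is a reasonable proxy for the locking Nextflow needs. That is an approximation, not a guarantee. The helper names and the candidate directories are hypothetical.

```python
import fcntl
import os
import tempfile
from typing import Iterable, Optional


def supports_file_locks(directory: str) -> bool:
    """Probe whether POSIX (fcntl-style) locks work on files in `directory`."""
    try:
        fd, path = tempfile.mkstemp(dir=directory, prefix=".lockprobe_")
    except OSError:
        return False  # directory is not writable at all
    try:
        with os.fdopen(fd, "w") as handle:
            # Non-blocking exclusive lock: fails fast if locks are unsupported.
            fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(handle, fcntl.LOCK_UN)
        return True
    except OSError:
        return False
    finally:
        os.remove(path)  # always clean up the probe file


def prepare_nextflow_dir(
    candidates: Iterable[str], name: str = "nextflow_temp"
) -> Optional[str]:
    """Create `name` inside the first candidate directory with working locks."""
    for base in candidates:
        if os.path.isdir(base) and supports_file_locks(base):
            target = os.path.join(base, name)
            os.makedirs(target, exist_ok=True)
            return target
    return None  # no candidate passed the probe
```

For example, prepare_nextflow_dir(["/home/user", "/tmp/user"]) returns the path you would pass to --nd. It returns None if neither directory passes the probe.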

If this does not work and you would like to manage cluster jobs yourself, please have a look at the following functions in the toga.py script:

  1. __chain_genes_run: submits the cluster jobs that extract chain features.
  2. __run_cesar_jobs: submits the CESAR jobs.

To execute a batch of jobs in parallel, TOGA creates a temporary text file containing commands that can be executed independently. It looks like this:

./script.py input/part_1.txt output/part_1.txt
./script.py input/part_2.txt output/part_2.txt
./script.py input/part_3.txt output/part_3.txt
./script.py input/part_4.txt output/part_4.txt
...
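If you replace the cluster logic yourself, you can produce a jobs file in the same format with a few lines of Python. The following is a minimal sketch. It assumes the hypothetical ./script.py and the input/part_N.txt and output/part_N.txt layout from the example above. The function names are also hypothetical, and this is not TOGA's own code.

```python
import tempfile
from typing import List


def build_job_lines(n_parts: int, script: str = "./script.py") -> List[str]:
    """Build one self-contained command per input chunk (hypothetical layout)."""
    return [
        f"{script} input/part_{i}.txt output/part_{i}.txt"
        for i in range(1, n_parts + 1)
    ]


def write_jobs_file(lines: List[str]) -> str:
    """Write the commands, one per line, to a temporary text file; return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("\n".join(lines) + "\n")
        return handle.name
```

Because every line is independent, the order of lines carries no meaning. Any scheduler can pick them up in any order.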

TOGA then submits these jobs to the cluster queueing system and waits until they are done. Note that each line of this file is a complete, independent command, so the commands could even be executed sequentially like this:

bash jobslist.txt
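Because the lines are independent, you can also run them in parallel on a single machine without a cluster. The following is a minimal sketch. It runs each line through the shell, as bash jobslist.txt would, with a hypothetical default of 4 concurrent jobs. It only uses the local machine and is not how TOGA itself submits jobs. All function names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def read_jobs(jobs_path: str) -> List[str]:
    """Return the non-empty lines of a jobs file; each line is one command."""
    with open(jobs_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def run_job(command: str) -> Tuple[str, int]:
    """Run one command through the shell and return it with its exit code."""
    result = subprocess.run(command, shell=True)  # shell=True: lines are shell commands
    return command, result.returncode


def run_commands_parallel(
    commands: List[str], max_workers: int = 4
) -> List[Tuple[str, int]]:
    """Run commands with at most `max_workers` at a time; results keep input order."""
    # Threads are sufficient here: the real work happens in child processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, commands))


def run_jobs_file(jobs_path: str, max_workers: int = 4) -> List[Tuple[str, int]]:
    """Convenience wrapper: read a jobs file and run all of its commands."""
    return run_commands_parallel(read_jobs(jobs_path), max_workers)
```

For example, run_jobs_file("jobslist.txt", max_workers=8) returns (command, exit code) pairs. Any pair with a non-zero exit code marks a job that needs to be rerun.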