Installation

Adam English edited this page Jun 7, 2022 · 2 revisions

BioGraph software is designed to run on modern Linux systems. For a full list of requirements, see System Requirements.

The BioGraph program and classifier model are distributed in separate files.

  • BioGraph-7.0.0.tgz: the BioGraph program and support libraries, available via GitHub.
  • biograph_model-7.0.0.ml: the variant classifier model, available for download via HTTPS or from S3 at s3://spiral-archive/models/biograph_model-7.1.0.ml

The BioGraph program is required for all installations. The classifier model is approximately 7 GB and is required for the qual_classifier step.
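
Because the model file is large, it can help to confirm that the destination directory has enough free space before downloading it. The following is a minimal sketch and not part of BioGraph: the has_free_gb helper is a hypothetical name, and the 7 GB threshold is simply the approximate model size given above.

```shell
# Hypothetical helper (not part of BioGraph): succeed if directory $1
# has at least $2 GB available. df -Pk reports 1024-byte blocks in
# POSIX format; column 4 of the second line is the available space.
has_free_gb() {
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

if has_free_gb . 7; then
    echo "Enough free space for the classifier model."
else
    echo "Less than 7 GB free in $(pwd)." >&2
fi
```

Run it from the directory where you plan to keep the model file.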

Docker support

If your system supports Docker, you can run BioGraph directly via Docker Hub:

$ docker run spiralgenetics/biograph

Installing BioGraph

This procedure works on most Internet-connected Linux systems. While BioGraph itself does not require Internet access, dependencies are downloaded automatically at install time. If you are installing on a cluster without direct Internet access, see Installing without Internet access.

BioGraph requires Python 3.6 or later and should be installed in a Python virtualenv or venv. You will need to activate this environment whenever you run biograph commands.
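
Before creating the environment, you may want to confirm that a suitable interpreter is available. This is a minimal sketch that assumes the interpreter is invoked as python3; the python_ok helper is a hypothetical name and not part of BioGraph.

```shell
# Hypothetical helper (not part of BioGraph): succeed if the interpreter
# named in $1 exists and reports Python 3.6 or later.
python_ok() {
    "$1" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)' 2>/dev/null
}

if python_ok python3; then
    echo "python3 is new enough: $(python3 --version 2>&1)"
else
    echo "python3 is missing or older than 3.6" >&2
fi
```

Pass a different interpreter name (for example, python3.6) if your system installs several Python versions side by side.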

Creating a virtualenv

If the virtualenv command is available on your system, create and activate a new environment:

$ virtualenv --python=python3.6 bg7
$ . bg7/bin/activate
(bg7)$ 

Creating a venv

If the virtualenv command is not available on your system, you can create an environment using Python's built-in venv module. After creating and activating the environment, upgrade pip, wheel, and setuptools:

$ python3.6 -m venv bg7
$ . bg7/bin/activate
(bg7)$ pip install --upgrade pip wheel setuptools
Collecting pip
...
(bg7)$ 
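
Both virtualenv and venv set the VIRTUAL_ENV variable when an environment is activated, so a quick check along these lines can confirm that the environment is active before you run pip or biograph. This is only a sketch: in_virtualenv is a hypothetical helper name, and bg7 is the example environment name used above.

```shell
# Hypothetical helper (not part of BioGraph): succeed if a virtualenv
# or venv is active in the current shell.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_virtualenv; then
    echo "Active environment: $VIRTUAL_ENV"
else
    echo "No environment active; run '. bg7/bin/activate' first." >&2
fi
```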

Install BioGraph

Finally, install the BioGraph tarball using pip in the active python environment.

(bg7)$ pip install BioGraph-7.0.0.tgz
Processing BioGraph-7.0.0.tgz
...
(bg7)$

This will install BioGraph and all required python libraries.

On some systems, a few python dependencies may require compilation. If you encounter installation issues, see Compiling Python Dependencies.

The biograph_model-7.0.0.ml file may be kept anywhere convenient. The path to this file will be provided as an option when running BioGraph commands.

Additional software

The full BioGraph pipeline also requires the open source tools vcftools, tabix, and bcftools. They may be installed in any directory in your PATH.

To install these packages on Ubuntu 18.04:

$ sudo apt install -y vcftools tabix bcftools
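
On other distributions, or when the tools were installed by hand, a short check like the following can confirm that all three commands are reachable through your PATH. This is a sketch under the assumption that the command names match the package names above; missing_tools is a hypothetical helper and not part of BioGraph.

```shell
# Hypothetical helper (not part of BioGraph): print each command from
# the argument list that cannot be found in PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools vcftools tabix bcftools)
if [ -z "$missing" ]; then
    echo "All additional tools found in PATH."
else
    echo "Not found in PATH:" $missing
fi
```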

Verify the installation

You should now be able to run the biograph command:

(bg7)$ biograph
usage: biograph [-h]

biograph v7.0.0 - the BioGraph genome processing pipeline

    Pipeline Commands:
        full_pipeline    Run the full BioGraph single-sample pipeline

        reference        Build a BioGraph reference from FASTA
        create           Convert reads to the BioGraph format
        discovery        Discover variants on a BioGraph vs. a reference
        coverage         Calculate coverage for VCF entries
        qual_classifier  Assign quality scores and filter variants
        vdb              Access the variant database (beta)

    Utility Commands:
        license          Check license status
        stats            Get basic QC stats from a BioGraph
        version          Print the BioGraph version and exit
        refhash          Identify the reference in a VCF, FASTA, SAM, or 
                         BioGraph refdir

For help on any command, use the --help option:

    $ biograph full_pipeline --help

For full documentation: https://www.spiralgenetics.com/user-documentation

optional arguments:
  -h, --help  show this help message and exit

Congratulations, you're ready to go.

If you encounter problems, check the test_*/log.txt file and contact Spiral support for assistance.

Additional installation steps

Some systems require additional steps for installation. See the sections below, or contact Spiral support for assistance.

Compiling Python dependencies

Several supporting Python packages will be installed in addition to BioGraph itself. While we take care to avoid the need to compile code on most systems, some dependencies (htslib, pysam, numpy, pandas, scipy) may require compilation.

If you see an error when running the pip install command, be sure that a working compiler is installed. You will also need the development libraries python3-dev, liblzma, libbz2-dev, and zlib. On some systems, liblapack-dev and libblas-dev may also be required. The following command will install these dependencies on Ubuntu systems:

(bg7)$ sudo apt install -y build-essential python3-dev liblzma-dev zlib1g-dev libbz2-dev liblapack-dev libblas-dev

Installing without Internet access

In some cluster computing environments, direct Internet access is restricted or prohibited from worker nodes. In this case, the BioGraph dependencies can be downloaded on a separate system that has Internet access (such as a laptop) and then installed manually.

Note that the download machine and the worker node must be running the same operating system (for example, Ubuntu 18.04) and have the same architecture (e.g., x86_64).
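
One way to compare the two machines is to print a short summary on each and check that the lines match. The sketch below assumes a Linux system that provides /etc/os-release and an interpreter invoked as python3; platform_summary is a hypothetical helper and not part of BioGraph.

```shell
# Hypothetical helper (not part of BioGraph): print the OS release,
# CPU architecture, and Python version on a single line.
platform_summary() {
    os="unknown"
    if [ -r /etc/os-release ]; then
        # Source the file in a subshell so its variables do not leak out.
        os=$(. /etc/os-release && echo "${NAME:-unknown} ${VERSION_ID:-}")
    fi
    if command -v python3 >/dev/null 2>&1; then
        py=$(python3 --version 2>&1)
    else
        py="not found"
    fi
    echo "os=$os arch=$(uname -m) python=$py"
}

platform_summary
```

Run it on both the download machine and a worker node. If the two lines differ, packages downloaded on one machine may not install on the other.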

To begin, create a python environment as described above on the machine with Internet access. Be sure to use the same version of Python for this virtualenv (for example, Python 3.6) as will be used on the cluster. Then activate the environment and run the following commands:

(bg7)$ mkdir install_me
(bg7)$ cd install_me
(bg7)$ pip download /path/to/BioGraph-7.0.0.tgz

This will download several package files (wheels and tarballs) to the current directory.

Next, transfer the install_me/ folder to the cluster. Log into the cluster, make a new python environment, and activate it as described above. Finally, cd into the install_me directory and install the packages with the following commands:

(bg7)$ cd install_me
(bg7)$ pip install *

BioGraph and all dependencies are now installed to your python environment.

Ready to go

With a verified working BioGraph installation, you are ready to run your first dataset.

If you have any issues or concerns about your BioGraph installation, don't hesitate to contact Spiral support.


Next: Quick Start