
Installation instructions

Milton Pividori edited this page Dec 6, 2019 · 36 revisions


ukbREST needs to be installed on only one computer or server. The clients connecting to it generally do not need to install anything; standard tools like curl are enough.

Since ukbREST works with PostgreSQL, the first step is to get it installed. There are several ways to do this, but the recommended one is to use Docker (explained below).

The second step to get ukbREST up and running is to decide if you want to use the ukbREST Docker image (recommended in most cases) or prefer to use a native Python environment.

The third step is running ukbREST, connecting it to PostgreSQL, and loading your UK Biobank (UKB) data into it.

Once you complete all these steps, you and other authorized users (if you enabled it) are ready to start making phenotype and genotype queries.

Computer/server specifications

The computer/server specifications needed to run ukbREST largely depend on your use cases. However, to carry out all pre-processing and setup steps (including the decryption of your UK Biobank application files), you need enough disk space (roughly 10TB if you plan to serve both the genotype and phenotype data), a decent amount of memory (at least 4GB, 8GB recommended) and preferably a multi-core CPU (at least 4 cores). More importantly, you need to configure PostgreSQL correctly (see below); you should not use its default configuration.
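As a quick sanity check before starting, the minimums above can be compared against the machine you plan to use. The following is only a sketch: the thresholds are copied from the paragraph above, the report helper is a hypothetical name, and the memory lookup assumes either Linux (/proc/meminfo) or macOS (sysctl hw.memsize).

```shell
#!/bin/sh
# Minimums taken from the paragraph above.
MIN_CORES=4
MIN_MEM_GB=4
MIN_DISK_GB=10240   # roughly 10TB

# Online CPU cores (getconf works on Linux and macOS).
cores=$(getconf _NPROCESSORS_ONLN)

# Total memory in kB: /proc/meminfo on Linux, sysctl on macOS.
if [ -r /proc/meminfo ]; then
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
else
  mem_kb=$(( $(sysctl -n hw.memsize) / 1024 ))
fi
# Round to the nearest GB, since the OS reports slightly less than the installed amount.
mem_gb=$(( (mem_kb + 524288) / 1048576 ))

# Free space in GB on the filesystem that holds the current directory.
disk_gb=$(df -Pk . | awk 'NR==2 {print int($4 / 1048576)}')

# Print one line per resource; warn instead of failing.
report() {  # usage: report NAME VALUE MINIMUM
  if [ "$2" -ge "$3" ]; then
    echo "ok:      $1 = $2 (minimum $3)"
  else
    echo "WARNING: $1 = $2 is below the minimum of $3"
  fi
}

report "CPU cores" "$cores" "$MIN_CORES"
report "memory (GB)" "$mem_gb" "$MIN_MEM_GB"
report "free disk (GB)" "$disk_gb" "$MIN_DISK_GB"
```

Run it from the directory where you intend to store the UK Biobank files, because the disk check looks at the filesystem of the current directory. On macOS with Docker, keep in mind that the limits that matter are the ones configured in Docker itself.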

Regarding operating systems, we have tested the ukbREST Docker image on macOS and Linux (Ubuntu). We recommend a Linux-based operating system if you are using Docker. On macOS you need to set the amount of disk space reserved for Docker: click on Preferences, go to the Disk tab, and increase the value (this is not necessary on Linux):

Configuration of disk space in Docker for Mac

Docker installation

You need Docker (the Community Edition is enough) if you are going to use our ukbREST Docker image, the PostgreSQL Docker image, or both. Follow the instructions in the Docker documentation to install it on your system. On macOS, after starting Docker, you will see its icon in the menu bar.

You don't need Docker if you are installing everything natively (both ukbREST and PostgreSQL), which, again, is not recommended unless you are an expert.

PostgreSQL database

We have tested ukbREST on PostgreSQL 9.6 and 11 (which is now the recommended version). It should run with newer versions too. The easiest way to get PostgreSQL is to use the Docker image already available from its developers, which requires Docker to be installed.

Once Docker is installed and running, you first need to create a network. Open a terminal and run this:

$ docker network create ukb

Then download and run PostgreSQL with this command:

$ docker run -d --name pg --net ukb -p 127.0.0.1:5432:5432 \
  -e POSTGRES_USER=test -e POSTGRES_PASSWORD=test \
  -e POSTGRES_DB=ukb \
  postgres:11

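Because of the -d option, the container runs in the background and the command prints only the container ID. You can follow the server output with docker logs -f pg and stop the server with docker stop pg. If you prefer to run PostgreSQL in the foreground and see all the output in the terminal, replace the parameter -d with --rm; you can then stop the server with the usual Ctrl-C (note that --rm also removes the container when it stops). Refer to the Docker documentation for more information on how to manage your containers and images.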
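PostgreSQL needs a few seconds to initialize the first time the container starts, so it can help to wait until the server accepts connections before moving on. Below is a minimal sketch of a retry helper. The name wait_for is hypothetical. The suggested check uses pg_isready, a standard PostgreSQL utility, and assumes the container name (pg), user (test) and database (ukb) from the command above.

```shell
#!/bin/sh
# Retry a command once per second until it succeeds or the attempts run out.
# usage: wait_for ATTEMPTS COMMAND [ARGS...]
wait_for() {
  attempts=$1
  shift
  while [ "$attempts" -gt 0 ]; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

# Example (requires the "pg" container started above):
#   wait_for 30 docker exec pg pg_isready -U test -d ukb && echo "PostgreSQL is ready"
```

If the check never succeeds, docker logs pg usually shows why the server did not start.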

IMPORTANT: the above command runs PostgreSQL with default settings, which can make it very slow. Take a look at the PostgreSQL Docker image page to see how to change the database settings and tune them for your hardware. Remember that if you are using Docker on macOS, the resources available to the PostgreSQL container are not those of your computer but those you assign to Docker in its preferences. In our setup, 4 CPUs and 10GB of RAM were available to Docker containers.

As a general guideline, this is the configuration we used with PostgreSQL 11:

# Config generated with https://pgtune.leopard.in.ua/
# DB Version: 11
# OS Type: mac
# DB Type: dw
# Total Memory (RAM): 10 GB
# CPUs num: 4
# Connections num: 100
# Data Storage: ssd

listen_addresses = '*'
max_connections = 100
shared_buffers = 2560MB
effective_cache_size = 7680MB
maintenance_work_mem = 1280MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 1.1
work_mem = 6553kB
min_wal_size = 4GB
max_wal_size = 8GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
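To adapt the memory-related values above to a different amount of RAM, it helps to see how they relate to the total. The sketch below reproduces the listed numbers for 10GB. The ratios (1/4, 3/4 and 1/8 of total memory) and the work_mem divisors are inferred from the configuration above, not taken from the pgtune source, so treat them as assumptions and use pgtune itself for real settings.

```shell
#!/bin/sh
# Inputs matching the header of the configuration above.
TOTAL_RAM_MB=$((10 * 1024))   # 10GB available to Docker containers
MAX_CONNECTIONS=100
WORKERS_PER_GATHER=2

# Ratios inferred from the listed values (assumption, not pgtune's exact rules).
shared_buffers_mb=$((TOTAL_RAM_MB / 4))
effective_cache_size_mb=$((TOTAL_RAM_MB * 3 / 4))
maintenance_work_mem_mb=$((TOTAL_RAM_MB / 8))

# Remaining memory split across connections and parallel workers; the final
# division by 2 is an assumption that reproduces the listed value.
work_mem_kb=$(( (TOTAL_RAM_MB - shared_buffers_mb) * 1024 / (MAX_CONNECTIONS * 3) / WORKERS_PER_GATHER / 2 ))

echo "shared_buffers = ${shared_buffers_mb}MB"              # 2560MB
echo "effective_cache_size = ${effective_cache_size_mb}MB"  # 7680MB
echo "maintenance_work_mem = ${maintenance_work_mem_mb}MB"  # 1280MB
echo "work_mem = ${work_mem_kb}kB"                          # 6553kB
```

According to the PostgreSQL Docker image documentation, individual settings can be passed as -c options after the image name in the docker run command (for example, postgres:11 -c shared_buffers=2560MB). Alternatively, a complete configuration file can be mounted into the container and selected with -c config_file=[PATH INSIDE THE CONTAINER].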

ukbREST Installation

Using the ukbREST Docker image

To use our ukbREST Docker image, you need to install Docker first. Once that's done, open a new terminal and download our image:

$ docker pull hakyimlab/ukbrest

Once you have the image, everything you need is already inside it. Follow the rest of the guide for more instructions. You can test ukbREST using sample data, or follow the tips for loading the real UK Biobank data and then try more complex phenotype or genotype queries. You can also check the list of available YAML query files from other users.

Using a native Python environment

For ease of use, we strongly recommend the ukbREST Docker image over a native Python environment. If your use case doesn't allow this, you will need a local Python environment to run ukbREST and you will have to install all external dependencies yourself.

The first step consists of preparing a Python environment. We recommend Miniconda/Anaconda. Once it is installed, you can run these commands to create a new environment called ukbrest with all the Python dependencies needed:

$ cd [UKBREST REPO]
$ conda env create -n ukbrest -f environment.yml
Solving environment: done

Downloading and Extracting Packages
pyyaml-3.12          |  161 KB | ######################################### | 100%
nose-1.3.7           |  214 KB | ######################################### | 100%
pandas-0.21.0        | 10.0 MB | ######################################### | 100%
[...]
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate ukbrest
#
# To deactivate an active environment, use
#
#     $ conda deactivate

As you can see from the above messages, before running any ukbREST script, you need to activate your newly created environment with conda activate ukbrest.

You also need to install a PostgreSQL client program (psql), the bgenix indexer for BGEN files and qctool (which will be used if you run the unit tests). For installation instructions on a Debian-based Linux system, take a look at the Dockerfile in this repository. Keep in mind that the commands psql, bgenix and qctool must be visible in your PATH.
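A quick way to confirm that the three commands are visible in your PATH is sketched below. The function name missing_tools is hypothetical; it relies only on the standard command -v shell builtin.

```shell
#!/bin/sh
# Print each tool that cannot be found in PATH; return non-zero if any is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Tools named in the paragraph above.
missing_tools psql bgenix qctool || echo "Install the tools listed above and add them to your PATH."
```

If nothing is printed, all three commands were found.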

You should run the unit tests to make sure everything is working on your system.