A lightweight limited functionality R bgen read/write library

Current Version: 0.0.6 Release date: Dec 20, 2018

This library supports "v1.3" of the bgen format specification. It supports reading and writing using 8, 16, 24 or 32 bits per probability, with Layout = 2 and CompressedSNPBlocks = 1, for bi-allelic SNPs in samples of ploidy 2. Files that use any other format options may crash unexpectedly, without a properly defined error.
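
As a rough guide to what the bits-per-probability setting implies, the sketch below assumes that the bgen Layout 2 encoding stores each probability as an unsigned integer of B bits representing that integer divided by 2^B - 1, and that two of the three genotype probabilities are stored per sample for an unphased diploid bi-allelic SNP. The helper names are hypothetical and are not part of rrbgen.

```shell
# Illustrative arithmetic only; these helpers are not part of rrbgen.

# Largest integer representable with the given number of bits (2^bits - 1).
max_stored_value() {
    echo $(( (1 << $1) - 1 ))
}

# Bytes of probability data per sample, ASSUMING two stored probabilities
# per sample (diploid, bi-allelic, unphased).
prob_bytes_per_sample() {
    echo $(( 2 * $1 / 8 ))
}

for bits in 8 16 24 32; do
    echo "${bits} bits: max stored value $(max_stored_value "$bits"), $(prob_bytes_per_sample "$bits") bytes per sample"
done
```

Under these assumptions, 8 bits gives probabilities in steps of 1/255, and each additional 8 bits adds two bytes of probability data per sample per variant before compression.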

Note that the rrbgen library was primarily written to support writing bgen output when using STITCH. It has similar functionality to rbgen, but provides an interface that suits how STITCH internally handles information when writing files, and is designed to be much lighter weight (fewer dependencies).

Quick start on Linux and Mac

Conda instructions are given below. To install from source, clone the repository, download the latest release from the releases page, and install it:

git clone --recursive https://github.com/rwdavies/rrbgen.git
cd rrbgen
cd releases
wget https://github.com/rwdavies/rrbgen/releases/download/0.0.6/rrbgen_0.0.6.tar.gz ## or curl -L -O (-L follows the GitHub redirect)
R CMD INSTALL rrbgen_0.0.6.tar.gz

To install the latest development code in the repository, use ./scripts/build-and-install.sh

To install an older release, either download it from the GitHub releases page or use one of the older releases included in the releases folder of the repository.
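
A minimal sketch of fetching a specific release from the command line is shown below. It assumes that every release follows the same URL pattern as the 0.0.6 tarball above, which may not hold for all versions; `rrbgen_release_url` and `fetch` are hypothetical helpers, not part of rrbgen.

```shell
# Sketch only: these helpers are hypothetical and not part of rrbgen.

# Build the download URL for a given version, ASSUMING every release
# follows the same naming pattern as the 0.0.6 tarball.
rrbgen_release_url() {
    version=$1
    echo "https://github.com/rwdavies/rrbgen/releases/download/${version}/rrbgen_${version}.tar.gz"
}

# Download a URL into the current directory with whichever tool is available.
# curl needs -L because GitHub release downloads are served via a redirect.
fetch() {
    url=$1
    if command -v wget >/dev/null 2>&1; then
        wget "$url"
    elif command -v curl >/dev/null 2>&1; then
        curl -L -O "$url"
    else
        echo "neither wget nor curl is installed" >&2
        return 1
    fi
}

# Example (not run here; replace VERSION with a version listed on the releases page):
#   fetch "$(rrbgen_release_url VERSION)"
#   R CMD INSTALL "rrbgen_VERSION.tar.gz"
```

If the download fails for a given version, check the exact file name on the releases page rather than relying on the assumed pattern.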

If you see errors like "error while loading shared libraries: libmpc.so.2: cannot open shared object file: No such file or directory", run the following before the R CMD INSTALL step:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:`pwd`/install/lib/
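
One caveat with the line above: if LD_LIBRARY_PATH is unset, the result starts with a colon, and the dynamic linker on glibc-based systems treats an empty entry as the current working directory. A slightly more defensive variant is sketched below; `append_ld_path` is a hypothetical helper, not something shipped with rrbgen.

```shell
# Hypothetical helper, not part of rrbgen: append a directory to
# LD_LIBRARY_PATH without creating an empty entry when the variable
# is unset or empty.
append_ld_path() {
    dir=$1
    # ${VAR:+...} expands to "<old value>:" only when VAR is non-empty.
    LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${dir}"
    export LD_LIBRARY_PATH
}

# Same directory as in the command above, resolved from the repository root.
append_ld_path "$(pwd)/install/lib"
```

Run this from the top of the cloned repository, in the same shell session that will run R CMD INSTALL.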

If you have any other installation problems or suggestions, please report them as a GitHub issue.

Install using conda

rrbgen can be installed using conda. Full tutorials can be found elsewhere, but briefly, something like this should work:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda install r-rrbgen -c defaults -c bioconda -c conda-forge
source activate
R -e 'library("rrbgen")'

Example commands in R


## compare against ./external/bgen/example.gen
## see also examples in rrbgen/tests/testthat/test-read.R and test-write.R
bgen_file <- "./external/bgen/example.16bits.bgen"

## sample names only
sample_names <- rrbgen_load_samples(bgen_file)

## variant information only
var_info <- rrbgen_load_variant_info(bgen_file)

## load sample names, variant information, and genotype probabilities
out <- rrbgen_load(bgen_file)

## load a subset based on sample names and variant information
## note as implemented this requires a full pass over the data (although not decompression)
out <- rrbgen_load(bgen_file,
    vars_to_get = c("SNPID_2", "SNPID_10", "SNPID_3"),
    samples_to_get = c("sample_100", "sample_010", "sample_035")