htsget on GCS
This repository contains an implementation of the htsget protocol that provides access to reads data stored in Google Cloud Storage buckets.
Currently, only BAM is supported, and the BAM index file must be colocated with
the BAM file (that is, sample.bam.bai must be in the same bucket as sample.bam).
CRAM support will be added in the very near future.
Note that this is an Early Access Preview release. If you use this software for production workloads, please be careful and help us by reporting problems using the issue tracker.
Quick start using Docker
A Docker image containing the compiled htsget-server binary is available at gcr.io/genomics-tools/htsget.
If you already have Docker installed, you can start the server and make it available to the host by running:
$ docker run -d -P gcr.io/genomics-tools/htsget
To determine the port that has been exposed to the host use the
command. By default, the server can only access public data sources (see below
for more information on secured access).
Quick start using AppEngine
Google AppEngine provides a secure and automatically scalable way to run the
htsget server. To get started, clone the htsget source code and deploy the
application in the
$ export GOPATH=$PWD
$ go get github.com/googlegenomics/htsget/appengine
$ gcloud app deploy src/github.com/googlegenomics/htsget/appengine
Once the deployment completes, you can make authorized htsget requests to https://your-project-id.appspot.com/.
The appengine application will serve requests to any GCS bucket
by default. This behavior can be modified using the settings found in the
Building the server
To build the server, you will need the Go toolchain (version 1.8 or later).
Once Go is installed, you can build the server by running:
$ go get github.com/googlegenomics/htsget/htsget-server
This will produce a binary in $GOPATH/bin called htsget-server.
You can use htsget-server in one of two modes:
Insecure mode for public resources. In this mode, htsget-server does not use TLS and does not authenticate the requests it makes to Google Cloud Storage. This is useful if you want to access public data via the htsget protocol on a machine that is already running htslib-based tools (like samtools).
Secure mode with authentication. In this mode, the server requires a TLS certificate and key to be passed as command line flags. It will then listen on all interfaces and accept requests secured via TLS. Each request must contain an OAuth2 Bearer Access Token which will be used to fetch data from GCS.
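For clients that do not go through htslib, the secure-mode requirement above can be sketched as follows. This is a minimal illustration, not code from this repository: the helper name newAuthorizedRequest, the URL, and the token value are hypothetical, and it assumes the token is sent in the standard OAuth2 "Authorization: Bearer" header.

```go
package main

import (
	"fmt"
	"net/http"
)

// newAuthorizedRequest builds a GET request that carries an OAuth2 access
// token as a Bearer token. The helper name is hypothetical; it is not part
// of htsget-server.
func newAuthorizedRequest(url, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Standard OAuth2 Bearer token header.
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// Placeholder URL and token, for illustration only.
	req, err := newAuthorizedRequest("https://localhost:443/reads/private-bucket/test.bam", "example-token")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path)
	fmt.Println(req.Header.Get("Authorization"))
}
```

The request is only constructed here, not sent. Sending it would also require an HTTP client that trusts the server's TLS certificate.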
Required file layout
In either mode, read requests identify the bucket and object (file) to read.
As an example,
/reads/testing/123.bam will cause the server to try to access
the GCS bucket 'testing' and read two objects:
The index file MUST be in the same bucket and have the
Running the server
$ bin/htsget-server --port=1234 &
$ samtools flagstat http://localhost:1234/reads/public-bucket/test.bam
This will use htsget to retrieve data from 'test.bam' stored in the GCS bucket 'public-bucket'.
$ bin/htsget-server --secure=true --port=443 --https_cert=server.crt --https_key=server.key &
$ export CURL_CA_BUNDLE=server.crt
$ export HTS_AUTH_LOCATION=/path/to/my-oauth2-token
$ samtools flagstat https://localhost:443/reads/private-bucket/test.bam
server.key can be generated using the
generate_cert tool that
comes with Go, or using
Note that you will need a version of htslib that supports the environment variables used above (CURL_CA_BUNDLE and HTS_AUTH_LOCATION).
This support was added in October 2017.
In both secure and insecure mode, the list of buckets from which the server is
allowed to read can be restricted by passing a comma-separated list of
buckets via the --buckets flag. If the --buckets flag is not specified, there is
no restriction on the buckets from which the server can read.
The server isn't very efficient at limiting what reads are returned. This is an area we are actively working to improve (see issue #7).
Filters on fields are ignored. The server does not implement any filtering beyond read range and reference name filters. We do not currently plan to add support for this. If this is important to you, please file an issue and let us know.
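A request that uses only the supported filters might be built as in the sketch below. The query parameter names referenceName, start, and end come from the htsget specification. The helper name readsURL and all values are hypothetical, and the sketch assumes this server reads those parameters as the specification describes.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// readsURL builds a reads request restricted to a reference name and a
// coordinate range. Hypothetical helper; bucket and object names are assumed
// to need no URL escaping in this sketch.
func readsURL(base, bucket, object, referenceName string, start, end uint32) string {
	q := url.Values{}
	q.Set("referenceName", referenceName)
	q.Set("start", strconv.FormatUint(uint64(start), 10))
	q.Set("end", strconv.FormatUint(uint64(end), 10))
	// Encode sorts parameters by key, so the order is end, referenceName, start.
	return base + "/reads/" + bucket + "/" + object + "?" + q.Encode()
}

func main() {
	fmt.Println(readsURL("http://localhost:1234", "public-bucket", "test.bam", "chr1", 100000, 200000))
}
```

A fields parameter is deliberately left out, since the server ignores field filters.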