# seqr

seqr is a web-based analysis tool for rare disease genomics.

This repository contains code that underlies the Broad seqr instance and other seqr deployments.

## Technical Overview

seqr consists of the following components:

- **seqr** - the main client-server application, built with JavaScript + React.js on the client side and Python + Django on the server side.
- **postgres** - SQL database used by seqr and phenotips to store project metadata and user-generated content such as variant notes.
- **phenotips** - 3rd-party web-based form for entering structured phenotype data.
- **matchbox** - a tool for connecting with the Match Maker Exchange.
- **nginx** - HTTP server used as the main gateway between seqr and the internet.
- **pipeline-runner** - container for running hail pipelines to annotate and load new datasets.
- **redis** - in-memory cache used to speed up request handling.
- **elasticsearch** - NoSQL database used to store variant callsets.
- **kibana** - dashboard and visual interface for elasticsearch.
- **mongo** - legacy NoSQL database originally used for variant callsets and still used to store some reference data and logs.
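When checking that these services are up, it can help to remember their upstream default ports. The helper below is a hypothetical sketch (not part of seqr) that maps component names to the stock defaults shipped by each project; a real deployment may remap any of them.

```shell
# Hypothetical helper mapping each component to its upstream default port.
# These are the projects' stock defaults, not seqr-specific guarantees.
service_port() {
  case "$1" in
    nginx)          echo 80 ;;
    postgres)       echo 5432 ;;
    redis)          echo 6379 ;;
    elasticsearch)  echo 9200 ;;
    kibana)         echo 5601 ;;
    mongo)          echo 27017 ;;
    *)              echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Example: probe elasticsearch cluster health on its default port.
# curl -s "localhost:$(service_port elasticsearch)/_cluster/health"
```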

## Install

seqr can be installed on a laptop or on-prem server(s) using the installation scripts in the `deploy/` directory:

Detailed instructions for local installations

For cloud-based deployments, there are Docker images and Kubernetes configs:

Detailed instructions for Kubernetes deployments

## Updating / Migrating an older xBrowse Instance

For notes on how to update an older xBrowse instance, see

Update/Migration Instructions

## Data loading pipelines

seqr uses hail-based pipelines to run VEP and annotate callsets with additional reference data before loading them into elasticsearch. These pipelines can run locally on a single machine, on an on-prem Spark cluster, or on a cloud-based Spark cluster such as Google Dataproc. We are working on integrating these pipelines so that seqr launches and manages them directly. For now, they must be run manually, as in the example below. See hail_elasticsearch_pipelines for additional documentation.

Example with seqr deployed to Google Cloud GKE, using Google Dataproc to run the pipeline:

```bash
# these commands should be run locally on your laptop
git clone git@github.com:macarthur-lab/hail-elasticsearch-pipelines.git
cd hail-elasticsearch-pipelines

HOST=seqr-vm                          # IP address or hostname of the elasticsearch instance running on google cloud
SEQR_PROJECT_GUID=R003_seqr_project3  # guid of an existing seqr project
SAMPLE_TYPE=WGS                       # can be WGS or WES
DATASET_TYPE=VARIANTS                 # "VARIANTS" if the VCF contains GATK or other small variant calls, "SV" if it contains Manta CNV calls
INPUT_VCF=gs://seqr-datasets/GRCh38/my-new-dataset.vcf.gz

# this will create a new dataproc cluster and submit the pipeline to it
./gcloud_dataproc/load_dataset.py --genome-version 38 --host ${HOST} --project-guid ${SEQR_PROJECT_GUID} --sample-type ${SAMPLE_TYPE} --dataset-type ${DATASET_TYPE} --es-block-size 50 ${INPUT_VCF}

# after the pipeline completes successfully, link the new elasticsearch index to the
# seqr project using the 'Edit Datasets' dialog on the project page.
```
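Since `SAMPLE_TYPE` and `DATASET_TYPE` each accept only two values, it can be worth validating them before paying for a Dataproc cluster. The `validate_args` helper below is a hypothetical sketch, not part of the pipeline repo:

```shell
# Hypothetical pre-flight check for the pipeline arguments above.
validate_args() {
  local sample_type="$1" dataset_type="$2"
  case "$sample_type" in
    WGS|WES) ;;
    *) echo "SAMPLE_TYPE must be WGS or WES, got: $sample_type" >&2; return 1 ;;
  esac
  case "$dataset_type" in
    VARIANTS|SV) ;;
    *) echo "DATASET_TYPE must be VARIANTS or SV, got: $dataset_type" >&2; return 1 ;;
  esac
  echo "arguments look valid"
}

# Usage, before submitting the pipeline:
# validate_args "$SAMPLE_TYPE" "$DATASET_TYPE" && ./gcloud_dataproc/load_dataset.py ...
```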