This repository has been archived by the owner on Jan 31, 2020. It is now read-only.
Malachi Griffith edited this page Aug 15, 2014 · 42 revisions

The Genome Modeling System

The Genome Institute at Washington University has developed a high-throughput, fault-tolerant analysis information management system called the Genome Modeling System (GMS), capable of executing complex, interdependent, and automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. GMS includes a full system image with software and services, expandable from one workstation to a large compute cluster.

GMS also serves as a platform for bioinformatics development. Its federated data management system allows a large team to collaborate on data analysis, lets an individual researcher build effectively on the work of others, and supports external collaboration. Most importantly, rather than separating ad hoc analysis from rigorous, reproducible pipelines, the GMS provides systematic integration between the two. The GMS thus promotes versioned data tracking of ad hoc analyses while facilitating rapid development of formal pipelines.

As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and a matched lymphoblastoid line (HCC1395 BL). The results are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations.

For a brief demonstration of the GMS, please start with the Quick Tour in a Pre-configured Virtual Machine.

Introduction to this documentation

The GMS is a complicated system with many components. Running GMS pipelines will allow you to perform data QC, align raw data to a reference genome, summarize the coverage achieved, detect somatic variation of multiple types (each with multiple callers), perform transcriptome assembly and expression estimation, test for differential expression, integrate analyses of orthogonal data types, and interpret the clinical relevance of 'omic' events in a patient's tumor. Performing these analyses involves installation, automation, and integration of dozens of open source bioinformatics tools. These tools are highly heterogeneous in their implementation and level of engineering. The GMS is the glue that holds all of these pieces together. This project attempts to make it as easy as possible to install and configure all necessary tools and services. Some basic bioinformatics experience and familiarity with genomics concepts on the part of the user are assumed. However, the GMS wiki attempts to document everything you will need to know to get started.

There are several alternative installation strategies, depending on the computer hardware at your disposal. These are documented in the Installation-Types-Overview.

Quick navigation:

Install: Step-by-step instructions for installing the sGMS
Docs: Technical documentation about the internals of the sGMS
Tutorials: Tutorials for running different analyses using the sGMS
FAQ: Frequently asked questions about the sGMS

Acknowledgements:

The development of the Genome Modeling System was funded by an NHGRI Large Scale Sequencing and Analysis Center grant (U54 HG003079) to Richard K Wilson. Additional funding to make this system usable by the community was also provided by NHGRI Genome Sequencing Informatics Tools (GS-IT) Program U01 HG006517 to David J Dooling (year 1) and Li Ding (years 1-4).

Contributions:

The GMS is the result of the efforts of a dedicated group of personnel over a number of years. For a fairly detailed list of contributions please see the [contributions](https://github.com/genome/gms/wiki/Contributions) page.
