Welcome to the homepage for the Large Scale Genomics Work Stream, part of the Global Alliance for Genomics and Health (GA4GH). Led by Oliver Hofmann and Thomas Keane, this Work Stream creates standardized methods for accessing large-scale genomic data (reads, variants, and expression data) through file-based, API-based, cloud-based, and distributed access.
To understand the role of Work Streams in GA4GH, please visit https://www.ga4gh.org/howwework.
This Work Stream meets quarterly at a high level, focusing mainly on reporting the sub-groups' progress to the Driver Projects. The GA4GH strategic roadmap details the planned standards development for this Work Stream. Minutes from these meetings are publicly available.
Most of the Large Scale Genomics Work Stream's work is done in sub-groups, which usually meet every four weeks. Each sub-group has its own leads, and all meetings are minuted; the minutes are open for anyone to view.
File Formats
Chair: James Bonfield (Sanger), Vice-Chair: Louis Bergelson (Broad)
This team develops and maintains standard file formats for:
- sequencing reads (SAM/BAM/CRAM)
- variants (VCF/BCF)
GitHub home - Meeting Minutes
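To give a feel for what the text-based read format looks like to a program, here is a minimal sketch of a SAM record parser: each alignment line carries 11 mandatory tab-separated columns, optionally followed by TAG:TYPE:VALUE fields, and the FLAG column is a bit field. The names `SamRecord` and `parse_sam_line` are hypothetical and for illustration only; real pipelines would use an established library such as htslib or pysam, which also read BAM and CRAM.

```python
from dataclasses import dataclass


@dataclass
class SamRecord:
    """The 11 mandatory SAM columns plus any optional tag fields (hypothetical type)."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position (0 if unavailable)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: list    # optional TAG:TYPE:VALUE fields, kept as raw strings

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)   # FLAG bit 0x4: segment unmapped

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)  # FLAG bit 0x10: SEQ is reverse complemented


def parse_sam_line(line: str) -> SamRecord:
    """Parse one SAM alignment line; header lines (starting with '@') are rejected."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    (qname, flag, rname, pos, mapq, cigar,
     rnext, pnext, tlen, seq, qual) = cols[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual, cols[11:])
```

For example, `parse_sam_line("r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*")` yields a mapped, forward-strand record at position 7. BAM and CRAM store the same fields in binary and reference-compressed form, which is why a library is the practical choice for anything beyond illustration.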
Encrypted Container Formats
Chair: Alexander Senf (EMBL-EBI), Vice-Chair: Rob Davies (Sanger)
This team looks at encrypted versions of these formats.
Meeting Minutes
Future of VCF
Chair: Yossi Farjoun (Broad), Vice-Chair: Cristina Gonzalez (EBI)
This team looks at the longer-term roadmap for variant container formats. Its minutes are still recorded as part of the regular File Formats meeting.
Meeting Minutes
htsget
Chair: Mike Lin (DNAnexus), Vice-Chair: Jerome Kelleher (University of Oxford)
A standardised, non-file-based API for securely streaming the file formats listed above.
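The streaming API described above is published by GA4GH as htsget. As we understand the protocol, a client asks for a genomic region (for example `GET /reads/<id>?format=BAM&referenceName=chr1&start=0&end=100000`) and receives a JSON "ticket" whose `htsget.urls` array lists data blocks; the client fetches each block in order, sending any `headers` given for it, and concatenates the bytes into a valid file. Small blocks may be inlined as `data:` URIs. The sketch below covers only the client-side assembly step; `assemble_from_ticket` and the pluggable `fetch` callback are hypothetical names, and authentication and error handling are omitted.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional

Fetcher = Callable[[str, Dict[str, str]], bytes]


def _fetch_http(url: str, headers: Dict[str, str]) -> bytes:
    """Default fetcher: plain HTTP(S) GET with the headers supplied by the ticket."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def _decode_data_uri(url: str) -> bytes:
    """Decode an RFC 2397 data: URI, used to inline small blocks in a ticket."""
    meta, _, payload = url[len("data:"):].partition(",")
    if meta.endswith(";base64"):
        return base64.b64decode(payload)
    return urllib.parse.unquote_to_bytes(payload)


def assemble_from_ticket(ticket_json: str, fetch: Optional[Fetcher] = None) -> bytes:
    """Fetch every block listed in an htsget ticket, in order, and concatenate them."""
    ticket = json.loads(ticket_json)["htsget"]
    fetch = fetch or _fetch_http
    parts = []
    for entry in ticket["urls"]:
        url = entry["url"]
        if url.startswith("data:"):
            parts.append(_decode_data_uri(url))
        else:
            parts.append(fetch(url, entry.get("headers", {})))
    return b"".join(parts)
```

Injecting `fetch` keeps the assembly logic testable without a network and lets a caller plug in its own HTTP client or credentials.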
RNAget
Chair: Sean Upchurch
Developing scalable ways of storing and transmitting expression data derived from RNA-seq experiments.
refget
Chair: Andy Yates (EMBL-EBI)
A framework for retrieving reference sequences by a unique checksum, so that the same sequence can be retrieved unambiguously from different databases and servers.
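The checksum-based lookup described above (published by GA4GH as refget) can be sketched as follows. This assumes, per our reading of the refget specification, that a sequence is normalised to uppercase with no whitespace before hashing, and that the digests in use are MD5, refget v1's TRUNC512 (the first 24 bytes of SHA-512, hex-encoded), and the GA4GH identifier (`SQ.` plus the base64url encoding of those same 24 bytes). The function names and the in-memory `build_index` are hypothetical stand-ins for a real server.

```python
import base64
import hashlib
from typing import Dict, Iterable


def normalise(seq: str) -> bytes:
    """Uppercase and strip all whitespace before hashing (assumed refget normalisation)."""
    return "".join(seq.split()).upper().encode("ascii")


def md5_digest(seq: str) -> str:
    return hashlib.md5(normalise(seq)).hexdigest()


def trunc512_digest(seq: str) -> str:
    """First 24 bytes of SHA-512, hex-encoded (48 characters)."""
    return hashlib.sha512(normalise(seq)).digest()[:24].hex()


def ga4gh_digest(seq: str) -> str:
    """'SQ.' + base64url of the first 24 bytes of SHA-512 (32 characters, no padding)."""
    raw = hashlib.sha512(normalise(seq)).digest()[:24]
    return "SQ." + base64.urlsafe_b64encode(raw).decode("ascii")


def build_index(sequences: Iterable[str]) -> Dict[str, str]:
    """Hypothetical in-memory 'server': map each digest to its normalised sequence."""
    index: Dict[str, str] = {}
    for seq in sequences:
        canonical = normalise(seq).decode("ascii")
        for key in (md5_digest(seq), trunc512_digest(seq), ga4gh_digest(seq)):
            index[key] = canonical
    return index
```

Because the identifier is derived from the sequence content alone, any two servers holding the same sequence compute the same key, which is what removes the ambiguity of name-based lookups such as "chr1" versus "1".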