An introduction to the Genome in a Bottle Consortium
The Genome in a Bottle Consortium (www.genomeinabottle.org) is a collaboration between NIST, FDA, NCBI, other government agencies, academic sequencing groups, sequencing technology developers, and clinical laboratories. A principal motivation for the Consortium is to develop widely accepted reference materials and accompanying performance metrics that provide a strong scientific foundation for regulations and professional standards for clinical sequencing. In addition, these genomes, which have been characterized with many methods, are used extensively for the development and optimization of sequencing technologies and bioinformatics methods.
NIST has developed large batches of human genome DNA from several cell lines for NIST Reference Materials (RMs), which have been characterized by the Consortium for homogeneity, stability, and sequence with as many sequencing technologies and library preparation methods as possible. Information from these datasets is being integrated to form high-confidence genotype calls, which can be used by clinical and research laboratories to understand performance of their sequencing and bioinformatics methods.
NCBI is serving as the DCC and repository for the raw sequencing reads, mapped reads, genotypes, and other details for each sample on a dedicated FTP site (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/).
The pilot sample is NA12878 (HG001). NIST received over 8,000 aliquots in April 2013, which were initially distributed to partners in the Consortium to assist in characterization and became available from NIST as Reference Material 8398 in May 2015. Samples from an Ashkenazim trio (son HG002-NA24385-huAA53E0, father HG003-NA24149-hu6E4515, and mother HG004-NA24143-hu8E87A9) and a Han Chinese trio (son HG005-NA24631-hu91BD69, father NA24694-huCA017E, and mother NA24695-hu38168C) from the Personal Genome Project (PGP) are also candidate NIST reference materials and are currently being characterized. In early 2016, NIST plans to make the Ashkenazim trio available both as NIST RM 8391 (son only) and RM 8392 (entire trio). Only the son of the Han Chinese trio will be a NIST RM (8393). DNA and cell lines for all samples are also available from Coriell, but the NIST RMs are from a single homogenized batch of DNA, so there may be small differences between the samples at Coriell and the NIST RMs.
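The composite sample labels above pack several identifiers into one hyphen-separated string. The following is a minimal Python sketch for splitting such a label into its parts. It assumes the prefix conventions suggested by the labels themselves: `HG` for the GIAB sample ID, `NA` for the Coriell ID, and `hu` for the PGP participant ID. The function name `parse_sample_label` and the field names are hypothetical and are not part of any GIAB tooling.

```python
import re
from typing import Dict, Optional

# Assumed prefix conventions, inferred from the labels in this README:
#   HG... = GIAB sample ID, NA... = Coriell ID, hu... = PGP participant ID.
_PATTERNS = {
    "giab_id": re.compile(r"^HG\d+$"),
    "coriell_id": re.compile(r"^NA\d+$"),
    "pgp_id": re.compile(r"^hu[0-9A-F]+$"),
}


def parse_sample_label(label: str) -> Dict[str, Optional[str]]:
    """Split a label such as 'HG002-NA24385-huAA53E0' into named IDs.

    Identifiers missing from the label (for example, a label with no
    HG part) are returned as None.
    """
    parsed: Dict[str, Optional[str]] = {name: None for name in _PATTERNS}
    for part in label.split("-"):
        for name, pattern in _PATTERNS.items():
            if pattern.match(part):
                parsed[name] = part
                break
        else:
            # No known prefix matched this piece of the label.
            raise ValueError(f"Unrecognized identifier {part!r} in {label!r}")
    return parsed
```

For example, `parse_sample_label("NA24694-huCA017E")` returns the Coriell and PGP IDs and leaves `giab_id` as `None`, because that label carries no `HG` identifier.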
Details about the NIST Reference Materials, data, and future plans are at https://github.com/genome-in-a-bottle and https://sites.stanford.edu/abms/content/giab-reference-materials-and-data. When the NIST RMs are available, they can be purchased from NIST at http://www.nist.gov/srm/, where a Report of Investigation describing the DNA will also be available.
Bioproject page: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA200694
SRA Run Selector page: http://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA200694
Amazon AWS S3 bucket: s3://giab
GIAB main FTP site: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/
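As a rough illustration of browsing the FTP site programmatically, the sketch below lists the entries of a directory using Python's standard `ftplib`. It assumes the server accepts anonymous FTP logins, which may not hold for every network or at every point in time. The helper names `split_ftp_url` and `list_ftp_directory` are hypothetical.

```python
from ftplib import FTP
from typing import List, Tuple
from urllib.parse import urlsplit

GIAB_FTP_URL = "ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/"


def split_ftp_url(url: str) -> Tuple[str, str]:
    """Return (host, path) for an ftp:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"Not an FTP URL: {url!r}")
    return parts.hostname, parts.path or "/"


def list_ftp_directory(url: str = GIAB_FTP_URL, timeout: float = 30.0) -> List[str]:
    """List entry names in the directory at `url`.

    Assumes anonymous login is permitted by the server.
    """
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # anonymous login (assumption)
        ftp.cwd(path)    # move to the requested directory
        return ftp.nlst()
```

Calling `list_ftp_directory()` with no arguments would attempt to list the top level of the GIAB FTP area; it requires network access and is not run here.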