This repository contains data indexes from NIST's Genome in a Bottle (GIAB) project. The sequence and alignment indexes are also available at https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data_indexes.
AshkenazimTrio
Son:HG002 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG002_NA24385_son/
Father:HG003 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG003_NA24149_father/
Mother:HG004 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG004_NA24143_mother/
| Sequencing Platform | Sequence | Alignment |
|---|---|---|
| Illumina WGS 2x150bp 300X per individual | All HG002 HG003 HG004 | novoalign: All HG002 HG003 HG004 |
| Illumina 6KB Matepair | All HG002 HG003 HG004 | bwamem:hg19 All HG002 HG003 HG004 |
| Illumina WGS 2X250bp | All HG002 HG003 HG004 | isaac:hg19 All HG002 HG003 HG004 novoalign: All HG002 HG003 HG004 |
| Moleculo | All HG002 HG003 HG004 | |
| Illumina Whole Exome | - | bwamem:hg19 All HG002 HG003 HG004 |
| SOLiD 60x for son | All HG002 | LifeScope:hg19 All HG002 |
| CompleteGenomics | - | CGAtools:hg19 All HG002 HG003 HG004 |
| Ion Proton 1000x Exome | - | TMAP:hg19 All HG002 HG003 HG004 |
| 10X Genomics | - | bwamem:hg19 All HG002 HG003 HG004 |
| 10X Genomics ChromiumGenome | All HG002 | LongRanger2.0:hg19 All HG002 HG003 HG004 |
| BioNano | All:bnx HG002:bnx HG003:bnx HG004:bnx | All:cmap HG002 HG003 HG004 |
| PacBio 70x/30x/30x | All HG002 HG003 HG004 All:hdf5 HG002 HG003 HG004 | NGMLR:hg19 All HG002 HG003 HG004 minimap2: All HG002 HG003 HG004 |
| PacBio CCS 10kb | All HG002 | pbmm2:hg19 All HG002 |
| PacBio CCS 11kb | All HG002 | pbmm2:hg19 All HG002 |
| PacBio CCS 15kb | All HG002 | pbmm2:hg19 All HG002 |
| PacBio CCS 15kb_20kb chemistry2 | All HG002 | pbmm2: All HG002 HG003 HG004 |
| Oxford Nanopore 2D | All HG002 | - |
| Oxford Nanopore ultralong (guppy-V3.2.4_2020-01-22) | All HG002 | minimap2:whatshap:hg19 All HG002 |
| Oxford Nanopore ultralong Promethion | All HG002 HG003 HG004 | - |
| BGI BGISEQ500 | All HG002 | - |
| BGI MGISEQ PCR-free | All HG002 | - |
| BGI stLFR | All HG002 HG003 HG004 | All:bwamem:hg19 HG002 HG003 HG004 |
| Strand-Seq HG002 by BCCRC | All HG002 | - |
* CompleteGenomics LFR raw and alignment data are not available, but analysis results are available under: https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/CompleteGenomics_newLFR_CGAtools_06122015/
ChineseTrio
Son:HG005 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/ChineseTrio/HG005_NA24631_son/
Father:HG006 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/ChineseTrio/HG006_NA24694-huCA017E_father/
Mother:HG007 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/ChineseTrio/HG007_NA24695-hu38168_mother/
| Sequencing Platform | Sequence | Alignment |
|---|---|---|
| Illumina WGS 2x250bp 300X for son; 2x150bp 100x for parents | All HG005 HG006 HG007 | novoalign: All:hg19-hg38 HG005:hg19-hg38 HG006:hg19-hg38 HG007:hg19-hg38 |
| Illumina 6KB Matepair | All HG005 HG006 HG007 | |
| Moleculo | All HG005 HG006 HG007 | |
| SOLiD 60x for son | All:xsq HG005:xsq | LifeScope: All:hg19 HG005:hg19 |
| CompleteGenomics | CGAtools: All:hg19 (RMDNA) HG005:hg19 HG006:hg19 HG007:hg19 CGAtools: All:hg19 (cellsDNA) HG005:hg19 | |
| Illumina Whole Exome | bwamem: All:hg19 HG005:hg19 | |
| Ion Proton 1000x Exome | TMAP: All:hg19 HG005:hg19 | |
| BioNano for son | All:bnx HG005:bnx | All:hg19 (cmap) HG005:hg19 (cmap) |
| PacBio Sequel for the trio | All HG005 HG006 HG007 | |
| PacBio SequelII CCS 11kb | | |
| BGI BGISEQ500, MGISEQ, stLFR | | |
NA12878
NA12878:HG001 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/
| Sequencing Platform | Sequence | Alignment |
|---|---|---|
| Illumina WGS 2x150bp 300X | HG001 | bwamem: HG001:hg19 (downsampled30x) novoalign: HG001 |
| Illumina HiSeq Exome | HG001 HG001:trimmed_fastq | bwamem: HG001:hg19 |
| Illumina TruSeq Exome | bwamem: HG001:hg19 | |
| 10X Genomics | bwamem: HG001:hg19 bwamem: HG001:hg19 (size_selected) | |
| 10X Genomics ChromiumGenome | LongRanger2.0: HG001:hg19-hg38 LongRanger2.1: HG001:hg19-hg38 | |
| CompleteGenomics | CGAtools: HG001:hg19 | |
| Ion Proton 1000x Exome | TMAP: HG001:hg19 | |
| NA12878 SOLiD5500W | LifeScope: HG001:hg19 | |
| BGI BGISEQ500, MGISEQ, stLFR | | |
| PacBio 40x | HG001:hdf5 | |
| PacBio SequelII CCS 11kb | | |
| Ultralong_OxfordNanopore | - | minimap2: HG001 |
- CompleteGenomics LFR raw and alignment data are not available, but analysis results are available under: https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/CompleteGenomics_newLFR_CGAtools_06122015/
Please Note:
1. To download raw sequencing data (fastq, fasta, hdf5, xsq, bnx, etc.) for your analysis, use the sequence.index.* files.
2. To download aligned data (bam, xmap/cmap, etc.) for your analysis, use the alignment.index.* files.
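The two notes above can be turned into a download list with a few lines of Python. The following is a minimal sketch, not an official GIAB tool. It assumes that each index file is tab-delimited text and that file locations appear as fields beginning with `ftp://`, `http://`, or `https://`; the column layout is not described in this README, so the code makes no assumption about column names or order. The function name `extract_urls` is hypothetical.

```python
# Sketch under stated assumptions: the index file is tab-delimited text and
# download locations are fields that start with a URL scheme.
URL_PREFIXES = ("ftp://", "http://", "https://")


def extract_urls(index_text):
    """Return every URL-like field in the index text, in file order."""
    urls = []
    for line in index_text.splitlines():
        for field in line.split("\t"):
            field = field.strip()
            # Header names, checksums, and sample names are skipped because
            # they do not start with a URL scheme.
            if field.startswith(URL_PREFIXES):
                urls.append(field)
    return urls
```

One possible use is to read a local copy of a sequence.index.* or alignment.index.* file, pass its contents to `extract_urls`, write the result to a text file with one URL per line, and hand that file to a downloader such as `wget -i`. If an index file also carries checksum columns, verifying each download against them would be a sensible extra step.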