Data and information about the Polaris study

Polaris

HiSeq™ X data

Summary

The Polaris project provides:

  • Population sequencing resources on high throughput Illumina sequencing platforms
  • Variant calls from multiple technologies, validated by population genetics and Mendelian methods

Further details of the sequencing resources, input data sources, genotyping methods and validation methods can be found in the project wiki.

Variant calls

The latest truth set of structural variant (SV) calls is v2.0. See release-notes/v2.0 for details.

Download the VCF

To download the VCF and its index, run the commands for your genome version:

Genome version hg38

wget https://s3-us-west-1.amazonaws.com/illumina-polaris-v2.0-release-data/all.merge.hg38.vcf.gz
wget https://s3-us-west-1.amazonaws.com/illumina-polaris-v2.0-release-data/all.merge.hg38.vcf.gz.tbi

Genome version hg19

wget https://s3-us-west-1.amazonaws.com/illumina-polaris-v2.0-release-data/all.merge.hg19.vcf.gz
wget https://s3-us-west-1.amazonaws.com/illumina-polaris-v2.0-release-data/all.merge.hg19.vcf.gz.tbi
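
The commands above can be wrapped in a few small shell helpers. This is a minimal sketch, not part of the Polaris release: the function names `polaris_urls`, `polaris_download` and `count_vcf_records` are hypothetical, and it assumes that `wget`, `gzip` and `grep` are installed and that both genome versions follow the same `.vcf.gz` / `.vcf.gz.tbi` naming pattern shown for hg38.

```shell
#!/bin/sh
# Hypothetical helpers; only the base URL and file names come from the
# wget commands above.
POLARIS_BASE="https://s3-us-west-1.amazonaws.com/illumina-polaris-v2.0-release-data"

# Print the VCF URL and its index URL for a genome version (hg38 or hg19).
polaris_urls() {
  case "$1" in
    hg38|hg19) ;;
    *) echo "unsupported genome version: $1" >&2; return 1 ;;
  esac
  printf '%s/all.merge.%s.vcf.gz\n' "$POLARIS_BASE" "$1"
  printf '%s/all.merge.%s.vcf.gz.tbi\n' "$POLARIS_BASE" "$1"
}

# Download the VCF and its index into the current directory.
polaris_download() {
  urls=$(polaris_urls "$1") || return 1
  for url in $urls; do
    wget "$url" || return 1
  done
}

# Count data records (non-header lines) in a gzip-compressed VCF.
# A cheap sanity check that the download decompresses and is not empty.
count_vcf_records() {
  gzip -dc "$1" | grep -vc '^#'
}
```

For example, `polaris_download hg38` followed by `count_vcf_records all.merge.hg38.vcf.gz` fetches the hg38 files and prints the number of variant records.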

Sequencing resources

Population cohorts with unrestricted access sequenced as part of Polaris are available through BaseSpace, the European Nucleotide Archive (ENA), and the Sequence Read Archive (SRA).

Additional cohorts are available through the EGA or dbGaP with restricted access subject to approval through a Data Access Committee. No variant calls are ever reported in Polaris for restricted access cohorts.

Further information on the sequencing resources described below can be found in the project wiki.

HiSeq X PCR-Free Data (Polaris 1)

All HiSeq X PCR-Free data was generated by Illumina Laboratory Services (ILS) with a target whole genome coverage of 30X.

There are currently four unrestricted access cohorts available in Polaris:

  1. Diversity Cohort (BaseSpace, ENA, SRA): 150 samples selected to represent a diversity of populations
  2. Kids Cohort (BaseSpace, ENA, SRA): 50 children whose parents were sequenced as part of the Diversity Cohort
  3. PGx Cohort (BaseSpace, ENA, SRA): 70 samples with orthogonally validated genotypes for 28 genes relevant for PGx [4]
  4. PGx 10X© Cohort (ENA, SRA): the same 70 samples from the PGx Cohort, prepared with the 10X Genomics Chromium Controller and sequenced on the HiSeq 4000

There is also a restricted access repeat expansion cohort available through EGA.

Associated resources

Platinum Genomes

HiSeq 2000 PCR-Free

HiSeq X PCR-Free

  • Parents & grandparents
    • ENA — pending
    • BaseSpace — pending
  • Children
    • dbGaP — pending

Pending cohorts

HiSeq X PCR-Free

  • Platinum Genomes pedigree
  • NIST Ashkenazi Jewish trio

10X© Chromium

  • Platinum Genomes pedigree

NovaSeq 6000 S4 PCR-Free

  • Platinum Genomes pedigree
  • NIST Ashkenazi Jewish trio

Issues

Please open an issue to provide feedback or ask questions.

References

  1. Eberle et al. (2017) A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 27:157-164. doi:10.1101/gr.210500.116
  2. English et al. (2015) Assessing structural variation in a personal genome-towards a human reference diploid genome. BMC Genomics. 16:286. doi:10.1186/s12864-015-1479-3
  3. Kehr et al. (2017) Diversity in non-repetitive human sequences not found in the reference genome. Nat Genet. 49(4):588-593. doi:10.1038/ng.3801
  4. Pratt et al. (2016) Characterization of 137 Genomic DNA Reference Materials for 28 Pharmacogenetic Genes: A GeT-RM Collaborative Project. J Mol Diagn. 18(1):109-123. doi:10.1016/j.jmoldx.2015.08.005
  5. Sedlazeck et al. (2018) Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 15:461-468.