This is the author's version of the work. It is posted here by permission of the AAAS for personal use, not for redistribution. The definitive version was published as: Minikel et al. Quantifying prion disease penetrance using large population control cohorts. Sci. Transl. Med. 8, 322ra9 (2016). DOI: 10.1126/scitranslmed.aad5169.
Eric Vallabh Minikel†1,2,3,4, Sonia M. Vallabh1,3,4, Monkol Lek1,2, Karol Estrada1,2, Kaitlin E. Samocha1,2,3, J. Fah Sathirapongsasuti5, Cory Y. McLean5, Joyce Y. Tung5, Linda P.C. Yu5, Pierluigi Gambetti6, Janis Blevins6, Shulin Zhang7, Yvonne Cohen6, Wei Chen6, Masahito Yamada8, Tsuyoshi Hamaguchi8, Nobuo Sanjo9, Hidehiro Mizusawa10, Yosikazu Nakamura11, Tetsuyuki Kitamoto12, Steven J. Collins13, Alison Boyd13, Robert G. Will14, Richard Knight14, Claudia Ponto15, Inga Zerr15, Theo F.J. Kraus16, Sabina Eigenbrod16, Armin Giese16, Miguel Calero17, Jesús de Pedro-Cuesta17, Stéphane Haïk18,19, Jean-Louis Laplanche20, Elodie Bouaziz-Amar20, Jean-Philippe Brandel18,19, Sabina Capellari21,22, Piero Parchi21,22, Anna Poleggi23, Anna Ladogana23, Anne H. O'Donnell-Luria2,1,24, Konrad J. Karczewski2,1, Jamie L. Marshall1,2, Michael Boehnke25, Markku Laakso26, Karen L. Mohlke27, Anna Kähler28, Kimberly Chambert29, Steven McCarroll29, Patrick F. Sullivan27,28, Christina M. Hultman28, Shaun M. Purcell30, Pamela Sklar30, Sven J. van der Lee31, Annemieke Rozemuller32, Casper Jansen32, Albert Hofman31, Robert Kraaij33, Jeroen G.J. van Rooij33, M. Arfan Ikram31, André G. Uitterlinden31,33, Cornelia M. van Duijn31, Exome Aggregation Consortium (ExAC)34, Mark J. Daly2,1, Daniel G. MacArthur†2,1
1. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
2. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, United States
3. Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, United States
4. Prion Alliance, Cambridge, MA 02139, United States
5. Research, 23andMe Inc., Mountain View, CA 94041, United States
6. National Prion Disease Pathology Surveillance Center, Cleveland, OH 44106, United States
7. University Hospitals Case Medical Center, Cleveland, OH 44106, United States
8. Department of Neurology and Neurobiology of Aging, Kanazawa University Graduate School of Medical Sciences, Kanazawa, Japan 920-8640
9. Department of Neurology and Neurological Science, Graduate School, Tokyo Medical and Dental University, Tokyo, Japan 113-8519
10. National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan 187-8551
11. Department of Public Health, Jichi Medical University, Shimotsuke, Japan 329-0498
12. Department of Neurological Science, Tohoku University Graduate School of Medicine, Sendai, Japan 980-8575
13. Australian National Creutzfeldt-Jakob Disease Registry, The University of Melbourne, Parkville, Australia 3010
14. National Creutzfeldt-Jakob Disease Research and Surveillance Unit, Western General Hospital, Edinburgh, United Kingdom EH4 2XU
15. National Reference Center for TSE, Georg-August University, Goettingen, Germany 37073
16. Center for Neuropathology and Prion Research (ZNP) at the Ludwig-Maximilians-University, Munich, Germany 81377
17. Instituto de Salud Carlos III and CIBERNED, Madrid, Spain 28031
18. Inserm U 1127, CNRS UMR 7225, Sorbonne Universités, UPMC Univ. Paris 06 UMR S 1127, Institut du Cerveau et de la Moelle épinière, ICM, 75013 Paris, France
19. Assistance Publique-Hôpitaux de Paris, Cellule Nationale de Référence des Maladies de Creutzfeldt-Jakob, Groupe Hospitalier Pitié-Salpêtrière, F-75013 Paris, France 75010
20. Assistance Publique-Hôpitaux de Paris, Service de Biochimie et Biologie moléculaire, Hôpital Lariboisière, Paris, France 75010
21. IRCCS Institute of Neurological Sciences, Bologna, Italy 40123
22. Department of Biomedical and Neuromotor Sciences, University of Bologna, Italy 40126
23. Department of Cell Biology and Neurosciences, Istituto Superiore di Sanità, Rome, Italy 00161
24. Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115, United States
25. Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, United States
26. Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland 70210
27. Department of Genetics, University of North Carolina School of Medicine, Chapel Hill, NC 27599, United States
28. Karolinska Institutet, Stockholm, Sweden SE-171 77
29. Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA 02142, United States
30. Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
31. Department of Epidemiology, Erasmus MC, Rotterdam 3000 CA, The Netherlands
32. Prion Surveillance Center, Department of Pathology, University Medical Center, Utrecht 3584 CX, The Netherlands
33. Department of Internal Medicine, Erasmus MC, Rotterdam 3000 CA, The Netherlands
34. A list of consortium members may be found at http://exac.broadinstitute.org/about
†Correspondence may be addressed to Eric Vallabh Minikel eminikel@broadinstitute.org @cureffi or Daniel G. MacArthur macarthur@atgu.mgh.harvard.edu @dgmacarthur
More than 100,000 genetic variants are reported to cause Mendelian disease in humans [Stenson 2014, Landrum 2014], but the penetrance - the probability that a carrier of the purported disease-causing genotype will indeed develop the disease - is generally unknown. Here we assess the impact of variants in the prion protein gene (PRNP) on the risk of prion disease by analyzing 16,025 prion disease cases, 60,706 population control exomes, and 531,575 individuals genotyped by 23andMe, Inc. We show that missense variants in PRNP previously reported to be pathogenic are at least 30x more common in the population than expected based on genetic prion disease prevalence. While some of this excess can be attributed to benign variants falsely assigned as pathogenic, other variants have genuine effects on disease susceptibility but confer lifetime risks ranging from <0.1% to ~100%. We also show that truncating variants in PRNP have position-dependent effects, with true loss-of-function alleles found in healthy older individuals, supporting the safety of therapeutic suppression of prion protein expression.
The study of pedigrees with Mendelian disease has been tremendously successful in identifying variants that contribute to severe inherited disorders [Brunham & Hayden 2013, Amberger 2011, Chong 2015]. Causal variant discovery is enabled by selective ascertainment of affected individuals, and especially of multiplex families. Although efficient from a gene discovery perspective, the resulting ascertainment bias confounds efforts to accurately estimate the penetrance of disease-causing variants, with profound implications for genetic counseling [Crow 1999, Cooper 2013, Begg 2002, Goldwurm 2007]. The development of large-scale genotyping and sequencing methods has recently made it tractable to perform unbiased assessments of penetrance in population controls. In several instances, such studies have suggested that previously reported Mendelian variants, as a class, are substantially less penetrant than had been believed [Cooper 2011, Bick 2012, Flannick 2013, Kirov 2014]. To date, however, all of these studies have been limited to relatively prevalent (>0.1%) diseases, and point estimates of the penetrance of individual variants have been limited to large copy number variations [Cooper 2011, Kirov 2014].
Here we demonstrate the use of large-scale population data to infer the penetrance of variants in rare, dominant, monogenic disease, using the example of prion diseases. These invariably fatal neurodegenerative disorders are caused by misfolding of the prion protein (PrP, the product of PRNP) [Prusiner 1998] and have an annual incidence of 1 to 2 cases per 1 million population [Klug 2013]. A small, albeit infamous, minority of cases (<1% in recent years [CJD UK 2015, CJD US 2015]) are acquired through dietary or iatrogenic routes. The majority (~85%) of cases are defined as sporadic, occurring in individuals with two wild-type PRNP alleles and no known environmental exposures. Finally, ~15% of cases occur in individuals with rare, typically heterozygous, coding variants in PRNP, including missense variants, truncating variants, and octapeptide repeat insertions or deletions (Table S1). Centralized ascertainment of cases by national surveillance centers (Materials and Methods) makes prion disease a good test case for using reference datasets to assess the penetrance of these variants.
PRNP was conclusively established as a dominant disease gene due to clear Mendelian segregation of a few variants with disease [Hsiao 1989, Hsiao 1991b, Medori 1992]. Yet ascertainment bias [Minikel 2014], low rates of predictive genetic testing [Owen 2014], and frequent lack of family history [Kovacs 2005, Nozaki 2010] confound attempts [Chapman 1994, Spudich 1995, D'Alessandro 1998, Mitrova & Belay 2002, Minikel 2014] to estimate penetrance by survival analysis. Meanwhile, the existence of non-genetic etiologies leaves doubt as to whether novel variants are causal or coincidental.
A fully penetrant disease genotype should be no more common in the population than the disease that it causes. This observation allows us to leverage two large population control datasets to re-evaluate the penetrance of reported disease variants in PRNP. The recently reported Exome Aggregation Consortium (ExAC) dataset [Lek 2015] contains variant calls on 60,706 people ascertained for various common diseases, without any ascertainment on neurodegenerative disease. 23andMe’s database contains genotypes on 531,575 customers of its direct-to-consumer genotyping service who have opted in to participate in research, pruned to remove related individuals (first cousins or closer; Materials and Methods), preventing enrichment due to large families with prion disease.
We began by asking whether reportedly pathogenic variants are as rare as expected in these population control datasets. The proportion of people alive in the population today who harbor completely penetrant variants causal for prion disease can be approximated by the product of three numbers: the annual incidence of prion disease, the proportion of cases with such a genetic variant, and the life expectancy of individuals harboring these variants. Based on upper bounds of these numbers (Figure 1A), and assuming ascertainment is neutral with respect to neurodegenerative disease, we would expect no more than 1.7 such individuals in the 60,706 exomes in the ExAC dataset [Lek 2015], and ~15 such individuals among the ~530,000 genotyped 23andMe customers who opted to participate in research.
Through reviews [Kong 2004, Beck 2010, Mastrianni 2010] and PubMed searches, we identified 63 rare genetic variants reported to cause prion disease (Table S2). We reviewed ExAC read-level evidence for every rare (<0.1% allele frequency) variant call in PRNP (Materials and Methods; Table S3 - S4) and found that 52 individuals in ExAC harbor reportedly pathogenic missense variants (Figure 1B), at least a 30-fold excess over expectation if all such variants were fully penetrant. Similarly, in the 23andMe database we observed a total of 141 alleles of 16 reportedly pathogenic variants genotyped on their platform (Table S5).
Figure 1. Reportedly pathogenic PRNP variants are >30 times more common in controls than expected based on disease incidence. Reported prion disease incidence varies with the intensity of surveillance efforts [Klug 2013], with an apparent upper bound of ~2 cases per million population per year (Materials and Methods). In our surveillance cohorts, 65% of cases underwent PRNP open reading frame sequencing, with 12% of all cases, or 18% of sequenced cases, possessing a rare variant (Table S1), consistent with an oft-cited estimate that 15% of cases of Creutzfeldt-Jakob disease are familial [Masters 1979]. Genetic prion diseases typically strike in midlife, with mean age of onset for different variants ranging from 28 to 77 [Laplanche 1999, Nozaki 2010, Table S10]; we accepted 80, a typical human life expectancy, as an upper bound for mean age of onset, and to be additionally conservative, we assumed that all individuals in ExAC and 23andMe were below any age of onset, even though both contain elderly individuals [Servick 2015, Figure S1]. Thus, no more than ~29 people per million in the general population should harbor high-penetrance prion disease-causing variants. Therefore at most ~1.7 people in ExAC (A) and ~15 people in 23andMe would be expected to harbor such variants. In fact, reportedly pathogenic variants are seen in 52 ExAC individuals (B) and on 141 alleles in the 23andMe database.
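The upper-bound arithmetic behind Figure 1 can be reproduced in a few lines. The sketch below simply multiplies the three bounds quoted in the caption (2 cases per million per year, 18% of sequenced cases carrying a rare variant, and 80 years) and scales the result by cohort size. The function and variable names are illustrative and are not taken from the original analysis code.

```python
def expected_carriers(cohort_size, annual_incidence=2e-6,
                      genetic_fraction=0.18, years_at_risk=80):
    """Upper bound on carriers of fully penetrant variants in a cohort.

    Defaults are the upper bounds quoted in the Figure 1 caption.
    """
    # Proportion of living people expected to harbor such a variant
    prevalence = annual_incidence * genetic_fraction * years_at_risk
    return cohort_size * prevalence


expected_exac = expected_carriers(60_706)        # roughly 1.7 individuals
expected_23andme = expected_carriers(531_575)    # roughly 15 individuals

# Observed reportedly pathogenic carriers in ExAC, relative to the bound
fold_excess_exac = 52 / expected_exac

print(f"ExAC expected: {expected_exac:.2f}")
print(f"23andMe expected: {expected_23andme:.1f}")
print(f"ExAC fold excess: {fold_excess_exac:.1f}")
```

Dividing the 52 observed ExAC carriers by the unrounded bound gives a ratio close to 30, in line with the excess described in the text.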
Individuals with reportedly pathogenic PRNP variants did not cluster within any one cohort within ExAC (Table S6), arguing against enrichment due to comorbidity with a common disease ascertained for exome sequencing. ExAC does include populations, such as South Asians, in which prion disease is not closely surveilled and for which we cannot rule out a higher incidence than that reported in developed countries. Nevertheless, the individuals with reportedly pathogenic variants in either ExAC or 23andMe were of diverse inferred ancestry (Table S7, S8, S9). These individuals’ ages were consistent with the overall ExAC age distribution (Figure S1), rather than being enriched below some age of disease onset. ExAC genotypes at the prion disease modifier polymorphism M129V [Capellari 2011] were consistent with population allele frequencies (Table S7), rather than enriched for the lower-risk heterozygous genotype. Certain PRNP variants are associated with highly atypical phenotypes [Moore 2001, Mead & Reilly 2015], which are mistakable for other dementias and may not be well ascertained by current surveillance efforts. Most of the variants found in our population control cohorts, however, have been reported in individuals with a classic, sporadic Creutzfeldt-Jakob disease phenotype [Nozaki 2010, Kong 2004, Mastrianni 2010, Zhang 2014, Tartaglia 2010, Peoc'h 2000], arguing that the discrepancy between observed and expected allele counts does not result primarily from an underappreciated prevalence of atypical prion disease.
Having observed a large excess of reportedly pathogenic variants over expectation in two datasets, and having excluded the most obvious confounders, we hypothesized that the unexpectedly high frequency of these variants in controls might arise from benign and/or low-risk variants.
We investigated which variants were responsible for the observed excess (Figure 2). Variants with the strongest prior evidence of pathogenicity are absent from ExAC and cumulatively account for ≤5 alleles in 23andMe, consistent with the known rarity of genetic prion disease. Much of the excess allele frequency in population controls is due, instead, to variants with very weak prior evidence of pathogenicity (Figure 2 and Supplementary Discussion). For four variants observed in controls (V180I, R208H, V210I, and M232R), pathogenicity is controversial [Beck 2012, Nozaki 2012] or reduced penetrance has been suggested [Capellari 2005, Ripoll 1993], but quantitative estimates of penetrance have never been produced, and the variants remain categorized as causes of genetic Creutzfeldt-Jakob disease [Kovacs 2005, Nozaki 2010]. Although we cannot prove that any one of the variants we observe in population controls is completely neutral, the list of reported pathogenic variants likely includes false positives. Indeed, the observation that 0.4% (236 / 60,706) of ExAC individuals harbor a rare (<0.1%) missense variant (Table S4) suggests that ~4 of every 1000 sporadic prion disease cases will, by chance, harbor such a variant, which in many cases will be interpreted and reported as causal given the long-standing classification of PRNP as a Mendelian disease gene.
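The chance-carrier argument above is a single proportion applied to a case series. As a minimal sketch, using only the counts quoted in the text (the helper name is hypothetical):

```python
rare_missense_carriers = 236   # ExAC individuals with a rare missense variant
exac_individuals = 60_706

# Background rate at which any individual carries such a variant
carrier_rate = rare_missense_carriers / exac_individuals


def chance_carriers(n_sporadic_cases, rate=carrier_rate):
    """Expected sporadic cases carrying a rare missense variant by chance."""
    return n_sporadic_cases * rate


per_thousand = chance_carriers(1000)
print(f"{per_thousand:.1f} chance carriers per 1000 sporadic cases")
```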
Figure 2. Reportedly pathogenic PRNP variants include Mendelian, benign, and intermediate variants. Prior evidence of pathogenicity is extremely strong for four missense variants — P102L, A117V, D178N and E200K — each of which has been observed to segregate with disease in multiple multigenerational families [Hsiao 1989, Goldfarb 1990, Hsiao 1991a, Hsiao 1991b, Medori 1992, Medori 1992, Mastrianni 1995, Webb 2008] and to cause spontaneous disease in mouse models [Hsiao 1990, Dossena 2008, Jackson 2009, Yang 2009, Jackson 2013, Bouybayoune 2015]. These account for >50% of genetic prion disease cases (Table S1), yet are absent from ExAC (Table S3), and collectively appear on ≤5 alleles in 23andMe’s cohort (Table S5), indicating allele frequencies sufficiently low to be consistent with the prevalence of genetic prion disease (Figure 1). Conversely, the variants most common in controls and rare in cases had categorically weak prior evidence for pathogenicity. R208C (8 alleles in 23andMe) and P39L were observed in patients presenting clinically with other dementias, with prion disease suggested as an alternative diagnosis solely on the basis of finding a novel PRNP variant [Bernardi 2014, Zheng 2008]. E196A was originally reported in the literature in a single patient, with a sporadic Creutzfeldt-Jakob disease phenotype and no family history [Zhang 2014], and appeared in only 2 of 790 Chinese prion disease patients in a recent case series [Shi 2015], consistent with the ~0.1% allele frequency among Chinese individuals in ExAC (Tables S5 and S8). At least three variants (M232R, V180I, and V210I) occupy a space inconsistent with either neutrality or with complete penetrance (see main text and Figure 3). R148H, T188R, V203I, R208H and additional variants are discussed in Supplementary Discussion.
At least three variants, however (V180I, V210I, and M232R), fail to cluster with either the likely benign or likely Mendelian variants (Figure 2). Because each of these three appears primarily in one population in both cases and controls (Tables S1, S5, S7), we compared allele frequencies in matched population groups. Each has an allele frequency in controls that is too high for a fully penetrant, dominant prion disease-causing variant, and yet far lower than the corresponding allele frequency in cases (Figure 3).
Because we lack genome-wide SNP data on cases we are unable to directly correct for population stratification, which thus may contribute to the observed differences in allele frequencies. Geographic clusters of genetic prion disease have been recognized for decades [Masters 1979, Lee 1999, Mitrova & Belay 2002]. For example, nearly half of Italian prion disease cases with the V210I variant are concentrated within two regions of Italy [Ladogana 2005], so any non-uniform geographic sampling in cases versus controls would add some uncertainty to our penetrance estimates.
Nonetheless, the magnitude of the enrichment of certain variants in cases over controls in our datasets makes substructure an implausible explanation for the entire difference. In order for V210I to be neutral and yet appear with an allele frequency of 8.1% in Italian cases despite an apparent allele frequency of 0.02% in Italian controls, it would need to be fixed in a subpopulation comprising 8% of Italy’s populace. Under this scenario, this subpopulation would need to be virtually unsampled in any of our control cohorts, and V210I cases would contain many homozygotes. In reality, no cases have been reported homozygous for this variant. Conversely, if V210I were fully penetrant, family history would be positive in most cases, and the variant’s appearance on 13 alleles in 23andMe (Table S5) would indicate that this variant alone accounts for three times the known prevalence of genetic prion disease (Figure 1A). Finally, if the low family history rate were due to many de novo mutations, then V210I cases would be more uniformly distributed across populations (Table S1). Similar arguments rule out V180I being either benign or Mendelian. M232R, though clearly not Mendelian, could still be benign as it exhibits only 4- to 6-fold enrichment in cases, an amount which might conceivably be explained by Japanese population substructure alone. However, because even common variants in PRNP affect prion disease risk with odds ratios of 3 or greater [Shibuya 1998, Bishop 2009, Mead 2012], it is not implausible that M232R has a similar effect size, and our data suggest that this is a more likely scenario than neutrality.
Satisfied that these three variants are likely neither benign nor Mendelian, we estimated lifetime risk in heterozygotes (Materials and Methods). The 1 in 1 million annual incidence of prion disease translates into a baseline lifetime risk of ~1 in 10,000 in the general population (Materials and Methods). Because prion diseases are so rare, even the massive enrichment of heterozygotes in cases (Figure 3), implying odds ratios on the order of 10 to 1,000, corresponds to only low penetrance, with lifetime risk for M232R, V180I and V210I estimated near 0.1%, 1%, and 10%, respectively. Although our estimates are imperfect due to population stratification, they accord well with family history rates (Figure 3) and explain the unique space that these variants occupy in the plot of case versus control allele count (Figure 2). These data indicate that PRNP missense variants occupy a risk continuum rather than a dichotomy of causal versus benign.
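The relationship between case enrichment and lifetime risk can be illustrated with Bayes' rule, P(disease | genotype) = P(genotype | disease) × P(disease) / P(genotype). The sketch below is an assumption about the general form of such a calculation, not the procedure in Materials and Methods: it takes the ~1 in 10,000 baseline risk quoted above, uses the rounded V210I allele frequencies quoted in the text (8.1% in Italian cases, 0.02% in Italian controls), and omits confidence intervals. It is therefore not expected to reproduce the estimates shown in Figure 3.

```python
def lifetime_risk(case_allele_freq, control_allele_freq, baseline_risk=1e-4):
    """Illustrative P(disease | heterozygote) via Bayes' rule, capped at 1.

    For rare alleles, heterozygote frequency is approximately twice the
    allele frequency in both cases and controls, so the factor of 2
    cancels and the ratio of allele frequencies is used directly.
    """
    if control_allele_freq <= 0:
        raise ValueError("control allele frequency must be positive")
    enrichment = case_allele_freq / control_allele_freq
    return min(1.0, enrichment * baseline_risk)


# Rounded frequencies quoted in the text for V210I (illustrative inputs)
v210i_risk = lifetime_risk(0.081, 0.0002)
print(f"Illustrative V210I lifetime risk: {v210i_risk:.1%}")
```

With these rounded inputs, a roughly 400-fold enrichment in cases still corresponds to a lifetime risk of only a few percent, which is the point made in the paragraph above: when baseline risk is very low, even large odds ratios imply low penetrance.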
Figure 3. Certain variants confer intermediate amounts of lifetime risk. M232R, V180I, and V210I show varying degrees of enrichment in cases over controls, indicating a weak to moderate increase in risk. Best estimates of lifetime risk in heterozygotes (Materials and Methods) range from ~0.08% for M232R to ~7.8% for V210I, and correlate with the likelihood of family history. Allele frequencies for P102L, A117V, D178N and E200K are consistent with up to 100% penetrance, with confidence intervals including all reported estimates of E200K penetrance based on survival analysis, which range from ~60% to ~90% [Chapman 1994, Spudich 1995, D'Alessandro 1998, Mitrova & Belay 2002, Minikel 2014]. Rates of family history of neurodegenerative disease in Japanese cases are from Table S10 and in European populations are from [Kovacs 2005], with Wilson binomial confidence intervals shown. *Based on allele counts rounded for privacy (Materials and Methods). †GSS: Gerstmann-Sträussler-Scheinker disease associated with variants P102L, A117V and G131V. ‡FFI: fatal familial insomnia associated with a D178N cis 129M haplotype.
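For readers unfamiliar with the Wilson binomial confidence interval mentioned in the caption, a minimal sketch of the standard Wilson score formula follows. The counts in the usage line are hypothetical and are not taken from Table S10 or [Kovacs 2005].

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical example: 3 of 20 cases with a positive family history
low, high = wilson_interval(3, 20)
print(f"15% observed, Wilson 95% CI: {low:.1%} to {high:.1%}")
```

Unlike the simple normal approximation, the Wilson interval stays within 0 to 1 and behaves sensibly for the small counts typical of rare-variant case series.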
We asked whether the same was true of protein-truncating variants. PRNP possesses only one protein-coding exon, so premature stop codons are expected to result in truncated polypeptides rather than in nonsense-mediated decay. Prion diseases are known to arise from a gain of function, as neurodegeneration is not seen in mice, cows, or goats lacking PrP [Bueler 1992, Richt 2007, Yu 2009, Benestad 2012], and the rate of prion disease progression is tightly correlated with PrP expression level [Fischer 1996]. Yet heterozygous C-terminal (residue ≥145) truncating variants are known to cause prion disease, sometimes with peripheral amyloidosis [Mead & Reilly 2015]. These patients also experience sensorimotor neuropathy phenotypically similar to that present in homozygous, but not heterozygous, PrP knockout mice [Bremer 2010], but attributed to amyloid infiltration of peripheral nerves, rather than loss of PrP function [Mead & Reilly 2015].
We identified, for the first time, heterozygous N-terminal (residue ≤131) truncating variants in four ExAC individuals and were able to obtain Sanger validation (Figure S2) and limited phenotype data (Table S11) for three. These individuals are free of overt neurological disease at ages 79, 73, and 52, and report no personal or family history of neurodegeneration nor of peripheral neuropathy. Therefore, the pathogenicity of protein-truncating variants appears to be dictated by position within PrP’s amino acid sequence (Figure 4). Observing three PRNP nonsense variants in ExAC is consistent with the expected number (~3.9) once we adjust our model [Samocha 2014] to exclude codons ≥145, where truncations cause a dominant gain-of-function disease. Thus, we see no evidence that PRNP is constrained against truncation in its N terminus. This, combined with the lack of any obvious phenotype in individuals with N-terminal truncating variants, suggests that heterozygous loss of PrP function is tolerated.
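One simple way to see that three observed nonsense variants is unremarkable against an expectation of ~3.9 is to treat the count as Poisson-distributed. This is an illustrative assumption made here, not the test statistic of the constraint model cited above [Samocha 2014].

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i)
               for i in range(k + 1))


# Probability of seeing 3 or fewer variants when ~3.9 are expected
p_three_or_fewer = poisson_cdf(3, 3.9)
print(f"P(X <= 3 | expected 3.9) = {p_three_or_fewer:.2f}")
```

Under this assumption, a count of three or fewer would arise a little under half the time, so the observation gives no indication of depletion of N-terminal truncating variants.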
Figure 4. Effects of truncating variants in the human prion protein are position-dependent. Truncating variants reported in prion disease cases in the literature (Table S2) and in our cohorts (Table S1) cluster exclusively in the C-terminal region (residue ≥145), while truncating variants in ExAC are more N-terminal (residue ≤131). The ortholog of each residue from 23-94 is deleted in at least one prion-susceptible transgenic mouse line [Aguzzi 2008]. C-terminal truncations abolish PrP’s glycosylphosphatidylinositol anchor but leave most of the protein intact, a combination that mediates gain of function through mislocalization, causing this normally cell-surface-anchored protein to be secreted. Consistent with this model of pathogenicity, mice expressing full-length secreted PrP develop fatal and transmissible prion disease [Chesebro 2010, Stohr 2011]. By contrast, the N-terminal truncating variants that we observe retain only residues dispensable for prion propagation, and are likely to cause a total loss of protein function.
Over 100,000 genetic variants have been reported to cause Mendelian disease in humans [Stenson 2014, Landrum 2014]. Many such reports do not meet current standards for assertions of pathogenicity [MacArthur 2014, Richards 2015], and if all such reports were believed, the cumulative frequency of these variants in the population would imply that most people have a genetic disease [Lek 2015]. It is generally unclear how much of the excess burden of purported disease variants in the population is due to benign variants falsely associated, and how much is due to variants with genuine association but incomplete penetrance.
Here we leverage newly available large genomic reference datasets to re-evaluate reported disease associations in a dominant disease gene, PRNP. We identify some missense variants as likely benign while showing that others span a spectrum from <0.1% to ~100% penetrance. Our analyses provide quantitative estimates of lifetime risk for hundreds of asymptomatic individuals who have inherited incompletely penetrant PRNP variants.
Available datasets are only now approaching the size and quality required for such analyses, resulting in limitations for our study. The confidence intervals on our lifetime risk estimates span more than an order of magnitude, and our inability to perfectly control for population stratification injects additional uncertainty. We have been unable to reclassify those PRNP variants that are very rare both in cases and in controls (Supplementary Discussion). We have avoided analysis of large insertions that are poorly called with short sequencing reads, though we note that existing literature on these insertions is consistent with a spectrum of penetrance similar to that which we observe for missense variants [Kong 2004, Mead 2006b]. Penetrance estimation in Mendelian disease will be improved by the collection of larger case series, particularly with genome-wide SNP data to allow more accurate population matching. This, coupled with continued large-scale population control sequencing and genotyping efforts, should reveal whether the dramatic variation in penetrance that we observe here is a more general feature of dominant disease genes.
Because PrP is required for prion pathogenesis and reduction in gene dosage slows disease progression [Bueler 1993, Fischer 1996, Mallucci 2003, Safar 2005], several groups have sought to therapeutically reduce PrP expression using RNAi [White 2008, Pulford 2010, Ahn 2014], antisense oligonucleotides [Nazor Friberg 2012], or small molecules [Karapetyan & Sferrazza 2013, Silber 2014]. Our discovery of heterozygous loss-of-function variants in three healthy older humans provides the first human genetic data regarding the effects of a 50% reduction in gene dosage for PRNP. Both the number of individuals and the depth of available phenotype data are limited, and lifelong heterozygous inactivation of a gene is an imperfect model of the effects of pharmacological depletion of the gene product. With those limitations, our data provide preliminary evidence that a reduction in PRNP dosage, if achievable in patients, is likely to be tolerated. Increasingly large control sequencing datasets will soon enable testing whether the same is true of other genes currently being targeted in substrate reduction therapeutic approaches for other protein-folding disorders.
Together, our findings highlight the value of large reference datasets of human genetic variation for informing both genetic counseling and therapeutic strategy.
Prion disease is considered a notifiable diagnosis in most developed countries, with mandatory reporting of all suspect cases to a centralized surveillance center. Surveillance was carried out broadly according to established guidelines [WHO 1998, WHO 2003], with specifics as described previously for Australia [Collins 2002], France [Brandel 2011], Germany [Windl 1999, Grasbon-Frodl 2004, Zerr 2009], Italy [Puopolo 2003], Japan [Nozaki 2010], and the Netherlands [Jansen 2012]. Sanger sequencing of the PRNP open reading frame was performed as described [Parchi 1999]. We included only prion disease cases classified as definite (autopsy-confirmed) or probable according to published guidelines [WHO 2003]. Criteria for genetic testing vary between countries and over the years of data collection, with testing offered only on indication of family history in some times and places, and testing of all suspect cases with tissue available in other instances. Summary statistics on the total number and proportion of cases sequenced are presented in Table S1.
The ascertainment, sequencing, and joint calling of the ExAC dataset have been described previously [Lek 2015]. We extracted all rare (<0.1%) coding variant calls in PRNP with genotype quality (GQ) ≥10, alternate allele depth (AD) ≥3 and alternate allele balance (AB) ≥20%. Read-level evidence was visualized using Integrative Genomics Viewer (IGV) [Robinson 2011] for manual review. Because most ExAC exomes were sequenced with 76bp reads and the PRNP octapeptide repeat region (codons 50-90 inclusive) is 123bp long, it was impossible to determine whether genotype calls in this region were correct, and they were not considered further. After review of IGV screenshots, 87% of genotype calls were judged to be correct and were included in Table S3. Of the genotype calls judged to be correct, 99% had genotype quality (GQ) ≥95, 99% had allelic balance (AB) between 30% and 70%, and 97% had ≥10 reads supporting the alternate allele.
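The genotype filters described above can be expressed as a simple predicate. The sketch below assumes each call is summarized by its genotype quality, alternate allele depth, and total depth; these field names are illustrative and do not reflect the ExAC file format or the original pipeline. It also checks the read-length argument for excluding the octapeptide repeat region.

```python
def passes_filter(gq, alt_depth, total_depth,
                  min_gq=10, min_alt_depth=3, min_allele_balance=0.20):
    """Apply the GQ, AD and AB thresholds stated in the text."""
    if total_depth <= 0:
        return False
    allele_balance = alt_depth / total_depth
    return (gq >= min_gq
            and alt_depth >= min_alt_depth
            and allele_balance >= min_allele_balance)


# Hypothetical calls: (GQ, alternate allele depth, total depth)
calls = [(99, 14, 30), (8, 5, 20), (40, 2, 25), (60, 4, 40)]
kept = [c for c in calls if passes_filter(*c)]

# Octapeptide repeat region: codons 50-90 inclusive, 3 bp per codon
repeat_region_bp = (90 - 50 + 1) * 3
read_length_bp = 76
region_exceeds_read = repeat_region_bp > read_length_bp

print(kept)
print(repeat_region_bp, region_exceeds_read)
```

Calls passing these thresholds were still subject to manual review in IGV, so the filter is only the first step of the procedure described above.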
All participants provided informed consent for exome sequencing and analysis. The Exome Aggregation Consortium’s aggregation and release of exome data have been approved by the Partners Healthcare Institutional Research Board (2013P001339). ExAC data have been publicly released at http://exac.broadinstitute.org/ and IGV screenshots of the rare PRNP variants deemed to be genuine and included in this study are available at https://github.com/ericminikel/prnp_penetrance/tree/master/supplement/igv
Participants were drawn from the customer base of 23andMe, Inc., a personal genetics company (accessed February 6, 2015). All participants provided informed consent under a protocol approved by an external AAHRPP-accredited IRB, Ethical & Independent Review Services (E&I Review). DNA extraction and genotyping were performed on saliva samples by National Genetics Institute (NGI), a CLIA-licensed clinical laboratory and a subsidiary of Laboratory Corporation of America. Samples were genotyped on one of four Illumina platforms (V1-V4) as described previously [Bryc 2015]. Of the PRNP SNPs considered, two (P105L and E200K) were genotyped on all four platforms while the other 14 were genotyped only on V3 and V4, resulting in differing numbers of total samples genotyped (Table S5). Genotypes were called with Illumina GenomeStudio. A 98.5% call rate was required for all samples. As with all 23andMe research participants, individuals whose genotyping analyses repeatedly failed to reach the desired call rate were recontacted to provide additional samples. A maximal set of unrelated individuals was chosen based on segmental identity-by-descent (IBD) estimation [Durand 2014a]. Individuals were defined as related if they shared more than 700 cM IBD (approximately the minimal expected sharing between first cousins). Allele counts between 1 and 5 were rounded up to 5 to protect individual privacy (Table S5). Rounding down to 1 instead would raise our estimates of penetrance for V180I to 7.7% (95%CI, 1.2% - 50%) and for P102L, A117V, D178N and E200K collectively to 100% (95%CI, 100% - 100%), but the confidence intervals would still overlap those based on ExAC allele frequencies, and the overall conclusions of our study would remain unchanged.
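The privacy rounding and the relatedness threshold described above amount to two small rules, sketched below. The function names are hypothetical; selection of a maximal unrelated set [Durand 2014a] is not reproduced here.

```python
def round_for_privacy(allele_count, floor=5):
    """Report allele counts between 1 and `floor` as `floor`.

    Counts of zero and counts above the floor are left unchanged.
    """
    if allele_count < 0:
        raise ValueError("allele count cannot be negative")
    if 1 <= allele_count <= floor:
        return floor
    return allele_count


def is_related(shared_ibd_cm, threshold_cm=700):
    """Pairs sharing more than the threshold in cM IBD count as related."""
    return shared_ibd_cm > threshold_cm


reported = [round_for_privacy(c) for c in (0, 1, 3, 5, 13)]
print(reported)
print(is_related(850), is_related(700))
```

Because any true count from 1 to 5 is reported as 5, control allele frequencies for the rarest variants are upper bounds, which makes the corresponding penetrance estimates conservative, as the sensitivity analysis above indicates.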
Ancestral origins of chromosomal segments were assigned on a continental level (European, Latino, African, and East Asian) and a country level (Japanese) as described by Durand et al. [Durand 2014b]. Briefly, after phasing genotypes using an out-of-sample implementation of the Beagle algorithm [Browning & Browning 2007], a string kernel support vector machine classifier assigned tentative ancestry labels to local genomic regions. Then an autoregressive pair hidden Markov model was used to simultaneously correct phasing errors and produce reconciled local ancestry estimates and confidence scores based on the initial assignment. Finally, isotonic regression models were used to recalibrate the confidence estimates.
Europeans and East Asians were defined as individuals with more than 97% of chromosomal segments predicted as being from the respective ancestries. Because African Americans and Latinos are highly admixed, no single threshold of genome-wide ancestry is sufficient to distinguish them. However, segment length distributions of European, African, and Native American ancestries are different between African Americans and Latinos, due to distinct admixture timing in the two ethnic groups. Thus, a logistic classifier based on segment length of European, African, and Native American ancestries was used to distinguish between African Americans and Latinos.
At the country level, individuals were classified as Japanese if their fraction of Japanese local ancestry exceeded a threshold of 90%. This threshold is based on the average fraction of local ancestry in the reference population (23andMe research participants with all four grandparents from the reference country): 94% (5% SD, N=533) for Japanese. Using the same approach, we were unable to obtain a confident set of Italian individuals for analysis of V210I due to extensive admixture. 23andMe research participants with all four grandparents from Italy have on average only 66% (18% SD, N=2090) Italian ancestry, and only ~60 participants have >90% Italian ancestry.
We computed ten principal components based on ~5,800 common SNPs as described [Purcell 2014, Lek 2015]. A centroid in eigenvalue-weighted principal component space was generated for each HapMap population based on 1000 Genomes individuals in ExAC. The remaining individuals in ExAC were assigned to the HapMap population with the nearest centroid according to eigenvalue-weighted Euclidean distance. Ancestries of all individuals, including those with reportedly pathogenic variants, are summarized in Tables S7 and S8.
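A minimal sketch of the nearest-centroid assignment follows. It assumes that "eigenvalue-weighted" means each squared coordinate difference is multiplied by the eigenvalue of the corresponding principal component; the exact weighting used in the study is not specified here, so that choice, the function names, and the two-component toy data are all illustrative assumptions.

```python
import math


def weighted_distance(a, b, eigenvalues):
    """Euclidean distance with each PC's squared difference scaled by its eigenvalue (assumed weighting)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, eigenvalues)))


def population_centroids(reference):
    """Mean PC vector per population; reference maps population code -> list of PC vectors."""
    return {
        pop: [sum(column) / len(vectors) for column in zip(*vectors)]
        for pop, vectors in reference.items()
    }


def assign_population(sample, centroids, eigenvalues):
    """Return the population whose centroid is nearest to the sample."""
    return min(centroids, key=lambda pop: weighted_distance(sample, centroids[pop], eigenvalues))


# Hypothetical toy data with two principal components instead of ten.
toy_reference = {
    "CEU": [[0.0, 0.0], [0.2, 0.1]],
    "JPT": [[1.0, 1.0], [1.2, 0.9]],
}
toy_eigenvalues = [2.0, 1.0]
toy_centroids = population_centroids(toy_reference)
print(assign_population([0.9, 1.1], toy_centroids, toy_eigenvalues))
```

Weighting by eigenvalue gives the leading components, which capture the most variance, more influence on the assignment than the trailing ones.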
The reported incidence of prion disease varies between countries and between years, with much of the variability explained by the intensity of surveillance, as measured by the number of cases referred to national surveillance centers [Klug 2013]. Rates of ~1 case per million population per year have been reported, for instance in the U.S. [Holman 2010] and in Japan [Nozaki 2010]; however, the countries with the most intense surveillance (greatest number of referrals per capita), such as France and Austria, observe incidence figures as high as 2 cases per million population per year [Klug 2013]. Only in small countries where the statistics are dominated by a particular genetic prion disease founder mutation, such as Israel and Slovakia [Chapman 1994, Mitrova & Belay 2002], has an incidence higher than 2 per million been consistently observed [EUROCJD Surveillance Data]. We therefore accepted 2 cases per million as an upper bound for the true incidence of prion disease. Assuming an all-causes death rate of ~10 per 1,000 annually [UN Population and Vital Statistics Report], this incidence corresponds to prion disease accounting for ~0.02% of all deaths, which we accepted as the baseline disease risk in the general population.
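The arithmetic behind the ~0.02% figure can be written out explicitly. The sketch below simply divides the annual incidence by the annual all-causes death rate, using the two values stated above; the function and variable names are illustrative.

```python
def baseline_lifetime_risk(incidence_per_person_year, death_rate_per_person_year):
    """Fraction of all deaths attributable to the disease, used as baseline lifetime risk."""
    return incidence_per_person_year / death_rate_per_person_year


prion_incidence = 2 / 1_000_000   # upper-bound incidence: 2 cases per million per year
all_cause_death_rate = 10 / 1_000  # ~10 deaths per 1,000 per year

baseline_risk = baseline_lifetime_risk(prion_incidence, all_cause_death_rate)
print(f"{baseline_risk:.4%}")  # prints 0.0200%
```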
By Bayes' theorem, the probability of disease given a genotype (penetrance or lifetime risk, P(D|G)) is equal to the proportion of individuals with the disease who have the genotype (genotype frequency in cases, P(G|D)) times the prevalence of the disease (baseline lifetime risk in the general population, P(D)), divided by the frequency of the genotype in the general population (here, population control allele frequency, P(G)). The use of this formula to estimate disease risk dates back at least to Cornfield's estimation of the probability of lung cancer in smokers [Cornfield 1951], with later contributions by Woolf [Woolf 1955] and a synthesis by C.C. Li with application to genetics [Li 1961].
We used an allelic rather than genotypic model, such that lifetime risk in an individual with one allele is equal to case allele frequency (based on the number of prion disease cases that underwent PRNP sequencing) times baseline risk divided by population control allele frequency, P(D|A) = P(A|D)×P(D)/P(A). Note that we assume that our population control datasets include individuals who will later die of prion disease, thus enabling direct use of the ExAC and 23andMe allele frequencies as the denominator P(A). Following Kirov [Kirov 2014], we compute Wilson 95% confidence intervals on the binomial proportions P(A|D) and P(A), and calculate the upper bound of the 95% confidence interval for penetrance using the upper bound on case allele frequency and the lower bound on population control allele frequency, and vice versa for the lower bound on penetrance.
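As a sketch of the calculation just described, the code below computes the point estimate P(D|A) = P(A|D)×P(D)/P(A) and combines Wilson 95% intervals on the two allele frequencies into bounds on penetrance. The allele counts in the example are hypothetical and chosen only to exercise the functions; the capping of estimates at 100% and the requirement of a nonzero control allele count are simplifying assumptions, and the study's own scripts (linked below) remain the definitive implementation.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p_hat = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p_hat + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


def penetrance(case_ac, case_an, control_ac, control_an, baseline_risk=2e-4):
    """Return (point, lower, upper) lifetime risk for carriers of one allele."""
    if control_ac <= 0:
        raise ValueError("this sketch requires at least one control allele")
    point = (case_ac / case_an) * baseline_risk / (control_ac / control_an)
    case_lo, case_hi = wilson_interval(case_ac, case_an)
    ctrl_lo, ctrl_hi = wilson_interval(control_ac, control_an)
    # Lower bound: low case frequency over high control frequency, and vice versa.
    lower = case_lo * baseline_risk / ctrl_hi
    upper = case_hi * baseline_risk / ctrl_lo
    # A probability cannot exceed 1, so cap each estimate at 100%.
    return min(1.0, point), min(1.0, lower), min(1.0, upper)


# Hypothetical counts: 50 alleles in 10,000 cases, 5 alleles in 60,000 controls.
point, lower, upper = penetrance(case_ac=50, case_an=2 * 10_000,
                                 control_ac=5, control_an=2 * 60_000)
print(f"penetrance {point:.2%} (95% CI {lower:.2%} - {upper:.2%})")
```

Pairing the lower bound of one interval with the upper bound of the other gives a conservative interval, since it treats the two extremes as occurring together.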
Data processing, analysis, and figure generation utilized custom scripts written in Python 2.7.6 and R 3.1.2. These scripts, along with vector graphics of all figures and tab-delimited text versions of all supplementary tables, are available online at https://github.com/ericminikel/prnp_penetrance.
Research reported in this publication was partially supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) and the National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health, under awards U54DK105566 and R01GM104371, by Broad Institute NextGen funds, and by Prion Alliance sundry funds. Sonia Vallabh is supported by the National Science Foundation (NSF) Graduate Research Fellowship Program (GRFP) grant number 2015214731. U.S. prion surveillance work was conducted under Centers for Disease Control and Prevention (CDC) contract UR8/CCU515004. Japanese prion surveillance work was supported by a grant-in-aid from the Research Committee of Prion Disease and Slow Virus Infection, the Ministry of Health, Labour and Welfare of Japan, and from the Research Committee of Surveillance and Infection Control of Prion Disease, the Ministry of Health, Labour and Welfare of Japan. The French surveillance network is supported by the Institut National de veille Sanitaire. German prion surveillance work was supported by Robert Koch-Institute / Federal Ministry of Health grant 1369-341. The UK National CJD Research and Surveillance Unit is supported by the Department of Health and the Scottish Executive. The Australian National Creutzfeldt-Jakob Disease Registry is funded by the Commonwealth Department of Health. SJC is supported by an NHMRC Practitioner Fellowship: identification #APP1005816.
Contributions at Erasmus MC were supported by Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO) sponsored Netherlands Consortium for Healthy Aging (NCHA; project 050-060-810), by the Genetic Laboratory of the Department of Internal Medicine, Erasmus MC, by a Complementation Project of the Biobanking and Biomolecular Research Infrastructure Netherlands (BBMRI-NL; www.bbmri.nl ; project number CP2010-41), by Erasmus Medical Center and Erasmus University, Rotterdam, Netherlands Organization for the Health Research and Development (ZonMw), the Research Institute for Diseases in the Elderly (RIDE), the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the European Commission (DG XII), and the Municipality of Rotterdam. We thank the customers of 23andMe, ExAC research participants, and prion disease patients and families who participated in this research.
E.V.M., S.M.V. and D.G.M. conceived and designed the study. E.V.M. analyzed data, generated figures, and wrote the manuscript. S.M.V. and E.V.M. reviewed literature and IGV screenshots. K.E.S. performed constraint analyses. M.Lek, K.E., K.E.S., K.J.K., A.H.O.-L., M.J.D., and D.G.M. consulted on data analysis and interpretation. J.F.S., C.Y.M., J.Y.C., and L.P.C.Y. prepared and consulted on analysis of 23andMe data. P.G., J.B., S.Z., Y.C., W.C., M.Y., T.H., N.S., H.M., Y.N., T.K., S.J.C., A.B., R.G.W., R.Knight, C.P., I.Z., T.F.J.K., S.E., A.G., M.C., J.d.P.C., S.H., J.-L.L., E.B.-A., J.-P.B., S.C., P.P., A.L., A.P., R.Kraaij, J.G.J.v.R., S.J.v.d.L., R.M., and C.v.D. prepared and consulted on analysis of prion surveillance data. E.V.M., J.L.M., M.B., M.Laakso, K.M., A.K., K.C., S.A.M., P.S., C.M.H., S.M.P., P.S., C.v.D., F.R.R., A.H., A.I., S.J.v.d.L., J.M.V.-D., and A.G.U. prepared and consulted on analysis of data regarding protein-truncating variants. ExAC provided exome sequence data.
Table of contents:
- Supplementary Discussion
- Additional variants
- Dominant versus allelic models
- Table S1. Allele counts of rare PRNP variants in 16,025 definite and probable prion disease cases in 9 countries
- Table S2. Rare PRNP variants reported in peer-reviewed literature to cause prion disease
- Table S3. Allele counts of rare PRNP variants in 60,706 individuals in ExAC
- Table S4. Summary of rare PRNP variants by functional class in ExAC
- Table S5. Allele counts of 16 reportedly pathogenic PRNP variants in >500,000 23andMe research participants
- Table S6. Phenotypes investigated in studies in which ExAC individuals with reportedly pathogenic PRNP variants were ascertained
- Table S7. Inferred ancestry and codon 129 genotypes of ExAC individuals with reportedly pathogenic variants
- Table S8. Inferred ancestry of all ExAC individuals
- Table S9. Inferred ancestry of 23andMe research participants
- Table S10. Details of Japanese prion disease cases
- Table S11. Phenotypes of individuals with N-terminal PrP truncating variants
- Figure S1. Age of ExAC individuals with reportedly pathogenic PRNP variants versus all individuals in ExAC
- Figure S2. Sanger sequencing results for individuals with N-terminal truncating variants
Additional variants
Of the 63 reportedly pathogenic variants (Table S2), 10 are discussed in the main text. Of those 10, our data and our analysis of the literature indicate high penetrance for 4 (P102L, A117V, D178N, and E200K), intermediate penetrance for 3 (V180I, V210I, and M232R), and suggest that 3 others may be benign (P39L, E196A, and R208C). In this section we discuss four additional variants that we cannot conclusively reclassify but which are unlikely to be highly penetrant, and we also provide a brief discussion of interpretation for remaining variants.
- R148H has been reported in two isolated patients with a sporadic Creutzfeldt-Jakob disease phenotype and negative family history [Krebs 2005, Pastore 2005] and appears one additional time in our case cohorts (Table S1). Based on its rarity in cases, lack of familial segregation and presence on 3 alleles in ExAC, it is unlikely to be a highly penetrant Mendelian variant. It might be benign or might slightly increase prion disease risk.
- T188R has been reported in two cases in the literature. One German individual presented with a sporadic Creutzfeldt-Jakob disease phenotype but no autopsy was performed; family history was negative [Windl 1999, Roeber 2008]. One Mexican-American individual had autopsy-confirmed prion disease and an ambiguous family history [Tartaglia 2010]. This variant appears 12 times in our case cohort (all in the United States) and 3 times in ExAC (all in Latino populations). Based on its allele frequency in controls, rarity in cases and lack of any clear evidence for segregation in families, T188R is unlikely to be a highly penetrant Mendelian disease variant. It is not clear whether it is benign or increases prion disease risk.
- V203I has been reported in three heterozygous patients - one Italian [Peoc'h 2000], one Korean [Jeong 2010], and one Chinese [Shi 2013], as well as in one Japanese homozygote [Komatsu 2014]. Family history is negative in all of these reported patients as well as in two additional V203I cases in our Japanese case cohort (Table S10). In our cohorts, this variant appears in a total of 16 cases from several countries; in ExAC, it appears in 3 European individuals. Based on its allele frequency in controls, rarity in cases and lack of any clear evidence for segregation in families, V203I is unlikely to be a highly penetrant Mendelian disease variant, and could be benign or could increase prion disease risk. The report of prion disease in a V203I homozygote makes us slightly inclined to favor the interpretation that V203I does increase prion disease risk.
- R208H has been reported in several isolated cases of varied ancestries, all with a negative family history [Mastrianni 1996, Capellari 2005, Roeber 2005, Basset-Leobon 2006, Chen 2011, Matej 2012, Vita 2013]. In our cohorts, it appears in 13 prion disease cases, 9 ExAC individuals and 22 individuals in the 23andMe database. Given its high frequency in controls, this variant may be benign or may slightly increase prion disease risk.
- Other variants. Excluding variants discussed in the main text and above, 0.8% (87 / 10460) of individuals in our case series harbor other rare PRNP missense variants, some of which have been reported as pathogenic (Table S2) and others of which have not. Because most of these variants are very rare both in cases and in population controls, comparisons of case and control allele frequency are not well powered to evaluate the pathogenicity of most individual variants, and we are unable to reach any firm conclusions about their pathogenicity. Collectively, our data indicate that this category includes at least some variants that increase prion disease risk, because only 0.3% (187 / 60706) of ExAC individuals harbor a rare missense variant other than those discussed in the main text or above, whereas 0.8% (87 / 10460) of prion disease cases harbor one of these variants, a significant enrichment (p = 1 × 10⁻¹², Fisher's exact test). Indeed, Mendelian segregation has been demonstrated for some of these variants, such as T183A and F198S [Nitrini 1997, Hsiao 1992]. However, the fact that, in the aggregate, we observe only modest (~3-fold) enrichment of such variants in cases versus controls suggests that this category also includes many neutral or very low-risk variants, consistent with our expectation that sporadic prion disease cases should, by chance, harbor some rare variants unassociated with disease. We also cannot exclude the possibility that some specific rare variants, particularly those observed in controls and not in cases, could be protective.
- Future novel missense variants. Additional novel missense variants in PRNP are sure to be observed in prion disease patients in the future. Our findings that some reportedly pathogenic variants are either benign or exhibit low penetrance, together with our observation that ~4 in 1000 controls harbor a rare PRNP missense variant, urge caution in the interpretation of novel variants in prion disease patients. This is consistent with current guidelines [MacArthur 2014, Richards 2015], which indicate that novel protein-altering variants, even in established disease genes, should not be assumed to be causal or highly penetrant until evidence such as Mendelian segregation or significant enrichment in cases over controls can be established.
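The aggregate enrichment of other rare missense variants described in the list above (87 of 10,460 cases versus 187 of 60,706 ExAC individuals) can be checked with a short standard-library sketch. It computes the fold enrichment and the one-sided hypergeometric tail probability, which is the upper-tail component of Fisher's exact test. The reported p-value may have been computed two-sided and with different software, so the tail probability here is expected to be of similar magnitude rather than identical; the function names are illustrative.

```python
import math


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_upper_tail(k_obs, n_cases, n_carriers, n_total):
    """P(at least k_obs of the carriers fall among the cases) under no association."""
    log_denom = log_choose(n_total, n_cases)
    total = 0.0
    for k in range(k_obs, min(n_cases, n_carriers) + 1):
        log_p = (log_choose(n_carriers, k)
                 + log_choose(n_total - n_carriers, n_cases - k)
                 - log_denom)
        total += math.exp(log_p)
    return total


case_carriers, n_cases = 87, 10460
control_carriers, n_controls = 187, 60706

fold_enrichment = (case_carriers / n_cases) / (control_carriers / n_controls)
p_one_sided = hypergeom_upper_tail(
    case_carriers, n_cases,
    case_carriers + control_carriers, n_cases + n_controls,
)
print(f"fold enrichment: {fold_enrichment:.2f}")
print(f"one-sided tail probability: {p_one_sided:.1e}")
```

Working in log space with `math.lgamma` avoids overflow in the binomial coefficients, which are far too large to represent directly at these sample sizes.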
Dominant versus allelic models
Virtually all patients ever reported with genetic prion disease have been heterozygous for the putative pathogenic variants. Five individuals homozygous for E200K [Simon 2000] were reported to have a younger age of onset than heterozygotes (mean 50 vs. 59 years, p = .03), suggesting some degree of codominance. There have been individual case reports of homozygotes for Q212P [Beck 2010] and V203I [Komatsu 2014], both without a family history among heterozygote relatives, which might suggest that dosage of the mutant allele is important. We are not aware of any other reports of individuals homozygous for potentially pathogenic variants in PRNP. Regardless of whether a dominant or allelic model is assumed, our formula for lifetime risk (Materials and Methods) gives identical point estimates of penetrance and virtually identical 95% confidence intervals.
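The equality of the point estimates can be seen with a small sketch. It assumes every carrier is heterozygous, so the number of carriers equals the allele count: under the dominant model the frequencies are per individual, under the allelic model per chromosome, and the factor of 2 cancels between cases and controls. The counts below are hypothetical and the function names are illustrative.

```python
import math


def allelic_penetrance(allele_count_cases, n_cases, allele_count_controls, n_controls,
                       baseline_risk=2e-4):
    """Risk per allele: frequencies are computed over 2N chromosomes."""
    case_af = allele_count_cases / (2 * n_cases)
    control_af = allele_count_controls / (2 * n_controls)
    return case_af * baseline_risk / control_af


def dominant_penetrance(carriers_cases, n_cases, carriers_controls, n_controls,
                        baseline_risk=2e-4):
    """Risk per carrier: frequencies are computed over N individuals."""
    case_freq = carriers_cases / n_cases
    control_freq = carriers_controls / n_controls
    return case_freq * baseline_risk / control_freq


# Hypothetical counts; all carriers heterozygous, so carriers == allele count.
allelic = allelic_penetrance(50, 10_000, 5, 60_000)
dominant = dominant_penetrance(50, 10_000, 5, 60_000)
print(math.isclose(allelic, dominant))  # prints True
```

The confidence intervals differ slightly between the two models because the binomial denominators differ (N individuals versus 2N chromosomes), which is consistent with the intervals being virtually, rather than exactly, identical.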
Table S1. Allele counts of rare PRNP variants in 16,025 definite and probable prion disease cases in 9 countries.
Abbreviations: OPRD, octapeptide repeat deletion; OPRI, octapeptide repeat insertion.
*V203I in Japan: two heterozygotes and one homozygote, four alleles total. All other individuals are heterozygotes.
country | Australia | France | Germany | Italy | Japan | Netherlands | Spain | U.K. | U.S. | TOTAL |
---|---|---|---|---|---|---|---|---|---|---|
Start year | 1993 | 1991 | 1993 | 1993 | 1999 | 1993 | 1993 | 1990 | 2000 | |
End year | 2014 | 2013 | 2015 | 2013 | 2014 | 2013 | 2013 | 2013 | 2014 | |
Definite plus probable cases | 553 | 2383 | 2690 | 1684 | 2144 | 409 | 1280 | 1963 | 2919 | 16025 |
Of which PRNP sequenced | 152 | 1774 | 1307 | 1054 | 1533 | 163 | 749 | 1088 | 2640 | 10460 |
Proportion sequenced | 27% | 74% | 49% | 63% | 72% | 40% | 59% | 55% | 90% | 65% |
Number with rare variants | 31 | 196 | 125 | 396 | 464 | 22 | 127 | 173 | 361 | 1895 |
Proportion with rare variants | 6% | 8% | 5% | 24% | 22% | 5% | 10% | 9% | 12% | 12% |
2-OPRD | 3 | 3 | ||||||||
1-OPRI | 2 | 1 | 4 | 7 | ||||||
2-OPRI | 1 | 5 | 6 | |||||||
3-OPRI | 1 | 1 | 2 | |||||||
4-OPRI | 1 | 3 | 2 | 13 | 4 | 23 | ||||
5-OPRI | 2 | 10 | 1 | 1 | 13 | 12 | 39 | |||
6-OPRI | 2 | 35 | 15 | 52 | ||||||
7-OPRI | 1 | 1 | 1 | 2 | 5 | |||||
8-OPRI | 10 | 10 | ||||||||
9-OPRI | 4 | 4 | ||||||||
10-OPRI | 1 | 1 | ||||||||
OPRI (length unspecified) | 9 | 8 | 17 | |||||||
A2V | 1 | 1 | ||||||||
G54S | 1 | 4 | 5 | |||||||
P84S | 1 | 1 | ||||||||
G88A | 1 | 1 | ||||||||
G94S | 1 | 1 | ||||||||
H96Y | 1 | 1 | ||||||||
P102L | 2 | 10 | 7 | 59 | 83 | 1 | 34 | 25 | 221 | |
P105L | 12 | 1 | 13 | |||||||
P105S | 1 | 1 | ||||||||
P105T | 3 | 2 | 5 | |||||||
G114V | 1 | 1 | ||||||||
A117V | 3 | 8 | 1 | 12 | 9 | 33 | ||||
G131V | 1 | 1 | ||||||||
S132I | 1 | 1 | ||||||||
A133V | 1 | 1 | 2 | |||||||
R148H | 1 | 2 | 3 | |||||||
Q160X | 1 | 1 | ||||||||
Y163X | 2 | 2 | ||||||||
D167G | 1 | 1 | ||||||||
V176G | 1 | 1 | ||||||||
D178N | 3 | 34 | 32 | 18 | 5 | 4 | 65 | 12 | 36 | 209 |
V180I | 1 | 1 | 218 | 5 | 225 | |||||
T183A | 3 | 3 | ||||||||
Q186X | 1 | 1 | ||||||||
H187A | 1 | 1 | ||||||||
H187R | 7 | 7 | ||||||||
T188A | 1 | 1 | ||||||||
T188K | 2 | 1 | 3 | |||||||
T188R | 12 | 12 | ||||||||
E196A | 1 | 1 | ||||||||
E196K | 3 | 8 | 2 | 13 | ||||||
F198S | 5 | 5 | ||||||||
E200G | 1 | 1 | ||||||||
E200K | 11 | 101 | 28 | 123 | 63 | 2 | 52 | 38 | 153 | 571 |
V203I | 5 | 3 | 4 | 5 | 17 | |||||
R208H | 1 | 2 | 7 | 1 | 4 | 15 | ||||
V210I | 4 | 13 | 19 | 171 | 1 | 3 | 36 | 247 | ||
E211Q | 5 | 2 | 3 | 1 | 11 | |||||
E211D | 1 | 1 | ||||||||
Q212P | 2 | 2 | ||||||||
I215V | 1 | 1 | ||||||||
Y218N | 1 | 1 | ||||||||
A224V | 1 | 1 | ||||||||
Y226X | 1 | 1 | ||||||||
Q227X | 1 | 1 | ||||||||
M232R | 63 | 63 | ||||||||
V180I and M232R in trans | 4 | 4 | ||||||||
Variant not specified | 5 | 5 | 2 | 12 |
Table S2. Rare PRNP variants reported in peer-reviewed literature to cause prion disease
Note: an updated version of this table is maintained in this blog post.
Table S3. Allele counts of rare PRNP variants in 60,706 individuals in ExAC.
Chromosomal positions are given in GRCh37 coordinates and HGVS notations are given relative to Ensembl transcript ENST00000379440. Mean read depth across the PRNP coding sequence was 55.21. Call rate is the proportion of ExAC individuals with a genotype call of genotype quality (GQ) ≥20 and a depth (DP) of ≥10 reads.
Chrom | Pos | Ref | Alt | HGVS | Variant | Class | Call rate | AC |
---|---|---|---|---|---|---|---|---|
20 | 4679863 | C | T | c.-4C>T | | non-coding | 97% | 1 |
20 | 4679871 | C | T | c.5C>T | A2V | missense | 97% | 2 |
20 | 4679877 | T | A | c.11T>A | L4H | missense | 98% | 3 |
20 | 4679877 | T | G | c.11T>G | L4R | missense | 98% | 1 |
20 | 4679888 | A | G | c.22A>G | M8V | missense | 98% | 1 |
20 | 4679901 | T | C | c.35T>C | F12S | missense | 98% | 1 |
20 | 4679916 | G | C | c.50G>C | S17T | missense | 98% | 10 |
20 | 4679920 | C | A | c.54C>A | D18E | missense | 98% | 2 |
20 | 4679920 | C | T | c.54C>T | D18D | synonymous | 98% | 18 |
20 | 4679927 | C | A | c.61C>A | L21I | missense | 98% | 1 |
20 | 4679932 | C | T | c.66C>T | C22C | synonymous | 98% | 2 |
20 | 4679935 | G | A | c.69G>A | K23K | synonymous | 98% | 2 |
20 | 4679939 | C | T | c.73C>T | R25C | missense | 98% | 2 |
20 | 4679944 | G | A | c.78G>A | P26P | synonymous | 98% | 6 |
20 | 4679967 | G | T | c.101G>T | G34V | missense | 98% | 1 |
20 | 4679969 | G | A | c.103G>A | G35S | missense | 98% | 1 |
20 | 4679975 | C | T | c.109C>T | R37X | nonsense | 98% | 1 |
20 | 4679982 | C | T | c.116C>T | P39L | missense | 98% | 3 |
20 | 4679983 | G | A | c.117G>A | P39P | synonymous | 98% | 8 |
20 | 4679986 | G | A | c.120G>A | G40G | synonymous | 98% | 12 |
20 | 4680005 | A | G | c.139A>G | N47D | missense | 98% | 1 |
20 | 4680026 | G | A | c.160G>A | G54S | missense | 97% | 78 |
20 | 4680028 | T | C | c.162T>C | G54G | synonymous | 97% | 5 |
20 | 4680038 | G | T | c.172G>T | G58W | missense | 97% | 1 |
20 | 4680045 | C | T | c.179C>T | P60L | missense | 96% | 1 |
20 | 4680055 | T | A | c.189T>A | G63G | synonymous | 96% | 1 |
20 | 4680077 | G | A | c.211G>A | G71S | missense | 96% | 1 |
20 | 4680089 | C | T | c.223C>T | Q75X | nonsense | 96% | 1 |
20 | 4680091 | G | A | c.225G>A | Q75Q | synonymous | 96% | 2 |
20 | 4680093 | C | G | c.227C>G | P76R | missense | 96% | 1 |
20 | 4680129 | G | C | c.263G>C | G88A | missense | 98% | 1 |
20 | 4680134 | G | A | c.268G>A | G90S | missense | 98% | 1 |
20 | 4680145 | T | G | c.279T>G | G93G | synonymous | 99% | 1 |
20 | 4680151 | C | T | c.285C>T | T95T | synonymous | 99% | 1 |
20 | 4680172 | G | A | c.306G>A | P102P | synonymous | 99% | 21 |
20 | 4680185 | A | G | c.319A>G | T107A | missense | 99% | 1 |
20 | 4680199 | C | T | c.333C>T | H111H | synonymous | 99% | 2 |
20 | 4680202 | G | A | c.336G>A | M112I | missense | 99% | 1 |
20 | 4680231 | T | G | c.365T>G | V122G | missense | 99% | 1 |
20 | 4680232 | G | T | c.366G>T | V122V | synonymous | 99% | 3 |
20 | 4680244 | C | A | c.378C>A | G126G | synonymous | 99% | 1 |
20 | 4680244 | C | T | c.378C>T | G126G | synonymous | 99% | 3 |
20 | 4680250 | C | T | c.384C>T | Y128Y | synonymous | 100% | 22 |
20 | 4680252 | T | C | c.386T>C | M129T | missense | 100% | 1 |
20 | 4680257 | G | T | c.391G>T | G131X | nonsense | 100% | 1 |
20 | 4680258 | G | T | c.392G>T | G131V | missense | 100% | 1 |
20 | 4680259 | A | G | c.393A>G | G131G | synonymous | 100% | 3 |
20 | 4680262 | T | C | c.396T>C | S132S | synonymous | 100% | 1 |
20 | 4680274 | G | A | c.408G>A | R136R | synonymous | 100% | 2 |
20 | 4680274 | G | T | c.408G>T | R136S | missense | 100% | 2 |
20 | 4680279 | T | C | c.413T>C | I138T | missense | 100% | 1 |
20 | 4680289 | C | T | c.423C>T | F141F | synonymous | 100% | 2 |
20 | 4680292 | C | T | c.426C>T | G142G | synonymous | 100% | 1 |
20 | 4680299 | T | G | c.433T>G | Y145D | missense | 100% | 1 |
20 | 4680308 | C | T | c.442C>T | R148C | missense | 100% | 1 |
20 | 4680309 | G | A | c.443G>A | R148H | missense | 100% | 3 |
20 | 4680311 | T | C | c.445T>C | Y149H | missense | 100% | 1 |
20 | 4680316 | T | C | c.450T>C | Y150Y | synonymous | 100% | 1 |
20 | 4680317 | C | T | c.451C>T | R151C | missense | 100% | 2 |
20 | 4680318 | G | A | c.452G>A | R151H | missense | 100% | 3 |
20 | 4680324 | A | G | c.458A>G | N153S | missense | 100% | 1 |
20 | 4680328 | G | A | c.462G>A | M154I | missense | 100% | 1 |
20 | 4680342 | A | G | c.476A>G | N159S | missense | 100% | 1 |
20 | 4680349 | G | A | c.483G>A | V161V | synonymous | 100% | 1 |
20 | 4680359 | C | T | c.493C>T | P165S | missense | 100% | 2 |
20 | 4680362 | A | G | c.496A>G | M166V | missense | 100% | 2 |
20 | 4680364 | G | A | c.498G>A | M166I | missense | 100% | 2 |
20 | 4680373 | C | T | c.507C>T | Y169Y | synonymous | 100% | 1 |
20 | 4680382 | G | A | c.516G>A | Q172Q | synonymous | 100% | 1 |
20 | 4680385 | C | T | c.519C>T | N173N | synonymous | 100% | 5 |
20 | 4680394 | G | A | c.528G>A | V176V | synonymous | 100% | 2 |
20 | 4680397 | C | G | c.531C>G | H177Q | missense | 100% | 1 |
20 | 4680397 | C | T | c.531C>T | H177H | synonymous | 100% | 4 |
20 | 4680403 | C | T | c.537C>T | C179C | synonymous | 100% | 1 |
20 | 4680404 | G | A | c.538G>A | V180I | missense | 100% | 6 |
20 | 4680412 | C | G | c.546C>G | I182M | missense | 100% | 2 |
20 | 4680429 | C | G | c.563C>G | T188R | missense | 100% | 3 |
20 | 4680429 | C | T | c.563C>T | T188M | missense | 100% | 4 |
20 | 4680443 | A | G | c.577A>G | T193A | missense | 100% | 2 |
20 | 4680445 | C | A | c.579C>A | T193T | synonymous | 100% | 1 |
20 | 4680449 | G | C | c.583G>C | G195R | missense | 100% | 3 |
20 | 4680451 | G | A | c.585G>A | G195G | synonymous | 100% | 3 |
20 | 4680453 | A | C | c.587A>C | E196A | missense | 100% | 9 |
20 | 4680462 | C | A | c.596C>A | T199N | missense | 100% | 1 |
20 | 4680463 | C | T | c.597C>T | T199T | synonymous | 100% | 2 |
20 | 4680467 | A | T | c.601A>T | T201S | missense | 100% | 1 |
20 | 4680469 | C | T | c.603C>T | T201T | synonymous | 100% | 3 |
20 | 4680470 | G | A | c.604G>A | D202N | missense | 100% | 1 |
20 | 4680472 | C | T | c.606C>T | D202D | synonymous | 100% | 8 |
20 | 4680473 | G | A | c.607G>A | V203I | missense | 100% | 3 |
20 | 4680488 | C | T | c.622C>T | R208C | missense | 100% | 1 |
20 | 4680489 | G | A | c.623G>A | R208H | missense | 100% | 9 |
20 | 4680490 | C | T | c.624C>T | R208R | synonymous | 100% | 4 |
20 | 4680491 | G | A | c.625G>A | V209M | missense | 100% | 1 |
20 | 4680494 | G | A | c.628G>A | V210I | missense | 100% | 2 |
20 | 4680501 | A | C | c.635A>C | Q212P | missense | 100% | 1 |
20 | 4680502 | G | A | c.636G>A | Q212Q | synonymous | 100% | 2 |
20 | 4680520 | C | T | c.654C>T | Y218Y | synonymous | 100% | 17 |
20 | 4680534 | A | T | c.668A>T | Q223L | missense | 100% | 1 |
20 | 4680539 | T | C | c.673T>C | Y225H | missense | 99% | 1 |
20 | 4680540 | A | G | c.674A>G | Y225C | missense | 99% | 1 |
20 | 4680541 | T | C | c.675T>C | Y225Y | synonymous | 99% | 3 |
20 | 4680552 | G | A | c.686G>A | G229E | missense | 98% | 1 |
20 | 4680553 | A | G | c.687A>G | G229G | synonymous | 98% | 1 |
20 | 4680561 | T | G | c.695T>G | M232R | missense | 97% | 10 |
20 | 4680566 | C | T | c.700C>T | L234F | missense | 95% | 29 |
20 | 4680590 | C | T | c.724C>T | L242F | missense | 87% | 1 |
20 | 4680598 | C | G | c.732C>G | I244M | missense | 84% | 1 |
20 | 4680598 | C | T | c.732C>T | I244I | synonymous | 84% | 1 |
20 | 4680626 | T | G | c.760T>G | X254G | read-through | 66% | 1 |
Table S4. Summary of rare PRNP variants by functional class in ExAC
Class | Total AC |
---|---|
missense | 236 |
non-coding | 1 |
nonsense | 3 |
read-through | 1 |
synonymous | 180 |
Table S5. Allele counts of 16 reportedly pathogenic PRNP variants in >500,000 23andMe research participants.
To protect the privacy of 23andMe research participants, allele count (AC) values between 1 and 5 inclusive are displayed as "1-5" and are rounded up to 5 for the purposes of plotting. These alleles were seen almost exclusively in a heterozygous state, with fewer than 5 homozygous individuals total across all 16 variants.
Variant | dbSNP id | 23andMe id | Called genotypes | AC | Comments |
---|---|---|---|---|---|
P102L | rs74315401 | i5004359 | 502075 | 1-5 total | |
A117V | rs74315402 | i5004358 | 501820 | ||
D178N | rs74315403 | i5004357 | 502450 | ||
E200K | rs28933385 | rs28933385 | 531370 | ||
M232R | rs74315409 | i5004352 | 502475 | 78 | AC=29 in 2,685 individuals with >90% Japanese ancestry |
V180I | rs74315408 | i5004353 | 502125 | 15 | AC=1-5 in 2,670 individuals with >90% Japanese ancestry |
V210I | rs74315407 | i5004354 | 502290 | 13 | AC=8 in 385,030 Europeans |
R208C | rs55826236 | rs55826236 | 501850 | 8 | |
R208H | rs74315412 | i5004349 | 501775 | 22 | AC=19 in 384,645 Europeans |
P105L | rs11538758 | rs11538758 | 531575 | 1-5 total | |
G131V | rs74315410 | i5004351 | 499455 | ||
A133V | rs74315415 | i5004347 | 502520 | ||
T183A | rs74315411 | i5004350 | 502295 | ||
F198V | rs55871421 | rs55871421 | 501540 | ||
F198S | rs74315405 | i5004356 | 502460 | ||
G217R | rs74315406 | i5004355 | 502385 |
Table S6. Phenotypes investigated in studies in which ExAC individuals with reportedly pathogenic PRNP variants were ascertained.
Note that we do not have access to phenotypic data to indicate whether a particular individual was ascertained as a case or a control. Therefore "cardiovascular" simply means an individual was ascertained in a cardiovascular disease cohort, not necessarily that the individual has cardiovascular disease. “Mixed” cohorts include controls, cardiovascular and pulmonary phenotypes.
Cohort phenotype | Total in ExAC | With reportedly pathogenic PRNP variants |
---|---|---|
Autoimmune | 1675 | 4 |
Cancer | 7601 | 3 |
Cardiovascular | 14622 | 14 |
Metabolic | 15327 | 19 |
Mixed | 3936 | 2 |
Population controls | 2215 | 6 |
Psychiatric | 15330 | 4 |
TOTAL | 60706 | 52 |
Table S7. Inferred ancestry and codon 129 genotypes of ExAC individuals with reportedly pathogenic variants.
Three-letter HapMap ancestry codes are defined in Table S8.
variant | pops | codon129 |
---|---|---|
P39L | 1 PJL, 2 TSI | 2 M/M, 1 M/V |
G131V | 1 TSI | 1 M/V |
R148H | 1 CEU, 1 IBS, 1 PJL | 3 M/M |
V180I | 1 CHB, 2 JPT, 3 PJL | 4 M/M, 1 M/V, 1 V/V |
T188R | 1 CLM, 2 MXL | 1 M/V, 2 V/V |
E196A | 3 CHB, 6 CHS | 9 M/M |
D202N | 1 TSI | 1 M/V |
V203I | 1 IBS, 2 TSI | 1 M/M, 2 M/V |
R208C | 1 ACB | 1 M/M |
R208H | 1 ACB, 2 ASW, 1 CLM, 2 IBS, 1 MSL, 2 TSI | 4 M/M, 5 M/V |
V210I | 2 TSI | 2 M/M |
Q212P | 1 CEU | 1 M/V |
M232R | 5 CHB, 5 JPT | 10 M/M |
Table S8. Inferred ancestry of all ExAC individuals.
Methods for ancestry assignment are described in Materials and Methods.
Population code | Description | Super population code | N in ExAC |
---|---|---|---|
ACB | African Caribbeans in Barbados | AFR | 2267 |
ASW | Americans of African Ancestry in SW USA | AFR | 2151 |
BEB | Bengali from Bangladesh | SAS | 483 |
CDX | Chinese Dai in Xishuangbanna, China | EAS | 19 |
CEU | Utah Residents (CEPH) with Northern and Western European ancestry | EUR | 14185 |
CHB | Han Chinese in Beijing, China | EAS | 1553 |
CHS | Southern Han Chinese | EAS | 1733 |
CLM | Colombians from Medellin, Colombia | AMR | 870 |
ESN | Esan in Nigeria | AFR | 89 |
FIN | Finnish in Finland | EUR | 3977 |
GBR | British in England and Scotland | EUR | 10358 |
GIH | Gujarati Indian from Houston, Texas | SAS | 79 |
GWD | Gambian in Western Divisions in The Gambia | AFR | 102 |
IBS | Iberian population in Spain | EUR | 3534 |
ITU | Indian Telugu from the UK | SAS | 1089 |
JPT | Japanese in Tokyo, Japan | EAS | 663 |
KHV | Kinh in Ho Chi Minh City, Vietnam | EAS | 369 |
LWK | Luhya in Webuye, Kenya | AFR | 72 |
MSL | Mende in Sierra Leone | AFR | 189 |
MXL | Mexican Ancestry from Los Angeles USA | AMR | 2658 |
PEL | Peruvians from Lima, Peru | AMR | 1900 |
PJL | Punjabi from Lahore, Pakistan | SAS | 6300 |
PUR | Puerto Ricans from Puerto Rico | AMR | 579 |
STU | Sri Lankan Tamil from the UK | SAS | 460 |
TSI | Toscani in Italia | EUR | 4795 |
YRI | Yoruba in Ibadan, Nigeria | AFR | 232 |
Table S9. Inferred ancestry of 23andMe research participants
Ancestry | Minimum called genotypes | Maximum called genotypes | Total reportedly pathogenic AC |
---|---|---|---|
European | 382865 | 408475 | 35 |
Latino | 42425 | 44480 | 10 |
African | 22945 | 23795 | 10 |
East Asian | 20255 | 21710 | 75 |
All others | 30975 | 33125 | 20 |
TOTAL | 499455 | 531575 | 140 |
Table S10. Details of Japanese prion disease cases
- * Age at onset is expressed as the mean ± SD (range) in years.
- ** Duration is measured from onset to akinetic mutism, or to death without akinetic mutism, and is expressed as the mean ± SD (range) in months.
- Terms:
- EE = glutamic acid homozygosity
- EK = glutamic acid/lysine heterozygosity
- KK = lysine homozygosity
- MM = methionine homozygosity
- MV = methionine/valine heterozygosity
- PSWCs = periodic synchronous wave complexes
Variant | N | Male/Female | Age at onset* | (range) | Positive family history (%) |
---|---|---|---|---|---|
Insertion | 8 | 4/4 | 51.0 ± 12.0 | (26-68) | 5 (63) |
P102L | 83 | 38/45 | 55.5 ± 10.3 | (22-75) | 69 (83) |
P105L | 12 | 7/5 | 46.9 ± 8.4 | (31-61) | 11 (92) |
D178N-129M | 4 | 3/1 | 54.5 ± 5.5 | (46-61) | None |
D178N-129V | 1 | 1/0 | 74 | | None |
V180I | 218 | 84/134 | 77.4 ± 6.8 | (44-93) | 5 (2) |
E200K | 63 | 30/33 | 61.1 ± 9.9 | (31-83) | 28 (44) |
V203I | 3 | 2/1 | 73 | | None |
R208H | 1 | 0/1 | 74 | | None |
V210I | 1 | 0/1 | 55 | | None |
M232R | 63 | 32/31 | 64.4 ± 10.9 | (15-82) | 2 (3) |
V180I+M232R | 4 | 2/2 | 71.3 ± 3.6 | (65-74) | None |
Variant | Duration** | (range) | Codon 129 | Codon 219 |
---|---|---|---|---|
Insertion | 27.8 ± 17.7 | (3-57) | MM 6; MV 1 | EE 6; KK 1 |
P102L | 48.4 ± 35.8 | (2-186) | MM 67; MV 6 | EE 70; EK 2 |
P105L | 90.2 ± 40.4 | (25-184) | MV 11 | EE 7 |
D178N-129M | 8.5 ± 4.4 | (2-13) | MM 4 | EE 4 |
D178N-129V | 24 | | MV 1 | EE 1 |
V180I | 16.4 ± 14.5 | (0-70) | MM 162; MV 54 | EE 210 |
E200K | 5.0 ± 6.0 | (1-32) | MM 58; MV 3 | EE 58; EK 3 |
V203I | 3.7 ± 2.1 | (1-6) | MM 3 | EE 3 |
R208H | 3 | | MM 1 | EE 1 |
V210I | 3 | | MM 1 | EE 1 |
M232R | 8.6 ± 12.7 | (0-78) | MM 60; MV 2 | EE 61; EK 1 |
V180I+M232R | 21.8 ± 17.7 | (1-47) | MM 4 | EE 4 |
Variant | PSWCs on EEG (%) | Hyperintensities on MRI (%) | Positive 14-3-3 protein (%) |
---|---|---|---|
Insertion | 3/8 (38) | 2/7 (29) | 0/1 (0) |
P102L | 11/72 (15) | 32/76 (42) | 13/34 (38) |
P105L | 1/10 (10) | 1/11 (9) | 1/2 (50) |
D178N-129M | 0/4 (0) | 1/4 (25) | 1/2 (50) |
D178N-129V | 0/1 (0) | 0/1 (0) | 1/1 (100) |
V180I | 19/203 (9) | 212/213 (99) | 110/140 (79) |
E200K | 56/63 (89) | 56/59 (95) | 29/31 (94) |
V203I | 3/3 (100) | 2/2 (100) | 1/1 (100) |
R208H | 1/1 (100) | 1/1 (100) | 1/1 (100) |
V210I | 1/1 (100) | 1/1 (100) | not done |
M232R | 46/61 (75) | 55/60 (92) | 31/43 (72) |
V180I+M232R | 0/4 (0) | 4/4 (100) | 0/1 (0) |
Table S11. Phenotypes of individuals with N-terminal PrP truncating variants
HGVS | Variant | Zygosity | Sex | Age | Available phenotype information |
---|---|---|---|---|---|
c.59_60insC | G20Gfs84X | Het | F | 79 | Ascertained as part of the Rotterdam Study [Hofman 2015], a prospective cohort study of middle-aged and elderly persons. In good health and free of dementia at the last completed in-person examination, at age 78 or older. Has 5 siblings and 2 children. The only family history noted is that one sibling had a stroke before age 65. |
c.109C>T | R37X | Het | M | 73 | Ascertained as a control for the Swedish schizophrenia study. Underwent heart bypass surgery in 2008 and has a family history of heart problems. Has 4 siblings. Reports no family history of neurodegeneration or neuropathy. |
c.223C>T | Q75X | Het | M | 52 | Ascertained in a study of type 2 diabetes. Has mild type 2 diabetes treated with metformin. Has children. |
c.391G>T | G131X | Het | F | | None available. |
Figure S1. Age of ExAC individuals with reportedly pathogenic PRNP variants versus all individuals in ExAC.
Ages were available for 40 of the 52 individuals with reportedly pathogenic PRNP variants. Their age distribution did not differ from that of ExAC individuals overall (p = .69, Wilcoxon rank-sum test; p = .69, Student's t test), nor did it differ after controlling for cohort (p = .15, linear regression).
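For readers unfamiliar with the Wilcoxon rank-sum test named in this caption, the following is a minimal sketch of a two-sided version using the normal approximation with a tie correction and no continuity correction. It is not the authors' code, and statistical packages may use exact or continuity-corrected variants that give slightly different p values. The function name `rank_sum_test` and the two age lists are hypothetical placeholders, since the individual-level ages are not given here.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, z score, two-sided p value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2

    # Assign midranks so that tied values share the average of their ranks.
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        tie_term += (j - i) ** 3 - (j - i)
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value is tied, so the samples cannot be separated
        return u_x, 0.0, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, z, p


# Placeholder ages for illustration only; these are NOT data from the study.
carrier_ages = [52, 58, 61, 73, 79]
all_ages = [45, 50, 55, 60, 65, 70, 75, 80]
u, z, p = rank_sum_test(carrier_ages, all_ages)
print(f"U = {u}, z = {z:.3f}, p = {p:.3f}")
```

A large p value from such a test, as reported above, indicates no detectable shift in age between the two groups. On its own it does not adjust for cohort, which is why the caption also reports a regression that controls for it.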
Figure S2. Sanger sequencing results for individuals with N-terminal truncating variants
Figure S2A. G20Gfs84X reverse (top) and forward (bottom). Primers: 2a-forward: AACTTAGGGTCACATTTGTCCTTGG; 2a-reverse: GGTAACGGTGCATGTTTTCACG. 2b forward: GTGGTGGCTGGGGTCAAGG; 2b reverse: TTTCCAGTGCCCATCAGTGC.
Figure S2B. R37X - DNA from whole blood (top) and fibroblasts (bottom). Primers: PrP2-F: TGGGACTCTGACGTTCTCCT; PrP2-R: GGTGAAGTTCTCCCCCTTGG
Figure S2C. Q75X. Primers: PRNP_EX2-M13-F [TGTAAAACGACGGCCAGT] CCATTGCTATGCACTCATTCA; PRNP_EX2-M13-R [CAGGAAACAGCTATGACC] CCATGTGCTTCATGTTGGTT