Quantifying penetrance in a dominant disease gene using large population control cohorts

This is the author's version of the work. It is posted here by permission of the AAAS for personal use, not for redistribution. The definitive version was published as: Minikel et al. Quantifying prion disease penetrance using large population control cohorts. Sci. Transl. Med. 8, 322ra9 (2016). DOI: 10.1126/scitranslmed.aad5169.

Eric Vallabh Minikel†1,2,3,4, Sonia M. Vallabh1,3,4, Monkol Lek1,2, Karol Estrada1,2, Kaitlin E. Samocha1,2,3, J. Fah Sathirapongsasuti5, Cory Y. McLean5, Joyce Y. Tung5, Linda P.C. Yu5, Pierluigi Gambetti6, Janis Blevins6, Shulin Zhang7, Yvonne Cohen6, Wei Chen6, Masahito Yamada8, Tsuyoshi Hamaguchi8, Nobuo Sanjo9, Hidehiro Mizusawa10, Yosikazu Nakamura11, Tetsuyuki Kitamoto12, Steven J. Collins13, Alison Boyd13, Robert G. Will14, Richard Knight14, Claudia Ponto15, Inga Zerr15, Theo F.J. Kraus16, Sabina Eigenbrod16, Armin Giese16, Miguel Calero17, Jesús de Pedro-Cuesta17, Stéphane Haïk18,19, Jean-Louis Laplanche20, Elodie Bouaziz-Amar20, Jean-Philippe Brandel18,19, Sabina Capellari21,22, Piero Parchi21,22, Anna Poleggi23, Anna Ladogana23, Anne H. O'Donnell-Luria2,1,24, Konrad J. Karczewski2,1, Jamie L. Marshall1,2, Michael Boehnke25, Markku Laakso26, Karen L. Mohlke27, Anna Kähler28, Kimberly Chambert29, Steven McCarroll29, Patrick F. Sullivan27,28, Christina M. Hultman28, Shaun M. Purcell30, Pamela Sklar30, Sven J. van der Lee31, Annemieke Rozemuller32, Casper Jansen32, Albert Hofman31, Robert Kraaij33, Jeroen G.J. van Rooij33, M. Arfan Ikram31, André G. Uitterlinden31,33, Cornelia M. van Duijn31, Exome Aggregation Consortium (ExAC)34, Mark J. Daly2,1, Daniel G. MacArthur†2,1

  1. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
  2. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, United States
  3. Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, United States
  4. Prion Alliance, Cambridge, MA 02139, United States
  5. Research, 23andMe Inc., Mountain View, CA 94041, United States
  6. National Prion Disease Pathology Surveillance Center, Cleveland, OH 44106, United States
  7. University Hospitals Case Medical Center, Cleveland, OH 44106, United States
  8. Department of Neurology and Neurobiology of Aging, Kanazawa University Graduate School of Medical Sciences, Kanazawa, Japan 920-8640
  9. Department of Neurology and Neurological Science, Graduate School, Tokyo Medical and Dental University, Tokyo, Japan 113-8519
  10. National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan 187-8551
  11. Department of Public Health, Jichi Medical University, Shimotsuke, Japan 329-0498
  12. Department of Neurological Science, Tohoku University Graduate School of Medicine, Sendai, Japan 980-8575
  13. Australian National Creutzfeldt-Jakob Disease Registry, The University of Melbourne, Parkville, Australia 3010
  14. National Creutzfeldt-Jakob Disease Research and Surveillance Unit, Western General Hospital, Edinburgh, United Kingdom EH4 2XU
  15. National Reference Center for TSE, Georg-August University, Goettingen, Germany 37073
  16. Center for Neuropathology and Prion Research (ZNP) at the Ludwig-Maximilians-University, Munich, Germany 81377
  17. Instituto de Salud Carlos III and CIBERNED, Madrid, Spain 28031
  18. Inserm U 1127, CNRS UMR 7225, Sorbonne Universités, UPMC Univ. Paris 06 UMR S 1127, Institut du Cerveau et de la Moelle épinière, ICM, 75013 Paris, France
  19. Assistance Publique-Hôpitaux de Paris, Cellule Nationale de Référence des Maladies de Creutzfeldt-Jakob, Groupe Hospitalier Pitié-Salpêtrière, F-75013 Paris, France 75010
  20. Assistance Publique-Hôpitaux de Paris, Service de Biochimie et Biologie moléculaire, Hôpital Lariboisière, Paris, France 75010
  21. IRCCS Institute of Neurological Sciences, Bologna, Italy 40123
  22. Department of Biomedical and Neuromotor Sciences, University of Bologna, Italy 40126
  23. Department of Cell Biology and Neurosciences, Istituto Superiore di Sanità, Rome, Italy 00161
  24. Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115, United States
  25. Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
  26. Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland 70210
  27. Department of Genetics, University of North Carolina School of Medicine, Chapel Hill, NC 27599, United States
  28. Karolinska Institutet, Stockholm, Sweden SE-171 77
  29. Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA 02142, United States
  30. Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
  31. Department of Epidemiology, Erasmus MC, Rotterdam 3000 CA, The Netherlands
  32. Prion Surveillance Center, Department of Pathology, University Medical Center, Utrecht 3584 CX, Netherlands
  33. Department of Internal Medicine, Erasmus MC, Rotterdam 3000 CA, The Netherlands
  34. A list of consortium members may be found at http://exac.broadinstitute.org/about

†Correspondence may be addressed to Eric Vallabh Minikel eminikel@broadinstitute.org @cureffi or Daniel G. MacArthur macarthur@atgu.mgh.harvard.edu @dgmacarthur

Abstract

More than 100,000 genetic variants are reported to cause Mendelian disease in humans [Stenson 2014, Landrum 2014], but the penetrance - the probability that a carrier of the purported disease-causing genotype will indeed develop the disease - is generally unknown. Here we assess the impact of variants in the prion protein gene (PRNP) on the risk of prion disease by analyzing 16,025 prion disease cases, 60,706 population control exomes, and 531,575 individuals genotyped by 23andMe, Inc. We show that missense variants in PRNP previously reported to be pathogenic are at least 30x more common in the population than expected based on genetic prion disease prevalence. While some of this excess can be attributed to benign variants falsely assigned as pathogenic, other variants have genuine effects on disease susceptibility but confer lifetime risks ranging from <0.1% to ~100%. We also show that truncating variants in PRNP have position-dependent effects, with true loss-of-function alleles found in healthy older individuals, supporting the safety of therapeutic suppression of prion protein expression.

Introduction

The study of pedigrees with Mendelian disease has been tremendously successful in identifying variants that contribute to severe inherited disorders [Brunham & Hayden 2013, Amberger 2011, Chong 2015]. Causal variant discovery is enabled by selective ascertainment of affected individuals, and especially of multiplex families. Although efficient from a gene discovery perspective, the resulting ascertainment bias confounds efforts to accurately estimate the penetrance of disease-causing variants, with profound implications for genetic counseling [Crow 1999, Cooper 2013, Begg 2002, Goldwurm 2007]. The development of large-scale genotyping and sequencing methods has recently made it tractable to perform unbiased assessments of penetrance in population controls. In several instances, such studies have suggested that previously reported Mendelian variants, as a class, are substantially less penetrant than had been believed [Cooper 2011, Bick 2012, Flannick 2013, Kirov 2014]. To date, however, all of these studies have been limited to relatively prevalent (>0.1%) diseases, and point estimates of the penetrance of individual variants have been limited to large copy number variations [Cooper 2011, Kirov 2014].

Here we demonstrate the use of large-scale population data to infer the penetrance of variants in rare, dominant, monogenic disease, using the example of prion diseases. These invariably fatal neurodegenerative disorders are caused by misfolding of the prion protein (PrP, the product of PRNP) [Prusiner 1998] and have an annual incidence of 1 to 2 cases per 1 million population [Klug 2013]. A small, albeit infamous, minority of cases (<1% in recent years [CJD UK 2015, CJD US 2015]) are acquired through dietary or iatrogenic routes. The majority (~85%) of cases are defined as sporadic, occurring in individuals with two wild-type PRNP alleles and no known environmental exposures. Finally, ~15% of cases occur in individuals with rare, typically heterozygous, coding variants in PRNP, including missense variants, truncating variants, and octapeptide repeat insertions or deletions (Table S1). Centralized ascertainment of cases by national surveillance centers (Materials and Methods) makes prion disease a good test case for using reference datasets to assess the penetrance of these variants.

PRNP was conclusively established as a dominant disease gene due to clear Mendelian segregation of a few variants with disease [Hsiao 1989, Hsiao 1991b, Medori 1992]. Yet ascertainment bias [Minikel 2014], low rates of predictive genetic testing [Owen 2014], and frequent lack of family history [Kovacs 2005, Nozaki 2010] confound attempts [Chapman 1994, Spudich 1995, D'Alessandro 1998, Mitrova & Belay 2002, Minikel 2014] to estimate penetrance by survival analysis. Meanwhile, the existence of non-genetic etiologies leaves doubt as to whether novel variants are causal or coincidental.

A fully penetrant disease genotype should be no more common in the population than the disease that it causes. This observation allows us to leverage two large population control datasets to re-evaluate the penetrance of reported disease variants in PRNP. The recently reported Exome Aggregation Consortium (ExAC) dataset [Lek 2015] contains variant calls on 60,706 people ascertained for various common diseases, without any ascertainment on neurodegenerative disease. 23andMe’s database contains genotypes on 531,575 customers of its direct-to-consumer genotyping service who have opted in to participate in research, pruned to remove related individuals (first cousins or closer; Materials and Methods), preventing enrichment due to large families with prion disease.

Results

We began by asking whether reportedly pathogenic variants are as rare as expected in these population control datasets. The proportion of people alive in the population today who harbor completely penetrant variants causal for prion disease can be approximated by the product of three numbers: the annual incidence of prion disease, the proportion of cases with such a genetic variant, and the life expectancy of individuals harboring these variants. Based on upper bounds of these numbers (Figure 1A), and assuming ascertainment is neutral with respect to neurodegenerative disease, we would expect no more than ~1.7 such individuals in the 60,706 exomes in the ExAC dataset [Lek 2015], and ~15 such individuals among the ~530,000 genotyped 23andMe customers who opted to participate in research.

Through reviews [Kong 2004, Beck 2010, Mastrianni 2010] and PubMed searches, we identified 63 rare genetic variants reported to cause prion disease (Table S2). We reviewed ExAC read-level evidence for every rare (<0.1% allele frequency) variant call in PRNP (Materials and Methods; Table S3 - S4) and found that 52 individuals in ExAC harbor reportedly pathogenic missense variants (Figure 1B), at least a 30-fold excess over expectation if all such variants were fully penetrant. Similarly, in the 23andMe database we observed a total of 141 alleles of 16 reportedly pathogenic variants genotyped on their platform (Table S5).

Figure 1. Reportedly pathogenic PRNP variants are >30 times more common in controls than expected based on disease incidence. Reported prion disease incidence varies with the intensity of surveillance efforts [Klug 2013], with an apparent upper bound of ~2 cases per million population per year (Materials and Methods). In our surveillance cohorts, 65% of cases underwent PRNP open reading frame sequencing, with 12% of all cases, or 18% of sequenced cases, possessing a rare variant (Table S1), consistent with an oft-cited estimate that 15% of cases of Creutzfeldt-Jakob disease are familial [Masters 1979]. Genetic prion diseases typically strike in midlife, with mean age of onset for different variants ranging from 28 to 77 [Laplanche 1999, Nozaki 2010, Table S10]; we accepted 80, a typical human life expectancy, as an upper bound for mean age of onset, and to be additionally conservative, we assumed that all individuals in ExAC and 23andMe were below any age of onset, even though both contain elderly individuals [Servick 2015, Figure S1]. Thus, no more than ~29 people per million in the general population should harbor high-penetrance prion disease-causing variants. Therefore at most ~1.7 people in ExAC (A) and ~15 people in 23andMe would be expected to harbor such variants. In fact, reportedly pathogenic variants are seen in 52 ExAC individuals (B) and on 141 alleles in the 23andMe database.
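The upper-bound arithmetic in the Figure 1 legend can be retraced in a few lines. The sketch below is illustrative only: the function and variable names are ours, and the inputs are the rounded upper bounds quoted above (2 cases per million per year, 18% of sequenced cases carrying a rare variant, and 80 years at risk).

```python
# Rounded upper bounds quoted in the Figure 1 legend.
ANNUAL_INCIDENCE = 2e-6   # prion disease cases per person per year
GENETIC_FRACTION = 0.18   # share of sequenced cases with a rare PRNP variant
YEARS_AT_RISK = 80        # upper bound on mean age of onset, in years

def expected_carriers(cohort_size):
    """Upper bound on carriers of fully penetrant variants in a cohort."""
    prevalence = ANNUAL_INCIDENCE * GENETIC_FRACTION * YEARS_AT_RISK
    return prevalence * cohort_size

# Carriers per million people in the general population.
prevalence_per_million = expected_carriers(1_000_000)

# Expected carriers in each control cohort (cohort sizes from the text).
exac_expected = expected_carriers(60_706)
me23_expected = expected_carriers(531_575)

# Observed ExAC carriers (52) relative to the expectation.
fold_excess = 52 / exac_expected
```

With these inputs the product comes to roughly 29 carriers per million, which scales to fewer than 2 expected carriers in ExAC and about 15 in 23andMe, so the 52 observed ExAC individuals represent an excess of about 30-fold.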

Individuals with reportedly pathogenic PRNP variants did not cluster within any one cohort within ExAC (Table S6), arguing against enrichment due to comorbidity with a common disease ascertained for exome sequencing. ExAC does include populations, such as South Asians, in which prion disease is not closely surveilled and we cannot rule out a higher incidence than that reported in developed countries, yet the individuals with reportedly pathogenic variants in either ExAC or 23andMe were of diverse inferred ancestry (Table S7, S8, S9). These individuals’ ages were consistent with the overall ExAC age distribution (Figure S1), rather than being enriched below some age of disease onset. ExAC genotypes at the prion disease modifier polymorphism M129V [Capellari 2011] were consistent with population allele frequencies (Table S7), rather than enriched for the lower-risk heterozygous genotype. Certain PRNP variants are associated with highly atypical phenotypes [Moore 2001, Mead & Reilly 2015], which are mistakable for other dementias and may not be well ascertained by current surveillance efforts. Most of the variants found in our population control cohorts, however, have been reported in individuals with a classic, sporadic Creutzfeldt-Jakob disease phenotype [Nozaki 2010, Kong 2004, Mastrianni 2010, Zhang 2014, Tartaglia 2010, Peoc'h 2000], arguing that the discrepancy between observed and expected allele counts does not result primarily from an underappreciated prevalence of atypical prion disease.

Having observed a large excess of reportedly pathogenic variants over expectation in two datasets, and having excluded the most obvious confounders, we hypothesized that the unexpectedly high frequency of these variants in controls might arise from benign and/or low-risk variants.

We investigated which variants were responsible for the observed excess (Figure 2). Variants with the strongest prior evidence of pathogenicity are absent from ExAC and cumulatively account for ≤5 alleles in 23andMe, consistent with the known rarity of genetic prion disease. Much of the excess allele frequency in population controls is due, instead, to variants with very weak prior evidence of pathogenicity (Figure 2 and Supplementary Discussion). For four variants observed in controls (V180I, R208H, V210I, and M232R), pathogenicity is controversial [Beck 2012, Nozaki 2012] or reduced penetrance has been suggested [Capellari 2005, Ripoll 1993], but quantitative estimates of penetrance have never been produced, and the variants remain categorized as causes of genetic Creutzfeldt-Jakob disease [Kovacs 2005, Nozaki 2010]. Although we cannot prove that any one of the variants we observe in population controls is completely neutral, the list of reported pathogenic variants likely includes false positives. Indeed, the observation that 0.4% (236 / 60,706) of ExAC individuals harbor a rare (<0.1%) missense variant (Table S4) suggests that ~4 of every 1000 sporadic prion disease cases will, by chance, harbor such a variant, which in many cases will be interpreted and reported as causal given the long-standing classification of PRNP as a Mendelian disease gene.

Figure 2. Reportedly pathogenic PRNP variants include Mendelian, benign, and intermediate variants. Prior evidence of pathogenicity is extremely strong for four missense variants — P102L, A117V, D178N and E200K — each of which has been observed to segregate with disease in multiple multigenerational families [Hsiao 1989, Goldfarb 1990, Hsiao 1991a, Hsiao 1991b, Medori 1992, Medori 1992, Mastrianni 1995, Webb 2008] and to cause spontaneous disease in mouse models [Hsiao 1990, Dossena 2008, Jackson 2009, Yang 2009, Jackson 2013, Bouybayoune 2015]. These account for >50% of genetic prion disease cases (Table S1), yet are absent from ExAC (Table S3), and collectively appear on ≤5 alleles in 23andMe’s cohort (Table S5), indicating allele frequencies sufficiently low to be consistent with the prevalence of genetic prion disease (Figure 1). Conversely, the variants most common in controls and rare in cases had categorically weak prior evidence for pathogenicity. R208C (8 alleles in 23andMe) and P39L were observed in patients presenting clinically with other dementias, with prion disease suggested as an alternative diagnosis solely on the basis of finding a novel PRNP variant [Bernardi 2014, Zheng 2008]. E196A was originally reported in the literature in a single patient, with a sporadic Creutzfeldt-Jakob disease phenotype and no family history [Zhang 2014], and appeared in only 2 of 790 Chinese prion disease patients in a recent case series [Shi 2015], consistent with the ~0.1% allele frequency among Chinese individuals in ExAC (Tables S5 and S8). At least three variants (M232R, V180I, and V210I) occupy a space inconsistent with either neutrality or with complete penetrance (see main text and Figure 3). R148H, T188R, V203I, R208H and additional variants are discussed in Supplementary Discussion.

At least three variants, however (V180I, V210I, and M232R) fail to cluster with either the likely benign or likely Mendelian variants (Figure 2). Because each of these three appears primarily in one population in both cases and controls (Tables S1, S5, S7), we compared allele frequencies in matched population groups. Each has an allele frequency in controls that is too high for a fully penetrant, dominant prion disease-causing variant, and yet far lower than the corresponding allele frequency in cases (Figure 3).

Because we lack genome-wide SNP data on cases we are unable to directly correct for population stratification, which thus may contribute to the observed differences in allele frequencies. Geographic clusters of genetic prion disease have been recognized for decades [Masters 1979, Lee 1999, Mitrova & Belay 2002]. For example, nearly half of Italian prion disease cases with the V210I variant are concentrated within two regions of Italy [Ladogana 2005], so any non-uniform geographic sampling in cases versus controls would add some uncertainty to our penetrance estimates.

Nonetheless, the magnitude of the enrichment of certain variants in cases over controls in our datasets makes substructure an implausible explanation for the entire difference. In order for V210I to be neutral and yet appear with an allele frequency of 8.1% in Italian cases despite an apparent allele frequency of 0.02% in Italian controls, it would need to be fixed in a subpopulation comprising 8% of Italy’s populace. Under this scenario, this subpopulation would need to be virtually unsampled in any of our control cohorts, and V210I cases would contain many homozygotes. In reality, no cases have been reported homozygous for this variant. Conversely, if V210I were fully penetrant, family history would be positive in most cases, and the variant’s appearance on 13 alleles in 23andMe (Table S5) would indicate that this variant alone accounts for three times the known prevalence of genetic prion disease (Figure 1A). Finally, if the low family history rate were due to many de novo mutations, then V210I cases would be more uniformly distributed across populations (Table S1). Similar arguments rule out V180I being either benign or Mendelian. M232R, though clearly not Mendelian, could still be benign as it exhibits only 4- to 6-fold enrichment in cases, an amount which might conceivably be explained by Japanese population substructure alone. However, because even common variants in PRNP affect prion disease risk with odds ratios of 3 or greater [Shibuya 1998, Bishop 2009, Mead 2012], it is not implausible that M232R has a similar effect size, and our data suggest that this is a more likely scenario than its being neutral.

Satisfied that these three variants are likely neither benign nor Mendelian, we estimated lifetime risk in heterozygotes (Materials and Methods). The 1 in 1 million annual incidence of prion disease translates into a baseline lifetime risk of ~1 in 10,000 in the general population (Materials and Methods). Because prion diseases are so rare, even the massive enrichment of heterozygotes in cases (Figure 3), implying odds ratios on the order of 10 to 1,000, corresponds to only low penetrance, with lifetime risk for M232R, V180I and V210I estimated near 0.1%, 1%, and 10%, respectively. Although our estimates are imperfect due to population stratification, they accord well with family history rates (Figure 3) and explain the unique space that these variants occupy in the plot of case versus control allele count (Figure 2). These data indicate that PRNP missense variants occupy a risk continuum rather than a dichotomy of causal versus benign.
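To illustrate why even a very large enrichment in cases translates into low penetrance for a rare disease, the following sketch applies Bayes' rule to case and control frequencies. It is a simplified illustration under our own assumptions, not a reproduction of the procedure in Materials and Methods: the function name and the example frequencies are hypothetical, and the published estimates rest on population-matched counts with confidence intervals.

```python
def lifetime_risk(baseline_risk, case_freq, control_freq):
    """Estimate P(disease | genotype) by Bayes' rule.

    P(D|G) = P(D) * P(G|D) / P(G), capped at 1.
    case_freq approximates P(G|D) and control_freq approximates P(G).
    For rare heterozygous variants the ratio is the same whether allele
    or carrier frequencies are used, provided both use the same scale.
    """
    if control_freq <= 0:
        raise ValueError("control frequency must be positive")
    return min(1.0, baseline_risk * case_freq / control_freq)

# Hypothetical inputs: baseline lifetime risk of ~1 in 10,000 (from the text)
# and a variant seen at 5% in cases versus 0.01% in controls.
baseline = 1e-4
enrichment = 0.05 / 0.0001          # 500-fold enrichment in cases
risk = lifetime_risk(baseline, case_freq=0.05, control_freq=0.0001)
```

In this hypothetical example a 500-fold enrichment corresponds to a lifetime risk of only 5%, because the baseline risk it multiplies is so small.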

Figure 3. Certain variants confer intermediate amounts of lifetime risk. M232R, V180I, and V210I show varying degrees of enrichment in cases over controls, indicating a weak to moderate increase in risk. Best estimates of lifetime risk in heterozygotes (Materials and Methods) range from ~0.08% for M232R to ~7.8% for V210I, and correlate with the likelihood of family history. Allele frequencies for P102L, A117V, D178N and E200K are consistent with up to 100% penetrance, with confidence intervals including all reported estimates of E200K penetrance based on survival analysis, which range from ~60% to ~90% [Chapman 1994, Spudich 1995, D'Alessandro 1998, Mitrova & Belay 2002, Minikel 2014]. Rates of family history of neurodegenerative disease in Japanese cases are from Table S10 and in European populations are from [Kovacs 2005], with Wilson binomial confidence intervals shown. *Based on allele counts rounded for privacy (Materials and Methods). †GSS: Gerstmann-Sträussler-Scheinker disease, associated with variants P102L, A117V and G131V. ‡FFI: fatal familial insomnia, associated with a D178N cis 129M haplotype.
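For readers unfamiliar with the Wilson binomial confidence interval used for the family history rates, a minimal sketch of the standard Wilson score formula follows. The function name and the example counts are ours and hypothetical; they are not taken from Table S10 or [Kovacs 2005].

```python
import math

def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical example: 3 of 20 cases report a positive family history.
lower, upper = wilson_interval(3, 20)
```

Unlike the simple normal approximation, this interval stays within 0 and 1 and remains informative when the observed proportion is near zero, which matters for variants with few cases.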

We asked whether the same was true of protein-truncating variants. PRNP possesses only one protein-coding exon, so premature stop codons are expected to result in truncated polypeptides rather than in nonsense-mediated decay. Prion diseases are known to arise from a gain of function, as neurodegeneration is not seen in mice, cows, or goats lacking PrP [Bueler 1992, Richt 2007, Yu 2009, Benestad 2012], and the rate of prion disease progression is tightly correlated with PrP expression level [Fischer 1996]. Yet heterozygous C-terminal (residue ≥145) truncating variants are known to cause prion disease, sometimes with peripheral amyloidosis [Mead & Reilly 2015]. These patients also experience sensorimotor neuropathy phenotypically similar to that present in homozygous, but not heterozygous, PrP knockout mice [Bremer 2010], but attributed to amyloid infiltration of peripheral nerves, rather than loss of PrP function [Mead & Reilly 2015].

We identified, for the first time, heterozygous N-terminal (residue ≤131) truncating variants in four ExAC individuals and were able to obtain Sanger validation (Figure S2) and limited phenotype data (Table S11) for three. These individuals are free of overt neurological disease at ages 79, 73, and 52, and report no personal or family history of neurodegeneration nor of peripheral neuropathy. Therefore, the pathogenicity of protein-truncating variants appears to be dictated by position within PrP’s amino acid sequence (Figure 4). Observing three PRNP nonsense variants in ExAC is consistent with the expected number (~3.9) once we adjust our model [Samocha 2014] to exclude codons ≥145, where truncations cause a dominant gain-of-function disease. Thus, we see no evidence that PRNP is constrained against truncation in its N terminus. This, combined with the lack of any obvious phenotype in individuals with N-terminal truncating variants, suggests that heterozygous loss of PrP function is tolerated.
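One simple way to see that three observed nonsense variants are compatible with an expectation of ~3.9 is to treat the count as Poisson-distributed. This is our own illustrative assumption; the expectation itself comes from the mutational model of [Samocha 2014], whose test statistic may differ.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

# Probability of observing 3 or fewer variants when ~3.9 are expected.
p_at_most_3 = poisson_cdf(3, 3.9)
```

Under this assumption the probability of seeing three or fewer variants is roughly 0.45, so the observed count gives no indication of depletion.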

Figure 4. Effects of truncating variants in the human prion protein are position-dependent. Truncating variants reported in prion disease cases in the literature (Table S2) and in our cohorts (Table S1) cluster exclusively in the C-terminal region (residue ≥145), while truncating variants in ExAC are more N-terminal (residue ≤131). The ortholog of each residue from 23-94 is deleted in at least one prion-susceptible transgenic mouse line [Aguzzi 2008]. C-terminal truncations abolish PrP’s glycosylphosphatidylinositol anchor but leave most of the protein intact, a combination that mediates gain of function through mislocalization, causing this normally cell-surface-anchored protein to be secreted. Consistent with this model of pathogenicity, mice expressing full-length secreted PrP develop fatal and transmissible prion disease [Chesebro 2010, Stohr 2011]. By contrast, the N-terminal truncating variants that we observe retain only residues dispensable for prion propagation, and are likely to cause a total loss of protein function.

Discussion

Over 100,000 genetic variants have been reported to cause Mendelian disease in humans [Stenson 2014, Landrum 2014]. Many such reports do not meet current standards for assertions of pathogenicity [MacArthur 2014, Richards 2015], and if all such reports were believed, the cumulative frequency of these variants in the population would imply that most people have a genetic disease [Lek 2015]. It is generally unclear how much of the excess burden of purported disease variants in the population is due to benign variants falsely associated, and how much is due to variants with genuine association but incomplete penetrance.

Here we leverage newly available large genomic reference datasets to re-evaluate reported disease associations in a dominant disease gene, PRNP. We identify some missense variants as likely benign while showing that others span a spectrum from <0.1% to ~100% penetrance. Our analyses provide quantitative estimates of lifetime risk for hundreds of asymptomatic individuals who have inherited incompletely penetrant PRNP variants.

Available datasets are only now approaching the size and quality required for such analyses, resulting in limitations for our study. The confidence intervals on our lifetime risk estimates span more than an order of magnitude, and our inability to perfectly control for population stratification injects additional uncertainty. We have been unable to reclassify those PRNP variants that are very rare both in cases and in controls (Supplementary Discussion). We have avoided analysis of large insertions that are poorly called with short sequencing reads, though we note that existing literature on these insertions is consistent with a spectrum of penetrance similar to that which we observe for missense variants [Kong 2004, Mead 2006b]. Penetrance estimation in Mendelian disease will be improved by the collection of larger case series, particularly with genome-wide SNP data to allow more accurate population matching. This, coupled with continued large-scale population control sequencing and genotyping efforts, should reveal whether the dramatic variation in penetrance that we observe here is a more general feature of dominant disease genes.

Because PrP is required for prion pathogenesis and reduction in gene dosage slows disease progression [Bueler 1993, Fischer 1996, Mallucci 2003, Safar 2005], several groups have sought to therapeutically reduce PrP expression using RNAi [White 2008, Pulford 2010, Ahn 2014], antisense oligonucleotides [Nazor Friberg 2012], or small molecules [Karapetyan & Sferrazza 2013, Silber 2014]. Our discovery of heterozygous loss-of-function variants in three healthy older humans provides the first human genetic data regarding the effects of a 50% reduction in gene dosage for PRNP. Both the number of individuals and the depth of available phenotype data are limited, and lifelong heterozygous inactivation of a gene is an imperfect model of the effects of pharmacological depletion of the gene product. With those limitations, our data provide preliminary evidence that a reduction in PRNP dosage, if achievable in patients, is likely to be tolerated. Increasingly large control sequencing datasets will soon enable testing whether the same is true of other genes currently being targeted in substrate reduction therapeutic approaches for other protein-folding disorders.

Together, our findings highlight the value of large reference datasets of human genetic variation for informing both genetic counseling and therapeutic strategy.

Materials and methods

Prion disease case series

Prion disease is considered a notifiable diagnosis in most developed countries, with mandatory reporting of all suspect cases to a centralized surveillance center. Surveillance was carried out broadly according to established guidelines [WHO 1998, WHO 2003], with specifics as described previously for Australia [Collins 2002], France [Brandel 2011], Germany [Windl 1999, Grasbon-Frodl 2004, Zerr 2009], Italy [Puopolo 2003], Japan [Nozaki 2010], and the Netherlands [Jansen 2012]. Sanger sequencing of the PRNP open reading frame was performed as described [Parchi 1999]. We included only prion disease cases classified as definite (autopsy-confirmed) or probable according to published guidelines [WHO 2003]. Criteria for genetic testing vary between countries and over the years of data collection, with testing offered only on indication of family history in some times and places, and testing of all suspect cases with tissue available in other instances. Summary statistics on the total number and proportion of cases sequenced are presented in Table S1.

Exome sequencing and analysis

The ascertainment, sequencing, and joint calling of the ExAC dataset have been described previously [Lek 2015]. We extracted all rare (<0.1%) coding variant calls in PRNP with genotype quality (GQ) ≥10, alternate allele depth (AD) ≥3 and alternate allele balance (AB) ≥20%. Read-level evidence was visualized using Integrative Genomics Viewer (IGV) [Robinson 2011] for manual review. Because most ExAC exomes were sequenced with 76bp reads and the PRNP octapeptide repeat region (codons 50-90 inclusive) is 123bp long, it was impossible to determine whether genotype calls in this region were correct, and they were not considered further. After review of IGV screenshots, 87% of genotype calls were judged to be correct and were included in Table S3. Of the genotype calls judged to be correct, 99% had genotype quality (GQ) ≥95, 99% had allelic balance (AB) between 30% and 70%, and 97% had ≥10 reads supporting the alternate allele.
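The automated pre-filter applied before manual review can be expressed as a short predicate. The sketch below is an assumption-laden illustration: the function names and argument layout are hypothetical, and allele balance is taken here to mean the fraction of reads at the site that support the alternate allele.

```python
def allele_balance(alt_depth, total_depth):
    """Fraction of reads at the site that support the alternate allele."""
    return alt_depth / total_depth if total_depth > 0 else 0.0

def passes_filters(gq, alt_depth, total_depth,
                   min_gq=10, min_alt_depth=3, min_ab=0.20):
    """Apply the thresholds stated in the text: GQ >= 10, AD >= 3, AB >= 20%."""
    return (gq >= min_gq
            and alt_depth >= min_alt_depth
            and allele_balance(alt_depth, total_depth) >= min_ab)

# Hypothetical calls given as (GQ, alternate reads, total reads).
calls = [(99, 12, 25), (99, 2, 25), (8, 12, 25), (99, 3, 20)]
kept = [c for c in calls if passes_filters(*c)]
```

Only the first hypothetical call survives. Calls passing such thresholds would still go to manual review, since the filter is deliberately permissive.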

All participants provided informed consent for exome sequencing and analysis. The Exome Aggregation Consortium’s aggregation and release of exome data have been approved by the Partners Healthcare Institutional Research Board (2013P001339). ExAC data have been publicly released at http://exac.broadinstitute.org/ and IGV screenshots of the rare PRNP variants deemed to be genuine and included in this study are available at https://github.com/ericminikel/prnp_penetrance/tree/master/supplement/igv

23andMe research participants and genotyping

Participants were drawn from the customer base of 23andMe, Inc., a personal genetics company (accessed February 6, 2015). All participants provided informed consent under a protocol approved by an external AAHRPP-accredited IRB, Ethical & Independent Review Services (E&I Review). DNA extraction and genotyping were performed on saliva samples by National Genetics Institute (NGI), a CLIA-licensed clinical laboratory and a subsidiary of Laboratory Corporation of America. Samples were genotyped on one of four Illumina platforms (V1-V4) as described previously [Bryc 2015]. Of the PRNP SNPs considered, two (P105L and E200K) were genotyped on all four platforms while the other 14 were genotyped only on V3 and V4, resulting in differing numbers of total samples genotyped (Table S5). Genotypes were called with Illumina GenomeStudio. A 98.5% call rate was required for all samples. As with all 23andMe research participants, individuals whose genotyping analyses repeatedly failed to reach the desired call rate were recontacted to provide additional samples. A maximal set of unrelated individuals was chosen based on segmental identity-by-descent (IBD) estimation [Durand 2014a]. Individuals were defined as related if they shared more than 700 cM IBD (approximately the minimal expected sharing between first cousins). Allele counts between 1 and 5 were rounded up to 5 to protect individual privacy (Table S5). Rounding down to 1 instead would raise our estimates of penetrance for V180I to 7.7% (95%CI, 1.2% - 50%) and for P102L, A117V, D178N and E200K collectively to 100% (95%CI, 100% - 100%), but the confidence intervals would still overlap those based on ExAC allele frequencies, and the overall conclusions of our study would remain unchanged.

23andMe ancestry composition

Ancestral origins of chromosomal segments were assigned on a continental level (European, Latino, African, and East Asian) and a country level (Japanese) as described by Durand et al. [Durand 2014b]. Briefly, after phasing genotypes using an out-of-sample implementation of the Beagle algorithm [Browning & Browning 2007], a string kernel support vector machine classifier assigned tentative ancestry labels to local genomic regions. Then an autoregressive pair hidden Markov model was used to simultaneously correct phasing errors and produce reconciled local ancestry estimates and confidence scores based on the initial assignment. Finally, isotonic regression models were used to recalibrate the confidence estimates.

Europeans and East Asians were defined as individuals with more than 97% of chromosomal segments predicted as being from the respective ancestries. Because African Americans and Latinos are highly admixed, no single threshold of genome-wide ancestry is sufficient to distinguish them. However, segment length distributions of European, African, and Native American ancestries are different between African Americans and Latinos, due to distinct admixture timing in the two ethnic groups. Thus, a logistic classifier based on segment length of European, African, and Native American ancestries was used to distinguish between African Americans and Latinos.

At the country level, individuals were classified as Japanese based on the fraction of the respective local ancestry using a threshold of 90% for classifying Japanese ancestry. This threshold is based on the average fraction of local ancestry in the reference population (23andMe research participants with all four grandparents from the reference country): 94% (5% SD, N=533) for Japanese. Using the same approach, we were unable to obtain a confident set of Italian individuals for analysis of V210I due to extensive admixture. 23andMe research participants with all four grandparents from Italy only have 66% (18% SD, N=2090) Italian ancestry, and only ~60 participants have >90% Italian ancestry.

ExAC ancestry inference

We computed ten principal components based on ~5,800 common SNPs as described [Purcell 2014, Lek 2015]. A centroid in eigenvalue-weighted principal component space was generated for each HapMap population based on 1000 Genomes individuals in ExAC. The remaining individuals in ExAC were assigned to the HapMap population with the nearest centroid according to eigenvalue-weighted Euclidean distance. Ancestries of all individuals, including those with reportedly pathogenic variants, are summarized in Tables S7 and S8.
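The nearest-centroid assignment can be illustrated with a small sketch. It is not the code used for ExAC: the function names and the toy two-component data are hypothetical, and "eigenvalue-weighted" is assumed here to mean that each principal component coordinate is multiplied by its eigenvalue before the Euclidean distance is taken.

```python
import math


def centroid(points):
    """Mean position of a list of equal-length coordinate vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def weighted_distance(a, b, eigenvalues):
    """Euclidean distance after scaling each PC by its eigenvalue (assumed weighting)."""
    return math.sqrt(sum((w * (x - y)) ** 2 for w, x, y in zip(eigenvalues, a, b)))


def assign_population(sample, centroids, eigenvalues):
    """Return the label of the population whose centroid is nearest to the sample."""
    return min(centroids,
               key=lambda pop: weighted_distance(sample, centroids[pop], eigenvalues))


# Toy reference panel with two principal components (made-up coordinates).
reference = {
    "CEU": [[0.0, 0.0], [0.2, 0.0]],
    "JPT": [[1.0, 1.0], [1.2, 1.0]],
}
eigenvalues = [2.0, 1.0]
centroids = {pop: centroid(points) for pop, points in reference.items()}

label_a = assign_population([0.3, 0.1], centroids, eigenvalues)
label_b = assign_population([0.9, 0.8], centroids, eigenvalues)
```

With ten components the same functions apply unchanged; only the length of each coordinate vector and of `eigenvalues` grows.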

Prion disease incidence and baseline risk

The reported incidence of prion disease varies between countries and between years, with much of the variability explained by the intensity of surveillance, as measured by the number of cases referred to national surveillance centers [Klug 2013]. Rates of ~1 case per million population per year have been reported, for instance in the U.S. [Holman 2010] and in Japan [Nozaki 2010]; however, the countries with the most intense surveillance (greatest number of referrals per capita), such as France and Austria, observe incidence figures as high as 2 cases per million population per year [Klug 2013]. Only in small countries where the statistics are dominated by a particular genetic prion disease founder mutation, such as Israel and Slovakia [Chapman 1994, Mitrova & Belay 2002], has an incidence higher than 2 per million been consistently observed [EUROCJD Surveillance Data]. We therefore accepted 2 cases per million as an upper bound for the true incidence of prion disease. Assuming an all-causes death rate of ~10 per 1,000 annually [UN Population and Vital Statistics Report], this incidence corresponds to prion disease accounting for ~0.02% of all deaths, which we accepted as the baseline disease risk in the general population.
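The arithmetic behind the ~0.02% figure is a ratio of two annual rates, shown here as a short worked example. The function name is hypothetical; the two input rates are the ones stated above.

```python
def baseline_lifetime_risk(incidence_per_million, deaths_per_thousand):
    """Fraction of all deaths attributable to the disease, from two annual rates."""
    annual_incidence = incidence_per_million / 1_000_000
    annual_death_rate = deaths_per_thousand / 1_000
    return annual_incidence / annual_death_rate


# 2 cases per million per year divided by 10 deaths per 1,000 per year.
baseline_risk = baseline_lifetime_risk(incidence_per_million=2, deaths_per_thousand=10)
print(f"{baseline_risk:.2%}")  # prints 0.02%
```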

Lifetime risk estimation

By Bayes' theorem, the probability of disease given a genotype (penetrance or lifetime risk, P(D|G)) is equal to the proportion of individuals with the disease who have the genotype (genotype frequency in cases, P(G|D)) times the prevalence of the disease (baseline lifetime risk in the general population, P(D)), divided by the frequency of the genotype in the general population (here, population control allele frequency, P(G)). The use of this formula to estimate disease risk dates back at least to Cornfield's estimation of the probability of lung cancer in smokers [Cornfield 1951], with later contributions by Woolf [Woolf 1955] and a synthesis by C.C. Li with application to genetics [Li 1961].

We used an allelic rather than genotypic model, such that lifetime risk in an individual with one allele is equal to case allele frequency (based on the number of prion disease cases that underwent PRNP sequencing) times baseline risk divided by population control allele frequency, P(D|A) = P(A|D)×P(D)/P(A). Note that we assume that our population control datasets include individuals who will later die of prion disease, thus enabling direct use of the ExAC and 23andMe allele frequencies as the denominator P(A). Following Kirov [Kirov 2014], we computed Wilson 95% confidence intervals on the binomial proportions P(A|D) and P(A), and calculated the upper bound of the 95% confidence interval for penetrance using the upper bound on case allele frequency and the lower bound on population control allele frequency, and vice versa for the lower bound on penetrance.
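The calculation described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' script (which is linked under Source code availability): the function names are hypothetical, allele frequency is taken as allele count over twice the number of individuals, z = 1.96 is used for the 95% Wilson interval, and estimates are capped at 100%. The example call uses the M232R counts from Tables S1 and S3 pooled across all cases and all of ExAC purely to exercise the code; it is not intended to reproduce any estimate reported in the paper.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def lifetime_risk(case_ac, case_n, control_ac, control_n, baseline_risk=0.0002):
    """Return (point, lower, upper) for P(D|A) = P(A|D) * P(D) / P(A).

    case_n and control_n are numbers of individuals; each contributes two alleles.
    """
    if control_ac <= 0:
        raise ValueError("control allele count must be positive")
    case_alleles, control_alleles = 2 * case_n, 2 * control_n
    case_lo, case_hi = wilson_interval(case_ac, case_alleles)
    ctrl_lo, ctrl_hi = wilson_interval(control_ac, control_alleles)

    def cap(x):
        # Assumption: penetrance estimates above 100% are reported as 100%.
        return min(1.0, x)

    point = cap((case_ac / case_alleles) * baseline_risk / (control_ac / control_alleles))
    lower = cap(case_lo * baseline_risk / ctrl_hi)  # low case AF, high control AF
    upper = cap(case_hi * baseline_risk / ctrl_lo)  # high case AF, low control AF
    return point, lower, upper


# Illustration only: pooled M232R counts (63 case alleles, 10 ExAC alleles).
point, lower, upper = lifetime_risk(case_ac=63, case_n=10460,
                                    control_ac=10, control_n=60706)
```

Pairing the upper case bound with the lower control bound gives a conservative interval, since it combines the two extremes rather than propagating the joint uncertainty.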

Source code availability

Data processing, analysis, and figure generation utilized custom scripts written in Python 2.7.6 and R 3.1.2. These scripts, along with vector graphics of all figures and tab-delimited text versions of all supplementary tables, are available online at https://github.com/ericminikel/prnp_penetrance.

Acknowledgments

Funding

Research reported in this publication was partially supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) and the National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health, under awards U54DK105566 and R01GM104371, by Broad Institute NextGen funds, and by Prion Alliance sundry funds. Sonia Vallabh is supported by the National Science Foundation (NSF) Graduate Research Fellowship Program (GRFP) grant number 2015214731. U.S. prion surveillance work was conducted under Centers for Disease Control and Prevention (CDC) contract UR8/CCU515004. Japanese prion surveillance work was supported by a grant-in-aid from the Research Committee of Prion Disease and Slow Virus Infection, the Ministry of Health, Labour and Welfare of Japan, and from the Research Committee of Surveillance and Infection Control of Prion Disease, the Ministry of Health, Labour and Welfare of Japan. The French surveillance network is supported by the Institut National de veille Sanitaire. German prion surveillance work was supported by Robert Koch-Institute / Federal Ministry of Health grant 1369-341. The UK National CJD Research and Surveillance Unit is supported by the Department of Health and the Scottish Executive. The Australian National Creutzfeldt-Jakob Disease Registry is funded by the Commonwealth Department of Health. SJC is supported by a NHMRC Practitioner Fellowship: identification #APP1005816. 
Contributions at Erasmus MC were supported by Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO) sponsored Netherlands Consortium for Healthy Aging (NCHA; project 050-060-810), by the Genetic Laboratory of the Department of Internal Medicine, Erasmus MC, by a Complementation Project of the Biobanking and Biomolecular Research Infrastructure Netherlands (BBMRI-NL; www.bbmri.nl ; project number CP2010-41), by Erasmus Medical Center and Erasmus University, Rotterdam, Netherlands Organization for the Health Research and Development (ZonMw), the Research Institute for Diseases in the Elderly (RIDE), the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the European Commission (DG XII), and the Municipality of Rotterdam. We thank the customers of 23andMe, ExAC research participants, and prion disease patients and families who participated in this research.

Author contributions

E.V.M., S.M.V. and D.G.M. conceived and designed the study. E.V.M. analyzed data, generated figures, and wrote the manuscript. S.M.V. and E.V.M. reviewed literature and IGV screenshots. K.E.S. performed constraint analyses. M.Lek, K.E., K.E.S., K.J.K., A.H.O.-L., M.J.D., and D.G.M. consulted on data analysis and interpretation. J.F.S., C.Y.M., J.Y.C., and L.P.C.Y. prepared and consulted on analysis of 23andMe data. P.G., J.B., S.Z., Y.C., W.C., M.Y., T.H., N.S., H.M., Y.N., T.K., S.J.C., A.B., R.G.W., R.Knight, C.P., I.Z., T.F.J.K., S.E., A.G., M.C., J.d.P.C., S.H., J.-L.L., E.B.-A., J.-P.B., S.C., P.P., A.L., A.P., R.Kraaij, J.G.J.v.R., S.J.v.d.L., R.M., and C.v.D. prepared and consulted on analysis of prion surveillance data. E.V.M., J.L.M., M.B., M.Laakso, K.M., A.K., K.C., S.A.M., P.S., C.M.H., S.M.P., P.S., C.v.D., F.R.R., A.H., A.I., S.J.v.d.L., J.M.V.-D., and A.G.U. prepared and consulted on analysis of data regarding protein-truncating variants. ExAC provided exome sequence data.

Supplemental information

Table of contents:

  • Supplementary Discussion
    • Additional variants
    • Dominant versus allelic models
  • Table S1. Allele counts of rare PRNP variants in 16,025 definite and probable prion disease cases in 9 countries
  • Table S2. Rare PRNP variants reported in peer-reviewed literature to cause prion disease
  • Table S3. Allele counts of rare PRNP variants in 60,706 individuals in ExAC
  • Table S4. Summary of rare PRNP variants by functional class in ExAC
  • Table S5. Allele counts of 16 reportedly pathogenic PRNP variants in >500,000 23andMe research participants
  • Table S6. Phenotypes investigated in studies in which ExAC individuals with reportedly pathogenic PRNP variants were ascertained
  • Table S7. Inferred ancestry and codon 129 genotypes of ExAC individuals with reportedly pathogenic variants
  • Table S8. Inferred ancestry of all ExAC individuals
  • Table S9. Inferred ancestry of 23andMe research participants
  • Table S10. Details of Japanese prion disease cases
  • Table S11. Phenotypes of individuals with N-terminal PrP truncating variants
  • Figure S1. Age of ExAC individuals with reportedly pathogenic PRNP variants versus all individuals in ExAC
  • Figure S2. Sanger sequencing results for individuals with N-terminal truncating variants

Supplementary discussion

Additional variants

Of the 63 reportedly pathogenic variants (Table S2), 10 are discussed in the main text. Of those 10, our data and our analysis of the literature indicate high penetrance for 4 (P102L, A117V, D178N, and E200K), intermediate penetrance for 3 (V180I, V210I, and M232R), and suggest that 3 others may be benign (P39L, E196A, and R208C). In this section we discuss four additional variants that we cannot conclusively reclassify but which are unlikely to be highly penetrant, and we also provide a brief discussion of interpretation for remaining variants.

  • R148H has been reported in two isolated patients with a sporadic Creutzfeldt-Jakob disease phenotype and negative family history [Krebs 2005, Pastore 2005] and appears one additional time in our case cohorts (Table S1). Based on its rarity in cases, lack of familial segregation and presence on 3 alleles in ExAC, it is unlikely to be a highly penetrant Mendelian variant. It might be benign or might slightly increase prion disease risk.
  • T188R has been reported in two cases in the literature. One German individual presented with a sporadic Creutzfeldt-Jakob disease phenotype but no autopsy was performed; family history was negative [Windl 1999, Roeber 2008]. One Mexican-American individual had autopsy-confirmed prion disease and an ambiguous family history [Tartaglia 2010]. This variant appears 12 times in our case cohort (all in the United States) and 3 times in ExAC (all in Latino populations). Based on its allele frequency in controls, rarity in cases and lack of any clear evidence for segregation in families, T188R is unlikely to be a highly penetrant Mendelian disease variant. It is not clear whether it is benign or increases prion disease risk.
  • V203I has been reported in three heterozygous patients - one Italian [Peoc'h 2000], one Korean [Jeong 2010], and one Chinese [Shi 2013], as well as in one Japanese homozygote [Komatsu 2014]. Family history is negative in all of these reported patients as well as in two additional V203I cases in our Japanese case cohort (Table S10). In our cohorts, this variant appears in a total of 16 cases from several countries; in ExAC, it appears in 3 European individuals. Based on its allele frequency in controls, rarity in cases and lack of any clear evidence for segregation in families, V203I is unlikely to be a highly penetrant Mendelian disease variant, and could be benign or could increase prion disease risk. The report of prion disease in a V203I homozygote makes us slightly inclined to favor the interpretation that V203I does increase prion disease risk.
  • R208H has been reported in several isolated cases of varied ancestries, all with a negative family history [Mastrianni 1996, Capellari 2005, Roeber 2005, Basset-Leobon 2006, Chen 2011, Matej 2012, Vita 2013]. In our cohorts, it appears in 13 prion disease cases, 9 ExAC individuals and 22 individuals in the 23andMe database. Given its high frequency in controls, this variant may be benign or may slightly increase prion disease risk.
  • Other variants. Excluding variants discussed in the main text and above, 0.8% (87 / 10460) of individuals in our case series harbor other rare PRNP missense variants, some of which have been reported as pathogenic (Table S2) and others of which have not. Because most of these variants are very rare both in cases and in population controls, comparisons of case and control allele frequency are not well powered to evaluate the pathogenicity of most individual variants, and we are unable to reach any firm conclusions about their pathogenicity. Collectively, our data indicate that this category includes at least some variants that increase prion disease risk, because only 0.3% (187 / 60706) of ExAC individuals harbor a rare missense variant other than those discussed in the main text or above, whereas 0.8% (87 / 10460) of prion disease cases harbor one of these variants, a significant enrichment (p = 1 × 10⁻¹², Fisher's exact test). Indeed, Mendelian segregation has been demonstrated for some of these variants, such as T183A and F198S [Nitrini 1997, Hsiao 1992]. However, the fact that, in the aggregate, we observe only modest (~3-fold) enrichment of such variants in cases versus controls suggests that this category also includes many neutral or very low-risk variants, consistent with our expectation that sporadic prion disease cases should, by chance, harbor some rare variants unassociated with disease. We also cannot exclude the possibility that some specific rare variants, particularly those observed in controls and not in cases, could be protective.
  • Future novel missense variants. Additional novel missense variants in PRNP are sure to be observed in prion disease patients in the future. Our findings that some reportedly pathogenic variants are either benign or exhibit low penetrance, together with our observation that ~4 in 1000 controls harbor a rare PRNP missense variant, urge caution in the interpretation of novel variants in prion disease patients. This is consistent with current guidelines [MacArthur 2014, Richards 2015], which indicate that novel protein-altering variants, even in established disease genes, should not be assumed to be causal or highly penetrant until evidence, such as Mendelian segregation or significant enrichment in cases over controls, can be established.
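The aggregate comparison of other rare missense variants in cases versus ExAC (87 of 10,460 cases versus 187 of 60,706 controls) can be checked with a short sketch. The function names are hypothetical, and the two-sided Fisher's exact test is written out from the hypergeometric distribution using log-gamma rather than taken from a statistics library, so small numerical differences from a library result are possible.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    log_denominator = log_comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return math.exp(log_comb(row1, x) + log_comb(row2, col1 - x) - log_denominator)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)


cases_with, cases_total = 87, 10460
controls_with, controls_total = 187, 60706

fold_enrichment = (cases_with / cases_total) / (controls_with / controls_total)
p_value = fisher_exact_two_sided(cases_with, cases_total - cases_with,
                                 controls_with, controls_total - controls_with)
```

The fold enrichment from these counts is about 2.7, in line with the "~3-fold" figure quoted in the text.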

Dominant versus allelic models

Virtually all patients ever reported with genetic prion disease have been heterozygous for the putative pathogenic variants. Five individuals homozygous for E200K [Simon 2000] were reported to have a younger age of onset than heterozygotes (mean 50 vs. 59 years, p = .03), suggesting some degree of codominance. There have been individual case reports of homozygotes for Q212P [Beck 2010] and V203I [Komatsu 2014], both without a family history among heterozygote relatives, which might suggest that dosage of the mutant allele is important. We are not aware of any other reports of individuals homozygous for potentially pathogenic variants in PRNP. Regardless of whether a dominant or allelic model is assumed, our formula for lifetime risk (Materials and Methods) gives identical point estimates of penetrance and virtually identical 95% confidence intervals.

Table S1. Allele counts of rare PRNP variants in 16,025 definite and probable prion disease cases in 9 countries.

Abbreviations: OPRD, octapeptide repeat deletion; OPRI, octapeptide repeat insertion.

*V203I in Japan: two heterozygotes and one homozygote, four alleles total. All other individuals are heterozygotes.

country Australia France Germany Italy Japan Netherlands Spain U.K. U.S. TOTAL
Start year 1993 1991 1993 1993 1999 1993 1993 1990 2000
End year 2014 2013 2015 2013 2014 2013 2013 2013 2014
Definite plus probable cases 553 2383 2690 1684 2144 409 1280 1963 2919 16025
Of which PRNP sequenced 152 1774 1307 1054 1533 163 749 1088 2640 10460
Proportion sequenced 27% 74% 49% 63% 72% 40% 59% 55% 90% 65%
Number with rare variants 31 196 125 396 464 22 127 173 361 1895
Proportion with rare variants 6% 8% 5% 24% 22% 5% 10% 9% 12% 12%
2-OPRD 3 3
1-OPRI 2 1 4 7
2-OPRI 1 5 6
3-OPRI 1 1 2
4-OPRI 1 3 2 13 4 23
5-OPRI 2 10 1 1 13 12 39
6-OPRI 2 35 15 52
7-OPRI 1 1 1 2 5
8-OPRI 10 10
9-OPRI 4 4
10-OPRI 1 1
OPRI (length unspecified) 9 8 17
A2V 1 1
G54S 1 4 5
P84S 1 1
G88A 1 1
G94S 1 1
H96Y 1 1
P102L 2 10 7 59 83 1 34 25 221
P105L 12 1 13
P105S 1 1
P105T 3 2 5
G114V 1 1
A117V 3 8 1 12 9 33
G131V 1 1
S132I 1 1
A133V 1 1 2
R148H 1 2 3
Q160X 1 1
Y163X 2 2
D167G 1 1
V176G 1 1
D178N 3 34 32 18 5 4 65 12 36 209
V180I 1 1 218 5 225
T183A 3 3
Q186X 1 1
H187A 1 1
H187R 7 7
T188A 1 1
T188K 2 1 3
T188R 12 12
E196A 1 1
E196K 3 8 2 13
F198S 5 5
E200G 1 1
E200K 11 101 28 123 63 2 52 38 153 571
V203I 5 3 4 5 17
R208H 1 2 7 1 4 15
V210I 4 13 19 171 1 3 36 247
E211Q 5 2 3 1 11
E211D 1 1
Q212P 2 2
I215V 1 1
Y218N 1 1
A224V 1 1
Y226X 1 1
Q227X 1 1
M232R 63 63
V180I and M232R in trans 4 4
Variant not specified 5 5 2 12

Table S2. Rare PRNP variants reported in peer-reviewed literature to cause prion disease

Note: an updated version of this table is maintained in this blog post.

variant first report see also
P39L Bernardi 2014
2-OPRD Beck 2001 Capellari 2002
1-OPRI Laplanche 1995 Pietrini 2003
2-OPRI Hill 2006
3-OPRI Nishida 2004
4-OPRI Laplanche 1995 Campbell 1996, Kaski 2011
5-OPRI Goldfarb 1991b
6-OPRI Owen 1990 Mead 2006b
7-OPRI Goldfarb 1991b Lewis 2003
8-OPRI Goldfarb 1991b Laplanche 1999
9-OPRI Krasemann 1995
12-OPRI Kumar 2011
P84S Jones 2014
S97N Zheng 2008
P102L Goldgaber 1989 Hsiao 1992
P105L Yamada 1993 Yamada 1999
P105S Tunnell 2008
P105T Rogaeva 2006 Polymenidou 2011
G114V Rodriguez 2005 Liu 2010
A117V Tateishi 1990 Hsiao 1991a
129insLGGLGGYV Hinnell 2011
G131V Panegyres 2001 Jansen 2012
S132I Hilton 2009
A133V Rowe 2007
Y145X Kitamoto 1993
R148H Krebs 2005 Pastore 2005
Q160X Finckh 2000 Jayadev 2011
Y163X Revesz 2009 Mead 2013
D167G Bishop 2009
D167N Beck 2010
V176G Simpson 2013
D178Efs25X Matsuzono 2013
D178N Goldfarb 1991a Medori 1992, Goldfarb 1992
V180I Hitoshi 1993 Chasseigneaux 2006
T183A Nitrini 1997 Grasbon-Frodl 2004
H187R Butefisch 2000
T188A Collins 2000
T188K Finckh 2000 Roeber 2008
T188R Windl 1999 Roeber 2008, Tartaglia 2010
T193I Kotta 2006
E196A Zhang 2014
E196K Peoc'h 2000
F198S Farlow 1989 Hsiao 1992
F198V Zheng 2008
E200G Kim 2013
E200K Goldgaber 1989 Hsiao 1991b
D202G Heinemann 2008
D202N Piccardo 1998
V203I Peoc'h 2000
R208C Zheng 2008
R208H Mastrianni 1996 Capellari 2005, Roeber 2005
V210I Ripoll 1993 Pocchiari 1993, Mouillet-Richard 1999
E211D Peoc'h 2012
E211Q Peoc'h 2000
Q212P Piccardo 1998
I215V Munoz-Nieto 2013
Q217R Hsiao 1992
Y218N Alzualde 2010
Y226X Jansen 2010
Q227X Jansen 2010
M232R Hitoshi 1993 Hoque 1996
M232T Bratosiewicz 2000
P238S Windl 1999

Table S3. Allele counts of rare PRNP variants in 60,706 individuals in ExAC.

Chromosomal positions are given in GRCh37 coordinates and HGVS notations are given relative to Ensembl transcript ENST00000379440. Mean read depth across the PRNP coding sequence was 55.21. Call rate is the proportion of ExAC individuals with a genotype call of genotype quality (GQ) ≥20 and a depth (DP) of ≥10 reads.

Chrom Pos Ref Alt HGVS Variant Class Call rate AC
20 4679863 C T c.-4C>T non-coding 97% 1
20 4679871 C T c.5C>T A2V missense 97% 2
20 4679877 T A c.11T>A L4H missense 98% 3
20 4679877 T G c.11T>G L4R missense 98% 1
20 4679888 A G c.22A>G M8V missense 98% 1
20 4679901 T C c.35T>C F12S missense 98% 1
20 4679916 G C c.50G>C S17T missense 98% 10
20 4679920 C A c.54C>A D18E missense 98% 2
20 4679920 C T c.54C>T D18D synonymous 98% 18
20 4679927 C A c.61C>A L21I missense 98% 1
20 4679932 C T c.66C>T C22C synonymous 98% 2
20 4679935 G A c.69G>A K23K synonymous 98% 2
20 4679939 C T c.73C>T R25C missense 98% 2
20 4679944 G A c.78G>A P26P synonymous 98% 6
20 4679967 G T c.101G>T G34V missense 98% 1
20 4679969 G A c.103G>A G35S missense 98% 1
20 4679975 C T c.109C>T R37X nonsense 98% 1
20 4679982 C T c.116C>T P39L missense 98% 3
20 4679983 G A c.117G>A P39P synonymous 98% 8
20 4679986 G A c.120G>A G40G synonymous 98% 12
20 4680005 A G c.139A>G N47D missense 98% 1
20 4680026 G A c.160G>A G54S missense 97% 78
20 4680028 T C c.162T>C G54G synonymous 97% 5
20 4680038 G T c.172G>T G58W missense 97% 1
20 4680045 C T c.179C>T P60L missense 96% 1
20 4680055 T A c.189T>A G63G synonymous 96% 1
20 4680077 G A c.211G>A G71S missense 96% 1
20 4680089 C T c.223C>T Q75X nonsense 96% 1
20 4680091 G A c.225G>A Q75Q synonymous 96% 2
20 4680093 C G c.227C>G P76R missense 96% 1
20 4680129 G C c.263G>C G88A missense 98% 1
20 4680134 G A c.268G>A G90S missense 98% 1
20 4680145 T G c.279T>G G93G synonymous 99% 1
20 4680151 C T c.285C>T T95T synonymous 99% 1
20 4680172 G A c.306G>A P102P synonymous 99% 21
20 4680185 A G c.319A>G T107A missense 99% 1
20 4680199 C T c.333C>T H111H synonymous 99% 2
20 4680202 G A c.336G>A M112I missense 99% 1
20 4680231 T G c.365T>G V122G missense 99% 1
20 4680232 G T c.366G>T V122V synonymous 99% 3
20 4680244 C A c.378C>A G126G synonymous 99% 1
20 4680244 C T c.378C>T G126G synonymous 99% 3
20 4680250 C T c.384C>T Y128Y synonymous 100% 22
20 4680252 T C c.386T>C M129T missense 100% 1
20 4680257 G T c.391G>T G131X nonsense 100% 1
20 4680258 G T c.392G>T G131V missense 100% 1
20 4680259 A G c.393A>G G131G synonymous 100% 3
20 4680262 T C c.396T>C S132S synonymous 100% 1
20 4680274 G A c.408G>A R136R synonymous 100% 2
20 4680274 G T c.408G>T R136S missense 100% 2
20 4680279 T C c.413T>C I138T missense 100% 1
20 4680289 C T c.423C>T F141F synonymous 100% 2
20 4680292 C T c.426C>T G142G synonymous 100% 1
20 4680299 T G c.433T>G Y145D missense 100% 1
20 4680308 C T c.442C>T R148C missense 100% 1
20 4680309 G A c.443G>A R148H missense 100% 3
20 4680311 T C c.445T>C Y149H missense 100% 1
20 4680316 T C c.450T>C Y150Y synonymous 100% 1
20 4680317 C T c.451C>T R151C missense 100% 2
20 4680318 G A c.452G>A R151H missense 100% 3
20 4680324 A G c.458A>G N153S missense 100% 1
20 4680328 G A c.462G>A M154I missense 100% 1
20 4680342 A G c.476A>G N159S missense 100% 1
20 4680349 G A c.483G>A V161V synonymous 100% 1
20 4680359 C T c.493C>T P165S missense 100% 2
20 4680362 A G c.496A>G M166V missense 100% 2
20 4680364 G A c.498G>A M166I missense 100% 2
20 4680373 C T c.507C>T Y169Y synonymous 100% 1
20 4680382 G A c.516G>A Q172Q synonymous 100% 1
20 4680385 C T c.519C>T N173N synonymous 100% 5
20 4680394 G A c.528G>A V176V synonymous 100% 2
20 4680397 C G c.531C>G H177Q missense 100% 1
20 4680397 C T c.531C>T H177H synonymous 100% 4
20 4680403 C T c.537C>T C179C synonymous 100% 1
20 4680404 G A c.538G>A V180I missense 100% 6
20 4680412 C G c.546C>G I182M missense 100% 2
20 4680429 C G c.563C>G T188R missense 100% 3
20 4680429 C T c.563C>T T188M missense 100% 4
20 4680443 A G c.577A>G T193A missense 100% 2
20 4680445 C A c.579C>A T193T synonymous 100% 1
20 4680449 G C c.583G>C G195R missense 100% 3
20 4680451 G A c.585G>A G195G synonymous 100% 3
20 4680453 A C c.587A>C E196A missense 100% 9
20 4680462 C A c.596C>A T199N missense 100% 1
20 4680463 C T c.597C>T T199T synonymous 100% 2
20 4680467 A T c.601A>T T201S missense 100% 1
20 4680469 C T c.603C>T T201T synonymous 100% 3
20 4680470 G A c.604G>A D202N missense 100% 1
20 4680472 C T c.606C>T D202D synonymous 100% 8
20 4680473 G A c.607G>A V203I missense 100% 3
20 4680488 C T c.622C>T R208C missense 100% 1
20 4680489 G A c.623G>A R208H missense 100% 9
20 4680490 C T c.624C>T R208R synonymous 100% 4
20 4680491 G A c.625G>A V209M missense 100% 1
20 4680494 G A c.628G>A V210I missense 100% 2
20 4680501 A C c.635A>C Q212P missense 100% 1
20 4680502 G A c.636G>A Q212Q synonymous 100% 2
20 4680520 C T c.654C>T Y218Y synonymous 100% 17
20 4680534 A T c.668A>T Q223L missense 100% 1
20 4680539 T C c.673T>C Y225H missense 99% 1
20 4680540 A G c.674A>G Y225C missense 99% 1
20 4680541 T C c.675T>C Y225Y synonymous 99% 3
20 4680552 G A c.686G>A G229E missense 98% 1
20 4680553 A G c.687A>G G229G synonymous 98% 1
20 4680561 T G c.695T>G M232R missense 97% 10
20 4680566 C T c.700C>T L234F missense 95% 29
20 4680590 C T c.724C>T L242F missense 87% 1
20 4680598 C G c.732C>G I244M missense 84% 1
20 4680598 C T c.732C>T I244I synonymous 84% 1
20 4680626 T G c.760T>G X254G read-through 66% 1

Table S4. Summary of rare PRNP variants by functional class in ExAC

Class Total AC
missense 236
non-coding 1
nonsense 3
read-through 1
synonymous 180

Table S5. Allele counts of 16 reportedly pathogenic PRNP variants in >500,000 23andMe research participants.

To protect the privacy of 23andMe research participants, allele count (AC) values between 1 and 5 inclusive are displayed as "1-5" and are rounded up to 5 for the purposes of plotting. These alleles were seen almost exclusively in a heterozygous state, with fewer than 5 homozygous individuals total across all 16 variants.

Variant dbSNP id 23andMe id Called genotypes AC Comments
P102L rs74315401 i5004359 502075 1-5 total
A117V rs74315402 i5004358 501820
D178N rs74315403 i5004357 502450
E200K rs28933385 rs28933385 531370
M232R rs74315409 i5004352 502475 78 AC=29 in 2,685 individuals with >90% Japanese ancestry
V180I rs74315408 i5004353 502125 15 AC=1-5 in 2,670 individuals with >90% Japanese ancestry
V210I rs74315407 i5004354 502290 13 AC=8 in 385,030 Europeans
R208C rs55826236 rs55826236 501850 8
R208H rs74315412 i5004349 501775 22 AC=19 in 384,645 Europeans
P105L rs11538758 rs11538758 531575 1-5 total
G131V rs74315410 i5004351 499455
A133V rs74315415 i5004347 502520
T183A rs74315411 i5004350 502295
F198V rs55871421 rs55871421 501540
F198S rs74315405 i5004356 502460
Q217R rs74315406 i5004355 502385

Table S6. Phenotypes investigated in studies in which ExAC individuals with reportedly pathogenic PRNP variants were ascertained.

Note that we do not have access to phenotypic data to indicate whether a particular individual was ascertained as a case or a control. Therefore "cardiovascular" simply means an individual was ascertained in a cardiovascular disease cohort, not necessarily that the individual has cardiovascular disease. “Mixed” cohorts include controls, cardiovascular and pulmonary phenotypes.

Cohort phenotype Total in ExAC With reportedly pathogenic PRNP variants
Autoimmune 1675 4
Cancer 7601 3
Cardiovascular 14622 14
Metabolic 15327 19
Mixed 3936 2
Population controls 2215 6
Psychiatric 15330 4
TOTAL 60706 52

Table S7. Inferred ancestry and codon 129 genotypes of ExAC individuals with reportedly pathogenic variants.

Three-letter HapMap ancestry codes are defined in Table S8.

variant pops codon129
P39L 1 PJL, 2 TSI 2 M/M, 1 M/V
G131V 1 TSI 1 M/V
R148H 1 CEU, 1 IBS, 1 PJL 3 M/M
V180I 1 CHB, 2 JPT, 3 PJL 4 M/M, 1 M/V, 1 V/V
T188R 1 CLM, 2 MXL 1 M/V, 2 V/V
E196A 3 CHB, 6 CHS 9 M/M
D202N 1 TSI 1 M/V
V203I 1 IBS, 2 TSI 1 M/M, 2 M/V
R208C 1 ACB 1 M/M
R208H 1 ACB, 2 ASW, 1 CLM, 2 IBS, 1 MSL, 2 TSI 4 M/M, 5 M/V
V210I 2 TSI 2 M/M
Q212P 1 CEU 1 M/V
M232R 5 CHB, 5 JPT 10 M/M

Table S8. Inferred ancestry of all ExAC individuals.

Methods for ancestry assignment are described in Materials and Methods.

Population code Description Super population code N in ExAC
ACB African Caribbeans in Barbados AFR 2267
ASW Americans of African Ancestry in SW USA AFR 2151
BEB Bengali from Bangladesh SAS 483
CDX Chinese Dai in Xishuangbanna, China EAS 19
CEU Utah Residents (CEPH) with Northern and Western European ancestry EUR 14185
CHB Han Chinese in Beijing, China EAS 1553
CHS Southern Han Chinese EAS 1733
CLM Colombians from Medellin, Colombia AMR 870
ESN Esan in Nigeria AFR 89
FIN Finnish in Finland EUR 3977
GBR British in England and Scotland EUR 10358
GIH Gujarati Indian from Houston, Texas SAS 79
GWD Gambian in Western Divisions in The Gambia AFR 102
IBS Iberian population in Spain EUR 3534
ITU Indian Telugu from the UK SAS 1089
JPT Japanese in Tokyo, Japan EAS 663
KHV Kinh in Ho Chi Minh City, Vietnam EAS 369
LWK Luhya in Webuye, Kenya AFR 72
MSL Mende in Sierra Leone AFR 189
MXL Mexican Ancestry from Los Angeles USA AMR 2658
PEL Peruvians from Lima, Peru AMR 1900
PJL Punjabi from Lahore, Pakistan SAS 6300
PUR Puerto Ricans from Puerto Rico AMR 579
STU Sri Lankan Tamil from the UK SAS 460
TSI Toscani in Italia EUR 4795
YRI Yoruba in Ibadan, Nigeria AFR 232

Table S9. Inferred ancestry of 23andMe research participants

| Ancestry | Minimum called genotypes | Maximum called genotypes | Total reportedly pathogenic AC |
|---|---|---|---|
| European | 382865 | 408475 | 35 |
| Latino | 42425 | 44480 | 10 |
| African | 22945 | 23795 | 10 |
| East Asian | 20255 | 21710 | 75 |
| All others | 30975 | 33125 | 20 |
| TOTAL | 499455 | 531575 | 140 |
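The counts in Table S9 can be turned into a rough range of reportedly pathogenic alleles per 100,000 genotyped individuals in each ancestry group. The sketch below is an illustration under stated assumptions, not the authors' calculation: it assumes the minimum and maximum called genotypes bracket the number of individuals genotyped at any one variant, so dividing the allele count (AC) by each bound gives a high and a low estimate. The `alleles_per_100k` function is a hypothetical helper, and the TOTAL row is left out rather than recomputed.

```python
# (ancestry, minimum called genotypes, maximum called genotypes, total reportedly pathogenic AC),
# transcribed from Table S9; the TOTAL row is intentionally omitted.
TABLE_S9 = [
    ("European", 382865, 408475, 35),
    ("Latino", 42425, 44480, 10),
    ("African", 22945, 23795, 10),
    ("East Asian", 20255, 21710, 75),
    ("All others", 30975, 33125, 20),
]


def alleles_per_100k(ac, min_called, max_called):
    """Return (low, high) alleles per 100,000 individuals.

    Assumption: the true denominator lies between min_called and max_called.
    """
    low = ac / max_called * 100_000
    high = ac / min_called * 100_000
    return low, high


rates = {
    ancestry: alleles_per_100k(ac, min_called, max_called)
    for ancestry, min_called, max_called, ac in TABLE_S9
}

for ancestry, (low, high) in rates.items():
    print(f"{ancestry}: {low:.1f} to {high:.1f} per 100,000")
```

Under these assumptions, the East Asian group shows the highest rate (roughly 345 to 370 per 100,000) and the European group the lowest (roughly 9 per 100,000).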

Table S10. Details of Japanese prion disease cases

  • *Age at onset is expressed as mean ± SD (range), in years.
  • **Duration is the time from onset to akinetic mutism, or to death without akinetic mutism, expressed as mean ± SD (range), in months.
  • Terms:
    • EE = glutamic acid homozygosity
    • EK = glutamic acid/lysine heterozygosity
    • KK = lysine homozygosity
    • MM = methionine homozygosity
    • MV = methionine/valine heterozygosity
    • PSWCs = periodic synchronous wave complexes
| Variant | N | Male/Female | Age at onset* (range) | Positive family history (%) |
|---|---|---|---|---|
| Insertion | 8 | 4/4 | 51.0 ± 12.0 (26-68) | 5 (63) |
| P102L | 83 | 38/45 | 55.5 ± 10.3 (22-75) | 69 (83) |
| P105L | 12 | 7/5 | 46.9 ± 8.4 (31-61) | 11 (92) |
| D178N-129M | 4 | 3/1 | 54.5 ± 5.5 (46-61) | None |
| D178N-129V | 1 | 1/0 | 74 | None |
| V180I | 218 | 84/134 | 77.4 ± 6.8 (44-93) | 5 (2) |
| E200K | 63 | 30/33 | 61.1 ± 9.9 (31-83) | 28 (44) |
| V203I | 3 | 2/1 | 73 | None |
| R208H | 1 | 0/1 | 74 | None |
| V210I | 1 | 0/1 | 55 | None |
| M232R | 63 | 32/31 | 64.4 ± 10.9 (15-82) | 2 (3) |
| V180I+M232R | 4 | 2/2 | 71.3 ± 3.6 (65-74) | None |

| Variant | Duration** (range) | Codon 129 | Codon 219 |
|---|---|---|---|
| Insertion | 27.8 ± 17.7 (3-57) | MM 6; MV 1 | EE 6; KK 1 |
| P102L | 48.4 ± 35.8 (2-186) | MM 67; MV 6 | EE 70; EK 2 |
| P105L | 90.2 ± 40.4 (25-184) | MV 11 | EE 7 |
| D178N-129M | 8.5 ± 4.4 (2-13) | MM 4 | EE 4 |
| D178N-129V | 24 | MV 1 | EE 1 |
| V180I | 16.4 ± 14.5 (0-70) | MM 162; MV 54 | EE 210 |
| E200K | 5.0 ± 6.0 (1-32) | MM 58; MV 3 | EE 58; EK 3 |
| V203I | 3.7 ± 2.1 (1-6) | MM 3 | EE 3 |
| R208H | 3 | MM 1 | EE 1 |
| V210I | 3 | MM 1 | EE 1 |
| M232R | 8.6 ± 12.7 (0-78) | MM 60; MV 2 | EE 61; EK 1 |
| V180I+M232R | 21.8 ± 17.7 (1-47) | MM 4 | EE 4 |

| Variant | PSWCs on EEG (%) | Hyperintensities on MRI (%) | Positive 14-3-3 protein (%) |
|---|---|---|---|
| Insertion | 3/8 (38) | 2/7 (29) | 0/1 (0) |
| P102L | 11/72 (15) | 32/76 (42) | 13/34 (38) |
| P105L | 1/10 (10) | 1/11 (9) | 1/2 (50) |
| D178N-129M | 0/4 (0) | 1/4 (25) | 1/2 (50) |
| D178N-129V | 0/1 (0) | 0/1 (0) | 1/1 (100) |
| V180I | 19/203 (9) | 212/213 (99) | 110/140 (79) |
| E200K | 56/63 (89) | 56/59 (95) | 29/31 (94) |
| V203I | 3/3 (100) | 2/2 (100) | 1/1 (100) |
| R208H | 1/1 (100) | 1/1 (100) | 1/1 (100) |
| V210I | 1/1 (100) | 1/1 (100) | not done |
| M232R | 46/61 (75) | 55/60 (92) | 31/43 (72) |
| V180I+M232R | 0/4 (0) | 4/4 (100) | 0/1 (0) |

Table S11. Phenotypes of individuals with N-terminal PrP truncating variants

| HGVS | Variant | Zygosity | Sex | Age | Available phenotype information |
|---|---|---|---|---|---|
| c.59_60insC | G20Gfs84X | Het | F | 79 | Ascertained as part of the Rotterdam Study [Hofman 2015], a prospective cohort study of middle-aged and elderly persons. In good health and free of dementia at the last completed in-person examination, at age 78 or older. Has 5 siblings and 2 children. The only family history noted is that one sibling had a stroke before age 65. |
| c.109C>T | R37X | Het | M | 73 | Ascertained as a control for the Swedish schizophrenia study. Underwent heart bypass surgery in 2008 and has a family history of heart problems. Has 4 siblings. Reports no family history of neurodegeneration or neuropathy. |
| c.223C>T | Q75X | Het | M | 52 | Ascertained in a study of type 2 diabetes. Has mild type 2 diabetes treated with metformin. Has children. |
| c.391G>T | G131X | Het | F | | None available. |

Figure S1. Age of ExAC individuals with reportedly pathogenic PRNP variants versus all individuals in ExAC.

Ages were available for 40 of the 52 individuals with reportedly pathogenic PRNP variants. Their age distribution did not differ from that of ExAC individuals overall (p = .69, Wilcoxon rank-sum test; p = .69, Student's t test), and no difference emerged after controlling for cohort (p = .15, linear regression).
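For readers unfamiliar with the Wilcoxon rank-sum test mentioned above, the following is a minimal pure-Python sketch using the normal approximation with a tie correction and no continuity correction. It is not the authors' analysis code, the function name `rank_sum_test` is illustrative, and the ages in the usage lines are made-up placeholder values rather than ExAC data.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses midranks for ties and the normal approximation, so it is only
    a reasonable approximation for moderately large samples.
    Returns (U statistic for x, z score, two-sided p value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool the samples, tagging each value with its group (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        t = j - i                  # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        # All values identical: no evidence of a difference.
        return u, 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, z, p


# Hypothetical ages, for illustration only.
carrier_ages = [52, 61, 47, 73, 58, 66, 49, 70]
all_ages = [55, 63, 44, 71, 59, 68, 50, 62, 57, 48, 65, 53]
u, z, p = rank_sum_test(carrier_ages, all_ages)
print(f"U = {u}, z = {z:.2f}, p = {p:.2f}")
```

A large p value from such a test, as reported in the caption, indicates no detectable shift in age between the two groups. It does not by itself establish that the distributions are identical.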

Figure S1

Figure S2. Sanger sequencing results for individuals with N-terminal truncating variants

Figure S2A. G20Gfs84X reverse (top) and forward (bottom). Primers: 2a-forward: AACTTAGGGTCACATTTGTCCTTGG; 2a-reverse: GGTAACGGTGCATGTTTTCACG; 2b-forward: GTGGTGGCTGGGGTCAAGG; 2b-reverse: TTTCCAGTGCCCATCAGTGC.

Figure S2A

Figure S2B. R37X - DNA from whole blood (top) and fibroblasts (bottom). Primers: PrP2-F: TGGGACTCTGACGTTCTCCT; PrP2-R: GGTGAAGTTCTCCCCCTTGG

Figure S2B

Figure S2C. Q75X. Primers: PRNP_EX2-M13-F [TGTAAAACGACGGCCAGT] CCATTGCTATGCACTCATTCA; PRNP_EX2-M13-R [CAGGAAACAGCTATGACC] CCATGTGCTTCATGTTGGTT

Figure S2C