<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.1d1 20130915//EN" "JATS-archivearticle1.dtd"><article article-type="research-article" dtd-version="1.1d1" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="nlm-ta">elife</journal-id><journal-id journal-id-type="hwp">eLife</journal-id><journal-id journal-id-type="publisher-id">eLife</journal-id><journal-title-group><journal-title>eLife</journal-title></journal-title-group><issn publication-format="electronic">2050-084X</issn><publisher><publisher-name>eLife Sciences Publications, Ltd</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">03300</article-id><article-id pub-id-type="doi">10.7554/eLife.03300</article-id><article-categories><subj-group subj-group-type="display-channel"><subject>Research article</subject></subj-group><subj-group subj-group-type="heading"><subject>Genomics and evolutionary biology</subject></subj-group><subj-group subj-group-type="heading"><subject>Microbiology and infectious disease</subject></subj-group></article-categories><title-group><article-title>The inherent mutational tolerance and antigenic evolvability of influenza hemagglutinin</article-title></title-group><contrib-group><contrib contrib-type="author" id="author-13919"><name><surname>Thyagarajan</surname><given-names>Bargavi</given-names></name><contrib-id contrib-id-type="orcid">http://orcid.org/0000-0003-3871-6410</contrib-id><xref ref-type="aff" rid="aff1"/><xref ref-type="aff" rid="aff2"/><xref ref-type="fn" rid="con1"/><xref ref-type="fn" rid="conf1"/><xref ref-type="other" rid="dataro1"/></contrib><contrib contrib-type="author" corresp="yes" id="author-4031"><name><surname>Bloom</surname><given-names>Jesse D</given-names></name><contrib-id
contrib-id-type="orcid">http://orcid.org/0000-0003-1267-3408</contrib-id><xref ref-type="aff" rid="aff1"/><xref ref-type="aff" rid="aff2"/><xref ref-type="corresp" rid="cor1">*</xref><xref ref-type="other" rid="par-1"/><xref ref-type="other" rid="par-2"/><xref ref-type="fn" rid="con2"/><xref ref-type="fn" rid="conf1"/><xref ref-type="other" rid="dataro1"/></contrib><aff id="aff1"><institution content-type="dept">Division of Basic Sciences</institution>, <institution>Fred Hutchinson Cancer Research Center</institution>, <addr-line><named-content content-type="city">Seattle</named-content></addr-line>, <country>United States</country></aff><aff id="aff2"><institution content-type="dept">Computational Biology Program</institution>, <institution>Fred Hutchinson Cancer Research Center</institution>, <addr-line><named-content content-type="city">Seattle</named-content></addr-line>, <country>United States</country></aff></contrib-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Pascual</surname><given-names>Mercedes</given-names></name><role>Reviewing editor</role><aff><institution>University of Michigan</institution>, <country>United States</country></aff></contrib></contrib-group><author-notes><corresp id="cor1"><label>*</label>For correspondence: <email>jbloom@fhcrc.org</email></corresp></author-notes><pub-date date-type="pub" publication-format="electronic"><day>08</day><month>07</month><year>2014</year></pub-date><pub-date pub-type="collection"><year>2014</year></pub-date><volume>3</volume><elocation-id>e03300</elocation-id><history><date date-type="received"><day>07</day><month>05</month><year>2014</year></date><date date-type="accepted"><day>03</day><month>07</month><year>2014</year></date></history><permissions><copyright-statement>© 2014, Thyagarajan and Bloom</copyright-statement><copyright-year>2014</copyright-year><copyright-holder>Thyagarajan and Bloom</copyright-holder><license 
xlink:href="http://creativecommons.org/licenses/by/4.0/"><license-p>This article is distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>, which permits unrestricted use and redistribution provided that the original author and source are credited.</license-p></license></permissions><self-uri content-type="pdf" xlink:href="elife03300.pdf"/><abstract><object-id pub-id-type="doi">10.7554/eLife.03300.001</object-id><p>Influenza is notable for its evolutionary capacity to escape immunity targeting the viral hemagglutinin. We used deep mutational scanning to examine the extent to which a high inherent mutational tolerance contributes to this antigenic evolvability. We created mutant viruses that incorporate most of the ≈10<sup>4</sup> amino-acid mutations to hemagglutinin from A/WSN/1933 (H1N1) influenza. After passaging these viruses in tissue culture to select for functional variants, we used deep sequencing to quantify mutation frequencies before and after selection. These data enable us to infer the preference for each amino acid at each site in hemagglutinin. These inferences are consistent with existing knowledge about the protein's structure and function, and can be used to create a model that describes hemagglutinin's evolution far better than existing phylogenetic models. 
We show that hemagglutinin has a high inherent tolerance for mutations at antigenic sites, suggesting that this is one factor contributing to influenza's antigenic evolution.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.001">http://dx.doi.org/10.7554/eLife.03300.001</ext-link></p></abstract><abstract abstract-type="executive-summary"><object-id pub-id-type="doi">10.7554/eLife.03300.002</object-id><title>eLife digest</title><p>Influenza is a major threat to human health largely because the flu virus evolves rapidly to escape recognition by the immune system. These ongoing changes also explain why flu vaccines become less effective over time and need to be reformulated every year.</p><p>Hemagglutinin is a protein on the surface of the flu virus that helps the virus bind to and infect host cells. The surface proteins of most viruses are recognized by the immune system, and influenza hemagglutinin is no exception. However, hemagglutinin is unusual in that it evolves exceptionally rapidly to avoid being recognized by the immune system. This raises an important question: what is it about the influenza hemagglutinin protein that allows it to change so readily?</p><p>Thyagarajan and Bloom address this question by making mutant copies of the gene that encodes the hemagglutinin protein. There are over 10,000 ways in which the protein can be mutated, and Thyagarajan and Bloom managed to make the vast majority of the possible changes. The mutated genes were then re-introduced into the virus, and the mutant viruses were allowed to replicate in cells for several generations.</p><p>Thyagarajan and Bloom sequenced the viruses that had replicated—which meant that the mutant copies of the hemagglutinin protein in these viruses still worked—and looked to see where in the protein the changes had occurred. 
Those regions that rarely changed included the part of the protein that binds to host cells, whereas other regions—especially those that are recognized by the immune system—were much more likely to contain mutations. Thyagarajan and Bloom then went on to show that not all influenza proteins share hemagglutinin's capacity to change the regions targeted by the immune system, suggesting that this capacity is possibly a unique feature of this protein.</p><p>Thyagarajan and Bloom also suggest that this capacity to tolerate mutations in parts of proteins that are recognized by the immune system might be important for shaping a virus's ability to evolve to escape this recognition. Future work is now needed to see how tolerant to mutations other viral proteins are, and to reveal which properties of a protein determine its tolerance to mutations.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.002">http://dx.doi.org/10.7554/eLife.03300.002</ext-link></p></abstract><kwd-group kwd-group-type="author-keywords"><title>Author keywords</title><kwd>influenza</kwd><kwd>hemagglutinin</kwd><kwd>phylogenetics</kwd><kwd>evolvability</kwd><kwd>antigenic evolution</kwd><kwd>deep mutational scanning</kwd></kwd-group><kwd-group kwd-group-type="research-organism"><title>Research organism</title><kwd>viruses</kwd></kwd-group><funding-group><award-group id="par-1"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100000002</institution-id><institution>National Institutes of Health</institution></institution-wrap></funding-source><award-id>R01GM102198</award-id><principal-award-recipient><name><surname>Bloom</surname><given-names>Jesse D</given-names></name></principal-award-recipient></award-group><award-group id="par-2"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100005665</institution-id><institution>Kinship
Foundation</institution></institution-wrap></funding-source><award-id>Searle Scholarship</award-id><principal-award-recipient><name><surname>Bloom</surname><given-names>Jesse D</given-names></name></principal-award-recipient></award-group><funding-statement>The funder had no role in study design, data collection and interpretation, or the decision to submit the work for publication.</funding-statement></funding-group><custom-meta-group><custom-meta><meta-name>elife-xml-version</meta-name><meta-value>2</meta-value></custom-meta><custom-meta specific-use="meta-only"><meta-name>Author impact statement</meta-name><meta-value>Deep mutational scanning was used to comprehensively quantify the effects of mutations to influenza hemagglutinin and shows that the virus possesses a high inherent mutational tolerance at key antigenic sites.</meta-value></custom-meta></custom-meta-group></article-meta></front><body><sec id="s1" sec-type="intro"><title>Introduction</title><p>Epidemic influenza poses an annual threat to human health largely because the virus rapidly evolves to escape the immunity elicited by previous infections or vaccinations. The most potent form of anti-influenza immunity is antibodies targeting the virus’s hemagglutinin (HA) protein (<xref ref-type="bibr" rid="bib60">Yewdell et al., 1979</xref>; <xref ref-type="bibr" rid="bib56">Wiley et al., 1981</xref>; <xref ref-type="bibr" rid="bib12">Caton et al., 1982</xref>). The virus evades these antibodies primarily by accumulating amino-acid substitutions in HA's antigenic sites (<xref ref-type="bibr" rid="bib50">Smith et al., 2004</xref>; <xref ref-type="bibr" rid="bib13">Das et al., 2013</xref>; <xref ref-type="bibr" rid="bib31">Koel et al., 2013</xref>; <xref ref-type="bibr" rid="bib4">Bedford et al., 2014</xref>). 
Remarkably, HA undergoes this rapid evolution while retaining the ability to fold to a highly conserved structure that performs two functions essential for viral replication: receptor binding and membrane fusion (<xref ref-type="bibr" rid="bib57">Wiley and Skehel, 1987</xref>; <xref ref-type="bibr" rid="bib48">Russell et al., 2004</xref>). HA is therefore highly ‘antigenically evolvable’ in the sense that it can accommodate rapid antigenic change without compromising its structural and functional properties.</p><p>Two factors that undoubtedly contribute to HA's rapid antigenic evolution are influenza's high mutation rate and the strong selection that immunity exerts on the virus. However, it is unclear whether these factors are sufficient to fully explain HA's antigenic evolution. For instance, while some other error-prone viruses (such as HIV and hepatitis C) also exhibit rapid antigenic evolution of their surface proteins (<xref ref-type="bibr" rid="bib10">Burton et al., 2012</xref>), other viruses with comparable mutation rates (such as measles) show little propensity for antigenic change (<xref ref-type="bibr" rid="bib49">Sheshberadaran et al., 1983</xref>; <xref ref-type="bibr" rid="bib15">Duffy et al., 2008</xref>), despite the fact that evasion of immunity would presumably confer a selective benefit. A variety of explanations ranging in scale from ecological to molecular can be posited to account for these differences in rates of antigenic evolution (<xref ref-type="bibr" rid="bib34">Lipsitch and O’Hagan, 2007</xref>; <xref ref-type="bibr" rid="bib32">Koelle et al., 2006</xref>; <xref ref-type="bibr" rid="bib27">Heaton et al., 2013</xref>). One hypothesis is that HA has a high inherent tolerance for mutations in its antigenic sites, thereby conferring on influenza the evolutionary capacity to escape from anti-HA antibodies with relative ease.</p><p>Testing this hypothesis requires quantifying the inherent mutational tolerance of each site in HA. 
This cannot be done simply by examining variability among naturally occurring viruses, since the evolution of influenza in nature is shaped by a combination of inherent mutational tolerance and external immune selection. For example, the rapid evolution of HA's antigenic sites in nature could reflect the fact that these sites are especially tolerant of mutations, or it could be purely a consequence of strong immune selection. Traditional experimental approaches using site-directed mutagenesis or serial viral passage are also inadequate for quantifying inherent mutational tolerance: although such techniques have been used to determine the effect of specific mutations on HA, they cannot feasibly be applied to all possible individual amino-acid mutations. Recently, <xref ref-type="bibr" rid="bib27">Heaton et al. (2013)</xref> used transposon mutagenesis to show that HA is tolerant to the random insertion of five- to six-amino-acid sequences at several locations in the protein. However, the relevance of this tolerance to insertional mutations is unclear, since HA's actual antigenic evolution involves almost entirely point substitutions, with only a very low rate of insertions and deletions.</p><p>Here we use the new high-throughput experimental technique of deep mutational scanning (<xref ref-type="bibr" rid="bib19">Fowler et al., 2010</xref>; <xref ref-type="bibr" rid="bib2">Araya and Fowler, 2011</xref>) to comprehensively quantify the tolerance of HA to amino-acid mutations. Specifically, we create mutant libraries of the HA gene from the H1N1 strain A/WSN/1933 (WSN) that contain virtually all of the ≈4 × 10<sup>4</sup> possible individual codon mutations, and therefore virtually all of the ≈10<sup>4</sup> possible amino-acid mutations. We use these mutant libraries to generate pools of mutant influenza viruses, which we estimate incorporate at least 85% of the possible HA codon mutations and 97% of the possible amino-acid mutations.
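The ≈4 × 10<sup>4</sup> and ≈10<sup>4</sup> totals above follow from simple counting: each codon can change to any of 63 other codons, and each site can change to any of 19 other amino acids. The minimal Python sketch below makes this arithmetic explicit. The HA length of 565 codons is an assumption used only for illustration and is not stated in this section.

```python
# Count the possible single-codon and single-amino-acid mutations in a gene.
# ASSUMPTION: the gene length used below is illustrative, not taken from the text.

N_CODONS = 64        # codons in the standard genetic code, including stops
N_AMINO_ACIDS = 20   # standard amino acids (stop codons excluded)


def possible_mutations(n_sites: int) -> tuple[int, int]:
    """Return (codon mutations, amino-acid mutations) for a gene of n_sites codons."""
    codon_muts = n_sites * (N_CODONS - 1)      # 63 alternative codons per site
    aa_muts = n_sites * (N_AMINO_ACIDS - 1)    # 19 alternative amino acids per site
    return codon_muts, aa_muts


if __name__ == "__main__":
    hypothetical_length = 565  # assumed HA length in codons, for illustration only
    codons, aas = possible_mutations(hypothetical_length)
    print(f"{codons} codon mutations, {aas} amino-acid mutations")
```

With an assumed length of 565 codons, this gives 35,595 codon mutations and 10,735 amino-acid mutations, consistent with the rounded totals quoted in the text.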
We then passage these viruses to select for functional variants, and use Illumina deep sequencing to determine the frequency of each HA mutation before and after this selection for viral growth. Since these experiments measure the impact of mutations in the absence of immune selection, they enable us to quantify HA's inherent preference for each amino acid at each site in the protein. We show that these quantitative measurements are consistent with existing knowledge about HA structure and function, and can be used to create an evolutionary model that describes HA's natural evolution far better than existing models of sequence evolution. Finally, we use our results to show that HA's antigenic sites are disproportionately tolerant of mutations, suggesting that a high inherent tolerance for mutations at key positions targeted by the immune system is one factor that contributes to influenza's antigenic evolvability.</p></sec><sec id="s2" sec-type="results"><title>Results</title><sec id="s2-1"><title>Strategy for deep mutational scanning of HA</title><p>Our strategy for deep mutational scanning (<xref ref-type="bibr" rid="bib19">Fowler et al., 2010</xref>; <xref ref-type="bibr" rid="bib2">Araya and Fowler, 2011</xref>) of HA is outlined in <xref ref-type="fig" rid="fig1">Figure 1</xref>. The wildtype WSN HA gene was mutagenized to create a diverse library of mutant HA genes. This library of mutant genes was then used to generate a pool of mutant viruses by reverse genetics (<xref ref-type="bibr" rid="bib28">Hoffmann et al., 2000</xref>). The mutant viruses were passaged at a low multiplicity of infection to ensure a linkage between genotype and phenotype. 
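The benefit of passaging at a low multiplicity of infection can be quantified with the standard Poisson model of infection, which assumes that virions are distributed independently among cells. The sketch below computes the fraction of infected cells that receive more than one virion; it is an idealization that ignores, for example, variation in cell susceptibility.

```python
import math


def infection_stats(moi: float) -> dict[str, float]:
    """Poisson model of infection, assuming virions are distributed independently among cells."""
    p_zero = math.exp(-moi)            # fraction of cells receiving no virion
    p_one = moi * math.exp(-moi)       # fraction receiving exactly one virion
    infected = 1.0 - p_zero
    coinfected = 1.0 - p_zero - p_one  # fraction receiving two or more virions
    return {
        "infected": infected,
        "coinfected": coinfected,
        "coinfected_among_infected": coinfected / infected,
    }


if __name__ == "__main__":
    for moi in (0.1, 1.0):
        stats = infection_stats(moi)
        print(moi, {name: round(value, 3) for name, value in stats.items()})
```

At an MOI of 0.1 (the value used for passaging in this study), roughly 5% of infected cells receive more than one virion under this model, compared with roughly 42% at an MOI of 1.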
The frequencies of mutations before and after selection for viral growth were quantified by Illumina deep sequencing of the mutant genes (the <bold>mutDNA</bold> sample in <xref ref-type="fig" rid="fig1">Figure 1</xref>) and the mutant viruses (the <bold>mutvirus</bold> sample in <xref ref-type="fig" rid="fig1">Figure 1</xref>). An identical process was performed in parallel using the unmutated wildtype HA gene to generate unmutated viruses in order to quantify the error rates associated with sequencing, reverse transcription, and virus growth (these are the <bold>DNA</bold> and <bold>virus</bold> samples in <xref ref-type="fig" rid="fig1">Figure 1</xref>). The entire process in <xref ref-type="fig" rid="fig1">Figure 1</xref> was performed in full biological triplicate (the replicates are referred to as #1, #2, and #3). In addition, a repeat of the Illumina sample preparation and deep sequencing was performed for replicate #1 to quantify the technical variation associated with these processes.<fig id="fig1" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.003</object-id><label>Figure 1.</label><caption><title>Schematic of the deep mutational scanning experiment.</title><p>The Illumina deep-sequencing samples are shown in yellow boxes (<bold>DNA</bold>, <bold>mutDNA</bold>, <bold>virus</bold>, <bold>mutvirus</bold>). Experimental steps and associated sources of mutations are shown in blue text, while sources of error during Illumina sample preparation and sequencing are shown in red text. 
This entire process was performed in biological triplicate.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.003">http://dx.doi.org/10.7554/eLife.03300.003</ext-link></p></caption><graphic xlink:href="elife03300f001"/></fig></p></sec><sec id="s2-2"><title>Creation of HA codon-mutant libraries</title><p>The deep mutational scanning strategy in <xref ref-type="fig" rid="fig1">Figure 1</xref> requires creating mutant libraries of the HA gene. We wanted to assess the impact of all possible amino-acid mutations. Most mutagenesis techniques operate at the nucleotide level, and so frequently introduce single-nucleotide codon changes (e.g., GGA → cGA) but only very rarely introduce multi-nucleotide codon changes (e.g., GGA → cat). However, several PCR-based techniques have recently been developed to introduce random codon mutations into full-length genes (<xref ref-type="bibr" rid="bib18">Firnberg and Ostermeier, 2012</xref>; <xref ref-type="bibr" rid="bib9">Bloom, 2014</xref>; <xref ref-type="bibr" rid="bib29">Jain and Varadarajan, 2014</xref>). We used one of these techniques (<xref ref-type="bibr" rid="bib9">Bloom, 2014</xref>) to create three replicate codon-mutant libraries of the WSN HA gene (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1</xref>).</p><p>Sanger sequencing of 34 individual clones indicated that the libraries contained an average of slightly over two codon mutations per gene, with a very low rate of insertions and deletions (less than 0.1 per gene). The number of mutations per clone was distributed around this average in an approximately Poisson fashion (<xref ref-type="fig" rid="fig2">Figure 2</xref>). 
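As a rough illustration of what an approximately Poisson distribution implies, the sketch below computes the expected fraction of clones carrying k codon mutations, λ<sup>k</sup>e<sup>−λ</sup>/k!, using the mean of 2.1 mutations per clone reported in Figure 2A. It assumes an idealized Poisson model; the real library only approximately follows this distribution.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def expected_clone_fractions(mean_muts: float, max_k: int) -> list[float]:
    """Expected fraction of clones carrying 0, 1, ..., max_k codon mutations."""
    return [poisson_pmf(k, mean_muts) for k in range(max_k + 1)]


if __name__ == "__main__":
    # 2.1 is the mean number of codon mutations per clone reported in Figure 2A.
    for k, frac in enumerate(expected_clone_fractions(2.1, 5)):
        print(f"{k} codon mutations: {frac:.3f}")
```

Under this model about 12% of clones (exp(−2.1) ≈ 0.12) would carry no codon mutation, and most mutated clones would carry two or more.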
The mutations consisted of a mix of one-, two-, and three-nucleotide codon changes, and were roughly uniform in their nucleotide composition and location in the gene (<xref ref-type="fig" rid="fig2">Figure 2</xref>).<fig id="fig2" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.004</object-id><label>Figure 2.</label><caption><title>Properties of the HA codon-mutant library as assessed by Sanger sequencing of 34 individual clones drawn roughly evenly from the three experimental replicates.</title><p>(<bold>A</bold>) There is an average of 2.1 codon mutations per clone, with the number per clone following a roughly Poisson distribution. (<bold>B</bold>) The codon mutations involve a mix of one-, two-, and three-nucleotide mutations. (<bold>C</bold>) The nucleotide composition of the mutant codons is roughly uniform. (<bold>D</bold>) The mutations are distributed uniformly along HA's primary sequence. (<bold>E</bold>) There is no tendency for mutations to cluster in primary sequence. Shown is the distribution of observed pairwise distances between mutations in multiply mutated clones vs the expected distribution when the mutations are placed independently in the clones. All plots show results only for substitution mutations; insertion/deletion mutations are not shown. However, only two insertion/deletion mutations (0.06 per clone) were identified.
The data and computer code used to generate this figure are at <ext-link ext-link-type="uri" xlink:href="https://github.com/jbloom/SangerMutantLibraryAnalysis/tree/v0.2">https://github.com/jbloom/SangerMutantLibraryAnalysis/tree/v0.2</ext-link>.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.004">http://dx.doi.org/10.7554/eLife.03300.004</ext-link></p></caption><graphic xlink:href="elife03300f002"/></fig></p><p>The genes in each mutant library were cloned at high efficiency into a bidirectional influenza reverse-genetics plasmid (<xref ref-type="bibr" rid="bib28">Hoffmann et al., 2000</xref>). Each of the library replicates contained at least six million unique clones, a diversity that far exceeds the ≈10<sup>4</sup> unique single amino-acid mutations and the ≈4 × 10<sup>4</sup> unique single codon mutations to the HA gene. The vast majority of possible codon and amino-acid mutations are therefore represented many times in each plasmid mutant library, both individually and in combination with other mutations.</p></sec><sec id="s2-3"><title>Generation of mutant viruses by reverse genetics</title><p>The HA plasmid mutant libraries were used to generate pools of mutant influenza viruses by reverse genetics (<xref ref-type="bibr" rid="bib28">Hoffmann et al., 2000</xref>). Briefly, this process involves transfecting cells with the HA plasmid mutant library along with plasmids encoding the other seven genes from the WSN strain of influenza. Although the cells were transfected with a very large diversity of HA plasmids, we were uncertain what fraction of the genes encoded on these plasmids would actually be productively packaged into a virus. In an attempt to maximize the diversity in the viral pools, the mutant viruses were generated by transfecting several dozen wells of cells.
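The statement above that most codon mutations are represented many times in each plasmid library can be checked with back-of-the-envelope arithmetic. The sketch below assumes, purely for illustration, that mutations fall uniformly across the ≈4 × 10<sup>4</sup> possible codon mutations, and uses the figures of six million clones and about 2.1 mutations per clone; real mutagenesis biases make some mutations rarer than this idealized average.

```python
import math


def expected_copies(n_clones: float, muts_per_clone: float, n_possible: float) -> float:
    """Mean number of clones carrying one particular mutation, assuming uniform sampling."""
    return n_clones * muts_per_clone / n_possible


def prob_absent(mean_copies: float) -> float:
    """Poisson probability that a particular mutation is carried by no clone at all."""
    return math.exp(-mean_copies)


if __name__ == "__main__":
    # Figures from the text: at least 6e6 clones, ~2.1 codon mutations per clone,
    # ~4e4 possible codon mutations. Uniform sampling is an idealizing assumption.
    copies = expected_copies(6e6, 2.1, 4e4)
    print(f"~{copies:.0f} clones per codon mutation; P(absent) = {prob_absent(copies):.1e}")
```

Under these assumptions each codon mutation is expected in roughly 300 clones, so the chance that any particular mutation is entirely missing from a uniformly sampled library is vanishingly small.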
The logic behind this scheme was to maintain substantial diversity even if only a subset of viral mutants stochastically predominated in each individual well of cells. A different replicate virus pool was generated for each of the three HA plasmid mutant libraries.</p><p>The mutant viruses generated for each replicate were passaged at a relatively low multiplicity of infection (MOI) of 0.1 to reduce the probability of co-infection, thereby creating a link between viral genotype and phenotype. This genotype-phenotype link is essential to ensure that the sequenced HA gene matches the protein on the surface of the virus. Our previous work with NP (<xref ref-type="bibr" rid="bib9">Bloom, 2014</xref>) indicates that a single low-MOI passage is sufficient to create a strong genotype-phenotype link, since that work found that the results obtained after one viral passage were extremely similar to those obtained after two viral passages. To maintain a diversity of over two million infectious viral particles, we performed the passaging in a total of 2.4 × 10<sup>7</sup> cells (at an MOI of 0.1, this corresponds to 2.4 × 10<sup>6</sup> infectious particles).</p></sec><sec id="s2-4"><title>Deep sequencing reveals purifying selection against many mutations</title><p>We used Illumina sequencing to quantify the frequencies of mutations before and after selection for viral growth. For each replicate, we sequenced HA from the unmutated plasmid, the plasmid mutant library, virus produced from the unmutated plasmid, and mutant virus produced from the plasmid mutant library. These are the <bold>DNA</bold>, <bold>mutDNA</bold>, <bold>virus</bold>, and <bold>mutvirus</bold> samples in <xref ref-type="fig" rid="fig1">Figure 1</xref>. For the <bold>DNA</bold> and <bold>mutDNA</bold> samples, the HA gene was amplified directly from the plasmids by PCR. For the <bold>virus</bold> and <bold>mutvirus</bold> samples, the HA gene was first reverse-transcribed from viral RNA and was then amplified by PCR.
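The four samples just described can be combined to compare each mutation's frequency before and after selection. The sketch below shows one simple, hypothetical way to do this at a single site: subtract the error frequency in the unmutated control from the corresponding mutant sample, take the ratio of the post-selection (<bold>mutvirus</bold> minus <bold>virus</bold>) to the pre-selection (<bold>mutDNA</bold> minus <bold>DNA</bold>) frequency, and normalize across amino acids. This ratio estimator is an illustrative assumption, not the inference procedure used in this study, and all input frequencies in the example are made up.

```python
def site_preferences(
    wildtype: str,
    f_mutdna: dict[str, float],
    f_dna: dict[str, float],
    f_mutvirus: dict[str, float],
    f_virus: dict[str, float],
    floor: float = 1e-6,
) -> dict[str, float]:
    """Illustrative amino-acid preferences at one site from mutation frequencies.

    Each dict maps a mutant amino acid to its frequency at this site in one sample.
    Enrichment = (mutvirus - virus) / (mutDNA - DNA), with both differences clipped
    at `floor`; the wildtype amino acid is assigned enrichment 1. Enrichments are
    then normalized to sum to 1. This is a hypothetical simplification.
    """
    enrichment = {wildtype: 1.0}
    for aa, freq in f_mutdna.items():
        if aa == wildtype:
            continue
        before = max(freq - f_dna.get(aa, 0.0), floor)
        after = max(f_mutvirus.get(aa, 0.0) - f_virus.get(aa, 0.0), floor)
        enrichment[aa] = after / before
    total = sum(enrichment.values())
    return {aa: value / total for aa, value in enrichment.items()}


if __name__ == "__main__":
    prefs = site_preferences(
        wildtype="G",
        f_mutdna={"A": 1e-3, "W": 1e-3},
        f_dna={"A": 1e-4},
        f_mutvirus={"A": 9e-4, "W": 1e-5},
        f_virus={"A": 1e-4},
    )
    print({aa: round(p, 3) for aa, p in prefs.items()})
```

In this made-up example, alanine survives selection nearly as often as it appears in the library, whereas tryptophan is strongly depleted, so nearly all of the preference at the site falls on the wildtype glycine and alanine.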
In all cases, template quantification was performed prior to PCR to ensure that >10<sup>6</sup> initial HA molecules were used as templates for subsequent amplification.</p><p>To reduce the sequencing error rate, the HA molecules were fragmented into roughly 50-nucleotide fragments using Illumina's transposon-based Nextera kit, and then sequenced with overlapping paired-end reads (<xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1</xref>). We called a codon identity only when both paired reads concurred; this strategy substantially increases sequencing fidelity, since it is rare for the same sequencing error to occur in both reads. For each sample, we obtained in excess of 10<sup>7</sup> overlapping paired-end reads that could be aligned to HA (<xref ref-type="fig" rid="fig3s2">Figure 3—figure supplement 2</xref>). As shown in <xref ref-type="fig" rid="fig3s3">Figure 3—figure supplement 3</xref>, the read depth varied somewhat along the primary sequence, presumably due to known weak biases in the insertion sites for the Nextera transposon (<xref ref-type="bibr" rid="bib1">Adey et al., 2010</xref>). However, these biases were fairly mild, and so we obtained well over 2 × 10<sup>5</sup> unique paired reads for nearly all HA codons.</p><p><xref ref-type="fig" rid="fig3">Figure 3</xref> shows the frequency of mutations in each sample as quantified by deep sequencing. The <bold>DNA</bold> samples derived from the unmutated HA plasmid show a low frequency of apparent mutations that are almost exclusively single-nucleotide codon changes; the frequency of these apparent mutations reflects the rate of errors from PCR amplification and subsequent deep sequencing. The <bold>virus</bold> samples created from the unmutated plasmid show only a slightly higher frequency of mutations, indicating that reverse transcription and viral replication introduce only a small number of additional mutations.
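The read-pairing rule described above (calling a codon only when both overlapping reads agree) can be sketched as follows. The helper names are hypothetical, and the sketch assumes the second read has already been reverse-complemented and aligned in frame with the first; a real pipeline must also handle alignment, quality scores, and partial overlaps.

```python
from typing import Optional


def consensus_codon(codon1: str, codon2: str) -> Optional[str]:
    """Return the codon if both reads report the same unambiguous codon, else None."""
    c1, c2 = codon1.upper(), codon2.upper()
    if len(c1) != 3 or "N" in c1 or c1 != c2:
        return None
    return c1


def call_codons(read1: str, read2: str) -> list[Optional[str]]:
    """Compare two overlapping reads codon by codon.

    Hypothetical helper: assumes read2 has already been reverse-complemented and
    that both reads are aligned and start at the same in-frame position.
    """
    n = min(len(read1), len(read2)) // 3
    return [consensus_codon(read1[3 * i:3 * i + 3], read2[3 * i:3 * i + 3]) for i in range(n)]


if __name__ == "__main__":
    # The reads disagree at the second codon (a likely sequencing error), so it is not called.
    print(call_codons("GGAAAACTG", "GGATAACTG"))
```

The example prints ['GGA', None, 'CTG']: the second codon is discarded because the reads disagree. If each read independently miscalls a codon with probability e, a concordant miscall requires both reads to make the same error, so its rate scales roughly with e<sup>2</sup> rather than e.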
As expected, the <bold>mutDNA</bold> samples derived from the plasmid mutant libraries show a high rate of one-, two-, and three-nucleotide mutations, as all three types of mutations were introduced during the codon mutagenesis. The <bold>mutvirus</bold> samples derived from the mutant virus pools exhibit a mutation rate that is substantially lower than that of the <bold>mutDNA</bold> samples. Most of the reduction in mutation frequency in the <bold>mutvirus</bold> samples is due to decreased frequencies of nonsynonymous and stop-codon mutations; synonymous mutations are only slightly depressed in frequency. As stop-codon and nonsynonymous mutations are much more likely than synonymous mutations to substantially impair viral fitness, these results are consistent with purifying selection purging deleterious mutations during viral growth.<fig-group><fig id="fig3" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.005</object-id><label>Figure 3.</label><caption><title>The per-codon frequencies of mutations in the samples.</title><p>The samples are named as in <xref ref-type="fig" rid="fig1">Figure 1</xref>, with the experimental replicate indicated with the numeric label. The <bold>DNA</bold> samples have a low frequency of mutations, and these mutations are composed almost entirely of single-nucleotide codon changes—these samples quantify the baseline error rate from PCR and deep sequencing. The mutation frequency is only slightly elevated in <bold>virus</bold> samples, indicating that viral replication and reverse transcription introduce only a small number of additional mutations. The <bold>mutDNA</bold> samples have a high frequency of single- and multi-nucleotide codon mutations, as expected from the codon mutagenesis procedure. The <bold>mutvirus</bold> samples have a lower mutation frequency, with most of the reduction due to fewer stop-codon and nonsynonymous mutations—consistent with purifying selection purging deleterious mutations. 
The data and code used to create this plot are available via <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>; this plot is the file <italic>parsesummary_codon_types_and_nmuts.pdf</italic> described therein. The sequencing accuracy was increased by using overlapping paired-end reads as illustrated in <xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1</xref>. The overall number of overlapping paired-end reads for each sample is shown in <xref ref-type="fig" rid="fig3s2">Figure 3—figure supplement 2</xref>. A representative plot of the read depth across the primary sequence is shown in <xref ref-type="fig" rid="fig3s3">Figure 3—figure supplement 3</xref>.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.005">http://dx.doi.org/10.7554/eLife.03300.005</ext-link></p></caption><graphic xlink:href="elife03300f003"/></fig><fig id="fig3s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03300.006</object-id><label>Figure 3—figure supplement 1.</label><caption><title>The overlapping paired-end Illumina sequencing strategy.</title><p>(<bold>A</bold>) Sequencing accuracy was increased by fragmenting the HA gene to pieces roughly 50 nucleotides in length, and then using overlapping paired-end 50-nucleotide Illumina sequencing reads. Codon identities were called only if the reads overlapped and concurred on the codon identity. (<bold>B</bold>) The distribution of actual HA fragment lengths for a representative sample.
The plot in (<bold>B</bold>) is the file <italic>replicate_3/DNA/replicate_3_DNA_insertlengths.pdf</italic> described at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.006">http://dx.doi.org/10.7554/eLife.03300.006</ext-link></p></caption><graphic xlink:href="elife03300fs001"/></fig><fig id="fig3s2" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03300.007</object-id><label>Figure 3—figure supplement 2.</label><caption><title>The total number of reads for each sample.</title><p>For all samples, the majority of reads could be paired and aligned to the HA sequence. However, the exact fraction of reads that could be paired varied somewhat among samples due to variation in the efficiency with which the HA gene was fragmented to the target length of 50 nucleotides. This plot is the file <italic>alignmentsummaryplot.pdf</italic> described at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.007">http://dx.doi.org/10.7554/eLife.03300.007</ext-link></p></caption><graphic xlink:href="elife03300fs002"/></fig><fig id="fig3s3" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03300.008</object-id><label>Figure 3—figure supplement 3.</label><caption><title>The per-codon read depth as a function of primary sequence.</title><p>This plot is typical of the samples. The read depth varied fairly consistently as a function of primary sequence, presumably due to biases in the positions at which the HA gene tended to fragment.
This plot is the file <italic>replicate_3/DNA/replicate_3_DNA_codondepth.pdf</italic> described at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.008">http://dx.doi.org/10.7554/eLife.03300.008</ext-link></p></caption><graphic xlink:href="elife03300fs003"/></fig></fig-group></p><p>Inspection of <xref ref-type="fig" rid="fig3">Figure 3</xref> also demonstrates an important advantage of introducing the mutations at the codon rather than the nucleotide level. While there is a low but non-zero rate of errors (from sequencing, PCR, or reverse transcription) that lead to single-nucleotide codon changes (as judged by the <bold>DNA</bold> and <bold>virus</bold> samples), errors that lead to multi-nucleotide codon changes are negligible because it is extremely rare for a single codon to experience two errors. We similarly expect that any <italic>de novo</italic> mutations or reversions that arise during viral growth should be limited to single-nucleotide changes, given the short duration of viral passage in our experiments. Because our mutant libraries were constructed at the codon rather than the nucleotide level, the vast majority (54 of 63) of possible mutations to each codon involve multiple nucleotide changes, so the sequencing results for these mutations can be analyzed essentially at face value, without concern for confounding errors. 
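The 54/9 split follows from the structure of a codon: it has three positions, each with three alternative nucleotides, so 3 × 3 = 9 of its 63 possible mutations change a single nucleotide and the remaining 54 change two or three. The minimal Python sketch below enumerates this directly. It is illustrative only and not part of the mapmuts pipeline, and the function name classify_mutations is hypothetical.

```python
from itertools import product

# All 64 codons.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def classify_mutations(wildtype):
    """Split the 63 codon mutations from `wildtype` by number of nucleotide changes."""
    single, multi = [], []
    for mutant in CODONS:
        if mutant == wildtype:
            continue
        n_diffs = sum(a != b for a, b in zip(wildtype, mutant))
        (single if n_diffs == 1 else multi).append(mutant)
    return single, multi


single, multi = classify_mutations("ATG")
print(len(single), len(multi))  # prints 9 54
```

The same counts hold for every one of the 64 codons, which is why the split applies uniformly across the gene.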
For the remaining 9 of 63 possible mutations, which involve only a single-nucleotide codon change, we have attempted to correct statistically for the error rates estimated from our controls, as described in the ‘Materials and methods’.</p></sec><sec id="s2-5"><title>Most mutations are sampled by the experiments</title><p>It is important to assess the completeness with which the experiments sampled all possible HA mutations. Several problems could limit mutational sampling: mutations might be absent from the plasmid mutant libraries due to biases in the codon mutagenesis, mutations that are present in the plasmid mutant libraries might fail to be incorporated into viruses due to stochastic bottlenecks during virus generation by reverse genetics, or the sequencing read depth might be inadequate to sample the mutations that are present. The most straightforward way to assess these issues is to quantify the number of times that each possible multi-nucleotide codon mutation is observed in the <bold>mutDNA</bold> and <bold>mutvirus</bold> samples. Restricting the analysis to multi-nucleotide codon mutations avoids the confounding effects of sequencing and reverse-transcription errors, which cause almost exclusively single-nucleotide changes.</p><p><xref ref-type="fig" rid="fig4">Figure 4</xref> shows the number of times that each mutation was observed in the combined sequencing data for the three biological replicates; <xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1</xref> shows the same data for the replicates individually. More than 99.5% of multi-nucleotide codon mutations are observed at least five times in the combined sequencing data from the plasmid mutant libraries (<bold>mutDNA</bold> samples), and ≈ 97.5% of all such mutations are observed at least five times in sequencing of the <bold>mutDNA</bold> for each individual replicate. 
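The coverage statistic used here can be sketched in a few lines of Python. This sketch assumes read counts are stored in a dictionary keyed by (site, mutant codon). That data layout and the function names are hypothetical and are not the mapmuts file format. As in the text, single-nucleotide codon mutations are skipped because sequencing and reverse-transcription errors mostly produce such changes.

```python
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def n_nt_diffs(codon1, codon2):
    """Number of nucleotide positions at which two codons differ."""
    return sum(a != b for a, b in zip(codon1, codon2))


def fraction_observed(wildtype_codons, counts, min_count=5):
    """Fraction of multi-nucleotide codon mutations seen at least min_count times.

    wildtype_codons: wildtype codon at each site, in order (site 1 first).
    counts: dict (site, mutant_codon) -> number of reads carrying that mutation.
    """
    n_total = n_observed = 0
    for site, wildtype in enumerate(wildtype_codons, start=1):
        for mutant in CODONS:
            if n_nt_diffs(wildtype, mutant) >= 2:  # skip single-nt (error-prone) changes
                n_total += 1
                if counts.get((site, mutant), 0) >= min_count:
                    n_observed += 1
    return n_observed / n_total


# Toy example with hypothetical counts at two sites.
wildtype = ["ATG", "GCA"]
counts = {(1, "CCC"): 7, (1, "ATC"): 12, (2, "TTT"): 3}
print(fraction_observed(wildtype, counts))  # 1 of 108 multi-nt mutations qualify
```

To obtain combined-replicate figures, counts from several replicates can be pooled before calling the function, for example by adding collections.Counter objects.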
These results indicate that the vast majority of codon mutations are represented in the plasmid mutant libraries.<fig-group><fig id="fig4" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.009</object-id><label>Figure 4.</label><caption><title>The number of times that each possible multi-nucleotide codon mutation was observed in each sample after combining the data for the three biological replicates.</title><p>Nearly all mutations were observed many times in the <bold>mutDNA</bold> samples, indicating that the codon mutagenesis was comprehensive. Only about half of the mutations were observed at least five times in the <bold>mutvirus</bold> samples, indicating either a bottleneck during virus generation or purifying selection against many of the mutations. If the analysis is restricted to synonymous multi-nucleotide codon mutations, then about 85% of mutations are observed at least five times in the <bold>mutvirus</bold> samples. Since synonymous mutations are less likely to be eliminated by purifying selection, this latter number provides a lower bound on the fraction of codon mutations that were sampled by the mutant viruses. The redundancy of the genetic code means that the fraction of amino-acid mutations sampled is higher. The data and code used to create this figure are available via <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>; this plot is the file <italic>countparsedmuts_multi-nt-codonmutcounts.pdf</italic> described therein. 
Similar plots for the individual replicates are shown in <xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1</xref>.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.009">http://dx.doi.org/10.7554/eLife.03300.009</ext-link></p></caption><graphic xlink:href="elife03300f004"/></fig><fig id="fig4s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03300.010</object-id><label>Figure 4—figure supplement 1.</label><caption><title>Plots like those in <xref ref-type="fig" rid="fig4">Figure 4</xref> for the individual biological replicates.</title><p>(<bold>A</bold>) replicate 1, (<bold>B</bold>) replicate 2, and (<bold>C</bold>) replicate 3. These plots are the files <italic>replicate_1/countparsedmuts_multi-nt-codonmutcounts.pdf</italic>, <italic>replicate_2/countparsedmuts_multi-nt-codonmutcounts.pdf</italic>, and <italic>replicate_3/countparsedmuts_multi-nt-codonmutcounts.pdf</italic> described at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.010">http://dx.doi.org/10.7554/eLife.03300.010</ext-link></p></caption><graphic xlink:href="elife03300fs004"/></fig></fig-group></p><p>In contrast, only 53% of multi-nucleotide codon mutations are observed at least five times in the combined sequencing data for the mutant viruses (<bold>mutvirus</bold> samples), and only ≈ 26% of such mutations are observed at least five times in sequencing of the <bold>mutvirus</bold> for each individual replicate (<xref ref-type="fig" rid="fig4">Figure 4</xref>, <xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1</xref>). 
However, these numbers are confounded by the fact that many mutations are deleterious, and so may be absent because purifying selection has purged them from the mutant viruses. A less confounded measure is the frequency of <italic>synonymous</italic> multi-nucleotide mutations, since synonymous mutations are less likely to be strongly deleterious. About 85% of such mutations are observed at least five times in the combined <bold>mutvirus</bold> samples, and ≈ 51% of such mutations are observed at least five times in the <bold>mutvirus</bold> samples for the individual replicates (<xref ref-type="fig" rid="fig4">Figure 4</xref>, <xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1</xref>). Note that these numbers are only a lower bound on the fraction of codon mutations sampled by the mutant viruses—even synonymous mutations to influenza are sometimes strongly deleterious (<xref ref-type="bibr" rid="bib37">Marsh et al., 2008</xref>), and so some of the missing synonymous codon mutations may have been introduced into mutant viruses but then purged by purifying selection. Furthermore, the redundancy of the genetic code means that the fraction of possible amino-acid mutations sampled is substantially higher than the fraction of codon mutations sampled. Specifically, if 85% of possible codon mutations are sampled at least five times in the combined libraries (as <xref ref-type="fig" rid="fig4">Figure 4</xref> indicates), then our simulations suggest that ≈ 97% of possible amino-acid mutations will have also been sampled at least five times (‘Materials and methods’).</p><p>Overall, these results indicate that nearly all mutations are represented in the plasmid mutant libraries. Virus generation by reverse genetics does introduce a bottleneck—but fortunately, this bottleneck is sufficiently mild that at least half of all possible codon mutations are still sampled at least five times by the mutant viruses in each individual replicate. 
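The simulations themselves are described in ‘Materials and methods’. As a rough analytical check, the Python sketch below makes three hypothetical assumptions. First, every sense codon is equally likely to be the wildtype codon. Second, each codon mutation is sampled independently with probability 0.85. Third, an amino-acid mutation counts as sampled if at least one of its codons is sampled, which ignores the pooling of counts across synonymous codons toward the five-count threshold. Under these assumptions, an amino acid encoded by n codons is missed with probability 0.15^n, so single-codon targets such as Met and Trp account for most of the shortfall. This is not the authors' simulation code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA_STRING)}


def expected_aa_coverage(p_codon):
    """Expected fraction of amino-acid mutations with at least one sampled codon.

    Hypothetical model: each codon mutation is sampled independently with
    probability p_codon, and all 61 sense codons are equally likely as wildtype.
    """
    n_codons = {}
    for aa in CODE.values():
        if aa != "*":
            n_codons[aa] = n_codons.get(aa, 0) + 1
    fractions = []
    for wt_aa in CODE.values():
        if wt_aa == "*":
            continue  # stop codons never occur as the wildtype codon
        for aa, n in n_codons.items():
            if aa != wt_aa:
                # The amino-acid mutation is missed only if all n of its codons are.
                fractions.append(1 - (1 - p_codon) ** n)
    return sum(fractions) / len(fractions)


print(round(expected_aa_coverage(0.85), 3))  # prints 0.974
```

Under these simplifying assumptions, the result is close to the ≈ 97% quoted above.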
Combining the data for the three replicates brings the coverage of possible codon mutations to around 85%, and the coverage of possible amino-acid mutations to 97%. Therefore, the sampling of mutations is sufficiently complete to provide information on the effects of most amino-acid mutations when the data from the three experimental replicates are combined.</p></sec><sec id="s2-6"><title>Estimation of the effects of each amino-acid mutation to HA</title><p>We quantified the effects of mutations in terms of site-specific amino-acid ‘preferences’. These preferences are the expected frequency of each amino acid at each site in the mutant viruses in a hypothetical situation in which all amino acids are introduced at that site at equal frequency in the initial plasmid mutant library (<xref ref-type="bibr" rid="bib9">Bloom, 2014</xref>). Because many of the HAs in our libraries contain several mutations, these preferences do not simply correspond to the fitness effect of each individual mutation to the WSN HA—rather, they represent the average effect of each mutation in a collection of closely related HA mutants. Mutations to amino acids with high preferences are favored by selection, while mutations to amino acids with low preferences are disfavored. The amino-acid preferences are inferred from the deep sequencing data using a Bayesian statistical framework in which the observed counts are treated as draws from multinomial distributions with unknown parameters representing the initial mutagenesis rate, the various error rates, and selection as represented by the preferences (see ‘Materials and methods’ for details).</p><p><xref ref-type="fig" rid="fig5">Figure 5</xref> shows the amino-acid preferences for the entire HA gene inferred from the combined data from all three biological replicates. As can be seen from this figure, some sites have strong preferences for one specific amino acid, while other sites are tolerant of a variety of different amino acids. 
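The preferences described above are inferred with a Bayesian model that also accounts for mutagenesis and error rates (‘Materials and methods’). The simpler point estimate below is an illustrative substitute, not the authors' method, but it captures the definition. If all amino acids entered the mutant library at equal frequency, each amino acid's frequency after selection would be proportional to its enrichment from <bold>mutDNA</bold> to <bold>mutvirus</bold>. A pseudocount, which is a hypothetical parameter, pulls sparsely observed amino acids toward the uniform value of 1/20.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def simple_preferences(pre_counts, post_counts, pseudocount=1.0):
    """Enrichment-ratio estimate of the amino-acid preferences at one site.

    pre_counts, post_counts: dict amino acid -> count in mutDNA and mutvirus.
    The pseudocount is a hypothetical prior; with no data every preference is 1/20.
    """
    n_aa = len(AMINO_ACIDS)
    pre_total = sum(pre_counts.values()) + pseudocount * n_aa
    post_total = sum(post_counts.values()) + pseudocount * n_aa
    enrichment = {}
    for aa in AMINO_ACIDS:
        f_pre = (pre_counts.get(aa, 0) + pseudocount) / pre_total
        f_post = (post_counts.get(aa, 0) + pseudocount) / post_total
        enrichment[aa] = f_post / f_pre  # selection relative to starting frequency
    norm = sum(enrichment.values())
    return {aa: e / norm for aa, e in enrichment.items()}


# Hypothetical site: uniform in mutDNA, strongly enriched for W in mutvirus.
pre = {aa: 100 for aa in AMINO_ACIDS}
post = {aa: 10 for aa in AMINO_ACIDS}
post["W"] = 900
prefs = simple_preferences(pre, post)
print(max(prefs, key=prefs.get))  # prints W
```

Unlike the Bayesian approach, this estimate does not subtract the error rates measured in the <bold>DNA</bold> and <bold>virus</bold> controls, and it provides no measure of uncertainty.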
As described in <xref ref-type="table" rid="tbl1">Table 1</xref>, the inferred amino-acid preferences are consistent with existing knowledge about mutations and residues affecting HA stability, membrane fusion, proteolytic activation, and receptor binding (<xref ref-type="bibr" rid="bib39">Nakajima et al., 1986</xref>; <xref ref-type="bibr" rid="bib38">Martin et al., 1998</xref>; <xref ref-type="bibr" rid="bib44">Qiao et al., 1999</xref>; <xref ref-type="bibr" rid="bib51">Stech et al., 2005</xref>). This concordance suggests that the deep mutational scanning effectively captures many of the structural and functional constraints on HA.<fig-group><fig id="fig5" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.011</object-id><label>Figure 5.</label><caption><title>The amino-acid preferences inferred using the combined data from the three biological replicates.</title><p>The letters have heights proportional to the preference for that amino acid, and are colored by hydrophobicity. The first overlay bar shows the relative solvent accessibility (RSA) for residues in the HA crystal structure. The second overlay bar indicates Caton et al. antigenic sites or conserved receptor-binding residues. The sequence is numbered sequentially beginning with 1 at the N-terminal methionine—however, this first methionine is not shown as it was not mutagenized. <xref ref-type="fig" rid="fig5s1">Figure 5—figure supplement 1</xref> shows the same data with H3 numbering of the sequence. 
The data and code used to create this figure are available via <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>; this plot is the file <italic>sequentialnumbering_site_preferences_logoplot.pdf</italic> described therein.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.011">http://dx.doi.org/10.7554/eLife.03300.011</ext-link></p></caption><graphic xlink:href="elife03300f005"/></fig><fig id="fig5s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03300.012</object-id><label>Figure 5—figure supplement 1.</label><caption><title>A plot matching that shown in <xref ref-type="fig" rid="fig5">Figure 5</xref> except that the HA sequence is numbered using the H3 numbering scheme.</title><p>This plot is the file <italic>H3numbering_site_preferences_logoplot.pdf</italic> described at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.012">http://dx.doi.org/10.7554/eLife.03300.012</ext-link></p></caption><graphic xlink:href="elife03300fs005"/></fig></fig-group><table-wrap id="tbl1" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.013</object-id><label>Table 1.</label><caption><p>The amino-acid preferences inferred from the combined experimental replicates are consistent with existing knowledge about HA structure and function</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.013">http://dx.doi.org/10.7554/eLife.03300.013</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th>Site in sequential numbering</th><th>Site in H3 numbering</th><th>Existing knowledge</th><th>Inferred 
amino-acid preferences</th></tr></thead><tbody><tr><td>127</td><td>117 (HA1)</td><td>Mutation from S to P creates a temperature-sensitive defect (<xref ref-type="bibr" rid="bib39">Nakajima et al., 1986</xref>)</td><td>The preference for S is 30 times higher than the preference for P</td></tr><tr><td>174</td><td>161 (HA1)</td><td>Mutation from Y to H creates a temperature-sensitive defect (<xref ref-type="bibr" rid="bib39">Nakajima et al., 1986</xref>)</td><td>The preference for Y is 25 times higher than the preference for H</td></tr><tr><td>344</td><td>1 (HA2)</td><td>Mutation from G to E abolishes HA fusion activity (<xref ref-type="bibr" rid="bib44">Qiao et al., 1999</xref>)</td><td>The preference for G is 11 times higher than for E</td></tr><tr><td>343</td><td>327 (HA1)</td><td>A basic residue (R or K) is required for HA proteolytic activation (<xref ref-type="bibr" rid="bib51">Stech et al., 2005</xref>)</td><td>The combined preferences for R and K (0.87) far exceed those of all other amino acids combined</td></tr><tr><td>108</td><td>98 (HA1)</td><td>Receptor-binding residue, is Y in >99% of natural H1 HAs</td><td>The preference for Y (0.61) exceeds those of all other amino acids combined</td></tr><tr><td>166</td><td>153 (HA1)</td><td>Receptor-binding residue, is W in >99% of natural H1 HAs</td><td>The preference for W (0.65) exceeds those of all other amino acids combined</td></tr><tr><td>196</td><td>183 (HA1)</td><td>Receptor-binding residue, is H in >99% of natural H1 HAs</td><td>The preference for H (0.69) exceeds those of all other amino acids combined</td></tr><tr><td>203</td><td>190 (HA1)</td><td>Receptor-binding residue, is D in 90% of natural H1 HAs</td><td>The highest preference is for the chemically similar E</td></tr><tr><td>207</td><td>194 (HA1)</td><td>Receptor-binding residue, is L in 97% of natural H1 HAs</td><td>The preference for L (0.55) exceeds those of all other amino acids combined</td></tr><tr><td>208</td><td>195 
(HA1)</td><td>Receptor-binding residue, is Y in >99% of natural H1 HAs</td><td>The preference for Y (0.72) exceeds those of all other amino acids combined</td></tr><tr><td>239</td><td>226 (HA1)</td><td>Receptor-binding residue, is Q in ≈99% of natural H1 HAs</td><td>Q is one of three amino acids with a high preference</td></tr><tr><td>241</td><td>228 (HA1)</td><td>Receptor-binding residue, is G in >99% of natural H1 HAs</td><td>The preference for G (0.57) exceeds those of all other amino acids combined</td></tr></tbody></table><table-wrap-foot><fn><p>The conserved receptor-binding residues listed in this table are those delineated in the first table of <xref ref-type="bibr" rid="bib38">Martin et al. (1998)</xref> that also have at least 90% conservation among all naturally occurring H1 HAs in the Influenza Virus Resource (<xref ref-type="bibr" rid="bib3">Bao et al., 2008</xref>).</p></fn></table-wrap-foot></table-wrap></p><p>Despite the general concordance between the inferred amino-acid preferences and existing knowledge, it is important to quantify the experimental error associated with the deep mutational scanning. We sought to quantify two factors: <italic>technical</italic> variation due to inaccuracies and statistical limitations during Illumina sample preparation and deep sequencing, and <italic>biological</italic> variation due to stochasticity in the viral mutants that were generated and enriched during each replicate of the experiment. <xref ref-type="fig" rid="fig6">Figure 6A</xref> shows the correlation between biological replicate #1 and a technical repeat of the Illumina sample preparation and deep sequencing for this biological replicate. There is a very high correlation between the preferences inferred from these two repeats, indicating that technical variation has only a very minor influence on the final inferred amino-acid preferences. 
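The replicate comparisons in Figure 6 are Pearson correlations computed over all site and amino-acid preference values. A dependency-free Python sketch follows. The nested-dictionary layout of preferences is an assumption made for illustration. In practice, scipy.stats.pearsonr returns both R and the associated p-value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def flatten(prefs, sites, amino_acids="ACDEFGHIKLMNPQRSTVWY"):
    """Flatten {site: {aa: preference}} into one list in a fixed site/aa order."""
    return [prefs[site][aa] for site in sites for aa in amino_acids]


# Hypothetical preferences for two sites (three amino acids) in two replicates.
rep1 = {1: {"A": 0.7, "C": 0.2, "D": 0.1}, 2: {"A": 0.1, "C": 0.1, "D": 0.8}}
rep2 = {1: {"A": 0.6, "C": 0.3, "D": 0.1}, 2: {"A": 0.2, "C": 0.1, "D": 0.7}}
r = pearson_r(flatten(rep1, [1, 2], "ACD"), flatten(rep2, [1, 2], "ACD"))
print(round(r, 3))
```

Flattening both replicates in the same fixed order ensures that each pair of values refers to the same site and amino acid.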
<xref ref-type="fig" rid="fig6">Figure 6B–D</xref> show the correlation among the three different biological replicates. Although the biological replicates are substantially correlated, there is also clear variation. Most of this variation is attributable to amino acids that in one replicate are inferred to have preferences near the a priori expectation of 0.05 (there are 20 amino acids, which in the absence of data are all initially assumed to have an equal preference of <inline-formula><mml:math id="inf1"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mn>20</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:math></inline-formula>), but in another replicate are inferred to have a much higher or lower preference. Such variation arises because the mutant viruses for each biological replicate sample only about 50% of the possible codon mutations (see previous section), meaning that there are few data for some mutations in any given replicate. Fortunately, combining the three biological replicates greatly increases the coverage of possible mutations (see previous section). Therefore, inferences made from the combined data (as in <xref ref-type="fig" rid="fig5">Figure 5</xref>) should be substantially more accurate than inferences from any of the individual replicates. This idea is supported by the results below, which quantify the extent to which the inferred preferences accurately describe natural HA evolution.<fig id="fig6" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.014</object-id><label>Figure 6.</label><caption><title>Correlations among the amino-acid preferences inferred using data from the individual biological replicates.</title><p>(<bold>A</bold>) The preferences from two technical repeats of the sample preparation and deep sequencing of biological replicate #1 are highly correlated. (<bold>B</bold>)<bold>–</bold>(<bold>D</bold>) The preferences from the three biological replicates are substantially but imperfectly correlated. 
Overall, these results indicate that technical variation in sample preparation and sequencing is minimal, but that there is substantial variation between biological replicates due to stochastic differences in which mutant viruses predominate during the initial reverse-genetics step. The Pearson correlation coefficient (<italic>R</italic>) and associated p-value are shown in the upper-left corner of each plot. The data and code used to create this figure are available via <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>; these plots are the files <italic>correlations/replicate_1_vs_replicate_1_repeat.pdf</italic>, <italic>correlations/replicate_1_vs_replicate_2.pdf</italic>, <italic>correlations/replicate_1_vs_replicate_3.pdf</italic>, and <italic>correlations/replicate_2_vs_replicate_3.pdf</italic> described therein.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.014">http://dx.doi.org/10.7554/eLife.03300.014</ext-link></p></caption><graphic xlink:href="elife03300f006"/></fig></p></sec><sec id="s2-7"><title>Comparison to another high-throughput study of mutations to HA</title><p>While our paper was under review, <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref> published the results of using a similar strategy to examine the effects of mutations to the WSN HA. In their study, the HA gene was mutated at the nucleotide level, so their experiments surveyed only amino-acid mutations accessible by single-nucleotide codon changes. As a result, they provide data on the effects of only about 20% of the 19 × 564 = 10716 amino-acid mutations examined in our study. 
Despite this limitation, their study provides a large dataset of mutational effects to which we can compare our results.</p><p><xref ref-type="fig" rid="fig7">Figure 7</xref> compares the mutational effects determined in our study to those from <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref>. There is a highly significant correlation between the results of the two studies, but the inferred mutational effects are not identical. Because <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref> do not provide the data for replicates of their experiment, we are unable to assess whether the variability between the two studies exceeds the variability between experimental replicates within each study. Thus, one can imagine both biologically interesting and uninteresting explanations for the imperfect correlation between the results of the two studies. The interesting explanation is that differences in experimental methodology could lead to different selection pressures on specific mutations: for instance, <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref> use A549 cells while we use MDCK-SIAT1 cells, and perhaps the impact of certain mutations depends on the cell line. The uninteresting explanation is that the imperfect correlation is simply due to noise in the experimental measurements. Unfortunately, it is not straightforward to distinguish between these two explanations. This difficulty in pinpointing reasons for inter-study variation highlights a limitation of the high-throughput experimental methodology employed by us and by <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref>: while such experiments provide a wealth of data, numerous factors can create noise in these data (sequencing errors, population bottlenecks, epistasis among mutations, <italic>etc</italic>). 
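The normalization behind the Figure 7 comparison can be sketched as follows. The data layout and function name are hypothetical. The sketch reads ‘the smallest value for their entire data set’ as the smallest nonzero normalized RF, because zero cannot be placed on a logarithmic scale, and this reading is an assumption. Mutations at sites without a usable wildtype RF are dropped, which corresponds to the restriction to 2264 of the 2350 mutations.

```python
import math


def rf_vs_preference(rf, rf_wildtype, prefs, wildtype_aa):
    """Return paired log values for a Figure 7-style comparison.

    rf: dict (site, mutant_aa) -> reported RF for that mutation (hypothetical layout).
    rf_wildtype: dict site -> RF of the wildtype amino acid at that site.
    prefs: dict site -> {aa: preference}; wildtype_aa: dict site -> wildtype aa.
    """
    # Normalize by the wildtype RF; drop sites without a usable (nonzero) wildtype RF.
    normalized = {
        (site, aa): value / rf_wildtype[site]
        for (site, aa), value in rf.items()
        if rf_wildtype.get(site)
    }
    # Assumption: zero RFs are replaced by the smallest nonzero normalized RF.
    floor = min(v for v in normalized.values() if v > 0)
    log_rf, log_pref_ratio = [], []
    for (site, aa), value in normalized.items():
        log_rf.append(math.log(max(value, floor)))
        log_pref_ratio.append(math.log(prefs[site][aa] / prefs[site][wildtype_aa[site]]))
    return log_rf, log_pref_ratio


# Hypothetical toy data: three mutations at two sites, one with an RF of zero.
rf = {(1, "A"): 0.5, (1, "C"): 0.0, (2, "D"): 2.0}
rf_wt = {1: 1.0, 2: 1.0}
prefs = {1: {"M": 0.5, "A": 0.25, "C": 0.25}, 2: {"K": 0.4, "D": 0.2}}
x, y = rf_vs_preference(rf, rf_wt, prefs, {1: "M", 2: "K"})
```

The two returned lists can then be passed to any Pearson correlation routine.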
Realizing the full potential of such studies will therefore require extensive experimental controls and biological replicates to quantify errors and noise and thereby enable comparisons across data sets.<fig id="fig7" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.015</object-id><label>Figure 7.</label><caption><title>Correlation of the site-specific amino-acid preferences determined in our study with the “relative fitness” (RF) values reported by <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref>. <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref> report RF values for 2350 of the 19 × 564 = 10716 possible amino-acid mutations to the WSN HA examined in our study (they examine only single-nucleotide changes and disregard certain types of mutations due to oxidative damage of their DNA).</title><p>To compare across the data sets, we normalized their RF values by the RF value for the wildtype amino acid (which they provide for only 2264 of the 2350 mutations). We then correlate, on a logarithmic scale, these normalized RF values with the ratio of our inferred preference for the mutant amino acid to our inferred preference for the wildtype amino acid, using the preferences from our combined replicates. For mutations for which <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref> report an RF of zero, we assign a normalized RF equal to the smallest value in their entire data set. There is a significant Pearson correlation of 0.48 between the data sets, indicating that both our experiments and those of <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref> are capturing many of the same constraints on HA. 
The data and code used to create this figure are available via <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>; this plot is the file <italic>correlation_with_Wu_et_al.pdf</italic> described therein.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.015">http://dx.doi.org/10.7554/eLife.03300.015</ext-link></p></caption><graphic xlink:href="elife03300f007"/></fig></p><p>Nonetheless, <xref ref-type="fig" rid="fig7">Figure 7</xref> shows that there is a highly significant correlation between the results of these two high-throughput studies, despite differences in experimental methodology and unquantified sources of experimental noise. This fact suggests that both studies capture fundamental constraints on HA’s mutational tolerance. In the remaining sections, we apply the more comprehensive data generated by our study to address questions about HA’s natural evolution and antigenic evolvability.</p></sec><sec id="s2-8"><title>Experimental inferences are consistent with HA’s natural evolution</title><p>Do the results of our deep mutational scanning experiment accurately reflect the real constraints on HA? <xref ref-type="table" rid="tbl1">Table 1</xref> uses an anecdotal comparison to a small number of existing experimental studies to suggest that they do. However, a more systematic way to address this question is to compare the inferred amino-acid preferences to the actual patterns of HA evolution in nature.</p><p>To make such a comparison, we created an alignment of HA sequences from human and swine influenza viruses descended from a common ancestor closely related to the virus that caused the 1918 influenza pandemic. <xref ref-type="fig" rid="fig8">Figure 8</xref> shows a phylogenetic tree of these sequences. 
The WSN HA used in our deep mutational scanning falls relatively close to the root of this tree.<fig-group><fig id="fig8" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.016</object-id><label>Figure 8.</label><caption><title>A phylogenetic tree of human and swine H1 HA sequences descended from a common ancestor closely related to the 1918 virus.</title><p>The WSN virus used in the experiments here is a lab-adapted version of the <italic>A/Wilson Smith/1933</italic> strain. Human H1N1 that circulated from 1918 until 1957 is shown in blue. Human seasonal H1N1 that reappeared in 1977 is shown in purple. Swine H1N1 is shown in red. The 2009 pandemic H1N1 is shown in green. This tree was constructed using <italic>codonPhyML</italic> (<xref ref-type="bibr" rid="bib21">Gil et al., 2013</xref>) with the substitution model of <xref ref-type="bibr" rid="bib22">Goldman and Yang (1994)</xref>. This plot is the file <italic>CodonPhyML_Tree_H1_HumanSwine_GY94/annotated_tree.pdf</italic> described at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html">http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html</ext-link>. <xref ref-type="fig" rid="fig8s1">Figure 8—figure supplement 1</xref> shows a tree estimated for the same sequences using the substitution model of <xref ref-type="bibr" rid="bib33">Kosiol et al. 
(2007)</xref>.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.016">http://dx.doi.org/10.7554/eLife.03300.016</ext-link></p></caption><graphic xlink:href="elife03300f008"/></fig><fig id="fig8s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03300.017</object-id><label>Figure 8—figure supplement 1.</label><caption><title>A phylogenetic tree of the same sequences shown in <xref ref-type="fig" rid="fig8">Figure 8</xref>, this time inferred using the substitution model of <xref ref-type="bibr" rid="bib33">Kosiol et al. (2007)</xref>.</title><p>This tree is extremely similar to that in <xref ref-type="fig" rid="fig8">Figure 8</xref>, indicating the inferred topology is robust to the exact choice of codon-substitution model. This plot is the file <italic>CodonPhyML_Tree_H1_HumanSwine_KOSI07/annotated_tree.pdf</italic> described at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html">http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html</ext-link>.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.017">http://dx.doi.org/10.7554/eLife.03300.017</ext-link></p></caption><graphic xlink:href="elife03300fs006"/></fig></fig-group></p><p>The crudest comparison is simply to correlate amino-acid frequencies in the natural sequences to the experimentally inferred amino-acid preferences. <xref ref-type="fig" rid="fig9">Figure 9</xref> shows that the inferred preferences are substantially although imperfectly correlated with the natural amino-acid frequencies. However, this comparison is problematic because it fails to account for the contingent and limited sampling of mutations by natural evolution. 
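The crude comparison in Figure 9 requires only per-site amino-acid frequencies from the alignment. A minimal Python sketch follows. It assumes equal-length aligned protein sequences with '-' as the gap character, and the function name is hypothetical.

```python
from collections import Counter


def site_frequencies(aligned_seqs, site):
    """Amino-acid frequencies at a 1-based alignment column, ignoring gaps ('-')."""
    column = [seq[site - 1] for seq in aligned_seqs if seq[site - 1] != "-"]
    counts = Counter(column)
    return {aa: n / len(column) for aa, n in counts.items()}


# Hypothetical three-sequence alignment.
alignment = ["MKA", "MRA", "MK-"]
print(site_frequencies(alignment, 1))  # prints {'M': 1.0}
```

This sketch weights every sequence equally and ignores the phylogenetic relationships among them. A frequency of 1.0 therefore says only that every sampled sequence carries that amino acid, which is exactly the ambiguity discussed next.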
While the deep mutational scanning is designed to sample all possible mutations, only a fraction of theoretically tolerable mutations have fixed in natural H1 HAs due to the finite timespan during which evolution has been exploring possible sequences (in other words, evolution is not at equilibrium; see <xref ref-type="bibr" rid="bib43">Povolotskaya and Kondrashov, 2010</xref>). Therefore, an amino-acid frequency close to one among the natural HA sequences in <xref ref-type="fig" rid="fig8">Figure 8</xref> might imply an absolute functional requirement for that amino acid, or it might simply mean that, by chance, natural evolution has not yet fixed a mutation to another tolerable amino acid at that site.<fig id="fig9" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.018</object-id><label>Figure 9.</label><caption><title>The frequencies of amino acids among the naturally occurring HA sequences in <xref ref-type="fig" rid="fig8">Figure 8</xref> vs the amino-acid preferences inferred from the combined replicates (<xref ref-type="fig" rid="fig5">Figure 5</xref>).</title><p>Note that a natural frequency close to one or zero could indicate absolute selection for or against a specific amino acid, but could also simply result from the fact that natural evolution has not completely sampled all possible mutations compatible with HA structure and function. The Pearson correlation coefficient (<italic>R</italic>) and associated p-value are shown on the plot. 
This plot is the file <italic>natural_frequency_vs_preference.pdf</italic> described at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html">http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html</ext-link>.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.018">http://dx.doi.org/10.7554/eLife.03300.018</ext-link></p></caption><graphic xlink:href="elife03300f009"/></fig></p><p>A better approach is therefore to treat natural evolution as a non-equilibrium dynamic process, and ask whether the inferred amino-acid preferences accurately describe this process. This type of analysis can be done using the likelihood-based statistical framework for phylogenetics developed by Felsenstein (<xref ref-type="bibr" rid="bib16">1973</xref>, <xref ref-type="bibr" rid="bib17">1981</xref>). Specifically, we fix the phylogenetic tree topology to that shown in <xref ref-type="fig" rid="fig8">Figure 8</xref> and then assess the likelihood of the natural sequences given a specific evolutionary model after optimizing the branch lengths of the tree. Evolutionary models that more accurately describe HA sequence evolution will have higher likelihoods, and the relative accuracy of models can be quantified by comparing their likelihoods after correcting for the number of free parameters using AIC (<xref ref-type="bibr" rid="bib42">Posada and Buckley, 2004</xref>). 
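The AIC comparison can be reproduced from the log likelihoods and parameter counts reported in Table 2, using AIC = 2k − 2 ln L, where k is the number of free model parameters. The minimal Python sketch below does this for three rows of Table 2. It assumes, as the parameter column of Table 2 suggests, that branch lengths (optimized for every model on the same fixed topology) are not counted in k.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(models: dict) -> dict:
    """AIC of each model minus the best (lowest) AIC, rounded to 0.1."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
    best = min(scores.values())
    return {name: round(s - best, 1) for name, s in scores.items()}


# (log likelihood, free parameters) copied from three rows of Table 2.
models = {
    "Combined": (-24088.7, 0),
    "replicate 3": (-24240.3, 0),
    "GY94, gamma omega, gamma rates": (-24517.0, 13),
}
delta = delta_aic(models)
for name, d in sorted(delta.items(), key=lambda item: item[1]):
    print(f"{name}: {d}")
```

This prints 0.0, 303.2, and 882.6, matching Table 2. Because the tabulated log likelihoods are rounded, recomputing ΔAIC for some other rows can differ from the table by about 0.1.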
Previous work has described how experimental measurements of amino-acid preferences can be combined with known mutation rates to create a parameter-free phylogenetic evolutionary model from deep mutational scanning data (<xref ref-type="bibr" rid="bib9">Bloom, 2014</xref>).</p><p><xref ref-type="table" rid="tbl2">Table 2</xref> and <xref ref-type="table" rid="tbl3">Table 3</xref> compare the fit of evolutionary models based on the experimentally inferred amino-acid preferences with several existing state-of-the-art models that do not utilize this experimental information (<xref ref-type="bibr" rid="bib22">Goldman and Yang, 1994</xref>; <xref ref-type="bibr" rid="bib33">Kosiol et al., 2007</xref>). The model based on amino-acid preferences inferred from the combined experimental data from the three replicates describes the evolution of the naturally occurring HA sequences far better than the alternative models, despite the fact that the latter have a variety of free parameters that are optimized to improve the fit. Models based on amino-acid preferences inferred from the individual experimental replicates also fit the data better than existing models—however, the fit is poorer than for the model that utilizes the data from all three replicates. This result is consistent with the fact that the individual replicates are incomplete in their sampling of the mutational effects, meaning that aggregating the data from several replicates improves the accuracy of inferred preferences. 
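In outline, the parameter-free model of Bloom (2014) scales mutation rates by a fixation probability that depends on the preferences for the original and mutant amino acids at each site, and the footnote to Table 2 compares two forms of that fixation probability. The toy Python sketch below builds a single-site substitution matrix under both forms. It rests on stated assumptions: it works at the amino-acid level with one uniform mutation rate (the published model is codon-based); it assumes the first equation of Bloom (2014) takes the Metropolis-like form min(1, π_y/π_x); and it uses the Halpern and Bruno (1998) form ln(π_y/π_x)/(1 − π_x/π_y). Preferences must be strictly positive for the logarithm to be defined.

```python
import math


def fixation_bloom2014(pi_x, pi_y):
    """Assumed Metropolis-like form: 1 if pi_y >= pi_x, else pi_y / pi_x."""
    return min(1.0, pi_y / pi_x)


def fixation_halpern_bruno(pi_x, pi_y):
    """Halpern and Bruno (1998) form; its limit as pi_y -> pi_x is 1."""
    if math.isclose(pi_x, pi_y):
        return 1.0
    return math.log(pi_y / pi_x) / (1.0 - pi_x / pi_y)


def rate_matrix(prefs, mutation_rate, fixation):
    """Toy single-site matrix: Q[x][y] = mutation_rate * F(x, y) for x != y.

    prefs maps each state to a strictly positive preference; the diagonal
    entries make each row sum to zero.
    """
    states = sorted(prefs)
    q = {x: {} for x in states}
    for x in states:
        for y in states:
            if y != x:
                q[x][y] = mutation_rate * fixation(prefs[x], prefs[y])
        q[x][x] = -sum(q[x].values())  # only off-diagonal entries exist yet
    return q


# Hypothetical preferences at one site (three states only, for brevity).
prefs = {"A": 0.6, "S": 0.3, "W": 0.1}
q_bloom = rate_matrix(prefs, 1.0, fixation_bloom2014)
q_hb = rate_matrix(prefs, 1.0, fixation_halpern_bruno)
print(q_bloom["A"]["S"], q_bloom["S"]["A"])  # 0.5 1.0
```

Both forms satisfy detailed balance with respect to the preferences (π_x Q_xy = π_y Q_yx), so with symmetric mutation rates the preferences are the stationary distribution of this toy model. The Halpern and Bruno form differs mainly in letting changes to more preferred amino acids proceed faster than neutral ones.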
Overall, these comparisons show that the deep mutational scanning reflects the actual constraints on HA evolution substantially better than existing quantitative evolutionary models.<table-wrap id="tbl2" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.019</object-id><label>Table 2.</label><caption><p>An evolutionary model derived from the experimentally inferred amino-acid preferences describes the HA sequence phylogeny in <xref ref-type="fig" rid="fig8">Figure 8</xref> far better than a variety of existing state-of-the-art models</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.019">http://dx.doi.org/10.7554/eLife.03300.019</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th>Model</th><th>Δ AIC</th><th>Log likelihood</th><th>Parameters (optimized + empirical)</th></tr></thead><tbody><tr><td>Combined</td><td>0.0</td><td>−24088.7</td><td>0 (0 + 0)</td></tr><tr><td>replicate 3</td><td>303.2</td><td>−24240.3</td><td>0 (0 + 0)</td></tr><tr><td>Combined, Halpern and Bruno</td><td>500.6</td><td>−24339.0</td><td>0 (0 + 0)</td></tr><tr><td>replicate 1</td><td>535.4</td><td>−24356.4</td><td>0 (0 + 0)</td></tr><tr><td>replicate 3, Halpern and Bruno</td><td>657.8</td><td>−24417.6</td><td>0 (0 + 0)</td></tr><tr><td>replicate 2</td><td>876.2</td><td>−24526.8</td><td>0 (0 + 0)</td></tr><tr><td>GY94, gamma <italic>ω</italic>, gamma rates</td><td>882.6</td><td>−24517.0</td><td>13 (4 + 9)</td></tr><tr><td>replicate 1, Halpern and Bruno</td><td>983.2</td><td>−24580.3</td><td>0 (0 + 0)</td></tr><tr><td>GY94, gamma <italic>ω</italic>, one rate</td><td>1109.7</td><td>−24631.5</td><td>12 (3 + 9)</td></tr><tr><td>replicate 2, Halpern and Bruno</td><td>1190.0</td><td>−24683.7</td><td>0 (0 + 0)</td></tr><tr><td>KOSI07, gamma <italic>ω</italic>, gamma rates</td><td>1620.5</td><td>−24834.9</td><td>64 (4 + 60)</td></tr><tr><td>GY94, one <italic>ω</italic>, gamma rates</td><td>1859.4</td><td>−25006.4</td><td>12 (3 + 
9)</td></tr><tr><td>KOSI07, gamma <italic>ω</italic>, one rate</td><td>1883.0</td><td>−24967.2</td><td>63 (3 + 60)</td></tr><tr><td>KOSI07, one <italic>ω</italic>, gamma rates</td><td>2378.8</td><td>−25215.1</td><td>63 (3 + 60)</td></tr><tr><td>GY94, one <italic>ω</italic>, one rate</td><td>2544.5</td><td>−25350.0</td><td>11 (2 + 9)</td></tr><tr><td>KOSI07, one <italic>ω</italic>, one rate</td><td>3040.0</td><td>−25546.7</td><td>62 (2 + 60)</td></tr><tr><td>combined, randomized</td><td>5632.8</td><td>−26905.1</td><td>0 (0 + 0)</td></tr><tr><td>replicate 1, randomized</td><td>6002.4</td><td>−27089.9</td><td>0 (0 + 0)</td></tr><tr><td>replicate 3, randomized</td><td>6138.8</td><td>−27158.1</td><td>0 (0 + 0)</td></tr><tr><td>replicate 2, randomized</td><td>6477.8</td><td>−27327.6</td><td>0 (0 + 0)</td></tr><tr><td>combined, randomized, Halpern and Bruno</td><td>7072.8</td><td>−27625.1</td><td>0 (0 + 0)</td></tr><tr><td>replicate 1, randomized, Halpern and Bruno</td><td>7795.0</td><td>−27986.2</td><td>0 (0 + 0)</td></tr><tr><td>replicate 3, randomized, Halpern and Bruno</td><td>7891.8</td><td>−28034.6</td><td>0 (0 + 0)</td></tr><tr><td>replicate 2, randomized, Halpern and Bruno</td><td>8494.4</td><td>−28335.9</td><td>0 (0 + 0)</td></tr></tbody></table><table-wrap-foot><fn><p>The model is most accurate if it utilizes data from the combined experimental replicates, but it also outperforms existing models even if the data are only derived from individual replicates. Models are ranked by AIC (<xref ref-type="bibr" rid="bib42">Posada and Buckley, 2004</xref>). <italic>GY94</italic> indicates the model of <xref ref-type="bibr" rid="bib22">Goldman and Yang (1994)</xref>, and <italic>KOSI07</italic> indicates the model of <xref ref-type="bibr" rid="bib33">Kosiol et al. (2007)</xref>. The nonsynonymous/synonymous ratio (<italic>ω</italic>) and the substitution rate are either estimated as a single value or drawn from a four-category gamma distribution. 
Randomizing the experimentally inferred preferences among sites makes the models far worse. The models work best when fixation probabilities are computed from the preferences using the first equation proposed in <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref>. The table also shows the results when the fixation probabilities are instead computed using the equation of <xref ref-type="bibr" rid="bib26">Halpern and Bruno (1998)</xref> as described in <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref>. This table is the file <italic>H1_HumanSwine_GY94_summary.tex</italic> described at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html">http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html</ext-link>. <xref ref-type="table" rid="tbl3">Table 3</xref> shows the results when the tree topology is instead estimated using the substitution model of <xref ref-type="bibr" rid="bib33">Kosiol et al. (2007)</xref>.</p></fn></table-wrap-foot></table-wrap><table-wrap id="tbl3" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.020</object-id><label>Table 3.</label><caption><p>An evolutionary model derived from the experimentally inferred amino-acid preferences also outperforms existing models for the tree topology in <xref ref-type="fig" rid="fig8s1">Figure 8—figure supplement 1</xref></p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.020">http://dx.doi.org/10.7554/eLife.03300.020</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th>Model</th><th>ΔAIC</th><th>Log likelihood</th><th>Parameters (optimized + empirical)</th></tr></thead><tbody><tr><td>Combined</td><td>0.0</td><td>−24082.5</td><td>0 (0 + 0)</td></tr><tr><td>replicate 3</td><td>304.8</td><td>−24234.9</td><td>0 (0 + 0)</td></tr><tr><td>Combined, Halpern and Bruno</td><td>494.4</td><td>−24329.7</td><td>0 (0 + 0)</td></tr><tr><td>replicate 
1</td><td>534.2</td><td>−24349.6</td><td>0 (0 + 0)</td></tr><tr><td>replicate 3, Halpern and Bruno</td><td>653.2</td><td>−24409.1</td><td>0 (0 + 0)</td></tr><tr><td>replicate 2</td><td>869.4</td><td>−24517.2</td><td>0 (0 + 0)</td></tr><tr><td>GY94, gamma <italic>ω</italic>, gamma rates</td><td>876.7</td><td>−24507.8</td><td>13 (4 + 9)</td></tr><tr><td>replicate 1, Halpern and Bruno</td><td>976.8</td><td>−24570.9</td><td>0 (0 + 0)</td></tr><tr><td>GY94, gamma <italic>ω</italic>, one rate</td><td>1101.0</td><td>−24621.0</td><td>12 (3 + 9)</td></tr><tr><td>replicate 2, Halpern and Bruno</td><td>1180.4</td><td>−24672.7</td><td>0 (0 + 0)</td></tr><tr><td>KOSI07, gamma <italic>ω</italic>, gamma rates</td><td>1609.0</td><td>−24823.0</td><td>64 (4 + 60)</td></tr><tr><td>GY94, one <italic>ω</italic>, gamma rates</td><td>1856.2</td><td>−24998.6</td><td>12 (3 + 9)</td></tr><tr><td>KOSI07, gamma <italic>ω</italic>, one rate</td><td>1867.3</td><td>−24953.1</td><td>63 (3 + 60)</td></tr><tr><td>KOSI07, one <italic>ω</italic>, gamma rates</td><td>2367.9</td><td>−25203.4</td><td>63 (3 + 60)</td></tr><tr><td>GY94, one <italic>ω</italic>, one rate</td><td>2548.3</td><td>−25345.6</td><td>11 (2 + 9)</td></tr><tr><td>KOSI07, one <italic>ω</italic>, one rate</td><td>3028.0</td><td>−25534.5</td><td>62 (2 + 60)</td></tr><tr><td>Combined, randomized</td><td>5628.0</td><td>−26896.5</td><td>0 (0 + 0)</td></tr><tr><td>replicate 1, randomized</td><td>5993.6</td><td>−27079.3</td><td>0 (0 + 0)</td></tr><tr><td>replicate 3, randomized</td><td>6138.0</td><td>−27151.5</td><td>0 (0 + 0)</td></tr><tr><td>replicate 2, randomized</td><td>6475.2</td><td>−27320.1</td><td>0 (0 + 0)</td></tr><tr><td>combined, randomized, Halpern and Bruno</td><td>7069.4</td><td>−27617.2</td><td>0 (0 + 0)</td></tr><tr><td>replicate 1, randomized, Halpern and Bruno</td><td>7786.8</td><td>−27975.9</td><td>0 (0 + 0)</td></tr><tr><td>replicate 3, randomized, Halpern and Bruno</td><td>7889.2</td><td>−28027.1</td><td>0 (0 + 
0)</td></tr><tr><td>replicate 2, randomized, Halpern and Bruno</td><td>8496.0</td><td>−28330.5</td><td>0 (0 + 0)</td></tr></tbody></table><table-wrap-foot><fn><p>This table differs from <xref ref-type="table" rid="tbl2">Table 2</xref> in that it uses the tree topology inferred with the model of <xref ref-type="bibr" rid="bib33">Kosiol et al. (2007)</xref> rather than <xref ref-type="bibr" rid="bib22">Goldman and Yang (1994)</xref>. This table is the file <italic>H1_HumanSwine_KOSI07_summary.tex</italic> described at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html">http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html</ext-link>.</p></fn></table-wrap-foot></table-wrap></p></sec><sec id="s2-9"><title>The inherent evolvability of antigenic sites on HA</title><p>The amino-acid preferences inferred from the deep mutational scanning reflect the inherent mutational tolerance of sites in HA. In contrast, the evolution of HA in nature is shaped by a combination of HA's inherent mutational tolerance and external selection pressures. Specifically, the evolution of HA in humans is strongly driven by selection for mutations that alter antigenicity (<xref ref-type="bibr" rid="bib60">Yewdell et al., 1979</xref>; <xref ref-type="bibr" rid="bib56">Wiley et al., 1981</xref>; <xref ref-type="bibr" rid="bib12">Caton et al., 1982</xref>; <xref ref-type="bibr" rid="bib50">Smith et al., 2004</xref>; <xref ref-type="bibr" rid="bib13">Das et al., 2013</xref>; <xref ref-type="bibr" rid="bib31">Koel et al., 2013</xref>; <xref ref-type="bibr" rid="bib4">Bedford et al., 2014</xref>). The fact that such antigenic mutations fix at high frequency implies some degree of mutational tolerance at antigenic sites, since no mutations would fix if these sites were under absolute structural or functional constraint. 
However, it is not possible to tell from natural sequences alone whether antigenic sites are unusually mutationally tolerant compared to the rest of HA, or whether their rapid evolution is solely because they are under strong external immune selection.</p><p>To address this issue, we used the results of the deep mutational scanning to compare the inherent mutational tolerance of antigenic sites to the rest of the HA protein. <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref> mapped the antigenic sites of the H1 HA from A/Puerto Rico/8/1934 (PR8), which is closely related to the WSN HA used in our experiments. We therefore defined the ‘Caton et al. antigenic sites’ as the WSN residues homologous to those mapped by <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref> with the exclusion of a single site that has gained glycosylation in the WSN HA relative to the PR8 HA (see ‘Materials and methods’ for details). One possible concern is that <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref> mapped antigenic sites largely by selecting monoclonal-antibody escape mutants, and so these sites might be biased towards being more mutationally tolerant. We therefore also made a broader classification of ‘antigenic sites and contacting residues’ consisting of the Caton et al. antigenic sites <italic>plus</italic> all surface-exposed residues in contact with these sites (see ‘Materials and methods’ for details). This broader classification includes all residues in regions of the HA surface targeted by antibodies, and so should not be biased by whether sites are amenable to the selection of monoclonal-antibody escape mutants. We hypothesized that both sets of antigenic sites would have unusually high mutational tolerance.</p><p>For comparison, we used two classifications of receptor-binding residues (‘Materials and methods’). 
The first classification consists of residues that have important roles in receptor binding (<xref ref-type="bibr" rid="bib38">Martin et al., 1998</xref>) <italic>and</italic> are conserved in H1 HAs; these residues are mostly deep in the binding pocket. The second classification consists of all residues that contact the sialic-acid receptor in the crystal structure, regardless of their level of conservation. We hypothesized that the core set of conserved receptor-binding residues would have unusually low mutational tolerance, but that the set of all receptor-binding residues would have typical levels of mutational tolerance since influenza routinely escapes from antibodies that target the periphery of the receptor-binding pocket (<xref ref-type="bibr" rid="bib31">Koel et al., 2013</xref>).</p><p>The positions of the Caton et al. antigenic sites and the conserved receptor-binding residues in the primary sequence are indicated by the top overlay bar in <xref ref-type="fig" rid="fig5">Figure 5</xref>. Visual inspection suggests that the conserved receptor-binding residues are indeed relatively intolerant of mutations (have a strong preference for one specific amino acid), whereas the Caton et al. antigenic sites are relatively tolerant of mutations (have roughly equivalent preferences for many amino acids).</p><p>For a more quantitative analysis, we computed a site entropy from the inferred amino-acid preferences—larger site entropies indicate a higher inherent tolerance for mutations. The site entropies of all residues are displayed on the HA protein structure in <xref ref-type="fig" rid="fig10">Figure 10</xref>. 
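We take site entropy to mean the Shannon entropy of the amino-acid preferences at a site, H = −Σ π_a ln π_a. This is zero when a site tolerates only one amino acid and maximal (ln 20) when all 20 amino acids are equally preferred. The short Python sketch below illustrates the calculation. The natural logarithm is our assumption, since the base used for Figure 10 is not stated in this section; changing it only rescales every entropy by a constant factor. The example preferences are hypothetical.

```python
import math


def site_entropy(preferences):
    """Shannon entropy (natural log) of one site's amino-acid preferences.

    Zero-probability entries contribute nothing (the convention 0 * ln 0 = 0).
    """
    total = sum(preferences.values())
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"preferences sum to {total}, not 1")
    return sum(-p * math.log(p) for p in preferences.values() if p > 0)


AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}                      # fully tolerant site
fixed = {aa: (1.0 if aa == "W" else 0.0) for aa in AMINO_ACIDS}   # fully constrained site
print(site_entropy(uniform))  # about 2.996, i.e. ln(20)
print(site_entropy(fixed))    # 0.0
```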
Visual inspection suggests that both classifications of antigenic sites have unusually high mutational tolerance, whereas the conserved receptor-binding residues have unusually low mutational tolerance.<fig id="fig10" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.021</object-id><label>Figure 10.</label><caption><title>Inherent mutational tolerance of HA’s receptor-binding residues and antigenic sites.</title><p>(<bold>A</bold>) Surface of HA with one monomer colored by site entropy as determined by the deep mutational scanning; blue indicates low mutational tolerance and red indicates high mutational tolerance. (<bold>B</bold>) The structure shows residues classified as antigenic sites by <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref> in colored spheres; the plot shows site entropy vs relative solvent accessibility (RSA) of these residues (red triangles) and all other HA1 residues in the crystal structure (blue circles). (<bold>C</bold>) Antigenic sites of <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref> plus all other surface-exposed residues that contact these sites. (<bold>D</bold>) Conserved receptor-binding residues. (<bold>E</bold>) All receptor-binding residues. <xref ref-type="table" rid="tbl4">Table 4</xref> shows that residues in (<bold>B</bold>) and (<bold>C</bold>) have unusually high mutational tolerance, residues in (<bold>D</bold>) have unusually low mutational tolerance, and residues in (<bold>E</bold>) do not have unusual mutational tolerance. The data and code to create all panels of this figure are provided via <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>. 
The structure is PDB 1RVX (<xref ref-type="bibr" rid="bib20">Gamblin et al., 2004</xref>).</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.021">http://dx.doi.org/10.7554/eLife.03300.021</ext-link></p></caption><graphic xlink:href="elife03300f010"/></fig></p><p>We next tested whether these visual observations were supported by a rigorous statistical analysis. A confounding factor in comparing mutational tolerance across different sets of residues is that sites with higher solvent accessibility are typically more tolerant of mutations (<xref ref-type="bibr" rid="bib11">Bustamante et al., 2000</xref>; <xref ref-type="bibr" rid="bib45">Ramsey et al., 2011</xref>). To correct for this confounder, we computed the relative solvent accessibility (RSA) for all residues in the HA crystal structure. Residues with RSAs close to zero are buried and are expected to be fairly intolerant of mutations, whereas residues with RSAs substantially greater than zero are surface exposed and are expected to be fairly tolerant of mutations. <xref ref-type="fig" rid="fig10">Figure 10</xref> plots site entropy as a function of RSA for HA1 residues. As expected, sites with higher RSA are more mutationally tolerant. However, the figure also suggests that both classifications of antigenic sites are more mutationally tolerant than other residues with equivalent RSA. It further suggests that the conserved receptor-binding residues are less mutationally tolerant than other residues with equivalent RSA, whereas the set of all receptor-binding residues has fairly typical mutational tolerance. 
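The correction for RSA described here is a multiple linear regression of site entropy on RSA plus a binary indicator for membership in a residue set, with an intercept (the form of the models listed in Table 4). A minimal NumPy sketch of such an ordinary-least-squares fit, with standard errors from the usual residual-variance formula, is shown below. It runs on synthetic data with a known built-in effect rather than on the HA measurements, and it is not the code used for Table 4, which is linked from the mapmuts page cited in this section. Two-sided p-values could be added from a t distribution with n − 3 degrees of freedom, for example with scipy.stats.t.sf.

```python
import numpy as np


def fit_entropy_model(entropy, rsa, in_set):
    """Ordinary least squares for: site entropy ~ RSA + (in set) + intercept.

    Returns {term: (estimate, standard error)}.
    """
    y = np.asarray(entropy, dtype=float)
    X = np.column_stack([
        np.asarray(rsa, dtype=float),
        np.asarray(in_set, dtype=float),  # binary indicator: 1 if the site is in the set
        np.ones(len(y)),                  # intercept column
    ])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = residuals @ residuals / dof       # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance matrix of the estimates
    se = np.sqrt(np.diag(cov))
    terms = ["RSA", "in set", "intercept"]
    return {t: (float(b), float(s)) for t, b, s in zip(terms, beta, se)}


# Synthetic data only: 200 hypothetical sites with a known built-in set effect of 0.3.
rng = np.random.default_rng(0)
rsa = rng.uniform(0.0, 1.0, 200)
in_set = rng.random(200) > 0.8
entropy = 1.0 + 1.3 * rsa + 0.3 * in_set + rng.normal(0.0, 0.2, 200)

for term, (est, se) in fit_entropy_model(entropy, rsa, in_set).items():
    print(f"{term}: estimate {est:.2f}, standard error {se:.2f}")
```

Including RSA as a covariate means the indicator's coefficient measures the difference in entropy between set members and non-members at the same RSA, which is the comparison the text describes.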
These observations are supported by the statistical analyses in <xref ref-type="table" rid="tbl4">Table 4</xref>: even after correcting for RSA, there is a significant trend for antigenic sites to have high mutational tolerance, and for conserved receptor-binding residues to have low mutational tolerance.<table-wrap id="tbl4" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.022</object-id><label>Table 4.</label><caption><p>The antigenic sites are significantly more mutationally tolerant than other HA1 residues with similar relative solvent accessibility (RSA), the conserved receptor-binding residues are significantly less mutationally tolerant than other similar residues, and sites in the more expansive set of all receptor-binding residues have typical levels of mutational tolerance</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.022">http://dx.doi.org/10.7554/eLife.03300.022</ext-link></p></caption><table frame="hsides" rules="groups"><tbody><tr><td colspan="4">Model: site entropy ∼ RSA + (Caton et al. antigenic site) + intercept</td></tr><tr><td> Property</td><td>Estimate</td><td>Standard error</td><td>p-value</td></tr><tr><td> RSA</td><td>1.29</td><td>0.12</td><td><10<sup>−10</sup></td></tr><tr><td> Caton et al. 
antigenic site</td><td>0.30</td><td>0.09</td><td>1.6 × 10<sup>−3</sup></td></tr><tr><td colspan="4">Model: site entropy ∼ RSA + (antigenic site or contacting residue) + intercept</td></tr><tr><td> Property</td><td>Estimate</td><td>Standard error</td><td>p-value</td></tr><tr><td> RSA</td><td>1.22</td><td>0.13</td><td><10<sup>−10</sup></td></tr><tr><td> antigenic site or contacting residue</td><td>0.23</td><td>0.07</td><td>2.2 × 10<sup>−3</sup></td></tr><tr><td colspan="4">Model: site entropy ∼ RSA + (conserved receptor binding) + intercept</td></tr><tr><td> Property</td><td>Estimate</td><td>Standard error</td><td>p-value</td></tr><tr><td> RSA</td><td>1.38</td><td>0.11</td><td><10<sup>−10</sup></td></tr><tr><td> conserved receptor binding</td><td>−0.52</td><td>0.16</td><td>1.7 × 10<sup>−3</sup></td></tr><tr><td colspan="4">Model: site entropy ∼ RSA + (all receptor binding) + intercept</td></tr><tr><td> Property</td><td>Estimate</td><td>Standard error</td><td>p-value</td></tr><tr><td> RSA</td><td>1.40</td><td>0.11</td><td><10<sup>−10</sup></td></tr><tr><td> all receptor binding</td><td>−0.18</td><td>0.11</td><td>0.12</td></tr></tbody></table><table-wrap-foot><fn><p>The sets of residues analyzed here are those shown in <xref ref-type="fig" rid="fig10">Figure 10</xref>. Shown here are the results of multiple linear regression of the continuous dependent variable of site entropy (as computed from the amino-acid preferences) vs the continuous independent variable of RSA and the binary variable of being a receptor-binding residue or being an antigenic site. 
The data and code used to perform these analyses are available via <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>.</p></fn></table-wrap-foot></table-wrap></p><p>Overall, these results show that antigenic sites in HA have unusually high inherent mutational tolerance, suggesting that this property combines with external immune selection to contribute to HA’s rapid antigenic evolution. These results also show that while a core group of conserved residues deep in the receptor-binding pocket have unusually low mutational tolerance, the bulk of residues that contact the receptor are not under exceptional constraint. This fact probably explains why HA is able to escape from antibodies targeting the periphery of the receptor-binding pocket (<xref ref-type="bibr" rid="bib31">Koel et al., 2013</xref>), and why only rare antibodies that penetrate deep into this pocket are broadly neutralizing (<xref ref-type="bibr" rid="bib55">Whittle et al., 2011</xref>).</p></sec><sec id="s2-10"><title>HA's antigenic evolvability is not shared by all influenza proteins</title><p>The foregoing results show that the antigenic sites in HA have an unusually high inherent tolerance for mutations. Is this antigenic evolvability an exceptional feature of HA, or is it commonly shared by other viral proteins? Ideally one would compare HA to the major surface antigens of other viruses with high (e.g., HIV) and low (e.g., measles) rates of antigenic evolution—but unfortunately comparable data sets for these other viruses are not yet available. 
Therefore, we instead compared the antigenic evolvability of HA to that of influenza nucleoprotein (NP), a protein for which we have recently performed a similar deep mutational scanning experiment (<xref ref-type="bibr" rid="bib9">Bloom, 2014</xref>).</p><p>The adaptive immune system targets NP via cytotoxic T-lymphocytes (CTLs) (<xref ref-type="bibr" rid="bib54">Valkenburg et al., 2011</xref>). Although the selection exerted by these CTLs is believed to be weaker than the antibody-mediated selection on HA’s antigenic sites (<xref ref-type="bibr" rid="bib8">Bhatt et al., 2011</xref>), influenza does benefit from mutations in NP that promote escape from CTLs (<xref ref-type="bibr" rid="bib7">Berkhoff et al., 2007</xref>; <xref ref-type="bibr" rid="bib53">Valkenburg et al., 2013</xref>). However, whereas HA rapidly evolves to escape from antibodies, NP does not appear to have any special propensity for rapid evolution of the epitopes targeted by CTLs. Instead, mutations in NP's CTL epitopes are often deleterious and require secondary permissive or compensatory mutations to fix without a fitness cost (<xref ref-type="bibr" rid="bib47">Rimmelzwaan et al., 2004</xref>; <xref ref-type="bibr" rid="bib5">Berkhoff et al., 2005</xref>, <xref ref-type="bibr" rid="bib6">2006</xref>; <xref ref-type="bibr" rid="bib24">Gong et al., 2013</xref>). Therefore, we hypothesized that unlike HA's highly evolvable antigenic sites, NP's CTL-antigenic sites would <italic>not</italic> possess unusually high inherent mutational tolerance.</p><p>To test this hypothesis, we used a previously described delineation of epitopes in NP from the human H3N2 strain A/Aichi/2/1968 with experimentally validated human CTL responses (<xref ref-type="bibr" rid="bib23">Gong and Bloom, 2014</xref>). In this delineation, less than a quarter of NP’s sites participate in multiple CTL epitopes. 
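Selecting the sites that participate in multiple CTL epitopes reduces to counting, for each residue, how many epitopes cover it. The Python sketch below shows one way to do this. The epitope boundaries are hypothetical placeholders rather than the delineation of Gong and Bloom (2014), and the sketch assumes each epitope is a contiguous range of 1-based, inclusive residue numbers.

```python
from collections import Counter


def sites_in_multiple_epitopes(epitopes):
    """Return the sorted sites covered by two or more epitopes.

    Each epitope is a (start, end) pair of 1-based, inclusive residue numbers.
    """
    coverage = Counter()
    for start, end in epitopes:
        coverage.update(range(start, end + 1))  # +1 because end is inclusive
    return sorted(site for site, n in coverage.items() if n >= 2)


# Hypothetical epitope boundaries, not the published delineation.
epitopes = [(44, 52), (50, 58), (380, 388)]
multi = sites_in_multiple_epitopes(epitopes)
print(multi)  # [50, 51, 52]
```

The resulting site list can then serve as the binary indicator in a regression of site entropy on RSA, as in Table 5.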
We used the results of our previous deep mutational scanning of NP to compare the inherent mutational tolerance of sites that participate in multiple CTL epitopes to all other sites in NP. As shown in <xref ref-type="fig" rid="fig11">Figure 11</xref> and <xref ref-type="table" rid="tbl5">Table 5</xref>, the NP sites involved in multiple CTL epitopes have an inherent mutational tolerance that is indistinguishable from that of other sites in the protein. Therefore, NP does not possess any special inherent mutational tolerance in its CTL epitopes. This finding implies that a high level of antigenic evolvability is not a general feature of all viral proteins, but is instead at least partly specific to HA.<fig id="fig11" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.023</object-id><label>Figure 11.</label><caption><title>The inherent mutational tolerance of NP's CTL epitopes is indistinguishable from that of non-epitope sites in NP.</title><p>The plot shows the site entropy vs relative solvent accessibility (RSA) of NP residues that participate in multiple CTL epitopes (red triangles) and all other NP residues in the crystal structure (blue circles). Visual inspection suggests that the epitope sites have mutational tolerance comparable to other sites, and this result is supported by the statistical analysis in <xref ref-type="table" rid="tbl5">Table 5</xref>. Note that unlike for HA, there is no trend for RSA to correlate with site entropy—this could be because many of NP’s surface-exposed sites are constrained by interactions with viral RNA. The CTL epitopes are those delineated in the first supplementary table of <xref ref-type="bibr" rid="bib23">Gong and Bloom (2014)</xref>. The site entropies are computed from a previously described deep mutational scan of NP, and are the values in the first supplementary file of <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref>; the RSA values are also taken from that reference. 
The data and code used to generate this plot are available via <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>; the plot itself is the file <italic>NP_CTL_entropy_rsa_correlation.pdf</italic> described therein.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.023">http://dx.doi.org/10.7554/eLife.03300.023</ext-link></p></caption><graphic xlink:href="elife03300f011"/></fig><table-wrap id="tbl5" position="float"><object-id pub-id-type="doi">10.7554/eLife.03300.024</object-id><label>Table 5.</label><caption><p>There is no statistically significant difference between the inherent mutational tolerance of NP sites involved in multiple CTL epitopes and all other NP residues</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.024">http://dx.doi.org/10.7554/eLife.03300.024</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th colspan="4">Model: NP site entropy ∼ RSA + (multiple CTL epitopes) + intercept</th></tr><tr><th>Property</th><th>Estimate</th><th>Standard error</th><th>p-value</th></tr></thead><tbody><tr><td>RSA</td><td>−0.05</td><td>0.07</td><td>0.52</td></tr><tr><td>multiple CTL epitopes</td><td>−0.04</td><td>0.04</td><td>0.31</td></tr></tbody></table><table-wrap-foot><fn><p>The table shows the result of multiple linear regression of the continuous dependent variable of site entropy (as computed from the amino-acid preferences) vs the continuous independent variable of RSA and the binary variable of participating in multiple CTL epitopes. The data set analyzed here is plotted in <xref ref-type="fig" rid="fig11">Figure 11</xref>. 
The data and code used to perform this analysis are available via <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>.</p></fn></table-wrap-foot></table-wrap></p></sec></sec><sec id="s3" sec-type="discussion"><title>Discussion</title><p>A fundamental challenge in studying the natural evolution of influenza is separating the effects of external selection pressures from inherent structural and functional constraints. The evolutionary patterns observed in natural sequences are shaped by a combination of inherent mutational tolerance and external pressures such as immune selection, and the analysis of such sequences is further confounded by the fact that influenza is not at evolutionary equilibrium.</p><p>Here we have quantified the inherent mutational tolerance of influenza HA by using deep mutational scanning (<xref ref-type="bibr" rid="bib19">Fowler et al., 2010</xref>; <xref ref-type="bibr" rid="bib2">Araya and Fowler, 2011</xref>) to simultaneously assess the impact on viral growth of the vast majority of the ≈10<sup>4</sup> possible amino-acid mutations to influenza HA. The information obtained from the deep mutational scanning is consistent with existing knowledge about the effects of mutations on HA function and structure. For instance, the deep mutational scanning shows strong selection for specific amino acids known to play important roles in HA's receptor-binding activity, fusion activity, and proteolytic activation (<xref ref-type="bibr" rid="bib38">Martin et al., 1998</xref>; <xref ref-type="bibr" rid="bib44">Qiao et al., 1999</xref>; <xref ref-type="bibr" rid="bib51">Stech et al., 2005</xref>). Similarly, at the sites of known temperature-sensitive mutations to HA (<xref ref-type="bibr" rid="bib39">Nakajima et al., 1986</xref>), the deep mutational scanning identifies the more stabilizing amino acid as more favorable. 
Broader trends from the deep mutational scanning are also in agreement with current thinking about mutational effects. For example, the deep mutational scanning finds that there is strong purifying selection against stop-codon mutations and many nonsynonymous mutations, but that there is only weak selection against synonymous mutations. All of these results suggest that the deep mutational scanning faithfully captures both the specific and general effects of mutations on HA.</p><p>The comprehensive information generated by the deep mutational scanning can be used to create quantitative evolutionary models for analyzing HA sequence phylogenies. Here we have shown that an evolutionary model constructed from our deep mutational scanning data describes the evolution of human and swine H1 HAs far better than existing state-of-the-art models for sequence evolution. We anticipate that separating HA's inherent mutational tolerance from external selection should also eventually allow the external selection pressures to be studied in greater detail. For example, one might imagine that sites in HA that exhibit evolutionary patterns that deviate from the quantitative model created from our deep mutational scanning are likely to be under external selection. Future work that augments deep mutational scanning with specific experimentally defined selection pressures (such as antibodies against HA) could aid in further elucidation of the forces that shape influenza evolution. It also may be possible to utilize high-throughput experimental data on mutational effects to better estimate the fitness of naturally occurring strains in a way that aids in prediction of the year-to-year strain dynamics of influenza (<xref ref-type="bibr" rid="bib35">Łuksza and Lässig, 2014</xref>).</p><p>The deep mutational scanning also enabled us to assess the extent to which HA's inherent mutational tolerance contributes to influenza’s antigenic evolvability. 
It remains a mystery why error-prone RNA viruses differ so widely in their capacity for evolutionary escape from immunity, with some (e.g., influenza and HIV) undergoing rapid antigenic evolution while others (e.g., measles) show little antigenic change on relevant timescales (<xref ref-type="bibr" rid="bib34">Lipsitch and O’Hagan, 2007</xref>; <xref ref-type="bibr" rid="bib32">Koelle et al., 2006</xref>; <xref ref-type="bibr" rid="bib27">Heaton et al., 2013</xref>). Our data demonstrate that the antigenic sites in HA are unusually tolerant to mutations, implying that inherent evolutionary plasticity at sites targeted by the immune system is one factor that contributes to influenza's rapid antigenic evolution. This high mutational tolerance at antigenic sites could itself be a property that influenza has evolved to aid its antigenic escape, or it might simply be an unfortunate coincidence that the immune system focuses on especially plastic portions of HA. In either case, it is intriguing to speculate whether a high inherent mutational tolerance in antigenic sites is also a feature of other antigenically variable RNA viruses. Applying the deep mutational scanning approach used here to additional viruses should provide a means to address this question.</p></sec><sec id="s4" sec-type="materials|methods"><title>Materials and methods</title><sec id="s4-1"><title>Availability of data and computer code</title><p>Illumina sequencing data are available at the SRA, accession SRP040983 (<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/sra/?term=SRP040983">http://www.ncbi.nlm.nih.gov/sra/?term=SRP040983</ext-link>). Source code and a description of the computational process used to analyze the sequencing data and infer the amino-acid preferences are available at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>. 
Source code and a description of the computational process used for the phylogenetic analyses are available at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html">http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html</ext-link>.</p></sec><sec id="s4-2"><title>HA sequence numbering</title><p>Several numbering schemes for HA are used in the literature. Unless noted otherwise, residues are numbered here using sequential numbering of the WSN HA protein sequence (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1</xref>) starting with one at the N-terminal methionine. In some cases, the corresponding residue number in the widely used H3 numbering scheme is also indicated. These numbering systems can be interconverted using the Python script available at <ext-link ext-link-type="uri" xlink:href="https://github.com/jbloom/HA_numbering">https://github.com/jbloom/HA_numbering</ext-link>.</p></sec><sec id="s4-3"><title>Generation of HA codon mutation library</title><p>The HA codon-mutant library was generated using the oligo-based PCR mutagenesis protocol described previously by <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref>. The only differences from that protocol were that HA was used as the template rather than NP, and that only two overall rounds of mutagenesis were performed, rather than the three rounds used by <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref>. Using fewer rounds of mutagenesis reduced the average number of codon mutations from ≈3 per clone in <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref> to the ≈2 per clone shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>. 
The libraries were created in full biological triplicate, meaning that each experimental replicate was derived from an independent plasmid mutant library.</p><p>The end primers for the mutagenesis were 5′-cgatcacgtctctgggagcaaaagcaggggaaaataaaaacaac-3′ and 5′-gatacacgtctcatattagtagaaacaagggtgtttttccttatatttctg-3′ (these primers include BsmBI restriction sites). The mutagenic primers were ordered from Integrated DNA Technologies and are listed in <xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2</xref>.</p><p>The final products from the codon mutagenesis PCR were gel purified and digested with BsmBI (R0580L; New England Biolabs, Ipswich, Massachusetts). The BsmBI-digested HA was ligated into a dephosphorylated (Antarctic Phosphatase, M0289L; New England Biolabs) and BsmBI-digested preparation of the bidirectional reverse-genetics plasmid pHW2000 (<xref ref-type="bibr" rid="bib28">Hoffmann et al., 2000</xref>) using T4 DNA ligase (M0202S; New England Biolabs). Column-purified ligations were electroporated into ElectroMAX DH10B T1 phage-resistant competent cells (12033-015; Invitrogen, Carlsbad, California) and plated on LB plates containing 100 μg/ml of ampicillin. A 1:4000 dilution of each transformation was plated in parallel to estimate the number of unique transformants; we obtained at least two million unique colonies per transformation. For each replicate of the codon-mutant library, we performed three transformations to generate approximately six million independent clones per replicate library. Control ligations lacking an insert yielded at least 100 times fewer colonies, indicating a very low rate of background self-ligation of the pHW2000 plasmid. 
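The library-size estimate above is a simple scaling of the colonies counted on the 1:4000 dilution plate. The Python sketch below illustrates that arithmetic under two assumptions not stated in the text: the dilution plate receives exactly 1/4000 of each transformation, and every colony on it is an independent transformant. The function names and the colony counts in the usage lines are hypothetical, not data from this study.

```python
DILUTION_FACTOR = 4000  # the 1:4000 dilution plated in parallel (from the text)


def estimated_transformants(colonies_on_dilution_plate, dilution_factor=DILUTION_FACTOR):
    """Scale colonies on the dilution plate up to the whole transformation."""
    return colonies_on_dilution_plate * dilution_factor


def library_size(colony_counts, dilution_factor=DILUTION_FACTOR):
    """Sum estimated unique transformants over all transformations of one replicate."""
    return sum(estimated_transformants(c, dilution_factor) for c in colony_counts)


def background_fraction(colonies_no_insert, colonies_with_insert):
    """Approximate fraction of clones that are self-ligated vector lacking an HA insert."""
    return colonies_no_insert / colonies_with_insert


# Hypothetical dilution-plate counts for the three transformations of one replicate.
counts = [500, 520, 540]
print(library_size(counts))         # 6240000
print(background_fraction(5, 520))  # about 0.0096, i.e. roughly 100-fold fewer colonies
```

Comparing the no-insert control with an insert-containing ligation in this way treats the two transformations as equally efficient, which is itself an approximation.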
The transformants from each HA mutant library replicate were pooled, cultured in LB supplemented with ampicillin, and mini-prepped to generate the HA codon mutant plasmid libraries.</p><p>For the Sanger sequencing analysis shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>, we picked and prepped 34 independent colonies for sequencing. The full analysis of this Sanger sequencing is available at <ext-link ext-link-type="uri" xlink:href="https://github.com/jbloom/SangerMutantLibraryAnalysis/tree/v0.2">https://github.com/jbloom/SangerMutantLibraryAnalysis/tree/v0.2</ext-link>.</p></sec><sec id="s4-4"><title>Virus rescue and passage in cells</title><p>The HA mutant plasmid libraries were used to generate pools of mutant influenza viruses by reverse genetics (<xref ref-type="bibr" rid="bib28">Hoffmann et al., 2000</xref>). Cocultures of 293T and MDCK-SIAT1 cells were transfected with equal amounts of HA (either unmutated or one of the plasmid mutant libraries) cloned into pHW2000 as described above, plus the seven other WSN genes in bidirectional reverse-genetics plasmids (pHW181-PB2, pHW182-PB1, pHW183-PA, pHW185-NP, pHW186-NA, pHW187-M, pHW188-NS), which were kind gifts from Robert Webster of St. Jude Children's Research Hospital. Overall, six viral rescues and passages were performed, each using a different HA plasmid preparation: the three HA mutant library replicates (eventually yielding the <bold>mutvirus</bold> samples in <xref ref-type="fig" rid="fig1">Figure 1</xref>) and three independent preparations of unmutated HA (eventually yielding the <bold>virus</bold> samples in <xref ref-type="fig" rid="fig1">Figure 1</xref>).</p><p>Each viral rescue was performed by transfecting multiple wells of cells to increase the diversity of the rescued viruses. Specifically, two 12-well dishes were transfected per rescue. 
Cells were plated at 2 × 10<sup>5</sup> 293T cells and 5 × 10<sup>4</sup> MDCK-SIAT1 cells per well in D10 (DMEM supplemented with 10% heat-inactivated FBS, 2 mM L-glutamine, 100 U of penicillin/ml, and 100 μg of streptomycin/ml), and then each well was transfected with 1 μg of total plasmid DNA (125 ng of each of the eight plasmids) using the BioT transfection reagent (Bioland B01-02, Paramount, California). At 12 to 18 hr post-transfection, the medium was changed to our WSN viral growth medium: Opti-MEM supplemented with 0.5% heat-inactivated FBS, 0.3% BSA, 100 U of penicillin/ml, 100 μg of streptomycin/ml, and 100 μg of calcium chloride/ml. This medium lacks trypsin because viruses with the WSN HA and NA are trypsin independent (<xref ref-type="bibr" rid="bib25">Goto and Kawaoka, 1998</xref>). Viral supernatants were collected 72 hr post-transfection, and the supernatants from the different wells were pooled for each viral rescue. These pooled supernatants were then clarified by centrifugation at 2000 × <italic>g</italic> for 5 min, aliquoted, and frozen at −80°C. Aliquots were then thawed and titered by TCID50 (see below).</p><p>For viral passage, each viral rescue replicate was passaged in four 10-cm dishes. Briefly, 6 × 10<sup>6</sup> MDCK-SIAT1 cells per 10-cm dish in WSN viral growth medium were infected with 6 × 10<sup>5</sup> infectious particles (multiplicity of infection of 0.1). With four dishes per replicate, each passage therefore maintained a diversity of 2.4 × 10<sup>6</sup> TCID50 units per replicate. The passaged viral supernatants were collected at 50 hr post-infection, and the supernatants from the four dishes were pooled for each replicate. These pooled supernatants were clarified at 2000 × <italic>g</italic> for 5 min, aliquoted, and frozen at −80°C. 
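The TCID50 titers used throughout these methods are computed from plate scores using the interpolation of Reed and Muench (1938). The Python sketch below is a generic version of that calculation, not the authors' reedmuenchcalculator script. It assumes dilutions are given as log10 exponents ordered from least to most dilute, and it returns the log10 titer per inoculum volume. Converting the result to TCID50/ml depends on the volume plated and the initial predilution, and that conversion is left to the caller. The function name and plate scores are hypothetical.

```python
def reed_muench_log10_titer(log10_dilutions, infected, total):
    """Return log10 of the TCID50 titer per inoculum volume (Reed-Muench method).

    log10_dilutions: dilution exponents from least to most dilute, e.g. [-1, -2, ...].
    infected: wells scored positive for cytopathic effect at each dilution.
    total: wells inoculated at each dilution.
    """
    n = len(log10_dilutions)
    uninfected = [t - i for t, i in zip(total, infected)]
    # Cumulative infected: a well infected at a higher dilution is assumed to be
    # infected at every lower dilution, so sum from this row to the most dilute.
    cum_inf = [sum(infected[k:]) for k in range(n)]
    # Cumulative uninfected: sum from the least dilute row down to this row.
    cum_uninf = [sum(uninfected[: k + 1]) for k in range(n)]
    pct = [100.0 * ci / (ci + cu) for ci, cu in zip(cum_inf, cum_uninf)]
    for k in range(n - 1):
        if pct[k] >= 50.0 and 50.0 > pct[k + 1]:
            # Proportionate distance of the 50% point between the bracketing rows.
            pd = (pct[k] - 50.0) / (pct[k] - pct[k + 1])
            step = log10_dilutions[k + 1] - log10_dilutions[k]  # -1 for 1:10 steps
            endpoint = log10_dilutions[k] + pd * step
            return -endpoint
    raise ValueError("the 50% endpoint is not bracketed by the tested dilutions")


# Hypothetical plate: eight wells per tenfold dilution from 10^-1 to 10^-8.
dilutions = [-1, -2, -3, -4, -5, -6, -7, -8]
positives = [8, 8, 8, 8, 6, 2, 0, 0]
print(reed_muench_log10_titer(dilutions, positives, [8] * 8))  # 5.5
```

A return value of 5.5 corresponds to 10<sup>5.5</sup> TCID50 units per inoculum volume of the undiluted sample.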
Aliquots were then thawed and titered by TCID50.</p></sec><sec id="s4-5"><title>Virus titering by TCID50</title><p>The viruses were titered by TCID50 (50% tissue culture infectious dose). In this assay, 10 μl of a 1:10 dilution of the viral supernatant to be titered was added to the first row of a 96-well tissue culture plate containing 90 μl of WSN viral growth medium. At least one no-virus control supernatant was included on each plate as a negative control. The virus was then serially diluted 1:10 down the rows of the plates, and then 5 × 10<sup>3</sup> MDCK-SIAT1 cells were added to each well. The plates were then incubated at 37°C and scored for cytopathic effects caused by viral growth after 65–72 hr. Virus titers were calculated by the method of <xref ref-type="bibr" rid="bib46">Reed and Muench (1938)</xref> as implemented in the Python script at <ext-link ext-link-type="uri" xlink:href="https://github.com/jbloom/reedmuenchcalculator">https://github.com/jbloom/reedmuenchcalculator</ext-link>.</p></sec><sec id="s4-6"><title>Generation of samples for Illumina deep sequencing</title><p>The deep sequencing samples were prepared from PCR amplicons that were generated exactly as described for the <bold>DNA</bold>, <bold>mutDNA</bold>, <bold>virus</bold>, and <bold>mutvirus</bold> samples in <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref>. The viral RNA templates for the <bold>virus</bold> and <bold>mutvirus</bold> samples were isolated using freshly purchased Trizol reagent (15596-026; Life Technologies) to avoid oxidative damage associated with old reagents. After performing reverse transcription as described in <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref>, quantitative PCR (qPCR) was used to quantify the number of HA cDNA molecules to ensure that there were at least 10<sup>6</sup> unique template molecules before beginning the subsequent PCR amplification. 
The qPCR primers were designed based on those described by <xref ref-type="bibr" rid="bib36">Marsh et al. (2007)</xref>, and were 5′-taacctgctcgaagacagcc-3′ and 5′-agagccatccggtgatgtta-3′.</p><p>The PCR amplicons were fragmented and barcoded with a custom modification of Illumina's Nextera kit, following the protocol described in <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref>. Samples were barcoded as follows: <bold>DNA</bold>–N701, <bold>mutDNA</bold>–N702, <bold>virus</bold>–N704, and <bold>mutvirus</bold>–N705. For each of the three biological replicates, these four samples were pooled and sequenced together on a dedicated Illumina lane with 50-nucleotide paired-end reads as described in <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref>. For the technical sequencing repeat of biological replicate #1, the library preparation and sequencing were repeated from the same viral RNA templates. This technical repeat therefore only quantifies variation associated with sample preparation and sequencing, whereas the biological replicates also quantify variation associated with the processes of codon-mutant library creation, virus generation, and virus passage.</p></sec><sec id="s4-7"><title>Analysis of deep sequencing data</title><p>The deep sequencing data were analyzed using the <italic>mapmuts</italic> computer program (<xref ref-type="bibr" rid="bib9">Bloom, 2014</xref>). A description of the analysis approach, along with the resulting data files and figures, is available at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>.</p><p>Briefly, paired reads were overlapped as illustrated in <xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1</xref> and then aligned to HA. 
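The codon-calling rule applied to these overlapped pairs calls a codon only when both reads agree on it, and it discards pairs with more than one mismatch in the overlap. A minimal Python sketch of that rule follows. It is an illustration, not the mapmuts implementation, and it makes several simplifying assumptions. The read-2 overlap is reverse-complemented to the read-1 strand. Both overlap strings are already trimmed to the same HA region. `start` (a hypothetical parameter) gives the 0-based HA coding position of the first overlap nucleotide. Q-score filtering and alignment to HA are omitted.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string over A/C/G/T/N."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[b] for b in reversed(seq.upper()))


def call_codons(overlap1, overlap2_rc, start, max_mismatches=1):
    """Call codons from the overlapping region of a read pair.

    overlap1: overlap region from read 1.
    overlap2_rc: the same region from read 2, already reverse-complemented.
    start: 0-based HA coding position of the first overlap nucleotide.
    Returns {codon_number: codon} (1-based codon numbers) for complete codons on
    which both reads agree, or None if the pair has too many mismatches.
    """
    if len(overlap1) != len(overlap2_rc):
        raise ValueError("overlap strings must be the same length")
    mismatches = sum(a != b for a, b in zip(overlap1, overlap2_rc))
    if mismatches > max_mismatches:
        return None  # discard the pair, mirroring the one-mismatch overlap filter
    calls = {}
    first = -(-start // 3) * 3  # first codon boundary at or after start
    for pos in range(first, start + len(overlap1) - 2, 3):
        i = pos - start
        c1, c2 = overlap1[i:i + 3], overlap2_rc[i:i + 3]
        if c1 == c2 and "N" not in c1:
            calls[pos // 3 + 1] = c1
    return calls


read2_overlap = "GTTAGCCAT"  # hypothetical read-2 bases covering the overlap
print(call_codons("ATGGCTAAA", reverse_complement(read2_overlap), start=3))
# {2: 'ATG', 3: 'GCT'}
```

In this example, the single disagreement in the last codon leaves that codon uncalled but keeps the pair. A second mismatch anywhere in the overlap would cause the whole pair to be discarded.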
Reads were retained only if both reads in the pair passed the default Illumina filter and had average Q-scores of at least 25, the two reads overlapped for at least 30 nucleotides with no more than one mismatch, and the overlap aligned to the HA gene with no more than six mismatches. <xref ref-type="fig" rid="fig3s2">Figure 3—figure supplement 2</xref> shows the number of reads for each sample that met these criteria. Most reads that failed these criteria did so because they could not be paired with at least 30 nucleotides of overlap, which happens when Nextera fragmentation produces an HA fragment shorter than 30 nucleotides or longer than 70 nucleotides. Codon identities were called only if both overlapped paired reads agreed on the identity of the codon. This requirement reduces the error rate, because it is rare for both paired reads to independently experience the same sequencing error.</p><p>As shown in <xref ref-type="fig" rid="fig4">Figure 4</xref>, we estimated that 85% of possible codon mutations were sampled at least five times by the mutant viruses. To estimate the corresponding fraction of amino-acid mutations, we simulated randomly selecting 85% of the possible mutant codons across the HA sequence and found that these codons encoded ≈97% of the possible amino-acid mutations.</p></sec><sec id="s4-8"><title>Inference of amino-acid preferences and site entropies</title><p>The counts of each codon identity in the deep sequencing data were used to infer the ‘preference’ of each site for each amino acid as described in <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref>. 
This inference was also performed using the <italic>mapmuts</italic> computer program as detailed at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>.</p><p>Briefly, the preference <inline-formula><mml:math id="inf2"><mml:mrow><mml:msub><mml:mi>π</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> of site <italic>r</italic> for amino acid <italic>a</italic> represents the expected frequency of that amino acid in a hypothetical library where each amino acid is introduced at equal frequency. Specifically, the expected frequency <inline-formula><mml:math id="inf3"><mml:mrow><mml:msubsup><mml:mi>f</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold">m</mml:mi><mml:mi mathvariant="bold">u</mml:mi><mml:mi mathvariant="bold">t</mml:mi><mml:mi mathvariant="bold">v</mml:mi><mml:mi mathvariant="bold">i</mml:mi><mml:mi mathvariant="bold">r</mml:mi><mml:mi mathvariant="bold">u</mml:mi><mml:mi mathvariant="bold">s</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> of mutant codon <italic>x</italic> at site <italic>r</italic> in the <bold>mutvirus</bold> sample is related to the preference for its encoded amino acid <inline-formula><mml:math id="inf4"><mml:mrow><mml:mi mathvariant="script">A</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> by<disp-formula id="equ1"><mml:math id="m1"><mml:mrow><mml:msubsup><mml:mi>f</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold">m</mml:mi><mml:mi mathvariant="bold">u</mml:mi><mml:mi mathvariant="bold">t</mml:mi><mml:mi mathvariant="bold">v</mml:mi><mml:mi mathvariant="bold">i</mml:mi><mml:mi
mathvariant="bold">r</mml:mi><mml:mi mathvariant="bold">u</mml:mi><mml:mi mathvariant="bold">s</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>ρ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>μ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>×</mml:mo><mml:msub><mml:mi>π</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="script">A</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mstyle displaystyle="true"><mml:mo>∑</mml:mo></mml:mstyle><mml:mi>y</mml:mi></mml:msub><mml:msub><mml:mi>μ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>×</mml:mo><mml:msub><mml:mi>π</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="script">A</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula><mml:math id="inf5"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the rate at which site <italic>r</italic> is erroneously read to be codon <italic>x</italic>, <inline-formula><mml:math id="inf6"><mml:mrow><mml:msub><mml:mi>ρ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the rate at which site <italic>r</italic> is erroneously reverse-transcribed to codon <italic>x</italic>, and <inline-formula><mml:math 
id="inf7"><mml:mrow><mml:msub><mml:mi>μ</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the rate at which site <italic>r</italic> is mutagenized to codon <italic>x</italic> in the mutant DNA sample. These unknown error and mutation rate parameters are inferred from the <bold>DNA</bold>, <bold>mutDNA</bold>, <bold>virus</bold>, and <bold>mutvirus</bold> samples using the Bayesian approach described in <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref>. Posterior mean preferences <inline-formula><mml:math id="inf8"><mml:mrow><mml:msub><mml:mi>π</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> were inferred separately for each replicate of the experiment, and the correlations among the inferences from different replicates are shown in <xref ref-type="fig" rid="fig6">Figure 6</xref>. The final ‘best’ preferences were obtained by averaging the posterior mean preferences from the three biological replicates. 
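To make the roles of the quantities in the equation above concrete, the Python sketch below does three things for a single site. It evaluates the right-hand side of the equation, averages preferences across replicates, and computes the site entropy in bits using the conventional minus sign, so that entropies are non-negative. The sketch only evaluates the forward model; it does not reproduce the Bayesian inference of Bloom (2014). The three-codon site, the rates, and the preference values are hypothetical. The denominator is assumed to run over every codon y for which a mutagenesis rate is supplied, including the wildtype codon.

```python
import math


def expected_mutvirus_frequency(x, eps, rho, mu, pi, codon_to_aa):
    """Evaluate f_{r,x} = eps_{r,x} + rho_{r,x} + mu_{r,x} pi_{r,A(x)} / sum_y mu_{r,y} pi_{r,A(y)}.

    eps, rho, mu: per-codon rates at one site r (dicts keyed by codon).
    pi: preferences at site r (dict keyed by amino acid).
    codon_to_aa: the mapping A(x) from codon to encoded amino acid.
    """
    denom = sum(mu[y] * pi[codon_to_aa[y]] for y in mu)
    return eps[x] + rho[x] + mu[x] * pi[codon_to_aa[x]] / denom


def average_preferences(replicates):
    """Average per-replicate preference dicts amino acid by amino acid."""
    aas = replicates[0].keys()
    return {a: sum(rep[a] for rep in replicates) / len(replicates) for a in aas}


def site_entropy_bits(prefs):
    """h_r = -sum_a pi_{r,a} log2 pi_{r,a}; terms with pi = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in prefs.values() if p > 0)


# Hypothetical site with wildtype codon AAA (Lys) and two mutant codons.
codon_to_aa = {"AAA": "K", "AGA": "R", "AAC": "N"}
mu = {"AAA": 0.98, "AGA": 0.01, "AAC": 0.01}
eps = {"AAA": 0.0, "AGA": 0.001, "AAC": 0.001}
rho = {"AAA": 0.0, "AGA": 0.0005, "AAC": 0.0005}
reps = [{"K": 0.6, "R": 0.3, "N": 0.1}, {"K": 0.4, "R": 0.5, "N": 0.1}]
pi = average_preferences(reps)  # K: 0.5, R: 0.4, N: 0.1
print(expected_mutvirus_frequency("AGA", eps, rho, mu, pi, codon_to_aa))
print(site_entropy_bits(pi))
```

A site whose preferences are spread evenly over many amino acids has high entropy, consistent with the statement that higher site entropies indicate higher inherent mutational tolerance.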
These final inferred preferences are provided in <xref ref-type="supplementary-material" rid="SD3-data">Supplementary file 3</xref> and displayed graphically in <xref ref-type="fig" rid="fig5">Figure 5</xref>.</p><p>The site entropies in <xref ref-type="fig" rid="fig10">Figure 10</xref> and <xref ref-type="table" rid="tbl4">Table 4</xref> were calculated from the amino-acid preferences as <inline-formula><mml:math id="inf9"><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>−</mml:mo><mml:munder><mml:mstyle displaystyle="true"><mml:mo>∑</mml:mo></mml:mstyle><mml:mi>a</mml:mi></mml:munder><mml:mtext> </mml:mtext><mml:msub><mml:mi>π</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mo>×</mml:mo><mml:msub><mml:mrow><mml:mtext>log</mml:mtext></mml:mrow><mml:mn>2</mml:mn></mml:msub><mml:msub><mml:mi>π</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. Because the logarithm is base 2, these site entropies are in bits. Higher site entropies indicate a higher inherent mutational tolerance.</p></sec><sec id="s4-9"><title>Alignment of naturally occurring HAs and phylogenetic tree</title><p>The inferred amino-acid preferences were compared to amino-acid frequencies in an alignment of naturally occurring H1N1 HAs from swine and human lineages descended from a close relative of the 1918 virus. Briefly, all full-length H1 HAs from these hosts were downloaded from the Influenza Virus Resource (<xref ref-type="bibr" rid="bib3">Bao et al., 2008</xref>). Up to three sequences per host and year were randomly subsampled and used to build a phylogenetic tree. Clear outliers from the molecular clock (typically lab artifacts or mis-annotated sequences) were iteratively excluded, and the tree was rebuilt after each exclusion. The final sequence alignment is provided in <xref ref-type="supplementary-material" rid="SD4-data">Supplementary file 4</xref>. 
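The random subsampling of up to three sequences per host and year can be sketched in Python as follows. This is an illustrative version under stated assumptions, not the code behind the published alignment. Each sequence is represented by a hypothetical (name, host, year) tuple, groups are processed in sorted order, and a fixed seed makes the draw reproducible. The iterative removal of molecular-clock outliers is not shown.

```python
import random
from collections import defaultdict


def subsample_by_host_and_year(records, max_per_group=3, seed=1):
    """Randomly keep at most max_per_group records for each (host, year) pair.

    records: iterable of (name, host, year) tuples.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        _, host, year = record
        groups[(host, year)].append(record)
    kept = []
    for key in sorted(groups):
        members = groups[key]
        # random.sample draws without replacement, so no sequence is kept twice.
        kept.extend(rng.sample(members, min(max_per_group, len(members))))
    return kept


# Hypothetical records: five human HAs and two swine HAs, all from 1977.
records = [(f"human_{i}", "human", 1977) for i in range(5)]
records += [(f"swine_{i}", "swine", 1977) for i in range(2)]
print(len(subsample_by_host_and_year(records)))  # 5
```

Capping each host-year group in this way keeps heavily sequenced seasons from dominating the tree, while groups with three or fewer sequences are kept intact.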
This alignment was used to build the phylogenetic trees in <xref ref-type="fig" rid="fig8">Figure 8</xref> and <xref ref-type="fig" rid="fig8s1">Figure 8—figure supplement 1</xref> with <italic>codonPhyML</italic> (<xref ref-type="bibr" rid="bib21">Gil et al., 2013</xref>), using either the codon-substitution model of <xref ref-type="bibr" rid="bib22">Goldman and Yang (1994)</xref> with empirical codon frequencies determined by the CF3x4 method (<xref ref-type="bibr" rid="bib40">Pond et al., 2010</xref>) or the model of <xref ref-type="bibr" rid="bib33">Kosiol et al. (2007)</xref> with empirical codon frequencies determined by the <italic>F</italic> method. In both cases, the nonsynonymous-synonymous ratio (<italic>ω</italic>) was drawn from four gamma-distributed categories (<xref ref-type="bibr" rid="bib59">Yang et al., 2000</xref>). A description of this process is available at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html">http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html</ext-link>.</p></sec><sec id="s4-10"><title>Comparison of evolutionary models</title><p>We compared how accurately the naturally occurring HA phylogeny was described by an evolutionary model based on the experimentally measured amino-acid preferences versus several standard codon-substitution models. These comparisons were made using <italic>HYPHY</italic> (<xref ref-type="bibr" rid="bib41">Pond et al., 2005</xref>) and <italic>phyloExpCM</italic> (<xref ref-type="bibr" rid="bib9">Bloom, 2014</xref>). A description of this analysis is available at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html">http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html</ext-link>.</p><p>Briefly, the phylogenetic tree topology was fixed to that shown in <xref ref-type="fig" rid="fig8">Figure 8</xref> or <xref ref-type="fig" rid="fig8s1">Figure 8—figure supplement 1</xref>. 
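The experimentally determined models convert preferences into fixation probabilities using either the Metropolis-like relationship of Bloom (2014) or the relationship of Halpern and Bruno (1998). The Python sketch below writes both as functions of the preferences for the initial and final amino acids. The exact forms are assumptions here and should be checked against the cited papers. The Metropolis-like rule is taken to be F = 1 if the new amino acid is at least as preferred as the old one, and F = π<sub>new</sub>/π<sub>old</sub> otherwise. The Halpern and Bruno form is taken to be F = ln(π<sub>new</sub>/π<sub>old</sub>)/(1 − π<sub>old</sub>/π<sub>new</sub>), with F set to its limiting value of 1 when the two preferences are equal. The function names are hypothetical.

```python
import math


def fixation_metropolis(pi_old, pi_new):
    """Metropolis-like rule (assumed form): 1 if pi_new >= pi_old, else pi_new / pi_old."""
    if pi_new >= pi_old:
        return 1.0
    return pi_new / pi_old


def fixation_halpern_bruno(pi_old, pi_new):
    """Halpern and Bruno form (assumed): ln(pi_new / pi_old) / (1 - pi_old / pi_new)."""
    if not (pi_old > 0 and pi_new > 0):
        raise ValueError("preferences must be positive")
    if math.isclose(pi_old, pi_new):
        return 1.0  # limiting value as pi_new approaches pi_old
    return math.log(pi_new / pi_old) / (1.0 - pi_old / pi_new)


# Hypothetical preferences: a change from an amino acid with preference 0.2
# to one with preference 0.1, and the reverse change.
print(fixation_metropolis(0.2, 0.1), fixation_metropolis(0.1, 0.2))  # 0.5 1.0
print(fixation_halpern_bruno(0.2, 0.1), fixation_halpern_bruno(0.1, 0.2))
```

Under both assumed rules, synonymous changes leave the preference unchanged and so have F = 1. The Halpern and Bruno form differs from the Metropolis-like rule in favoring changes to more-preferred amino acids with F greater than 1 (about 1.39 for a doubling of the preference).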
The branch lengths and any free parameters of the evolutionary model were then optimized by maximum likelihood. The experimentally determined evolutionary models were constructed from the inferred amino-acid preferences reported here and the experimentally measured mutation rates reported in <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref>. The ‘fixation probabilities’ were computed using either the Metropolis-like relationship described in <xref ref-type="bibr" rid="bib9">Bloom (2014)</xref> or the relationship proposed by <xref ref-type="bibr" rid="bib26">Halpern and Bruno (1998)</xref>. The results of these comparisons are shown in <xref ref-type="table" rid="tbl2 tbl3">Tables 2 and 3</xref>. All of these comparisons show that the experimentally determined evolutionary models describe the HA phylogeny far better than the various standard models.</p></sec><sec id="s4-11"><title>Structural analyses</title><p>The WSN HA studied here has a high degree of sequence identity to the HA crystallized in PDB 1RVX (<xref ref-type="bibr" rid="bib20">Gamblin et al., 2004</xref>), and this is the structure shown in <xref ref-type="fig" rid="fig10">Figure 10</xref>. The relative solvent accessibility (RSA) values in <xref ref-type="fig" rid="fig5">Figure 5</xref> and <xref ref-type="fig" rid="fig10">Figure 10</xref> were calculated by first determining the absolute solvent accessibilities of the residues in the full trimeric HA in PDB 1RVX with the DSSP (<xref ref-type="bibr" rid="bib30">Joosten et al., 2011</xref>) webserver at <ext-link ext-link-type="uri" xlink:href="http://www.cmbi.ru.nl/hsspsoap/">http://www.cmbi.ru.nl/hsspsoap/</ext-link>, and then normalizing by the maximum solvent accessibilities given by <xref ref-type="bibr" rid="bib52">Tien et al.
(2013)</xref>.</p></sec><sec id="s4-12"><title>Classification of antigenic sites and conserved receptor-binding residues</title><p>Several sub-classifications of HA residues were performed.</p><p>Conserved receptor-binding sites were any residues listed in the first table of <xref ref-type="bibr" rid="bib38">Martin et al. (1998)</xref> that are also conserved in at least 90% of H1 HAs. These residues are listed in <xref ref-type="table" rid="tbl1">Table 1</xref>.</p><p>All receptor-binding residues were any residues with any atom within 5 Å of the substrate in PDB 1RVX (<xref ref-type="bibr" rid="bib20">Gamblin et al., 2004</xref>). No constraint is placed on whether or not these residues are conserved in natural sequences. The residues that fall into this classification are (in sequential numbering of the WSN HA): 108, 147, 148, 149, 150, 151, 158, 166, 168, 196, 198, 199, 203, 207, 238, 239, and 241.</p><p>The <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref> antigenic-site residues are classified based on antigenic mapping of the A/Puerto Rico/8/1934 (H1N1) HA. Specifically, these are any residues listed in the third table of <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref> with the following exceptions: residue 182 (H3 numbering) is not considered for the reason explained on page 421 of <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref>, residue 273 (H3 numbering) is not considered for the reason explained on page 422 of <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref>, and residue 129 (H3 numbering) is not considered because it has gained a glycosylation site in the WSN HA that is not present in the A/Puerto Rico/8/1934 (H1N1) HA and mutation of this WSN glycosylation site can strongly affect viral growth (<xref ref-type="bibr" rid="bib14">Deom et al., 1986</xref>). 
Overall, this gives the following set of antigenic residues, listed by sequential numbering of the WSN HA with the H3 number in parentheses: 171 (158), 173 (160), 175 (162), 176 (163), 178 (165), 179 (166), 180 (167), 169 (156), 172 (159), 205 (192), 206 (193), 209 (196), 211 (198), 182 (169), 186 (173), 220 (207), 253 (240), 153 (140), 156 (143), 158 (145), 237 (224), 238 (225), 87 (78), 88 (79), 90 (81), 91 (82), 92 (83), and 135 (122).</p><p>A second classification includes the <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref> residues <italic>plus</italic> any surface-exposed residues that contact them, using an <italic>α</italic>-carbon to <italic>α</italic>-carbon distance of ≤6.0 Å as the threshold for contact and classifying residues as solvent-exposed if they have an RSA of at least 20%. The rationale for this second classification is that the mapping by <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref> may have been biased towards inherently variable sites, so other surface-exposed residues that contact these sites could also be antigenic. This classification adds the following 28 residues (listed by sequential numbering of the WSN HA) to the 29 <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref> residues: 85, 86, 89, 126, 132, 136, 137, 138, 142, 148, 150, 154, 155, 157, 170, 184, 185, 187, 202, 203, 207, 210, 212, 221, 235, 236, 239, and 252.</p></sec></sec></body><back><ack id="ack"><title>Acknowledgements</title><p>We thank Paul Edlefsen for assistance with the multiple linear regression. 
We thank Hugh Haddox for helpful comments on the manuscript.</p></ack><sec sec-type="additional-information"><title>Additional information</title><fn-group content-type="competing-interest"><title>Competing interests</title><fn fn-type="conflict" id="conf1"><p>The authors declare that no competing interests exist.</p></fn></fn-group><fn-group content-type="author-contribution"><title>Author contributions</title><fn fn-type="con" id="con1"><p>BT, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article</p></fn><fn fn-type="con" id="con2"><p>JDB, Conception and design, Analysis and interpretation of data, Drafting or revising the article</p></fn></fn-group></sec><sec sec-type="supplementary-material"><title>Additional files</title><supplementary-material id="SD1-data"><object-id pub-id-type="doi">10.7554/eLife.03300.025</object-id><label>Supplementary file 1.</label><caption><p>The coding sequence of the WSN HA gene used in this study is provided in FASTA format.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.025">http://dx.doi.org/10.7554/eLife.03300.025</ext-link></p></caption><media mime-subtype="fasta" mimetype="chemical" xlink:href="elife03300s001.fasta"/></supplementary-material><supplementary-material id="SD2-data"><object-id pub-id-type="doi">10.7554/eLife.03300.026</object-id><label>Supplementary file 2.</label><caption><p>An Excel file listing the oligonucleotides used for the codon mutagenesis.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.026">http://dx.doi.org/10.7554/eLife.03300.026</ext-link></p></caption><media mime-subtype="xls" mimetype="application" xlink:href="elife03300s002.xls"/></supplementary-material><supplementary-material id="SD3-data"><object-id pub-id-type="doi">10.7554/eLife.03300.027</object-id><label>Supplementary file 3.</label><caption><p>The site-specific amino-acid preferences as computed from the 
averages of the three biological replicates are provided in this supplementary file in text format. This is the file <italic>combined_equilibriumpreferences.txt</italic> described at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html">http://jbloom.github.io/mapmuts/example_WSN_HA_2014Analysis.html</ext-link>.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.027">http://dx.doi.org/10.7554/eLife.03300.027</ext-link></p></caption><media mime-subtype="txt" mimetype="text" xlink:href="elife03300s003.txt"/></supplementary-material><supplementary-material id="SD4-data"><object-id pub-id-type="doi">10.7554/eLife.03300.028</object-id><label>Supplementary file 4.</label><caption><p>The alignment of human and swine HA sequences used to build the phylogenetic trees is provided in this supplementary file in FASTA format. This is the file <italic>H1_HumanSwine_alignment.fasta</italic> described at <ext-link ext-link-type="uri" xlink:href="http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html">http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_H1_HA.html</ext-link>.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03300.028">http://dx.doi.org/10.7554/eLife.03300.028</ext-link></p></caption><media mime-subtype="fasta" mimetype="chemical" xlink:href="elife03300s004.fasta"/></supplementary-material><sec sec-type="datasets"><title>Major dataset</title><p>The following dataset was generated:</p><p><related-object content-type="generated-dataset" document-id="Dataset ID and/or url" document-id-type="dataset" document-type="data" id="dataro1"><name><surname>Thyagarajan</surname><given-names>Bargavi</given-names></name>, <name><surname>Bloom</surname><given-names>Jesse</given-names></name>, <year>2014</year><x>, </x><source>deep mutational scanning of WSN influenza hemagglutinin</source><x>, </x><ext-link ext-link-type="uri" 
xlink:href="http://www.ncbi.nlm.nih.gov/biosample/SAMN02719578">http://www.ncbi.nlm.nih.gov/biosample/SAMN02719578</ext-link><x>, </x><comment>Publicly available at NCBI BioSample.</comment></related-object></p></sec></sec><ref-list><title>References</title><ref id="bib1"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Adey</surname><given-names>A</given-names></name><name><surname>Morrison</surname><given-names>HG</given-names></name><name><surname>Asan</surname></name><name><surname>Xun</surname><given-names>X</given-names></name><name><surname>Kitzman</surname><given-names>JO</given-names></name><name><surname>Turner</surname><given-names>EH</given-names></name><name><surname>Stackhouse</surname><given-names>B</given-names></name><name><surname>MacKenzie</surname><given-names>AP</given-names></name><name><surname>Caruccio</surname><given-names>NC</given-names></name><name><surname>Zhang</surname><given-names>X</given-names></name><name><surname>Shendure</surname><given-names>J</given-names></name></person-group><year>2010</year><article-title>Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition</article-title><source>Genome Biology</source><volume>11</volume><fpage>R119</fpage><pub-id pub-id-type="doi">10.1186/gb-2010-11-12-r119</pub-id></element-citation></ref><ref id="bib2"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Araya</surname><given-names>CL</given-names></name><name><surname>Fowler</surname><given-names>DM</given-names></name></person-group><year>2011</year><article-title>Deep mutational scanning: assessing protein function on a massive scale</article-title><source>Trends in Biotechnology</source><volume>29</volume><fpage>435</fpage><lpage>442</lpage><pub-id 
pub-id-type="doi">10.1016/j.tibtech.2011.04.003</pub-id></element-citation></ref><ref id="bib3"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bao</surname><given-names>Y</given-names></name><name><surname>Bolotov</surname><given-names>P</given-names></name><name><surname>Dernovoy</surname><given-names>D</given-names></name><name><surname>Kiryutin</surname><given-names>B</given-names></name><name><surname>Zaslavsky</surname><given-names>L</given-names></name><name><surname>Tatusova</surname><given-names>T</given-names></name><name><surname>Ostell</surname><given-names>J</given-names></name><name><surname>Lipman</surname><given-names>D</given-names></name></person-group><year>2008</year><article-title>The influenza virus resource at the National Center for Biotechnology Information</article-title><source>Journal of Virology</source><volume>82</volume><fpage>596</fpage><lpage>601</lpage><pub-id pub-id-type="doi">10.1128/JVI.02005-07</pub-id></element-citation></ref><ref id="bib4"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bedford</surname><given-names>T</given-names></name><name><surname>Suchard</surname><given-names>MA</given-names></name><name><surname>Lemey</surname><given-names>P</given-names></name><name><surname>Dudas</surname><given-names>G</given-names></name><name><surname>Gregory</surname><given-names>V</given-names></name><name><surname>Hay</surname><given-names>AJ</given-names></name><name><surname>McCauley</surname><given-names>JW</given-names></name><name><surname>Russell</surname><given-names>CA</given-names></name><name><surname>Smith</surname><given-names>DJ</given-names></name><name><surname>Rambaut</surname><given-names>A</given-names></name></person-group><year>2014</year><article-title>Integrating influenza antigenic dynamics with molecular evolution</article-title><source>eLife</source><volume>3</volume><fpage>e01914</fpage><pub-id 
pub-id-type="doi">10.7554/eLife.01914</pub-id></element-citation></ref><ref id="bib5"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Berkhoff</surname><given-names>E</given-names></name><name><surname>De Wit</surname><given-names>E</given-names></name><name><surname>Geelhoed-Mieras</surname><given-names>M</given-names></name><name><surname>Boon</surname><given-names>A</given-names></name><name><surname>Symons</surname><given-names>J</given-names></name><name><surname>Fouchier</surname><given-names>R</given-names></name><name><surname>Osterhaus</surname><given-names>A</given-names></name><name><surname>Rimmelzwaan</surname><given-names>G</given-names></name></person-group><year>2005</year><article-title>Functional constraints of influenza A virus epitopes limit escape from cytotoxic T lymphocytes</article-title><source>Journal of Virology</source><volume>79</volume><fpage>11239</fpage><lpage>11246</lpage><pub-id pub-id-type="doi">10.1128/JVI.79.17.11239-11246.2005</pub-id></element-citation></ref><ref id="bib6"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Berkhoff</surname><given-names>E</given-names></name><name><surname>de Wit</surname><given-names>E</given-names></name><name><surname>Geelhoed-Mieras</surname><given-names>M</given-names></name><name><surname>Boon</surname><given-names>A</given-names></name><name><surname>Symons</surname><given-names>J</given-names></name><name><surname>Fouchier</surname><given-names>R</given-names></name><name><surname>Osterhaus</surname><given-names>A</given-names></name><name><surname>Rimmelzwaan</surname><given-names>G</given-names></name></person-group><year>2006</year><article-title>Fitness costs limit escape from cytotoxic T lymphocytes by influenza A viruses</article-title><source>Vaccine</source><volume>24</volume><fpage>6594</fpage><lpage>6596</lpage><pub-id 
pub-id-type="doi">10.1016/j.vaccine.2006.05.051</pub-id></element-citation></ref><ref id="bib7"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Berkhoff</surname><given-names>E</given-names></name><name><surname>Geelhoed-Mieras</surname><given-names>M</given-names></name><name><surname>Fouchier</surname><given-names>R</given-names></name><name><surname>Osterhaus</surname><given-names>A</given-names></name><name><surname>Rimmelzwaan</surname><given-names>G</given-names></name></person-group><year>2007</year><article-title>Assessment of the extent of variation in influenza A virus cytotoxic T-lymphocyte epitopes by using virus-specific CD8+ T-cell clones</article-title><source>The Journal of General Virology</source><volume>88</volume><fpage>530</fpage><lpage>535</lpage><pub-id pub-id-type="doi">10.1099/vir.0.82120-0</pub-id></element-citation></ref><ref id="bib8"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bhatt</surname><given-names>S</given-names></name><name><surname>Holmes</surname><given-names>EC</given-names></name><name><surname>Pybus</surname><given-names>OG</given-names></name></person-group><year>2011</year><article-title>The genomic rate of molecular adaptation of the human influenza A virus</article-title><source>Molecular Biology and Evolution</source><volume>28</volume><fpage>2443</fpage><pub-id pub-id-type="doi">10.1093/molbev/msr044</pub-id></element-citation></ref><ref id="bib9"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bloom</surname><given-names>JD</given-names></name></person-group><year>2014</year><article-title>An experimentally determined evolutionary model dramatically improves phylogenetic fit</article-title><source>Molecular Biology and Evolution</source><volume>31</volume><fpage>1956</fpage><lpage>1978</lpage><pub-id 
pub-id-type="doi">10.1093/molbev/msu173</pub-id></element-citation></ref><ref id="bib10"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Burton</surname><given-names>DR</given-names></name><name><surname>Poignard</surname><given-names>P</given-names></name><name><surname>Stanfield</surname><given-names>RL</given-names></name><name><surname>Wilson</surname><given-names>IA</given-names></name></person-group><year>2012</year><article-title>Broadly neutralizing antibodies present new prospects to counter highly antigenically diverse viruses</article-title><source>Science</source><volume>337</volume><fpage>183</fpage><lpage>186</lpage><pub-id pub-id-type="doi">10.1126/science.1225416</pub-id></element-citation></ref><ref id="bib11"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bustamante</surname><given-names>CD</given-names></name><name><surname>Townsend</surname><given-names>JP</given-names></name><name><surname>Hartl</surname><given-names>DL</given-names></name></person-group><year>2000</year><article-title>Solvent accessibility and purifying selection within proteins of <italic>Escherichia coli</italic> and <italic>Salmonella enterica</italic></article-title><source>Molecular Biology and Evolution</source><volume>17</volume><fpage>301</fpage><lpage>308</lpage><pub-id pub-id-type="doi">10.1093/oxfordjournals.molbev.a026310</pub-id></element-citation></ref><ref id="bib12"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Caton</surname><given-names>AJ</given-names></name><name><surname>Brownlee</surname><given-names>GG</given-names></name><name><surname>Yewdell</surname><given-names>JW</given-names></name><name><surname>Gerhard</surname><given-names>W</given-names></name></person-group><year>1982</year><article-title>The antigenic structure of the influenza virus A/PR/8/34 hemagglutinin (H1 
subtype)</article-title><source>Cell</source><volume>31</volume><fpage>417</fpage><lpage>427</lpage><pub-id pub-id-type="doi">10.1016/0092-8674(82)90135-0</pub-id></element-citation></ref><ref id="bib13"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Das</surname><given-names>SR</given-names></name><name><surname>Hensley</surname><given-names>SE</given-names></name><name><surname>Ince</surname><given-names>WL</given-names></name><name><surname>Brooke</surname><given-names>CB</given-names></name><name><surname>Subba</surname><given-names>A</given-names></name><name><surname>Delboy</surname><given-names>MG</given-names></name><name><surname>Russ</surname><given-names>G</given-names></name><name><surname>Gibbs</surname><given-names>JS</given-names></name><name><surname>Bennink</surname><given-names>JR</given-names></name><name><surname>Yewdell</surname><given-names>JW</given-names></name></person-group><year>2013</year><article-title>Defining influenza A virus hemagglutinin antigenic drift by sequential monoclonal antibody selection</article-title><source>Cell Host &amp; Microbe</source><volume>13</volume><fpage>314</fpage><lpage>323</lpage><pub-id pub-id-type="doi">10.1016/j.chom.2013.02.008</pub-id></element-citation></ref><ref id="bib14"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Deom</surname><given-names>CM</given-names></name><name><surname>Caton</surname><given-names>AJ</given-names></name><name><surname>Schulze</surname><given-names>IT</given-names></name></person-group><year>1986</year><article-title>Host cell-mediated selection of a mutant influenza A virus that has lost a complex oligosaccharide from the tip of the hemagglutinin</article-title><source>Proceedings of the National Academy of Sciences of the United States of America</source><volume>83</volume><fpage>3771</fpage><lpage>3775</lpage><pub-id 
pub-id-type="doi">10.1073/pnas.83.11.3771</pub-id></element-citation></ref><ref id="bib15"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Duffy</surname><given-names>S</given-names></name><name><surname>Shackelton</surname><given-names>LA</given-names></name><name><surname>Holmes</surname><given-names>EC</given-names></name></person-group><year>2008</year><article-title>Rates of evolutionary change in viruses: patterns and determinants</article-title><source>Nature Reviews Genetics</source><volume>9</volume><fpage>267</fpage><lpage>276</lpage><pub-id pub-id-type="doi">10.1038/nrg2323</pub-id></element-citation></ref><ref id="bib16"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Felsenstein</surname><given-names>J</given-names></name></person-group><year>1973</year><article-title>Maximum likelihood and minimum-step methods for estimating evolutionary trees from data on discrete characters</article-title><source>Systematic Zoology</source><volume>22</volume><fpage>240</fpage><lpage>249</lpage><pub-id pub-id-type="doi">10.2307/2412304</pub-id></element-citation></ref><ref id="bib17"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Felsenstein</surname><given-names>J</given-names></name></person-group><year>1981</year><article-title>Evolutionary trees from DNA sequences: a maximum likelihood approach</article-title><source>Journal of Molecular Evolution</source><volume>17</volume><fpage>368</fpage><lpage>376</lpage><pub-id pub-id-type="doi">10.1007/BF01734359</pub-id></element-citation></ref><ref id="bib18"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Firnberg</surname><given-names>E</given-names></name><name><surname>Ostermeier</surname><given-names>M</given-names></name></person-group><year>2012</year><article-title>PFunkel: efficient, expansive, 
user-defined mutagenesis</article-title><source>PLOS ONE</source><volume>7</volume><fpage>e52031</fpage><pub-id pub-id-type="doi">10.1371/journal.pone.0052031</pub-id></element-citation></ref><ref id="bib19"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fowler</surname><given-names>DM</given-names></name><name><surname>Araya</surname><given-names>CL</given-names></name><name><surname>Fleishman</surname><given-names>SJ</given-names></name><name><surname>Kellogg</surname><given-names>EH</given-names></name><name><surname>Stephany</surname><given-names>JJ</given-names></name><name><surname>Baker</surname><given-names>D</given-names></name><name><surname>Fields</surname><given-names>S</given-names></name></person-group><year>2010</year><article-title>High-resolution mapping of protein sequence-function relationships</article-title><source>Nature Methods</source><volume>7</volume><fpage>741</fpage><lpage>746</lpage><pub-id pub-id-type="doi">10.1038/nmeth.1492</pub-id></element-citation></ref><ref id="bib20"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gamblin</surname><given-names>SJ</given-names></name><name><surname>Haire</surname><given-names>LF</given-names></name><name><surname>Russell</surname><given-names>RJ</given-names></name>
<name><surname>Stevens</surname><given-names>DJ</given-names></name><name><surname>Xiao</surname><given-names>B</given-names></name><name><surname>Ha</surname><given-names>Y</given-names></name><name><surname>Vasisht</surname><given-names>N</given-names></name><name><surname>Steinhauer</surname><given-names>DA</given-names></name><name><surname>Daniels</surname><given-names>RS</given-names></name><name><surname>Elliot</surname><given-names>A</given-names></name><name><surname>Wiley</surname><given-names>DC</given-names></name><name><surname>Skehel</surname><given-names>JJ</given-names></name></person-group><year>2004</year><article-title>The structure and receptor binding properties of the 1918 influenza hemagglutinin</article-title><source>Science</source><volume>303</volume><fpage>1838</fpage><lpage>1842</lpage><pub-id pub-id-type="doi">10.1126/science.1093155</pub-id></element-citation></ref><ref id="bib21"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gil</surname><given-names>M</given-names></name><name><surname>Zanetti</surname><given-names>MS</given-names></name><name><surname>Zoller</surname><given-names>S</given-names></name><name><surname>Anisimova</surname><given-names>M</given-names></name></person-group><year>2013</year><article-title>CodonPhyML: fast maximum likelihood phylogeny estimation under codon substitution models</article-title><source>Molecular Biology and Evolution</source><volume>30</volume><fpage>1270</fpage><lpage>1280</lpage><pub-id pub-id-type="doi">10.1093/molbev/mst034</pub-id></element-citation></ref><ref id="bib22"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Goldman</surname><given-names>N</given-names></name><name><surname>Yang</surname><given-names>Z</given-names></name></person-group><year>1994</year><article-title>A codon-based model 
of nucleotide substitution probabilities for protein-coding DNA sequences</article-title><source>Molecular Biology and Evolution</source><volume>11</volume><fpage>725</fpage><lpage>736</lpage></element-citation></ref><ref id="bib23"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gong</surname><given-names>LI</given-names></name><name><surname>Bloom</surname><given-names>JD</given-names></name></person-group><year>2014</year><article-title>Epistatically interacting substitutions are enriched during adaptive protein evolution</article-title><source>PLOS Genetics</source><volume>10</volume><fpage>e1004328</fpage><pub-id pub-id-type="doi">10.1371/journal.pgen.1004328</pub-id></element-citation></ref><ref id="bib24"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gong</surname><given-names>LI</given-names></name><name><surname>Suchard</surname><given-names>MA</given-names></name><name><surname>Bloom</surname><given-names>JD</given-names></name></person-group><year>2013</year><article-title>Stability-mediated epistasis constrains the evolution of an influenza protein</article-title><source>eLife</source><volume>2</volume><fpage>e00631</fpage><pub-id pub-id-type="doi">10.7554/eLife.00631</pub-id></element-citation></ref><ref id="bib25"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Goto</surname><given-names>H</given-names></name><name><surname>Kawaoka</surname><given-names>Y</given-names></name></person-group><year>1998</year><article-title>A novel mechanism for the acquisition of virulence by a human influenza A virus</article-title><source>Proceedings of the National Academy of Sciences of the United States of America</source><volume>95</volume><fpage>10224</fpage><lpage>10228</lpage><pub-id pub-id-type="doi">10.1073/pnas.95.17.10224</pub-id></element-citation></ref><ref id="bib26"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Halpern</surname><given-names>AL</given-names></name><name><surname>Bruno</surname><given-names>WJ</given-names></name></person-group><year>1998</year><article-title>Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies</article-title><source>Molecular Biology and Evolution</source><volume>15</volume><fpage>910</fpage><lpage>917</lpage><pub-id pub-id-type="doi">10.1093/oxfordjournals.molbev.a025995</pub-id></element-citation></ref><ref id="bib27"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Heaton</surname><given-names>NS</given-names></name><name><surname>Sachs</surname><given-names>D</given-names></name><name><surname>Chen</surname><given-names>CJ</given-names></name><name><surname>Hai</surname><given-names>R</given-names></name><name><surname>Palese</surname><given-names>P</given-names></name></person-group><year>2013</year><article-title>Genome-wide mutagenesis of influenza virus reveals unique plasticity of the hemagglutinin and NS1 proteins</article-title><source>Proceedings of the National Academy of Sciences of the United States of America</source><volume>110</volume><fpage>20248</fpage><lpage>20253</lpage><pub-id pub-id-type="doi">10.1073/pnas.1320524110</pub-id></element-citation></ref><ref id="bib28"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hoffmann</surname><given-names>E</given-names></name><name><surname>Neumann</surname><given-names>G</given-names></name><name><surname>Kawaoka</surname><given-names>Y</given-names></name><name><surname>Hobom</surname><given-names>G</given-names></name><name><surname>Webster</surname><given-names>RG</given-names></name></person-group><year>2000</year><article-title>A DNA transfection system for generation of influenza A virus from eight plasmids</article-title><source>Proceedings of 
the National Academy of Sciences of the United States of America</source><volume>97</volume><fpage>6108</fpage><lpage>6113</lpage><pub-id pub-id-type="doi">10.1073/pnas.100133697</pub-id></element-citation></ref><ref id="bib29"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jain</surname><given-names>PC</given-names></name><name><surname>Varadarajan</surname><given-names>R</given-names></name></person-group><year>2014</year><article-title>A rapid, efficient, and economical inverse polymerase chain reaction-based method for generating a site saturation mutant library</article-title><source>Analytical Biochemistry</source><volume>449</volume><fpage>90</fpage><lpage>98</lpage><pub-id pub-id-type="doi">10.1016/j.ab.2013.12.002</pub-id></element-citation></ref><ref id="bib30"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Joosten</surname><given-names>RP</given-names></name><name><surname>Te Beek</surname><given-names>TA</given-names></name><name><surname>Krieger</surname><given-names>E</given-names></name><name><surname>Hekkelman</surname><given-names>ML</given-names></name><name><surname>Hooft</surname><given-names>RW</given-names></name><name><surname>Schneider</surname><given-names>R</given-names></name><name><surname>Sander</surname><given-names>C</given-names></name><name><surname>Vriend</surname><given-names>G</given-names></name></person-group><year>2011</year><article-title>A series of PDB related databases for everyday needs</article-title><source>Nucleic Acids Research</source><volume>39</volume><fpage>D411</fpage><lpage>D419</lpage><pub-id pub-id-type="doi">10.1093/nar/gkq1105</pub-id></element-citation></ref><ref id="bib31"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Koel</surname><given-names>BF</given-names></name><name><surname>Burke</surname><given-names>DF</given-names></name><name><surname>Bestebroer</surname><given-names>TM</given-names></name><name><surname>van der Vliet</surname><given-names>S</given-names></name><name><surname>Zondag</surname><given-names>GC</given-names></name><name><surname>Vervaet</surname><given-names>G</given-names></name><name><surname>Skepner</surname><given-names>E</given-names></name><name><surname>Lewis</surname><given-names>NS</given-names></name><name><surname>Spronken</surname><given-names>MI</given-names></name><name><surname>Russell</surname><given-names>CA</given-names></name><name><surname>Eropkin</surname><given-names>MY</given-names></name><name><surname>Hurt</surname><given-names>AC</given-names></name><name><surname>Barr</surname><given-names>IG</given-names></name><name><surname>de Jong</surname><given-names>JC</given-names></name><name><surname>Rimmelzwaan</surname><given-names>GF</given-names></name><name><surname>Osterhaus</surname><given-names>AD</given-names></name><name><surname>Fouchier</surname><given-names>RA</given-names></name><name><surname>Smith</surname><given-names>DJ</given-names></name></person-group><year>2013</year><article-title>Substitutions near the receptor binding site determine major antigenic change during influenza virus evolution</article-title><source>Science</source><volume>342</volume><fpage>976</fpage><lpage>979</lpage><pub-id pub-id-type="doi">10.1126/science.1244730</pub-id></element-citation></ref><ref id="bib32"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Koelle</surname><given-names>K</given-names></name><name><surname>Cobey</surname><given-names>S</given-names></name><name><surname>Grenfell</surname><given-names>B</given-names></name><name><surname>Pascual</surname><given-names>M</given-names></name></person-group><year>2006</year><article-title>Epochal evolution shapes the phylodynamics of interpandemic influenza A (H3N2) in humans</article-title><source>Science</source><volume>314</volume><fpage>1898</fpage><lpage>1903</lpage><pub-id pub-id-type="doi">10.1126/science.1132745</pub-id></element-citation></ref><ref id="bib33"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kosiol</surname><given-names>C</given-names></name><name><surname>Holmes</surname><given-names>I</given-names></name><name><surname>Goldman</surname><given-names>N</given-names></name></person-group><year>2007</year><article-title>An empirical codon model for protein sequence evolution</article-title><source>Molecular Biology and Evolution</source><volume>24</volume><fpage>1464</fpage><lpage>1479</lpage><pub-id pub-id-type="doi">10.1093/molbev/msm064</pub-id></element-citation></ref><ref id="bib34"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lipsitch</surname><given-names>M</given-names></name><name><surname>O’Hagan</surname><given-names>JJ</given-names></name></person-group><year>2007</year><article-title>Patterns of antigenic diversity and the mechanisms that maintain them</article-title><source>Journal of the Royal Society Interface</source><volume>4</volume><fpage>787</fpage><lpage>802</lpage><pub-id pub-id-type="doi">10.1098/rsif.2007.0229</pub-id></element-citation></ref><ref id="bib35"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Łuksza</surname><given-names>M</given-names></name><name><surname>Lässig</surname><given-names>M</given-names></name></person-group><year>2014</year><article-title>A predictive fitness model for influenza</article-title><source>Nature</source><volume>507</volume><fpage>57</fpage><lpage>61</lpage><pub-id pub-id-type="doi">10.1038/nature13087</pub-id></element-citation></ref><ref id="bib36"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Marsh</surname><given-names>GA</given-names></name><name><surname>Hatami</surname><given-names>R</given-names></name><name><surname>Palese</surname><given-names>P</given-names></name></person-group><year>2007</year><article-title>Specific residues of the influenza A virus hemagglutinin viral RNA are important for efficient packaging into budding virions</article-title><source>Journal of Virology</source><volume>81</volume><fpage>9727</fpage><lpage>9736</lpage><pub-id pub-id-type="doi">10.1128/JVI.01144-07</pub-id></element-citation></ref><ref id="bib37"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Marsh</surname><given-names>GA</given-names></name><name><surname>Rabadán</surname><given-names>R</given-names></name><name><surname>Levine</surname><given-names>AJ</given-names></name><name><surname>Palese</surname><given-names>P</given-names></name></person-group><year>2008</year><article-title>Highly conserved regions of influenza A virus polymerase gene segments are critical for efficient viral RNA packaging</article-title><source>Journal of Virology</source><volume>82</volume><fpage>2295</fpage><lpage>2304</lpage><pub-id pub-id-type="doi">10.1128/JVI.02267-07</pub-id></element-citation></ref><ref id="bib38"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Martin</surname><given-names>J</given-names></name><name><surname>Wharton</surname><given-names>SA</given-names></name><name><surname>Lin</surname><given-names>YP</given-names></name><name><surname>Takemoto</surname><given-names>DK</given-names></name><name><surname>Skehel</surname><given-names>JJ</given-names></name><name><surname>Wiley</surname><given-names>DC</given-names></name><name><surname>Steinhauer</surname><given-names>DA</given-names></name></person-group><year>1998</year><article-title>Studies of the binding properties of influenza hemagglutinin receptor-site mutants</article-title><source>Virology</source><volume>241</volume><fpage>101</fpage><lpage>111</lpage><pub-id pub-id-type="doi">10.1006/viro.1997.8958</pub-id></element-citation></ref><ref id="bib39"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nakajima</surname><given-names>S</given-names></name><name><surname>Brown</surname><given-names>DJ</given-names></name><name><surname>Ueda</surname><given-names>M</given-names></name><name><surname>Nakajima</surname><given-names>K</given-names></name><name><surname>Sugiura</surname><given-names>A</given-names></name><name><surname>Pattnaik</surname><given-names>AK</given-names></name><name><surname>Nayak</surname><given-names>DP</given-names></name></person-group><year>1986</year><article-title>Identification of the defects in the hemagglutinin gene of two temperature-sensitive mutants of A/WSN/33 influenza virus</article-title><source>Virology</source><volume>154</volume><fpage>279</fpage><lpage>285</lpage><pub-id pub-id-type="doi">10.1016/0042-6822(86)90454-X</pub-id></element-citation></ref><ref id="bib40"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Pond</surname><given-names>SK</given-names></name><name><surname>Delport</surname><given-names>W</given-names></name><name><surname>Muse</surname><given-names>SV</given-names></name><name><surname>Scheffler</surname><given-names>K</given-names></name></person-group><year>2010</year><article-title>Correcting the bias of empirical frequency parameter estimators in codon models</article-title><source>PLOS ONE</source><volume>5</volume><fpage>e11230</fpage><pub-id pub-id-type="doi">10.1371/journal.pone.0011230</pub-id></element-citation></ref><ref id="bib41"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pond</surname><given-names>SL</given-names></name><name><surname>Frost</surname><given-names>SD</given-names></name><name><surname>Muse</surname><given-names>SV</given-names></name></person-group><year>2005</year><article-title>HyPhy: hypothesis testing using phylogenies</article-title><source>Bioinformatics</source><volume>21</volume><fpage>676</fpage><lpage>679</lpage><pub-id pub-id-type="doi">10.1093/bioinformatics/bti079</pub-id></element-citation></ref><ref id="bib42"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Posada</surname><given-names>D</given-names></name><name><surname>Buckley</surname><given-names>TR</given-names></name></person-group><year>2004</year><article-title>Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests</article-title><source>Systematic Biology</source><volume>53</volume><fpage>793</fpage><lpage>808</lpage><pub-id pub-id-type="doi">10.1080/10635150490522304</pub-id></element-citation></ref><ref id="bib43"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Povolotskaya</surname><given-names>IS</given-names></name><name><surname>Kondrashov</surname><given-names>FA</given-names></name></person-group><year>2010</year><article-title>Sequence space and the ongoing expansion of the protein universe</article-title><source>Nature</source><volume>465</volume><fpage>922</fpage><lpage>926</lpage><pub-id pub-id-type="doi">10.1038/nature09105</pub-id></element-citation></ref><ref id="bib44"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Qiao</surname><given-names>H</given-names></name><name><surname>Armstrong</surname><given-names>RT</given-names></name><name><surname>Melikyan</surname><given-names>GB</given-names></name><name><surname>Cohen</surname><given-names>FS</given-names></name><name><surname>White</surname><given-names>JM</given-names></name></person-group><year>1999</year><article-title>A specific point mutant at position 1 of the influenza hemagglutinin fusion peptide displays a hemifusion phenotype</article-title><source>Molecular Biology of the Cell</source><volume>10</volume><fpage>2759</fpage><lpage>2769</lpage><pub-id pub-id-type="doi">10.1091/mbc.10.8.2759</pub-id></element-citation></ref><ref id="bib45"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ramsey</surname><given-names>DC</given-names></name><name><surname>Scherrer</surname><given-names>MP</given-names></name><name><surname>Zhou</surname><given-names>T</given-names></name><name><surname>Wilke</surname><given-names>CO</given-names></name></person-group><year>2011</year><article-title>The relationship between relative solvent accessibility and evolutionary rate in protein evolution</article-title><source>Genetics</source><volume>188</volume><fpage>479</fpage><lpage>488</lpage><pub-id pub-id-type="doi">10.1534/genetics.111.128025</pub-id></element-citation></ref><ref id="bib46"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Reed</surname><given-names>LJ</given-names></name><name><surname>Muench</surname><given-names>H</given-names></name></person-group><year>1938</year><article-title>A simple method of estimating fifty per cent endpoints</article-title><source>American Journal of Epidemiology</source><volume>27</volume><fpage>493</fpage><lpage>497</lpage></element-citation></ref><ref id="bib47"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rimmelzwaan</surname><given-names>G</given-names></name><name><surname>Berkhoff</surname><given-names>E</given-names></name><name><surname>Nieuwkoop</surname><given-names>N</given-names></name><name><surname>Fouchier</surname><given-names>R</given-names></name><name><surname>Osterhaus</surname><given-names>A</given-names></name></person-group><year>2004</year><article-title>Functional compensation of a detrimental amino acid substitution in a cytotoxic-T-lymphocyte epitope of influenza A viruses by comutations</article-title><source>Journal of Virology</source><volume>78</volume><fpage>8946</fpage><lpage>8949</lpage><pub-id pub-id-type="doi">10.1128/JVI.78.16.8946-8949.2004</pub-id></element-citation></ref><ref id="bib48"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Russell</surname><given-names>R</given-names></name><name><surname>Gamblin</surname><given-names>S</given-names></name><name><surname>Haire</surname><given-names>L</given-names></name><name><surname>Stevens</surname><given-names>D</given-names></name><name><surname>Xiao</surname><given-names>B</given-names></name><name><surname>Ha</surname><given-names>Y</given-names></name><name><surname>Skehel</surname><given-names>J</given-names></name></person-group><year>2004</year><article-title>H1 and H7 influenza haemagglutinin structures extend a structural classification of haemagglutinin 
subtypes</article-title><source>Virology</source><volume>325</volume><fpage>287</fpage><lpage>296</lpage><pub-id pub-id-type="doi">10.1016/j.virol.2004.04.040</pub-id></element-citation></ref><ref id="bib49"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sheshberadaran</surname><given-names>H</given-names></name><name><surname>Chen</surname><given-names>SN</given-names></name><name><surname>Norrby</surname><given-names>E</given-names></name></person-group><year>1983</year><article-title>Monoclonal antibodies against five structural components of measles virus. I. Characterization of antigenic determinants on nine strains of measles virus</article-title><source>Virology</source><volume>128</volume><fpage>341</fpage><lpage>353</lpage><pub-id pub-id-type="doi">10.1016/0042-6822(83)90261-1</pub-id></element-citation></ref><ref id="bib50"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname><given-names>DJ</given-names></name><name><surname>Lapedes</surname><given-names>AS</given-names></name><name><surname>de Jong</surname><given-names>JC</given-names></name><name><surname>Bestebroer</surname><given-names>TM</given-names></name><name><surname>Rimmelzwaan</surname><given-names>GF</given-names></name><name><surname>Osterhaus</surname><given-names>AD</given-names></name><name><surname>Fouchier</surname><given-names>RA</given-names></name></person-group><year>2004</year><article-title>Mapping the antigenic and genetic evolution of influenza virus</article-title><source>Science</source><volume>305</volume><fpage>371</fpage><lpage>376</lpage><pub-id pub-id-type="doi">10.1126/science.1097211</pub-id></element-citation></ref><ref id="bib51"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Stech</surname><given-names>J</given-names></name><name><surname>Garn</surname><given-names>H</given-names></name><name><surname>Wegmann</surname><given-names>M</given-names></name><name><surname>Wagner</surname><given-names>R</given-names></name><name><surname>Klenk</surname><given-names>H</given-names></name></person-group><year>2005</year><article-title>A new approach to an influenza live vaccine: modification of the cleavage site of hemagglutinin</article-title><source>Nature Medicine</source><volume>11</volume><fpage>683</fpage><lpage>689</lpage><pub-id pub-id-type="doi">10.1038/nm1256</pub-id></element-citation></ref><ref id="bib52"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tien</surname><given-names>M</given-names></name><name><surname>Meyer</surname><given-names>AG</given-names></name><name><surname>Spielman</surname><given-names>SJ</given-names></name><name><surname>Wilke</surname><given-names>CO</given-names></name></person-group><year>2013</year><article-title>Maximum allowed solvent accessibilites of residues in proteins</article-title><source>PLOS ONE</source><volume>8</volume><fpage>e80635</fpage><pub-id pub-id-type="doi">10.1371/journal.pone.0080635</pub-id></element-citation></ref><ref id="bib53"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Valkenburg</surname><given-names>SA</given-names></name><name><surname>Quiñones-Parra</surname><given-names>S</given-names></name><name><surname>Gras</surname><given-names>S</given-names></name><name><surname>Komadina</surname><given-names>N</given-names></name><name><surname>McVernon</surname><given-names>J</given-names></name><name><surname>Wang</surname><given-names>Z</given-names></name><name><surname>Halim</surname><given-names>H</given-names></name><name><surname>Iannello</surname><given-names>P</given-names></name><name><surname>Cole</surname><given-names>C</given-names></name><name><surname>Laurie</surname><given-names>K</given-names></name><name><surname>Kelso</surname><given-names>A</given-names></name><name><surname>Rossjohn</surname><given-names>J</given-names></name><name><surname>Doherty</surname><given-names>PC</given-names></name><name><surname>Turner</surname><given-names>SJ</given-names></name><name><surname>Kedzierska</surname><given-names>K</given-names></name></person-group><year>2013</year><article-title>Acute emergence and reversion of influenza A virus quasispecies within CD8+ T cell antigenic peptides</article-title><source>Nature Communications</source><volume>4</volume><fpage>2663</fpage><pub-id pub-id-type="doi">10.1038/ncomms3663</pub-id></element-citation></ref><ref id="bib54"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Valkenburg</surname><given-names>SA</given-names></name><name><surname>Rutigliano</surname><given-names>JA</given-names></name><name><surname>Ellebedy</surname><given-names>AH</given-names></name><name><surname>Doherty</surname><given-names>PC</given-names></name><name><surname>Thomas</surname><given-names>PG</given-names></name><name><surname>Kedzierska</surname><given-names>K</given-names></name></person-group><year>2011</year><article-title>Immunity to seasonal and pandemic influenza A 
viruses</article-title><source>Microbes and Infection</source><volume>13</volume><fpage>489</fpage><lpage>501</lpage><pub-id pub-id-type="doi">10.1016/j.micinf.2011.01.007</pub-id></element-citation></ref><ref id="bib55"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Whittle</surname><given-names>JR</given-names></name><name><surname>Zhang</surname><given-names>R</given-names></name><name><surname>Khurana</surname><given-names>S</given-names></name><name><surname>King</surname><given-names>LR</given-names></name><name><surname>Manischewitz</surname><given-names>J</given-names></name><name><surname>Golding</surname><given-names>H</given-names></name><name><surname>Dormitzer</surname><given-names>PR</given-names></name><name><surname>Haynes</surname><given-names>BF</given-names></name><name><surname>Walter</surname><given-names>EB</given-names></name><name><surname>Moody</surname><given-names>MA</given-names></name><name><surname>Kepler</surname><given-names>TB</given-names></name><name><surname>Liao</surname><given-names>HX</given-names></name><name><surname>Harrison</surname><given-names>SC</given-names></name></person-group><year>2011</year><article-title>Broadly neutralizing human antibody that recognizes the receptor-binding pocket of influenza virus hemagglutinin</article-title><source>Proceedings of the National Academy of Sciences of the United States of America</source><volume>108</volume><fpage>14216</fpage><lpage>14221</lpage><pub-id pub-id-type="doi">10.1073/pnas.1111497108</pub-id></element-citation></ref><ref id="bib56"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wiley</surname><given-names>D</given-names></name><name><surname>Wilson</surname><given-names>I</given-names></name><name><surname>Skehel</surname><given-names>J</given-names></name></person-group><year>1981</year><article-title>Structural identification of the antibody-binding sites 
of Hong Kong influenza haemagglutinin and their involvement in antigenic variation</article-title><source>Nature</source><volume>289</volume><fpage>373</fpage><lpage>378</lpage><pub-id pub-id-type="doi">10.1038/289373a0</pub-id></element-citation></ref><ref id="bib57"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wiley</surname><given-names>DC</given-names></name><name><surname>Skehel</surname><given-names>JJ</given-names></name></person-group><year>1987</year><article-title>The structure and function of the hemagglutinin membrane glycoprotein of influenza virus</article-title><source>Annual Review of Biochemistry</source><volume>56</volume><fpage>365</fpage><lpage>394</lpage><pub-id pub-id-type="doi">10.1146/annurev.bi.56.070187.002053</pub-id></element-citation></ref><ref id="bib58"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname><given-names>NC</given-names></name><name><surname>Young</surname><given-names>AP</given-names></name><name><surname>Al-Mawsawi</surname><given-names>LQ</given-names></name><name><surname>Olson</surname><given-names>CA</given-names></name><name><surname>Feng</surname><given-names>J</given-names></name><name><surname>Qi</surname><given-names>H</given-names></name><name><surname>Chen</surname><given-names>SH</given-names></name><name><surname>Lu</surname><given-names>IH</given-names></name><name><surname>Lin</surname><given-names>CY</given-names></name><name><surname>Chin</surname><given-names>RG</given-names></name><name><surname>Luan</surname><given-names>HH</given-names></name><name><surname>Nguyen</surname><given-names>N</given-names></name><name><surname>Nelson</surname><given-names>SF</given-names></name><name><surname>Li</surname><given-names>X</given-names></name><name><surname>Wu</surname><given-names>TT</given-names></name><name><surname>Sun</surname><given-names>R</given-names></name></person-group><year>2014</year>
<article-title>High-throughput profiling of influenza A virus hemagglutinin gene at single-nucleotide resolution</article-title><source>Scientific Reports</source><volume>4</volume><fpage>4942</fpage><pub-id pub-id-type="doi">10.1038/srep04942</pub-id></element-citation></ref><ref id="bib59"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname><given-names>Z</given-names></name><name><surname>Nielsen</surname><given-names>R</given-names></name><name><surname>Goldman</surname><given-names>N</given-names></name><name><surname>Pedersen</surname><given-names>AMK</given-names></name></person-group><year>2000</year><article-title>Codon-substitution models for heterogeneous selection pressure at amino acid sites</article-title><source>Genetics</source><volume>155</volume><fpage>431</fpage><lpage>449</lpage></element-citation></ref><ref id="bib60"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yewdell</surname><given-names>J</given-names></name><name><surname>Webster</surname><given-names>R</given-names></name><name><surname>Gerhard</surname><given-names>W</given-names></name></person-group><year>1979</year><article-title>Antigenic variation in three distinct determinants of an influenza type A haemagglutinin molecule</article-title><source>Nature</source><volume>279</volume><fpage>246</fpage><lpage>248</lpage><pub-id pub-id-type="doi">10.1038/279246a0</pub-id></element-citation></ref></ref-list></back><sub-article article-type="article-commentary" id="SA1"><front-stub><article-id pub-id-type="doi">10.7554/eLife.03300.029</article-id><title-group><article-title>Decision letter</article-title></title-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Pascual</surname><given-names>Mercedes</given-names></name><role>Reviewing editor</role><aff><institution>University of Michigan</institution>, <country>United 
States</country></aff></contrib></contrib-group></front-stub><body><boxed-text><p>eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see <ext-link ext-link-type="uri" xlink:href="http://elifesciences.org/review-process">review process</ext-link>). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.</p></boxed-text><p>Thank you for sending your work entitled “The inherent mutational tolerance and antigenic evolvability of influenza hemagglutinin” for consideration at <italic>eLife.</italic> Your article has been favorably evaluated by Ian Baldwin (Senior editor), a Reviewing editor, and 3 reviewers, one of whom, Sarah Cobey, has agreed to reveal her identity.</p><p>The Reviewing editor and the reviewers discussed their comments before we reached this decision, and the Reviewing editor has assembled the following comments to help you prepare a revised submission.</p><p>The reviewers agree that this paper advances knowledge on an important topic in pathogen evolution: the extent to which rapidly adapting viruses like influenza have evolved particularly evolvable phenotypes. They also judge the set of experiments important and of high quality. Finally, they recognize the novelty of combining the experimental results with an evolutionary model. 
This combination demonstrates that the site-specific amino acid preferences inferred from the experiments describe HA evolution more parsimoniously than traditional codon substitution models.</p><p>The reviewers do, however, provide a number of comments whose clarification and further discussion would strengthen the arguments made in the manuscript and its eventual impact. These comments primarily concern the uncertainty and assumptions over the identity of antigenic sites and the relevant RBS residues for control purposes. It was felt that both these assumptions could be made more explicit and examined further in the ways suggested below, together with other major points.</p><p>1) The conclusions from this study are dependent on knowing which HA residues are physically located in antigenic sites. The authors use the classic <xref ref-type="bibr" rid="bib12">Caton et al. 1982</xref> study to classify which HA residues are located in antigenic sites. This is a reasonable approach because the WSN/1933 virus used in the current study is antigenically similar to the PR8/1934 used in the <xref ref-type="bibr" rid="bib12">Caton et al. 1982</xref> study. The Caton et al. experiments defined antigenic sites by selecting viral escape mutants in the presence of saturating amounts of individual monoclonal antibodies. Based on the Caton et al. experimental approach, viruses that possess mutations at HA residues that severely compromise viral fitness are never selected; essentially, Caton et al. defined the antigenic sites by isolating mutant viruses that were capable of growing. With this in mind, it is perhaps not surprising that these same HA residues are tolerant of mutations in the current study.</p><p>When the Caton et al. escape mutants are mapped onto the 1RVX structure, there are clusters of mutations that correspond to each antigenic site. 
Within each cluster, there are a few HA residues that appear to be in the cluster (thus likely in the antibody binding site), but these residues never appeared in the Caton et al. viral mutant escape experiments. It has been assumed that this is the case because either (a) there was a limited number of mutant viruses screened by Caton et al. or (b) these residues within the antigenic site were detrimental to viral fitness.</p><p>The authors should consider making an HA structure analogous to the one shown in <xref ref-type="fig" rid="fig9">Figure 9</xref>, with entropy color labeling for all surface residues. This type of analysis will demonstrate if tolerability is limited to those residues identified in selection assays (assays that only identify residues capable of tolerating mutations) or also found in adjacent residues that are likely located in antibody binding sites but were not identified in the original antigenic mapping studies. It might be helpful to show the surface on the structure for this type of analysis.</p><p>Another partial solution is to examine antigenic sites previously defined by other means (such as epitopes defined by crystal structures).</p><p>This is not a critical flaw but should be considered either in the discussion or by completing additional analyses as described above.</p><p>2) It is not clear that the residues most essential for function must be the most conserved ones. We know that naturally occurring neutralizing antibodies can target the RBS (Whittle et al., PNAS, 2011). There is also variation in receptor type within hosts and between species.</p><p>The main text justifies using conserved RBS as a control because these sites are expected to be strongly conserved, and the RBS is defined to exclude non-conserved RBS-associated sites. 
One could also argue that high mutational tolerance in the RBS could be adaptive to facilitate escape in nearby epitopes; it would be really interesting if this were the case, with the control being some non-immunogenic area near the stalk. The simplest solution may be to show that the results are unchanged when the non-conserved RBS sites are included, assuming this is the case. If it is not, it would help to know which sites the conserved RBS correspond to on the HA structure. (One would hope they're deep-in-the-pocket).</p><p>3) The paper addresses the mutational tolerance and antigenic evolvability of HA. Relative to what? Are these metrics different for other influenza proteins? Other viral glycoproteins? A cursory analysis of the HA site preference data vs. NP site preference (in <xref ref-type="bibr" rid="bib9">Bloom 2014</xref>, bioRxiv, currently in review in MBE?) would further support this claim.</p><p>4) “Big data” mutagenesis scans must sacrifice something in their broad scope; for example, quantitative as opposed to binary fitness measurements, intragenic epistasis (e.g. interactions among substitutions within HA as will occur in this experimental set-up), and pool drop-out bottlenecks. The authors already address many of these. It would be good to acknowledge them as shortcomings or areas for follow-up work, and to discuss the recent publication by Wu et al. of a similar scanning mutagenesis that does report fitness values (Wu et al., High-throughput profiling of influenza A virus hemagglutinin gene at single-nucleotide resolution, Scientific Reports 2014).</p><p>Unlike the paper by Wu et al., however, the present manuscript places the mutational data within an evolutionary framework, which is something generally missing in most such studies. This novelty of the current manuscript could be drawn out more. In addition, both this paper and <xref ref-type="bibr" rid="bib58">Wu et al. 
(2014)</xref> present a snapshot of the (mostly) single-mutation neighbors from a single point (the sequence of WSN33 HA) in the fitness landscape. Both find that viable paths occur disproportionately at antigenic sites. The fact that these rules, when extrapolated over the phylogeny, greatly improve the fit implies that epistasis is in some ways less pervasive than we think and may not be such a huge obstacle to evolution. Most substitutions appear to be concentrated among a permissive network of sites.</p><p>5) The following aspects of the experiments should be clarified/discussed: (a) The possibility of reversion between the plasmid pool and the passaged virus is not accounted for. The wild-type control shows that background mutation is low, but a weakly functional mutant could still revert. (b) During transfection, one would expect that there may be viruses produced from cells expressing two different HA genes. How would such chimeric viruses confound the results in a single-passage experiment? (c) Similarly to (b), wouldn't there also be viruses in which the HA on the surface has a given amino acid sequence, while the packaged genomic segments code for a different one? Multiple passages would ferret these out. (d) How were HA molecules sheared and why are there biases in sequence fragmentation? (e) Are more exposed HA residues generally more tolerant to mutations? The authors address this by correcting for relative solvent accessibility, but this description is difficult to follow. The importance of relative solvent accessibility (RSA) is shown in <xref ref-type="table" rid="tbl3">Table 3</xref> and <xref ref-type="fig" rid="fig9">Figure 9</xref>, but the RSA result is not stated clearly in the main text. Can the authors simplify or clarify this part?</p><p>6) The conclusions would be further strengthened if there were some way to “predict” retrospectively where HA would go next based on the model. 
It would be interesting to discuss this possibility along the lines of what Łuksza and Lässig published in Nature this year (Nature 507:57-61).</p></body></sub-article><sub-article article-type="reply" id="SA2"><front-stub><article-id pub-id-type="doi">10.7554/eLife.03300.030</article-id><title-group><article-title>Author response</article-title></title-group></front-stub><body><p><italic>The reviewers do, however, provide a number of comments whose clarification and further discussion would strengthen the arguments made in the manuscript and its eventual impact. These comments primarily concern the uncertainty and assumptions over the identity of antigenic sites and the relevant RBS residues for control purposes. It was felt that both these assumptions could be made more explicit and examined further in the ways suggested below, together with other major points</italic>.</p><p>Thank you to the editors and reviewers for their very careful review of our manuscript. The critiques were very helpful, and we believe that the revisions that we have made in response have substantially improved the manuscript. Specifically:</p><p>We have included a second expanded definition of antigenic sites to show that the essential finding that HA has high antigenic evolvability is robust to the method used to classify the antigenic sites (see Response 2 below).</p><p>We have clarified that not all receptor-binding site residues have low inherent mutational tolerance; this is only true of conserved residues deep in the pocket. 
We now include separate classifications for conserved and all receptor-binding residues (see Response 3 below).</p><p>The suggestion to include a comparison to other viral proteins is very helpful; as suggested by the reviewers, we now compare to NP to show that high antigenic evolvability is a property of HA but not NP (see Response 4 below).</p><p>As suggested by the reviewers, we have included a comparison to the results of the Wu <italic>et al.</italic> study that was published while our study was in review. This comparison shows that the results of the two studies are correlated, explains why our study provides substantially more information, and comments on issues related to these types of high-throughput studies (see Response 5 below).</p><p><italic>1) The conclusions from this study are dependent on knowing which HA residues are physically located in antigenic sites. The authors use the classic</italic> <xref ref-type="bibr" rid="bib12"><italic>Caton et al. 1982</italic></xref> <italic>study to classify which HA residues are located in antigenic sites. This is a reasonable approach because the WSN/1933 virus used in the current study is antigenically similar to the PR8/1934 used in the</italic> <xref ref-type="bibr" rid="bib12"><italic>Caton et al. 1982</italic></xref> <italic>study. The Caton et al. experiments defined antigenic sites by selecting viral escape mutants in the presence of saturating amounts of individual monoclonal antibodies. Based on the Caton et al. experimental approach, viruses that possess mutations at HA residues that severely compromise viral fitness are never selected; essentially, Caton et al. defined the antigenic sites by isolating mutant viruses that were capable of growing. With this in mind, it is perhaps not surprising that these same HA residues are tolerant of mutations in the current study</italic>.</p><p><italic>When the Caton et al. 
escape mutants are mapped onto the 1RVX structure, there are clusters of mutations that correspond to each antigenic site. Within each cluster, there are a few HA residues that appear to be in the cluster (thus likely in the antibody binding site), but these residues never appeared in the Caton et al. viral mutant escape experiments. It has been assumed that this is the case because either (a) there was a limited number of mutant viruses screened by Caton et al. or (b) these residues within the antigenic site were detrimental to viral fitness</italic>.</p><p><italic>The authors should consider making an HA structure analogous to the one shown in</italic> <xref ref-type="fig" rid="fig9"><italic>Figure 9</italic></xref><italic>, with entropy color labeling for all surface residues. This type of analysis will demonstrate if tolerability is limited to those residues identified in selection assays (assays that only identify residues capable of tolerating mutations) or also found in adjacent residues that are likely located in antibody binding sites but were not identified in the original antigenic mapping studies. It might be helpful to show the surface on the structure for this type of analysis</italic>.</p><p><italic>Another partial solution is to examine antigenic sites previously defined by other means (such as epitopes defined by crystal structures)</italic>.</p><p><italic>This is not a critical flaw but should be considered either in the discussion or by completing additional analyses as described above</italic>.</p><p>This is a very astute comment. Essentially, the reviewers are noting that Caton <italic>et al.</italic> defined the antigenic sites largely by mapping monoclonal-antibody escape mutants. 
The Caton <italic>et al.</italic> study therefore may have had a bias towards mapping sites that were tolerant to mutations, potentially confounding our interpretation.</p><p>To address this concern, we have added a second expanded classification of antigenic sites which includes the specific antigenic sites mapped by Caton <italic>et al.</italic> plus all surface-exposed residues that are in contact with these antigenic sites. This second classification should alleviate the concern about any bias in the Caton <italic>et al.</italic> mapping, because even if only some residues in a cluster had mutations selected by Caton <italic>et al.</italic>, our expanded structural classification will also include all other residues in the same physical cluster.</p><p>In the revised manuscript, we now perform the analysis for both the original Caton <italic>et al.</italic> classification and this second expanded classification. The results are described in <xref ref-type="fig" rid="fig10">Figure 10</xref> and <xref ref-type="table" rid="tbl4">Table 4</xref>. These results show that the antigenic sites have significantly higher inherent mutational tolerance regardless of whether we use the original Caton <italic>et al.</italic> classification or the new expanded structural classification.</p><p>In addition, we have followed the reviewers’ recommendation and included an entropy-colored surface-rendered structure of HA in <xref ref-type="fig" rid="fig10">Figure 10</xref> as well as individual structures showing just the antigenic sites. This new image is panel A of the revised <xref ref-type="fig" rid="fig10">Figure 10</xref>. 
We think this image visually supports the statistical analyses showing that the expanded classification of antigenic sites has higher inherent mutational tolerance.</p><p>Finally, we have substantially revised the text in the section entitled “The inherent evolvability of antigenic sites on HA” to explain the point made by this reviewer comment and how we have addressed it. We refer the reviewers to the revised manuscript for the full text, but here is the most relevant portion:</p><p>One possible concern is that <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref> mapped antigenic sites largely by selecting monoclonal-antibody escape mutants, and so these sites might be biased towards being more mutationally tolerant. We therefore also made a broader classification of ‘antigenic sites and contacting residues’ consisting of the <xref ref-type="bibr" rid="bib12">Caton et al. (1982)</xref> antigenic sites plus all surface-exposed residues in contact with these sites (see Methods for details). This broader classification includes all residues in regions of the HA surface targeted by antibodies, and so should not be biased by whether sites are amenable to the selection of monoclonal-antibody escape mutants. We hypothesized that both sets of antigenic sites would have unusually high mutational tolerance.</p><p><italic>2) It is not clear that the residues most essential for function must be the most conserved ones. We know that naturally occurring neutralizing antibodies can target the RBS (Whittle et al.,</italic> <italic>PNAS, 2011</italic><italic>). There is also variation in receptor type within hosts and between species</italic>.</p><p><italic>The main text justifies using conserved RBS as a control because these sites are expected to be strongly conserved, and the RBS is defined to exclude non-conserved RBS-associated sites. 
One could also argue that high mutational tolerance in the RBS could be adaptive to facilitate escape in nearby epitopes; it would be really interesting if this were the case, with the control being some non-immunogenic area near the stalk. The simplest solution may be to show that the results are unchanged when the non-conserved RBS sites are included, assuming this is the case. If it is not, it would help to know which sites the conserved RBS correspond to on the HA structure. (One would hope they're deep-in-the-pocket)</italic>.</p><p>This is another very astute comment. In the original version, we chose conserved receptor-binding residues as a control group for the antigenic sites because we wanted residues that we expected to be under strong constraint. However, as the reviewers point out, this could give the incorrect impression that all receptor-binding residues are highly conserved. In fact, a handful of “deep in the pocket” residues are highly conserved, but many other residues on the periphery of the receptor-binding pocket are relatively amenable to change.</p><p>To address this comment, we now define two groups of receptor-binding residues: the conserved ones (mostly deep in the pocket) and the set of all residues that contact the receptor. In the revised <xref ref-type="fig" rid="fig10">Figure 10</xref> and <xref ref-type="table" rid="tbl4">Table 4</xref>, we analyze both of these groups separately, and discuss the results in the revised section “The inherent evolvability of antigenic sites on HA.” We refer the reviewers to the revised section and figure/table for full details; here we summarize the main findings by quoting from the most relevant portions of the revised section:</p><p>“For comparison to the antigenic sites, we used two classifications of receptor-binding residues (see Methods). 
The first classification consists of residues that have important roles in receptor binding (<xref ref-type="bibr" rid="bib38">Martin et al., 1998</xref>) and are conserved in H1 HAs; these residues are mostly deep in the binding pocket. The second classification consists of all residues that contact the sialic-acid receptor in the crystal structure, regardless of their level of conservation. We hypothesized that the core set of conserved receptor-binding residues would have unusually low mutational tolerance, but that the set of all receptor-binding residues would have typical levels of mutational tolerance, since influenza routinely escapes from antibodies that target the periphery of the receptor-binding pocket (<xref ref-type="bibr" rid="bib31">Koel et al., 2013</xref>)...”</p><p>We then describe results that support this hypothesis, and conclude:</p><p>“These results also show that while a core group of conserved residues deep in the receptor-binding pocket have unusually low mutational tolerance, the bulk of residues that contact the receptor are not under exceptional constraint. This fact probably explains why HA is able to escape from antibodies targeting the periphery of the receptor-binding pocket (<xref ref-type="bibr" rid="bib31">Koel et al., 2013</xref>), and why only rare antibodies that penetrate deep into this pocket are broadly neutralizing (<xref ref-type="bibr" rid="bib55">Whittle et al., 2011</xref>).”</p><p><italic>3) The paper addresses the mutational tolerance and antigenic evolvability of HA. Relative to what? Are these metrics different for other influenza proteins? Other viral glycoproteins? A cursory analysis of the HA site preference data vs. NP site preference (in</italic> <xref ref-type="bibr" rid="bib9"><italic>Bloom 2014</italic></xref><italic>, bioRxiv, currently in review in MBE?) 
would further support this claim</italic>.</p><p>This was an extremely helpful suggestion, and we believe that the changes that we have made in response have substantially strengthened the manuscript. Specifically, we now compare our results for HA to our previous results for influenza NP (which are now published in Molecular Biology and Evolution) in a new <xref ref-type="fig" rid="fig11">Figure 11</xref> and <xref ref-type="table" rid="tbl5">Table 5</xref>. We also include a new section in the Results (“HA’s antigenic evolvability is not shared by all influenza proteins”).</p><p><italic>4) “Big data” mutagenesis scans must sacrifice something in their broad scope; for example, quantitative as opposed to binary fitness measurements, intragenic epistasis (e.g. interactions among substitutions within HA as will occur in this experimental set-up), pool drop out bottlenecks. The authors already address many of these. It would be good to acknowledge them as shortcomings or areas for follow up work, and to discuss the recent publication by Wu et al. of a similar scanning mutagenesis that does report fitness values (Wu et al., High throughput profiling of influenza A virus hemagglutinin gene at the single-nucleotide resolution,</italic> <italic>Scientific Reports 2013</italic><italic>)</italic>.</p><p><italic>Unlike the paper by Wu et al., however, the present manuscript places the mutational data within an evolutionary framework, which is something generally missing in most such studies. This novelty of the current manuscript could be drawn out more. In addition, both this paper and</italic> <xref ref-type="bibr" rid="bib58"><italic>Wu et al. (2014)</italic></xref> <italic>present a snapshot of the (mostly) single-mutation neighbors from a single point (the sequence of WSN33 HA) in the fitness landscape. Both find that viable paths occur disproportionately at antigenic sites. 
The fact that these rules, when extrapolated over the phylogeny, greatly improve the fit implies that epistasis is in some ways less pervasive than we think and may not be such a huge obstacle to evolution. Most substitutions appear to be concentrated among a permissive network of sites</italic>.</p><p>These are all very good comments. We have tried to be as transparent as possible about the shortcomings as well as the strengths of the high-throughput approach that we have used. For instance, we dedicate an entire figure (<xref ref-type="fig" rid="fig6">Figure 6</xref>) to the correlations between our replicates, and discuss how this figure shows that although the experiment extracts reproducible and meaningful information, it is also subject to substantial noise, as judged by the variation among replicates. We discuss how bottlenecks and other experimental problems could contribute to this noise.</p><p>As the reviewers point out, while our manuscript was under review, Wu <italic>et al.</italic> reported a somewhat similar high-throughput study of influenza HA. The study by Wu <italic>et al.</italic> is clearly very valuable. However, we believe that our study has some advantages: for instance, we survey all amino-acid mutations while Wu <italic>et al.</italic> survey only about 20% of them, we provide full raw data for experimental replicates and controls for possible sources of error while Wu <italic>et al.</italic> do not, and we perform substantially more extensive quantitative analyses of our data, including in relation to natural evolution.</p><p>To help address the points that the reviewers have made, we now include a new figure (<xref ref-type="fig" rid="fig7">Figure 7</xref>) and a new section in the Results that compares our study with that of Wu <italic>et al.</italic> We also use this new section to highlight some of the limitations of “big data” studies that the reviewers noted in their comments. 
Below we quote the full text of this new section, and put in bold the parts that specifically address the limitations noted by the reviewers.</p><p>As our paper was under review, <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref> published the results of using a similar strategy to examine the effects of mutations to the WSN HA. In their study, the HA gene was mutated at the nucleotide level, so their experiments surveyed only amino-acid mutations accessible by single-nucleotide codon changes. As a result, they provide data on the effects of only about 20% of the 19 × 564 = 10716 amino-acid mutations examined in our study. Despite this limitation, their study provides a large dataset of mutational effects to which we can compare our results.</p><p><xref ref-type="fig" rid="fig7">Figure 7</xref> compares the mutational effects determined in our study to those from <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref>. There is a highly significant correlation between the results of the two studies, but the inferred mutational effects are certainly not identical. Because <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref> do not provide the data for replicates of their experiment, we are unable to assess whether the variability between the two different studies exceeds the variability between experimental replicates within each study. One can therefore imagine both biologically interesting and uninteresting explanations for the imperfect correlation between the results of the two studies. The interesting explanation is that differences in experimental methodology could lead to different selection pressures on specific mutations: for instance, <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref> use A549 cells while we use MDCK-SIAT1 cells, and perhaps the impact of certain mutations is dependent on the cell line. The uninteresting explanation is that the imperfect correlation is simply due to noise in the experimental measurements. 
Unfortunately, it is not straightforward to distinguish between these two explanations. This difficulty in pinpointing reasons for inter-study variation highlights a limitation of the high-throughput experimental methodology employed both by us and by <xref ref-type="bibr" rid="bib58">Wu et al. (2014)</xref>: while such experiments provide a wealth of data, numerous factors can create noise in these data (sequencing errors, population bottlenecks, epistasis among mutations, etc.). Realizing the full potential of such studies will therefore require extensive experimental controls and biological replicates to quantify errors and noise, and thereby enable comparisons across data sets.</p><p>Nonetheless, <xref ref-type="fig" rid="fig7">Figure 7</xref> shows that there is a highly significant correlation between the results of these two high-throughput studies, despite differences in experimental methodology and unquantified sources of experimental noise. This fact suggests that both studies capture fundamental constraints on HA’s mutational tolerance. In the remaining sections, we apply the more comprehensive data generated by our study to address questions about HA’s natural evolution and antigenic evolvability.</p><p>However, beyond the text quoted above and the new <xref ref-type="fig" rid="fig7">Figure 7</xref>, we are unable to analyze the data of Wu et al. with respect to the other questions about inherent mutational tolerance and viral evolution that we address in our manuscript. The reason is that Wu et al. measure the fitness effects of only an average of 4 of the possible 19 amino-acid mutations at each site, and this incomplete information is not sufficient either to assess inherent mutational tolerance or to construct phylogenetic evolutionary models.</p><p><italic>5) The following aspects of the experiments should be clarified/discussed: (a) The possibility of reversion between the plasmid pool and the passaged virus is not accounted for. 
Granted, the wild-type control shows that there is low background mutation, but a weakly functional mutant could revert. (b) During transfection, one would expect that there may be viruses produced from cells expressing two different HA genes. How would such chimeric viruses confound the results in a single passage experiment? (c) Similarly to #2, wouldn't there also be viruses in which the HA on the surface has a given amino acid sequence, while the packaged genomic segments code for a different one? Multiple passages would ferret these out. (d) How were HA molecules sheared and why are there biases in sequence fragmentation? (e) Are more exposed HA residues generally more tolerant to mutations? The authors address this by correcting for relative solvent accessibility, but this description is difficult to follow. The importance of relative solvent accessibility (RSA) is shown in</italic> <xref ref-type="table" rid="tbl3"><italic>Table 3</italic></xref> <italic>and</italic> <xref ref-type="fig" rid="fig9"><italic>Figure 9</italic></xref><italic>, but the RSA result is not stated clearly in the main text. Can the authors simplify or clarify this part</italic>?</p><p>The reviewers ask in point (a) above about the possibility of reversions. We suspect that reversions are very rare, since the virus is only passaged once, which should give relatively little opportunity for such mutations to arise. But we acknowledge that none of our controls directly measure the rate of reversions. Our sequencing of the wild-type virus shows that mutations overall are rare, but it is possible that reversions are more common. However, a key advantage of our approach is that reversions would only affect a small fraction of our mutations: because we introduce mutations at the codon level, most (54 of 63) of the possible mutations involve multiple nucleotide changes. We deem it extremely unlikely that a single codon would revert two nucleotides during a single viral passage. 
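</p><p>The codon arithmetic behind this argument can be checked by direct enumeration. The Python snippet below is a minimal illustrative sketch (it is not part of the study's analysis pipeline, and the function name <monospace>neighbor_counts</monospace> is hypothetical): for a given codon, it counts how many of the 63 alternative codons differ from it at one, two, or three nucleotide positions.</p><preformat>
```python
from itertools import product

# All 64 codons over the nucleotide alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def n_diffs(a, b):
    """Number of nucleotide positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbor_counts(codon):
    """Hypothetical helper: count the 63 alternative codons by how many
    nucleotides must change to convert `codon` into each of them."""
    counts = {1: 0, 2: 0, 3: 0}
    for other in CODONS:
        if other != codon:
            counts[n_diffs(codon, other)] += 1
    return counts


counts = neighbor_counts("ATG")
single = counts[1]              # reachable by one nucleotide change
multi = counts[2] + counts[3]   # require two or three changes
print(single, multi)  # prints: 9 54
```
</preformat><p>For any codon, 9 of the 63 alternatives are single-nucleotide changes (3 positions × 3 alternative nucleotides) and the remaining 54 require two or three changes; reverting any of those 54 mutant codons to the wild-type codon would likewise require multiple nucleotide substitutions.</p><p>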
To address this issue, we have revised the text.</p><p>The reviewers ask in points (b) and (c) above about chimeric viruses that express multiple HA proteins, and about viruses that lack a genotype-to-phenotype link because they possess different HA genes and proteins. We agree that this is a potential concern – however, the problem should be resolved during the passaging of the virus at low multiplicity of infection (MOI). In the current manuscript, we perform only one such passage – so the reviewers ask if it might be desirable to also perform multiple passages to make sure the genotype-phenotype link is strong. This is a great suggestion, and in fact, in our previous work with NP (<xref ref-type="bibr" rid="bib9">Bloom, 2014</xref>), we did just that. Specifically, in that work, we sequenced NP after both one and two passages, and found that the results after just one passage were essentially indistinguishable from the results after two passages (see the fifth figure of <ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/content/early/2014/06/12/molbev.msu173">http://mbe.oxfordjournals.org/content/early/2014/06/12/molbev.msu173</ext-link>). One reason that a single low-MOI passage is sufficient may be that substantial selection probably also occurs during viral rescue in co-cultures prior to the first low-MOI passage. 
In any case, based on our previous NP work, we judge that one low-MOI passage is sufficient to ensure a strong genotype-phenotype link for influenza genes, and so we performed only one low-MOI passage in the current work (there are good reasons to keep the number of passages as low as possible, primarily to avoid problems such as the reversions raised in the reviewers’ previous comment).</p><p>To clarify this point, we have revised the manuscript (section starting “The mutant viruses generated for each replicate were passaged at a relatively low multiplicity of infection”).</p><p>The reviewers ask in point (d) how the HA molecules were fragmented. The fragmentation was done using the Illumina Nextera kit, which is widely used in the preparation of Illumina sequencing libraries. The fragmentation bias almost certainly arises from weak sequence preferences of the transposon that mediates the fragmentation in the Nextera kit. This transposon is known to have weak biases for certain sequences; for instance, see <ext-link ext-link-type="uri" xlink:href="http://genomebiology.com/2010/11/12/r119">http://genomebiology.com/2010/11/12/r119</ext-link>. However, as should be clear from <xref ref-type="fig" rid="fig3s3">Figure 3—figure supplement 3</xref>, these biases are fairly mild and lead to at most a five-fold variation in coverage at different sites.</p><p>To clarify this issue, we have revised the manuscript (section starting “In order to reduce the sequencing error rate”).</p><p>The reviewers ask in point (e) whether residues with higher relative solvent accessibility (RSA) are generally more tolerant to mutations. The answer is yes, as shown in <xref ref-type="table" rid="tbl4">Table 4</xref> and <xref ref-type="fig" rid="fig10">Figure 10</xref>, but, as the reviewers point out, this result was not clearly stated in the main text. 
We have revised the text to state this point more clearly.</p><p><italic>6) The conclusions would be further strengthened if there were some way to ‘predict’ retrospectively where HA would go next based on the model. It would be interesting to discuss this possibility along the lines of what Luksza and Lassig published in Nature this year (Nature 507:57-61)</italic>.</p><p>The reviewers ask if it might be possible to predict the evolution of influenza using data from our deep mutational scanning. In particular, they mention the work of Luksza and Lassig (2014), who develop a fitness model that enables improved forecasting of which of several closely related existing strains of epidemic influenza are likely to predominate in future years. We do see some ways in which our data might synergize with their approach. For instance, a key component of their model (see Equation 2 of their paper) is assigning a “mutational load” to non-antigenic mutations. One could imagine that deep mutational scanning might enable more accurate assignment of the “mutational load” caused by specific mutations, although it remains unproven whether this would actually work.</p><p>For now, we mention this possibility with the following sentence in the Discussion:</p><p>“It also may be possible to utilize high-throughput experimental data on mutational effects to better estimate the fitness of naturally occurring strains in a way that aids in prediction of the year-to-year strain dynamics of influenza (Luksza and Lassig, 2014).”</p><p>We have also provided links in the manuscript to our complete raw data and source code, as well as detailed documentation of our computational analysis pipeline. We hope that the open availability of all of our raw data will facilitate the work of others who wish to develop approaches for viral forecasting. 
However, we are currently unable to offer any sort of effective predictive model ourselves, and believe that prediction of viral evolution remains a daunting problem.</p></body></sub-article></article>