<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.1d1 20130915//EN" "JATS-archivearticle1.dtd"><article article-type="research-article" dtd-version="1.1d1" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="nlm-ta">elife</journal-id><journal-id journal-id-type="hwp">eLife</journal-id><journal-id journal-id-type="publisher-id">eLife</journal-id><journal-title-group><journal-title>eLife</journal-title></journal-title-group><issn publication-format="electronic">2050-084X</issn><publisher><publisher-name>eLife Sciences Publications, Ltd</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">03528</article-id><article-id pub-id-type="doi">10.7554/eLife.03528</article-id><article-categories><subj-group subj-group-type="display-channel"><subject>Research article</subject></subj-group><subj-group subj-group-type="heading"><subject>Biochemistry</subject></subj-group><subj-group subj-group-type="heading"><subject>Genomics and evolutionary biology</subject></subj-group></article-categories><title-group><article-title>Extensive translation of small Open Reading Frames revealed by Poly-Ribo-Seq</article-title></title-group><contrib-group><contrib contrib-type="author" id="author-10480"><name><surname>Aspden</surname><given-names>Julie L</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="fn" rid="con1"/><xref ref-type="fn" rid="conf1"/><xref ref-type="other" rid="dataro1"/></contrib><contrib contrib-type="author" id="author-14941"><name><surname>Eyre-Walker</surname><given-names>Ying Chen</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="fn" rid="con2"/><xref ref-type="fn" rid="conf1"/><xref ref-type="other" rid="dataro1"/></contrib><contrib contrib-type="author" id="author-17843"><name><surname>Phillips</surname><given-names>Rose J</given-names></name><xref ref-type="aff" 
rid="aff1"/><xref ref-type="fn" rid="con3"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" id="author-14943"><name><surname>Amin</surname><given-names>Unum</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="fn" rid="con4"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" id="author-14944"><name><surname>Mumtaz</surname><given-names>Muhammad Ali S</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="fn" rid="con5"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" id="author-14945"><name><surname>Brocard</surname><given-names>Michele</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="fn" rid="con6"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" corresp="yes" id="author-14627"><name><surname>Couso</surname><given-names>Juan-Pablo</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="corresp" rid="cor1">*</xref><xref ref-type="other" rid="par-1"/><xref ref-type="fn" rid="con7"/><xref ref-type="fn" rid="conf1"/><xref ref-type="other" rid="dataro1"/></contrib><aff id="aff1"><institution content-type="dept">School of Life Sciences</institution>, <institution>University of Sussex</institution>, <addr-line><named-content content-type="city">Brighton</named-content></addr-line>, <country>United Kingdom</country></aff></contrib-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Gingeras</surname><given-names>Thomas R</given-names></name><role>Reviewing editor</role><aff><institution>Cold Spring Harbor Laboratory</institution>, <country>United States</country></aff></contrib></contrib-group><author-notes><corresp id="cor1"><label>*</label>For correspondence: <email>j.p.couso@sussex.ac.uk</email></corresp></author-notes><pub-date date-type="pub" publication-format="electronic"><day>21</day><month>08</month><year>2014</year></pub-date><pub-date 
pub-type="collection"><year>2014</year></pub-date><volume>3</volume><elocation-id>e03528</elocation-id><history><date date-type="received"><day>30</day><month>05</month><year>2014</year></date><date date-type="accepted"><day>19</day><month>08</month><year>2014</year></date></history><permissions><copyright-statement>© 2014, Aspden et al</copyright-statement><copyright-year>2014</copyright-year><copyright-holder>Aspden et al</copyright-holder><license xlink:href="http://creativecommons.org/licenses/by/4.0/"><license-p>This article is distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>, which permits unrestricted use and redistribution provided that the original author and source are credited.</license-p></license></permissions><self-uri content-type="pdf" xlink:href="elife03528.pdf"/><abstract><object-id pub-id-type="doi">10.7554/eLife.03528.001</object-id><p>Thousands of small Open Reading Frames (smORFs) with the potential to encode small peptides of fewer than 100 amino acids exist in our genomes. However, the number of smORFs actually translated, and their molecular and functional roles are still unclear. In this study, we present a genome-wide assessment of smORF translation by ribosomal profiling of polysomal fractions in <italic>Drosophila</italic>. We detect two types of smORFs bound by multiple ribosomes and thus undergoing productive translation. The ‘longer’ smORFs of around 80 amino acids resemble canonical proteins in translational metrics and conservation, and display a propensity to contain transmembrane motifs. The ‘dwarf’ smORFs are in general shorter (around 20 amino-acid long), are mostly found in 5′-UTRs and non-coding RNAs, are less well conserved, and have no bioinformatic indicators of peptide function. 
Our findings indicate that thousands of smORFs are translated in metazoan genomes, reinforcing the idea that smORFs are an abundant and fundamental genome component.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.001">http://dx.doi.org/10.7554/eLife.03528.001</ext-link></p></abstract><abstract abstract-type="executive-summary"><object-id pub-id-type="doi">10.7554/eLife.03528.002</object-id><title>eLife digest</title><p>To produce a protein, a stretch of DNA must first be transcribed to produce a molecule of messenger RNA (mRNA). The genetic information copied from the DNA is then read three letters at a time, in groups called codons. Each codon either encodes a particular amino acid to be added into a protein or provides further instructions: ‘start codons’ mark the beginning of a protein; ‘stop codons’ mark its end. The DNA between these two points is called an open reading frame (or ORF)—however, not all ORFs produce proteins.</p><p>Most proteins are made of several hundred amino acids, but the genomes of animals contain thousands of ORFs that would generate much smaller proteins made of fewer than 100 amino acids, if they were translated. It is, however, unclear how many of these small ORFs are converted into mRNA molecules and functional proteins.</p><p>Ribosomes are large molecular machines that translate the code in mRNA molecules and join together the appropriate amino acids in the right order to make a protein. Ribosome profiling is a technique that identifies which mRNA molecules are translated into proteins by determining the sequences of all the mRNA molecules bound to ribosomes at a particular moment. The mRNA sequences can then be compared with the sequence of the whole genome to work out which ORFs they correspond to. 
Ribosome profiling has been used to detect translated small ORFs, but the method yields a relatively high false positive rate as some mRNAs can bind to ribosomes without being translated.</p><p>To better detect small protein-producing ORFs, Aspden et al. developed a technique based on ribosome profiling called Poly-Ribo-Seq. The method takes advantage of the fact that during active translation, clusters of multiple ribosomes, called polysomes, bind mRNAs. Poly-Ribo-Seq isolates these polysomes and determines the sequence bound by each of the ribosomes, thereby reducing the number of false positives.</p><p>Applying Poly-Ribo-Seq to cells from the fruit fly <italic>Drosophila</italic> allowed Aspden et al. to identify two types of short ORF. The first type codes for proteins that are around 80 amino acids long and are translated with the same efficiency as larger ORFs. The sequences of these ORFs are found in other species, match at least in part sequences of known functional ORFs, and the proteins produced are found in specific locations inside cells. These small proteins may contribute to membrane integrity or function. Together, these properties suggest that these mRNAs create functional small proteins.</p><p>The second pool consists of very small ORFs (‘dwarf smORFs’) that code for around 20 amino acids, which are not translated so often and do not show many similarities with other species.</p><p>As the findings of Aspden et al. 
suggest that a large fraction of <italic>Drosophila</italic> small ORFs are translated into proteins, the next challenge will be to determine the roles of these small proteins in cells.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.002">http://dx.doi.org/10.7554/eLife.03528.002</ext-link></p></abstract><kwd-group kwd-group-type="author-keywords"><title>Author keywords</title><kwd>small open reading Frames</kwd><kwd>non-coding RNAs</kwd><kwd>transmembrane peptides</kwd></kwd-group><kwd-group kwd-group-type="research-organism"><title>Research organism</title><kwd><italic>D. melanogaster</italic></kwd></kwd-group><funding-group><award-group id="par-1"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100004440</institution-id><institution>Wellcome Trust</institution></institution-wrap></funding-source><award-id>087516</award-id><principal-award-recipient><name><surname>Couso</surname><given-names>Juan-Pablo</given-names></name></principal-award-recipient></award-group><funding-statement>The funder had no role in study design, data collection and interpretation, or the decision to submit the work for publication.</funding-statement></funding-group><custom-meta-group><custom-meta><meta-name>elife-xml-version</meta-name><meta-value>2</meta-value></custom-meta><custom-meta specific-use="meta-only"><meta-name>Author impact statement</meta-name><meta-value>Thousands of small Open Reading Frames are translated, and form two distinct classes based on their translational efficiency and bioinformatic indicators.</meta-value></custom-meta></custom-meta-group></article-meta></front><body><sec id="s1" sec-type="intro"><title>Introduction</title><p>Small open reading frames (smORFs) of fewer than 100 amino acids exist in eukaryotic genomes in hundreds of thousands, but their annotation has been hindered by their size: short sequences are unable to obtain the high conservation scores 
that are the accepted indicator of functionality (<xref ref-type="bibr" rid="bib34">Ladoukakis et al., 2011</xref>). This handicap, coupled with the limited numbers of experimentally proven functional smORFs in eukaryotes (<xref ref-type="bibr" rid="bib27">Kastenmayer et al., 2006</xref>; <xref ref-type="bibr" rid="bib19">Galindo et al., 2007</xref>; <xref ref-type="bibr" rid="bib29">Kondo et al., 2007</xref>; <xref ref-type="bibr" rid="bib21">Hanada et al., 2012</xref>; <xref ref-type="bibr" rid="bib36">Magny et al., 2013</xref>; <xref ref-type="bibr" rid="bib37">Pauli et al., 2014</xref>; reviewed in <xref ref-type="bibr" rid="bib1">Andrews and Rothnagel, 2014</xref>), has so far precluded the reliable annotation of smORFs as coding sequences. Targeted bioinformatic approaches predict that hundreds of smORFs could be translated and functional in bacteria (<xref ref-type="bibr" rid="bib23">Hemm et al., 2008</xref>), yeast (<xref ref-type="bibr" rid="bib3">Basrai et al., 1997</xref>; <xref ref-type="bibr" rid="bib27">Kastenmayer et al., 2006</xref>), plants (<xref ref-type="bibr" rid="bib21">Hanada et al., 2012</xref>) and metazoans including <italic>Drosophila</italic> (<xref ref-type="bibr" rid="bib34">Ladoukakis et al., 2011</xref>), mouse (<xref ref-type="bibr" rid="bib17">Frith et al., 2006</xref>; <xref ref-type="bibr" rid="bib10">Crappe et al., 2013</xref>), and humans (<xref ref-type="bibr" rid="bib42">Slavoff et al., 2013</xref>), but this is in contrast to their low rate of detection in biochemical (proteomic) (<xref ref-type="bibr" rid="bib16">Falth et al., 2006</xref>; <xref ref-type="bibr" rid="bib42">Slavoff et al., 2013</xref>) and functional (genetic) (<xref ref-type="bibr" rid="bib33">Kumar et al., 2002</xref>) screens. 
Thus, genome-wide assessments of smORF translation in a number of species are needed to determine the actual number of translated smORFs in eukaryotic genomes.</p><p>The new technique of ribosome profiling has corroborated and expanded the proteomes of yeast (<xref ref-type="bibr" rid="bib25">Ingolia et al., 2009</xref>; <xref ref-type="bibr" rid="bib13">Duncan and Mata, 2014</xref>; <xref ref-type="bibr" rid="bib43">Smith et al., 2014</xref>), mouse (<xref ref-type="bibr" rid="bib26">Ingolia et al., 2011</xref>), zebrafish (<xref ref-type="bibr" rid="bib7">Chew et al., 2013</xref>; <xref ref-type="bibr" rid="bib4">Bazzini et al., 2014</xref>), and <italic>Drosophila</italic> (<xref ref-type="bibr" rid="bib14">Dunn et al., 2013</xref>). Thousands of new translated sequences have been described in each case, whether novel exons of annotated genes, alternative initiation sites, or entire ORFs. However, the application of ribosome profiling outside canonical translated sequences can lead to differing conclusions (<xref ref-type="bibr" rid="bib7">Chew et al., 2013</xref>; <xref ref-type="bibr" rid="bib20">Guttman et al., 2013</xref>). The problem remains that a ribosomal footprint cannot always be equated with translation; non-productive binding of single ribosomes to mRNAs and scanning 40S ribosomal subunits can result in footprints, yet do not constitute translation. It has been suggested that some smORFs associate with ribosomes in such a non-productive manner and do not undergo productive translation (<xref ref-type="bibr" rid="bib46">Wilson and Masel, 2011</xref>). Moreover, smORF mRNAs are short and present a small target for ribosomal binding and generation of footprints, potentially making traditional ribosome profiling less suitable for the study of smORFs. 
To distinguish genuine translation events from background, a number of metrics, statistical treatments, and bioinformatic analyses have been proposed (<xref ref-type="bibr" rid="bib25">Ingolia et al., 2009</xref>; <xref ref-type="bibr" rid="bib7">Chew et al., 2013</xref>; <xref ref-type="bibr" rid="bib20">Guttman et al., 2013</xref>; <xref ref-type="bibr" rid="bib4">Bazzini et al., 2014</xref>). We have taken an alternative approach by enhancing the biochemical foundation of ribosome profiling and have developed ‘Poly-Ribo-Seq’, an improvement to ribosome profiling that should be of use for the study of smORFs and canonical, longer ORFs alike. Instead of profiling all ribosome-bound mRNAs, we perform ribosome footprinting on polysomal fractions. In this way, mRNAs bound by multiple ribosomes and hence actively translated can be isolated and distinguished from mRNAs bound by sporadic, putatively non-productive single ribosomes or ribosomal subunits. Although this method may overlook very short smORFs that cannot fit multiple ribosomes, this loss should be offset by the increased stringency of discarding false positive footprints.</p><p>Application of Poly-Ribo-Seq to <italic>Drosophila</italic> S2 cells reveals extensive translation of thousands of smORFs of two putative types. ‘Longer’ smORFs encode around 80-aa-long peptides, which are translated in the same proportions (83% of transcribed coding sequences) and as efficiently as canonical long ORFs. We nearly double the number of these smORFs shown to be translated, and show that these smORF peptides further resemble canonical proteins in that they are conserved across species, show specific subcellular localisations, and display a specific amino acid composition with the potential to form transmembrane alpha-helices. ‘Dwarf’ smORFs encode peptides in 5′-UTRs and non-coding RNAs, are not detected by gene prediction programs, and are around 20-aa-long. 
They are less conserved, and translated at lower proportions and efficiencies than canonical proteins.</p><p>We corroborate these findings by two independent methods and observe smORF peptide expression at or near mitochondria. Extrapolation of our results suggests that thousands of smORFs are translated in higher organisms and that smORFs could have diverse functions, including expression of peptides active in cell membranes.</p></sec><sec id="s2" sec-type="results"><title>Results</title><p>We have chosen to assess the translation of smORFs in <italic>Drosophila</italic>, because of the well-annotated genome of this organism and the availability of an equally well-characterised standard cell line (S2 cells) (<xref ref-type="bibr" rid="bib40">Schneider, 1972</xref>) providing abundant and reproducible material.</p><p>The annotation of the <italic>Drosophila</italic> genome contains roughly double the proportion of predicted smORF-encoding genes found in other metazoan genomes (some 829 smORF genes, or 4% of the total, <xref ref-type="table" rid="tbl1">Table 1</xref>) (FlyBase, Ensembl). However, closer scrutiny reveals that although these genes have well-corroborated transcriptional data (modENCODE), fewer than a quarter of them have corroborated translation and peptide function. Only 164 annotated smORFs have at least two out of three markers indicating translation or peptide function: (1) molecular GO term indicating protein function (based on direct assays or presence of protein domains); (2) matches with peptides from proteomic experiments; and (3) conservation of the coding sequence beyond insects (<xref ref-type="fig" rid="fig1">Figure 1A</xref>). These ‘corroborated’ smORFs have in most cases a gene name (e.g., <italic>Defensin</italic>) and associated literature. The translation of the remaining 665 putative smORFs is thus not yet fully proven, and in most cases (494 smORFs) no evidence of translation is recorded. 
The majority of these ‘uncorroborated’ smORFs only have a cognate identifier (e.g., CG34200) and their ‘coding’ status varies between genome releases (unpublished observation). Thus, the <italic>Drosophila</italic> annotated smORFs offer an ideal framework to test for translation of smORFs and their biological importance.<table-wrap id="tbl1" position="float"><object-id pub-id-type="doi">10.7554/eLife.03528.003</object-id><label>Table 1.</label><caption><p>Annotated smORFs in different organisms</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.003">http://dx.doi.org/10.7554/eLife.03528.003</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th/><th align="center">smORFs</th><th align="center">ORFs</th><th align="center">% smORFs</th></tr></thead><tbody><tr><td>Drosophila</td><td align="char" char=".">829</td><td align="char" char=".">21,870</td><td align="char" char=".">3.8</td></tr><tr><td>Zebrafish</td><td align="char" char=".">854</td><td align="char" char=".">43,148</td><td align="char" char=".">2.0</td></tr><tr><td>Mouse</td><td align="char" char=".">1131</td><td align="char" char=".">51,745</td><td align="char" char=".">2.2</td></tr><tr><td>Human</td><td align="char" char=".">1938</td><td align="char" char=".">104,109</td><td align="char" char=".">1.9</td></tr></tbody></table></table-wrap><fig-group><fig id="fig1" position="float"><object-id pub-id-type="doi">10.7554/eLife.03528.004</object-id><label>Figure 1.</label><caption><title>Poly-Ribo-Seq of small and large polysomes.</title><p>(<bold>A</bold>) Venn diagram categorising annotated <italic>Drosophila</italic> smORFs as corroborated or uncorroborated based on evidence (FlyBase) from two out of three of: GO molecular function term assignment (green), peptidomic evidence (blue), and conservation outside of insects (red). Based on this, out of the total of 829 annotated smORFs, 665 are uncorroborated, and 494 have no evidence of translation. 
(<bold>B</bold>) Schematic of Poly-Ribo-Seq with representative UV absorbance profile for sucrose density gradient. Small (purple) and large (blue) polysomes are separated and subject to ribosome footprinting. (<bold>C</bold>) Composite plot from all FlyBase protein-coding genes of Poly-Ribo-Seq read counts across mRNAs in the vicinity of start (upper) and stop codons (lower) in small polysomes. (<bold>D</bold>) Median translational efficiencies of CDS, 5′ and 3′-UTR regions for all protein-coding genes, error bars represent SE.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.004">http://dx.doi.org/10.7554/eLife.03528.004</ext-link></p></caption><graphic xlink:href="elife03528f001"/></fig><fig id="fig1s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03528.005</object-id><label>Figure 1—figure supplement 1.</label><caption><title>Poly-Ribo-Seq of small and large polysomes.</title><p>(<bold>A</bold>) RT-PCR of RNA recovered from sucrose gradient fractions for one standard ORF mRNA (heph), three annotated smORF mRNAs (CG14818, CG9032, and CG43194) and one long non-coding RNA (roX1), with -RT control. Fractions corresponding to small (purple, 2-6 ribosomes) and large (blue, 7 or more ribosomes) polysomes are indicated. (<bold>B</bold>) Read densities (RPKM) from two biological replicates of the total cytoplasmic mRNA control exhibit very high correlation (R<sup>2</sup> = 0.96). 
(<bold>C</bold> and <bold>D</bold>) Read density plots showing phasing of ribosome footprinting reads in triplets corresponding to codons in CDS (<bold>C</bold>) and an absence of triplet phasing in 3′-UTRs (<bold>D</bold>) (small polysome data).</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.005">http://dx.doi.org/10.7554/eLife.03528.005</ext-link></p></caption><graphic xlink:href="elife03528fs001"/></fig><fig id="fig1s2" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03528.006</object-id><label>Figure 1—figure supplement 2.</label><caption><title>Schematic interpretation of Poly-Ribo-Seq.</title><p>Schematic summary of characterised (<bold>A</bold>–<bold>C</bold>) and theoretical (<bold>D</bold>) translation scenarios. Diagrams of ribosome–mRNA complexes are shown along with the polysome fraction in which it is detected, translational metrics and interpretation of this information, for (<bold>A</bold>–<bold>C</bold>) long canonical ORFs, (<bold>C</bold>) smORFs, and (<bold>D</bold>) canonical ORF containing a theoretical small ORF.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.006">http://dx.doi.org/10.7554/eLife.03528.006</ext-link></p></caption><graphic xlink:href="elife03528fs002"/></fig></fig-group></p><sec id="s2-1"><title>Development of ‘Poly-Ribo-Seq’, ribosome profiling of polysome fractions</title><p>Given the controversy over ribosome footprinting on lncRNAs (<xref ref-type="bibr" rid="bib7">Chew et al., 2013</xref>; <xref ref-type="bibr" rid="bib20">Guttman et al., 2013</xref>), we wanted to improve upon the ribosome profiling method to ensure that the RNAs, on which ribosome footprinting occurs, are undergoing active translation. That is to say they are engaged by polysomes rather than just bound by sporadic, putatively non-productive single ribosomes or ribosomal subunits. 
We therefore developed an approach for performing ribosome profiling on polysome complexes, using a modified ribosome footprinting method. Polysomal fractionation was used to separate RNAs, depending on the number and type of ribosomes bound to them (<xref ref-type="fig" rid="fig1">Figure 1B</xref>). In this way, mRNAs bound by multiple ribosomes and hence actively translated can be isolated and distinguished from mRNAs bound by non-productive 80S ribosomes. This biochemically purified material was then subjected to ribosome profiling, in which the footprinting reaction was optimized for profiling purified polysomal fractions rather than all ribosome-mRNA complexes.</p><p>To specifically enrich for actively translating single smORF-containing mRNAs, over canonical protein-coding mRNAs, we took advantage of the limited space within a smORF (≤303 nt) for ribosomes to be bound. Ribosomes have been reported to reach densities of 1 ribosome every 80 nt (<xref ref-type="bibr" rid="bib2">Arava et al., 2003</xref>), therefore on a smORF the maximum number of ribosomes associated would be five ribosomes (one at the start codon and one every 80 nt). Although RT-PCR of polysome sucrose gradient fractions confirmed that smORF mRNAs were enriched in small polysomes compared to large polysomes (<xref ref-type="fig" rid="fig1s1">Figure 1—figure supplement 1A</xref>), it suggested that <italic>Drosophila</italic> smORFs can be bound by up to six ribosomes, perhaps reflecting a tighter packing of ribosomes. We therefore chose to isolate polysomal fractions containing 2–6 ribosomes/mRNA to enrich for smORFs. 
These small polysomes can also contain mRNAs for canonical longer ORFs, which would not be fully covered by ribosomes and hence translated at less than the maximum level (<xref ref-type="fig" rid="fig1s2">Figure 1—figure supplement 2</xref>).</p><p>We subjected both small and large polysomal fractions to ribosome profiling separately and performed RNAseq on the total cytoplasmic mRNA as a control (<xref ref-type="fig" rid="fig1s1">Figure 1—figure supplement 1B</xref>). Our ‘Poly-Ribo-Seq’ captured regions of active translation, as ∼80% of reads mapped to coding sequences of canonical protein-coding genes (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1A</xref>) with read densities dropping off before start codons and after stop codons (<xref ref-type="fig" rid="fig1">Figure 1C</xref>). To quantify the translation of individual coding sequences we considered two metrics: (1) the ribosomal density in the ORF (expressed as RPKM) (<xref ref-type="bibr" rid="bib25">Ingolia et al., 2009</xref>) and (2) coverage of the ORF by ribosome footprints (0–1). The coverage metric indicates whether ribosomes bind across the ORF or just in a small fragment of it, which could be due to overlapping or internal ORFs (<xref ref-type="fig" rid="fig1s2">Figure 1—figure supplement 2</xref>). For an ORF to be considered translated, we required its ribosome density to be at least 11.8 RPKM and its footprint coverage to be at least 0.57; both values are above the 90th percentile of the values we obtained for the 3′-UTRs from canonical coding mRNAs (see ‘Materials and methods’ for a full explanation of filters and cut-offs). These cut-offs are more stringent than those used in previous ribosome profiling experiments and in standard RNAseq practice, and their combination should provide robust identification of transcripts that undergo active translation. 
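As a minimal sketch of this two-part filter, assuming that per-ORF ribosome density (RPKM) and footprint coverage have already been computed, the decision rule could be written in Python as follows. The class, function, and constant names are hypothetical, the example ORFs and their values are invented purely for illustration, and only the two cut-off values are taken from the text.
<preformat>
```python
from dataclasses import dataclass

# Cut-offs quoted in the text; all names and example values are illustrative.
MIN_RPKM = 11.8      # minimum ribosome footprint density in the ORF
MIN_COVERAGE = 0.57  # minimum fraction of the ORF covered by footprints


@dataclass
class OrfMetrics:
    """Hypothetical per-ORF summary of footprint data."""
    name: str
    rpkm: float      # ribosome footprint density (RPKM)
    coverage: float  # footprint coverage of the ORF, from 0 to 1


def is_translated(orf: OrfMetrics) -> bool:
    """Return True only if the ORF passes both cut-offs."""
    return orf.rpkm >= MIN_RPKM and orf.coverage >= MIN_COVERAGE


# Invented examples: one ORF passing both cut-offs, two failing one each.
examples = [
    OrfMetrics("orf_dense_and_covered", rpkm=25.0, coverage=0.80),
    OrfMetrics("orf_dense_but_patchy", rpkm=25.0, coverage=0.30),
    OrfMetrics("orf_covered_but_sparse", rpkm=5.0, coverage=0.90),
]

for orf in examples:
    print(orf.name, is_translated(orf))
```
</preformat>
Requiring both conditions means that an ORF with a high footprint density confined to a small fragment is rejected, as is an ORF with broad but sparse coverage. 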
To overcome the possible dependence of ribosome density on RNAseq efficiency or transcript abundance (<xref ref-type="bibr" rid="bib20">Guttman et al., 2013</xref>), we also used the relative metric known as translational efficiency (TE), which is the RPKM of ribosome footprints/RPKM of total mRNA control reads (<xref ref-type="bibr" rid="bib25">Ingolia et al., 2009</xref>). We observed that the median TE of all annotated protein-coding transcripts was significantly higher in CDSs compared to 5′- and 3′-UTRs (<xref ref-type="fig" rid="fig1">Figure 1D</xref>) indicating that ‘Poly-Ribo-Seq’ defines regions of active translation. As previously reported for ribosomal profiling, we observe triplet phasing in the mapping of our Poly-Ribo-Seq reads (<xref ref-type="fig" rid="fig1s1">Figure 1—figure supplement 1C</xref>), reflecting the positioning of ribosomes on codons, which is not globally seen in UTRs (<xref ref-type="fig" rid="fig1s1">Figure 1—figure supplement 1D</xref>).</p></sec><sec id="s2-2"><title>Poly-Ribo-Seq detects smORF translation</title><p>Small and large polysomes showed a marked difference in genome-wide ribosomal densities (<xref ref-type="fig" rid="fig2">Figure 2A</xref>), suggesting that these two fractions contain mRNAs translated at different levels. Small polysomes contain mRNAs encoding long ORFs, but these have lower TE than when isolated from large polysomes (<xref ref-type="table" rid="tbl2">Table 2</xref>), confirming that they were bound by fewer ribosomes. As intended, Poly-Ribo-Seq detected smORFs with translation signatures, and these were enriched in small polysomes, which contained double the number and all of the smORFs detected in large polysomes (<xref ref-type="fig" rid="fig2">Figure 2C</xref>). The low smORF TE values in large polysomes are similarly consistent with low levels of smORF mRNAs being present in large polysomal complexes. 
However, the TE of smORFs from small polysomes is similar to the TE of long ORFs from large polysomes, indicating that smORFs can be translated at similar levels to standard protein-coding ORFs (<xref ref-type="table" rid="tbl2">Table 2</xref>; <xref ref-type="fig" rid="fig1s2">Figure 1—figure supplement 2</xref>).<fig-group><fig id="fig2" position="float"><object-id pub-id-type="doi">10.7554/eLife.03528.007</object-id><label>Figure 2.</label><caption><title>Poly-Ribo-Seq reveals translation of smORFs.</title><p>(<bold>A</bold>) Ribosome footprinting densities (RPKM) from small polysomes correlate poorly with large polysomes (whereas two replicates of total cytoplasmic mRNA controls do, see <xref ref-type="fig" rid="fig1s1">Figure 1—figure supplement 1B</xref>). (<bold>B</bold>) Ribosome footprinting densities (RPKM) from small polysomes correlate highly between two biological replicates (R<sup>2</sup> = 0.83). (<bold>C</bold>) All 106 smORFs detected in large polysomes (blue) were also present in the 191 detected in small polysomes (purple). smORF footprints are much more abundant in small polysomes, as indicated by a higher TE value. (<bold>D</bold>) High coincidence of annotated smORFs detected as translated in three different Poly-Ribo-Seq experiments. Small polysome extensive experiment probes most deeply with 224 smORFs detected as translated (small polysomes: purple, small polysomes extensive: yellow, -rRNA: turquoise). (<bold>E</bold>) Numbers and proportions of transcribed ORFs, which are translated, according to Poly-Ribo-Seq data (translated: green, untranslated: blue). The proportion of annotated smORFs translated is similar to that of standard CDSs. 121 annotated smORFs are newly detected as translated, plus 2708 uORFs and 313 smORFs from ncRNAs. 
(<bold>F</bold>) Venn diagram showing overlap between Poly-Ribo-Seq (dark green), our mass spectrometry experiments (purple) and Peptide Atlas proteomic data (red).</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.007">http://dx.doi.org/10.7554/eLife.03528.007</ext-link></p></caption><graphic xlink:href="elife03528f002"/></fig><fig id="fig2s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03528.008</object-id><label>Figure 2—figure supplement 1.</label><caption><title>Poly-Ribo-Seq reveals translation of smORFs.</title><p>(<bold>A</bold>) Results of Poly-Ribo-Seq experiments with all (-rRNA: turquoise), large (blue), and small (purple) polysomes showing the number of canonical protein-coding ORFs (longer than 100 aa) translated and the overlap between experiments. (<bold>B</bold>) Venn diagram showing the overlap in the detection of translation between Poly-Ribo-Seq (dark green) and proteomic experiments (pink). 
Median RPKMs from Poly-Ribo-Seq are indicated.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.008">http://dx.doi.org/10.7554/eLife.03528.008</ext-link></p></caption><graphic xlink:href="elife03528fs003"/></fig></fig-group><table-wrap id="tbl2" position="float"><object-id pub-id-type="doi">10.7554/eLife.03528.009</object-id><label>Table 2.</label><caption><p>Summary of median TEs</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.009">http://dx.doi.org/10.7554/eLife.03528.009</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th>Median TE</th><th align="center">Small polysomes</th><th align="center">Large polysomes</th></tr></thead><tbody><tr><td>Annotated smORFs</td><td align="char" char=".">1.131</td><td align="char" char=".">0.265</td></tr><tr><td>standard ORFs</td><td align="char" char=".">0.829</td><td align="char" char=".">1.110</td></tr><tr><td>5′-UTR</td><td align="char" char=".">0.355</td><td align="char" char=".">0.566</td></tr><tr><td>3′-UTR</td><td align="char" char=".">0.162</td><td align="char" char=".">0.196</td></tr><tr><td>uORFs</td><td align="char" char=".">0.276</td><td align="char" char=".">0.347</td></tr><tr><td>ncRNA smORFs</td><td align="char" char=".">0.384</td><td align="char" char=".">0.000</td></tr></tbody></table><table-wrap-foot><fn><p>Median translational efficiency for ORFs in small and large polysomal fractions.</p></fn></table-wrap-foot></table-wrap></p><p>Altogether 191 annotated smORFs passed the cut-off values to be deemed translated in this initial Poly-Ribo-Seq experiment. This is ∼70% of the smORFs transcribed in S2 cells in the total mRNA controls (<xref ref-type="fig" rid="fig2">Figure 2D</xref>, small polysomes). To ensure that initial Poly-Ribo-Seq experiments sequenced to an adequate depth and to potentially extend the catalogue of translated smORFs, we repeated the experiment but exclusively sequenced small polysomes. 
This extensive small polysome profiling yielded nearly four times the number of ORF-mapping reads obtained in the previous small polysome profiling (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1A</xref>) and detected translation of 224 smORFs (<xref ref-type="fig" rid="fig2">Figure 2D</xref>), expanding the number from both experiments to 227, which is 83% of smORFs we observed transcribed in S2 cells (<xref ref-type="fig" rid="fig2">Figure 2E</xref>). The genome-wide distribution of ribosome densities in two independent Poly-Ribo-Seq experiments was strongly correlated (R<sup>2</sup> = 0.83), suggesting that Poly-Ribo-Seq is highly reproducible (<xref ref-type="fig" rid="fig2">Figure 2B</xref>).</p><p>The majority of reads in both our experiments and previous ribosomal profiling consist of rRNA sequences released during footprinting (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1A</xref>; <xref ref-type="bibr" rid="bib26">Ingolia et al., 2011</xref>). This reduces the depth of profiling and could preclude the detection of further smORFs. Therefore, we designed rRNA-depletion beads for use during footprint extraction (‘Materials and methods’, <xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2</xref>), which produced a marked improvement in the ratio of reads mapping to mRNAs (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1A</xref>) and increased the total number of ORFs detected (<xref ref-type="fig" rid="fig2s1">Figure 2—figure supplement 1A</xref>, −rRNA) but only expanded our overall catalogue of putatively translated smORFs by one (<xref ref-type="fig" rid="fig2">Figure 2D</xref>). The results of the three independent experiments overlap extensively, as 80% of putatively translated smORFs were detected in all three data sets (<xref ref-type="fig" rid="fig2">Figure 2D</xref>). 
By combining the three experiments, we provide evidence that 228 smORFs are translated out of 274 transcribed in S2 cells (83%), which is very similar to the proportion of standard length protein-coding ORFs translated (81%) (<xref ref-type="fig" rid="fig2">Figure 2E</xref>). These similar proportions may indicate the extent of translational regulation in S2 cells. Altogether, these data almost double the previous repertoire of translated smORFs in <italic>Drosophila</italic> from 164 to 285 (164 previously corroborated [<xref ref-type="fig" rid="fig1">Figure 1A</xref>] and 121 new translated smORFs).</p></sec><sec id="s2-3"><title>Validation of smORF translation</title><p>The high overlap of our experiments suggests that the results do not arise from artefactual random sampling of smORFs, but most likely from the detection of the bona-fide population of annotated smORFs translated in S2 cells. To confirm this and independently validate our data, we compared our results with peptidomics data (Peptide Atlas, <xref ref-type="bibr" rid="bib6">Brunner et al., 2007</xref>). Poly-Ribo-Seq increases nearly fourfold the number of smORFs with evidence of translation in S2 cells from 59 (Peptide Atlas) to 228 (Poly-Ribo-Seq). Poly-Ribo-Seq detects 86% of smORFs with Peptide Atlas evidence in S2 cells (51 out of 59 smORFs; <xref ref-type="fig" rid="fig2">Figure 2F</xref>), whilst only 8 smORFs with Peptide Atlas evidence are not shown to be translated by Poly-Ribo-Seq.</p><p>Detection of small peptides requires specific peptidomic methods (<xref ref-type="bibr" rid="bib5">Boerjan et al., 2010</xref>; <xref ref-type="bibr" rid="bib42">Slavoff et al., 2013</xref>), and this could have limited the number of smORF peptides detected in the generic proteomic experiments of Peptide Atlas. 
Therefore, we specifically searched for smORF peptides by performing mass spectrometry on two biological replicates of S2 cells after gel purifying small proteins 5 to 15 kDa in size, which corresponds to peptides predicted to be 45 to 130 aa in length. We detected a total of 60 annotated smORF peptides, of which 40 are not detected in Peptide Atlas S2 data sets (<xref ref-type="fig" rid="fig2">Figure 2F</xref>), thus bringing the combined pool of smORF peptides detected by proteomics in S2 cells to 99 (<xref ref-type="fig" rid="fig2s1">Figure 2—figure supplement 1B</xref>). Despite this increase, Poly-Ribo-Seq was still more extensive. Poly-Ribo-Seq revealed 228 smORFs as translated, including 90 of the 99 in the combined proteomics pool (<xref ref-type="fig" rid="fig2s1">Figure 2—figure supplement 1B</xref>), and 59 of the 60 peptides detected by us (<xref ref-type="fig" rid="fig2">Figure 2F</xref>). The Poly-Ribo-Seq RPKM values of smORFs detected by peptidomics are over three times as high as those that are not (<xref ref-type="fig" rid="fig2s1">Figure 2—figure supplement 1B</xref>), suggesting that mass spectrometry detects peptides arising from the most highly translated smORFs, as also observed by other authors (<xref ref-type="bibr" rid="bib6">Brunner et al., 2007</xref>; <xref ref-type="bibr" rid="bib4">Bazzini et al., 2014</xref>).</p><p>To further validate the results of our smORF Poly-Ribo-Seq, we designed a peptide-tagging transfection assay. smORF coding sequences were tagged with a C-terminal FLAG tag lacking its own start codon. The constructs contained the full smORF 5′-UTR (which includes the Kozak and other sequences regulating translation [<xref ref-type="bibr" rid="bib30">Kozak, 2005</xref>]) (<xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1A</xref>). 
The resulting construct was transfected into S2 cells, where any FLAG signal would therefore be the result of smORF translation (<xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1A,B</xref>). Transfection and staining of S2 cells with these FLAG-tagged smORFs confirmed the translation of all 12 smORFs tested, which exhibit a range of translational indicators (<xref ref-type="fig" rid="fig3">Figure 3A,B</xref>; <xref ref-type="table" rid="tbl3">Table 3</xref>) and peptidomics evidence, indicating that even lower levels of translation can give rise to smORF peptides detectable by this tagging method. Immunoblotting confirmed the expected sizes of the tagged peptides (<xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1C</xref>). Tagged peptides exhibit distinct subcellular localisations, which are suggestive of different peptide functions (<xref ref-type="fig" rid="fig3">Figure 3A,B</xref>; <xref ref-type="table" rid="tbl3">Table 3</xref>). Six smORFs display a reticular distribution resembling mitochondria, an inference supported by their co-localization with the mitochondrial marker Mitotracker Red (<xref ref-type="fig" rid="fig3">Figure 3A</xref> and <xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1D,E</xref>; <xref ref-type="table" rid="tbl3">Table 3</xref>) and the available information from homologues of two ‘corroborated’ smORFs in this group, CG32230 and CG14482 (<xref ref-type="bibr" rid="bib44">Tripoli et al., 2005</xref>). 
Six smORFs exhibited other types of anisotropic cytoplasmic localisation, similarly to ER-expressed Sarcolamban smORF (<xref ref-type="bibr" rid="bib36">Magny et al., 2013</xref>), indicating that they may localise to other cytoplasmic compartments (<xref ref-type="fig" rid="fig3">Figure 3B</xref> and <xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1D,E</xref>; <xref ref-type="table" rid="tbl3">Table 3</xref>).<fig-group><fig id="fig3" position="float"><object-id pub-id-type="doi">10.7554/eLife.03528.010</object-id><label>Figure 3.</label><caption><title>Validation of smORF translation by tagging assay.</title><p>(<bold>A</bold>–<bold>D</bold>) Ribosome footprints from small polysomes (pink) and mRNA reads (grey) mapped to smORFs, along with transcript and ORF models of (<bold>A</bold>) CG7630, (<bold>B</bold>) CG33774, (<bold>C</bold>) CR30055 (ncRNA), and (<bold>D</bold>) FBtr0072084_1 (uORF). Corresponding transfection assays in S2 cells are shown (FLAG antibody: green, F-actin stained with phalloidin: red, scale bars = 5 μm) together with Poly-Ribo-Seq metrics (RPKM, coverage and TE). Distribution of each peptide (reticular, other cytoplasmic or limited) is indicated.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.010">http://dx.doi.org/10.7554/eLife.03528.010</ext-link></p></caption><graphic xlink:href="elife03528f003"/></fig><fig id="fig3s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03528.011</object-id><label>Figure 3—figure supplement 1.</label><caption><title>Validation of smORF translation by tagging assay.</title><p>(<bold>A</bold>) Schematic of the transfection construct into which smORF 5′-UTRs and ORFs (no stop codon) were cloned under the Actin promoter, such as to be fused in frame to a C-terminal FLAG tag, with its own AUG start codon mutated to GCG. 
(<bold>B</bold>) Transfection negative controls: a plasmid with no ORF (nor AUG), a plasmid with the full-length <italic>tal</italic> transcript (minus 3′-UTR) with ORF-B tagged with FLAG, which has previously been shown not to be translated (<xref ref-type="bibr" rid="bib19">Galindo et al., 2007</xref>), and a plasmid containing a putative smORF that is transcribed but not translated according to our Poly-Ribo-Seq (Uhg2-ORF1). (<bold>C</bold>) Immunoblot showing translation of FLAG-tagged smORFs (<xref ref-type="table" rid="tbl3">Table 3</xref>) corresponding to predicted sizes, along with β-tubulin loading control. (<bold>D</bold>) Different subcellular localisations of FLAG-tagged smORFs (green) corroborated by double staining with Mitotracker Red (red): “mitochondrial”, “other cytoplasmic” and “limited” (scale bar = 5 μm). (<bold>E</bold>) Correlation analysis of colocalisation between FLAG-tagged smORF peptides and Mitotracker Red; error bars represent SD from three experiments. (<bold>F</bold>) 50% of S2-cell translated smORFs show function in previous RNAi screens (Flymine). (<bold>G</bold>) Translation of FLAG-tagged pncr009:3L (ncRNA) ORFs 1, 2, and 3 in transfection assay with translational metric values shown (FLAG antibody: green, F-actin stained with phalloidin: red, scale bars = 5 μm). (<bold>H</bold>) Immunoblot showing detection of FLAG-tagged ORFs from pncr009:3L and CR30055 with predicted sizes (<xref ref-type="table" rid="tbl4">Table 4</xref>), along with β-tubulin loading control. 
(<bold>I</bold>) Translation of FLAG-tagged uORFs FBtr0072210_1 and FBtr0081720_1 in transfection assays with translational metric values shown (FLAG antibody: green, F-actin stained with phalloidin: red, scale bars = 5 μm).</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.011">http://dx.doi.org/10.7554/eLife.03528.011</ext-link></p></caption><graphic xlink:href="elife03528fs004"/></fig><fig id="fig3s2" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03528.012</object-id><label>Figure 3—figure supplement 2.</label><caption><title>Poly-Ribo-Seq reveals translation of ORFs in ncRNAs.</title><p>(<bold>A</bold>) Read density plot showing phasing of ribosome footprinting reads in the frame of smORFs within CR30055 and pncr009:3L detected as translated and confirmed by FLAG immunofluorescence translation assay. (<bold>B</bold>) Correlation of reads obtained by ORFs after Poly-Ribo-Seq (y axis) with reads obtained by sequencing of polysomal fractions before ribosome footprinting (x axis). The correlation is much stronger for canonical long ORFs and putative smORFs (grey) than for ncRNA ORFs (red). 
Many ncRNA ORFs below the 11.8 RPKM cut-off used to ascertain translation (green dotted line) can show association with polysomes (high Polysomal RNA RPKM), thus translation of ORFs in ncRNAs does not simply stem from non-coding association with polysomes.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.012">http://dx.doi.org/10.7554/eLife.03528.012</ext-link></p></caption><graphic xlink:href="elife03528fs005"/></fig></fig-group><table-wrap id="tbl3" position="float"><object-id pub-id-type="doi">10.7554/eLife.03528.013</object-id><label>Table 3.</label><caption><p>Summary of tagged annotated smORFs</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.013">http://dx.doi.org/10.7554/eLife.03528.013</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th/><th>Localization</th><th>Peptidomic evidence</th><th># aa</th><th>RPKM</th><th>Coverage</th><th align="center">TE</th><th align="center">PhastCons</th></tr></thead><tbody><tr><td><bold>CG32230</bold></td><td><bold>Mitochondrial</bold></td><td><bold>Yes</bold></td><td><bold>83</bold></td><td><bold>539.2</bold></td><td><bold>1.00</bold></td><td align="center"><bold>3.05</bold></td><td align="center"><bold>0.54</bold></td></tr><tr><td><bold>CG14482</bold></td><td><bold>Mitochondrial</bold></td><td><bold>Yes</bold></td><td><bold>57</bold></td><td><bold>600.0</bold></td><td><bold>1.00</bold></td><td align="center"><bold>1.09</bold></td><td align="center"><bold>0.72</bold></td></tr><tr><td>CG44242</td><td>Mitochondrial</td><td>Yes</td><td>70</td><td>152.9</td><td>0.97</td><td align="center">1.75</td><td align="center">0.66</td></tr><tr><td>CG7630</td><td>Mitochondrial</td><td>Yes</td><td>90</td><td>702.2</td><td>1.00</td><td align="center">1.05</td><td align="center">0.64</td></tr><tr><td>CG33199</td><td>Mitochondrial</td><td>No</td><td>79</td><td>95.5</td><td>1.00</td><td align="center">1.17</td><td 
align="center">0.59</td></tr><tr><td>CG32582</td><td>Mitochondrial</td><td>No</td><td>52</td><td>16.5</td><td>0.57</td><td align="center">2.82</td><td align="center">0.51</td></tr><tr><td><bold>sclA</bold></td><td><bold>Other cytoplasmic</bold></td><td><bold>NA</bold></td><td><bold>28</bold></td><td><bold>NA</bold></td><td><bold>NA</bold></td><td align="center"><bold>NA</bold></td><td align="center"><bold>NA</bold></td></tr><tr><td><bold>CG12384</bold></td><td><bold>Other cytoplasmic</bold></td><td><bold>Yes</bold></td><td><bold>96</bold></td><td><bold>205.6</bold></td><td><bold>1.00</bold></td><td align="center"><bold>1.37</bold></td><td align="center"><bold>0.71</bold></td></tr><tr><td>CG33774</td><td>Other cytoplasmic</td><td>No</td><td>40</td><td>115.3</td><td>1.00</td><td align="center">1.13</td><td align="center">0.73</td></tr><tr><td>CG33170</td><td>Other cytoplasmic</td><td>No</td><td>71</td><td>84.2</td><td>0.84</td><td align="center">0.75</td><td align="center">0.60</td></tr><tr><td>CG34200</td><td>Limited</td><td>Yes</td><td>52</td><td>331.7</td><td>1.00</td><td align="center">1.66</td><td align="center">0.54</td></tr><tr><td>CG32267</td><td>Limited</td><td>Yes</td><td>49</td><td>82.5</td><td>0.97</td><td align="center">1.13</td><td align="center">0.70</td></tr><tr><td>CG33155</td><td>Limited</td><td>No</td><td>60</td><td>33.8</td><td>0.64</td><td align="center">0.88</td><td align="center">0.67</td></tr><tr><td>tal-B</td><td>None</td><td>NA</td><td>49</td><td>NA</td><td>NA</td><td align="center">NA</td><td align="center">NA</td></tr></tbody></table><table-wrap-foot><fn><p>Details of the Poly-Ribo-Seq and transfection translation assay results for the FLAG-tagged smORFs, with RPKM, coverage and TE values. Previously corroborated smORFs (according to <xref ref-type="fig" rid="fig1">Figure 1A</xref>) are in bold. 
Scl is a positive control and tal-B is a negative control, but neither is endogenously transcribed in S2 cells, hence the ‘NA’ values for Poly-Ribo-Seq metrics and peptidomic evidence.</p></fn></table-wrap-foot></table-wrap></p><p>The putative functionality of smORF peptides is further supported by over half of the translated smORFs having revealed a function in high-throughput S2 cell RNAi screens in previous studies (<xref ref-type="bibr" rid="bib39">Schmidt et al., 2012</xref>; <xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1F</xref>). The biological relevance of smORFs is also implied by the transcription of 196 of these translated smORFs in embryos (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1B</xref>). 88 of these smORFs are transcribed throughout the whole of embryogenesis, which might be indicative of a basic cellular or housekeeping role, whereas 47 have stage-specific expression, perhaps indicative of a developmental role.</p></sec><sec id="s2-4"><title>Other sources of smORFs</title><p>Non-annotated smORFs were also scrutinized by Poly-Ribo-Seq. Many putative ncRNAs have been annotated as such because no long ORFs have been detected, but they can still contain smORFs. We looked for evidence of translation in 6438 ORFs that initiate with an AUG start codon within ncRNAs. Our total cytoplasmic mRNA data indicate that 125 ncRNA transcripts (containing 918 different ORFs) are transcribed and present in the cytoplasm of S2 cells. 313 smORFs in these transcripts appear translated by Poly-Ribo-Seq (<xref ref-type="fig" rid="fig2">Figure 2E</xref>), but ncRNA smORFs behaved differently from protein-coding and smORF genes. The median translation efficiency of these putative smORFs within ncRNA genes is lower than for canonical genes and annotated smORFs, and in fact is similar to UTRs (<xref ref-type="table" rid="tbl2">Table 2</xref>). 
In addition, we could neither observe nor obtain peptidomics corroboration for the encoded peptides, and the FLAG signal is limited for the majority of such smORFs tested in the transfection assay (<xref ref-type="fig" rid="fig3">Figure 3C</xref>, <xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1G</xref>). Yet a sizeable fraction (34%) of non-coding RNA smORFs displayed Poly-Ribo-Seq metrics above our cut-off values, indicating translation (<xref ref-type="fig" rid="fig2">Figure 2E</xref>; <xref ref-type="table" rid="tbl4">Table 4</xref>), and some can display FLAG and Western blot signal similar to annotated smORFs (<xref ref-type="fig" rid="fig3">Figure 3C</xref>, <xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1G,H</xref>; <xref ref-type="table" rid="tbl4">Table 4</xref>). Further, these positive-testing smORFs from non-coding RNAs show codon read-phasing (<xref ref-type="fig" rid="fig3s2">Figure 3—figure supplement 2A</xref>). These translation events do not necessarily represent ‘background’ translation of non-coding RNAs associated with polysomes. The comparison of ribosomal footprinting reads with reads resulting from the sequencing of RNA from polysomal fractions before footprinting (as in polysomal profiling) shows a high correlation for canonical coding sequences as expected (<xref ref-type="bibr" rid="bib43">Smith et al., 2014</xref>), but not for non-coding RNAs, where high polysomal RNA-Seq counts do not necessarily result in significant footprinting (<xref ref-type="fig" rid="fig3s2">Figure 3—figure supplement 2B</xref>). 
Altogether, our results suggest that a proportion of these so-called non-coding RNA genes actually contain smORFs that are actively translated in S2 cells.<table-wrap id="tbl4" position="float"><object-id pub-id-type="doi">10.7554/eLife.03528.014</object-id><label>Table 4.</label><caption><p>Summary of tagged smORFs from non-coding RNAs and uORFs</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.014">http://dx.doi.org/10.7554/eLife.03528.014</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th>smORF</th><th>Localization</th><th>Peptidomic evidence</th><th># aa</th><th>RPKM</th><th>Coverage</th><th>TE</th><th>PhastCons</th></tr></thead><tbody><tr><td>pncr009:3L ORF1</td><td>Other cytoplasmic</td><td>No</td><td align="char" char=".">21</td><td align="char" char=".">135.7</td><td align="char" char=".">1.00</td><td align="char" char=".">0.29</td><td align="char" char=".">0.44</td></tr><tr><td>pncr009:3L ORF2</td><td>Limited</td><td>No</td><td align="char" char=".">30</td><td align="char" char=".">64.7</td><td align="char" char=".">0.58</td><td align="char" char=".">0.63</td><td align="char" char=".">0.49</td></tr><tr><td>pncr009:3L ORF3</td><td>Limited</td><td>No</td><td align="char" char=".">33</td><td align="char" char=".">47.8</td><td align="char" char=".">0.78</td><td align="char" char=".">0.23</td><td align="char" char=".">0.59</td></tr><tr><td>CR30055 ORF1</td><td>Not tested</td><td>No</td><td align="char" char=".">12</td><td align="char" char=".">15.2</td><td align="char" char=".">0.71</td><td align="char" char=".">1.24</td><td align="char" char=".">0.49</td></tr><tr><td>CR30055 ORF2</td><td>Mitochondrial</td><td>No</td><td align="char" char=".">53</td><td align="char" char=".">26.1</td><td align="char" char=".">0.66</td><td align="char" char=".">0.83</td><td align="char" char=".">0.52</td></tr><tr><td>CR30055 ORF3</td><td>Not tested</td><td>No</td><td align="char" char=".">36</td><td align="char" 
char=".">54.6</td><td align="char" char=".">0.85</td><td align="char" char=".">2.90</td><td align="char" char=".">0.55</td></tr><tr><td>CR30055 ORF4</td><td>Limited</td><td>No</td><td align="char" char=".">17</td><td align="char" char=".">30.0</td><td align="char" char=".">0.64</td><td align="char" char=".">NA</td><td align="char" char=".">0.54</td></tr><tr><td>CR30055 ORF5</td><td>Limited</td><td>No</td><td align="char" char=".">56</td><td align="char" char=".">28.0</td><td align="char" char=".">0.85</td><td align="char" char=".">3.7</td><td align="char" char=".">0.55</td></tr><tr><td>Uhg2-ORF1</td><td>None</td><td>No</td><td align="char" char=".">36</td><td align="char" char=".">10.5</td><td align="char" char=".">0.27</td><td align="char" char=".">0.83</td><td align="char" char=".">0.54</td></tr><tr><td>FBtr0072084_1</td><td>Reticular</td><td>No</td><td align="char" char=".">14</td><td align="char" char=".">46.8</td><td align="char" char=".">0.76</td><td align="char" char=".">4.35</td><td align="char" char=".">0.52</td></tr><tr><td>FBtr0072210_1</td><td>Other cytoplasmic</td><td>No</td><td align="char" char=".">13</td><td align="char" char=".">97.7</td><td align="char" char=".">0.92</td><td align="char" char=".">4.34</td><td align="char" char=".">0.48</td></tr><tr><td>FBtr0081720_1</td><td>Limited</td><td>No</td><td align="char" char=".">11</td><td align="char" char=".">121.3</td><td align="char" char=".">1.00</td><td align="char" char=".">2.39</td><td align="char" char=".">0.55</td></tr></tbody></table><table-wrap-foot><fn><p>Details of the Poly-Ribo-Seq and transfection translation assay results for the FLAG-tagged smORFs translated from non-coding RNAs and uORFs, with RPKM, coverage and TE values.</p></fn></table-wrap-foot></table-wrap></p><p>Upstream short ORFs, or uORFs, have been described in more than 50% of annotated mammalian transcripts encoding canonical, long ORFs (<xref ref-type="bibr" rid="bib18">Fritsch et al., 2012</xref>). 
We identified 14,881 uORFs with AUG start codons, within 11,587 5′-UTRs of 28,529 FlyBase annotated transcripts. 9069 of these uORFs were transcribed in S2 cells and of these 2708 (30%) are footprinted by ribosomes (<xref ref-type="fig" rid="fig2">Figure 2E</xref>). Similarly to smORFs in putative non-coding RNAs, translated uORFs display lower median TE than canonical ORFs and translated smORFs (<xref ref-type="table" rid="tbl2">Table 2</xref>), and they are not detected by peptidomics, altogether suggesting low abundance of the encoded peptides. However, tagging of uORFs can occasionally show similar signal to annotated smORFs (<xref ref-type="fig" rid="fig3">Figure 3D</xref>, <xref ref-type="fig" rid="fig3s1">Figure 3—figure supplement 1I</xref>).</p></sec><sec id="s2-5"><title>Bioinformatic analysis of translated smORFs reveals specific characteristics</title><p>We scrutinised our set of annotated translated smORFs for bioinformatic markers, which might further suggest function of smORF peptides. Firstly, we used phastCons (<xref ref-type="bibr" rid="bib41">Siepel et al., 2005</xref>) that measures conservation between 12 insect species.</p><p>We examined the phastCons values in intergenic sequences and canonical long protein-coding sequences (<xref ref-type="fig" rid="fig4">Figure 4A</xref>) and obtained a cut-off value of 0.55 separating them (10% FDR). 
93% of S2-translated smORFs have phastCons scores above this threshold (median = 0.66), indicating a conservation level similar to that of canonical long-ORFs, and hence, a similar level of functionality for the coding sequences.<fig-group><fig id="fig4" position="float"><object-id pub-id-type="doi">10.7554/eLife.03528.015</object-id><label>Figure 4.</label><caption><title>Bioinformatic indicators of smORFs.</title><p>(<bold>A</bold>) Distribution of phastCons scores for intergenic regions, standard length protein-coding CDSs (longer than 100 aa), S2 cell-translated annotated smORFs, and all annotated smORFs, with fitted normal curves. Green dotted lines indicate the 90th percentile of intergenic phastCons scores (0.55). (<bold>B</bold>) Relative abundance of particular amino acids in proteins (random expected: black, all CDSs: purple, all annotated smORFs: yellow, and translated smORFs: red). (<bold>C</bold> and <bold>D</bold>) Proportion of (<bold>C</bold>) S2-cell translated (32%) and (<bold>D</bold>) all smORFs (32%) predicted to contain transmembrane α helices (TMHMM). (<bold>E</bold> and <bold>F</bold>) Frequency distribution of smORF peptide lengths for (<bold>E</bold>) translated and (<bold>F</bold>) all annotated smORFs with medians shown by red dotted line.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.015">http://dx.doi.org/10.7554/eLife.03528.015</ext-link></p></caption><graphic xlink:href="elife03528f004"/></fig><fig id="fig4s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03528.016</object-id><label>Figure 4—figure supplement 1.</label><caption><title>Bioinformatic indicators of smORFs.</title><p>(<bold>A</bold>) Relative abundance of all amino acids in ORFs, (random: grey, all CDS: purple, all annotated smORFs: yellow, and translated annotated smORFs: red). 
(<bold>B</bold>) Enrichment of GO molecular function terms (GOrilla) within translated annotated smORFs in S2 cells when compared to translated standard protein-coding ORFs. Main overrepresented terms are structural constituents of ribosome (p = 3.28E-4), oxidoreductase activity and transmembrane transporter activity (p = 2.77E-5). (<bold>C</bold>–<bold>D</bold>) Frequency distribution of peptide lengths, phastCons, and relative abundance of particular amino acids of translated (<bold>C</bold>) uORFs and (<bold>D</bold>) ncRNA ORFs. Red dotted lines indicate the median amino acid lengths and green dotted lines indicate the 90th percentile cut-off from phastCons of intergenic regions, 0.55 (<xref ref-type="fig" rid="fig4">Figure 4A</xref>).</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.016">http://dx.doi.org/10.7554/eLife.03528.016</ext-link></p></caption><graphic xlink:href="elife03528fs006"/></fig></fig-group></p><p>As a further indicator of smORF translation, we studied the amino acid composition of translated smORFs, compared to canonical long proteins and expected random usage (<xref ref-type="fig" rid="fig4">Figure 4B</xref>). Annotated smORFs display a lower than random usage of arginine, which is a hallmark of translated proteins (<xref ref-type="bibr" rid="bib28">King and Jukes, 1969</xref>). However, they also display differential usage of several amino acids, which are characteristic of alpha-helices in canonical proteins, being enriched for lysine and phenylalanine, and depleted of serine (<xref ref-type="bibr" rid="bib9">Chou and Fasman, 1974</xref>; <xref ref-type="fig" rid="fig4">Figure 4B</xref>, <xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1A</xref>). 
This finding was corroborated by an abundance of putative transmembrane alpha-helix motifs, in about a third of translated smORFs (<xref ref-type="fig" rid="fig4">Figure 4C</xref>) and all predicted smORFs (<xref ref-type="fig" rid="fig4">Figure 4D</xref>) compared to the expected 20% observed in canonical proteins (<xref ref-type="bibr" rid="bib31">Krogh et al., 2001</xref>). This is in agreement with similar findings in bacteria (<xref ref-type="bibr" rid="bib23">Hemm et al., 2008</xref>) and suggests that smORFs may represent a source of uncharacterised transmembrane peptides. An enrichment for molecular GO terms such as membrane transporter activity in annotated smORFs (<xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1B</xref>), and the subcellular localisations we observe for half of the tagged smORFs, are also consistent with these findings.</p><p>The 555 annotated smORFs not transcribed in S2 cells, including 505 smORFs with uncorroborated translation, share the bioinformatic characteristics of the smORFs detected as translated by Poly-Ribo-Seq including: average peptide length (<xref ref-type="fig" rid="fig4">Figure 4E–F</xref>); amino acid usage (<xref ref-type="fig" rid="fig4">Figure 4B</xref>, <xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1A</xref>); and abundance of putative transmembrane alpha-helices (<xref ref-type="fig" rid="fig4">Figure 4C,D</xref>). Therefore, our results showing translation and possible peptide function for 83% of the smORFs transcribed in S2 cells could be extrapolated to this wider pool, potentially bringing the number of smORFs encoding functional peptides in <italic>Drosophila</italic> to around 700.</p><p>The peptides encoded by uORFs and ncRNAs did not behave bioinformatically as smORFs and canonical long CDS, and thus we cannot easily extrapolate from our results to the uORFs and ncRNAs not transcribed in S2 cells. 
No indicator (phastCons, size, aa composition) was able to distinguish translated uORFs and ncRNAs from intergenic or random sequences (<xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1C–D</xref>). Even though our Poly-Ribo-Seq detects translation of these uORFs and ncRNAs, perhaps the function of these smORFs is not mediated by their encoded peptides, or at least is compatible with a shorter, more variable and less canonical amino acid sequence (‘Discussion’). It appears that Poly-Ribo-Seq detects translation of two types of smORF: (a) ‘longer’ smORFs that are efficiently translated producing peptides ∼80 aa possessing bioinformatic hallmarks of peptide function; and (b) ‘dwarf’ smORFs translated from ncRNAs and 5′-UTRs, which are in general shorter (∼20 aa), less efficiently translated, and missing such bioinformatic and molecular markers.</p></sec></sec><sec id="s3" sec-type="discussion"><title>Discussion</title><sec id="s3-1"><title>smORFs are translated in high numbers in metazoans</title><p>We have developed an improvement to ribosome profiling, which we term Poly-Ribo-Seq, to ensure that footprinted mRNA sequences represent regions of active translation rather than non-productive events. Using Poly-Ribo-Seq, we have specifically profiled the translation of smORFs in <italic>Drosophila</italic> S2 cells, using a purification of small polysomes to enrich for smORFs.</p><p>Poly-Ribo-Seq doubles the number of annotated smORFs in <italic>Drosophila</italic> S2 cells with evidence of translation from 107 to 228. Translated smORFs seem similar to canonical proteins, both in terms of the fraction actually translated (over 80% in both cases, <xref ref-type="fig" rid="fig2">Figure 2E</xref>) and the level of translation (as revealed by RPKM, coverage, and TE). 
Extrapolating the proportion of smORFs translated in S2 cells (83%) to the uncorroborated 544 smORFs transcribed elsewhere indicates that altogether around 700 annotated smORFs could be translated, which can be tested in further Poly-Ribo-Seq experiments.</p><p>The annotation of the <italic>Drosophila</italic> genome is unusual in its high proportion of annotated smORFs, which is double that of vertebrate genomes (<xref ref-type="table" rid="tbl1">Table 1</xref>). Thus, examination of smORF translation in vertebrate genomes is likely to significantly increase the proteome of these species. Consistent with this prediction, the translation of 190 smORFs has been detected in one such experiment in zebrafish (<xref ref-type="bibr" rid="bib4">Bazzini et al., 2014</xref>). Extrapolation of these two sets of results in flies and zebrafish strongly suggests that hundreds of smORFs are translated in higher organisms, including in humans and other mammals. While bioinformatics can be useful in unearthing smORF candidates, ultimately experimental evidence and full functional characterisation are the only way to ascertain the translation and function of each individual smORF.</p><p>Our data confirm the tentative annotation of <italic>Drosophila</italic> smORFs and expand our understanding of smORF translation. Firstly, we corroborate the translation of 121 smORFs for which no previous conclusive evidence existed. Secondly, we fail to detect the translation of some 46 transcribed smORFs, most of which have no evidence of translation, and which (pending new profiling experiments from further biological sources) could be either translationally regulated or non-functional. Thirdly, we also detect the translation of a high number of non-annotated smORFs from 5′-UTRs and non-coding RNAs.</p><p>313 ncRNA smORFs and 2708 uORFs are detected as translated by Poly-Ribo-Seq, although perhaps at low levels, as indicated by their low median TEs. 
Given the small proportion of non-coding RNAs we detect in S2 cells, it is unclear if the numbers observed here can be extrapolated to the smORFs found in all other ncRNAs. However, we note that, if extrapolated to other animals, such as mammals including humans, even a low number of translated ncRNAs would be significant. The human genome contains some 32,000 transcripts currently annotated as long non-coding RNAs (<xref ref-type="bibr" rid="bib45">Volders et al., 2013</xref>), and around 44% of long non-coding RNAs have been detected in the cytoplasm (<xref ref-type="bibr" rid="bib11">Derrien et al., 2012</xref>). If these cytoplasmic lncRNAs were corroborated and translated in the same proportion as we find here in <italic>Drosophila</italic> (some 34%), there could be thousands of human peptides awaiting detection and characterisation. The number swells to many thousands when uORFs are added. The small size of both uORF and ncORF peptides may hinder their detection by mass spectrometry, as in the case of peptides translated by the <italic>Drosophila</italic> genes <italic>tal</italic> and <italic>scl</italic> (unpublished observation), and so absence of proteomic detection (or absence of functional data, see below) should not be used to disprove their translation.</p><p>Poly-Ribo-Seq detects almost all peptides detected by proteomics, but is two to three times more extensive. Furthermore, Poly-Ribo-Seq can define the whole of the translated ORF as opposed to isolated micropeptides detected by peptidomics. Though available proteomic evidence has been useful in confirming the depth of Poly-Ribo-Seq, it is clear that currently peptidomics is not as sensitive as ribosome profiling. High peptide translation level seems a critical factor (but likely not the only one) favouring detection by mass spectrometry. 
However, the combination of Poly-Ribo-Seq and peptidomics could produce interesting data on peptide stability and degradation.</p></sec><sec id="s3-2"><title>Function of smORFs: translation and beyond?</title><p>The putative function of smORFs and their encoded peptides is a separate issue from their translation, just as the transcription of thousands of apparently non-coding RNAs is an accepted fact separate from their as yet not fully understood function. Our present work is concerned with proving smORF translation, as a first step to eventually uncovering their true function. However, the function of a number of smORFs has been identified in animals and plants (reviewed in <xref ref-type="bibr" rid="bib1">Andrews and Rothnagel, 2014</xref>), and our data allow for some speculations.</p><p>We observe that most annotated and translated smORFs have conservation levels similar to canonical proteins and have so far displayed functionality in RNAi tests in some 50% of the cases. They encode peptides of around 80 aa with a high proportion of transmembrane alpha-helix motifs, in agreement with their overall pattern of amino acid composition. Their bioinformatic indicators (conservation, size, aa usage, and transmembrane motifs) appear similar in the 505 uncorroborated smORFs not transcribed in S2 cells, suggesting that, if translated, some 700 smORFs might encode peptides with similar functional potential. The corroborated smORFs display a variety of functions as antibacterial peptides (<xref ref-type="bibr" rid="bib35">Lemaitre and Hoffmann, 2007</xref>), cell signals (<xref ref-type="bibr" rid="bib38">Pueyo and Couso, 2008</xref>), cytoskeletal regulators (<xref ref-type="bibr" rid="bib12">Djakovic et al., 2006</xref>), and other regulators of canonical proteins (<xref ref-type="bibr" rid="bib22">Hanyu-Nakamura et al., 2008</xref>; <xref ref-type="bibr" rid="bib36">Magny et al., 2013</xref>). 
GO term enrichment analysis shows that the most abundant terms amongst previously corroborated smORFs are ribosome constituents, oxidoreductase activity, and transporter activity (<xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1B</xref>). The last two imply an association with biological membranes, and it is thus interesting that amongst uncorroborated smORFs, the frequency of predicted transmembrane alpha-helix motifs is the highest. Furthermore, the observed tagged smORF peptide localisation to mitochondria is also compatible with these findings. We surmise that hundreds of small, membrane-associated peptides are awaiting characterisation, and they may alter our understanding of many cellular and organismal processes of biological and medical relevance. We expect that these peptides would interact with canonical proteins as their regulators (<xref ref-type="bibr" rid="bib36">Magny et al., 2013</xref>), as their size limits their structural capabilities, although such capabilities could be expanded by oligomerisation.</p><p>The main described function of most uORFs is to regulate translation of downstream long ORFs, either interfering with the translation of the downstream ORF, or collecting ribosomes in order to promote it. The translated uORF peptide (and hence its aa sequence) can be irrelevant ([<xref ref-type="bibr" rid="bib8">Child et al., 1999</xref>; <xref ref-type="bibr" rid="bib32">Kulkarni et al., 2011</xref>] reviewed in <xref ref-type="bibr" rid="bib30">Kozak, 2005</xref>; <xref ref-type="bibr" rid="bib1">Andrews and Rothnagel, 2014</xref>). Such a <italic>cis</italic>-regulatory role would fit with the low TE and the high sequence variability (low phastCons scores) observed for uORFs. However, we do not observe an overall positive or negative correlation between the translation of uORFs and that of their main downstream ORF (unpublished observation), as perhaps could be expected under this <italic>cis</italic>-regulatory model. 
Validating uORF function is difficult, since much of the current functional (genetics) data is based on conditions (gene deficiencies, promoter mutations, RNAi) that knock out whole transcripts, that is, both the short uORF peptide and the downstream long-canonical protein.</p><p>The smORFs in putative long non-coding RNAs with ribosomal signatures might reveal either coding potential, a non-coding association with ribosomes, or a dual function as coding and non-coding for these transcripts. Again, standard genetic techniques are unable to distinguish between these possibilities. This group of smORFs has the potential to encode functional peptides, as shown by the genes <italic>tal</italic> and <italic>scl</italic>, which were previously annotated as putative non-coding RNAs. These genes encode peptides as short as 11 and 28 aa with important functions in development and physiology (<xref ref-type="bibr" rid="bib19">Galindo et al., 2007</xref>; <xref ref-type="bibr" rid="bib29">Kondo et al., 2007</xref>; <xref ref-type="bibr" rid="bib36">Magny et al., 2013</xref>). In the case of Scl, a transmembrane alpha-helical structure was corroborated (<xref ref-type="bibr" rid="bib36">Magny et al., 2013</xref>). Further, a manual study of their homologies identified their conservation in distant species. Thus, a case-by-case bioinformatic and experimental examination of ncRNA smORFs may reveal unknown numbers of new bioactive peptides.</p><p>Altogether our data indicate that thousands of smORFs are translated in metazoan genomes. However, they also suggest the existence of two broad types of translated smORFs. The ‘longer’ smORFs produce conserved 80 aa-long peptides whose translation efficiencies resemble those of canonical proteins and with functions biased towards an association with cell membranes. The ‘dwarf’ smORFs are mostly found in 5′-UTRs and non-coding RNAs, are not detected by gene prediction programs, and on average encode peptides of some 20 aa. 
They are also less conserved, translated at lower efficiencies, and of unclear function as yet.</p></sec></sec><sec id="s4" sec-type="materials|methods"><title>Materials and methods</title><sec id="s4-1"><title>Tissue culture</title><p>S2 cells were grown under standard conditions in Schneider's medium with 10% FBS.</p></sec><sec id="s4-2"><title>Poly-Ribo-Seq</title><p>S2 cells were treated with cycloheximide (Sigma, St Louis, MO) at 100 μg/ml for 3 min at RT before harvesting. The cells were pelleted, washed (1X PBS, 100 μg/ml cycloheximide), and resuspended in lysis buffer: 50 mM Tris–HCl pH8, 150 mM NaCl, 10 mM MgCl<sub>2</sub>, 1 mM DTT, 1% NP40, 100 µg/ml cycloheximide, Turbo DNase (Life Technologies, Carlsbad, CA), RNasin Plus RNase Inhibitor (Promega, Carlsbad, CA), cOmplete Protease Inhibitor (Roche). Nuclei were removed, and cytoplasmic lysates were loaded onto sucrose gradients and subjected to ultracentrifugation. Gradients were pumped out, their absorbance at 254 nm was plotted, and they were fractionated. We purified mRNAs in small polysomes, away from monosomes (80S), ribosomal subunits (40S, 60S), and large polysomes. Footprinting was performed overnight at 4°C with RNaseI (Life Technologies), stopped with SUPERase·In RNase inhibitor (Life Technologies) and precipitated. mRNA from total cytoplasmic lysate was purified using oligo (dT) Dynabeads (Life Technologies) and fragmented by alkaline hydrolysis. 28–34 nt ribosome footprints and 50–80 nt mRNA fragments were gel purified and prepared as previously described (<xref ref-type="bibr" rid="bib25">Ingolia et al., 2009</xref>, <xref ref-type="bibr" rid="bib26">2011</xref>, <xref ref-type="bibr" rid="bib24">2012</xref>) for Next Generation Sequencing. 
Libraries were sequenced on Illumina HiSeq2000 and MiSeq machines with a 50 bp single-end read protocol.</p></sec><sec id="s4-3"><title>rRNA depletion</title><p>To generate ssDNA complementary to <italic>Drosophila</italic> rRNA, PCRs were performed using 5′ biotinylated reverse primers (<xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2</xref>). A 5′ biotinylated oligo complementary to 2S rRNA and rRNA PCR products were bound to magnetic streptavidin beads (Life Technologies) and their second strands washed away. Two rounds of 50 µl rRNA beads were used to deplete rRNA prior to reverse transcription.</p></sec><sec id="s4-4"><title>RT-PCR</title><p>RNA from sucrose gradient fractions was precipitated with isopropanol and 0.3 M NaCl. Resuspended pellets were treated with Turbo DNaseI (Life Technologies), extracted with phenol/chloroform and re-precipitated. cDNA was synthesised with MMLV reverse transcriptase (Promega) and subjected to PCR with mRNA specific primers and Taq Polymerase (Qiagen, Venlo, Netherlands).</p></sec><sec id="s4-5"><title>Footprint sequence alignment</title><p>Sequencing reads were clipped, trimmed, and aligned to an rRNA and tRNA reference using Bowtie, discarding the rRNA and tRNA alignments and collecting unaligned reads. Unaligned reads were mapped to FlyBase (Release 5.50) using TopHat. We only retained reads that mapped uniquely, but allowed up to two mismatches.</p></sec><sec id="s4-6"><title>Footprint profile analysis</title><p>Profiles of ribosome footprints across a transcript were constructed by quantifying the number of footprint reads aligned at each position within the feature of interest. Ribosome density was computed by scaling read counts for each feature by feature length and by the total number of genome-aligned reads (<xref ref-type="bibr" rid="bib25">Ingolia et al., 2009</xref>). 
Footprint coverage, the percentage of each feature covered by ribosome footprints, was estimated using the BEDTools coverageBed command.</p><p>We applied several filters to ascertain translation. We took the reads in the 3′-UTRs of mRNAs encoding canonical coding sequences (longer than 100 aa), from the small polysomal fraction, as representing ‘background’, that is, likely non-coding sequences from mRNAs that are lowly translated. We obtained the RPKM and coverage values from this canonical 3′-UTR signal, and used their 90th percentile values as preliminary cut-offs for accepting translation of coding sequences. These values are 11.8 RPKM and 0.40 coverage. Superimposed onto these, we applied two additional corrections: first, we raised the coverage cut-off to 0.57 because ‘dwarf’ smORFs of less than 20 aa (<xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1C</xref>) containing a single ribosomal binding ‘site’ could still appear as 0.56 covered (32 nt/57 nt); second, we required an ORF to obtain at least five reads in a single experiment to be considered translated, to avoid inflation of very few reads by the RPKM metric in such dwarf smORFs.</p><p>Translation efficiency (TE) was calculated as ribosome footprint density (RPKM)/mRNA-seq read density (RPKM) in the feature. As the TE score is not a reliable estimator at low expression levels, we computed a TE score only for those features that had significant mRNA expression above a randomised genomic background (p &lt; 0.01).</p><p>To analyse framing, ribosome-protected fragments (RPF) were aligned to transcript coordinates. 
For a given open reading frame, the corresponding P-site position of filtered RPF reads (28–32 nt) was designated as follows: +12 offset for 28 and 29 nt, +13 for 30 to 31 nt, and +14 for 32 nt RPF (<xref ref-type="bibr" rid="bib7">Chew et al., 2013</xref>; <xref ref-type="bibr" rid="bib4">Bazzini et al., 2014</xref>).</p></sec><sec id="s4-7"><title>uORF and ncORF identification</title><p>We identified uORFs and ncORFs longer than 10 aa with an AUG start codon followed by an in-frame stop codon within the annotated 5′-UTRs and ncRNA transcripts, using the EMBOSS getorf program. To exclude the possibility that the ribosome occupancy observed in 5′-UTRs was due to the presence of such upstream ORFs, we created a modified transcript that contained all regions except the putative uORFs for all our analyses of 5′-UTRs.</p></sec><sec id="s4-8"><title>phastCons values</title><p>phastCons scores for 171,317 alignment blocks were downloaded from UCSC Genome Browser. We computed percentage overlap between the phastCons block and our feature of interest and estimated mean phastCons values.</p></sec><sec id="s4-9"><title>Peptide Atlas</title><p>Lists of peptide CDS coordinates with protein identifiers (FlyBase peptide ID) were downloaded from Peptide Atlas database (<ext-link ext-link-type="uri" xlink:href="http://www.peptideatlas.org/">http://www.peptideatlas.org</ext-link>) and compared to FlyBase predicted smORF peptide sequences.</p></sec><sec id="s4-10"><title>Functional analysis of smORFs</title><p>Prediction of transmembrane alpha-helices was performed using TMHMM (<ext-link ext-link-type="uri" xlink:href="http://www.cbs.dtu.dk/services/TMHMM/">http://www.cbs.dtu.dk/services/TMHMM/</ext-link>). In-house Perl scripts (available in <xref ref-type="supplementary-material" rid="SD3-data">Supplementary file 3</xref>) calculated amino acid composition of CDS. For the random control, we followed <xref ref-type="bibr" rid="bib28">King and Jukes (1969</xref>). 
We took all FlyBase transcript sequences and calculated the nucleotide composition of this pool; from this we estimated the likely amino acid usage based on the nucleotide composition of the respective codons. RNAi screen data were accessed through Flymine (<ext-link ext-link-type="uri" xlink:href="http://www.flymine.org/">http://www.flymine.org/</ext-link>), and GO term enrichment was calculated by GOrilla (<ext-link ext-link-type="uri" xlink:href="http://cbl-gorilla.cs.technion.ac.il/">http://cbl-gorilla.cs.technion.ac.il/</ext-link>) (<xref ref-type="bibr" rid="bib15">Eden et al., 2009</xref>).</p></sec><sec id="s4-11"><title>Cloning</title><p>The 5′-UTR and CDS of putative smORFs were cloned by PCR from S2 cell cDNA into pENTR/D-TOPO (Invitrogen) and then into pAWF (<ext-link ext-link-type="uri" xlink:href="http://emb.carnegiescience.edu/labs/murphy/Gateway%20vectors.html#">http://emb.carnegiescience.edu/labs/murphy/Gateway%20vectors.html#</ext-link>), whose ATG start codon was mutated to GCG by site-directed mutagenesis.</p></sec><sec id="s4-12"><title>Transfections and microscopy</title><p>S2 cells were plated on acid-treated coverslips and transfected with plasmid DNA using Xtreme Gene HP (Roche). After 48 hr, the cells were fixed for 20 min with 4% formaldehyde, washed with 1X PBS, 0.1% Triton X–100 (PBS-T), blocked with PBS-T 2% wt/vol BSA before immunostaining with primary mouse anti-FLAG M2 antibody (Sigma) at 1/1000 and secondary anti-mouse FITC (Jackson, West Grove, PA) at 1/400. For subcellular localisation experiments, cells were incubated for 30 min with Rhodamine-Phalloidin (Life Technologies) to highlight F-actin. All transfections were incubated for 10 min with Hoechst (Sigma) according to manufacturer's instructions for nuclei staining and mounted with Vectashield (Vector Labs, Burlingame, CA). For Mitotracker experiments, 48 hr after transfection cells were incubated in 500 nM Mitotracker Red CMXRos (Life Technologies) for 45 min. 
Imaging was conducted using a Zeiss 63X Plan Apochromat Oil Immersion lens on the LSM510 Axioskop 2. For correlation analysis, Z-stack images were taken with slice interval of 0.15 μm, and ImageJ plugin ‘Manders Coefficients’ was used to calculate correlation coefficient of FLAG to Mitotracker signal with at least 15 cells per replicate, with three replicate transfections.</p></sec><sec id="s4-13"><title>Immunoblotting</title><p>Cells were harvested 48 hr post transfection, washed with 1X PBS, resuspended in Tricine Sample buffer (Bio-Rad, Hercules, CA) (2.5% vol/vol βME) and run on 16% Tris-Tricine gels. Immunoblots were incubated with primary antibody: 1:10,000 anti-FLAG M2 (Sigma) and 1:500 anti-β-tubulin E7 (DSHB, Iowa City, IA), and then secondary 1:10,000 goat anti-mouse HRP (Santa Cruz, Dallas, TX). Immunoblots were developed with ECL Prime Chemiluminescent Detection Reagent (GE Healthcare, Little Chalfont, UK).</p></sec><sec id="s4-14"><title>Mass spectrometry</title><p>S2 cells were lysed in 0.075% SDS, 1X cOmplete Protease Inhibitor Cocktail (Roche) with three rounds of freeze thawing and clarified. Total protein (1X Tricine Loading buffer, 2.5% vol/vol βME) was run on 10–20% MiniProtean Tris-Tricine Gels (Bio-Rad) and the 5–15 kDa region excised. Mass spectrometry was performed by Cambridge Centre for Proteomics (University of Cambridge, UK) using in-gel trypsin digestion and LC-ESI-MS/MS using an Orbitrap Velos Instrument (Thermo Fisher Scientific) with the following parameters: 2 missed trypsin cleavages, 25 ppm precursor mass error, 0.8 Da fragment mass tolerance, carbamidomethylation of cysteine as a fixed and methionine oxidation as a variable modification. 
Spectra were matched against the <italic>Drosophila melanogaster</italic> (5.55) proteome using the generic Mascot algorithm.</p></sec></sec></body><back><ack id="ack"><title>Acknowledgements</title><p>We thank Simon Morley and Nick Ingolia for help with protocols, Claudio Alonso, Jose Ignacio Pueyo, and Emile Magny for manuscript comments. This work was funded by a Wellcome Trust Senior Fellowship (ref 087516).</p></ack><sec sec-type="additional-information"><title>Additional information</title><fn-group content-type="competing-interest"><title>Competing interests</title><fn fn-type="conflict" id="conf1"><p>The authors declare that no competing interests exist.</p></fn></fn-group><fn-group content-type="author-contribution"><title>Author contributions</title><fn fn-type="con" id="con1"><p>JLA, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article</p></fn><fn fn-type="con" id="con2"><p>YCE-W, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents</p></fn><fn fn-type="con" id="con3"><p>RJP, Acquisition of data, Contributed unpublished essential data or reagents</p></fn><fn fn-type="con" id="con4"><p>UA, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article</p></fn><fn fn-type="con" id="con5"><p>MASM, Acquisition of data, Analysis and interpretation of data</p></fn><fn fn-type="con" id="con6"><p>MB, Conception and design, Contributed unpublished essential data or reagents</p></fn><fn fn-type="con" id="con7"><p>J-PC, Conception and design, Analysis and interpretation of data, Drafting or revising the article</p></fn></fn-group></sec><sec sec-type="supplementary-material"><title>Additional files</title><supplementary-material id="SD1-data"><object-id pub-id-type="doi">10.7554/eLife.03528.017</object-id><label>Supplementary file 1.</label><caption><p>(<bold>A</bold>) Summary of sequencing 
experiments. Number of reads from each experiment that remain after removal of rRNA and tRNA contaminants, that are unique matches, and that map to CDS regions of the genome. (<bold>B</bold>) Summary of smORF embryo RNA-seq data. Number of translated smORFs expressed throughout embryonic stages of <italic>Drosophila melanogaster,</italic> according to RNAseq data (modENCODE).</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.017">http://dx.doi.org/10.7554/eLife.03528.017</ext-link></p></caption><media mime-subtype="docx" mimetype="application" xlink:href="elife03528s001.docx"/></supplementary-material><supplementary-material id="SD2-data"><object-id pub-id-type="doi">10.7554/eLife.03528.018</object-id><label>Supplementary file 2.</label><caption><p>Primers used for rRNA depletion.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.018">http://dx.doi.org/10.7554/eLife.03528.018</ext-link></p></caption><media mime-subtype="docx" mimetype="application" xlink:href="elife03528s002.docx"/></supplementary-material><supplementary-material id="SD3-data"><object-id pub-id-type="doi">10.7554/eLife.03528.019</object-id><label>Supplementary file 3.</label><caption><p>In-house Perl scripts.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03528.019">http://dx.doi.org/10.7554/eLife.03528.019</ext-link></p></caption><media mime-subtype="zip" mimetype="application" xlink:href="elife03528s003.zip"/></supplementary-material><sec sec-type="datasets"><title>Major datasets</title><p>The following dataset was generated:</p><p><related-object content-type="generated-dataset" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60384" source-id-type="uri" id="dataro1"><collab collab-type="author">Aspden JL</collab>, <collab collab-type="author">Eyre-Walker YC</collab>, <collab collab-type="author">Couso JP</collab>, <year>2014</year><x>, </x><source>Extensive translation of 
small Open Reading Frames revealed by Poly-Ribo-Seq</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60384">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60384</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p><p>The following previously published dataset was used:</p><p><related-object content-type="existing-dataset" source-id="http://flybase.org/" source-id-type="uri" id="dataro2"><collab>The Flybase Consortium</collab>, <year>1999</year><x>, </x><source>Flybase</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://flybase.org/">http://flybase.org/</ext-link><x>, </x><comment><ext-link ext-link-type="uri" xlink:href="http://flybase.org/wiki/FlyBase:About#FlyBase_Copyright">http://flybase.org/wiki/FlyBase:About#FlyBase_Copyright</ext-link>.</comment></related-object></p></sec></sec><ref-list><title>References</title><ref id="bib1"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Andrews</surname><given-names>SJ</given-names></name><name><surname>Rothnagel</surname><given-names>JA</given-names></name></person-group><year>2014</year><article-title>Emerging evidence for functional peptides encoded by short open reading frames</article-title><source>Nature Reviews Genetics</source><volume>15</volume><fpage>193</fpage><lpage>204</lpage><pub-id pub-id-type="doi">10.1038/nrg3520</pub-id></element-citation></ref><ref id="bib2"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Arava</surname><given-names>Y</given-names></name><name><surname>Wang</surname><given-names>Y</given-names></name><name><surname>Storey</surname><given-names>JD</given-names></name><name><surname>Liu</surname><given-names>CL</given-names></name><name><surname>Brown</surname><given-names>PO</given-names></name><name><surname>Herschlag</surname><given-names>D</given-names></name></person-group><year>2003</year><article-title>Genome-wide analysis of mRNA translation profiles in <italic>Saccharomyces cerevisiae</italic></article-title><source>Proceedings of the National Academy of Sciences of USA</source><volume>100</volume><fpage>3889</fpage><lpage>3894</lpage><pub-id pub-id-type="doi">10.1073/pnas.0635171100</pub-id></element-citation></ref><ref id="bib3"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Basrai</surname><given-names>MA</given-names></name><name><surname>Hieter</surname><given-names>P</given-names></name><name><surname>Boeke</surname><given-names>JD</given-names></name></person-group><year>1997</year><article-title>Small open reading frames: beautiful needles in the haystack</article-title><source>Genome Research</source><volume>7</volume><fpage>768</fpage><lpage>771</lpage><pub-id pub-id-type="doi">10.1101/gr.7.8.768</pub-id></element-citation></ref><ref id="bib4"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Bazzini</surname><given-names>AA</given-names></name><name><surname>Johnstone</surname><given-names>TG</given-names></name><name><surname>Christiano</surname><given-names>R</given-names></name><name><surname>Mackowiak</surname><given-names>SD</given-names></name><name><surname>Obermayer</surname><given-names>B</given-names></name><name><surname>Fleming</surname><given-names>ES</given-names></name><name><surname>Vejnar</surname><given-names>CE</given-names></name><name><surname>Lee</surname><given-names>MT</given-names></name><name><surname>Rajewsky</surname><given-names>N</given-names></name><name><surname>Walther</surname><given-names>TC</given-names></name><name><surname>Giraldez</surname><given-names>AJ</given-names></name></person-group><year>2014</year><article-title>Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation</article-title><source>The EMBO Journal</source><volume>33</volume><fpage>981</fpage><lpage>993</lpage><pub-id pub-id-type="doi">10.1002/embj.201488411</pub-id></element-citation></ref><ref id="bib5"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Boerjan</surname><given-names>B</given-names></name><name><surname>Cardoen</surname><given-names>D</given-names></name><name><surname>Bogaerts</surname><given-names>A</given-names></name><name><surname>Landuyt</surname><given-names>B</given-names></name><name><surname>Schoofs</surname><given-names>L</given-names></name><name><surname>Verleyen</surname><given-names>P</given-names></name></person-group><year>2010</year><article-title>Mass spectrometric profiling of (neuro)-peptides in the worker honeybee, <italic>Apis mellifera</italic></article-title><source>Neuropharmacology</source><volume>58</volume><fpage>248</fpage><lpage>258</lpage><pub-id pub-id-type="doi">10.1016/j.neuropharm.2009.06.026</pub-id></element-citation></ref><ref id="bib6"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Brunner</surname><given-names>E</given-names></name><name><surname>Ahrens</surname><given-names>CH</given-names></name><name><surname>Mohanty</surname><given-names>S</given-names></name><name><surname>Baetschmann</surname><given-names>H</given-names></name><name><surname>Loevenich</surname><given-names>S</given-names></name><name><surname>Potthast</surname><given-names>F</given-names></name><name><surname>Deutsch</surname><given-names>EW</given-names></name><name><surname>Panse</surname><given-names>C</given-names></name><name><surname>de Lichtenberg</surname><given-names>U</given-names></name><name><surname>Rinner</surname><given-names>O</given-names></name><name><surname>Lee</surname><given-names>H</given-names></name><name><surname>Pedrioli</surname><given-names>PG</given-names></name><name><surname>Malmstrom</surname><given-names>J</given-names></name><name><surname>Koehler</surname><given-names>K</given-names></name><name><surname>Schrimpf</surname><given-names>S</given-names></name><name><surname>Krijgsveld</surname><given-names>J</given-names></name><name><surname>Kregenow</surname><given-names>F</given-names></name><name><surname>Heck</surname><given-names>AJ</given-names></name><name><surname>Hafen</surname><given-names>E</given-names></name><name><surname>Schlapbach</surname><given-names>R</given-names></name><name><surname>Aebersold</surname><given-names>R</given-names></name></person-group><year>2007</year><article-title>A high-quality catalog of the <italic>Drosophila melanogaster</italic> proteome</article-title><source>Nature Biotechnology</source><volume>25</volume><fpage>576</fpage><lpage>583</lpage><pub-id pub-id-type="doi">10.1038/nbt1300</pub-id></element-citation></ref><ref id="bib7"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Chew</surname><given-names>GL</given-names></name><name><surname>Pauli</surname><given-names>A</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name><name><surname>Regev</surname><given-names>A</given-names></name><name><surname>Schier</surname><given-names>AF</given-names></name><name><surname>Valen</surname><given-names>E</given-names></name></person-group><year>2013</year><article-title>Ribosome profiling reveals resemblance between long non-coding RNAs and 5' leaders of coding RNAs</article-title><source>Development</source><volume>140</volume><fpage>2828</fpage><lpage>2834</lpage><pub-id pub-id-type="doi">10.1242/dev.098343</pub-id></element-citation></ref><ref id="bib8"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Child</surname><given-names>SJ</given-names></name><name><surname>Miller</surname><given-names>MK</given-names></name><name><surname>Geballe</surname><given-names>AP</given-names></name></person-group><year>1999</year><article-title>Translational control by an upstream open reading frame in the HER-2/neu transcript</article-title><source>The Journal of Biological Chemistry</source><volume>274</volume><fpage>24335</fpage><lpage>24341</lpage><pub-id pub-id-type="doi">10.1016/S0092-8674(01)00211-2</pub-id></element-citation></ref><ref id="bib9"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chou</surname><given-names>PY</given-names></name><name><surname>Fasman</surname><given-names>GD</given-names></name></person-group><year>1974</year><article-title>Conformational parameters for amino acids in helical, beta-sheet, and random coil regions calculated from proteins</article-title><source>Biochemistry</source><volume>13</volume><fpage>211</fpage><lpage>222</lpage><pub-id pub-id-type="doi">10.1021/bi00699a001</pub-id></element-citation></ref><ref id="bib10"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Crappe</surname><given-names>J</given-names></name><name><surname>Van Criekinge</surname><given-names>W</given-names></name><name><surname>Trooskens</surname><given-names>G</given-names></name><name><surname>Hayakawa</surname><given-names>E</given-names></name><name><surname>Luyten</surname><given-names>W</given-names></name><name><surname>Baggerman</surname><given-names>G</given-names></name><name><surname>Menschaert</surname><given-names>G</given-names></name></person-group><year>2013</year><article-title>Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs</article-title><source>BMC Genomics</source><volume>14</volume><fpage>648</fpage><pub-id pub-id-type="doi">10.1186/1471-2164-14-648</pub-id></element-citation></ref><ref id="bib11"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Derrien</surname><given-names>T</given-names></name><name><surname>Johnson</surname><given-names>R</given-names></name><name><surname>Bussotti</surname><given-names>G</given-names></name><name><surname>Tanzer</surname><given-names>A</given-names></name><name><surname>Djebali</surname><given-names>S</given-names></name><name><surname>Tilgner</surname><given-names>H</given-names></name><name><surname>Guernec</surname><given-names>G</given-names></name><name><surname>Martin</surname><given-names>D</given-names></name><name><surname>Merkel</surname><given-names>A</given-names></name><name><surname>Knowles</surname><given-names>DG</given-names></name><name><surname>Lagarde</surname><given-names>J</given-names></name><name><surname>Veeravalli</surname><given-names>L</given-names></name><name><surname>Ruan</surname><given-names>X</given-names></name><name><surname>Ruan</surname><given-names>Y</given-names></name><name><surname>Lassmann</surname><given-names>T</given-names></name><name><surname>Carninci
</surname><given-names>P</given-names></name><name><surname>Brown</surname><given-names>JB</given-names></name><name><surname>Lipovich</surname><given-names>L</given-names></name><name><surname>Gonzalez</surname><given-names>JM</given-names></name><name><surname>Thomas</surname><given-names>M</given-names></name><name><surname>Davis</surname><given-names>CA</given-names></name><name><surname>Shiekhattar</surname><given-names>R</given-names></name><name><surname>Gingeras</surname><given-names>TR</given-names></name><name><surname>Hubbard</surname><given-names>TJ</given-names></name><name><surname>Notredame</surname><given-names>C</given-names></name><name><surname>Harrow</surname><given-names>J</given-names></name><name><surname>Guigó</surname><given-names>R</given-names></name></person-group><year>2012</year><article-title>The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression</article-title><source>Genome Research</source><volume>22</volume><fpage>1775</fpage><lpage>1789</lpage><pub-id pub-id-type="doi">10.1101/gr.132159.111</pub-id></element-citation></ref><ref id="bib12"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Djakovic</surname><given-names>S</given-names></name><name><surname>Dyachok</surname><given-names>J</given-names></name><name><surname>Burke</surname><given-names>M</given-names></name><name><surname>Frank</surname><given-names>MJ</given-names></name><name><surname>Smith</surname><given-names>LG</given-names></name></person-group><year>2006</year><article-title>BRICK1/HSPC300 functions with SCAR and the ARP2/3 complex to regulate epidermal cell shape in Arabidopsis</article-title><source>Development</source><volume>133</volume><fpage>1091</fpage><lpage>1100</lpage><pub-id pub-id-type="doi">10.1242/dev.02280</pub-id></element-citation></ref><ref id="bib13"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Duncan</surname><given-names>CD</given-names></name><name><surname>Mata</surname><given-names>J</given-names></name></person-group><year>2014</year><article-title>The translational landscape of fission-yeast meiosis and sporulation</article-title><source>Nature Structural &amp; Molecular Biology</source><volume>21</volume><fpage>641</fpage><lpage>647</lpage><pub-id pub-id-type="doi">10.1038/nsmb.2843</pub-id></element-citation></ref><ref id="bib14"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dunn</surname><given-names>JG</given-names></name><name><surname>Foo</surname><given-names>CK</given-names></name><name><surname>Belletier</surname><given-names>NG</given-names></name><name><surname>Gavis</surname><given-names>ER</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><year>2013</year><article-title>Ribosome profiling reveals pervasive and regulated stop codon readthrough in <italic>Drosophila melanogaster</italic></article-title><source>eLife</source><volume>2</volume><fpage>e01179</fpage><pub-id pub-id-type="doi">10.7554/eLife.01179</pub-id></element-citation></ref><ref id="bib15"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Eden</surname><given-names>E</given-names></name><name><surname>Navon</surname><given-names>R</given-names></name><name><surname>Steinfeld</surname><given-names>I</given-names></name><name><surname>Lipson</surname><given-names>D</given-names></name><name><surname>Yakhini</surname><given-names>Z</given-names></name></person-group><year>2009</year><article-title>GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists</article-title><source>BMC Bioinformatics</source><volume>10</volume><fpage>48</fpage><pub-id pub-id-type="doi">10.1186/1471-2105-10-48</pub-id></element-citation></ref><ref 
id="bib16"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Falth</surname><given-names>M</given-names></name><name><surname>Skold</surname><given-names>K</given-names></name><name><surname>Norrman</surname><given-names>M</given-names></name><name><surname>Svensson</surname><given-names>M</given-names></name><name><surname>Fenyo</surname><given-names>D</given-names></name><name><surname>Andren</surname><given-names>PE</given-names></name></person-group><year>2006</year><article-title>SwePep, a database designed for endogenous peptides and mass spectrometry</article-title><source>Molecular &amp; Cellular Proteomics</source><volume>5</volume><fpage>998</fpage><lpage>1005</lpage><pub-id pub-id-type="doi">10.1074/mcp.M500401-MCP200</pub-id></element-citation></ref><ref id="bib17"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Frith</surname><given-names>MC</given-names></name><name><surname>Forrest</surname><given-names>AR</given-names></name><name><surname>Nourbakhsh</surname><given-names>E</given-names></name><name><surname>Pang</surname><given-names>KC</given-names></name><name><surname>Kai</surname><given-names>C</given-names></name><name><surname>Kawai</surname><given-names>J</given-names></name><name><surname>Carninci</surname><given-names>P</given-names></name><name><surname>Hayashizaki</surname><given-names>Y</given-names></name><name><surname>Bailey</surname><given-names>TL</given-names></name><name><surname>Grimmond</surname><given-names>SM</given-names></name></person-group><year>2006</year><article-title>The abundance of short proteins in the mammalian proteome</article-title><source>PLOS Genetics</source><volume>2</volume><fpage>e52</fpage><pub-id pub-id-type="doi">10.1371/journal.pgen.0020052</pub-id></element-citation></ref><ref id="bib18"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Fritsch</surname><given-names>C</given-names></name><name><surname>Herrmann</surname><given-names>A</given-names></name><name><surname>Nothnagel</surname><given-names>M</given-names></name><name><surname>Szafranski</surname><given-names>K</given-names></name><name><surname>Huse</surname><given-names>K</given-names></name><name><surname>Schumann</surname><given-names>F</given-names></name><name><surname>Schreiber</surname><given-names>S</given-names></name><name><surname>Platzer</surname><given-names>M</given-names></name><name><surname>Krawczak</surname><given-names>M</given-names></name><name><surname>Hampe</surname><given-names>J</given-names></name><name><surname>Brosch</surname><given-names>M</given-names></name></person-group><year>2012</year><article-title>Genome-wide search for novel human uORFs and N-terminal protein extensions using ribosomal footprinting</article-title><source>Genome Research</source><volume>22</volume><fpage>2208</fpage><lpage>2218</lpage><pub-id pub-id-type="doi">10.1101/gr.139568.112</pub-id></element-citation></ref><ref id="bib19"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Galindo</surname><given-names>MI</given-names></name><name><surname>Pueyo</surname><given-names>JI</given-names></name><name><surname>Fouix</surname><given-names>S</given-names></name><name><surname>Bishop</surname><given-names>SA</given-names></name><name><surname>Couso</surname><given-names>JP</given-names></name></person-group><year>2007</year><article-title>Peptides encoded by short ORFs control development and define a new eukaryotic gene family</article-title><source>PLOS Biology</source><volume>5</volume><fpage>e106</fpage><pub-id pub-id-type="doi">10.1371/journal.pbio.0050106</pub-id></element-citation></ref><ref id="bib20"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Guttman</surname><given-names>M</given-names></name><name><surname>Russell</surname><given-names>P</given-names></name><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name><name><surname>Lander</surname><given-names>ES</given-names></name></person-group><year>2013</year><article-title>Ribosome profiling provides evidence that large noncoding RNAs do not encode proteins</article-title><source>Cell</source><volume>154</volume><fpage>240</fpage><lpage>251</lpage><pub-id pub-id-type="doi">10.1016/j.cell.2013.06.009</pub-id></element-citation></ref><ref id="bib21"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hanada</surname><given-names>K</given-names></name><name><surname>Higuchi-Takeuchi</surname><given-names>M</given-names></name><name><surname>Okamoto</surname><given-names>M</given-names></name><name><surname>Yoshizumi</surname><given-names>T</given-names></name><name><surname>Shimizu</surname><given-names>M</given-names></name><name><surname>Nakaminami</surname><given-names>K</given-names></name><name><surname>Nishi</surname><given-names>R</given-names></name><name><surname>Ohashi</surname><given-names>C</given-names></name><name><surname>Iida</surname><given-names>K</given-names></name><name><surname>Tanaka</surname><given-names>M</given-names></name><name><surname>Horii</surname><given-names>Y</given-names></name><name><surname>Kawashima</surname><given-names>M</given-names></name><name><surname>Matsui</surname><given-names>K</given-names></name><name><surname>Toyoda</surname><given-names>T</given-names></name><name><surname>Shinozaki</surname><given-names>K</given-names></name><name><surname>Seki</surname><given-names>M</given-names></name><name><surname>Matsui</surname><given-names>M</given-names></name></person-group><year>2012</year><article-title>Small open reading frames 
associated with morphogenesis are hidden in plant genomes</article-title><source>Proceedings of the National Academy of Sciences of USA</source><volume>110</volume><fpage>2395</fpage><lpage>2400</lpage><pub-id pub-id-type="doi">10.1073/pnas.1213958110</pub-id></element-citation></ref><ref id="bib22"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hanyu-Nakamura</surname><given-names>K</given-names></name><name><surname>Sonobe-Nojima</surname><given-names>H</given-names></name><name><surname>Tanigawa</surname><given-names>A</given-names></name><name><surname>Lasko</surname><given-names>P</given-names></name><name><surname>Nakamura</surname><given-names>A</given-names></name></person-group><year>2008</year><article-title>Drosophila Pgc protein inhibits P-TEFb recruitment to chromatin in primordial germ cells</article-title><source>Nature</source><volume>451</volume><fpage>730</fpage><lpage>733</lpage><pub-id pub-id-type="doi">10.1038/nature06498</pub-id></element-citation></ref><ref id="bib23"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hemm</surname><given-names>MR</given-names></name><name><surname>Paul</surname><given-names>BJ</given-names></name><name><surname>Schneider</surname><given-names>TD</given-names></name><name><surname>Storz</surname><given-names>G</given-names></name><name><surname>Rudd</surname><given-names>KE</given-names></name></person-group><year>2008</year><article-title>Small membrane proteins found by comparative genomics and ribosome binding site models</article-title><source>Molecular Microbiology</source><volume>70</volume><fpage>1487</fpage><lpage>1501</lpage><pub-id pub-id-type="doi">10.1111/j.1365-2958.2008.06495.x</pub-id></element-citation></ref><ref id="bib24"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Brar</surname><given-names>GA</given-names></name><name><surname>Rouskin</surname><given-names>S</given-names></name><name><surname>McGeachy</surname><given-names>AM</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><year>2012</year><article-title>The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments</article-title><source>Nature Protocols</source><volume>7</volume><fpage>1534</fpage><lpage>1550</lpage><pub-id pub-id-type="doi">10.1038/nprot.2012.086</pub-id></element-citation></ref><ref id="bib25"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Ghaemmaghami</surname><given-names>S</given-names></name><name><surname>Newman</surname><given-names>JR</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><year>2009</year><article-title>Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling</article-title><source>Science</source><volume>324</volume><fpage>218</fpage><lpage>223</lpage><pub-id pub-id-type="doi">10.1126/science.1168978</pub-id></element-citation></ref><ref id="bib26"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Lareau</surname><given-names>LF</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><year>2011</year><article-title>Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes</article-title><source>Cell</source><volume>147</volume><fpage>789</fpage><lpage>802</lpage><pub-id 
pub-id-type="doi">10.1016/j.cell.2011.10.002</pub-id></element-citation></ref><ref id="bib27"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kastenmayer</surname><given-names>JP</given-names></name><name><surname>Ni</surname><given-names>L</given-names></name><name><surname>Chu</surname><given-names>A</given-names></name><name><surname>Kitchen</surname><given-names>LE</given-names></name><name><surname>Au</surname><given-names>W-C</given-names></name><name><surname>Yang</surname><given-names>H</given-names></name><name><surname>Carter</surname><given-names>CD</given-names></name><name><surname>Wheeler</surname><given-names>D</given-names></name><name><surname>Davis</surname><given-names>RW</given-names></name><name><surname>Boeke</surname><given-names>JD</given-names></name><name><surname>Snyder</surname><given-names>MA</given-names></name><name><surname>Basrai</surname><given-names>MA</given-names></name></person-group><year>2006</year><article-title>Functional genomics of genes with small open reading frames (sORFs) in <italic>S. 
cerevisiae</italic></article-title><source>Genome Research</source><volume>16</volume><fpage>365</fpage><lpage>373</lpage><pub-id pub-id-type="doi">10.1101/gr.4355406</pub-id></element-citation></ref><ref id="bib28"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>King</surname><given-names>JL</given-names></name><name><surname>Jukes</surname><given-names>TH</given-names></name></person-group><year>1969</year><article-title>Non-Darwinian evolution</article-title><source>Science</source><volume>164</volume><fpage>788</fpage><lpage>798</lpage><pub-id pub-id-type="doi">10.1126/science.164.3881.788</pub-id></element-citation></ref><ref id="bib29"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kondo</surname><given-names>T</given-names></name><name><surname>Hashimoto</surname><given-names>Y</given-names></name><name><surname>Kato</surname><given-names>K</given-names></name><name><surname>Inagaki</surname><given-names>S</given-names></name><name><surname>Hayashi</surname><given-names>S</given-names></name><name><surname>Kageyama</surname><given-names>Y</given-names></name></person-group><year>2007</year><article-title>Small peptide regulators of actin-based cell morphogenesis encoded by a polycistronic mRNA</article-title><source>Nature Cell Biology</source><volume>9</volume><fpage>660</fpage><lpage>665</lpage><pub-id pub-id-type="doi">10.1038/ncb1595</pub-id></element-citation></ref><ref id="bib30"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kozak</surname><given-names>M</given-names></name></person-group><year>2005</year><article-title>Regulation of translation via mRNA structure in prokaryotes and eukaryotes</article-title><source>Gene</source><volume>361</volume><fpage>13</fpage><lpage>37</lpage><pub-id pub-id-type="doi">10.1016/j.gene.2005.06.037</pub-id></element-citation></ref><ref 
id="bib31"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Krogh</surname><given-names>A</given-names></name><name><surname>Larsson</surname><given-names>B</given-names></name><name><surname>von Heijne</surname><given-names>G</given-names></name><name><surname>Sonnhammer</surname><given-names>EL</given-names></name></person-group><year>2001</year><article-title>Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes</article-title><source>Journal of Molecular Biology</source><volume>305</volume><fpage>567</fpage><lpage>580</lpage><pub-id pub-id-type="doi">10.1006/jmbi.2000.4315</pub-id></element-citation></ref><ref id="bib32"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kulkarni</surname><given-names>SD</given-names></name><name><surname>Muralidharan</surname><given-names>B</given-names></name><name><surname>Panda</surname><given-names>AC</given-names></name><name><surname>Bakthavachalu</surname><given-names>B</given-names></name><name><surname>Vindu</surname><given-names>A</given-names></name><name><surname>Seshadri</surname><given-names>V</given-names></name></person-group><year>2011</year><article-title>Glucose-stimulated translation regulation of insulin by the 5' UTR-binding proteins</article-title><source>The Journal of Biological Chemistry</source><volume>286</volume><fpage>14146</fpage><lpage>14156</lpage><pub-id pub-id-type="doi">10.1074/jbc.M110.190553</pub-id></element-citation></ref><ref id="bib33"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Kumar</surname><given-names>A</given-names></name><name><surname>Harrison</surname><given-names>PM</given-names></name><name><surname>Cheung</surname><given-names>KH</given-names></name><name><surname>Lan</surname><given-names>N</given-names></name><name><surname>Echols</surname><given-names>N</given-names></name><name><surname>Bertone</surname><given-names>P</given-names></name><name><surname>Miller</surname><given-names>P</given-names></name><name><surname>Gerstein</surname><given-names>MB</given-names></name><name><surname>Snyder</surname><given-names>M</given-names></name></person-group><year>2002</year><article-title>An integrated approach for finding overlooked genes in yeast</article-title><source>Nature Biotechnology</source><volume>20</volume><fpage>58</fpage><lpage>63</lpage><pub-id pub-id-type="doi">10.1038/nbt0102-58</pub-id></element-citation></ref><ref id="bib34"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ladoukakis</surname><given-names>E</given-names></name><name><surname>Pereira</surname><given-names>V</given-names></name><name><surname>Magny</surname><given-names>EG</given-names></name><name><surname>Eyre-Walker</surname><given-names>A</given-names></name><name><surname>Couso</surname><given-names>JP</given-names></name></person-group><year>2011</year><article-title>Hundreds of putatively functional small open reading frames in Drosophila</article-title><source>Genome Biology</source><volume>12</volume><fpage>R118</fpage><pub-id pub-id-type="doi">10.1186/gb-2011-12-11-r118</pub-id></element-citation></ref><ref id="bib35"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lemaitre</surname><given-names>B</given-names></name><name><surname>Hoffmann</surname><given-names>J</given-names></name></person-group><year>2007</year><article-title>The host defense of <italic>Drosophila 
melanogaster</italic></article-title><source>Annual Review of Immunology</source><volume>25</volume><fpage>697</fpage><lpage>743</lpage><pub-id pub-id-type="doi">10.1146/annurev.immunol.25.022106.141615</pub-id></element-citation></ref><ref id="bib36"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Magny</surname><given-names>EG</given-names></name><name><surname>Pueyo</surname><given-names>JI</given-names></name><name><surname>Pearl</surname><given-names>FM</given-names></name><name><surname>Cespedes</surname><given-names>MA</given-names></name><name><surname>Niven</surname><given-names>JE</given-names></name><name><surname>Bishop</surname><given-names>SA</given-names></name><name><surname>Couso</surname><given-names>JP</given-names></name></person-group><year>2013</year><article-title>Conserved regulation of cardiac calcium uptake by peptides encoded in small open reading frames</article-title><source>Science</source><volume>341</volume><fpage>1116</fpage><lpage>1120</lpage><pub-id pub-id-type="doi">10.1126/science.1238802</pub-id></element-citation></ref><ref id="bib37"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Pauli</surname><given-names>A</given-names></name><name><surname>Norris</surname><given-names>ML</given-names></name><name><surname>Valen</surname><given-names>E</given-names></name><name><surname>Chew</surname><given-names>GL</given-names></name><name><surname>Gagnon</surname><given-names>JA</given-names></name><name><surname>Zimmerman</surname><given-names>S</given-names></name><name><surname>Mitchell</surname><given-names>A</given-names></name><name><surname>Ma</surname><given-names>J</given-names></name><name><surname>Dubrulle</surname><given-names>J</given-names></name><name><surname>Reyon</surname><given-names>D</given-names></name><name><surname>Tsai</surname><given-names>SQ</given-names></name><name><surname>Joung</surname><given-names>JK</given-names></name><name><surname>Saghatelian</surname><given-names>A</given-names></name><name><surname>Schier</surname><given-names>AF</given-names></name></person-group><year>2014</year><article-title>Toddler: an embryonic signal that promotes cell movement via Apelin receptors</article-title><source>Science</source><volume>343</volume><fpage>1248636</fpage><pub-id pub-id-type="doi">10.1126/science.1248636</pub-id></element-citation></ref><ref id="bib38"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pueyo</surname><given-names>JI</given-names></name><name><surname>Couso</surname><given-names>JP</given-names></name></person-group><year>2008</year><article-title>The 11-aminoacid long Tarsal-less peptides trigger a cell signal in Drosophila leg development</article-title><source>Developmental Biology</source><volume>324</volume><fpage>192</fpage><lpage>201</lpage><pub-id pub-id-type="doi">10.1016/j.ydbio.2008.08.025</pub-id></element-citation></ref><ref id="bib39"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Schmidt</surname><given-names>EE</given-names></name><name><surname>Pelz</surname><given-names>O</given-names></name><name><surname>Buhlmann</surname><given-names>S</given-names></name><name><surname>Kerr</surname><given-names>G</given-names></name><name><surname>Horn</surname><given-names>T</given-names></name><name><surname>Boutros</surname><given-names>M</given-names></name></person-group><year>2012</year><article-title>GenomeRNAi: a database for cell-based and in vivo RNAi phenotypes, 2013 update</article-title><source>Nucleic Acids Research</source><volume>41</volume><fpage>D1021</fpage><lpage>D1026</lpage><pub-id pub-id-type="doi">10.1093/nar/gks1170</pub-id></element-citation></ref><ref id="bib40"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schneider</surname><given-names>I</given-names></name></person-group><year>1972</year><article-title>Cell lines derived from late embryonic stages of <italic>Drosophila melanogaster</italic></article-title><source>Journal of Embryology and Experimental Morphology</source><volume>27</volume><fpage>353</fpage><lpage>365</lpage></element-citation></ref><ref id="bib41"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Siepel</surname><given-names>A</given-names></name><name><surname>Bejerano</surname><given-names>G</given-names></name><name><surname>Pedersen</surname><given-names>JS</given-names></name><name><surname>Hinrichs</surname><given-names>AS</given-names></name><name><surname>Hou</surname><given-names>M</given-names></name><name><surname>Rosenbloom</surname><given-names>K</given-names></name><name><surname>Clawson</surname><given-names>H</given-names></name><name><surname>Spieth</surname><given-names>J</given-names></name><name><surname>Hillier</surname><given-names>LW</given-names></name><name><surname>Richards</surname><given-names>S</given-names></name><name><surname>Weinstock</surname><given-names>GM</given-names></name><name><surname>Wilson</surname><given-names>RK</given-names></name><name><surname>Gibbs</surname><given-names>RA</given-names></name><name><surname>Kent</surname><given-names>WJ</given-names></name><name><surname>Miller</surname><given-names>W</given-names></name><name><surname>Haussler</surname><given-names>D</given-names></name></person-group><year>2005</year><article-title>Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes</article-title><source>Genome Research</source><volume>15</volume><fpage>1034</fpage><lpage>1050</lpage><pub-id pub-id-type="doi">10.1101/gr.3715005</pub-id></element-citation></ref><ref id="bib42"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Slavoff</surname><given-names>SA</given-names></name><name><surname>Mitchell</surname><given-names>AJ</given-names></name><name><surname>Schwaid</surname><given-names>AG</given-names></name><name><surname>Cabili</surname><given-names>MN</given-names></name><name><surname>Ma</surname><given-names>J</given-names></name><name><surname>Levin</surname><given-names>JZ</given-names></name><name><surname>Karger</surname><given-names>AD</given-names></name><name><surname>Budnik</surname><given-names>BA</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name><name><surname>Saghatelian</surname><given-names>A</given-names></name></person-group><year>2013</year><article-title>Peptidomic discovery of short open reading frame-encoded peptides in human cells</article-title><source>Nature Chemical Biology</source><volume>9</volume><fpage>59</fpage><lpage>64</lpage><pub-id pub-id-type="doi">10.1038/nchembio.1120</pub-id></element-citation></ref><ref id="bib43"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname><given-names>JE</given-names></name><name><surname>Alvarez-Dominguez</surname><given-names>JR</given-names></name><name><surname>Kline</surname><given-names>N</given-names></name><name><surname>Huynh</surname><given-names>NJ</given-names></name><name><surname>Geisler</surname><given-names>S</given-names></name><name><surname>Hu</surname><given-names>W</given-names></name><name><surname>Coller</surname><given-names>J</given-names></name><name><surname>Baker</surname><given-names>KE</given-names></name></person-group><year>2014</year><article-title>Translation of small open reading frames within unannotated RNA transcripts in <italic>Saccharomyces cerevisiae</italic></article-title><source>Cell Reports</source><volume>7</volume><fpage>1858</fpage><lpage>1866</lpage><pub-id 
pub-id-type="doi">10.1016/j.celrep.2014.05.023</pub-id></element-citation></ref><ref id="bib44"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tripoli</surname><given-names>G</given-names></name><name><surname>D'Elia</surname><given-names>D</given-names></name><name><surname>Barsanti</surname><given-names>P</given-names></name><name><surname>Caggese</surname><given-names>C</given-names></name></person-group><year>2005</year><article-title>Comparison of the oxidative phosphorylation (OXPHOS) nuclear genes in the genomes of <italic>Drosophila melanogaster</italic>, <italic>Drosophila pseudoobscura</italic> and <italic>Anopheles gambiae</italic></article-title><source>Genome Biology</source><volume>6</volume><fpage>R11</fpage><pub-id pub-id-type="doi">10.1186/gb-2005-6-2-r11</pub-id></element-citation></ref><ref id="bib45"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Volders</surname><given-names>JH</given-names></name><name><surname>Witteman</surname><given-names>B</given-names></name><name><surname>Mulder</surname><given-names>AH</given-names></name><name><surname>Bosch</surname><given-names>A</given-names></name><name><surname>Kruyt</surname><given-names>PM</given-names></name></person-group><year>2013</year><article-title>Right hemothorax: an unusual presentation of a Barrett's ulcer perforation</article-title><source>International Journal of Surgery Case Reports</source><volume>4</volume><fpage>375</fpage><lpage>377</lpage><pub-id pub-id-type="doi">10.1016/j.ijscr.2012.12.013</pub-id></element-citation></ref><ref id="bib46"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wilson</surname><given-names>BA</given-names></name><name><surname>Masel</surname><given-names>J</given-names></name></person-group><year>2011</year><article-title>Putatively noncoding transcripts show extensive association with 
ribosomes</article-title><source>Genome Biology and Evolution</source><volume>3</volume><fpage>1245</fpage><lpage>1252</lpage><pub-id pub-id-type="doi">10.1093/gbe/evr099</pub-id></element-citation></ref></ref-list></back><sub-article article-type="article-commentary" id="SA1"><front-stub><article-id pub-id-type="doi">10.7554/eLife.03528.020</article-id><title-group><article-title>Decision letter</article-title></title-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Gingeras</surname><given-names>Thomas R</given-names></name><role>Reviewing editor</role><aff><institution>Cold Spring Harbor Laboratory</institution>, <country>United States</country></aff></contrib></contrib-group></front-stub><body><boxed-text><p>eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see <ext-link ext-link-type="uri" xlink:href="http://elifesciences.org/review-process">review process</ext-link>). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.</p></boxed-text><p>Thank you for sending your work entitled “Extensive translation of small ORFs revealed by Polysomal Ribo-Seq” for consideration at <italic>eLife.</italic> Your article has been evaluated by Chris Ponting (Senior editor), a Reviewing editor, and 3 reviewers, two of whom are members of our Board of Reviewing Editors.</p><p>The Reviewing editor and the other reviewers discussed their comments before we reached this decision, and the Reviewing editor has assembled the following comments to help you prepare a revised submission.</p><p>This manuscript by Aspden et al. 
entitled “Extensive translation of small ORFs (smORF) revealed by polysomal Ribo-seq” is an exploration into the use of only portions of long non-coding (lnc-) or coding RNAs as template for translation. The reviewers agreed with both the timeliness and importance of this topic and its potential significance. The manuscript is written in a relatively clear manner but in several portions is sparse in needed information and clarity (see below). While there is much to recommend this manuscript, several issues will need to be addressed to assist the reader to better understand or accept the findings.</p><p>Major issues:</p><p>1) The authors emphasize how this new method (Polysomal Ribo-Seq) represents a significant improvement from the previous approach. They should compare their approach to the recently published ribosome footprinting in <italic>Drosophila</italic> method by the Weissman laboratory to indeed determine whether it makes any difference to isolate polysomal fractions and thus test their claim.</p><p>2) The authors seem to choose cutoffs quite arbitrarily. There are four general areas that might be addressed and related to this topic:</p><p>a) It is unclear whether they have tested the performance of these cutoffs on known coding genes. “To be considered translated, we required ribosome densities to be above 7.7 RPKM and footprint coverage of the ORF to be above 0.57”. The authors should benchmark their methods against known translated sequences.</p><p>b) What is needed is the comparison of RNAseq data to peptide (mass spec) sequencing. Values of 7.7 RPKM and 57% coverage should be compared to translation of known protein coding RNAs (especially small proteins, e.g., Dm HSP22 (177 aa), Dm HSP 23, DM HSP 26, etc).</p><p>c) For <xref ref-type="fig" rid="fig2">Figure 2A and B</xref>, it could be that the R^2 value of small vs large may largely be driven by large polysomal mRNAs with very low RPKMs compared to the smORF RNAs. 
What levels of expression (RPKM) were used as a threshold to be placed into this plot? Is it not possible that many of the large polysomal RNAs are not different but are merely expressed at a lower level vs the small polysomal level?</p><p>d) Validation of ribosomal profiling data suffers from the issue of incompleteness. Detection of Atlas peptides is not fully explained. What percent of peptides predicted by RNAseq matches the atlas peptide; how many atlas peptides/smORF are detected?</p><p>3) The interpretation of the results for the ncRNA smORFs is not the most parsimonious. The entire argument for their translatability rests on the association of the lncRNAs with the polysomal riboseq. This raises the issue of whether the detection of RNAs in this fraction is truly indicative of translatability. In the absence of the other criteria mentioned by the authors (mass spec and conservation) the co-fractionation of the RNA with polysomes can also be attributable to the non-specific affinity of the ribosome proteins to RNA. For example, can the authors show that a metagene of the translated lncRNAs shows phasing of the ribosome in the ORFs undergoing translation?</p><p>4) Regarding uORFs, the authors state that 3,404 (38%) uORFs are footprinted by ribosomes (<xref ref-type="fig" rid="fig2">Figure 2E</xref>). This seems a surprisingly high number, especially taking into account that they find fewer than 300 smORFs translated. Are the authors using the same criteria to detect translated uORFs as for small ORFs? Is one footprint enough? What criteria are used when there are overlapping ORFs in different reading frames? How do the authors distinguish between two overlapping ORFs to determine which one is indeed being translated?</p><p>5) The data should be made accessible, including sequences/genome coordinates of ORFs identified as well as upstream ORFs. 
In the current manuscript this data is not accessible except for a few examples (<xref ref-type="table" rid="tbl3">Table 3</xref>).</p><p>6) While the selection of RNAs bound to 2-6 ribosomes is reasonable, what is the relationship of the lengths of the RNAs found to be bound compared to the coverage by the ribosomes? Are these RNAs also short or are they present in long (i.e., ∼1.5-2 Kb) RNAs of which only ∼300 nt are covered? Were annotated mRNAs found in this category that could have been translated in a different reading frame for a short length?</p></body></sub-article><sub-article article-type="reply" id="SA2"><front-stub><article-id pub-id-type="doi">10.7554/eLife.03528.021</article-id><title-group><article-title>Author response</article-title></title-group></front-stub><body><p><italic>1) The authors emphasize how this new method (Polysomal Ribo-Seq) represents a significant improvement from the previous approach. They should compare their approach to the recently published ribosome footprinting in Drosophila method by the Weissman laboratory to indeed determine whether it makes any difference to isolate polysomal fractions and thus test their claim</italic>.</p><p>We argue that our improvement is not a quantitative, but a qualitative one, and this has been re-emphasized in the Introduction. Because our ribosomal profiling reads come from an mRNA with several ribosomes attached to it, one can be more certain that such reads and ribosomal binding indicate productive translation. If anything, our method could be expected to be more stringent than standard ribosomal profiling as it should exclude (or dramatically reduce) reads due to non-productive 80S and 40S binding.</p><p>Regarding the Weissman lab’s data (<xref ref-type="bibr" rid="bib14">Dunn et al. 2013</xref>), it is difficult to compare experiments using different techniques but we have compared our data with their S2 cell data. 
They do not present RNASeq controls for their S2 experiment; therefore certain comparisons are not possible. Nonetheless, if we apply our filters and cut-offs to their data, we observe that more smORFs would appear as translated in their data (264 theirs vs. 228 ours). Therefore our method is either a) less extensive, or b) more stringent than standard ribosomal profiling (for the detection of smORFs). We favour b) since these 36 Weissman-specific smORFs have low RPKMs and only 2 are detected by proteomics. We surmise that these Weissman-specific smORFs may represent either smORFs expressed at low levels in our S2 cells, or non-productive ribosomal binding (40S and 80S binding). In fact, 14 of them don’t pass our transcription cut-off of 1 RPKM in S2 cells. This is not a criticism of their work, but in our opinion (and that of others) an unavoidable feature of the standard ribosomal profiling method, and the very reason and justification for our modification.</p><p><italic>2) The authors seem to choose cutoffs quite arbitrarily. There are four general areas that might be addressed and related to this topic:</italic></p><p><italic>a) It is unclear whether they have tested the performance of these cutoffs on known coding genes. “To be considered translated, we required ribosome densities to be above 7.7 RPKM and footprint coverage of the ORF to be above 0.57”. The authors should benchmark their methods against known translated sequences</italic>.</p><p><italic>b) What is needed is the comparison of RNAseq data to peptide (mass spec) sequencing. 
Values of 7.7 RPKM and 57% coverage should be compared to translation of known protein coding RNAs (especially small proteins, e.g., Dm HSP22 (177 aa), Dm HSP 23, DM HSP 26, etc)</italic>.</p><p>We appreciate these suggestions, which we have evaluated and which are similar to other ideas we considered in the past.</p><p>There are in general two ways of finding cut-off values:</p><p>A) Starting with positive controls, find a value that eliminates the lowest possible percentile of positive controls (in our case, this could be 10% of translated ORFs) while eliminating a reciprocal, or very substantial, percentile of negative controls.</p><p>B) Starting with negative controls, find a value that eliminates the highest possible percentile of negative controls (in our case, 90% of non-coding mRNA sequences) while eliminating a reciprocal, or minimal, percentile of positive controls.</p><p>Regarding A): the heat-shock proteins suggested by the referees are transcribed and translated at low levels in S2 cells under standard conditions, and therefore their use as benchmarks for cut-offs would considerably reduce the stringency of our experiment. We have tried, as an alternative, small ribosomal proteins, but these are translated at very high levels and hence produce cut-offs that are too stringent and would discard most canonical proteins. To avoid the subjectivity involved in selecting this or that set of small proteins as benchmarks one could use all smORFs to obtain cut-offs, but this would produce a circular argument (if no smORFs were really translated, their coding sequences would produce very low metrics but still would have a top 90th percentile).</p><p>Alternatively, we could use a percentile of the values from canonical long ORFs as benchmarks, but this would imply a pre-judgment of the fraction of these long ORFs that we believe to be subjected to translational regulation (that is, the mRNA is transcribed, but not translated); would this be 10% for a 90th percentile cut-off? 
Or 25% for a 75th percentile? It also entails the assumption that smORFs are going to be translated at a strength similar to that of canonical proteins, a premise that is in fact under test, and therefore can’t be accepted a priori.</p><p>To use proteomics-detected peptides as a benchmark for cut-offs does not work either. We have re-written the section in the manuscript that compares and validates our results with proteomics to make it clearer. Our results and those of others indicate that proteomics only detects smORFs with higher profiling metrics (in other words, more strongly translated). Hence, to use proteomics-corroborated smORFs to obtain cut-offs would preclude the detection of smORFs translated less strongly, and would negate the main point of doing ribosomal profiling to begin with: to obtain a more extensive, yet sufficiently stringent, repertoire of translated smORFs.</p><p>In summary, in this experiment one does not have independent positive controls: all positives are part of the sample and subject to test.</p><p>Therefore, we used strategy B) above, the highest percentile of likely negative controls, in our case 3’ UTRs, in accordance with other ribosomal profiling papers. However, we take the point of using only known translated genes, as this avoids circular arguments as discussed above for smORF coding sequences. Using this strategy, note for example how the fraction of annotated (longer) smORFs deemed translated by these cut-offs is very similar to the fraction of translated canonical genes (<xref ref-type="fig" rid="fig2">Figure 2E</xref>). 
Because the cut-offs were not selected from these smORFs, or even from canonical ORFs, but independently from canonical UTRs, this is a non-circular and non-trivial result.</p><p>Thus, we have revised our cut-offs and re-extracted them using only canonical coding genes as suggested:</p><p>We have taken the reads in the 3’-UTRs of mRNAs encoding canonical coding sequences (longer than 100aa) from the low polysomal fraction in our first experiment. These mRNAs should be translated at a low level since they have fewer ribosomes attached to them, and in fact their TE and RPKMs are lower. Hence, reads in their 3’-UTRs represent our best source of independent negative controls (or ‘background’): non-coding sequences that should not show meaningful ribosomal binding, from mRNAs that are translated at a low level to begin with. We obtain the RPKM and coverage values from these canonical 3’-UTR reads, and use their 90th percentile value as preliminary cut-offs for accepting translation of coding sequences. These values are 11.8 RPKM (which is considerably more stringent than the usual cut-off of RPKM &gt; 1) and 0.40 coverage (a metric that is stringent and useful in its own way, see later on).</p><p>On top of these 90th-percentile 3’-UTR values, we use two additional corrections. To avoid the case where a very short ORF of less than 20aa could contain a single ‘site’ of ribosomal binding, and to better differentiate between overlapping ORFs, the 0.40 coverage was raised to 0.57. For the analysis of ‘dwarf’ smORF profiling (see discussion of major issue 4 below) we also used an additional filter. A minimum of 5 reads in a single experiment was required, again to avoid situations where a few reads on a very short ORF get artificially inflated to a high RPKM. 
A full explanation of these filters and cut-offs has been added to the Methods section.</p><p>The stringency of our new filters is shown by the borderline case of CG15456, which now falls just short of the new cut-offs (with an RPKM of 10.1), yet which we observed to produce a weak but reproducible signal in transfection assays by immunofluorescence and immunoblotting, and which would pass our cut-offs using the <xref ref-type="bibr" rid="bib14">Dunn et al. 2013</xref> data. Thus, we may have increased the likelihood of false negatives, but reciprocally, we should have reduced the likelihood of false positives. A truly negative result, the long non-coding RNA Uhg2, has now been added to the results in <xref ref-type="fig" rid="fig3s1">Figure 3–figure supplement 1B</xref> and <xref ref-type="table" rid="tbl4">Table 4</xref> for comparison.</p><p><italic>c) For</italic> <xref ref-type="fig" rid="fig2"><italic>Figure 2A and B</italic></xref><italic>, it could be that the R^2 value of small vs large may largely be driven by large polysomal mRNAs with very low RPKMs compared to the smORF RNAs</italic>. <italic>What levels of expression (RPKM) were used as a threshold to be placed into this plot? Is it not possible that many of the large polysomal RNAs are not different but are merely expressed at a lower level vs the small polysomal level?</italic></p><p>Yes, the mRNAs in small and large polysomes are not different but are footprinted at different levels; this is what we intended to show. No threshold was used in this graph (perhaps the high overlap of low values makes it look like it was).</p><p>The same long canonical mRNAs produce more reads in large polysomes, expressed as higher RPKM (<xref ref-type="fig" rid="fig1s2">Figure 1–figure supplement 2A</xref> vs <xref ref-type="fig" rid="fig1s2">B</xref>) and TE (<xref ref-type="table" rid="tbl2">Table 2</xref>), than in small polysomes. 
This fits with the expectation that mRNAs have more ribosomes bound when they are being translated at a higher level.</p><p><italic>d) Validation of ribosomal profiling data suffers from the issue of incompleteness. Detection of Atlas peptides is not fully explained</italic>. <italic>What percent of peptides predicted by RNAseq matches the atlas peptide; how many atlas peptides/smORF are detected?</italic></p><p>These data were included in <xref ref-type="fig" rid="fig2s1">Figure 2–figure supplement 1B and C</xref>. However, several queries focus on this proteomics section, so we have tried to clarify it by changing the text in the ‘Validation of smORF translation’ section, and by swapping <xref ref-type="fig" rid="fig2">Figure 2C</xref> with <xref ref-type="fig" rid="fig2s1">Figure 2–figure supplement 1C</xref>, and eliminating <xref ref-type="fig" rid="fig2s1">Figure 2–figure supplement 1B</xref>. To answer this specific question, 51 out of 59 Peptide Atlas peptides detected in S2 cells are also detected by our Poly-Ribo-Seq (86%). These 51 peptides represent 22% of our translated smORFs. In our own proteomics experiments, we detect 60 peptides of which 59 are also detected by Poly-Ribo-Seq. Altogether, 99 peptides are detected by proteomics in S2 cells, and of these, Poly-Ribo-Seq detects 90 (91%); the total percentage of smORFs thus corroborated by proteomics is 39%. The peptides detected by proteomics, as mentioned in minor issue 2 below, seem to be translated at higher levels, and hence perhaps are more abundant.</p><p><italic>3) The interpretation of the results for the ncRNA smORFs is not the most parsimonious. The entire argument for their translatability rests on the association of the lncRNAs with the polysomal riboseq. This raises the issue of whether the detection of RNAs in this fraction is truly indicative of translatability. 
In the absence of the other criteria mentioned by the authors (mass spec and conservation) the co-fractionation of the RNA with polysomes can also be attributable to the non-specific affinity of the ribosome proteins to RNA. For example, can the authors show that a metagene of the translated lncRNAs shows phasing of the ribosome in the ORFs undergoing translation</italic>?</p><p>We now provide the framing analysis for the translated smORFs in the two non-coding RNAs highlighted in our figures (<xref ref-type="fig" rid="fig3s2">Figure 3–figure supplement 2A</xref>). Note, however, that for unknown reasons framing is not as good in Drosophila as in other organisms (see <xref ref-type="fig" rid="fig1s1">Figure 1–figure supplement 1C</xref> and Dunn, Weissman 2014). This is one of the reasons why we have introduced the coverage metric. Note also that translation is corroborated by tagging experiments in <xref ref-type="fig" rid="fig3">Figure 3</xref> and <xref ref-type="fig" rid="fig3s1">Figure 3–figure supplement 1</xref>.</p><p>The possibility of ‘non-specific ribosomal protein binding’ is present, but is reduced by:</p><p>a) discarding reads shorter than 25 nt.</p><p>b) the selection (via polysomes) against mRNAs just bound by single ribosomes, or partial ribosomes, or other proteins.</p><p>c) the filter for 5 reads and 0.57 coverage.</p><p>We have now added another control that illustrates the specificity of Poly-Ribo-Seq footprinting (<xref ref-type="fig" rid="fig3s2">Figure 3–figure supplement 2B</xref>). We have sequenced RNAs that are associated with small polysomes (2-6 ribosomes) before footprinting, and correlated them with the footprints we later observe. As expected, in general there is a good correlation for putative coding transcripts (annotated smORFs and canonical proteins) such that, if an RNA is present in a polysomal fraction, it is bound by ribosomes and translated. 
However, for non-coding RNAs, this correlation is much weaker and in fact, below our footprint RPKM cut-off for accepting translation, a number of lncRNAs can be present in the polysomal fraction in high amounts, yet do not give rise to ribosomal footprints. Hence, association with polysomes does not necessarily mean translation. Association with polysomes and significant generation of footprints does.</p><p><italic>4) Regarding uORFs, the authors state that 3,404 (38%) uORFs are footprinted by ribosomes (</italic><xref ref-type="fig" rid="fig2"><italic>Figure 2E</italic></xref><italic>). This seems a surprisingly high number, especially taking into account they find fewer than 300 smORFs translated. Are the authors using the same criteria to detect translated uORFs as for small ORFs? Is one footprint enough? What criteria are used when there are overlapping ORFs in different reading frames? How do the authors distinguish between two overlapping ORFs to determine which one is indeed being translated</italic>?</p><p>We had applied the same criteria for uORFs as for all other ORFs. However, we had raised the coverage cut-off to 0.57 because of these uORFs, based on the following calculation: the median size of apparently translated uORFs is 57 nucleotides (19 aa, see <xref ref-type="fig" rid="fig4s1">Figure 4–figure supplement 1C</xref>). A bona-fide ribosomal binding site could be as long as 32 nucleotides; hence, a uORF containing a single ribosomal binding ‘site’ could still be 0.56 covered (32 nt/57 nt). But in addition, the referees identify here another important technical point. The RPKM metric artificially inflates the number of reads in very short sequences, and for example, an 11.8 RPKM cut-off could still leave some dwarf smORFs of around 20aa being considered translated with as few as 2 reads. 
We have therefore added a further filter for these ‘dwarf’ smORFs (uORFs and non-coding smORFs), requiring a minimum of 5 reads in a single experiment to be considered translated, in addition to the RPKM and coverage filters we apply to all CDSs. For reference, the lowest number of reads obtained with an annotated smORF is 8. Since these annotated smORFs are on average longer (hence able to generate more reads) and have higher translational efficiencies than dwarf smORFs, a cut-off filter of 5 reads for the latter (the midpoint between 2 and 8) seems reasonable.</p><p>The new and more stringent filters specifically impinge on dwarf smORFs, as shown by a reduction in their numbers from the previous version of the manuscript. Annotated smORFs and standard long CDSs deemed translated have dropped by 2-3 percentage points (from 86 to 83% and from 83 to 81%, respectively), whereas ncRNA-ORFs and uORFs have dropped by 8-9 percentage points (from 43 to 34% and from 38 to 30%, respectively).</p><p>Despite these corrections (higher RPKM cut-off, new filter of a minimum of 5 reads) and a corresponding reduction in the number of translated uORFs, 2,708 still pass our filters (some 30% of the total). We also find these numbers intriguing, but other authors have also found a very high number of uORFs apparently translated (Chew 2014, <xref ref-type="bibr" rid="bib4">Bazzini 2014</xref>, <xref ref-type="bibr" rid="bib13">Duncan and Mata 2014</xref>). We would like to stress here, as in the paper, that this translation does not equate to peptide function, and that the role of many uORFs could be simply cis-regulatory. However, it is possible that some of these uORFs produce stable and functional peptides. 
As a further test, we have tagged several uORFs and the results corroborate the profiling data (see new panels <xref ref-type="fig" rid="fig3">Figure 3D</xref> and <xref ref-type="fig" rid="fig3s1">Figure 3–figure supplement 1I</xref>).</p><p>Finally, the metric we use to distinguish which overlapping smORF is translated is coverage. By requiring 0.57 coverage, we greatly increase the likelihood of distinguishing overlapping ORFs, although some overlapping and apparently translated ORFs exist (and can be independently discerned as translated by virtue of their non-overlapping reads). We have added a new figure (<xref ref-type="fig" rid="fig1s2">Figure 1–figure supplement 2</xref>) that clarifies these scenarios.</p><p><italic>5) The data should be made accessible</italic>, <italic>including sequences/genome coordinates of ORFs identified as well as upstream ORFs. In the current manuscript this data is not accessible except for a few examples (</italic><xref ref-type="table" rid="tbl3"><italic>Table 3</italic></xref><italic>)</italic></p><p>As suggested, we have uploaded FASTQ and other data files to the GEO website.</p><p><italic>6) While the selection of RNAs bound to 2-6 ribosomes is reasonable, what is the relationship of the lengths of the RNAs found to be bound compared to the coverage by the ribosomes? Are these RNAs also short or are they present in long (i.e., ∼1.5-2 Kb) RNAs of which only ∼300 nt are covered? Were annotated mRNAs found in this category that could have been translated in a different reading frame for a short length</italic>?</p><p>In general there is no clear correlation between the length of the mRNAs and the length covered by ribosomes. We understand here that the referees may be worried by the possibility that reads are generated by a small fraction of a long ORF, thus possibly by a ‘hidden’ smORF within such long ORFs. This would be scenario D) in the new <xref ref-type="fig" rid="fig1s2">Figure 1–figure supplement 2</xref>. 
This scenario is discarded by the coverage metric. High coverage ensures that when mRNAs encoding canonical ORFs longer than 100aa and detected in the 2-6 ribosome fraction are called translated it is because, as discussed in point 2 above, the canonical ORF is being translated uniformly but at low level, rather than only a small portion or different frame of it.</p></body></sub-article></article>