Skip to content

sjackman/white-spruce-organelles-paper

Repository files navigation

Organellar Genomes of White Spruce (Picea glauca): Assembly and Annotation

Shaun D Jackman, Rene Warren, Ewan Gibb, Ben Vandervalk, Hamid Mohamadi, Justin Chu, Anthony Raymond, Stephen Pleasance, Robin Coope, Mark R Wildung, Carol Ritland, Steven JM Jones, Joerg C Bohlmann, Inanc Birol

Abstract

The genome sequences of the plastid and mitochondrion of white spruce (Picea glauca) are assembled from whole genome Illumina sequencing data using ABySS. Whole genome sequencing data contains reads from both the nuclear and organellar genomes. Reads of the organellar genomes are abundant, because each cell contains hundreds of mitochondria and plastids. One lane of MiSeq data assembles the 123 kbp plastid genome, and one lane of HiSeq data assembles the 5.9 Mbp draft mitochondrial genome. The coding genes, ribosomal RNA and transfer RNA of both genomes are annotated. The transcript abundance of the mitochondrial genes is quantified in three developmental tissues and five mature tissues. C-to-U RNA editing is observed in the majority of mitochondrial genes and modifies ACG codons to create cryptic AUG start codons in four genes. NCBI GenBank contains a single mitochondrial genome of a gymnosperm, and this work contributes the first coniferous mitochondrial genome.

Introduction

Most plant cells contain two types of organelles that harbour their own genomes, mitochondria and plastids. In the Pinaceae, mitochondrial genomes are inherited maternally, and plastid genomes are inherited paternally (Whittle & Johnston 2002).

Complete plastid genomes of the gymnosperms Norway spruce (Picea abies) (Nystedt et al. 2013), Podocarpus lambertii (Nascimento Vieira, Faoro, Rogalski, et al. 2014), Taxus chinensis var. mairei (Zhang et al. 2014) and four Juniperus species (Guo et al. 2014) have recently been published in NCBI Genbank (Benson et al. 2014). These projects used a variety of strategies for isolating cpDNA, using physical separation methods in the lab or computationally separating cpDNA sequences from nuclear sequences, sequencing and assembly, shown in Table 1.

The Picea abies genome used 454 GS FLX Titanium sequencing and Sanger sequencing of PCR amplicons for finishing, BLAST (Altschul et al. 1990) to isolate the cpDNA reads, and the software Newbler to assemble the reads. The P. lambertii genome assembly isolated the cpDNA using the saline Percoll gradient protocol of Nascimento Vieira, Faoro, Freitas Fraga, et al. (2014), Illumina MiSeq sequencing and the software Newbler to assemble the reads. The Juniperus bermudiana genome assembly used long-range PCR to amplify the plastid DNA, a combination of Illumina GAII and Sanger sequencing, and the software Geneious to assemble the reads using C. japonica as a reference genome. The other three Juniperus genome assemblies used Illumina MiSeq sequencing and the software Velvet (Zerbino & Birney 2008) to assemble the reads. The T. chinensis genome assembly used whole-genome Illumina HiSeq 2000 sequencing, BLAT (Kent 2002) to isolate the cpDNA reads and SOAPdenovo (Luo et al. 2012) to assemble the isolated cpDNA reads. All of these projects used DOGMA (Wyman et al. 2004) to annotate the assembly.

Table 1: Methods of cpDNA separation, sequencing and assembly of complete plastid genomes of gymnosperms published. a Finished with PCR and Sanger sequencing
Species cpDNA Separation Sequencing Assembler software
Picea abies BLAST in silico 454 GS FLX Titanium a Newbler
Podocarpus lambertii Saline Percoll gradient Illumina MiSeq Newbler
Juniperus bermudiana Longer-range PCR Illumina GAII a Geneious
Other Juniperus Unspecified Illumina MiSeq Velvet
Taxus chinensis BLAT in silico Illumina HiSeq 2000 SOAPdenovo

Only one complete mitochondrial genome of a gymnosperm has been published, Cycas taitungensis (Chaw et al. 2008), while complete mitochondrial genome sequences of the angiosperms Brassica maritima (Grewe et al. 2014), Brassica oleracea (ibid.), Capsicum annuum (Jo et al. 2014), Eruca sativa (Wang et al. 2014), Helianthus tuberosus (Bock et al. 2014), Raphanus sativus (Jeong et al. 2014), Rhazya stricta (Park et al. 2014) and Vaccinium macrocarpon (Fajardo et al. 2014) have been published in NCBI Genbank. Six of these projects gave details of the sample preparation, sequencing, assembly and annotation strategy. Three projects enriched organellar DNA using varying laboratory methods (Keren et al. 2009; Kim et al. 2007; Chen et al. 2011), and the remainder used total genomic DNA. Three projects used Illumina HiSeq 2000 sequencing and Velvet for assembly, and three projects used Roche 454 GS-FLX sequencing and Newbler for assembly. Most projects used an aligner such as BLAST (Altschul et al. 1990) to isolate sequences with similarity to known mitochondrial sequence, either before or after assembly. Two projects used Mitofy (Alverson et al. 2010) to annotate the genome, and the remainder used a collection of tools such as BLAST, tRNAscan-SE (Lowe & Eddy 1997) and ORF Finder to annotate genes. Plant mitochondrial genomes can substantially vary in size, with some of the largest mitochondrial genomes reported for the basal angiosperm Amborella trichopoda (3.9 Mbp; Rice et al. 2013) and the two Silene species S. noctiflora and S. conica (6.7 Mbp and 11.3 Mbp, respectively; Sloan et al. 2012).

The SMarTForests project has recently published a set of stepwise improved assemblies of the 20 gigabase white spruce (Picea glauca) genome (Birol et al. 2013; Warren, Keeling, et al. 2015), a gymnosperm genome seven times the size of the human genome, sequenced using the Illumina HiSeq and MiSeq sequencing platforms. The whole genome sequencing data contained reads originating from both the nuclear and organellar genomes. Whereas one copy of the diploid nuclear genome is found in each cell, hundreds of organelles are present, and thus hundreds of copies of the organellar genomes. This abundance results in an overrepresentation of the organellar genomes in whole genome sequencing data.

Assembling a single lane of white spruce whole genome sequencing data using the software ABySS (Simpson et al. 2009) yielded an assembly composed of organellar sequences and nuclear repeat elements. The assembled sequences that originate from the organellar genomes were separated from those of nuclear origin by classifying the sequences using their length, depth of coverage and GC content. The plastid genome of white spruce is compared to that of Norway spruce (Picea abies) (Nystedt et al. 2013), and the mitochondrial genome of white spruce is compared to that of prince sago palm (Cycas taitungensis) (Chaw et al. 2008).

Analysis of cpDNA is useful in reconstructing phylogenies of plants (Wu et al. 2007), in determining the origin of an expanding population (Aizawa et al. 2012) and in determining when distinct lineages of a species resulted from multiple colonization events (Jardón-Barbolla et al. 2011). These contrasting inheritance schemes of plastids and mitochondria can be useful in the characterization of species expanding their range. In the case of two previously allopatric species now found in sympatry, the mitochondrial DNA (mtDNA) is contributed by the resident species, whereas introgression of the plastid genome into the expanding species is limited, since pollen is more readily dispersed than seeds (Du et al. 2011). Differential gene flow of cpDNA and mtDNA due to different methods of inheritance and dispersion results in new assemblages of organellar genomes and an increase of genetic diversity after expansion from a refugium (Gerardi et al. 2010).

Material and Methods

DNA, RNA and software materials

Genomic DNA was collected from the apical shoot tissues of a single interior white spruce tree, clone PG29, and sequencing libraries constructed as described in Birol et al. (2013). Because the original intention of this sequencing project was to assemble the nuclear genome, an organelle exclusion method was used to preferentially extract nuclear DNA. Sequencing reads from both organellar genomes were present in sufficient depth however to assemble their genomes.

RNA was extracted from eight samples, three developmental stages and five mature tissues: megagametophyte, embryo, seedling, young buds, xylem, mature needles, flushing buds and bark, described in Warren, Keeling, et al. (2015). These samples were sequenced with the Illumina HiSeq 2000 (Warren, Keeling, et al. 2015). The RNA-seq data was used to quantify the transcript abundance of the annotated mitochondrial genes using the software Salmon (Patro et al. 2014).

The software used in this analysis and their versions are listed in supplementary Table S1. All software tools were installed using Homebrew (http://brew.sh).

Methods used to assemble the plastid genome

A single lane of Illumina MiSeq paired-end sequencing (SRR525215) was used to assemble the plastid genome. Paired-end sequencing usually leaves a gap of unsequenced nucleotides in the middle of the DNA fragment. Because 300 bp paired-end reads were sequenced from a library of 500 bp DNA fragments, the reads are expected to overlap by 100 bp. These overlapping paired-end reads were merged using ABySS-mergepairs, a component of the software ABySS (Simpson et al. 2009). These merged reads were assembled using ABySS. Contigs that are putatively derived from the plastid were separated by length and depth of coverage using thresholds chosen by inspection of a scatter plot (see supplementary Figure S1). These putative plastid contigs were assembled into scaffolds using ABySS-scaffold.

We ran the gap-filling application Sealer (Paulino et al., in review; options -v -j 12 -b 30G -B 300 -F 700 with -k from 18 to 108 with step size 6) on the ABySS assembly of the plastid genome, closing 5 of the remaining 7 gaps, with a resulting assembly consisting of two large (~50 and ~70 kbp) scaftigs. Given the small size of the plastid genome, we opted to manually finish the assembly using the software Consed 20.0 (Gordon & Green 2013). We loaded the resulting gap-filled assembly into Consed and imported Pacific Biosciences (PacBio) sequencing data (SRR2148116 and SRR2148117), 9204 reads 500 bp and larger, into the assembly and aligned them to the plastid genome using cross_match (Green 1999) from within Consed. For each scaftig end, 6 PacBio reads were pulled out and assembled using the mini-assembly feature in Consed. Cross_match alignments of the resulting contigs to the plastid assembly were used to merge the two scaftigs and confirm that the complete circular genome sequence was obtained. In a subsequent step, 7,742 Illumina HiSeq reads were imported and aligned to the assembly using Consed. These reads were selected from the library of 133 million reads used to assemble the mitochondrion on the basis of alignment to our draft plastid genome using BWA 0.7.5a (Li 2013), focusing on regions that would benefit from read import by restricting our search to regions with ambiguity and regions covered by PacBio reads exclusively. The subset of Illumina reads were selected using samtools 0.1.18, mini-assembled with Phrap (Green 1999) and the resulting contigs re-merged to correct bases in gaps filled only by PacBio, namely one gap and sequence at edges confirming the circular topology. The starting base was chosen using the Norway spruce plastid genome sequence (NC_021456, Nystedt et al. 2013). Our assembly was further polished using the Genome Analysis Toolkit (GATK) 2.8-1-g932cd3a FastaAlternateReferenceMaker (McKenna et al. 2010).

The assembled plastid genome was initially annotated using DOGMA (Wyman et al. 2004). Being an interactive web application, it is not convenient for automated annotation. The software MAKER (Campbell et al. 2014) is not interactive and is designed for automated annotation, and we used it to annotate the white spruce plastid using the Norway spruce plastid genome (NC_021456, Nystedt et al. 2013) for both protein-coding and non-coding gene homology evidence. The parameters of MAKER are show in supplementary Table S2. The inverted repeat was identified using MUMmer (Kurtz et al. 2004), shown in supplementary Figure S3.

The assembled plastid genome was aligned to the Norway spruce plastid using BWA-MEM (Li 2013). The two genomes were compared using QUAST (Gurevich et al. 2013) to confirm the presence of the annotated genes of the Norway spruce plastid in the white spruce plastid.

Methods used to assemble the mitochondrial genome

ABySS-Konnector (Vandervalk et al. 2014) was used to fill the gap between the paired-end reads of a single lane of Illumina HiSeq 2000 paired end-sequencing (SRR525196). These connected paired-end reads were assembled using ABySS. Putative mitochondrial sequences were separated from nuclear sequences by their length, depth of coverage and GC content using k-means clustering in R (see supplementary Figure S2). The putative mitochondrial contigs were then assembled into scaffolds using ABySS-scaffold with a single lane of Illumina HiSeq sequencing of a mate-pair library.

The ABySS assembly of the white spruce mitochondrial genome resulted in 71 scaffolds. We ran the gap-filling application Sealer attempting to close the gaps between every combination of two scaffolds. This approach closed 10 gaps and yielded 61 scaffolds, which we used as input to the LINKS scaffolder 1.1 (Warren, Vandervalk, et al. 2015) (options -k 15 -t 1 -l 3 -r 0.4, 19 iterations with -d from 500 to 6000 with step size 250) in conjunction with long PacBio reads, further decreasing the number of scaffolds to 58. The Konnector pseudoreads were aligned to the 58 LINKS scaffolds with BWA 0.7.5a (bwa mem -a multimap), and we created links between two scaffolds when reads aligned within 1000 bp of the edges of any two scaffolds. We modified LINKS to read the resulting SAM alignment file and link scaffolds satisfying this criteria (options LINKS-sam -e 0.9 -a 0.5), bringing the final number of scaffolds to 38. We confirmed the merges using mate-pair reads. The white spruce mate-pair libraries used for confirmation are presented in Birol et al. (2013) and available from DNAnexus (SRP014489 http://sra.dnanexus.com/studies/SRP014489). In brief, mate-pair reads from three fragment size libraries (5, 8 and 12 kbp) were aligned to the 38-scaffold assembly with BWA-MEM 0.7.10-r789 and the resulting alignments parsed with a PERL script. A summary of this validation is presented in supplemental Table S4. Automated gap-closing was performed with Sealer 1.0 (options -j 12 -B 1000 -F 700 -P10 -k96 -k80) using Bloom filters built from the entire white spruce PG29 read data set (Warren, Keeling, et al. 2015) and closed 55 of the 182 total gaps (30.2%). We polished the gap-filled assembly using GATK, as described for the plastid genome.

The assembled scaffolds were aligned to the NCBI nucleotide (nt) database using BLAST to check for hits to mitochondrial genomes and to screen for contamination.

The mitochondrial genome was annotated using MAKER (parameters shown in supplementary Table S3) and Prokka (Seemann 2014), and the two sets of annotations were merged using BEDTools (Quinlan & Hall 2010) and GenomeTools (Gremme et al. 2013), selecting the MAKER annotation when the two tools had overlapping annotations. The proteins of all green plants (Viridiplantae) with complete mitochondrial genome sequences in NCBI GenBank (Benson et al. 2014), 142 species, were used for protein homology evidence, the most closely related of which is the prince sago palm (Cycas taitungensis) (NC_010303 Chaw et al. 2008), being the only gymnosperm with a complete mitochondrial genome. Transfer RNA (tRNA) were annotated using ARAGORN (Laslett & Canback 2004). Ribosomal RNA (rRNA) were annotated using RNAmmer (Lagesen et al. 2007). Prokka uses Prodigal (Hyatt et al. 2010) to annotate open reading frames. Repeats were identified using RepeatMasker (Smit et al. 1996) and RepeatModeler.

The RNA-seq reads were aligned to the annotated mitochondrial genes using BWA-MEM and variants were called using samtools and bcftools requiring a minimum genotype quality of 50 to identify possible sites of C-to-U RNA editing.

Results

The white spruce plastid genome

The assembly and annotation metrics for the plastid and mitochondrial genomes are summarized in Table 2. The plastid genome was assembled into a single circular contig of 123,266 bp containing 114 identified genes: 74 protein coding (mRNA) genes, 36 transfer RNA (tRNA) genes and 4 ribosomal RNA (rRNA) genes, shown in Figure 1.

Table 2: Sequencing, assembly and annotation metrics of the white spruce organellar genomes. The number of distinct genes are shown in parentheses.
Metric Plastid Mitochondrion
Number of lanes 1 MiSeq lane 1 HiSeq lane
Number of read pairs 4.9 million 133 million
Read length 300 bp 150 bp
Number of merged reads 3.0 million 1.4 million
Median merged read length 492 bp 465 bp
Number of assembled reads 21 thousand 377 thousand
Proportion of organellar reads 1/140 or 0.7% 1/350 or 0.3%
Depth of coverage 80x 30x
Assembled genome size 123,266 bp 5.94 Mbp
Number of contigs 1 contig 130 contigs
Contig N50 123 kbp 102 kbp
Number of scaffolds 1 scaffold 36 scaffolds
Scaffold N50 123 kbp 369 kbp
Largest scaffold 123 kbp 1222 kbp
GC content 38.8% 44.7%
Number of genes without ORFs 114 (108) 143 (74)
Protein coding genes (mRNA) 74 (72) 106 (51)
Ribosomal RNA genes (rRNA) 4 (4) 8 (3)
Transfer RNA genes (tRNA) 36 (32) 29 (20)
Open reading frames (ORF)  ≥  300 bp NA 1065
Coding genes containing introns 8 5
Introns in coding genes 9 7
tRNA genes containing introns 6 0

All protein-coding genes are single copy, except psbI and ycf12, which have two copies each. All tRNA genes are single copy, except trnH-GUG, trnI-CAU, trnS-GCU and trnT-GGU, which have two copies each. All rRNA genes are single copy.

The protein-coding genes atpF, petB, petD, rpl2, rpl16, rpoC1 and rps12 each contain one intron, and ycf3 contains two introns. The tRNA genes trnA-UGC, trnG-GCC, trnI-GAU, trnK-UUU, trnL-UAA and trnV-UAC each contain one intron. Of the 15 introns, 11 are determined to be group II introns by RNAweasel (Lang et al. 2007).

The first and smallest exons of the genes petB, petD and rpl16 are 6, 8 and 9 bp respectively. These genes likely belong to polycistronic transcripts (Barkan 1988) of their respective protein complexes, but the short size of their initial exons make them difficult to annotate all the same. The initial exons of these genes were added to their annotations manually.

The gene rps12 of a plastid genome is typically trans-spliced (Hildebrand et al. 1988), which makes it difficult to annotate using MAKER. It is composed of three exons and one cis-spliced intron. It required manually editing the gene annotation to incorporate trans-splicing in the gene model.

Each copy of the inverted repeat (IR) is 445 bp in size, much smaller than most plants, but typical of Pinaceae (Lin et al. 2010). Unlike most inverted repeats, which are typically identical, the two copies differ by a single base. The IR contains a single gene, the tRNA trnI-CAU.

All 114 genes of the Norway spruce plastid genome (Nystedt et al. 2013) are present in the white spruce plastid genome in perfect synteny. Alignment of the white spruce genome to the Norway spruce genome using BWA-MEM (Li 2013) reveal no large-scale structural rearrangements. Alignments of the white spruce plastid genome cover 99.7% of the Norway spruce plastid genome, and the sequence identity in aligned regions is 99.2%.

Figure 1: The complete plastid genome of white spruce, annotated using MAKER and plotted using OrganellarGenomeDRAW (Lohse et al. 2007).

The white spruce mitochondrial genome

The mitochondrial genome was assembled into 38 scaffolds (132 contigs) with a scaffold N50 of 369 kbp (contig N50 of 102 kbp). The largest scaffold is 1222 kbp (Table 2). The scaffolds were aligned to the NCBI nucleotide (nt) database using BLAST. Of the 38 scaffolds, 26 scaffolds align to mitochondrial genomes, 3 small scaffolds (<10 kbp) align to Picea glauca mRNA clones and BAC sequences, 7 small scaffolds (<10 kbp) had no significant hits, and 2 small scaffolds (<5 kbp) align to cloning vectors. These last two scaffolds were removed from the assembly.

The mitochondrial genome contains 106 protein coding (mRNA) genes, 29 transfer RNA (tRNA) genes and 8 ribosomal RNA (rRNA) genes. The 106 protein-coding genes (51 distinct genes) compose 75 kbp (1.3%) of the genome. The 29 tRNA genes are found in 20 distinct species for 15 amino acids. The relative order of the genes on the scaffolds and gene size is shown in Figure 2. The size of each gene family is shown in Figure 3. The precise position of each gene on its scaffold is shown in supplementary Figure S4.

All tRNA genes are single copy, except trnD-GUC which has 3 copies, trnM-CAU which has 7 copies, and trnY-GUA which has 2 copies. The rRNA gene rrn5 has 4 copies, rrn18 has 3 copies, and rrn26 has 1 copy.

A large number of open reading frames are identified: 6265 of at least 90 bp, composing 1.4 Mbp, and 1065 of at least 300 bp, composing 413 kbp. These open reading frames do not have sufficient sequence similarity to the genes of the Viridiplantae mitochondria used for protein homology evidence to be annotated by either MAKER or Prokka.

A total of 7 introns are found in 5 distinct protein-coding genes. The protein-coding genes nad2, nad5, and nad7 each contain one intron, and nad4 and rps3 each contain two introns. All introns are determined to be group II introns by RNAweasel (Lang et al. 2007).

Repeats compose 390 kbp (6.6%) of the mitochondrial genome. Simple repeats and LTR Copia, ERV1 and Gypsy are the most common repeats, shown in Figure 4.

All 39 protein coding genes and 3 rRNA genes of the Cycas taitungensis mitochondrion are seen in white spruce. Of the 22 tRNA genes of Cycas taitungensis, 13 are found in white spruce, and 8 tRNA genes are seen in white spruce that are not seen in Cycas taitungensis.

Figure 2: The relative order of the genes on the scaffolds, and the size of each gene. Each box is proportional to the size of the gene including introns, except that genes smaller than 200 bp are shown as 200 bp. The space between genes is not to scale. An asterisk indicates that the gene name is truncated. Only scaffolds that have annotated genes are shown.

Figure 3: The gene content of the white spruce mitochondrial genome, grouped by gene family. Each box is proportional to the size of the gene including introns. The colour of each gene is unique within its gene family.

Figure 4: The repetitive sequence content of the white spruce mitochondrial genome, annotated using RepeatMasker and RepeatModeler.

The transcriptome of the white spruce mitochondrial genome

The transcript abundance of the mitochondrial coding genes with known function is shown in Figure 5. The transcript abundance of the mitochondrial coding genes including open read frames is shown in Figure 6. Of the samples analyzed, the transcriptomes of megagametophyte and embryo have the highest abundance of coding mitochondrial genes and cluster together.

Of the 106 coding genes with known function, 60 are expressed in at least one of the mature tissues, 29 are expressed in one of the developing tissues but not in a mature tissue, and 17 are not found to be expressed. Of the 6265 ORFs at least 90 bp, 427 (7%) are expressed in at least one of the mature tissues, 2809 (45%) are expressed in one of the developing tissues but not in a mature tissue, and 3029 (48%) are not found to be expressed. A gene with an abundance of at least ten transcripts per million as quantified by Salmon is considered to be expressed. These results are shown in Table 3.

Table 3: Number of expressed protein-coding genes and open reading frames tabulated by developmental stage.
Both Mature only Developing only Neither Sum
CDS 60 0 29 17 106
ORF 411 16 2809 3029 6265
Sum 471 16 2838 3046 6371

Possible C-to-U RNA editing, positions where the genome sequence shows C but the RNA-seq reads shows T, is observed in 68 of 106 coding genes shown in supplementary Table S5, with the most highly edited gene, nad3, seeing 9 edits per 100 bp. It can be difficult to distinguish RNA editing events from genomic SNV and miscalled variants caused by misaligned reads. We note however that 91% (1601 of 1751) of the variants called from the RNA-seq data are C-to-T variants shown in supplementary Table S6, which indicates that a large fraction of these variants are due to C-to-U RNA editing. C-to-U RNA editing can create new start and stop codons, but it is not able to destroy existing start and stop codons. Editing of the ACG codon to AUG to create a cryptic start codon is frequently seen in organellar genomes (Neckermann et al. 1994). Four genes have cryptic ACG start codons and corroborating C-to-U RNA editing evidence in the RNA-seq data: mttB, nad1, rps3 and rps4.

Figure 5: A heatmap of the transcript abundance of mitochondrial protein coding genes. Each column is a tissue sample. Each row is a gene. Each cell represents the transcript abundance of one gene in one sample. The colour scale is log<sub>10</sub>(TPM), where TPM is transcripts per million as measured by Salmon.

Figure 6: A heatmap of the transcript abundance of mitochondrial protein coding genes, including open reading frames. Each column is a tissue sample. Each row is a gene. Each cell represents the transcript abundance of one gene in one sample. The colour scale is log<sub>10</sub>(TPM), where TPM is transcripts per million as measured by Salmon.

Conclusion

One lane of MiSeq sequencing of whole genome DNA is sufficient to assemble the 123 kbp plastid genome, and one lane of HiSeq sequencing of whole genome DNA is sufficient to assemble the 5.9 Mbp mitochondrial genome of white spruce. Additional Illumina and PacBio sequencing is used to improved scaffold contiguity and to close scaffold gaps, after which the plastid genome is assembled in a single contig and the largest mitochondrial scaffold is 1.2 Mbp.

The white spruce plastid genome shows no structural rearrangements when compared with Norway spruce, and all genes of the Norway spruce (Picea abies) plastid are present in the white spruce plastid. All genes of the prince sago palm (Cycas taitungensis) mitochondrion are present in the white spruce mitochondrion.

The protein coding gene content of the mitochondrial genome is quite sparse, with 106 protein coding genes in 5.9 Mbp, in comparison to the plastid genome, with 74 protein coding genes in 123 kbp. Nearly 7% of the mitochondrial genome is composed of repeats, and roughly 1% is composed of coding genes. A significant portion, over 90%, of the unusually large size of the white spruce mitochondrial genome is yet unexplained.

Acknowledgements

This work was supported by Genome Canada, Genome British Columbia and Genome Quebec as part of the SMarTForests Project (www.smartforests.ca). We thank Carson Holt for his help with the MAKER analysis and Martin Krzywinski for his help with Figure 2.

References

Aizawa M, Kim Z-S, Yoshimaru H. 2012. Phylogeography of the korean pine (pinus koraiensis) in northeast asia: Inferences from organelle gene sequences. Journal of plant research. 125:713–723.

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. Journal of molecular biology. 215:403–410.

Alverson AJ et al. 2010. Insights into the evolution of mitochondrial genome size from complete sequences of citrullus lanatus and cucurbita pepo (cucurbitaceae). Molecular biology and evolution. 27:1436–1448.

Barkan A. 1988. Proteins encoded by a complex chloroplast transcription unit are each translated from both monocistronic and polycistronic mRNAs. The EMBO journal. 7:2637.

Benson DA et al. 2014. GenBank. Nucleic Acids Research. 43:D30–D35. doi: 10.1093/nar/gku1216.

Birol I et al. 2013. Assembling the 20 gb white spruce (picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics. btt178.

Bock DG, Kane NC, Ebert DP, Rieseberg LH. 2014. Genome skimming reveals the origin of the jerusalem artichoke tuber crop species: Neither from jerusalem nor an artichoke. New Phytologist. 201:1021–1030.

Campbell MS et al. 2014. MAKER-p: A tool kit for the rapid creation, management, and quality control of plant genome annotations. Plant physiology. 164:513–524.

Chaw S-M et al. 2008. The mitochondrial genome of the gymnosperm cycas taitungensis contains a novel family of short interspersed elements, bpu sequences, and abundant rNA editing sites. Molecular biology and evolution. 25:603–615.

Chen J et al. 2011. Substoichiometrically different mitotypes coexist in mitochondrial genomes of brassica napus l. PLoS One. 6:e17662.

Du FK et al. 2011. Direction and extent of organelle dNA introgression between two spruce species in the qinghai-tibetan plateau. New Phytologist. 192:1024–1033.

Fajardo D et al. 2014. The american cranberry mitochondrial genome reveals the presence of selenocysteine (tRNA-sec and sECIS) insertion machinery in land plants. Gene. 536:336–343.

Gerardi S, JARAMILLO-CORREA JP, Beaulieu J, Bousquet J. 2010. From glacial refugia to modern populations: New assemblages of organelle genomes generated by differential cytoplasmic gene flow in transcontinental black spruce. Molecular ecology. 19:5265–5280.

Gordon D, Green P. 2013. Consed: A graphical editor for next-generation sequencing. Bioinformatics. 29:2936–2937. doi: 10.1093/bioinformatics/btt515.

Green P. 1999. Documentation for phrap and cross-match. University of Washington, Seattle. http://www.phrap.org/phredphrapconsed.html.

Gremme G, Steinbiss S, Kurtz S. 2013. GenomeTools: A comprehensive software library for efficient processing of structured genome annotations. Computational Biology and Bioinformatics, IEEE/ACM Transactions on. 10:645–656.

Grewe F et al. 2014. Comparative analysis of 11 brassicales mitochondrial genomes and the mitochondrial transcriptome of< i> brassica oleracea</i>. Mitochondrion.

Guo W et al. 2014. Predominant and substoichiometric isomers of the plastid genome coexist within juniperus plants and have shifted multiple times during cupressophyte evolution. Genome biology and evolution. 6:580–590.

Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: Quality assessment tool for genome assemblies. Bioinformatics. 29:1072–1075.

Hildebrand M, Hallick RB, Passavant CW, Bourque DP. 1988. Trans-splicing in chloroplasts: The rps 12 loci of nicotiana tabacum. Proceedings of the National Academy of Sciences. 85:372–376.

Hyatt D et al. 2010. Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 11:119. doi: 10.1186/1471-2105-11-119.

Jardón-Barbolla L, Delgado-Valerio P, Geada-López G, Vázquez-Lobo A, Piñero D. 2011. Phylogeography of pinus subsection australes in the caribbean basin. Annals of botany. 107:229–241.

Jeong Y-M et al. 2014. The complete mitochondrial genome of cultivated radish wK10039 (raphanus sativus l.). Mitochondrial DNA. 1–2.

Jo YD, Choi Y, Kim D-H, Kim B-D, Kang B-C. 2014. Extensive structural variations between mitochondrial genomes of cMS and normal peppers (capsicum annuum l.) revealed by complete nucleotide sequencing. BMC genomics. 15:561.

Kent WJ. 2002. BLAT—the bLAST-like alignment tool. Genome research. 12:656–664.

Keren I et al. 2009. AtnMat2, a nuclear-encoded maturase required for splicing of group-iI introns in arabidopsis mitochondria. Rna. 15:2299–2311.

Kim DH, Kang JG, Kim B-D. 2007. Isolation and characterization of the cytoplasmic male sterility-associated orf456 gene of chili pepper (capsicum annuum l.). Plant molecular biology. 63:519–532.

Kurtz S et al. 2004. Versatile and open software for comparing large genomes. Genome biology. 5:R12.

Lagesen K et al. 2007. RNAmmer: Consistent and rapid annotation of ribosomal rNA genes. Nucleic acids research. 35:3100–3108.

Lang BF, Laforest M-J, Burger G. 2007. Mitochondrial introns: A critical view. Trends in Genetics. 23:119–125. doi: 10.1016/j.tig.2007.01.006.

Laslett D, Canback B. 2004. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Research. 32:11–16.

Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with bWA-mEM. arXiv preprint arXiv:1303.3997.

Lin C-P, Huang J-P, Wu C-S, Hsu C-Y, Chaw S-M. 2010. Comparative chloroplast genomics reveals the evolution of pinaceae genera and subfamilies. Genome biology and evolution. 2:504–517.

Lohse M, Drechsel O, Bock R. 2007. OrganellarGenomeDRAW (oGDRAW): A tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Current genetics. 52:267–274.

Lowe TM, Eddy SR. 1997. TRNAscan-sE: A program for improved detection of transfer rNA genes in genomic sequence. Nucleic acids research. 25:0955–964.

Luo R et al. 2012. SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler. Gigascience. 1:18.

McKenna A et al. 2010. The genome analysis toolkit: A mapReduce framework for analyzing next-generation dNA sequencing data. Genome research. 20:1297–1303. doi: 10.1101/gr.107524.110.

Nascimento Vieira L do et al. 2014. An improved protocol for intact chloroplasts and cpDNA isolation in conifers. PloS one. 9:e84792.

Nascimento Vieira L do et al. 2014. The complete chloroplast genome sequence of podocarpus lambertii: Genome structure, evolutionary aspects, gene content and sSR detection. PloS one. 9:e90618.

Neckermann K, Zeltz P, Igloi GL, Kössel H, Maier RM. 1994. The role of RNA editing in conservation of start codons in chloroplast genomes. Gene. 146:177–182. doi: 10.1016/0378-1119(94)90290-9.

Nystedt B et al. 2013. The norway spruce genome sequence and conifer genome evolution. Nature. 497:579–584.

Park S et al. 2014. Complete sequences of organelle genomes from the medicinal plant rhazya stricta (apocynaceae) and contrasting patterns of mitochondrial genome evolution across asterids. BMC genomics. 15:405.

Patro R, Mount SM, Kingsford C. 2014. Sailfish enables alignment-free isoform quantification from rNA-seq reads using lightweight algorithms. Nature biotechnology. 32:462–464. doi: 10.1038/nbt.2862.

Quinlan AR, Hall IM. 2010. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics. 26:841–842. doi: 10.1093/bioinformatics/btq033.

Rice DW et al. 2013. Horizontal transfer of entire genomes via mitochondrial fusion in the angiosperm amborella. Science. 342:1468–1473.

Seemann T. 2014. Prokka: Rapid prokaryotic genome annotation. Bioinformatics. btu153.

Simpson JT et al. 2009. ABySS: A parallel assembler for short read sequence data. Genome research. 19:1117–1123.

Sloan DB et al. 2012. Rapid evolution of enormous, multichromosomal genomes in flowering plant mitochondria with exceptionally high mutation rates. PLoS biology. 10:e1001241.

Smit A, Hubley R, Green P. 1996. RepeatMasker open-3.0. http://www.repeatmasker.org.

Vandervalk BP et al. 2014. Konnector: Connecting paired-end reads using a bloom filter de bruijn graph. In: 2014 IEEE international conference on bioinformatics and biomedicine (BIBM). Institute of Electrical & Electronics Engineers (IEEE). doi: 10.1109/bibm.2014.6999126.

Wang Y et al. 2014. Complete mitochondrial genome of eruca sativa mill.(Garden rocket). PloS one. 9:e105748.

Warren RL et al. 2015. Improved white spruce (picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism. The Plant Journal. doi: 10.1111/tpj.12886.

Warren RL, Vandervalk BP, Jones SJ, Birol I. 2015. LINKS: Scaffolding genome assemblies with kilobase-long nanopore reads. bioRxiv. 016519. doi: 10.1101/016519.

Whittle C-A, Johnston MO. 2002. Male-driven evolution of mitochondrial and chloroplastidial dNA sequences in plants. Molecular biology and evolution. 19:938–949.

Wu C-S, Wang Y-N, Liu S-M, Chaw S-M. 2007. Chloroplast genome (cpDNA) of cycas taitungensis and 56 cp protein-coding genes of gnetum parvifolium: Insights into cpDNA evolution and phylogeny of extant seed plants. Molecular biology and evolution. 24:1366–1379.

Wyman SK, Jansen RK, Boore JL. 2004. Automatic annotation of organellar genomes with dOGMA. Bioinformatics. 20:3252–3255.

Zerbino DR, Birney E. 2008. Velvet: Algorithms for de novo short read assembly using de bruijn graphs. Genome research. 18:821–829.

Zhang Y et al. 2014. The complete chloroplast genome sequence of taxus chinensis var. mairei (taxaceae): Loss of an inverted repeat region and comparative analysis with related species. Gene. 540:201–209.

About

🌲 Organellar Genomes of White Spruce (Picea glauca)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages