Mitochondrial genomes of Pleistocene megafauna retrieved from recent sediment layers of two Siberian lakes

Ancient environmental DNA (aeDNA) from lake sediments has yielded remarkable insights for the reconstruction of past ecosystems, including suggestions of late survival of extinct species. However, translocation and lateral inflow of DNA in sediments can potentially distort the stratigraphic signal of the DNA. Using three different approaches on two short lake sediment cores of the Yamal peninsula, West Siberia, with ages spanning only the past hundreds of years, we detect DNA and identified mitochondrial genomes of multiple mammoth and woolly rhinoceros individuals—both species that have been extinct for thousands of years on the mainland. The occurrence of clearly identifiable aeDNA of extinct Pleistocene megafauna (e.g. >400 K reads in one core) throughout these two short subsurface cores, along with specificities of sedimentology and dating, confirm that processes acting on regional scales, such as extensive permafrost thawing, can influence the aeDNA record and should be accounted for in aeDNA paleoecology.


Introduction
Sedimentary deposits constitute highly valuable archives of past ecosystem changes as they contain dateable layers with organismic remains including ancient DNA (aDNA).Such remains are typically assumed to represent the ecosystem of the time around which the respective stratum was deposited.
aDNA from sediments has yielded remarkable insights regarding paleoecology, phylogeography, and extirpation and extinction events of keystone taxa such as mammoths (Haile et al., 2009;Boessenkool et al., 2012;Graham et al., 2016;Murchie et al., 2021b).Based on such ancient environmental DNA (aeDNA), a recent study proposed that the woolly mammoth (Mammuthus primigenius) may have survived in Eurasia for much longer than previously assumed, as the authors retrieved mammoth DNA sequences in sediment layers that were approximately 4.6-7 thousand years (kyr) younger than the most recent mammoth fossils Wang et al., 2021;however, in response to this interpretation, Miller and Simpson, 2022 opined that these results may be more likely due to taphonomic processes leading to release of aeDNA from the remains of long-dead organisms from permafrost, where it is well preserved.
Conclusions derived from aeDNA isolated from sediment cores rely on the stratified nature of the remains in question and dating of the respective layer by radiometric methods.However, in theory, various physical and geochemical processes such as translocation of DNA through sediment strata (Haile et al., 2007), re-deposition of older sediment carrying DNA of extinct organisms (Arnold et al., 2011), and preservation bias (Boere et al., 2011) can distort the biological signal of aeDNA and thus bias the accuracy of allocation of taxa to specific time periods (Armbrecht et al., 2019).For lake sediments, deposited under aquatic conditions, studies have suggested that leaching is not a concern (Parducci et al., 2017), but it has been observed in soils and cave sediments (Haile et al., 2007).The question of obtaining last appearance dates of extinct taxa using aeDNA in dated sediment layers (Haile et al., 2009) is under discussion, but the surprisingly young records published so far still date to multiple thousands of years before present and thus lie within a timeframe of possible late survival.

Results and discussion
In 2019, we retrieved short subsurface sediment cores from two Arctic thermokarst lakes (LK-001 and LK-007, located approximately 5 km apart, over massive permafrost; Table 1; Dvornikov et al., 2016) on the Yamal peninsula, Siberia, to extract DNA and assess changes in mammal abundances in the Arctic over the past decades and centuries.From lake LK-001, we collected a secondary core which was sliced in the field at 1 cm steps for Pb 210 radiometric dating, which indicated that the sediments at the top of this core were deposited recently, and that the core spanned the past few centuries (Appendix 1-table 7).The cores for DNA extraction were closed in the field immediately after retrieval and were then transported to the dedicated aDNA laboratories of the University of Konstanz, Germany.In this lab facility, established in 2020, no other samples from the Arctic or from any large mammals had been processed previously.From core LK-001, we isolated DNA and produced genomic double-stranded libraries from 23 samples, from 1.5 to 80 cm core depth (Supplement section 1), according to standard procedures.The core was opened and all subsequent steps until index PCR setup were carried out under customary aDNA laboratory conditions.In particular, the core opening facilities and the lab are located in buildings separated from the downstream molecular genetic analyses, the ventilation of the aDNA lab is based on a HEPA filter system and positive air pressure, and the lab is subjected to nightly UV radiation.Work in the lab is conducted under strict aDNA precautions, adhering to established aDNA protocols (Fulton and Shapiro, 2019).We enriched the libraries for mammalian DNA using a custom RNA bait panel produced from complete mitogenome sequences of 17 mammal species that currently or previously occurred in the Arctic (adapted from Murchie et al., 2021a).The enriched libraries were sequenced, and we mapped the sequences against a database of 73 mammal mitogenomes, followed by BLASTn alignment against the complete NCBI nt database.We thus retrieved mitogenomic sequences of mammals that were expected during the age range covered by the core (Appendix 1-table 9), for example, reindeer (Rangifer tarandus), Arctic lemming (Dicrostonyx torquatus), and hare (Lepus); however, throughout the entire core, there were abundant sequences of two species that have been extinct for several thousand years, that is mammoth (Mammuthus primigenius) and woolly rhinoceros (Coelodonta antiquitatis).Twenty-two of the 23 LK-001 libraries produced >1000 reads, each, assigned to Mammuthus, with read counts ranging from 2852 to 72,919 (mean 21,140±17,296).Negative controls (extraction and library blanks) did not produce any reads assigned to mammals.In the sample with the highest M. primigenius read counts (31.5 cm depth, dated to 81 years), the coverage of the reference mitogenome (NCBI accession NC_007596.2) was 95.3%, (434 (±213)-fold).Across all samples, 465,080 reads assigned to mammoth were produced, with 98.3% coverage (2762 to ±1176 fold).Read lengths ranged from 28 to 289 bp (mean 100±44 bp; Figure 1).The number of woolly mammoth reads decreased from lower samples towards the top of the core (Figure 1).Signatures of post-mortem DNA decay were comparably minor (Figure 1), with reference to an M. primigenius genome downloaded from NCBI (accession NC_007596.2),and mapping suggested that the sequences throughout the core originated from multiple individuals.Further analyses of the three libraries with the most mammoth reads   6).The bar chart indicates a maximum-likelihood estimate of the haplogroup proportions derived from the reads from the three sediment core libraries with the most mammoth reads.
using mixemt (Vohr et al., 2017) identified a number of mitochondrial haplogroups in the sequences from the core, suggesting that they originated from a multiple individuals (Figure 2).The haplogroups identified were known to occupy the region, and it seems likely the sequences reflect a history of mammoth occupation at the core site.Twelve of the 23 libraries produced >100 reads, each, assigned to woolly rhinoceros, with a total of 2737 reads and a cumulative coverage of 44% (assembled to NC_012681.1 2).Further analyses are mostly focused on the mammoth sequences as these occurred in substantially higher numbers.As these results were entirely incongruent with the temporal occurrence of these two species, we employed several methods to confirm their validity, that is conventional PCR and Sanger sequencing of a mammoth cytochrome c oxidase subunit I (COI) fragment, mammal metabarcoding, and droplet digital PCR (ddPCR) of a mammoth cytochrome b fragment.We performed these analyses on core LK-001 and, to evaluate whether this is a locally isolated phenomenon, on a second short core, LK-007, from a nearby lake, which had been opened and processed under the same conditions indicated above to prevent contamination.Additionally, we screened the core LK-001 for plant macrofossils, of which three were sent for 14 C AMS dating.
Conventional PCR and Sanger sequencing confirmed amplification of a COI fragment of M. primigenius.Mammal metabarcoding produced mammoth sequences (74 bp) in 13 samples of core LK-001 and 9 samples of core LK-007.In the LK-001 core, the highest M. primigenius read count occurred at 31.5 cm (2,992 reads), and in the LK-007 core at 26 cm M. primigenius (3,580 reads).ddPCR produced M. primigenius sequences in 14 samples of each core.Metabarcoding and ddPCR patterns across the cores were similar, although not completely congruent, as ddPCR appeared to be more sensitive (Appendix 1-figure 1).The dates retrieved by radiocarbon dating were not congruent with the initial age inference suggested by Pb 210 .While the lowest and topmost sample (with ages of 1547±228 and 339±79 uncal.yrs BP respectively) suggest relatively young ages and agree in their temporal succession, the middle sample, at 51 cm, yielded a radiocarbon age of 8677±132 years.
The mammoth was abundant throughout most of Eurasia during the Pleistocene, but populations declined at the end of the Pleistocene, with the species going extinct in the mid-Holocene (Nogués-Bravo et al., 2008).The youngest fossils from mainland Siberia have been dated to 9650 years (Stuart et al., 2002).The woolly rhinoceros was a cold-adapted megaherbivore, which was abundant from western Europe to north-east Siberia during the Middle to Late Pleistocene (Kahlke and Lacombat, 2008).This species was predominantly grazing, probably resorting to browsing only due to seasonal vegetation restrictions (Kahlke and Lacombat, 2008;Rey-Iglesia et al., 2021;Stefaniak et al., 2021).The reasons underlying the extinction of this species at ca 14 ka BP are not entirely clear, but it is largely attributed to sudden climate warming and subsequent habitat unsuitability due to vegetation changes (Stuart and Lister, 2012;Lord et al., 2020;Wang et al., 2021), likely coupled with human influence (Fordham et al., 2022), in the Bølling-Allerød interstadial warming.
The present data, which implies frequent and abundant Pleistocene megafaunal DNA throughout a sediment core deposited in the lake over the past centuries suggests that physical processes, rather than presence of live organisms, are responsible for the recovery of this DNA.While not in itself fully conclusive, our data suggests the source of the DNA of Pleistocene mammals from older permafrost deposits, either from a carcass or from sedimentary materials carrying the DNA.The numbers of mammoth sequences increased with depth downcore, with comparably low abundances over the most recent 11 cm of the core (aged <30 years), pointing to a decrease of input of this DNA through time.The apparently rather limited extent of aDNA damage in the mammoth sequences suggests that the source of this DNA has been preserved exceptionally well, which also suggests an origin from permafrost, and the specific dynamics of thawing and re-deposition of material in the study area offer an explanation.
Here, the active thermokarst likely began during the climatic optimum of the Holocene (9000-3,000 BC; Savvichev et al., 2021).The LK-001 and LK-007 lake basins are interbedded mainly into the IV th marine plain, formed in the Kazantsevskaya, Marine Isotopic Stage 5.The permafrost of these lake catchments was formed no earlier than 70-60 kyr BP after the Kazantsevskaya transgression in more sub-aerial conditions and with mean annual temperatures 6 to 7 °C lower than modern temperatures (Baulin et al., 1981).As this area was under coastal-marine conditions for a long period, these lake basins may be paleo-marine remnants, or they were formed later as a result of thermokarst over the segregated or tabular (massive) ground ice during the Holocene climatic optimum (Kachurin, 1961).
The area is also subject to abrupt permafrost thaw (thermo-denudation), resulting, for example, in the formation of retrogressive-thaw slumps (thermocirques) and the transport of a large amounts of thawed terrestrial material into the lake water (Dvornikov et al., 2018).Such abrupt permafrost thaw processes normally appear adjacent to lakes and can form specific geomorphological elements, that is, thermo-terraces (Kizyakov et al., 2006) within lake catchments and lakes; they are normally polycyclic processes appearing due to the extension of a seasonally thawed layer (active layer) up to the top of massive ice and more favorable thermal conditions within the existing forms (Leibman and Kizyakov, 2007;Kokelj et al., 2009).Traces of past thermo-denudation can be observed within both lake catchments.In catchments of five neighboring lakes, large retrogressive-thaw slumps appeared in recent years (2012)(2013) accompanied by the thaw and lateral transport of modern and Late Pleistocene deposits into lakes.Additionally or alternatively, thermo-erosion of upper geomorphological levels and transport via stream networks could transport ancient material into the modern lacustrine sediments.However, the two studied lakes are headwater lakes (with outlet, no apparent inlet) and this option can only be considered in terms of small thermo-erosional valleys within the catchments.
An alternative mechanism for the redistribution of Late Pleistocene material in the sediments is related to subcap methane emission (bubbling) from degrading permafrost beneath the lake bottom.In-lake bubbling can be observed in a circum-Arctic scale: in North-East Siberia, Alaska and Canada.This is common especially in lakes with a depth exceeding two meters, which do not freeze entirely up to the bottom in winter, leading to the formation of a talik (a layer of year-round unfrozen ground that lies in permafrost area).The expansion of the talik may further trigger subcap methane emission, which can reach 40-70 kg yr -1 of pure (94-100%) methane in neighboring lakes (Kazantsev et al., 2020).The constant methane seepage does not allow ice to be formed in winter (whereas the normal winter ice thickness is approximately 1.5 m) and can potentially disturb the stratigraphy of lake sediments.Additionally, dramatic emissions of methane can form craters in terrestrial and lacustrine environments (Dvornikov et al., 2019).In this case, Late Pleistocene sediments will well be re-distributed within the water-body and the entire stratigraphy will be mixed.
In the case reported here, the simultaneous finding of a Pb 210 chronology indicating recent sediment deposition and of plant macrofossils that dated to >8000 years BP in a sample from 36.5 cm, suggest lateral input of ancient material, including the mammalian DNA, putatively related to permafrost thawing processes.Numerous studies on aDNA discussed possible leaching through sedimentary strata of the DNA itself, yet it was typically considered not an issue as most of these studies were conducted under stable permafrost or similar conditions (e.g.Hansen et al., 2006;Haile et al., 2007, but see Andersen et al., 2012).Permafrost thawing and re-deposition of material adds a new dimension to this problem of temporal interpretation.The fact that we retrieved mammoth sequences from both cores of two lakes located approximately 5 km apart suggests that this is not an isolated phenomenon but occurs on a regional or even larger scale.Given the wide spread of abrupt permafrost thaw processes in the Arctic plains (West Siberia, Taimyr, Chukotka, Alaska, Canadian Arctic; Kizyakov et al., 2006 and references therein), the phenomena of disturbed stratigraphy of lacustrine sediments can potentially be observed at a pan-Arctic scale.While this indicates that temporal interpretation of sedimentary aeDNA records should be exercised with caution, our study also demonstrates that a careful evaluation of available information on the site and ecosystem in conjunction with the use of independent dating techniques can uncover incongruencies.This is more difficult in older time periods, where artefactual stratigraphies caused by equivalent processes acting for a limited time will not be detected as easily as in our case of long extinct species.The same applies to extant taxa or those which have undergone extinction or extirpation more recently, the presence of which cannot as easily be excluded as in the current example.However, we suggest that the inclusion of robust dating techniques and knowledge of local geophysical processes can provide good arguments to evaluate the reliability of aeDNA records.

Field sites, DNA isolation and hybridization capture enrichment
In 2019, sediment cores (6 cm diameter; Table 1) were retrieved from two lakes (LK-001 and LK-007, respectively, Table 1) on the Yamal peninsula, Siberia, using a UWITEC piston corer (UWITEC, Mondsee, Austria).The lakes were located approximately 5 km apart.The cores were transported to the aDNA laboratories of the University of Konstanz, Germany; from lake LK-001, a secondary core was taken which was sliced in the field at 1 cm steps for radiometric dating from 0 to 39.5 cm depth, performed at the Environmental Radioactivity Research Centre, University of Liverpool (Supplement section 9).Additional 14 C dating of three specimens of plant remains, extracted at 36.5 cm, 51 cm, and 74 cm, was performed (Supplement section 9).
Sedimentary DNA was isolated from 23 samples of core LK-001 and from 16 samples of core LK-007 using commercially available kits with modified protocols (Supplement section 1).The extracts of core LK-001 were subjected to library preparation for capture enrichment.Enrichment probes were designed from mitogenomes of 17 herbivorous mammal species that currently or previously occurred in the Arctic (Appendix 1-table 2) and few lichen sequences (Appendix 1-table 3).Genomic libraries were produced according to Li et al., 2013 with some modifications (Seeber et al., 2019;Seeber et al., 2023;Li et al., 2023; Supplement section 2).Filtered reads were mapped to mammalian mitogenomes, followed by BLASTn alignment against the complete NCBI nucleotide database and subsequent metagenomic analyses using MEGAN (Huson et al., 2016).Reads assigned to Mammuthus were mapped to a complete M. primigenius reference mitogenome (NCBI accession NC_007596.2);reads assigned to Coelodonta antiquitatis were mapped to the NCBI reference mitogenome NC_012681.1 2. The reads mapped to mammoth from the top three libraries were assigned to haplogroup using mixemt (https://github.com/svohr/mixemt,Copy archieved at Vohr, 2024) with a custom-made representative panel of 15 mammoth mitogenomes (Figure 2; Appendix 1-table 6).

Conventional PCR, mammal metabarcoding, and ddPCR
Based on the enriched fragments with the highest coverage, PCR primers specific to M. primigenius were designed using Geneious Prime 2022.1.1 (Kearse et al., 2012), that is mamm801 (5`-CCCA TGCA GGAG CTTC TGTA GA-3`) and mamm800r (5`-GACA TAGC TGGA GGTT TTAT GT-3`) to produce a 121 bp amplicon of the CO1 gene.The specific PCR conditions are described in Supplement section 6. Mammal metabarcoding PCRs were performed on 21 DNA extracts of core LK-001 and 27 samples of core LK-007, with eight independent replicates, each.Each batch of PCRs included one nontemplate control.Established metabarcoding primers were used (Giguet-Covex et al., 2014), and human blocking primers (Garcés-Pastor et al., 2021) were included.PCR conditions are described in the supplementary material.Sequencing was performed on an Illumina NovaSeq platform, with 2x150 reads.The raw data were processed as described in the supplement.We used the sample LK-001_66.5 of core LK-001 which had produced one of the highest Mammuthus read counts after enrichment.The specific target amplified by the primer pair were used to design a probe (5`-GGAT ACTC CTGC AAGG TGAA GTG-3`).With this probe, ddPCR was performed with 24 extracts of core LK-001 and with 30 extracts of core LK-007, with three replicates, each (Supplement section 8).
Peter Andreas Seeber: Employed by and owns stock of Thermo Fisher Scientific.Samuel H Vohr: Employed by Embark Veterinary, Inc.The other authors declare that no competing interests exist.Thereafter, libraries were pooled (four libraries/pool) and were purified using a MinElute PCR purification kit (Qiagen; elution: 2x20 µL).Concentrations were measured using a Qubit device (broad-range assay; Thermo Fisher Scientific).

Hybridization capture enrichment and sequencing
To design RNA oligonucleotide baits for hybridization capture, we compiled the mitochondrial genome sequences of 17 selected mammal species (comprising 282,168 nucleotides [nt]; Appendix 1-table 2) and ITS-1 and ITS-2 sequences of Cladonia rangiferina (8546 and 4750 nt, respectively; Appendix 1-table 3).The compiled mitochondrial sequences were submitted to Daicel Arbor Biosciences for custom design of 70 bp baits at threefold tiling, collapsed at 99% identity and 85% overlap, resulting in 9339 unique baits (747,120 nt).
Each purified library pool containing four libraries was subjected to one hybridization capture reaction, apart from libraries produced from extraction blanks (N=15), library blanks (N=15), and PCR non-template controls (N=30), all of which were pooled in one reaction due to the minute DNA concentrations (mostly below detection range).
Hybridization capture was performed according to the MYbaits protocol v. 5.00 (Daicel Arbor Biosciences) with the following modifications: a total amount of 150 ng baits per capture reaction was used, and library pools were incubated for hybridization with the baits at 58 °C for 24 hr.Postcapture, the enriched libraries were removed from the beads and were amplified similarly to the PCR protocol used for the indexing PCR and with primers IS5/IS6 (Illumina, San Diego, CA, USA) and 14 cycles.PCR products were then purified using the MinElute PCR Purification Kit (Qiagen), and the final library concentration was measured on an Agilent Bioanalyzer.For sequencing, the enriched libraries were pooled at equal concentrations, and two library pools (produced from 57 and 110 libraries, respectively) were sequenced on an Illumina NovaSeq platform using three Novaseq SP 2x150 flow cells.

Bioinformatics of capture enrichment data
Adapter sequences were removed and sequences were filtered using leeHom with the default ancient DNA settings (Renaud et al., 2014).Duplicates were removed, and filtered reads were mapped against a database of 73 mammal mitogenomes (Appendix 1-table 4) using BWA v. 0.7.17 (Li and Durbin, 2009) and were blastn-aligned against the complete NCBI Genbank nucleotide database (Benson et al., 2009;retrieved March 14 2022) with a maximum e-value of 0.01.All blast output files were processed using MEGAN Community Edition 6.19.8 using a weighted LCA algorithm (Huson et al., 2016).aDNA degradation was examined using mapDamage 2.2.1 (Ginolhac et al., 2011) against a reference genome of M. primigenius (NC_007596.2).The respective command lines are shown in Appendix 1-table 5.The reads blast-assigned to M. primigenius were mapped to the complete M. primigenius reference mitogenome to produce consensus sequences using Geneious Prime 2023.0.1 (Kearse et al., 2012).

Sediment dating
The dating sections of core LK-001 (Appendix 1-table 13) were dried to constant weight at 65 °C and were then analyzed for 210 Pb, 226 Ra, and 137 Cs by direct gamma assay at the Liverpool University Environmental Radioactivity Laboratory using Ortec HPGe GWL series well-type coaxial low background intrinsic germanium detectors.This allowed dating to a depth of 39.5 cm (Appendix 1table 13).
Appendix 1-table 13. 210 Pb chronology of Yamal lake sediment core LK-001.Core LK-001 was additionally dated using radiocarbon.We collected plant remains throughout the core by first using a scalpel to cut 1 cm thick of bulk sediment at desired depths, followed by washing the sediment chunks with distilled water and filtering using 190 µm mesh sieves.We then collected leaves of deciduous trees and seeds, which were placed in glass vials and dried under a fume hood before sending to the Micadas (Alfred Wegener Institute, Bremerhaven, Germany) for radiocarbon dating.Sample preparation and measurement methodologies at the Micadas were performed as described previously (Mollenhauer et al., 2021).We sent three samples that had sufficient plant materials for dating and retained two dates after removing an inaccurate date caused by high inbuilt age from plant remains (Appendix 1-table 14).

Mammal read numbers
The numbers of reads assigned to all mammals are indicated in Appendix 1-table 15 (core LK-001 only; hybridization capture experiment).

Figure 1 .
Figure 1.aDNA of Mammuthus in recent lake sediments.(A) Read counts assigned to Mammuthus (square-root-transformed proportion of the respective number of raw reads per library) after hybridization capture enrichment of aeDNA of core LK-001 (shown are results of 22 libraries; one library was excluded as it did not produce any reads assigned to mammals); square-root transformation of percentage.Indicated are sample depths (in cm; 1.5-80 cm) and approximate ages as per 210 Pb chronology (Appendix 1-table 7; to a maximum depth of 39.5 cm).The solid line indicates the general trend.Across the 22 libraries: (B) Fragment length distribution and (C) damage patterns (red indicates C-to-T transitions, blue G-to-A transitions.the Y-axis indicates the percentage of positions with a nucleotide change, the X-axis indicates the position along the fragment).

Figure 2 .
Figure 2. Locations of the sediment cores of the present study (Yamal peninsula, Siberia) and previously retrieved mammoth remains and their haplo(sub)groups (Appendix 1-table6).The bar chart indicates a maximum-likelihood estimate of the haplogroup proportions derived from the reads from the three sediment core libraries with the most mammoth reads.

Table 1 .
Sediment cores retrieved from two lakes on the Yamal peninsula, Siberia.

table 5 .
Mitogenome templates of 17 mammal species for design of RNA baits.

table 6 .
Cladonia rangiferina sequences for RNA bait design (shown are the NCBI accessions).