Releases: pr2database/pr2database
PR2 version 5.0.0
Contributors
- Daniel Vaulot - General curation, Diatoms
- Javier del Campo, Fabien Burki, Mahwash Jamy, Laure Guillou - Taxonomy - 9 levels update
- Luciana Santoferrara, Maximilian Ganser - Ciliates
- Luciana Santoferrara - Mixoplankton database
- Andrea de Oliveira da Rocha Franco - Diatoms (4 genera)
- Kenneth Mertens, Haifeng Gu, Se Hyeon Jang - Dinoflagellates
- Pavel Škaloud - Chrysophyceae
- Manon Dünn - Percolomonads
- Megan Gross - Microsporidia - Metchnikovellida
- Alexei Seliuk - Chytrids and Fungi_X
- Miguel Sandin - Radiolaria
- Sebastian Metz - Perkinsea
- Anna Maria Fiore-Donno - Myxogastria
- Richard Dorrell - 16S plastid
Minor updates
5.0.1 - 2023-05-15
Assets 5.0.0 have been updated without changing the version #
- Sequences removed: 6
- Sequences updated: 79 (mostly Burkholderia_sp. which were wrongly labelled)
Main changes
- Upgrade taxonomy from 8 levels to 9 levels
- Add link to EukRibo database (Berney et al. 2022)
- Add link to Mixoplankton Database (Mitra et al. 2023)
Major groups for which taxonomy has been updated
- Bacteria and Archaea
- New organelle sequences (plastids and mitochondria)
- Stramenopiles
- Diatoms
- Chrysophyceae
- Alveolates
- Ciliates
- Dinoflagellates
- Perkinsea
- Fungi
- Chytrids
- Amoebozoa
- Myxogastria
- New supergroup added: Provora
Sequences
- Added : 5,718
- Taxonomy Updated: 17,954
- Quarantined/Removed: 100
Detailed changes
Taxonomy structure
- We moved from 8 levels (kingdom to species) to 9 levels (domain to species) with a new level subdivision. The changes are explained here.
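As a rough illustration of the 9-level structure, the sketch below splits a semicolon-delimited taxonomy string into named ranks. The rank order between domain and species is an assumption (only "domain", "species" and the new "subdivision" level are named in these notes), and the lineage string is a hypothetical example, not a record copied from the database.

```python
# Assumed rank order for the 9-level scheme; "subdivision" is the new level.
RANKS_9 = ["domain", "supergroup", "division", "subdivision",
           "class", "order", "family", "genus", "species"]


def split_taxonomy(taxo_string, sep=";"):
    """Map a delimited taxonomy string onto the 9 assumed rank names."""
    names = [name for name in taxo_string.strip().split(sep) if name]
    if len(names) != len(RANKS_9):
        raise ValueError(f"expected {len(RANKS_9)} levels, got {len(names)}")
    return dict(zip(RANKS_9, names))


# Hypothetical lineage, for illustration only.
example = ("Eukaryota;Archaeplastida;Chlorophyta;Chlorophyta_X;"
           "Mamiellophyceae;Mamiellales;Mamiellaceae;"
           "Micromonas;Micromonas_pusilla;")
record = split_taxonomy(example)
print(record["subdivision"])
```

A string with 8 levels (the previous scheme) raises a ValueError here, which can help catch files that mix the two taxonomy versions.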
Taxonomic groups updated
Obazoa
- Fungi
- Microsporidia - Metchnikovellida:
- Add 6 sequences
- Chytrids
- Taxonomy completely revised, following in particular Tedersoo et al.: 191 taxa edited
- Add 34 sequences
TSAR
- Alveolata
- Genus Alphamonas corrected from Aphamonas
- Dinoflagellates
- Suessiaceae, Borghiellaceae
- Gonyaulacales
- Kareniaceae, Warnowiaceae
- Ciliates
- Spirotrichea updated to follow more closely EukRef annotations. In particular:
- added new sequences, made some corrections, and updated names (sensu Adl et al. 2019).
- removed these artificial groups:
- Leegardiellidae_A and _B: replaced by Leegardiellidae
- Strobilidiidae_A to J: replaced by Strobilidiidae
- Strombidiidae_A to R: replaced by Strombidiidae
- Tontoniidae_A and B: replaced by Tontoniidae
- Strombidiida_A to H: replaced by Strombidiida
- For tintinnids, we included both the order and suborder (Choreotrichida-Tintinnina) in the order column (best compromise, hopefully acceptable)
- Taxonomy of following families updated
- Discotrichidae
- Plagiocampidae
- Urotrichidae
- Protocruziidae
- Perkinsea
- 10 sequences removed
- 19 sequences reassigned
- 614 new sequences added
- Stramenopiles
- Olisthodiscophyceae: new class added, created in Barcytė et al. (2021)
- Pseudochattonella: spelling corrected.
- Diatom taxonomy has been updated with the three recognized classes following Algaebase
- Bacillariophyceae
- Coscinodiscophyceae
- Mediophyceae
- Diatomea_X (class) is used for taxa that are not assigned to one of these three classes (e.g. Phaeodactylum)
- Diatom genera Anaulus, Asterionellopsis, Ceratanaulus, Eunotogramma, Plagiogramma updated, plus 2 new sequences assigned
- Chrysophyceae taxonomy completely revised, following in particular Scoble and Cavalier-Smith 2014 and Charvet et al. 2012.
- Sequences updated: 1577
- Sequences added: 160
- Taxa updated: 428
- Taxa added: 46
- Rhizaria
- Radiolaria
- Spumellaria: 67 new sequences annotated
Archaeplastida
- Picozoa
- Are now classified within Archaeplastida.
Excavata
- 14 Percolomonads sequences added
Amoebozoa
- Myxogastria: taxonomy updated
Provora
- New supergroup added
Organelles
- 16S plastid: 252 new sequences added (some are shorter than 500 bp)
- 16S mitochondrion: 1818 sequences from original PR2 database
- 18S nucleomorph: 12 sequences from original PR2 database
Bacteria, Archaea
- Supergroup added
- Cyanobacteria: supergroup replaced by Bacteria_X (previously Terrabacteria)
Link PR2 to other databases
EukRibo database version 2.0
See Berney et al. 2022.
- Database is available from Zenodo: https://doi.org/10.5281/zenodo.6327890
- Add sequences that are not present in PR2: 510
- Add taxa that are not present in PR2: 938
- Update sequences taxonomy for all sequences that had no species assigned: 1257
- Fields from EukRibo added to PR2 metadata for 48,136 sequences
- eukribo_UniEuk_taxonomy_string: taxonomy annotation from EukRibo (number of levels is variable)
- eukribo_V4: whether the sequence contains the V4 region
- eukribo_V9: whether the sequence contains the V9 region
- See tutorial: EukRibo database
Mixoplankton database
See Mitra et al. 2023 (https://doi.org/10.1111/jeu.12972) and database at DOI 10.5281/zenodo.7560582
- Added one column, mixoplankton, to the metadata. Use filter(!is.na(mixoplankton)) to select mixotrophic species.
- See tutorial: Mixoplankton database
WoRMS database
WoRMS database is an authoritative and comprehensive list of names of marine organisms, including information on synonymy. The content of WoRMS is controlled by taxonomic and thematic experts, not by database managers. For species in PR2 that have an entry in WoRMS, we have now added a link to the AphiaID (worms_id field) as well as information on the distribution of the species (marine, brackish, freshwater, terrestrial).
Sequences uploaded but not yet annotated
- 35,884 18S rRNA sequences added from GenBank - 2021-03-23 to 2023-02-20
- Only 17,504 of these 35,884 pass the current criteria for PR2 (length >= 500 bp, etc.)
Sequences annotated automatically
- 19,405 18S rRNA sequences from GenBank originating from strains corresponding to 2,279 new species in taxonomy table
Sequences removed
- Chimeras: 1259 from initial version of PR2
- AY745555.1.1854_U, AY745597.1.1844_U, EF209781.1.1956_U, EF209774.1.1835_U, EF209794.1.1834_U, which no longer exist in GenBank.
Database structure changes
- pr2_taxonomy and pr2_main now have only the 9-level taxonomy (domain -> species)
- New table: eukribo_v2 from https://zenodo.org/record/6896896
R package
- New function added
- pr2_taxonomy(): Master taxonomy table
- A few bugs fixed
- 2 tutorials added
Scripts
These scripts are just to show some of the procedures used to update the PR2 database. Do not try to run them, they will not work as they require access to the MySQL PR2 database.
References
Taxonomy structure
- Burki, Fabien, Andrew J. Roger, Matthew W. Brown, and Alastair G.B. Simpson. 2020. "The New Tree of Eukaryotes". Trends in Ecology and Evolution 35 (1): 43–55. https://doi.org/10.1016/j.tree.2019.08.008.
Linked databases
- Berney, Cédric, Nicolas Henry, Frédéric Mahé, Daniel J. Richter, and Colomban de Vargas. 2022. "EukRibo: A Manually Curated Eukaryotic 18S rDNA Reference Database to Facilitate Identification of New Diversity". Preprint. bioRxiv. https://doi.org/10.1101/2022.11.03.515105.
- Mitra, Aditee, David A. Caron, Emile Faure, Kevin J. Flynn, Suzana Gonçalves Leles, Per J. Hansen, George B. McManus, et al. 2023. "The Mixoplankton Database – Diversity of Photo-Phago-Trophic Plankton in Form, Function and Distribution across the Global Ocean". Journal of Eukaryotic Microbiology: e12972. https://doi.org/10.1111/jeu.12972.
Excavata - Percolomonads
- Hohlfeld, Manon, Claudia Meyer, Alexandra Schoenle, Frank Nitsche, and Hartmut Arndt. 2023. "Biogeography, Autecology, and Phylogeny of Percolomonads Based on Newly Described Species". Journal of Eukaryotic Microbiology 70 (1): e12930. https://doi.org/10.1111/jeu.12930.
TSAR - Stramenopiles
- Barcytė, D., E...
PR2 version 4.14.1
New web interface at https://app.pr2-database.org
The database itself is unchanged. Please refer to version 4.14.0 for database text files.
PR2 version 4.14.0
Main changes
A single SSU database
From version 4.14.0, a single SSU database is provided which contains sequences for:
- 18S rRNA from nuclear and nucleomorph
- 16S rRNA from plastid, apicoplast, chromatophore, mitochondrion
- 16S rRNA from a small selection of bacteria
The rationale is that the database can now be used to detect bacterial sequences that are amplified with either 18S rRNA or "universal" primers. These sequences can be further assigned with Silva or GTDB.
To allow correct assignment of organelle sequences with software such as DECIPHER (IdTaxa), a short suffix identifying the organelle is appended to the taxonomy:
Organelle | Taxonomy suffix |
---|---|
nucleus | |
nucleomorph | :nucl |
plastid | :plas |
apicoplast | :apic |
chromatophore | :chrom |
mitochondrion | :mito |
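The suffix convention in the table can be sketched as follows. This is a minimal illustration that assumes the suffix is appended to every level name of a lineage; the helper names are hypothetical and are not part of any PR2 tool.

```python
# Suffixes copied from the table; nuclear sequences get no suffix.
ORGANELLE_SUFFIX = {
    "nucleus": "",
    "nucleomorph": ":nucl",
    "plastid": ":plas",
    "apicoplast": ":apic",
    "chromatophore": ":chrom",
    "mitochondrion": ":mito",
}


def add_suffix(taxa, organelle):
    """Append the organelle suffix to each level name (assumed convention)."""
    suffix = ORGANELLE_SUFFIX[organelle]
    return [name + suffix for name in taxa]


def strip_suffix(name):
    """Remove a trailing organelle suffix, if any, from one level name."""
    for suffix in ORGANELLE_SUFFIX.values():
        if suffix and name.endswith(suffix):
            return name[: -len(suffix)]
    return name


lineage = ["Eukaryota", "Archaeplastida", "Chlorophyta"]
print(add_suffix(lineage, "plastid"))
print([strip_suffix(n) for n in add_suffix(lineage, "plastid")])
```

Stripping the suffix after assignment makes it possible to compare organelle and nuclear assignments of the same organism on a common set of names.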
New file provided
- pr2_version_4.14.0_SSU.decipher.trained.rds: a DECIPHER training set that can be used with the IdTaxa function from the R DECIPHER package. This file should be read with the readRDS function.
Major groups for which taxonomy has been updated
- Apicomplexa
- Labyrinthulids
- Radiolaria
- Foraminifera
Quarantined sequences (makes sense in these COVID times...)
We are introducing quarantined sequences. These sequences have been reassigned with DECIPHER IdTaxa, but the bootstrap values were low or they were flagged as problematic by DECIPHER during the LearnTaxa phase. These sequences are not provided with the current version but will be added in the future after verification of their taxonomic assignment.
List of sequences added or updated
- Added: 9,710
- Updated: 25,298
- Quarantined: 614
- Removed: 462
Contributors
- Daniel Vaulot - General curation, Bacteria and Archaea
- Alex Schoenle - Cafeteria
- Miguel Sandin - Radiolaria
- Raphaël Morard - Forams
- Frédéric Mahé - Detecting sequences reverse-complemented or complemented
- Anna Maria Fiore-Donno - Detecting chimeras, ITS and badly assigned sequences
- Javier del Campo and EukRef team - Excavata, 16S plastid, Apicomplexa, Labyrinthulids
Taxonomic groups updated
- Alveolata - Javier del Campo
- Apicomplexa
- 9955 sequences updated or added.
- 303 sequences quarantined needing phylogeny assignment.
- 583 taxonomy entries revised
- Chlorophyta
- Ostreobium: 2 sequences added
- Stramenopiles
- Labyrinthulids - Javier del Campo
- Sequences updated or added: 1280
- Sequences quarantined: 133
- Taxonomy fully revised: 69 species
- Cafeteria - Alex Schoenle following Schoenle et al. (2020)
- sequences updated: 30
- sequences added: 31
- script
- Cafileria marina: 8 sequences added
- Haptophyta
- Rappephyceae - Kawachi et al. (2021)
- Rappemonads moved into Rappephyceae
- 4 sequences added
- Radiolaria - Miguel Sandin
- Total number of valid sequences: 4471
- Taxa updated or added: 232 entries
- Sequences added: 66
- Sequences updated (including new sequences): 4619
- Sequences annotated as chimera: 343
- Process and scripts to build phylogeny from Miguel
- script to upload to PR2
- Foraminifera - Raphaël Morard
- Total number of validated sequences: 3839
- Taxonomy updated or added: 315 entries
- Sequences added: 1149
- Sequences updated (including new sequences): 2164
- script to upload to PR2
- Excavata - Javier del Campo and EukRef team
- EukRef team: Martin Kolisko, Olga Flegontova, Anna Karnkowska, Gordon Lax, Julia M. Maritz, Tomáš Pánek, Petr Táborský, Jane M. Carlton, Ivan Cepička, Aleš Horák, Julius Lukeš, Alastair G.B. Simpson, and Vera Tai
- Total number of validated sequences: 6265
- Taxa updated or added: 735
- Sequences added from GenBank: 75
- Sequences updated (existing + new): 1347 + 2875
- Sequences quarantined: 104
- Metadata updated with eukref fields: 6091
- 16S plastid sequences (Ostreobium and Apicomplexa) - Javier del Campo
- 87 sequences reassigned
- 482 sequences added
- Bacteria, Archaea - Daniel Vaulot
- Sequences added: 7945
- Taxa added: 1571
- These sequences originate from Silva seed alignment v. 132 as found on the mothur site
- They are used as "control" sequences when assigning metabarcodes, especially for primers that are either "universal", i.e. amplify both 18S and 16S or that are "imperfect", in the sense that they also amplify a small fraction of the 16S sequences.
Sequences uploaded but not yet annotated
- 8763 18S rRNA sequences added from GenBank - 2020-05-27 to 2021-03-23 - Script
Sequences removed
- Potential chimera in Radiolaria: 343 (M. Sandin)
- Bad sequences: 6 (F. Mahé)
- chimeras: 95 (A M Fiore-Donno)
- ITS: 20 (A M Fiore-Donno)
- Badly assigned: 6 (A M Fiore-Donno)
Sequences modified (F. Mahé)
- complemented: 26
- reverse complemented: 114 + 189
Script
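For readers unfamiliar with the two corrections listed above, here is a minimal sketch of complementing and reverse-complementing a sequence, including IUPAC ambiguity codes. It only shows the transformation itself; how mis-oriented sequences were detected in PR2 is not described in these notes, and the function names are illustrative.

```python
# Complement table for IUPAC nucleotide codes (upper and lower case).
_IUPAC = "ACGTURYSWKMBDHVN"
_COMP = "TGCAAYRSWMKVHDBN"
COMPLEMENT = str.maketrans(_IUPAC + _IUPAC.lower(), _COMP + _COMP.lower())


def complement(seq):
    """Return the complement, keeping the original reading direction."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Return the reverse complement (the opposite strand read 5' to 3')."""
    return complement(seq)[::-1]


print(complement("ATGCN"))
print(reverse_complement("ATGCN"))
```

Note that U is complemented to A, so applying the function twice to an RNA-style sequence returns T rather than U.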
Metadata added
- A large amount of metadata has been downloaded from GenBank, such as GenBank taxonomy and references associated with sequences.
Database structure
- pr2_main
- quarantined_version: sequences flagged as quarantined will need to be re-assigned later.
- pr2_metadata
- gb_references: removed (empty)
- gb_locus: removed (empty)
- gb_division: added - Three-letter code for GenBank division (e.g. PLN, ENV...)
Metadata added
The following fields were populated from GenBank when the data were missing (413,230 records updated)
- gb_taxonomy
- gb_project
- gb_authors, gb_publication, gb_journal
- gb_sequence
- gb_division
- gb_date
Scripts
Scripts are just provided to show some of the procedures used to update the PR2 database. Do not try to run them, they will not work as they require access to the MySQL PR2 database.
References
Stramenopiles
- Schoenle, A., Hohlfeld, M., Rosse, M., Filz, P., Wylezich, C., Nitsche, F., & Arndt, H. (2020). Global comparison of bicosoecid Cafeteria-like flagellates from the deep ocean and surface waters, with reorganization of the family Cafeteriaceae. European Journal of Protistology, 73, 125665. https://doi.org/10.1016/j.ejop.2019.125665.
- Jirsová, D., Füssy, Z., Richtová, J., Gruber, A., & Oborník, M. (2019). Morphology, ultrastructure, and mitochondrial genome of the marine non-photosynthetic bicosoecid Cafileria marina gen. et sp. nov. Microorganisms, 7(8), 240. https://doi.org/10.3390/microorganisms7080240
- Pan, J., del Campo, J., & Keeling, P. J. (2017). Reference Tree and Environmental Sequence Diversity of Labyrinthulomycetes. Journal of Eukaryotic Microbiology, 64(1), 88–96. https://doi.org/10.1111/jeu.12342
Haptophyta
- Kawachi, M., Nakayama, T., Kayama, M., Nomura, M., Miyashita, H., Bojo, O., Rhodes, L., Sym, S., Pienaar, R. N., Probert, I., Inouye, I., & Kamikawa, R. (2021). Rappemonads are haptophyte phytoplankton. Current Biology. https://doi.org/10.1016/j.cub.2021.03.012
Radiolaria
- Adl, S. M., Bass, D., Lane, C. E., Lukeš, J., Schoch, C. L., Smirnov, A., et al. 2019. Revisions to the classification, nomenclature, and diversity of eukaryotes. J. Eukaryot. Microbiol. 66, 4–119. doi:10.1111/jeu.12691
- Biard, T., Bigeard, E., Audic, S., Poulain, J., Stemmann, L., Not, F., 2017. Biogeography and diversity of Collodaria (Radiolaria) in the global ocean. Nat. Publ. Gr. 1–42. doi:10.1038/ismej.2017.12
- Cavalier-Smith, T., Chao, E.E., Lewis, R., 2018. Multigene phylogeny and cell evolution of chromist infrakingdom Rhizaria: contrasting cell organisation of sister phyla Cercozoa and Retaria. Protoplasma 255, 1517–1574. doi:10.1007/s00709-018-1241-1
- Capella-Gutiérrez, S., Silla-Martínez, J.M., Gabaldón, T., 2009. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973. doi:10.1093/bioinformatics/btp348
- Decelle, J., Suzuki, N., Mahé, F., Vargas, C. De, Not, F., 2012b. Molecular Phylogeny and Morphological Evolution of the Acantharea (Radiolaria). Protist 163, 435–450. doi:10.1016/j.protis.2011.10.002
- Gouy, M., Guindon, S., Gascuel, O., 2010. SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building. Mol. Biol. Evol. 27, 221–224. doi:10.1093/molbev/msp259
- Katoh, K., Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780...
PR2 version 4.13.0
Summary
List of sequences added or updated
- Added: 2966
- Updated: 933
- Removed: 3817
Contributors
- Daniel Vaulot - General curation
- Javier del Campo - Suessiales, taxonomy update
- Laure Arsenieff - Thalassiosirales
- Ana Maria Cabello - Pelagophyceae
Taxonomy curated
- Alveolata
- Dinophyceae - Suessiales curated by J. del Campo following Janouškovec et al. (2017) and LaJeunesse et al. (2018)
- sequences updated: 498
- sequences added: 15
- script - Suessiales
- Stramenopiles
- Diatoms - Thalassiosirales - L. Arsenieff following Arsenieff et al. (2020)
- sequences updated: 12
- sequences added: 17
- script - Thalassiosirales
- Pelagophyceae - A. M. Cabello - definition of new environmental clades
- sequences updated: 30
- Pelagophyceae, Sarcinochrysidales - From Han et al. 2018.
- sequences added: 14
- Chrysophyceae - From Andersen et al. 2017
- sequences added: 14
- sequences updated: 144
- script - Chrysophyceae
- Chlorophyta
- Pyramimonadales replaced by Pyramimonadophyceae following Daugbjerg et al. (2019)
- New division Prasinodermophyta and new Class Prasinodermophyceae following Li et al. (2020)
Sequences added to PR2
- 1,129 18S sequences from the Roscoff Culture Collection (script - Cultures)
- 1,824 18S sequences from Silva version 138 and Genbank annotated based on hash value of sequences
Sequences uploaded but not yet annotated
- 7,032 18S rRNA sequences added from GenBank - 2018-11 to 2020-05 - script
- 333,247 18S rRNA sequences from Silva version 138 (2019-12)
Metadata
- 1,404 missing entries added (mostly genomes and metagenomes)
- 165,769 entries for which the Silva version 138 taxonomy has been added (silva_taxonomy) - Script for Silva addition
Sequences removed
- 3817 sequences have been removed from the database
- potential chimera
- bad sequences
- sequences containing at least 2 consecutive Ns (e.g. ...ATTNNGC..)
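The last criterion above (at least 2 consecutive Ns) is easy to express as a regular expression. The sketch below is an illustration only: the function names and the toy records are hypothetical, and the chimera and "bad sequence" criteria are not covered.

```python
import re

# Two or more consecutive N characters, in either case.
N_RUN = re.compile(r"N{2,}", re.IGNORECASE)


def has_consecutive_ns(seq):
    """True if the sequence contains a run of at least 2 Ns."""
    return N_RUN.search(seq) is not None


def drop_ambiguous(records):
    """Keep only the records whose sequence has no run of 2 or more Ns."""
    return {acc: seq for acc, seq in records.items()
            if not has_consecutive_ns(seq)}


# Toy records with made-up identifiers, for illustration only.
toy = {"seq_a": "ATTNNGC", "seq_b": "ATTNGCNA", "seq_c": "ATTGC"}
print(sorted(drop_ambiguous(toy)))
```

Isolated Ns (as in seq_b) are kept; only runs of two or more trigger removal.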
References
- Daugbjerg N., Fassel NMD., Moestrup Ø. 2019. Microscopy and phylogeny of Pyramimonas tatianae sp. nov. (Pyramimonadales, Chlorophyta), a scaly quadriflagellate from Golden Horn Bay (eastern Russia) and formal description of Pyramimonadophyceae classis nova. European Journal of Phycology 0:1–15. DOI: 10.1080/09670262.2019.1638524
- Janouškovec, Jan, Gregory S. Gavelis, Fabien Burki, Donna Dinh, Tsvetan R. Bachvaroff, Sebastian G. Gornik, Kelley J. Bright, et al. 2017. Major Transitions in Dinoflagellate Evolution Unveiled by Phylotranscriptomics. Proceedings of the National Academy of Sciences 114 (2): E171–80. https://doi.org/10.1073/pnas.1614842114.
- LaJeunesse, Todd C., John Everett Parkinson, Paul W. Gabrielson, Hae Jin Jeong, James Davis Reimer, Christian R. Voolstra, and Scott R. Santos. 2018. Systematic Revision of Symbiodiniaceae Highlights the Antiquity and Diversity of Coral Endosymbionts. Current Biology 28 (16): 2570-2580.e6. https://doi.org/10.1016/j.cub.2018.07.008.
- Arsenieff L., Le Gall F., Rigaut-Jalabert F., Mahé F., Sarno D., Gouhier L., Baudoux A-C., Simon N. 2020. Diversity and dynamics of relevant nanoplanktonic diatoms in the Western English Channel. The ISME Journal. DOI: 10.1038/s41396-020-0659-6.
- Han KY., Graf L., Reyes CP., Melkonian B., Andersen RA., Yoon HS., Melkonian M. 2018. A Re-investigation of Sarcinochrysis marina (Sarcinochrysidales, Pelagophyceae) from its Type Locality and the Descriptions of Arachnochrysis, Pelagospilus, Sargassococcus and Sungminbooa genera nov. Protist 169:79–106. DOI: 10.1016/j.protis.2017.12.004.
- Andersen RA., Graf L., Malakhov Y., Yoon HS. 2017. Rediscovery of the Ochromonas type species Ochromonas triangulata (Chrysophyceae) from its type locality (Lake Veysove, Donetsk region, Ukraine). Phycologia 56:591–604. DOI: 10.2216/17-15.1.
- Li L., Wang S., Wang H., Sahu SK., Marin B., Li H., Xu Y., Liang H., Li Z., Cheng S., Reder T., Çebi Z., Wittek S., Petersen M., Melkonian B., Du H., Yang H., Wang J., Wong GK., Xu X., Liu X., Van de Peer Y., Melkonian M., Liu H. 2020. The genome of Prasinoderma coloniale unveils the existence of a third phylum within green plants. Nature Ecology & Evolution. DOI: 10.1038/s41559-020-1221-7.
Database structure
- Table pr2_metadata - add fields
- pr2_depth: depth of sample in meters
- gb_id: Genbank ID number (big integer)
- gb_project_id: Genbank project ID for metagenomes
- gb_sequence - original gb_sequence (longtext)
- Table pr2_metadata - remove fields and move to list_countries table
- pr2_continent
- pr2_country_geocode
- pr2_country_lon
- pr2_country_lat
- New Tables (for internal use only)
- list_countries - Table with information on each country
- pr2_country
- pr2_continent
- pr2_country_geocode
- pr2_country_lon
- pr2_country_lat
- pr2_assign_bayes - Contains assignment of uncurated sequences using dada2::assignTaxonomy against PR2 4.12.0
- pr2_assign_silva - Contains assignment of uncurated sequences from Silva version 138
Scripts
Scripts (see links above) are just provided to show some of the procedures used to update the PR2 database. Do not try to run them, they will not work as they require access to the MySQL PR2 database.
Files provided
- For this version we do not provide the SQLite format. It will be provided again for release 5.0.0
- A version of the database compatible with the DECIPHER R package is available here
- Files also available on Zenodo
PR2 version 4.12.0
Date : 2019-08-08 (updated 2019-08-17)
Contributors
- Javier del Campo - U of Miami - Apicomplexa
- Ramon Massana - ICM, Barcelona - Stramenopiles
- Daniel Vaulot - CNRS Roscoff - Metadata geolocalisation - based on an idea of Margaret Brisbin
- Johan Decelle and Daniel Vaulot - CNRS Grenoble and Roscoff - Plastid sequences (PhytoRef database)
- Chetan Gaonkar - Naples - Chaetoceros - with help from B. Edvardsen
Database structure
- Table pr2_main - add fields
- gene - 18S_RNA, 16S_RNA
- organelle - nucleus, plastid, mitochondria, nucleomorph, apicoplast (left empty for cyanobacteria)
- Table pr2_metadata - add or modify fields
- gb_organelle - import the corresponding gb field
- pr2_sequence_origin - add other possibilities such as genome and metagenome
- pr2_continent, pr2_country, pr2_country_lat, pr2_country_lon - geographical origin extracted from gb_country field
- pr2_location, pr2_location_lat, pr2_location_lon - geographical origin extracted from gb_country field.
- pr2_ocean, pr2_sea, pr2_sea_lat, pr2_sea_lon - extracted from gb_country field and gb_isolation_source
- Table pr2_sequences - add fields
- sequence_hash - hash value of the sequence (using R function digest::sha1, see https://cran.r-project.org/web/packages/openssl/vignettes/crypto_hashing.html)
- Table pr2_taxonomy - add fields
- taxon_trophic_mode - detailed trophic mode (e.g. "C-fixation constitutive; Mixotroph")
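The idea behind the sequence_hash field, identifying identical sequences by a digest rather than by comparing full strings, can be sketched with Python's standard hashlib module. This is an illustration of the principle only: the upper-casing step is an assumption, and the digests produced here should not be expected to match the values stored in PR2, which were computed in R with digest::sha1.

```python
import hashlib


def sequence_hash(seq):
    """SHA-1 hex digest of a sequence after trimming and upper-casing.

    The normalisation is an assumption made for this sketch; it is not
    taken from the PR2 scripts.
    """
    normalized = seq.strip().upper()
    return hashlib.sha1(normalized.encode("ascii")).hexdigest()


def group_identical(records):
    """Group accession ids that share exactly the same sequence."""
    groups = {}
    for acc, seq in records.items():
        groups.setdefault(sequence_hash(seq), []).append(acc)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Toy records with made-up identifiers, for illustration only.
toy_records = {"id_1": "ACGTACGT", "id_2": "acgtacgt\n", "id_3": "ACGTACGA"}
print(group_identical(toy_records))
```

Comparing digests in this way allows sequences from two sources to be matched without relying on their accession numbers.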
Clean up
- 1692 sequences that had more than 2 consecutive "NN" have been removed
Files provided
- We are now providing separate files for 18S nuclear and 16S plastid sequences for UTAX, dada2, fasta and mothur/Qiime formats.
- The merged file contains both 18S and 16S sequences.
- The metadata file is not provided any more since metadata can be found in the merged file.
- The whole pr2 database is also provided as an SQLite file. It contains the different tables making up pr2.
Taxonomy changed
- Apicomplexa
- Taxonomy completely revised following del Campo et al. (2019)
- New sequences: 2619
- Updated sequences: 5889+239
- Removed sequences: 89
- Stramenopiles - Higher ranks changed according to Massana et al. 2014, Derelle et al. 2016, and Adl et al. 2019 compiled by R. Massana.
- Diatoms - Chaetoceros - 196 new sequences have been added from Gaonkar et al. (2018) with help from B. Edvardsen
- Chlorophyta
- Mamiellophyceae - Micromonas clades have been updated according to Tragin and Vaulot 2019.
- Prasinophytes clade IX - Separation between clades IXA and IXB removed pending further analysis.
- Cryptophyceae - Cryptomonadales moved from family to order.
- Cercozoa - Class Chlorarachniophyceae replaces Filosa-Chlorarachnea
Plastid 16S sequences and cyanos
Data originate from the PhytoRef database (Decelle et al. 2015). Taxonomy has been harmonized with the PR2 taxonomy framework, in particular by going from 12 to 8 taxonomy levels. This integration of plastid sequences should be helpful to researchers who obtain metabarcodes for both 16S and 18S rRNA.
- 16S plastid sequences added: 6049
- 16S cyanobacteria sequences added: 42
Metadata
Sequence geo-localisation: Following up on the very good post of Margaret Brisbin, the GeoNames server (http://www.geonames.org/export/geonames-search.html) and the fuzzywuzzy Python library have been used to provide information about the geographic origin of sequences. Country and/or ocean are now provided, with country/ocean coordinates, for 90,788 GenBank entries.
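The fuzzy-matching step can be illustrated with a small sketch. It deliberately swaps in difflib from the Python standard library for fuzzywuzzy and a tiny hard-coded list for the GeoNames server, so it shows the general approach rather than the procedure actually used; the place list, cutoff and function name are all assumptions.

```python
import difflib

# Hypothetical reference list standing in for a GeoNames lookup.
KNOWN_PLACES = ["France", "Norway", "Japan",
                "Pacific Ocean", "Atlantic Ocean", "Mediterranean Sea"]


def match_place(gb_country, cutoff=0.8):
    """Return the closest known place for a GenBank country value, or None.

    GenBank country values often look like "Country: locality", so only
    the part before the first colon is compared.
    """
    head = gb_country.split(":")[0].strip().lower()
    lookup = {place.lower(): place for place in KNOWN_PLACES}
    hits = difflib.get_close_matches(head, list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


print(match_place("France: Roscoff"))
print(match_place("Pacfic Ocean"))
print(match_place("unknown"))
```

A fairly high cutoff keeps misspellings such as "Pacfic Ocean" while rejecting unrelated strings; the right value depends on how noisy the gb_country field is.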
References
- Adl, S.M., Bass, D., Lane, C.E., Lukeš, J., Schoch, C.L., Smirnov, A., Agatha, S. et al. 2019. Revisions to the Classification, Nomenclature, and Diversity of Eukaryotes. J. Eukaryot. Microbiol. 66:4–119.
- Decelle, J., Romac, S., Stern, R.F., Bendif, E.M., Zingone, A., Audic, S., Guiry, M.D. et al. 2015. PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy. Mol. Ecol. Resour. 15:1435–1445.
- Derelle, R., López-García, P., Timpano, H. & Moreira, D. 2016. A Phylogenomic Framework to Study the Diversity and Evolution of Stramenopiles (=Heterokonts). Mol. Biol. Evol. 33:2890–8.
- Gaonkar, C.C., Piredda, R., Minucci, C., Mann, D.G., Montresor, M., Sarno, D. & Kooistra, W.H.C.F. 2018. Annotated 18S and 28S rDNA reference sequences of taxa in the planktonic diatom family Chaetocerotaceae. PLoS One. 13:e0208929.
- Massana, R., del Campo, J., Sieracki, M.E., Audic, S. & Logares, R. 2014. Exploring the uncultured microeukaryote majority in the oceans: reevaluation of ribogroups within stramenopiles. ISME J. 8:854–66.
- del Campo J., Heger T., Rodríguez-Martínez R., Worden AZ., Richards TA., Massana R., Keeling PJ. 2018. A new framework for the study of apicomplexan diversity across environments. bioRxiv 1. DOI: 10.1101/494880.
R Scripts used
PR2 version 4.11.1
Mostly small changes and bug fixes
Date : 13 December 2018
Database changes
- Fields eukref_publication, eukref_authors, eukref_journal merged into gb_publication, gb_authors, gb_journal
- Field gb_reference added
Bug fixes
- 1 duplicated sequence removed: GU824068.1.1173_U
- Sequence KT860933 shortened because the end of the sequence was bad
PR2 version 4.11.0
Annotators : Daniel Vaulot, Adriana Lopes dos Santos, Vittorio Boscaro (Eukref)
Date : 30 October 2018
Database changes
- Database is now available as an R data file
- Remove sequences shorter than 500 bp - 173 sequences affected
- The PR2 database can be installed as an R package using the devtools package
install.packages("devtools")
devtools::install_github("vaulot/pr2database")
Bug fixes
- Correct some GenBank clone information (gb-clone) that was wrongly formatted as dates (e.g. 07-02 was mis-labelled as jul.-02).
- Correct PR2 accession numbers, start, end and label - 1170 sequences affected
- GenBank entries with 2 PR2 sequences and different taxonomy: 4 sequences corrected and 2 removed (AF245249, AY706334, FJ848510, JF276416, KM020045, KM020071)
- R Script - PR2 update 4.11 - Management
Chlorophyta
- Orders Prasinococcales and Palmophyllales incorporated into class Palmophyllophyceae.
- Prasinophytes clade VII moved to the two newly created classes: Chloropicophyceae and Picocystophyceae - 198 sequences updated and 104 sequences added (Lopes dos Santos et al. 2017).
- Mamiellophyceae (Micromonas, Ostreococcus, Mantoniella) - 872 sequences updated and 8 sequences added (Tragin and Vaulot 2018)
- R Script - PR2 update 4.11 - Chloropicophyceae
- R Script - PR2 update 4.11 - Mamiellophyceae
Ciliates (from Eukref)
- Follows the publication of Boscaro et al.
- Sequences annotated by Eukref as either "low quality" (125) or "chimeras" (283) have been annotated and removed from PR2.
- Sequences with taxonomy updated: 4550
- New sequences added: 2478
- Sequences removed (will need to be re-examined later) : 652
- R Script - PR2 update 4.11 - Ciliophora
References
- Lopes dos Santos, A., Pollina, T., Gourvil, P., Corre, E., Marie, D., Garrido, J.L., Rodríguez, F. et al. 2017. Chloropicophyceae, a new class of picophytoplanktonic prasinophytes. Sci. Rep. 7:14019.
- Tragin, M. & Vaulot, D. 2018. Novel diversity within marine Mamiellophyceae (Chlorophyta) unveiled by metabarcoding. bioRxiv.
- Boscaro, V., Santoferrara, L.F., Zhang, Q., Gentekaki, E., Syberg-Olsen, M.J., del Campo, J. & Keeling, P.J. 2018. EukRef-Ciliophora: A manually curated, phylogeny-based database of small subunit rRNA gene sequences of ciliates. Environ. Microbiol.
Version 4.10.0
- 1102 PR2 sequences were longer than the corresponding GenBank sequences (see Issue #6). Some of these sequences were recovered from the original PR2 database (from 2012) while others were re-extracted from the GenBank record.
- 7 PR2 sequences that were identified as chimeras in the original PR2 database have been removed
- 21537 PR2 sequences have been labelled as reference sequences from the original PR2 database.
Version 4.9.2
Minor changes. Two entries have been fixed:
- LC054938.1.1770_U - Taxonomy had a hard return in one of the fields
- KJ995958.1.1684_U - Remove space at start of the PR2 accession id
Version 4.9.1
This is a minor update.
The taxonomy of the following three sequences has been fixed.
- FJ402948.1.1186_U
- FJ402949.1.1210_U
- LC054937.1.1751_U
The wiki has also been updated to explain how to convert files from UNIX to DOS.