Releases: pr2database/pr2database

PR2 version 5.0.0

06 Apr 11:16

Minor updates since release 5.0.0

2023-10-23 - emu database files added

  • A database file for emu has been added: pr2_version_5.0.0_emu_db.tar.gz. Emu can be used to estimate the composition of protist communities sequenced with long-read technologies such as Nanopore or PacBio.

2023-05-15 - version 5.0.1

The 5.0.0 assets have been updated without changing the version number.

  • Sequences removed: 6
  • Sequences updated: 79 (mostly Burkholderia_sp. which were wrongly labelled)

Main changes

  • Upgrade taxonomy from 8 levels to 9 levels
  • Add link to EukRibo database (Berney et al. 2022)
  • Add link to Mixoplankton Database (Mitra et al. 2023)

Major groups for which taxonomy has been updated

  • Bacteria and Archaea
  • New organelle sequences (plastids and mitochondria)
  • Stramenopiles
    • Diatoms
    • Chrysophyceae
  • Alveolates
    • Ciliates
    • Dinoflagellates
    • Perkinsea
  • Fungi
    • Chytrids
  • Amoebozoa
    • Myxogastria
  • New supergroup added: Provora

Sequences

Changes

  • Added : 5,718
  • Taxonomy Updated: 17,954
  • Quarantined/Removed: 100

Detailed changes

Taxonomy structure

  • We moved from 8 levels (kingdom to species) to 9 levels (domain to species) with a new level subdivision. The changes are explained here.
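As a rough illustration of what the extra level means for downstream code, the sketch below splits a semicolon-separated taxonomy string into named ranks. The rank names are an assumption derived from the ranges quoted above ("kingdom to species", "domain to species", plus the new "subdivision"), and the example string uses placeholder names rather than real PR2 taxa.

```python
# Rank names are assumptions based on the ranges quoted in the release notes.
LEVELS_8 = ["kingdom", "supergroup", "division", "class",
            "order", "family", "genus", "species"]
LEVELS_9 = ["domain", "supergroup", "division", "subdivision", "class",
            "order", "family", "genus", "species"]


def parse_taxonomy(taxo_string, levels=LEVELS_9):
    """Split a semicolon-separated taxonomy string into a rank -> name dict."""
    # Drop empty pieces so that a trailing ";" is tolerated.
    names = [name for name in taxo_string.strip().split(";") if name]
    if len(names) != len(levels):
        raise ValueError(f"expected {len(levels)} levels, got {len(names)}")
    return dict(zip(levels, names))


# Placeholder 9-level string, for illustration only (not a real PR2 entry).
example = "Dom_A;Sup_A;Div_A;Subdiv_A;Class_A;Order_A;Fam_A;Genus_a;Genus_a_sp;"
ranks = parse_taxonomy(example)
```

Passing `levels=LEVELS_8` would handle strings from the older 8-level files in the same way; a string with the wrong number of levels raises an error instead of being silently misaligned.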

Taxonomic groups updated

Obazoa

  • Fungi
    • Microsporidia - Metchnikovellida:
      • Add 6 sequences
    • Chytrids
      • taxonomy completely revised to follow in particular Tedersoo et al.: 191 taxa edited
      • Add 34 sequences

TSAR

  • Alveolata
    • Genus Alphamonas corrected from Aphamonas
    • Dinoflagellates
      • Suessiaceae, Borghiellaceae
      • Gonyaulacales
      • Kareniaceae, Warnowiaceae
    • Ciliates
      • Spirotrichea updated to follow more closely EukRef annotations. In particular:
        • added new sequences, made some corrections, and updated names (sensu Adl et al. 2019).
          • removed these artificial groups:
            • Leegardiellidae_A and _B: replaced by Leegardiellidae
            • Strobilidiidae_A to J: replaced by Strobilidiidae
            • Strombidiidae_A to R: replaced by Strombidiidae
            • Tontoniidae_A and B: replaced by Tontoniidae
            • Strombidiida_A to H: replaced by Strombidiida
          • For tintinnids, we included both the order and suborder (Choreotrichida-Tintinnina) in the order column (best compromise, hopefully acceptable)
      • Taxonomy of the following families updated
        • Discotrichidae
        • Plagiocampidae
        • Urotrichidae
        • Protocruziidae
    • Perkinsea
      • 10 sequences removed
      • 19 sequences reassigned
      • 614 new sequences added
  • Stramenopiles
    • Olisthodiscophyceae: new class added, created in Barcytė et al. (2021)
    • Pseudochattonella: spelling corrected (it was wrongly spelled).
    • Diatom taxonomy has been updated with the three recognized classes following Algaebase
      • Bacillariophyceae
      • Coscinodiscophyceae
      • Mediophyceae
      • Diatomea_X (class) is used for taxa that are not assigned to one of these three classes (e.g. Phaeodactylum)
    • Diatom genera Anaulus, Asterionellopsis, Ceratanaulus, Eunotogramma, Plagiogramma updated, plus 2 new sequences assigned
    • Chrysophyceae taxonomy completely revised following in particular Scoble and Cavalier-Smith 2014 and Charvet et al. 2012.
      • Sequences updated: 1577
      • Sequences added: 160
      • Taxa updated: 428
      • Taxa added: 46
  • Rhizaria
    • Radiolaria
      • Spumellaria: 67 new sequences annotated

Archaeplastida

  • Picozoa
    • Are now classified within Archaeplastida.

Excavata

  • 14 Percolomonads sequences added

Amoebozoa

  • Myxogastria: taxonomy updated

Provora

  • New supergroup added

Organelles

  • 16S plastid: 252 new sequences added (some are shorter than 500 bp)
  • 16S mitochondrion: 1818 sequences from original PR2 database
  • 18S nucleomorph: 12 sequences from original PR2 database

Bacteria, Archaea

  • Supergroup added
  • Cyanobacteria: supergroup replaced by Bacteria_X (before Terrabacteria)

Link PR2 to other databases

EukRibo database version 2.0

See Berney et al. 2022.

  • Add sequences that are not present in PR2: 510
  • Add taxa that are not present in PR2: 938
  • Update sequences taxonomy for all sequences that had no species assigned: 1257
  • Fields from EukRibo added to PR2 metadata for 48,136 sequences
    • eukribo_UniEuk_taxonomy_string: taxonomy annotation from EukRibo (number of levels is variable)
    • eukribo_V4: whether the sequence contains the V4 region
    • eukribo_V9: whether the sequence contains the V9 region
  • See tutorial: EukRibo database

Mixoplankton database

See Mitra et al. 2023 (https://doi.org/10.1111/jeu.12972) and database at DOI 10.5281/zenodo.7560582

  • Added one column mixoplankton in the metadata
  • Use filter(!is.na(mixoplankton)) to select mixotrophic species
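For readers not working in R, the same selection can be sketched with the Python standard library. This is only an illustration: the rows, the `species` column and the `type_1` value are hypothetical, and an empty cell or the string "NA" is assumed to stand for a missing value in an exported metadata file.

```python
import csv
import io

# Tiny in-memory stand-in for the PR2 metadata table (hypothetical rows).
METADATA_TSV = (
    "species\tmixoplankton\n"
    "Species_a\ttype_1\n"
    "Species_b\tNA\n"
    "Species_c\t\n"
)


def mixotrophic_rows(handle, column="mixoplankton"):
    """Keep rows whose mixoplankton cell is filled in (neither empty nor 'NA')."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row[column] not in ("", "NA")]


rows = mixotrophic_rows(io.StringIO(METADATA_TSV))
```

With a real export, `io.StringIO(...)` would be replaced by an open file handle.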

WoRMS database

The WoRMS database is an authoritative and comprehensive list of names of marine organisms, including information on synonymy. The content of WoRMS is controlled by taxonomic and thematic experts, not by database managers. For species in PR2 that have an entry in WoRMS, we have now added a link to the AphiaID (worms_id field) as well as information on the distribution of the species (marine, brackish, freshwater, terrestrial).

Sequences uploaded but not yet annotated

  • 35,884 18S rRNA sequences added from GenBank - 2021-03-23 to 2023-02-20
  • Only 17,504 of these 35,884 pass the current criteria for PR2 (length >= 500 bp, etc.)

Sequences annotated automatically

  • 19,405 18S rRNA sequences from GenBank originating from strains corresponding to 2279 new species in taxonomy table

Sequences removed

  • Chimeras: 1259 from initial version of PR2
  • AY745555.1.1854_U, AY745597.1.1844_U, EF209781.1.1956_U, EF209774.1.1835_U, EF209794.1.1834_U, which no longer exist in GenBank.

Database structure changes

R package

Scripts

These scripts are provided only to show some of the procedures used to update the PR2 database. Do not try to run them: they will not work because they require access to the MySQL PR2 database.

List of scripts

References

Taxonomy structure

  • Burki, Fabien, Andrew J. Roger, Matthew W. Brown, and Alastair G.B. Simpson. 2020. « The New Tree of Eukaryotes ». Trends in Ecology and Evolution 35 (1): 43‑55. https://doi.org/10.1016/j.tree.2019.08.008.

Linked databases

  • Berney, Cédric, Nicolas Henry, Frédéric Mahé, Daniel J. Richter, and Colomban de Vargas. 2022. EukRibo: A Manually Curated Eukaryotic 18S RDNA Reference Database to Facilitate Identification of New Diversity. Preprint. BiorXiv. https://doi.org/10.1101/2022.11.03.515105.
  • Mitra, Aditee, David A. Caron, Emile Faure, Kevin J. Flynn, Suzana Gonçalves Leles, Per J. Hansen, George B. McManus, et al. 2023. « The Mixoplankton Database – Diversity of Photo-Phago-Trophic Plankton in Form, Function and Distribution across the Global Ocean ». Journal of Eukaryotic Microbiology : e12972. [https://doi....

PR2 version 4.14.1

25 Nov 08:44

New web interface at https://app.pr2-database.org

The database itself is unchanged. Please refer to version 4.14.0 for database text files.

PR2 version 4.14.0

25 Jun 10:58

Main changes

A single SSU database

From version 4.14.0, a single SSU database is provided which contains sequences for:

  • 18S rRNA from nuclear and nucleomorph
  • 16S rRNA from plastid, apicoplast, chromatophore, mitochondrion
  • 16S rRNA from a small selection of bacteria

The rationale is that the database can now be used to detect bacterial sequences that are amplified with either 18S rRNA or "universal" primers. These sequences can be further assigned with Silva or GTDB.

To allow correct assignment of organelle sequences with software such as DECIPHER (IdTaxa), the taxonomy is appended with a short suffix (typically 4 letters) corresponding to the organelle.

Organelle        Taxonomy suffix
nucleus          (none)
nucleomorph      :nucl
plastid          :plas
apicoplast       :apic
chromatophore    :chrom
mitochondrion    :mito
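The suffix scheme can be sketched as a small lookup table. The helper names below are hypothetical, and the release notes do not say whether the suffix is appended to every rank or only to some of them, so this sketch handles a single taxon name at a time.

```python
# Suffixes as listed in the release notes; nuclear sequences carry no suffix.
ORGANELLE_SUFFIX = {
    "nucleus": "",
    "nucleomorph": ":nucl",
    "plastid": ":plas",
    "apicoplast": ":apic",
    "chromatophore": ":chrom",
    "mitochondrion": ":mito",
}


def add_suffix(taxon, organelle):
    """Append the organelle suffix to one taxon name (hypothetical helper)."""
    return taxon + ORGANELLE_SUFFIX[organelle]


def strip_suffix(taxon):
    """Return (bare taxon name, organelle); names without a suffix are nuclear."""
    for organelle, suffix in ORGANELLE_SUFFIX.items():
        if suffix and taxon.endswith(suffix):
            return taxon[: -len(suffix)], organelle
    return taxon, "nucleus"
```

Stripping the suffix after assignment makes it possible to compare organelle and nuclear assignments of the same taxon while keeping the organelle as a separate piece of information.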

New File provided

  • pr2_version_4.14.0_SSU.decipher.trained.rds: This is a DECIPHER training set file that can be used with the IdTaxa function from the R DECIPHER package. This file should be read with the readRDS function.

Major groups for which taxonomy has been updated

  • Apicomplexa
  • Labyrinthulids
  • Radiolaria
  • Foraminifera

Quarantined sequences (makes sense in these COVID times...)

We are introducing quarantined sequences. These sequences have been reassigned with DECIPHER IdTaxa, but the bootstrap values were low or they were flagged as problematic by DECIPHER during the LearnTaxa phase. These sequences are not provided with the current version but will be added in the future after verification of their taxonomic assignment.

List of sequences added or updated

  • Added: 9,710
  • Updated: 25,298
  • Quarantined: 614
  • Removed: 462

Contributors

Taxonomic groups updated

  • Alveolata - Javier del Campo

    • Apicomplexa
      • 9955 sequences updated or added.
      • 303 sequences quarantined needing phylogeny assignment.
      • 583 taxonomy entries revised
  • Chlorophyta

    • Ostreobium: 2 sequences added
  • Stramenopiles

    • Labyrinthulids - Javier del Campo
      • Sequences updated or added: 1280
      • Sequences quarantined: 133
      • Taxonomy fully revised: 69 species
    • Cafeteria - Alex Schoenle following Schoenle et al. (2020)
      • sequences updated: 30
      • sequences added: 31
      • script
    • Cafileria marina: 8 sequences added
  • Haptophyta

    • Rappephyceae - Kawachi et al. (2021)
      • Rappemonads moved into Rappephyceae
      • 4 sequences added
  • Radiolaria - Miguel Sandin

  • Foraminifera - Raphaël Morard

    • Total number of validated sequences: 3839
    • Taxonomy updated or added: 315 entries
    • Sequences added: 1149
    • Sequences updated (including new sequences): 2164
    • script to upload to PR2
  • Excavata - Javier del Campo and EukRef team

    • EukRef team: Martin Kolisko, Olga Flegontova, Anna Karnkowska, Gordon Lax, Julia M. Maritz, Tomáš Pánek, Petr Táborský, Jane M. Carlton, Ivan Cepička, Aleš Horák, Julius Lukeš, Alastair G.B. Simpson, and Vera Tai
    • Total number of validated sequences: 6265
    • Taxa updated or added: 735
    • Sequences added from GenBank: 75
    • Sequences updated (existing + new): 1347 + 2875
    • Sequences quarantined: 104
    • Metadata updated with eukref fields: 6091
  • 16S plastid sequences (Ostreobium and Apicomplexa)- Javier del Campo

    • 87 sequences reassigned
    • 482 sequences added
  • Bacteria, Archaea - Daniel Vaulot

    • Sequences added: 7945
    • Taxa added: 1571
    • These sequences originate from Silva seed alignment v. 132 as found on the mothur site
    • They are used as "control" sequences when assigning metabarcodes, especially for primers that are either "universal", i.e. amplify both 18S and 16S or that are "imperfect", in the sense that they also amplify a small fraction of the 16S sequences.

Sequences uploaded but not yet annotated

  • 8763 18S rRNA sequences added from GenBank - 2020-05-27 to 2021-03-23 - Script

Sequences removed

  • Potential chimera in Radiolaria: 343 (M. Sandin)
  • Bad sequences: 6 (F. Mahé)
  • Chimeras: 95 (A M Fiore-Donno)
  • ITS: 20 (A M Fiore-Donno)
  • Badly assigned: 6 (A M Fiore-Donno)

Sequences modified (F. Mahé)

  • complemented: 26
  • reverse complemented: 114 + 189
    Script
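For reference, the two operations named above (complementing and reverse complementing a sequence) can be sketched as follows. This is a generic illustration, not the script used for PR2; only A, C, G, T and N are handled, and other IUPAC ambiguity codes are left unchanged.

```python
# Translation table for DNA bases; N stays N. Other characters pass through as-is.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement(seq):
    """Return the complement of a DNA sequence, base by base, same orientation."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Return the reverse complement, i.e. the opposite strand read 5' to 3'."""
    return complement(seq)[::-1]
```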

Metadata added

  • A large amount of metadata has been downloaded from GenBank, such as GenBank taxonomy and references associated with sequences.

Database structure

  • pr2_main
    • quarantined_version: sequences flagged as quarantined will need to be re-assigned later.
  • pr2_metadata
    • gb_references: removed (empty)
    • gb_locus: removed (empty)
    • gb_division: added - Three letter code for GenBank division (e.g. PLN, ENV...)

Metadata added

The following fields were populated from GenBank when the data were missing (413,230 records updated)

  • gb_taxonomy
  • gb_project
  • gb_authors, gb_publication, gb_journal
  • gb_sequence
  • gb_division
  • gb_date

Scripts

Scripts are provided only to show some of the procedures used to update the PR2 database. Do not try to run them: they will not work because they require access to the MySQL PR2 database.

References

Stramenopiles

  • Schoenle, A., Hohlfeld, M., Rosse, M., Filz, P., Wylezich, C., Nitsche, F., & Arndt, H. (2020). Global comparison of bicosoecid Cafeteria-like flagellates from the deep ocean and surface waters, with reorganization of the family Cafeteriaceae. European Journal of Protistology, 73, 125665. https://doi.org/10.1016/j.ejop.2019.125665.
  • Jirsová, D., Füssy, Z., Richtová, J., Gruber, A., & Oborník, M. (2019). Morphology, ultrastructure, and mitochondrial genome of the marine non-photosynthetic bicosoecid Cafileria marina gen. et sp. nov. Microorganisms, 7(8), 240. https://doi.org/10.3390/microorganisms7080240
  • Pan, J., del Campo, J., & Keeling, P. J. (2017). Reference Tree and Environmental Sequence Diversity of Labyrinthulomycetes. Journal of Eukaryotic Microbiology, 64(1), 88–96. https://doi.org/10.1111/jeu.12342

Haptophyta

  • Kawachi, M., Nakayama, T., Kayama, M., Nomura, M., Miyashita, H., Bojo, O., Rhodes, L., Sym, S., Pienaar, R. N., Probert, I., Inouye, I., & Kamikawa, R. (2021). Rappemonads are haptophyte phytoplankton. Current Biology. https://doi.org/10.1016/j.cub.2021.03.012

Radiolaria

  • Adl, S. M., Bass, D., Lane, C. E., Lukeš, J., Schoch, C. L., Smirnov, A., et al. 2019. Revisions to the classification, nomenclature, and diversity of eukaryotes. J. Eukaryot. Microbiol. 66, 4–119. doi:10.1111/jeu.12691
  • Biard, T., Bigeard, E., Audic, S., Poulain, J., Stemmann, L., Not, F., 2017. Biogeography and diversity of Collodaria (Radiolaria) in the global ocean. Nat. Publ. Gr. 1–42. doi:10.1038/ismej.2017.12
  • Cavalier-Smith, T., Chao, E.E., Lewis, R., 2018. Multigene phylogeny and cell evolution of chromist infrakingdom Rhizaria: contrasting cell organisation of sister phyla Cercozoa and Retaria. Protoplasma 255, 1517–1574. doi:10.1007/s00709-018-1241-1
  • Capella-Gutiérrez, S., Silla-Martínez, J.M., Gabaldón, T., 2009. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973. doi:10.1093/bioinformatics/btp348
  • Decelle, J., Suzuki, N., Mahé, F., Vargas, C. De, Not, F., 2012b. Molecular Phylogeny and Morphological Evolution of the Acantharea (Radiolaria). Protist 163, 435–450. doi:10.1016/j.protis.2011.10.002
  • Gouy, M., Guindon, S., Gascuel, O., 2010. SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building. Mol. Biol. Evol. 27, 221–224. doi:10.1093/molbev/msp259
  • Katoh, K., Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780...

PR2 version 4.13.0

17 Mar 01:28

Summary

List of sequences added or updated

  • Added: 2966
  • Updated: 933
  • Removed: 3817

Contributors

Taxonomy curated

  • Alveolata

    • Dinophyceae - Suessiales curated by J. del Campo following Janouškovec et al. (2017) and LaJeunesse et al. (2018)
  • Stramenopiles

    • Diatoms - Thalassiosirales - L. Arsenieff following Arsenieff et al. (2020)
    • Pelagophyceae - A. M. Cabello - definition of new environmental clades
      • sequences updated: 30
    • Pelagophyceae, Sarcinochrysidales - From Han et al. 2018.
      • sequences added: 14
    • Chrysophyceae - From Andersen et al. 2017
  • Chlorophyta

    • Pyramimonadales replaced by Pyramimonadophyceae following Daugbjerg et al. (2019)
    • New division Prasinodermophyta and new Class Prasinodermophyceae following Li et al. (2020)

Sequences added to PR2

  • 1,129 18S sequences from the Roscoff Culture Collection (script - Cultures)
  • 1,824 18S sequences from Silva version 138 and Genbank annotated based on hash value of sequences
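Annotating by hash value amounts to transferring the annotation between sequences that are strictly identical. A minimal sketch, assuming SHA-1 over the upper-cased sequence (the release notes do not say which hash function or normalisation was used) and using placeholder sequences and names:

```python
import hashlib


def sequence_hash(seq):
    """Hash a sequence after trimming whitespace and upper-casing (assumed normalisation)."""
    return hashlib.sha1(seq.strip().upper().encode("ascii")).hexdigest()


def annotate_by_hash(annotated, queries):
    """Copy the taxonomy of annotated sequences onto identical query sequences.

    annotated: dict mapping sequence -> taxonomy
    queries:   dict mapping query id -> sequence
    Returns a dict of query id -> taxonomy, for exact matches only.
    """
    by_hash = {sequence_hash(seq): taxo for seq, taxo in annotated.items()}
    result = {}
    for query_id, seq in queries.items():
        digest = sequence_hash(seq)
        if digest in by_hash:
            result[query_id] = by_hash[digest]
    return result


# Placeholder data for illustration only.
matches = annotate_by_hash({"ACGTACGT": "Taxon_A"},
                           {"q1": "acgtacgt", "q2": "ACGTACGA"})
```

Comparing digests rather than full sequences keeps the lookup table small, at the cost of missing sequences that differ by even one base or by length.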

Sequences uploaded but not yet annotated

  • 7,032 18S rRNA sequences added from GenBank - 2018-11 to 2020-05 - script

  • 333,247 18S rRNA sequences from Silva version 138 (2019-12)

Metadata

Sequences removed

  • 3817 sequences have been removed from the database
    • potential chimera
    • bad sequences
    • sequences containing at least 2 consecutive Ns (e.g. ...ATTNNGC..)
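The last criterion is easy to express as a regular expression. The sketch below flags sequences containing a run of at least two consecutive Ns; the function name and the `min_run` parameter are illustrative, not part of PR2.

```python
import re


def has_n_run(seq, min_run=2):
    """True if the sequence contains at least `min_run` consecutive N characters."""
    pattern = rf"N{{{min_run},}}"  # e.g. "N{2,}" for min_run=2
    return re.search(pattern, seq, re.IGNORECASE) is not None
```

For example, `has_n_run("ATTNNGC")` is true, while a sequence with only isolated Ns such as "ATTNGCNA" is not flagged.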

References

  • Daugbjerg N., Fassel NMD., Moestrup Ø. 2019. Microscopy and phylogeny of Pyramimonas tatianae sp. nov. (Pyramimonadales, Chlorophyta), a scaly quadriflagellate from Golden Horn Bay (eastern Russia) and formal description of Pyramimonadophyceae classis nova . European Journal of Phycology 0:1–15. DOI: 10.1080/09670262.2019.1638524
  • Janouškovec, Jan, Gregory S. Gavelis, Fabien Burki, Donna Dinh, Tsvetan R. Bachvaroff, Sebastian G. Gornik, Kelley J. Bright, et al. 2017. Major Transitions in Dinoflagellate Evolution Unveiled by Phylotranscriptomics. Proceedings of the National Academy of Sciences 114 (2): E171–80. https://doi.org/10.1073/pnas.1614842114.
  • LaJeunesse, Todd C., John Everett Parkinson, Paul W. Gabrielson, Hae Jin Jeong, James Davis Reimer, Christian R. Voolstra, and Scott R. Santos. 2018. Systematic Revision of Symbiodiniaceae Highlights the Antiquity and Diversity of Coral Endosymbionts. Current Biology 28 (16): 2570-2580.e6. https://doi.org/10.1016/j.cub.2018.07.008.
  • Arsenieff L., Le Gall F., Rigaut-Jalabert F., Mahé F., Sarno D., Gouhier L., Baudoux A-C., Simon N. 2020. Diversity and dynamics of relevant nanoplanktonic diatoms in the Western English Channel. The ISME Journal. DOI: 10.1038/s41396-020-0659-6.
  • Han KY., Graf L., Reyes CP., Melkonian B., Andersen RA., Yoon HS., Melkonian M. 2018. A Re-investigation of Sarcinochrysis marina (Sarcinochrysidales, Pelagophyceae) from its Type Locality and the Descriptions of Arachnochrysis, Pelagospilus, Sargassococcus and Sungminbooa genera nov. Protist 169:79–106. DOI: 10.1016/j.protis.2017.12.004.
  • Andersen RA., Graf L., Malakhov Y., Yoon HS. 2017. Rediscovery of the Ochromonas type species Ochromonas triangulata (Chrysophyceae) from its type locality (Lake Veysove, Donetsk region, Ukraine). Phycologia 56:591–604. DOI: 10.2216/17-15.1.
  • Li L., Wang S., Wang H., Sahu SK., Marin B., Li H., Xu Y., Liang H., Li Z., Cheng S., Reder T., Çebi Z., Wittek S., Petersen M., Melkonian B., Du H., Yang H., Wang J., Wong GK., Xu X., Liu X., Van de Peer Y., Melkonian M., Liu H. 2020. The genome of Prasinoderma coloniale unveils the existence of a third phylum within green plants. Nature Ecology & Evolution. DOI: 10.1038/s41559-020-1221-7.

Database structure

  • Table pr2_metadata - add fields

    • pr2_depth: depth of sample in meters
    • gb_id: Genbank ID number (big integer)
    • gb_project_id: Genbank project ID for metagenomes
    • gb_sequence - original gb_sequence (longtext)
  • Table pr2_metadata - remove fields and move to list_countries table

    • pr2_continent
    • pr2_country_geocode
    • pr2_country_lon
    • pr2_country_lat
  • New Tables (for internal use only)

    • list_countries - Table with information on each country
      • pr2_country
      • pr2_continent
      • pr2_country_geocode
      • pr2_country_lon
      • pr2_country_lat
    • pr2_assign_bayes - Contains assignment of uncurated sequences using dada2::assignTaxonomy against PR2 4.12.0
    • pr2_assign_silva - Contains assignment of uncurated sequences from Silva version 138

Scripts

Scripts (see links above) are just provided to show some of the procedures used to update the PR2 database. Do not try to run them, they will not work as they require access to the MySQL PR2 database.

Files provided

  • For this version we do not provide the SQLite format. It will be provided again for release 5.0.0
  • A version of the database compatible with the DECIPHER R package is available here
  • Files also available on Zenodo

PR2 version 4.12.0

17 Aug 03:58

Date : 2019-08-08 (updated 2019-08-17)

Contributors

Database structure

  • Table pr2_main - add fields
    • gene - 18S_RNA, 16S_RNA
    • organelle - nucleus, plastid, mitochondria, nucleomorph, apicoplast (left empty for cyanobacteria)
  • Table pr2_metadata - add or modify fields
    • gb_organelle - import the corresponding gb field
    • pr2_sequence_origin - add other possibilities such as genome and metagenome
    • pr2_continent, pr2_country, pr2_country_lat, pr2_country_lon - geographical origin extracted from gb_country field
    • pr2_location, pr2_location_lat, pr2_location_lon - geographical origin extracted from gb_country field.
    • pr2_ocean, pr2_sea, pr2_sea_lat, pr2_sea_lon - extracted from gb_country field and gb_isolation_source
  • Table pr2_sequences - add fields
  • Table pr2_taxonomy - add fields
    • taxon_trophic_mode - detailed trophic mode (e.g. "C-fixation constitutive; Mixotroph")

Clean up

  • 1692 sequences that had more than 2 consecutive "NN" have been removed

Files provided

  • We are now providing separate files for 18S nuclear and 16S plastid sequences for UTAX, dada2, fasta and mothur/Qiime formats.
  • The merged file contains both 18S and 16S sequences.
  • The metadata file is not provided any more since metadata can be found in the merged file.
  • The whole pr2 database is also provided as an SQLite file. It contains the different tables making up pr2.

Taxonomy changed

  • Apicomplexa
    • Taxonomy completely revised following del Campo et al. (2019)
    • New sequences: 2619
    • Updated sequences: 5889+239
    • Removed sequences: 89
  • Stramenopiles - Higher ranks changed according to Massana et al. 2014, Derelle et al. 2016, and Adl et al. 2019, compiled by R. Massana.
  • Diatoms - Chaetoceros - 196 new sequences have been added from Gaonkar et al. (2019) with help from B. Edvardsen
  • Chlorophyta
    • Mamiellophyceae - Micromonas clades have been updated according to Tragin and Vaulot 2019.
    • Prasinophytes clade IX - Separation between clades IXA and IXB removed pending further analysis.
  • Cryptophyceae - Cryptomonadales moved from family to order.
  • Cercozoa - Class Chlorarachniophyceae replaces Filosa-Chlorarachnea

Plastid 16S sequences and cyanos

Data originate from the PhytoRef database (Decelle et al. 2015). Taxonomy has been harmonized with the PR2 taxonomy framework, in particular by going from 12 to 8 taxonomy levels. This integration of plastid sequences should be helpful to researchers who obtain metabarcodes for both 16S and 18S rRNA.

  • 16S plastid sequences added: 6049
  • 16S cyanobacteria sequences added: 42

Metadata

Sequence geo-localisation: Following up on the very good post of Margaret Brisbin, the GeoNames server (http://www.geonames.org/export/geonames-search.html) and the fuzzywuzzy Python library have been used to provide information about the geographic origin of sequences. Country and/or ocean, with the corresponding coordinates, are now provided for 90,788 GenBank entries.
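The fuzzy-matching step can be illustrated with the standard library alone. The sketch below uses `difflib` as a stand-in for fuzzywuzzy, so scores and behaviour will differ from the actual procedure; the country list, the cutoff and the "Country: locality" layout of the GenBank field are assumptions made for the example.

```python
import difflib

# Hypothetical reference list; the real procedure relied on GeoNames.
KNOWN_COUNTRIES = ["France", "Norway", "Japan", "United States", "Antarctica"]


def match_country(gb_country, choices=KNOWN_COUNTRIES, cutoff=0.8):
    """Return the closest known country name, or None if nothing is similar enough."""
    # Assumed layout "Country: locality": keep only the part before the colon.
    candidate = gb_country.split(":")[0].strip().lower()
    lowered = {choice.lower(): choice for choice in choices}
    hits = difflib.get_close_matches(candidate, list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None
```

A fairly high cutoff keeps false matches rare; entries that return `None` would need to be resolved by another route, for example a query to a gazetteer.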

References

R Scripts used

PR2 version 4.11.1

13 Dec 08:24

Version 4.11.1

Mostly small changes and bug fixes
Date : 13 December 2018

Database changes

  • Fields eukref_publication, eukref_authors, eukref_journal merged into gb_publication, gb_authors, gb_journal
  • Field gb_reference added

Bug fixes

  • 1 duplicated sequence removed: GU824068.1.1173_U
  • Sequence KT860933 shortened because its end was bad

PR2 version 4.11.0

30 Oct 13:09

Annotators : Daniel Vaulot, Adriana Lopes dos Santos, Vittorio Boscaro (Eukref)
Date : 30 October 2018

Database changes

  • Database is now available as an R data file
  • Remove sequences shorter than 500 bp - 173 sequences affected
  • The PR2 database can be installed as an R package using the devtools package
install.packages("devtools")
devtools::install_github("vaulot/pr2database")

Bug fixes

  • Correct some Genbank clone information (gb-clone) that were wrongly formatted as dates (e.g. 07-02 was mis-labelled as jul.-02).
  • Correct PR2 accession numbers, start, end and label - 1170 sequences affected
  • GenBank entries with 2 PR2 sequences and different taxonomy: 4 sequences corrected and 2 removed (AF245249, AY706334, FJ848510, JF276416, KM020045, KM020071)
  • R Script - PR2 update 4.11 - Management

Chlorophyta

Ciliates (from Eukref)

  • Follows the publication of Boscaro et al.
  • Sequences annotated by Eukref as either "low quality" (125) or "chimeras" (283) have been annotated and removed from PR2.
  • Sequences with taxonomy updated: 4550
  • New sequences added: 2478
  • Sequences removed (will need to be re-examined later) : 652
  • R Script - PR2 update 4.11 - Ciliophora

References

Version 4.10.0

07 Mar 22:59
cbe0a62

Version 4.10.0

  • 1102 PR2 sequences were longer than the corresponding GenBank sequences (see Issue #6). Some of these sequences were recovered from the original PR2 database (from 2012) while others were re-extracted from the GenBank record.
  • 7 PR2 sequences that were identified as chimeras in the original PR2 database have been removed
  • 21537 PR2 sequences have been labelled as reference sequences from the original PR2 database.

R Script - PR2 update 4.10

Version 4.9.2

01 Mar 15:20
cbe0a62

Version 4.9.2

Minor changes. Two entries have been fixed:

  • LC054938.1.1770_U - Taxonomy had a hard return in one of the fields
  • KJ995958.1.1684_U - Remove space at start of the PR2 accession id

Version 4.9.1

27 Feb 15:55
db6d145

Version 4.9.1

This is a minor update.

The taxonomy of the following three sequences has been fixed.

  • FJ402948.1.1186_U
  • FJ402949.1.1210_U
  • LC054937.1.1751_U

The wiki has also been updated to explain how to convert files from UNIX to DOS.