
Releases: pr2database/pr2database

PR2 version 5.0.0

06 Apr 11:16



Minor updates

5.0.1 - 2023-05-15

The 5.0.0 assets have been updated without changing the version number

  • Sequences removed: 6
  • Sequences updated: 79 (mostly Burkholderia_sp. which were wrongly labelled)

Main changes

  • Upgrade taxonomy from 8 levels to 9 levels
  • Add link to EukRibo database (Berney et al. 2022)
  • Add link to Mixoplankton Database (Mitra et al. 2023)

Major groups for which taxonomy has been updated

  • Bacteria and Archaea
  • New organelle sequences (plastids and mitochondria)
  • Stramenopiles
    • Diatoms
    • Chrysophyceae
  • Alveolates
    • Ciliates
    • Dinoflagellates
    • Perkinsea
  • Fungi
    • Chytrids
  • Amoebozoa
    • Myxogastria
  • New supergroup added: Provora



  • Added : 5,718
  • Taxonomy Updated: 17,954
  • Quarantined/Removed: 100

Detailed changes

Taxonomy structure

  • We moved from 8 levels (kingdom to species) to 9 levels (domain to species), with subdivision as the new level. The changes are explained here.

Taxonomic groups updated


  • Fungi
    • Microsporidia - Metchnikovellida:
      • Add 6 sequences
    • Chytrids
      • Taxonomy completely revised, following in particular Tedersoo et al.: 191 taxa edited
      • Add 34 sequences


  • Alveolata
    • Genus Alphamonas corrected from Aphamonas
    • Dinoflagellates
      • Suessiaceae, Borghiellaceae
      • Gonyaulacales
      • Kareniaceae, Warnowiaceae
    • Ciliates
      • Spirotrichea updated to follow more closely EukRef annotations. In particular:
        • added new sequences, made some corrections, and updated names (sensu Adl et al. 2019).
          • removed these artificial groups:
            • Leegardiellidae_A and _B: replaced by Leegardiellidae
            • Strobilidiidae_A to J: replaced by Strobilidiidae
            • Strombidiidae_A to R: replaced by Strombidiidae
            • Tontoniidae_A and B: replaced by Tontoniidae
            • Strombidiida_A to H: replaced by Strombidiida
          • For tintinnids, we included both the order and suborder (Choreotrichida-Tintinnina) in the order column (best compromise, hopefully acceptable)
      • Taxonomy of following families updated
        • Discotrichidae
        • Plagiocampidae
        • Urotrichidae
        • Protocruziidae
    • Perkinsea
      • 10 sequences removed
      • 19 sequences reassigned
      • 614 new sequences added
  • Stramenopiles
    • Olisthodiscophyceae: new class created in Barcytė et al. (2021)
    • Pseudochattonella: spelling corrected
    • Diatom taxonomy has been updated with the three recognized classes following Algaebase
      • Bacillariophyceae
      • Coscinodiscophyceae
      • Mediophyceae
      • Diatomea_X (class) is used for taxa that are not assigned to one of these three classes (e.g. Phaeodactylum)
    • Diatom genera Anaulus, Asterionellopsis, Ceratanaulus, Eunotogramma, Plagiogramma updated, plus 2 new sequences assigned
    • Chrysophyceae taxonomy completely revised, following in particular Scoble and Cavalier-Smith 2014 and Charvet et al. 2012.
      • Sequences updated: 1577
      • Sequences added: 160
      • Taxa updated: 428
      • Taxa added: 46
  • Rhizaria
    • Radiolaria
      • Spumellaria: 67 new sequences annotated


  • Picozoa
    • Are now classified within Archaeplastida.


  • 14 Percolomonads sequences added


  • Myxogastria: taxonomy updated


  • New supergroup added: Provora


  • 16S plastid: 252 new sequences added (some are shorter than 500 bp.)
  • 16S mitochondrion: 1818 sequences from original PR2 database
  • 18S nucleomorph: 12 sequences from original PR2 database

Bacteria, Archaea

  • Supergroup added
  • Cyanobacteria: supergroup replaced by Bacteria_X (before Terrabacteria)

Link PR2 to other databases

EukRibo database version 2.0

See Berney et al. 2022.

  • Add sequences that are not present in PR2: 510
  • Add taxa that are not present in PR2: 938
  • Update sequences taxonomy for all sequences that had no species assigned: 1257
  • Fields from EukRibo added to PR2 metadata for 48,136 sequences
    • eukribo_UniEuk_taxonomy_string: taxonomy annotation from EukRibo (number of levels is variable)
    • eukribo_V4: whether the sequence contains the V4 region
    • eukribo_V9: whether the sequence contains the V9 region
  • See tutorial: EukRibo database

Mixoplankton database

See Mitra et al. 2023 and the database at DOI 10.5281/zenodo.7560582

  • Added one column mixoplankton in the metadata
  • filter(! for mixotrophic species

WoRMS database

WoRMS is an authoritative and comprehensive list of names of marine organisms, including information on synonymy. The content of WoRMS is controlled by taxonomic and thematic experts, not by database managers. For species in PR2 that have an entry in WoRMS, we have now added a link to the AphiaID (worms_id field) as well as information on the distribution of the species (marine, brackish, freshwater, terrestrial).

Sequences uploaded but not yet annotated

  • 35,884 18S rRNA sequences added from GenBank - 2021-03-23 to 2023-02-20
  • Only 17,504 of these 35,884 pass the current criteria for PR2 (length >= 500 bp, etc.)
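
The length criterion mentioned above can be illustrated with a short Python sketch. This is not the script used to build PR2: the function names and the small FASTA parser are hypothetical, and the other criteria covered by "etc." are not stated in these notes, so they are not implemented here.

```python
MIN_LENGTH = 500  # minimum sequence length in bp, as stated in the release notes


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def passes_length_criterion(sequence, min_length=MIN_LENGTH):
    """Return True when the sequence is at least min_length bp long."""
    return len(sequence) >= min_length


# Hypothetical two-record example: one 600 bp sequence and one 8 bp sequence.
example = ">seq_long\n" + "ACGT" * 150 + "\n>seq_short\nACGTACGT\n"
kept = [h for h, s in parse_fasta(example) if passes_length_criterion(s)]
print(kept)
```

Only the first record is retained in this example, since the second is far shorter than 500 bp.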

Sequences annotated automatically

  • 19,405 18S rRNA sequences from GenBank originating from strains corresponding to 2,279 new species in the taxonomy table

Sequences removed

  • Chimeras: 1259 from initial version of PR2
  • Sequences that no longer exist in GenBank: AY745555.1.1854_U, AY745597.1.1844_U, EF209781.1.1956_U, EF209774.1.1835_U, EF209794.1.1834_U

Database structure changes

R package


These scripts are just to show some of the procedures used to update the PR2 database. Do not try to run them, they will not work as they require access to the MySQL PR2 database.

List of scripts


Taxonomy structure

  • Burki, Fabien, Andrew J. Roger, Matthew W. Brown, and Alastair G.B. Simpson. 2020. "The New Tree of Eukaryotes". Trends in Ecology and Evolution 35 (1): 43-55.

Linked databases

  • Berney, Cédric, Nicolas Henry, Frédéric Mahé, Daniel J. Richter, and Colomban de Vargas. 2022. EukRibo: A Manually Curated Eukaryotic 18S rDNA Reference Database to Facilitate Identification of New Diversity. Preprint. bioRxiv.
  • Mitra, Aditee, David A. Caron, Emile Faure, Kevin J. Flynn, Suzana Gonçalves Leles, Per J. Hansen, George B. McManus, et al. 2023. "The Mixoplankton Database – Diversity of Photo-Phago-Trophic Plankton in Form, Function and Distribution across the Global Ocean". Journal of Eukaryotic Microbiology: e12972.

Excavata - Percolomonads

  • Hohlfeld, Manon, Claudia Meyer, Alexandra Schoenle, Frank Nitsche, and Hartmut Arndt. 2023. "Biogeography, Autecology, and Phylogeny of Percolomonads Based on Newly Described Species". Journal of Eukaryotic Microbiology 70 (1): e12930.

TSAR - Stramenopiles

  • Barcytė, D., E...

PR2 version 4.14.1

25 Nov 08:44

New web interface at

The database itself is unchanged. Please refer to version 4.14.0 for database text files.

PR2 version 4.14.0

25 Jun 10:58

Main changes

A single SSU database

From version 4.14.0, a single SSU database is provided which contains sequences for:

  • 18S rRNA from nuclear and nucleomorph
  • 16S rRNA from plastid, apicoplast, chromatophore, mitochondrion
  • 16S rRNA from a small selection of bacteria

The rationale is that the database can now be used to detect bacterial sequences that are amplified with either 18S rRNA or "universal" primers. These sequences can be further assigned with Silva or GTDB.

To allow correct assignment of organelle sequences with software such as DECIPHER (IdTaxa), a suffix of 4 or 5 letters identifying the organelle is appended to the taxonomy:

Organelle        Taxonomy suffix
nucleomorph      :nucl
plastid          :plas
apicoplast       :apic
chromatophore    :chrom
mitochondrion    :mito
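
As an illustration of how these suffixes might be handled downstream, the Python sketch below separates a taxon name from its organelle suffix. It assumes that the suffix is attached to the end of the taxon name exactly as listed above (for example a hypothetical label such as Chlorophyta:plas); the function name is not part of PR2 or DECIPHER.

```python
# Suffixes and organelles as listed in the release notes.
ORGANELLE_SUFFIXES = {
    ":nucl": "nucleomorph",
    ":plas": "plastid",
    ":apic": "apicoplast",
    ":chrom": "chromatophore",
    ":mito": "mitochondrion",
}


def split_organelle_suffix(taxon):
    """Return (taxon_without_suffix, organelle), or (taxon, None) if no suffix.

    Assumption: the suffix is the final part of the taxon name.
    """
    for suffix, organelle in ORGANELLE_SUFFIXES.items():
        if taxon.endswith(suffix):
            return taxon[: -len(suffix)], organelle
    # No listed suffix: the name is returned unchanged.
    return taxon, None


print(split_organelle_suffix("Chlorophyta:plas"))
print(split_organelle_suffix("Fungi"))
```

Stripping the suffix in this way can be convenient when organelle and nuclear assignments need to be summarised at the same taxonomic level.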

New File provided

  • pr2_version_4.14.0_SSU.decipher.trained.rds: a DECIPHER training set that can be used with the IdTaxa function from the R DECIPHER package. This file should be read with the readRDS function.

Major groups for which taxonomy has been updated

  • Apicomplexa
  • Labyrinthulids
  • Radiolaria
  • Foraminifera

Quarantined sequences (makes sense in these COVID times...)

We are introducing quarantined sequences. These sequences have been reassigned with DECIPHER IdTaxa, but their bootstrap values were low or they were flagged as problematic by DECIPHER during the LearnTaxa phase. They are not provided with the current version but will be added in the future after verification of their taxonomic assignment.

List of sequences added or updated

  • Added: 9,710
  • Updated: 25,298
  • Quarantined: 614
  • Removed: 462


Taxonomic groups updated

  • Alveolata - Javier del Campo

    • Apicomplexa
      • 9955 sequences updated or added.
      • 303 sequences quarantined needing phylogeny assignment.
      • 583 taxonomy entries revised
  • Chlorophyta

    • Ostreobium: 2 sequences added
  • Stramenopiles

    • Labyrinthulids - Javier del Campo
      • Sequences updated or added: 1280
      • Sequences quarantined: 133
      • Taxonomy fully revised: 69 species
    • Cafeteria - Alex Schoenle following Schoenle et al. (2020)
      • sequences updated: 30
      • sequences added: 31
      • script
    • Cafileria marina: 8 sequences added
  • Haptophyta

    • Rappephyceae - Kawachi et al. (2021)
      • Rappemonads moved into Rappephyceae
      • 4 sequences added
  • Radiolaria - Miguel Sandin

  • Foraminifera - Raphaël Morard

    • Total number of validated sequences: 3839
    • Taxonomy updated or added: 315 entries
    • Sequences added: 1149
    • Sequences updated (including new sequences): 2164
    • script to upload to PR2
  • Excavata - Javier del Campo and EukRef team

    • EukRef team: Martin Kolisko, Olga Flegontova, Anna Karnkowska, Gordon Lax, Julia M. Maritz, Tomáš Pánek, Petr Táborský, Jane M. Carlton, Ivan Cepička, Aleš Horák, Julius Lukeš, Alastair G.B. Simpson, and Vera Tai
    • Total number of validated sequences: 6265
    • Taxa updated or added: 735
    • Sequences added from GenBank: 75
    • Sequences updated (existing + new): 1347 + 2875
    • Sequences quarantined: 104
    • Metadata updated with eukref fields: 6091
  • 16S plastid sequences (Ostreobium and Apicomplexa)- Javier del Campo

    • 87 sequences reassigned
    • 482 sequences added
  • Bacteria, Archaea - Daniel Vaulot

    • Sequences added: 7945
    • Taxa added: 1571
    • These sequences originate from Silva seed alignment v. 132 as found on the mothur site
    • They are used as "control" sequences when assigning metabarcodes, especially for primers that are either "universal", i.e. amplify both 18S and 16S or that are "imperfect", in the sense that they also amplify a small fraction of the 16S sequences.

Sequences uploaded but not yet annotated

  • 8763 18S rRNA sequences added from GenBank - 2020-05-27 to 2021-03-23 - Script

Sequences removed

  • Potential chimera in Radiolaria: 343 (M. Sandin)
  • Bad sequences: 6 (F. Mahé)
  • chimeras: 95 (A M Fiore-Donno)
  • ITS: 20 (A M Fiore-Donno)
  • Badly assigned: 6 (A M Fiore-Donno)

Sequences modified (F. Mahé)

  • complemented: 26
  • reverse complemented: 114 + 189
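
For readers unfamiliar with these two operations, here is a minimal Python sketch. It is not the procedure used by the PR2 curators; the function names are illustrative, and IUPAC ambiguity codes other than N are left unchanged for simplicity.

```python
# Translation table for the complement of each base (upper and lower case).
# U is treated like T; N stays N.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def complement(sequence):
    """Return the complement of a nucleotide sequence, base by base."""
    return sequence.translate(_COMPLEMENT)


def reverse_complement(sequence):
    """Return the reverse complement (complement read in the opposite direction)."""
    return complement(sequence)[::-1]


print(complement("ATGC"))
print(reverse_complement("ATGC"))
```

A sequence deposited on the wrong strand is typically fixed with the reverse complement, whereas a sequence that was complemented without being reversed only needs the complement.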

Metadata added

  • A large amount of metadata has been downloaded from GenBank, such as GenBank taxonomy and references associated with sequences.

Database structure

  • pr2_main
    • quarantined_version: sequences flagged as quarantined will need to be re-assigned later.
  • pr2_metadata
    • gb_references: removed (empty)
    • gb_locus: removed (empty)
    • gb_division: added - three-letter code for GenBank division (e.g. PLN, ENV...)

Metadata added

The following fields were populated from GenBank when the data were missing (413,230 records updated)

  • gb_taxonomy
  • gb_project
  • gb_authors, gb_publication, gb_journal
  • gb_sequence
  • gb_division
  • gb_date


Scripts are just provided to show some of the procedures used to update the PR2 database. Do not try to run them, they will not work as they require access to the MySQL PR2 database.



  • Schoenle, A., Hohlfeld, M., Rosse, M., Filz, P., Wylezich, C., Nitsche, F., & Arndt, H. (2020). Global comparison of bicosoecid Cafeteria-like flagellates from the deep ocean and surface waters, with reorganization of the family Cafeteriaceae. European Journal of Protistology, 73, 125665.
  • Jirsová, D., Füssy, Z., Richtová, J., Gruber, A., & Oborník, M. (2019). Morphology, ultrastructure, and mitochondrial genome of the marine non-photosynthetic bicosoecid Cafileria marina gen. et sp. nov. Microorganisms, 7(8), 240.
  • Pan, J., del Campo, J., & Keeling, P. J. (2017). Reference Tree and Environmental Sequence Diversity of Labyrinthulomycetes. Journal of Eukaryotic Microbiology, 64(1), 88–96.


  • Kawachi, M., Nakayama, T., Kayama, M., Nomura, M., Miyashita, H., Bojo, O., Rhodes, L., Sym, S., Pienaar, R. N., Probert, I., Inouye, I., & Kamikawa, R. (2021). Rappemonads are haptophyte phytoplankton. Current Biology.


  • Adl, S. M., Bass, D., Lane, C. E., Lukeš, J., Schoch, C. L., Smirnov, A., et al. 2019. Revisions to the classification, nomenclature, and diversity of eukaryotes. J. Eukaryot. Microbiol. 66, 4–119. doi:10.1111/jeu.12691
  • Biard, T., Bigeard, E., Audic, S., Poulain, J., Stemmann, L., Not, F., 2017. Biogeography and diversity of Collodaria (Radiolaria) in the global ocean. Nat. Publ. Gr. 1–42. doi:10.1038/ismej.2017.12
  • Cavalier-Smith, T., Chao, E.E., Lewis, R., 2018. Multigene phylogeny and cell evolution of chromist infrakingdom Rhizaria: contrasting cell organisation of sister phyla Cercozoa and Retaria. Protoplasma 255, 1517–1574. doi:10.1007/s00709-018-1241-1
  • Capella-Gutiérrez, S., Silla-Martínez, J.M., Gabaldón, T., 2009. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973. doi:10.1093/bioinformatics/btp348
  • Decelle, J., Suzuki, N., Mahé, F., Vargas, C. De, Not, F., 2012b. Molecular Phylogeny and Morphological Evolution of the Acantharea (Radiolaria). Protist 163, 435–450. doi:10.1016/j.protis.2011.10.002
  • Gouy, M., Guindon, S., Gascuel, O., 2010. SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building. Mol. Biol. Evol. 27, 221–224. doi:10.1093/molbev/msp259
  • Katoh, K., Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780...

PR2 version 4.13.0

17 Mar 01:28


List of sequences added or updated

  • Added: 2966
  • Updated: 933
  • Removed: 3817


Taxonomy curated

  • Alveolata

    • Dinophyceae - Suessiales curated by J. del Campo following Janouškovec et al. (2017) and LaJeunesse et al. (2018)
  • Stramenopiles

    • Diatoms - Thalassiosirales - L. Arsenieff following Arsenieff et al. (2020)
    • Pelagophyceae - A. M. Cabello - definition of new environmental clades
      • sequences updated: 30
    • Pelagophyceae, Sarcinochrysidales - From Han et al. 2018.
      • sequences added: 14
    • Chrysophyceae - From Andersen et al. 2017
  • Chlorophyta

    • Pyramimonadales replaced by Pyramimonadophyceae following Daugbjerg et al. (2019)
    • New division Prasinodermophyta and new Class Prasinodermophyceae following Li et al. (2020)

Sequences added to PR2

  • 1,129 18S sequences from the Roscoff Culture Collection (script - Cultures)
  • 1,824 18S sequences from Silva version 138 and Genbank annotated based on hash value of sequences

Sequences uploaded but not yet annotated

  • 7,032 18S rRNA sequences added from GenBank - 2018-11 to 2020-05 - script

  • 333,247 18S rRNA sequences from Silva version 138 (2019-12)


Sequences removed

  • 3817 sequences have been removed from the database
    • potential chimera
    • bad sequences
    • sequences containing at least 2 consecutive Ns (e.g. ...ATTNNGC..)
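
The last criterion is simple to express in code. The following Python sketch flags sequences containing at least 2 consecutive Ns; it is an illustration only (the function name is hypothetical) and does not cover the chimera or "bad sequence" criteria, which are not defined in these notes.

```python
import re

# Two or more consecutive N characters, in upper or lower case.
_CONSECUTIVE_N = re.compile(r"N{2,}", re.IGNORECASE)


def has_consecutive_ns(sequence):
    """Return True if the sequence contains at least 2 consecutive Ns."""
    return _CONSECUTIVE_N.search(sequence) is not None


print(has_consecutive_ns("ATTNNGC"))
print(has_consecutive_ns("ATTNGCN"))
```

Isolated Ns are tolerated by this check; only runs of two or more are flagged.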


  • Daugbjerg N., Fassel NMD., Moestrup Ø. 2019. Microscopy and phylogeny of Pyramimonas tatianae sp. nov. (Pyramimonadales, Chlorophyta), a scaly quadriflagellate from Golden Horn Bay (eastern Russia) and formal description of Pyramimonadophyceae classis nova. European Journal of Phycology 0:1–15. DOI: 10.1080/09670262.2019.1638524
  • Janouškovec, Jan, Gregory S. Gavelis, Fabien Burki, Donna Dinh, Tsvetan R. Bachvaroff, Sebastian G. Gornik, Kelley J. Bright, et al. 2017. Major Transitions in Dinoflagellate Evolution Unveiled by Phylotranscriptomics. Proceedings of the National Academy of Sciences 114 (2): E171–80.
  • LaJeunesse, Todd C., John Everett Parkinson, Paul W. Gabrielson, Hae Jin Jeong, James Davis Reimer, Christian R. Voolstra, and Scott R. Santos. 2018. Systematic Revision of Symbiodiniaceae Highlights the Antiquity and Diversity of Coral Endosymbionts. Current Biology 28 (16): 2570-2580.e6.
  • Arsenieff L., Le Gall F., Rigaut-Jalabert F., Mahé F., Sarno D., Gouhier L., Baudoux A-C., Simon N. 2020. Diversity and dynamics of relevant nanoplanktonic diatoms in the Western English Channel. The ISME Journal. DOI: 10.1038/s41396-020-0659-6.
  • Han KY., Graf L., Reyes CP., Melkonian B., Andersen RA., Yoon HS., Melkonian M. 2018. A Re-investigation of Sarcinochrysis marina (Sarcinochrysidales, Pelagophyceae) from its Type Locality and the Descriptions of Arachnochrysis, Pelagospilus, Sargassococcus and Sungminbooa genera nov. Protist 169:79–106. DOI: 10.1016/j.protis.2017.12.004.
  • Andersen RA., Graf L., Malakhov Y., Yoon HS. 2017. Rediscovery of the Ochromonas type species Ochromonas triangulata (Chrysophyceae) from its type locality (Lake Veysove, Donetsk region, Ukraine). Phycologia 56:591–604. DOI: 10.2216/17-15.1.
  • Li L., Wang S., Wang H., Sahu SK., Marin B., Li H., Xu Y., Liang H., Li Z., Cheng S., Reder T., Çebi Z., Wittek S., Petersen M., Melkonian B., Du H., Yang H., Wang J., Wong GK., Xu X., Liu X., Van de Peer Y., Melkonian M., Liu H. 2020. The genome of Prasinoderma coloniale unveils the existence of a third phylum within green plants. Nature Ecology & Evolution. DOI: 10.1038/s41559-020-1221-7.

Database structure

  • Table pr2_metadata - add fields

    • pr2_depth: depth of sample in meters
    • gb_id: Genbank ID number (big integer)
    • gb_project_id: Genbank project ID for metagenomes
    • gb_sequence - original gb_sequence (longtext)
  • Table pr2_metadata - remove fields and move to list_countries table

    • pr2_continent
    • pr2_country_geocode
    • pr2_country_lon
    • pr2_country_lat
  • New Tables (for internal use only)

    • list_countries - Table with information on each country
      • pr2_country
      • pr2_continent
      • pr2_country_geocode
      • pr2_country_lon
      • pr2_country_lat
    • pr2_assign_bayes - Contains assignment of uncurated sequences using dada2::assignTaxonomy against PR2 4.12.0
    • pr2_assign_silva - Contains assignment of uncurated sequences from Silva version 138


Scripts (see links above) are just provided to show some of the procedures used to update the PR2 database. Do not try to run them, they will not work as they require access to the MySQL PR2 database.

Files provided

  • For this version we do not provide the SQLite format. It will be provided again for release 5.0.0
  • A version of the database compatible with the DECIPHER R package is available here
  • Files also available on Zenodo

PR2 version 4.12.0

17 Aug 03:58

Date : 2019-08-08 (updated 2019-08-17)


Database structure

  • Table pr2_main - add fields
    • gene - 18S_RNA, 16S_RNA
    • organelle - nucleus, plastid, mitochondria, nucleomorph, apicoplast (left empty for cyanobacteria)
  • Table pr2_metadata - add or modify fields
    • gb_organelle - import the corresponding gb field
    • pr2_sequence_origin - add other possibilities such as genome and metagenome
    • pr2_continent, pr2_country, pr2_country_lat, pr2_country_lon - geographical origin extracted from gb_country field
    • pr2_location, pr2_location_lat, pr2_location_lon - geographical origin extracted from gb_country field.
    • pr2_ocean, pr2_sea, pr2_sea_lat, pr2_sea_lon - extracted from gb_country field and gb_isolation_source
  • Table pr2_sequences - add fields
  • Table pr2_taxonomy - add fields
    • taxon_trophic_mode - detailed trophic mode (e.g. "C-fixation constitutive; Mixotroph")

Clean up

  • 1692 sequences that had more than 2 consecutive "NN" have been removed

Files provided

  • We are now providing separate files for 18S nuclear and 16S plastid sequences for UTAX, dada2, fasta and mothur/Qiime formats.
  • The merged file contains both 18S and 16S sequences.
  • The metadata file is not provided any more since metadata can be found in the merged file.
  • The whole pr2 database is also provided as an SQLite file. It contains the different tables making up pr2.

Taxonomy changed

  • Apicomplexa
    • Taxonomy completely revised following del Campo et al. (2019)
    • New sequences: 2619
    • Updated sequences: 5889+239
    • Removed sequences: 89
  • Stramenopiles - Higher ranks changed according to Massana et al. 2014, Derelle et al. 2016, and Adl et al. 2019, compiled by R. Massana.
  • Diatoms - Chaetoceros - 196 new sequences have been added from Gaonkar et al. (2019) with help from B. Edvardsen
  • Chlorophyta
    • Mamiellophyceae - Micromonas clades have been updated according to Tragin and Vaulot 2019.
    • Prasinophytes clade IX - Separation between clades IXA and IXB removed waiting for analysis.
  • Cryptophyceae - Cryptomonadales moved from family to order.
  • Cercozoa - Class Chlorarachniophyceae replaces Filosa-Chlorarachnea

Plastid 16S sequences and cyanos

Data originate from the PhytoRef database (Decelle et al. 2015). Taxonomy has been harmonized with the PR2 taxonomy framework, in particular by going from 12 to 8 taxonomy levels. This integration of plastid sequences should be helpful to researchers who obtain metabarcodes for both 16S and 18S rRNA.

  • 16S plastid sequences added: 6049
  • 16S cyanobacteria sequence added: 42


Sequence geo-localisation: Following up on the very good post by Margaret Brisbin, the GeoNames server and the fuzzywuzzy Python library have been used to provide information about the geographic origin of sequences. Country and/or ocean are now provided, with coordinates, for 90,788 GenBank entries.
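
The fuzzy-matching step can be sketched in Python as below. To keep the example self-contained, it uses difflib from the standard library rather than fuzzywuzzy, and it does not query any geographic server. The list of countries, the cutoff value and the function name are assumptions made for illustration, as is the idea that the country field may look like "Country: locality".

```python
import difflib

# Hypothetical reference list; a real run would use a complete gazetteer.
KNOWN_COUNTRIES = ["France", "Norway", "Japan", "United States", "Brazil"]


def match_country(raw_value, known=KNOWN_COUNTRIES, cutoff=0.8):
    """Return the best-matching known country for a free-text value, or None.

    Assumption: only the part before the first colon names the country.
    """
    candidate = raw_value.split(":")[0].strip().lower()
    lookup = {name.lower(): name for name in known}
    matches = difflib.get_close_matches(candidate, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


print(match_country("Frannce: Roscoff"))
print(match_country("Atlantis"))
```

The cutoff controls the trade-off between recovering misspelled names and producing false matches, so it would need to be tuned on real data.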


R Scripts used

PR2 version 4.11.1

13 Dec 08:24

Version 4.11.1

Mostly small changes and bug fixes
Date : 13 December 2018

Database changes

  • Fields eukref_publication, eukref_authors, eukref_journal merged into gb_publication, gb_authors, gb_journal
  • Field gb_reference added

Bug fixes

  • 1 duplicated sequence removed: GU824068.1.1173_U
  • Sequence KT860933 shortened because its end was bad

PR2 version 4.11.0

30 Oct 13:09

Annotators : Daniel Vaulot, Adriana Lopes dos Santos, Vittorio Boscaro (Eukref)
Date : 30 October 2018

Database changes

  • Database is now available as an R data file
  • Remove sequences shorter than 500 bp - 173 sequences affected
  • The PR2 database can be installed as an R package using the devtools package

Bug fixes

  • Correct some GenBank clone information (gb-clone) that was wrongly formatted as dates (e.g. 07-02 was mis-labelled as jul.-02).
  • Correct PR2 accession numbers, start, end and label - 1170 sequences affected
  • GenBank entries with 2 PR2 sequences and different taxonomy: 4 sequences corrected and 2 removed (AF245249, AY706334, FJ848510, JF276416, KM020045, KM020071)
  • R Script - PR2 update 4.11 - Management
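
Judging from the identifiers quoted in these notes (for example GU824068.1.1173_U), a PR2 accession appears to combine a GenBank accession, a start position, an end position and a label. The Python sketch below parses an identifier under that assumed layout; the pattern, the class and the function name are illustrative and are not taken from the PR2 scripts.

```python
import re
from typing import NamedTuple


class Pr2Accession(NamedTuple):
    genbank_accession: str
    start: int
    end: int
    label: str


# Assumed layout: <GenBank accession>.<start>.<end>_<label>
_PATTERN = re.compile(
    r"^(?P<acc>[A-Za-z0-9_]+)\.(?P<start>\d+)\.(?P<end>\d+)_(?P<label>[A-Za-z0-9]+)$"
)


def parse_pr2_accession(value):
    """Split a PR2 accession into its parts, ignoring surrounding whitespace."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"Unrecognised PR2 accession: {value!r}")
    return Pr2Accession(
        match["acc"], int(match["start"]), int(match["end"]), match["label"]
    )


print(parse_pr2_accession("GU824068.1.1173_U"))
```

A parser of this kind makes it easy to check that start and end are consistent with the sequence length, which is the sort of inconsistency this release corrected.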


Ciliates (from Eukref)

  • Follows the publication of Boscaro et al.
  • Sequences annotated by Eukref as either "low quality" (125) or "chimeras" (283) have been annotated and removed from PR2.
  • Sequences with taxonomy updated: 4550
  • New sequences added: 2478 sequences
  • Sequences removed (will need to be re-examined later) : 652
  • R Script - PR2 update 4.11 - Ciliophora


Version 4.10.0

07 Mar 22:59

Version 4.10.0

  • 1102 PR2 sequences were longer than the corresponding GenBank sequences (see Issue #6). Some of these sequences were recovered from the original PR2 database (from 2012) while others were re-extracted from the GenBank record.
  • 7 PR2 sequences that were identified as chimeras in the original PR2 database have been removed
  • 21537 PR2 sequences have been labelled as reference sequences from the original PR2 database.

R Script - PR2 update 4.10

Version 4.9.2

01 Mar 15:20

Version 4.9.2

Minor changes. Two entries have been fixed:

  • LC054938.1.1770_U - Taxonomy had a hard return in one of the fields
  • KJ995958.1.1684_U - Remove space at start of the PR2 accession id

Version 4.9.1

27 Feb 15:55

Version 4.9.1

This is a minor update.

The taxonomy of the following three sequences has been fixed.

  • FJ402948.1.1186_U
  • FJ402949.1.1210_U
  • LC054937.1.1751_U

The wiki has also been updated to explain how to convert files from UNIX to DOS.