# Summaries of clustering for Plazi.org

This notebook explores the result of clustering and focuses on the relationships among Plazi treatments and other occurrence records on GBIF.org.

This is a preliminary attempt at composing a report. Further investigation and refinement of the visualisations are expected before it becomes useful for Plazi.org.

All ideas, questions, corrections and suggestions are welcome.

## Relationships in Plazi.org taxonomic treatments database
Number of clustered relationships involving material citations.

In [0]:
%sql
SELECT count(*)
FROM chihjen.relationshipsf
WHERE o1:publishingOrgKey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862";

count(1)
158133


## 1. Publishers whose records link to Plazi
Excluding Plazi itself, where different treatments cite the same specimens.

In [0]:
%sql
SELECT o2.publisher AS `Publisher`, count(*) AS `Records`
  FROM
     chihjen.relationshipsf r
       JOIN gbif.occurrence o1 on r.id1 = o1.gbifid
       JOIN gbif.occurrence o2 on r.id2 = o2.gbifid
WHERE
  o1.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND o2.publishingorgkey != "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND r.id1 < r.id2
GROUP BY
  o2.publisher
ORDER BY
  Records DESC

Publisher,Records
Landcare Research,5897
California Academy of Sciences,5031
MNHN - Museum national d'Histoire naturelle,5020
Sam Noble Oklahoma Museum of Natural History,3224
"National Museum of Natural History, Smithsonian Institution",3196
Museu Nacional / UFRJ,3006
Purdue Entomological Research Collection,2664
University of Kansas Biodiversity Institute,2434
Swedish Museum of Natural History,2276
Senckenberg,2024


## 2. Plazi material citations within the treatment database
The same specimen that is cited in multiple treatments.

### 2.1 Count by type status (matching)
Pairs where both material citations have the same type status.

In [0]:
%sql
SELECT o1.typestatus AS `Type Status`, count(*) AS count
  FROM
     chihjen.relationshipsf r
       JOIN gbif.occurrence o1 on r.id1 = o1.gbifid
       JOIN gbif.occurrence o2 on r.id2 = o2.gbifid
WHERE
  o1.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND o2.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND r.id1 < r.id2
  AND o1.typestatus = o2.typestatus
GROUP BY
  o1.typestatus
ORDER BY
  count DESC

Type Status,count
PARATYPE,4852
HOLOTYPE,2313
LECTOTYPE,266
SYNTYPE,118
PARALECTOTYPE,42
TYPE,15
NEOTYPE,7
ALLOTYPE,2


### 2.2 Treatments citing different types
Pairs where the two material citations have different type statuses. Having a reference field available might be helpful for retrieving the treatments.

In [0]:
%sql
SELECT o1.gbifid, o1.typestatus, o1.occurrenceid, o2.gbifid, o2.typestatus, o2.occurrenceid
  FROM
     chihjen.relationshipsf r
       JOIN gbif.occurrence o1 on r.id1 = o1.gbifid
       JOIN gbif.occurrence o2 on r.id2 = o2.gbifid
WHERE
  o1.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND o2.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND r.id1 < r.id2
  AND o1.typestatus <> o2.typestatus


gbifid,typestatus,occurrenceid,gbifid.1,typestatus.1,occurrenceid.1
1288044528,HOLOTYPE,114587C9FF92FFEDFF7CA012FC43FC65.mc.29843C82FF91FFEDFF2CA733FF15FE24,3321829375,PARATYPE,03CF0539AA5CFF83FEF4FEE8FD31F939.mc.3B0EBE72AA5CFF83FD8EFA6BFEB6FA71
2236572783,ALLOTYPE,03D062208B48E01CFE9FFCAEFEAFFBD9.mc.3B11D96B8B4AE01BFB2AFACCFBB9FA49,2556156884,HOLOTYPE,03FC87AE991B444A07D2FAB1799DFBB6.mc.3B3D3CE5991A444A054AFD517903FD72
2351014802,PARATYPE,B66387BEF37FB36CFF3816B0FA2FF9B7.mc.8EA23CF5F37FB36CFDC810EEFC1EFB57,3111470303,HOLOTYPE,0380FE17FF8CFFE7FC3BB5273553FA8E.mc.3B41455CFF8CFFE4FBEFB5BF3511FAD0
1308295863,PARATYPE,3A1387D2FFB9FFB8FF41F939FAE8FA7B.mc.02D23C99FFBAFFBEFA83FE6FFC3EFE6B,1632857156,HOLOTYPE,038C8799FFD3FFAB67F107DCFC25FB6D.mc.3B4D3CD2FFD3FFAB62E60173FC25FB6D
1265799803,HOLOTYPE,03D226106C09FFBAFF74FB181F2DFAEE.mc.3B139D5B6C09FFBAFD8EFB181F29FAEE,1499613734,PARATYPE,03C987BCF45CFFA6F7B74CB461D3FE86.mc.3B083CF7F45BFFA6F3A24CBB6378FF4E
2236572784,HOLOTYPE,03D062208B48E01CFE9FFCAEFEAFFBD9.mc.3B11D96B8B4AE01BFDC7FA9CFB48FA61,2556156917,ALLOTYPE,03FC87AE991B444A07D2FAB1799DFBB6.mc.3B3D3CE5991A444A055CFD197F60FD72
1324487238,LECTOTYPE,03E487C7FFF6D221FF462536F555FEBE.mc.3B253C8CFFF6D220FF46241CF361FCB5,1457688677,PARALECTOTYPE,90078570FF862A07FF075CDD88C8FCDD.mc.A8C63E3BFF862A07FCC25DC78C7AFE0C
1836914169,HOLOTYPE,03EB87F7FFA8FF8170CCFC74AEEDF9D2.mc.3B2A3CBCFFA8FF8770CCFBC1ABF5FB32,2995583586,PARATYPE,03E7A162FFFF6851FF14E3D5FE26FACA.mc.3B261A29FFFF6851FD8DE444FC01FBEA
1700783840,PARATYPE,E20187B4FFDDFFA6338FB4E7FAC7F966.mc.DAC03CFFFFDDFFA732A2B53BFA61FE03,3028258394,HOLOTYPE,038287D4BB525D1497E00FCCFF213BD9.mc.3B433C9FBB555D1496970BFBFF253BD9
1700783850,PARATYPE,E20187B4FFDCFFA4338FB346FBC8F9AB.mc.DAC03CFFFFDFFFA533EBB4E8FA60FF51,3028258581,HOLOTYPE,038287D4BB545D1597E00BFBFEDA3E07.mc.3B433C9FBB545D1596330FA9FEDE3E07


## 3. Any clustered non-specimen records that could be examined for data quality issues?
LIVING_SPECIMEN, OBSERVATION and MACHINE_OBSERVATION are most likely wrong, while UNKNOWN records need to be verified first.

In [0]:
%sql
SELECT o2.basisofrecord, count(*) AS Count
  FROM
     chihjen.relationshipsf r
       JOIN gbif.occurrence o1 on r.id1 = o1.gbifid
       JOIN gbif.occurrence o2 on r.id2 = o2.gbifid
WHERE
  o1.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND o2.publishingorgkey <> "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND r.id1 < r.id2
  AND o2.basisofrecord <> "PRESERVED_SPECIMEN"
GROUP BY
  o2.basisofrecord
ORDER BY
  Count DESC

basisofrecord,Count
MATERIAL_SAMPLE,1121
UNKNOWN,261
FOSSIL_SPECIMEN,44
MACHINE_OBSERVATION,7
OBSERVATION,4
LIVING_SPECIMEN,2
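
The triage described at the start of this section can be sketched as a simple lookup over the counts above. This is only an illustration: the label names and the `triage` helper are hypothetical, and treating MATERIAL_SAMPLE and FOSSIL_SPECIMEN as "not flagged" is an assumption based on how the following subsections handle them.

```python
# Labels follow the reading in the text; the label names are hypothetical.
TRIAGE = {
    "LIVING_SPECIMEN": "likely wrong",
    "OBSERVATION": "likely wrong",
    "MACHINE_OBSERVATION": "likely wrong",
    "UNKNOWN": "verify first",
}

# Counts copied from the query result above.
counts = {
    "MATERIAL_SAMPLE": 1121,
    "UNKNOWN": 261,
    "FOSSIL_SPECIMEN": 44,
    "MACHINE_OBSERVATION": 7,
    "OBSERVATION": 4,
    "LIVING_SPECIMEN": 2,
}


def triage(counts_by_basis):
    """Sum record counts per triage label; unlisted values are not flagged."""
    totals = {}
    for basis, n in counts_by_basis.items():
        label = TRIAGE.get(basis, "not flagged")
        totals[label] = totals.get(label, 0) + n
    return totals


summary = triage(counts)
print(summary)
```

Under these assumptions only a small number of relationships fall into the "likely wrong" group, so they could be reviewed by hand.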


### 3.1 Who provides the material samples that are associated with treatments?

In [0]:
%sql
SELECT o2.publisher, count(*) AS Count
  FROM
     chihjen.relationshipsf r
       JOIN gbif.occurrence o1 on r.id1 = o1.gbifid
       JOIN gbif.occurrence o2 on r.id2 = o2.gbifid
WHERE
  o1.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND o2.publishingorgkey <> "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND r.id1 < r.id2
  AND o2.basisofrecord = "MATERIAL_SAMPLE"
GROUP BY
  o2.publisher
ORDER BY
  Count DESC

publisher,Count
The International Barcode of Life Consortium,798
"National Museum of Natural History, Smithsonian Institution",320
Museums Victoria,2
European Nucleotide Archive (EMBL-EBI),1


### 3.2 Occurrences that are neither preserved specimens nor material samples.
Other than fossil specimens, these occurrences are worth checking for data quality.

In [0]:
%sql
SELECT o2.gbifid, o2.basisofrecord, o2.datasetname, o2.publisher
  FROM
     chihjen.relationshipsf r
       JOIN gbif.occurrence o1 on r.id1 = o1.gbifid
       JOIN gbif.occurrence o2 on r.id2 = o2.gbifid
WHERE
  o1.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND o2.publishingorgkey <> "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND r.id1 < r.id2
  AND o2.basisofrecord <> "MATERIAL_SAMPLE"
  AND o2.basisofrecord <> "PRESERVED_SPECIMEN"
ORDER BY
  o2.basisofrecord, o2.datasetname


gbifid,basisofrecord,datasetname,publisher
1826483689,FOSSIL_SPECIMEN,Canadian Museum of Nature Fossil Vertebrate Collection,Canadian Museum of Nature
1826467283,FOSSIL_SPECIMEN,Canadian Museum of Nature Fossil Vertebrate Collection,Canadian Museum of Nature
472567387,FOSSIL_SPECIMEN,Collection Bernstein - SMF,Senckenberg
1825775191,FOSSIL_SPECIMEN,Natural History Museum (London) Collection Specimens,Natural History Museum
1825775394,FOSSIL_SPECIMEN,Natural History Museum (London) Collection Specimens,Natural History Museum
2005624588,FOSSIL_SPECIMEN,Paleontological Research Institution Collections,Paleontological Research Institution
3081037399,FOSSIL_SPECIMEN,Paleontological Research Institution Collections,Paleontological Research Institution
2005624556,FOSSIL_SPECIMEN,Paleontological Research Institution Collections,Paleontological Research Institution
2005629229,FOSSIL_SPECIMEN,Paleontological Research Institution Collections,Paleontological Research Institution
2005624597,FOSSIL_SPECIMEN,Paleontological Research Institution Collections,Paleontological Research Institution


## 4. Clustered specimens that are not (yet) geo-referenced.
Grouped by dataset name and publisher, ordered by count.

In [0]:
%sql
SELECT concat(o2.datasetname, " (", o2.publisher, ")") AS `Dataset - Publisher`, count(*) AS Count
  FROM
     chihjen.relationshipsf r
       JOIN gbif.occurrence o1 on r.id1 = o1.gbifid
       JOIN gbif.occurrence o2 on r.id2 = o2.gbifid
WHERE
  o1.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND o2.publishingorgkey <> "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND r.id1 < r.id2
  AND o2.basisofrecord = "PRESERVED_SPECIMEN"
  AND (o2.decimallatitude is null OR o2.decimallongitude is null)
GROUP BY
  `Dataset - Publisher`
ORDER BY
  Count DESC

Dataset - Publisher,Count
Coleção Entomológica do Museu Nacional / UFRJ (Museu Nacional / UFRJ),2980
The Purdue Entomological Research Collection (Purdue Entomological Research Collection),2649
New Zealand Arthropod Collection (NZAC) (Landcare Research),1549
Collection Coleoptera SMF (Senckenberg),1260
Natural History Museum (London) Collection Specimens (Natural History Museum),1085
Museu de Ciències Naturals de Barcelona: MCNB-Art (Museu de Ciències Naturals de Barcelona),933
Recent Invertebrates Specimens (Sam Noble Oklahoma Museum of Natural History),825
"NMNH Extant Specimen Records (National Museum of Natural History, Smithsonian Institution)",593
Canadian Museum of Nature Insect Collection (Canadian Museum of Nature),556
NHMD Entomology Collection (Natural History Museum of Denmark),530


## 5. Entries in the treatment bank that do not have materials clustered.
Perhaps useful for publishers when prioritising digitisation.

- From all Plazi occurrences, filter out entries that have been clustered with occurrences from other publishers.
- Group by type status.
- Group by year.
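
The steps above amount to an anti-join: keep the Plazi rows for which the `LEFT OUTER JOIN` finds no partner, then aggregate. A minimal sketch on toy data follows, using Python's built-in `sqlite3` as a stand-in for the Spark SQL environment. The table and column names mirror the cells below, but the rows are invented for illustration and the SQL dialect differs slightly from Databricks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plazi_occ (gbifid INTEGER, typestatus TEXT, year INTEGER);
CREATE TABLE plazi_ext_relations (id1 INTEGER, id2 INTEGER);
-- toy rows, not real GBIF data
INSERT INTO plazi_occ VALUES
  (1, 'HOLOTYPE', 1992), (2, 'PARATYPE', 1992),
  (3, 'PARATYPE', NULL), (4, NULL, 2019);
-- only record 1 is clustered with records from other publishers
INSERT INTO plazi_ext_relations VALUES (1, 10), (1, 11);
""")

# Anti-join: rows of plazi_occ with no match in plazi_ext_relations.
ANTI_JOIN = """
FROM plazi_occ o
LEFT OUTER JOIN plazi_ext_relations pr ON o.gbifid = pr.id1
WHERE pr.id1 IS NULL
"""

unclustered_by_type = conn.execute(
    "SELECT o.typestatus, count(*) AS n" + ANTI_JOIN +
    "GROUP BY o.typestatus ORDER BY n DESC, o.typestatus"
).fetchall()

unclustered_by_year = conn.execute(
    "SELECT o.year, count(*) AS n" + ANTI_JOIN +
    "AND o.year IS NOT NULL GROUP BY o.year ORDER BY o.year"
).fetchall()

print(unclustered_by_type)
print(unclustered_by_year)
```

Record 1 matches two external records but is excluded only once, because the filter keeps rows with no match at all rather than counting matches.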

In [0]:
%sql
CREATE TABLE chihjen.plazi_occ STORED AS parquet AS SELECT * FROM gbif.occurrence WHERE publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862";

In [0]:
%sql
CREATE TABLE chihjen.plazi_ext_relations STORED AS parquet AS
SELECT r.id1, r.id2, r.reasons FROM chihjen.relationshipsf r
  WHERE
    r.o2:publishingOrgKey != "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
    AND r.id1 < r.id2

### 5.1 Grouped by type status.
Among the unclustered Plazi material citations, 285401 have no type status specified.

In [0]:
%sql
SELECT o.typestatus, count(*) AS Count
  FROM
    chihjen.plazi_occ o
    LEFT OUTER JOIN chihjen.plazi_ext_relations pr ON o.gbifId = pr.id1
WHERE
  pr.id1 IS NULL
GROUP BY
  o.typestatus
ORDER BY
  Count DESC

typestatus,Count
,285401
PARATYPE,70649
HOLOTYPE,68671
LECTOTYPE,9468
SYNTYPE,4360
PARALECTOTYPE,2717
ALLOTYPE,1727
TYPE,751
NEOTYPE,741
ISOTYPE,60


### 5.2 Grouped by year

In [0]:
%sql
SELECT o.year, count(*) AS Count
  FROM
    chihjen.plazi_occ o
    LEFT OUTER JOIN chihjen.plazi_ext_relations pr ON o.gbifId = pr.id1
WHERE
  pr.id1 IS NULL
  AND o.year is not null
GROUP BY
  o.year
ORDER BY
  year

year,Count
1600,2
1603,2
1606,6
1609,2
1616,1
1620,1
1633,1
1637,1
1638,1
1645,1


Among the unclustered material citations, 158123 have no year specified.

In [0]:
%sql
SELECT o.year, count(*) AS Count
  FROM
    chihjen.plazi_occ o
    LEFT OUTER JOIN chihjen.plazi_ext_relations pr ON o.gbifId = pr.id1
WHERE
  pr.id1 IS NULL
  AND o.year is null
GROUP BY
  o.year
ORDER BY
  year

year,Count
,158123


### 5.3 How to find the specimen holder?
- Could the mark-up procedure identify the depository of the voucher?

In [0]:
%sql
SELECT o.gbifid, o.typestatus, o.institutioncode, o.collectioncode
  FROM
    chihjen.plazi_occ o
    LEFT OUTER JOIN chihjen.plazi_ext_relations pr ON o.gbifId = pr.id1
WHERE
  pr.id1 IS NULL
  AND o.institutioncode is not NULL
ORDER BY
  o.institutioncode, o.collectioncode

## 6. Material citations in the treatment bank that are clustered with more than one occurrence.

In [0]:
%sql
CREATE TABLE chihjen.plazi_multi_r STORED AS parquet AS 
SELECT o1.gbifid, count(*) AS `relations`
  FROM
     chihjen.relationshipsf r
       JOIN gbif.occurrence o1 on r.id1 = o1.gbifid
       JOIN gbif.occurrence o2 on r.id2 = o2.gbifid
WHERE
  o1.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND o2.publishingorgkey != "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND r.id1 < r.id2
GROUP BY
  o1.gbifid
HAVING relations > 1
ORDER BY
  o1.gbifid DESC

### 6.1 Top 20 material citations in terms of clustered relationships

In [0]:
%sql
SELECT r.gbifid, o.kingdom, o.phylum, o.classkey, o.order, o.family, r.relations FROM chihjen.plazi_multi_r r
JOIN gbif.occurrence o ON o.gbifid = r.gbifid
ORDER BY relations DESC LIMIT 20;

gbifid,kingdom,phylum,classkey,order,family,relations
1671745096,Animalia,Arthropoda,216,Coleoptera,Elmidae,682
1671745127,Animalia,Arthropoda,216,Coleoptera,Elmidae,621
1671745118,Animalia,Arthropoda,216,Coleoptera,Elmidae,621
1671745052,Animalia,Arthropoda,216,Coleoptera,Elmidae,450
1671745062,Animalia,Arthropoda,216,Coleoptera,Elmidae,450
1671745094,Animalia,Arthropoda,216,Coleoptera,Elmidae,357
1632857208,Animalia,Arthropoda,216,Coleoptera,Scarabaeidae,254
1671744614,Animalia,Arthropoda,216,Coleoptera,Staphylinidae,176
1671744638,Animalia,Arthropoda,216,Coleoptera,Staphylinidae,176
1671744644,Animalia,Arthropoda,216,Coleoptera,Staphylinidae,176


#### 6.1.1 An example of a clustered group
Such a large number of paratypes is a typical phenomenon in entomological collections.

In [0]:
%sql
SELECT o1.gbifid, o1.typestatus, o1.acceptedscientificname, o2.gbifid, o2.typestatus, o2.acceptedscientificname, o2.publisher
  FROM
     chihjen.relationshipsf r
       JOIN gbif.occurrence o1 on r.id1 = o1.gbifid
       JOIN gbif.occurrence o2 on r.id2 = o2.gbifid
WHERE
  o1.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND o2.publishingorgkey != "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND r.id1 < r.id2
  AND o1.gbifid IN (1671745096)

gbifid,typestatus,acceptedscientificname,gbifid.1,typestatus.1,acceptedscientificname.1,publisher
1671745096,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",890702568,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",Sam Noble Oklahoma Museum of Natural History
1671745096,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",890702596,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",Sam Noble Oklahoma Museum of Natural History
1671745096,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",890703511,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",Sam Noble Oklahoma Museum of Natural History
1671745096,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",890701876,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",Sam Noble Oklahoma Museum of Natural History
1671745096,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",890702300,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",Sam Noble Oklahoma Museum of Natural History
1671745096,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",890702044,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",Sam Noble Oklahoma Museum of Natural History
1671745096,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",890701862,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",Sam Noble Oklahoma Museum of Natural History
1671745096,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",890702478,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",Sam Noble Oklahoma Museum of Natural History
1671745096,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",890703306,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",Sam Noble Oklahoma Museum of Natural History
1671745096,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",890703286,PARATYPE,"Hexanchorus crinitus Spangler & Santiago-Fragoso, 1992",Sam Noble Oklahoma Museum of Natural History


### 6.2 Top 20 clustered groups of non-paratype material citations

In [0]:
%sql
SELECT r.gbifid, o.kingdom, o.phylum, o.classkey, o.order, o.family, r.relations
FROM chihjen.plazi_multi_r r
JOIN gbif.occurrence o ON o.gbifid = r.gbifid
WHERE o.typestatus != "PARATYPE"
ORDER BY relations DESC LIMIT 20;

gbifid,kingdom,phylum,classkey,order,family,relations
1413110409,Animalia,Arthropoda,216,Hymenoptera,Apidae,54
1413110419,Animalia,Arthropoda,216,Hymenoptera,Apidae,54
1413110418,Animalia,Arthropoda,216,Hymenoptera,Apidae,45
1413110415,Animalia,Arthropoda,216,Hymenoptera,Apidae,45
3027954354,Animalia,Arthropoda,216,Hymenoptera,Formicidae,31
1058480482,Animalia,Arthropoda,216,Diptera,Mydidae,26
3027954343,Animalia,Arthropoda,216,Hymenoptera,Formicidae,22
1058481284,Animalia,Arthropoda,216,Hymenoptera,Formicidae,21
2237851564,Animalia,Arthropoda,216,Hymenoptera,Formicidae,20
1848831620,Animalia,Chordata,131,Anura,Leptodactylidae,19


#### 6.2.1 An example

In [0]:
%sql
SELECT o1.gbifid, o1.typestatus, o1.acceptedscientificname, o2.gbifid, o2.typestatus, o2.acceptedscientificname, o2.publisher
  FROM
     chihjen.relationshipsf r
       JOIN gbif.occurrence o1 on r.id1 = o1.gbifid
       JOIN gbif.occurrence o2 on r.id2 = o2.gbifid
WHERE
  o1.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND o2.publishingorgkey != "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND r.id1 < r.id2
  AND o1.gbifid IN (2416598905)

gbifid,typestatus,acceptedscientificname,gbifid.1,typestatus.1,acceptedscientificname.1,publisher
2416598905,HOLOTYPE,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",3345627531,,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",Moscow Pedagogical State University (MPGU)
2416598905,HOLOTYPE,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",3345622478,,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",Moscow Pedagogical State University (MPGU)
2416598905,HOLOTYPE,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",3345624438,,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",Moscow Pedagogical State University (MPGU)
2416598905,HOLOTYPE,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",3345623025,,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",Moscow Pedagogical State University (MPGU)
2416598905,HOLOTYPE,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",3345621955,,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",Moscow Pedagogical State University (MPGU)
2416598905,HOLOTYPE,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",3345622220,,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",Moscow Pedagogical State University (MPGU)
2416598905,HOLOTYPE,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",3345627020,,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",Moscow Pedagogical State University (MPGU)
2416598905,HOLOTYPE,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",3345628832,,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",Moscow Pedagogical State University (MPGU)
2416598905,HOLOTYPE,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",3345626743,,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",Moscow Pedagogical State University (MPGU)
2416598905,HOLOTYPE,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",3345623847,,"Oligaphorura kedroviensis Sun, Shveenkova, Xie & Babenko, 2019",Moscow Pedagogical State University (MPGU)


## 7. Relations that might suggest different revision status between the treatment and the specimen (excluding material samples).
Or is it because of the interpretation of the GBIF backbone taxonomy?

In [0]:
%sql
SELECT o1.species, o1.acceptedscientificname, o2.species, o2.acceptedscientificname, o2.publisher 
  FROM
     chihjen.relationshipsf r
       JOIN gbif.occurrence o1 on r.id1 = o1.gbifid
       JOIN gbif.occurrence o2 on r.id2 = o2.gbifid
WHERE
  o1.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND o2.publishingorgkey != "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND r.id1 < r.id2
  AND o2.basisofrecord != "MATERIAL_SAMPLE"
  AND o1.acceptedscientificname != o2.acceptedscientificname
GROUP BY
  o1.species, o1.acceptedscientificname, o2.species, o2.acceptedscientificname, o2.publisher



species,acceptedscientificname,species.1,acceptedscientificname.1,publisher
Acinonyx jubatus,"Acinonyx jubatus (Schreber, 1775)",Acinonyx jubatus,"Acinonyx jubatus raineyi Heller, 1913",Field Museum
Prosopocoilus antilopus,"Prosopocoilus antilopus amicorum Matsumoto, 2019",Prosopocoilus antilopus,"Prosopocoilus antilopus insulanus Kriesche, 1919",Natural History Museum
Mischogyne elliotianum,Mischogyne elliotiana var. elliotiana,Mischogyne elliotianum,Mischogyne elliotianum (Engl. & Diels) R.E.Fr.,Naturalis Biodiversity Center
Chaeropus ecaudatus,"Chaeropus ecaudatus occidentalis Gould, 1845",Chaeropus ecaudatus,"Chaeropus ecaudatus (Ogilby, 1838)",Berkeley Natural History Museums
Protoptila orotina,Protoptila orotina orotina,Protoptila orotina,"Protoptila orotina Flint, 1974",University of Minnesota Insect Collection
Mischogyne elliotianum,Mischogyne elliotiana var. elliotiana,Mischogyne elliotianum,Mischogyne elliotianum var. glabra (Keay) Evrard,Naturalis Biodiversity Center
Gossia aphthosa,Gossia aphthosa (Vieill. ex Brongn. & Gris) N.Snow,Gossia aphthosa,Gossia aphthosa subsp. austro-orientalis N.Snow & Gandhi,MNHN - Museum national d'Histoire naturelle
Giraffa camelopardalis,"Giraffa camelopardalis senegalensis Petzold, Magnant & Hassanin, 2020",Giraffa camelopardalis,"Giraffa camelopardalis (Linnaeus, 1758)",European Nucleotide Archive (EMBL-EBI)
Rhinanthus minor,Rhinanthus minor subsp. minor,Rhinanthus minor,Rhinanthus minor L.,"Botanical Garden & Museum, Natural History Museum of Denmark"
Lasionycta subfuscula,Lasionycta subfuscula subfuscula,Lasionycta subfuscula,"Lasionycta subfuscula Grote, 1873",European Nucleotide Archive (EMBL-EBI)
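
One way to triage rows like these is to check whether the two records still agree at species level while their accepted names differ. That pattern would point to a difference in infraspecific rank or backbone interpretation rather than a different species. The sketch below is only illustrative: the `classify` helper and its labels are hypothetical and not part of the notebook.

```python
def classify(species1, accepted1, species2, accepted2):
    """Label a clustered pair by how far its taxonomic names agree."""
    if accepted1 == accepted2:
        return "same accepted name"
    if species1 and species1 == species2:
        # Same binomial, so the difference sits below species level
        # (subspecies or variety) or in authorship handling.
        return "same species, different accepted name"
    return "different species"


# First row of the result above.
label = classify(
    "Acinonyx jubatus",
    "Acinonyx jubatus (Schreber, 1775)",
    "Acinonyx jubatus",
    "Acinonyx jubatus raineyi Heller, 1913",
)
print(label)
```

All ten rows shown above share the same `species` value on both sides, so under this rule they would fall into the middle category.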


## 8. Miscellaneous counts.

### 8.1 Total Plazi occurrences in GBIF (458907 records).

In [0]:
%sql
SELECT count(*) FROM gbif.occurrence
WHERE publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862";

-- for verification
-- SELECT count(*) FROM chihjen.plazi_occ;

count(1)
458907


### 8.2 Total clustered relationships involving Plazi treatments, excluding reverse entries (57235 relations).

In [0]:
%sql
SELECT count(*) AS `relations`
  FROM
     chihjen.relationshipsf r
       JOIN gbif.occurrence o1 on r.id1 = o1.gbifid
       JOIN gbif.occurrence o2 on r.id2 = o2.gbifid
WHERE
  o1.publishingorgkey = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND o2.publishingorgkey != "7ce8aef0-9e92-11dc-8738-b8a03c50a862"
  AND r.id1 < r.id2

relations
57235
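
The effect of the `r.id1 < r.id2` condition used throughout this notebook can be illustrated on toy data. The sketch assumes that the relationships table stores every pair in both directions, which is what "reverse entries" suggests but which the notebook does not show. The table names, publishers and ids here are invented, and `sqlite3` stands in for Spark SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE occ (gbifid INTEGER, publisher TEXT);
CREATE TABLE rel (id1 INTEGER, id2 INTEGER);
INSERT INTO occ VALUES (1, 'Plazi'), (5, 'Plazi'), (2, 'Museum'), (3, 'Museum');
-- assumption: each pair is stored in both directions
INSERT INTO rel VALUES (1, 2), (2, 1), (5, 3), (3, 5), (1, 5), (5, 1);
""")


def count(where):
    """Count joined relationship rows matching a WHERE clause."""
    sql = f"""SELECT count(*) FROM rel r
              JOIN occ o1 ON r.id1 = o1.gbifid
              JOIN occ o2 ON r.id2 = o2.gbifid
              WHERE {where}"""
    return conn.execute(sql).fetchone()[0]


plazi_pair = "o1.publisher = 'Plazi' AND o2.publisher = 'Plazi'"
plazi_other = "o1.publisher = 'Plazi' AND o2.publisher <> 'Plazi'"

within_all = count(plazi_pair)
within_dedup = count(plazi_pair + " AND r.id1 < r.id2")
cross_all = count(plazi_other)
cross_filtered = count(plazi_other + " AND r.id1 < r.id2")

print(within_all, within_dedup, cross_all, cross_filtered)
```

For Plazi-to-Plazi pairs the condition removes the mirrored row, as intended. For Plazi-to-other pairs the publisher filters already keep one direction only, so in this toy data the extra condition also drops the pair whose Plazi id is the larger one. Whether that affects the counts in this notebook depends on how `chihjen.relationshipsf` was built.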


## Potential follow-up activities
1. Cross-check with statistics extracted from http://tb.plazi.org/GgServer/srsStats and http://tb.plazi.org/GgServer/dioStats.