
Species complex from iNaturalist is mapped onto the species in GBIF #2935

Open
gbif-portal opened this issue Aug 7, 2020 · 20 comments
Labels
checklistbank (Fix would be in Checklistbank), col (Catalog of Life), Under review

Comments

@gbif-portal
Collaborator

Species complex from iNaturalist is mapped onto the species in GBIF

This observation, for example, https://www.gbif.org/occurrence/2005406919, has rank = species complex on iNaturalist. However, GBIF treats it as the species C. melanostomus itself. It seems GBIF should add a rank for species complexes rather than mapping them to one of the species within the complex.


User: See in registry
System: Chrome 84.0.4147 / Windows 10.0.0
Referer: https://www.gbif.org/occurrence/2005406919
Window size: width 1536 - height 722
System health at time of feedback: OPERATIONAL
datasetKey: 50c9509d-22c7-4a22-a47d-8c48425ef4a7
publishingOrgKey: 28eb1a3f-1c15-4a95-931a-4af90ecb574d

@thomasstjerne

It would be good to come up with a general recommendation for publishers of occurrence data on how to handle species aggregates / species complexes.
This is the same problem as "Collective taxa" in Dyntaxa and "Super species" in the Danish fungal atlas.

These species complexes will differ across data providers, and it would be difficult to include all such concepts in a global taxonomy. However, we could do a higher taxon match if the given taxon rank is recognised as a species complex (e.g. "complex", "species aggregate" or similar), and then display the name as "Clusiodes melanostomus complex", "Clusiodes melanostomus group" or "Clusiodes melanostomus s. lato".

@mdoering thoughts?
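A minimal sketch of the display half of this proposal, written in Java to match the gbif-api snippet later in the thread. The class name `ComplexDisplay`, the accepted rank strings, and the rank-to-suffix table are hypothetical choices for illustration; the higher taxon match itself would still be a lookup against the backbone and is not shown here.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: recognises verbatim rank strings that denote a species
// complex / aggregate and builds a display label for the broader concept.
// The rank strings and suffixes below are illustrative assumptions only.
final class ComplexDisplay {

    // Lower-cased verbatim rank -> suffix appended to the name for display.
    private static final Map<String, String> SUFFIX_BY_RANK = Map.of(
        "complex", "complex",
        "species complex", "complex",
        "species group", "group",
        "species aggregate", "s. lato",
        "aggregate", "s. lato",
        "superspecies", "s. lato");

    private ComplexDisplay() {}

    /** True if the verbatim rank looks like a complex / aggregate rank. */
    static boolean isComplexRank(String verbatimRank) {
        return verbatimRank != null
            && SUFFIX_BY_RANK.containsKey(verbatimRank.trim().toLowerCase(Locale.ROOT));
    }

    /** Display label such as "Clusiodes melanostomus complex"; other ranks keep the name as-is. */
    static String displayName(String scientificName, String verbatimRank) {
        if (!isComplexRank(verbatimRank)) {
            return scientificName;
        }
        String suffix = SUFFIX_BY_RANK.get(verbatimRank.trim().toLowerCase(Locale.ROOT));
        return scientificName + " " + suffix;
    }
}
```

`ComplexDisplay.displayName("Clusiodes melanostomus", "complex")` yields `"Clusiodes melanostomus complex"`, while a plain `"species"` rank leaves the name unchanged. Keeping the suffix in a lookup table makes it easy to choose per rank between "complex", "group" and "s. lato".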

@thomasstjerne added the checklistbank (Fix would be in Checklistbank) and col (Catalog of Life) labels and removed the bug label on Aug 10, 2020
@mdoering
Member

Important but difficult problem. We do have a rank "species aggregate" already in GBIF:
https://www.gbif.org/species/search?rank=SPECIES_AGGREGATE&advanced=1

For example, Taraxacum laevigatum aggr.

For mapping occurrences, there are two options: either include all these species complexes in the backbone, or map a single occurrence to multiple taxa in the backbone. Alternatively, maybe we could treat these aggregates in the backbone in a more volatile way and create/delete them as we do the occurrence matching?

@larspett

Has this been solved? In the Swedish Butterfly Monitoring Scheme (SeBMS) we have a few species aggregates / species complexes that volunteers are allowed to use and that are registered in Dyntaxa, but I believe they are lost when the dataset is synced with GBIF. In some cases, using the next taxonomic level makes sense, but for species pairs like Leptidea juvernica/sinapis or Plebejus argus/idas, we know for sure that the observations represent one of the two species. Numerically, these pairs can be dominant, so it is a pity this detailed information cannot be passed on.

@mdoering
Member

mdoering commented Jan 20, 2021

No progress has been made, I am afraid. @thomasstjerne @timrobertson100 we should discuss the best way forward.

Adding species complexes to the backbone based on checklist sources seems problematic, as we would very likely miss most complexes coming in from occurrences.

How can we best learn about a species complex, and can we somehow understand its relation to proper species? The iNat example above does not tell us anything about the complex. It is just a species name with a different rank.

@thomasstjerne

For species complexes / aggregates, I think the only thing we could handle is how we display these records. We could flag records that are probably s. lato / complexes / aggregates and display the name with "s. lato" (= in a broad sense) appended. Example: https://www.gbif.org/occurrence/2238565917
We can identify these records either from the rank (complex, species aggr., superspecies) or from the name, if it has "s. lato" or "sensu lato" appended.
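Detecting such records could look roughly like the following sketch. It assumes the verbatim name and verbatim rank are available as plain strings; `SensuLatoDetector`, its rank set and its suffix list are hypothetical and would need tuning against real data.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: decide whether a record is "probably s. lato" from a
// sensu-lato marker at the end of its name, or else from its verbatim rank.
final class SensuLatoDetector {

    // Illustrative lower-cased rank strings treated as aggregates.
    private static final Set<String> AGGREGATE_RANKS =
        Set.of("complex", "species aggr", "species aggr.", "superspecies");

    // Name suffixes (with leading space) that mark a broad concept.
    private static final List<String> NAME_SUFFIXES =
        List.of(" sensu lato", " s. lato", " s.l.");

    private SensuLatoDetector() {}

    /** The name with any s. lato marker removed, plus the flag. */
    static final class Result {
        final String baseName;
        final boolean sensuLato;

        Result(String baseName, boolean sensuLato) {
            this.baseName = baseName;
            this.sensuLato = sensuLato;
        }

        /** Display form with "s. lato" appended when flagged. */
        String displayName() {
            return sensuLato ? baseName + " s. lato" : baseName;
        }
    }

    static Result detect(String verbatimName, String verbatimRank) {
        String name = verbatimName.trim();
        // 1. Name-based detection: case-insensitive suffix match, marker stripped.
        for (String suffix : NAME_SUFFIXES) {
            int start = name.length() - suffix.length();
            if (start >= 0 && name.regionMatches(true, start, suffix, 0, suffix.length())) {
                return new Result(name.substring(0, start).trim(), true);
            }
        }
        // 2. Rank-based detection: the name is kept as-is.
        boolean rankFlag = verbatimRank != null
            && AGGREGATE_RANKS.contains(verbatimRank.trim().toLowerCase(Locale.ROOT));
        return new Result(name, rankFlag);
    }
}
```

For a name ending in "sensu lato", `detect` strips the marker and sets the flag, so `displayName()` normalises it to the "s. lato" form; a record flagged only by its rank keeps its verbatim name.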

For species pairs, I like the idea of assigning two names to an occurrence, but how do we index them?
In the case of Leptidea juvernica/sinapis, it would not be straightforward to make the occurrence findable in searches for both Leptidea juvernica and Leptidea sinapis.
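One way to make a pair findable under both names is to post the occurrence under each candidate name in the search index. The in-memory `PairIndex` below is a hypothetical sketch of that idea, not how GBIF's occurrence index works; `expandPair` assumes both epithets share one genus, as in "Leptidea juvernica/sinapis".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory inverted index: an occurrence identified as a species
// pair is posted under every candidate name, so a search for either returns it.
final class PairIndex {

    private final Map<String, List<String>> occurrencesByName = new HashMap<>();

    /** Posts one occurrence under each candidate name. */
    void add(String occurrenceId, List<String> candidateNames) {
        for (String name : candidateNames) {
            occurrencesByName.computeIfAbsent(name, k -> new ArrayList<>()).add(occurrenceId);
        }
    }

    /** All occurrences posted under the given name (read-only view). */
    List<String> search(String name) {
        return Collections.unmodifiableList(
            occurrencesByName.getOrDefault(name, Collections.emptyList()));
    }

    /** Splits a verbatim "Genus a/b" pair into full binomials; assumes a single genus. */
    static List<String> expandPair(String verbatimName) {
        String[] parts = verbatimName.trim().split("\\s+", 2);
        List<String> names = new ArrayList<>();
        if (parts.length < 2) {
            names.add(verbatimName.trim());
            return names;
        }
        for (String epithet : parts[1].split("/")) {
            names.add(parts[0] + " " + epithet.trim());
        }
        return names;
    }
}
```

The trade-off is that naive per-species counts or maps built from such an index would include the record under both species, so the pair would still need to be tracked as such wherever numbers matter.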

@mdoering
Member

Makes sense and can be implemented right now without the need to know more about each species complex.
What would be a good name for such an issue? BROADER_MATCH, SENSU_LATO_MATCH, AGGREGATE_MATCH (as we use "aggregate", not "complex", in our rank enum)?

@thomasstjerne

I would say SENSU_LATO_MATCH

And the English translation could then be "Taxon in a broad sense".

@larspett

Personally, I would prefer AGGREGATE_MATCH to avoid the need to translate the primary term.

@thomasstjerne

@larspett All enum values on the portal are translated (English, French, Russian, Spanish, Chinese, etc.).

@larspett

@thomasstjerne Yes, I am aware of this. My preference for the English term is mainly because I think (maybe incorrectly) that it has the potential to be more widely used. I had the impression that "sensu lato" is used more in botany than in, e.g., entomology, hence my preference for the non-Latin term.

@thomasstjerne

Then let's settle on AGGREGATE_MATCH.

@mdoering
Member

mdoering commented Jan 20, 2021

That means an addition to the GBIF API in the enums NameUsageIssue and OccurrenceIssue.

@mdoering
Member

The existing matching issues actually follow a slightly different syntax.
I would therefore propose the following two new issues:

OccurrenceIssue.TAXON_MATCH_AGGREGATE
NameUsageIssue.BACKBONE_MATCH_AGGREGATE

Existing ones are:
TAXON_MATCH_HIGHERRANK
TAXON_MATCH_FUZZY
TAXON_MATCH_NONE

BACKBONE_MATCH_FUZZY
BACKBONE_MATCH_NONE

mdoering added a commit to gbif/gbif-api that referenced this issue Jan 20, 2021
@mdoering
Member

gbif/gbif-api@f467884

@mdoering
Member

The matching service does not flag issues; this needs to be done by the occurrence interpreter.
The service just provides a match type, which we would have to extend with AGGREGATE:

    public static enum MatchType {
        EXACT,
        FUZZY,
        HIGHERRANK,
        NONE;
    }
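How the occurrence interpreter might translate an extended match type into the proposed issue can be sketched as below. The two enums are local stand-ins that mirror only the constants named in this thread (with the proposed `AGGREGATE` and `TAXON_MATCH_AGGREGATE` added); they are not the real gbif-api classes, and `TaxonIssueMapper` is hypothetical.

```java
import java.util.Optional;

// Local stand-in for the match service's enum, extended with the proposed AGGREGATE.
enum MatchType { EXACT, FUZZY, HIGHERRANK, AGGREGATE, NONE }

// Local stand-in holding only the taxon-match issues discussed in this thread.
enum OccurrenceIssue { TAXON_MATCH_FUZZY, TAXON_MATCH_HIGHERRANK, TAXON_MATCH_AGGREGATE, TAXON_MATCH_NONE }

// Hypothetical interpreter step: the matching service only reports a MatchType;
// the occurrence interpreter decides which issue flag, if any, to attach.
final class TaxonIssueMapper {

    private TaxonIssueMapper() {}

    static Optional<OccurrenceIssue> issueFor(MatchType type) {
        switch (type) {
            case FUZZY:
                return Optional.of(OccurrenceIssue.TAXON_MATCH_FUZZY);
            case HIGHERRANK:
                return Optional.of(OccurrenceIssue.TAXON_MATCH_HIGHERRANK);
            case AGGREGATE:
                return Optional.of(OccurrenceIssue.TAXON_MATCH_AGGREGATE);
            case NONE:
                return Optional.of(OccurrenceIssue.TAXON_MATCH_NONE);
            case EXACT:
            default:
                return Optional.empty(); // an exact match raises no issue
        }
    }
}
```

Keeping this mapping in the interpreter rather than in the matching service matches the division of responsibilities described above: the service classifies the match, the interpreter flags the record.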

@mdoering
Member

This is now blocked by the occurrence processing issue https://github.com/gbif/occurrence/issues/229.

@bdagley

bdagley commented Mar 13, 2023

I identify on iNaturalist and have added Hymenoptera complexes, and I am replying to the original question of which rank should be used. First, the complex rank is useful, so any coarsening of those records on GBIF is a loss of information that identifiers have volunteered. Ideally, additional ranks could be added to GBIF (whatever that would entail) so that complex records could be represented exactly. If GBIF can't add those ranks currently, the next best option, similar to what others proposed, would be to use the parent taxon.

I noticed someone mentioned using the genus for complexes, although the subgenus would be better where applicable. However, I think GBIF may also lack the subgenus rank; if so, adding it would be an even higher priority, although hopefully both can be added. As the main integrated taxonomy website, it would be fitting if GBIF could eventually expand to include more ranks. Lastly, are there other related open GBIF issues about ranks? I just created an account after speaking with the GBIF administration recently.

@mdoering
Member

I don't think any progress has been made on this so far. Adding aggregates or s.l./s.str. taxa to the GBIF Backbone would be difficult and would result in a rather different taxonomy, with alternative concepts for the same name being part of it. That would need to be discussed very carefully, as it likely has many unforeseen consequences. Ultimately, though, we would need such a taxonomy to faithfully represent all existing taxon concepts that have been used in identifications. At some stage GBIF must leave the single, consistent consensus taxonomy behind. I am just not sure we are ready for this now.

Right now we wrongly assign a regular species to what was a broader concept in the source data (aggregate, complex, superspecies or just s.l. remarks). Indeed, it seems wise not to narrow down the concept, but instead to broaden it to the next higher taxon the backbone contains. Currently that would often be the genus, but we could try to include subgenera, sections or series too in the near future with the upcoming Catalogue of Life Extended Checklist.

The matching service could refuse to match to a regular species rank when the "query" had an aggregate/complex rank. That would be rather simple to implement. I am no longer sure what would be gained by adding a new MatchType.AGGREGATE while still linking the occurrence to the regular species. For most subsequent uses of the occurrence data, it would wrongly appear as the species, even though the aggregate would be shown to humans on the occurrence details page. Everywhere else (metrics, maps, search, etc.) it would just be the species. I think it is better to just do a regular higher taxon match in that case.
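The rule of refusing a species-rank match for an aggregate query and broadening to the nearest taxon the backbone contains could be sketched as follows. Ranks are plain strings, the backbone is reduced to a set of known names, and `AggregateMatcher` with its method names is hypothetical; a real matcher would work on parsed names and backbone keys.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: for an aggregate/complex query, never return the species;
// walk up the query's own classification and return the nearest ancestor that the
// backbone contains (series, section, subgenus or genus, whichever is found first).
final class AggregateMatcher {

    /** One name from the query's classification, with its rank as a plain string. */
    static final class Ancestor {
        final String rank;
        final String name;

        Ancestor(String rank, String name) {
            this.rank = rank;
            this.name = name;
        }
    }

    // Illustrative set of verbatim ranks treated as aggregates.
    private static final Set<String> AGGREGATE_RANKS =
        Set.of("SPECIES_AGGREGATE", "COMPLEX", "SUPERSPECIES");

    private AggregateMatcher() {}

    /** True if the verbatim rank denotes an aggregate/complex (case-insensitive). */
    static boolean isAggregateRank(String verbatimRank) {
        return verbatimRank != null
            && AGGREGATE_RANKS.contains(verbatimRank.trim().toUpperCase(Locale.ROOT));
    }

    /**
     * Regular higher taxon match for an aggregate query.
     *
     * @param ancestorsNearestFirst ranks above species, nearest first (e.g. subgenus, genus)
     * @param backboneNames         names assumed to exist in the backbone
     * @return the nearest ancestor found in the backbone, or empty for no match
     */
    static Optional<Ancestor> matchAggregate(List<Ancestor> ancestorsNearestFirst,
                                             Set<String> backboneNames) {
        for (Ancestor ancestor : ancestorsNearestFirst) {
            if (backboneNames.contains(ancestor.name)) {
                return Optional.of(ancestor);
            }
        }
        return Optional.empty();
    }
}
```

With an ancestry of subgenus Pyrobombus followed by genus Bombus and a backbone that only knows "Bombus", the match broadens to the genus; if subgenera were added to the backbone, the same logic would stop at the subgenus instead.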

@bdagley

bdagley commented Mar 13, 2023

I mostly agree. To clarify, I didn't actually have adding s.l. or s.s. ranks to GBIF in mind at this time. As you implied, those would be among the most complicated to add, for example because they would affect how genera and subgenera are defined, or might result in a taxon being defined "twice" by being represented at multiple ranks: the genus and the genus s.s. and/or s.l. In order of priority, I suggest adding subgenera, tribes, and subfamilies first, then complexes/species groups, and only later considering whether s.l. or s.s. would work. I am fine leaving those out for now, maybe indefinitely, since they might become too confusing.

A related matter, which I've also recently raised on a few other open issues, is that I don't think GBIF is currently reading subgenera correctly. For example, bumblebees (Bombus) are among the most studied pollinators, yet their species are placed directly under the genus with no subgenera (https://www.gbif.org/species/1340278). If I search for one of the subgenera, Pyrobombus, I find it was added to GBIF separately, but incorrectly as a genus of the family Apidae with no species under it (https://www.gbif.org/species/4669778).

@CecSve

CecSve commented Mar 30, 2023

#4656


6 participants