Improve handling of records declaring absence data #268
Comments
Thanks - I also appreciate the conflict flags, as we are bound to have a few false positives caused by database default values and the like. |
We have an actual vocabulary for occurrenceStatus: http://rs.gbif.org/vocabulary/gbif/occurrence_status.xml |
It might be useful to have a |
This is important. If we are to refine this for occurrence data only, removing those terms that are targeting checklist use and possibly adding new ones, we need to
|
This is a horrible vocabulary for this term, because we should not be mixing up presence and absence with abundance. It would be much easier for everyone if Secondly, absence is only resolvable when there are some spatial and temporal limits. Shouldn't there be a check on eventDate, location and/or country to warn people that there is an unbounded absence? Otherwise, the record sort of means it is absent everywhere and/or for all time. |
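As a rough illustration of the check suggested above, here is a minimal sketch in Python. It assumes a record is a flat mapping of Darwin Core term names to verbatim string values. The function name and the two warning names are hypothetical and are not existing GBIF issue flags. The choice of which terms count as a spatial bound is also an assumption.

```python
from typing import List, Mapping, Optional


def unbounded_absence_warnings(record: Mapping[str, Optional[str]]) -> List[str]:
    """Return warnings for an absence record that lacks temporal or spatial limits.

    Hypothetical helper: the warning names below are illustrative only.
    """
    status = (record.get("occurrenceStatus") or "").strip().lower()
    if status != "absent":
        return []  # only absence records are checked

    def has(term: str) -> bool:
        # A term counts as provided when it holds a non-blank value.
        return bool((record.get(term) or "").strip())

    warnings = []
    if not has("eventDate"):
        warnings.append("ABSENCE_WITHOUT_TEMPORAL_BOUNDS")

    # Assumption: coordinates, a locality, or a country each bound the absence in space.
    has_coordinates = has("decimalLatitude") and has("decimalLongitude")
    if not (has_coordinates or has("locality") or has("country") or has("countryCode")):
        warnings.append("ABSENCE_WITHOUT_SPATIAL_BOUNDS")
    return warnings
```

An absence record with neither a date nor any location term would get both warnings, which corresponds to the "absent everywhere and/or for all time" reading.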
There was previous discussion about absence records in these issues:
Those are good points, Quentin. Is there a term for recording abundance? I can't see one. The vast majority of data gives present/absent, but there is some giving abundance. These are the verbatim values we have for occurrenceStatus with frequency > 1000:
|
Thanks for raising this. If you look at the data you'll also find attempts to convey things like invasive, threatened etc which would be better elsewhere too.
The suggestion to add a flag for
|
@qgroom I understand |
Exciting! I hope it will be very clear to users that absence data ARE available. Sounds like it will be but just want to make sure. The P and Q are me (from a time before I was officially in charge of OBIS-USA), I'll make sure to get those corrected. |
Completely agree with what @timrobertson100 (how to parse it + flags) and @ahahn-gbif (exclude absences from views by default) suggest. Some notes:
|
+1 for 1. - we should indeed look at organismQuantity as well
|
For the datasets I work with, this would not be a good assumption to make. Usually the individualCount is included first, and the occurrenceStatus is created based on the individualCount or organismQuantity. So if individualCount = 0 but occurrenceStatus = PRESENT, it means something went wrong in the code that creates occurrenceStatus. |
Thanks, good point, I hadn't considered that. In that case, I agree it should get the same conflict flag as @peterdesmet suggested under point 3 |
I agree, I would also prioritize
|
@peterdesmet I'm not understanding why this one would be flagged: 0 | absent* | ABSENT | OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT. I would think that one shouldn't get a flag, since things are all in agreement? |
@albenson-usgs it's a choice 🤷♂️: behind the scenes I would always infer from |
I'm afraid I'd disagree. I would propose only inferring if required, otherwise what is the point of the field? This would be similar to how we handle others e.g. Therefore I'd suggest:
|
That's fine by me (have adapted in table above). But we still choose |
Do you mean choose how to populate the interpreted
|
In support of Tim above (I think): normally we respect values that are there, but flag them as odd if they are in conflict with other values. E.g. I would argue we do the same for individualCount, organismQuantity and occurrenceStatus. We infer occurrenceStatus if missing, but if it is provided, we do not mess with it (despite conflicts with other fields); instead we add issue flags. |
@MortenHofft that makes sense, but does this mean that |
Yes. Just like we show null island, despite it probably being faulty data. If we consider it particularly critical, we can add an extra warning like we do on maps.
Here and now I like what MattBlissett mentions. Adding something similar to |
I would argue the current occurrenceStatus vocabulary is more of an abundance vocabulary than a simple boolean. More like ACFOR: https://en.wikipedia.org/wiki/Abundance_(ecology), but including doubtful, absent & excluded. I would prefer to create a new distribution status vocabulary to be used for species distribution checklists and shrink the existing occurrenceStatus one to be just present and absent like DwC suggests. It's probably also safer to change the distribution extension to point to a new vocabulary than to change the occurrence core to point to a new one. |
Quick note just to say that occurrenceStatus is a required term for OBIS and only present or absent are accepted so this falls in line with what's been outlined here. From the OBIS Manual: occurrenceStatus (required term) is a statement about the presence or absence of a taxon at a location. It is an important term, because it allows us to distinguish between presence and absence records. It is a required term and should be filled in with either present or absent. |
Ok, that is clearer (even though individualCount might have more reliable information, see #268 (comment)). I have updated my table at #268 (comment) (in italic) to reflect this decision. |
I've made the relevant changes in the GBIF schema sandbox, I think exactly as @mdoering suggests.
Is that reasonable for everyone? |
blocked by #325 |
Added new values for occurrence status search/downloads
In reviewing the code I spotted an error in the table.
Should read
We are not inferring presence from an |
I think there's a second, similar error in the table:
Should instead be:
|
* #268 Added occurrence status field, interpretation, converter and updated ES schema for it
Thanks for noticing, I have updated the table |
API and interpretation in production |
Some datasets provide evidence of species absences. While this can be a difficult area to accommodate properly, as modeling effort and confidence are required, there is a lot we can do to improve the current situation, where consumers are given the burden of interpreting the data shared. In some cases, consumers will not even have enough information to detect this and will use absence records as presence records.
I propose we introduce the following:
- Support occurrenceStatus in the occurrence search and download API, and then expose it on the web site. We should review the data to determine whether the current vocabulary is reasonable for the observed use in data.
- Where individualCount states 0, we should set occurrenceStatus = ABSENT if it is NULL and set a flag OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT to true. If occurrenceStatus is otherwise NULL, we set it to PRESENT as a sensible default.
- A flag INDIVIDUAL_COUNT_CONFLICTS_WITH_OCCURRENCE_STATUS, set to true when the count is zero but the status declares it exists (could be several values), or when the count is >0 and it is declared as absent.
- Flags INDIVIDUAL_COUNT_UNPARSABLE and OCCURRENCE_STATUS_UNPARSABLE, set appropriately when data cannot be parsed.
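To make the proposed rules concrete, here is a minimal sketch in Python of how they might combine. It is not the GBIF pipeline implementation. The function name, the tiny verbatim-to-vocabulary lookup, and the choice to treat an unparsable status like a missing one are all assumptions made for illustration. Only the flag names come from the proposal.

```python
from typing import Optional, Set, Tuple

# Assumption: a deliberately tiny lookup; a real parser would use the full vocabulary.
PRESENT_VALUES = {"present"}
ABSENT_VALUES = {"absent"}


def interpret_occurrence_status(
    individual_count: Optional[str], occurrence_status: Optional[str]
) -> Tuple[str, Set[str]]:
    """Return (interpreted status, issue flags) from two verbatim values."""
    issues: Set[str] = set()

    # Parse individualCount as a non-negative integer, if provided.
    count: Optional[int] = None
    if individual_count is not None and individual_count.strip():
        try:
            count = int(individual_count.strip())
            if count < 0:
                raise ValueError("negative count")
        except ValueError:
            issues.add("INDIVIDUAL_COUNT_UNPARSABLE")
            count = None

    # Parse occurrenceStatus against the lookup, if provided.
    status: Optional[str] = None
    if occurrence_status is not None and occurrence_status.strip():
        value = occurrence_status.strip().lower()
        if value in PRESENT_VALUES:
            status = "PRESENT"
        elif value in ABSENT_VALUES:
            status = "ABSENT"
        else:
            # Assumption: an unparsable status is then handled like a missing one.
            issues.add("OCCURRENCE_STATUS_UNPARSABLE")

    if status is None:
        if count == 0:
            # Missing status with a zero count: infer absence and flag it.
            status = "ABSENT"
            issues.add("OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT")
        else:
            status = "PRESENT"  # the proposed default
    elif count is not None:
        # A provided status is kept as-is; a disagreement is only flagged.
        if (count == 0 and status == "PRESENT") or (count > 0 and status == "ABSENT"):
            issues.add("INDIVIDUAL_COUNT_CONFLICTS_WITH_OCCURRENCE_STATUS")

    return status, issues
```

For example, a record with individualCount "0" and no occurrenceStatus is interpreted as ABSENT with the inferred flag, while a record with individualCount "0" and occurrenceStatus "present" stays PRESENT and receives the conflict flag.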