FAQ: How do I publish absences? #1746

Closed
albenson-usgs opened this issue Feb 23, 2022 · 12 comments

@albenson-usgs
Contributor

Q. How do I publish absence data?
A. Step 1: Include sampling event records even if the sampling yielded no derived species occurrences. This allows species absences to be inferred.

I want to raise the issue that following the IPT manual's directions for this FAQ is not always appropriate for datasets shared via the OBIS community. As noted in the paper on OBIS-ENV-DATA (https://bdj.pensoft.net/article/10989/element/2/3386095//), some events will have no occurrences because abiotic sampling is sometimes carried out at a greater frequency than species observations. These events should not be interpreted as absences.

@muttcg
Member

muttcg commented Feb 24, 2022

Hi @albenson-usgs
From the interpretation side, I can point you to the issue describing how we identify absence data: please read gbif/pipelines#268

MattBlissett added a commit that referenced this issue Feb 24, 2022
@MattBlissett
Member

@ahahn-gbif
Contributor

Occurrence records marked as absences would not be an issue, I agree. The tricky part would be in the interpretation of sampling events without occurrences as a bundle of absence records. However, this should never happen in isolation ("we did not find any organisms"), but ideally be interpreted against a timestamped checklist of the taxa that could have been expected. I may have overlooked something, but I do not think that GBIF, so far, makes these inferences during ingestion - the recommendations are on the publication process alone.

From the ingestion perspective: if GBIF do not want to index events unrelated to biological sampling (which I assume is true?) that are published within the same dataset as sampling events on organisms, we would need to identify or discuss a level of content standardization that allows us to tell the different types of sampling event records apart. I am not entirely sure whether we receive such mixed datasets from our partners; an example would help here. Would you be able to point us to a dataset, @albenson-usgs?

The most straightforward solution so far would be to publish organism-related and non-biological sampling parameters in separate datasets. Darwin Core does not offer a dedicated "sampling event type" filter with a reliable, standardized vocabulary to recognize those. The samplingProtocol would be the closest we can get, but content received is completely unstandardized so far.

Thanks for bringing this up, we will need to consider this in the context of absence evaluation from sampling events. Likely something along the lines of "if no corresponding taxon checklist is provided, do not interpret absences".
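The rule proposed above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything GBIF runs during ingestion: the function name `infer_absences` and its parameters are hypothetical, and taxa are represented as plain name strings for simplicity.

```python
from typing import Optional, Set


def infer_absences(event_taxa: Set[str], checklist: Optional[Set[str]]) -> Set[str]:
    """Return the taxa that may be treated as absent for one sampling event.

    event_taxa: taxa actually recorded for the event (hypothetical input).
    checklist:  taxa that could have been expected, or None if the
                publisher supplied no checklist (hypothetical input).
    """
    # No checklist (or an empty one): do not interpret absences at all.
    if not checklist:
        return set()
    # With a checklist, only expected-but-unrecorded taxa count as absent.
    return checklist - event_taxa
```

For example, `infer_absences(set(), None)` returns an empty set, so an event with no occurrences and no checklist yields no absence records.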

@albenson-usgs
Contributor Author

albenson-usgs commented Feb 24, 2022

An example dataset is here. It was published just a few minutes ago. There are 125716 events with no occurrences and 17118 events with occurrences (absences are explicit in the occurrence table using occurrenceStatus).
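As a rough sketch of how events with and without occurrences can be told apart, the snippet below splits event rows by whether any occurrence row shares their `eventID`. The helper name `split_events` and the toy rows are made up for illustration and are not taken from the dataset; only the Darwin Core terms `eventID` and `occurrenceStatus` are real.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_events(events: List[Row], occurrences: List[Row]) -> Tuple[List[str], List[str]]:
    """Split event IDs into (with occurrences, without occurrences)."""
    # Collect every eventID referenced by at least one occurrence row.
    ids_with_occ = {row["eventID"] for row in occurrences}
    with_occ = [e["eventID"] for e in events if e["eventID"] in ids_with_occ]
    without_occ = [e["eventID"] for e in events if e["eventID"] not in ids_with_occ]
    return with_occ, without_occ


# Toy data (hypothetical): e2 is an abiotic-only event with no occurrence rows.
events = [{"eventID": "e1"}, {"eventID": "e2"}, {"eventID": "e3"}]
occurrences = [
    {"eventID": "e1", "occurrenceStatus": "present"},
    {"eventID": "e3", "occurrenceStatus": "absent"},  # explicit absence
]

with_occ, without_occ = split_events(events, occurrences)
```

Here `e3` carries an explicit absence via `occurrenceStatus`, whereas `e2` simply has no occurrence rows, which on its own says nothing about absence.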

The slight problem is that it needs to be "if no corresponding taxon checklist is provided, do not always interpret absences."

I know that, for instance, in the recently published NEON tick dataset it is OK to infer absences from events with no occurrences.

@mike-podolskiy90
Contributor

Is this still relevant please?

@ahahn-gbif
Contributor

I may be confused here. Given that GBIF do not, to my knowledge, evaluate sampling event datasets against any checklist, and do not infer absences at all, this seems to be an issue limited to (a) data publication guidelines for the IPT and (b) data ingestion from IPT-published datasets by OBIS. I can see the problem of inferring absences, but it does not concern any current GBIF workflow. Is this possibly a discussion better continued in the OBIS GitHub, @albenson-usgs?

@albenson-usgs
Contributor Author

The issue is with the instructions in the IPT manual. If GBIF is not evaluating absences in this way, then the text is not accurate? I would advocate for removing that paragraph, keeping everything from "Alternatively, you can make species..." onward, and dropping the word "Alternatively", but I understand that this is how GBIF Norway is providing their datasets. Also, step 2 is not how the OBIS community is supplying absences. Maybe this needs to be discussed within TDWG to figure out the most inclusive way to describe how to provide absences?

@mike-podolskiy90
Contributor

Closing this. Please feel free to re-open if anything needs to be fixed in the IPT.

@albenson-usgs
Contributor Author

@mike-podolskiy90 does the IPT manual have a separate GitHub? This is still unresolved. I would like the text in the manual to be updated so that it is more inclusive about how to provide absences. Until the text in the manual is modified, I would like this issue to remain open somewhere.

@albenson-usgs
Contributor Author

@mike-podolskiy90 @ahahn-gbif who has the ability/authority to make changes to the IPT manual?

@mike-podolskiy90
Contributor

Thank you @albenson-usgs for the quick response. I thought it was solved with the documentation.
Could you please make a pull request with your suggestions for the documentation? The file is here: https://github.com/gbif/ipt/blob/master/docs/en/modules/ROOT/pages/sampling-event-data.adoc

@albenson-usgs
Contributor Author

Closed via 1832

5 participants