kbraak edited this page Sep 6, 2016 · 15 revisions

Sampling Event Data


Sampling event data are resources that present not only evidence of the occurrence of a species at a particular place and time, but also sufficient detail to assess community composition for a broader taxonomic group, or the relative abundance of species at multiple times and places. Such datasets derive from standardized protocols for measuring and observing biodiversity. Examples include vegetation transects, standardized bird census data, and ecogenomic samples. They add to Occurrence Data by indicating what protocol was followed, which occurrence records derive from a sampling event following that protocol, and ideally the relative abundance (by a suitable numerical measure) of species recorded in the sample. These additional elements support better comparison of data from different times and places (where the same protocol is indicated) and may in some cases enable researchers to infer the absence of particular species from particular sites. These datasets include the same basic descriptive information as Resource Metadata and the same standard elements as Occurrence Data.

How to transform your data into sampling event data

Ultimately, your data needs to be transformed into two tables that use Darwin Core (DwC) term names as column names: one table of sampling events and another table of the species occurrences derived from (associated with) each sampling event.
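As a rough illustration of the two-table layout, the sketch below splits a flat list of field observations into an event table and an occurrence table linked by eventID. The raw field names (`visit`, `date`, `method`, `species`, `count`), the sample values, and the occurrenceID pattern are hypothetical, and the DwC columns shown are only a small selection, not the full set of required fields.

```python
import csv
import io

# Hypothetical raw field notes: one row per species seen during a visit.
raw_rows = [
    {"visit": "T1-2016-05-01", "date": "2016-05-01", "method": "line transect",
     "species": "Parus major", "count": 3},
    {"visit": "T1-2016-05-01", "date": "2016-05-01", "method": "line transect",
     "species": "Erithacus rubecula", "count": 1},
    {"visit": "T1-2016-06-01", "date": "2016-06-01", "method": "line transect",
     "species": "Parus major", "count": 2},
]


def split_into_tables(rows):
    """Return (events, occurrences) as lists of dicts keyed by DwC term names."""
    events = {}
    occurrences = []
    for i, row in enumerate(rows, start=1):
        event_id = row["visit"]
        # Keep one event record per distinct visit.
        events.setdefault(event_id, {
            "eventID": event_id,
            "eventDate": row["date"],
            "samplingProtocol": row["method"],
        })
        # Every occurrence points back to its event through eventID.
        occurrences.append({
            "occurrenceID": f"{event_id}:occ{i}",  # hypothetical ID pattern
            "eventID": event_id,
            "scientificName": row["species"],
            "individualCount": row["count"],
        })
    return list(events.values()), occurrences


def to_csv(records):
    """Serialize records to CSV text with DwC term names as the header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


events, occurrences = split_into_tables(raw_rows)
print(to_csv(events))
print(to_csv(occurrences))
```

The two CSV outputs correspond to the two sheets or views described here; each occurrence row carries the eventID of the sampling event it derives from.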

Try putting your data into the Excel template, which includes two sheets: one for sampling events and another for associated species occurrences.

Alternatively, if your data is stored in a supported database, you can create two SQL views that use DwC term names as column names: one for sampling events and another for associated species occurrences.
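A minimal sketch of the view approach follows. It uses SQLite only because it ships with Python, which is not a statement about which databases the IPT supports. The source tables (`visits`, `sightings`), their columns, and the view names are hypothetical; the point is that column aliases expose DwC term names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source tables with local column names.
    CREATE TABLE visits (visit_code TEXT PRIMARY KEY, visit_date TEXT, method TEXT);
    CREATE TABLE sightings (id INTEGER PRIMARY KEY, visit_code TEXT,
                            species TEXT, n INTEGER);
    INSERT INTO visits VALUES ('T1-2016-05-01', '2016-05-01', 'line transect');
    INSERT INTO sightings VALUES (1, 'T1-2016-05-01', 'Parus major', 3);
    INSERT INTO sightings VALUES (2, 'T1-2016-05-01', 'Erithacus rubecula', 1);

    -- View of sampling events, aliased to DwC term names.
    CREATE VIEW dwc_event AS
        SELECT visit_code AS eventID,
               visit_date AS eventDate,
               method     AS samplingProtocol
        FROM visits;

    -- View of occurrences, linked to events through eventID.
    CREATE VIEW dwc_occurrence AS
        SELECT 'occ-' || id AS occurrenceID,
               visit_code   AS eventID,
               species      AS scientificName,
               n            AS individualCount
        FROM sightings;
""")


def column_names(view):
    """Return the column names a view exposes, in order."""
    cursor = conn.execute(f"SELECT * FROM {view}")
    return [d[0] for d in cursor.description]


print(column_names("dwc_event"))
print(column_names("dwc_occurrence"))
```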

Each sampling event record should include all required DwC fields and as many recommended DwC fields as possible. You can augment your table with extra DwC columns, but only DwC terms from this list.

Similarly, each species occurrence record should include all required DwC fields and as many recommended DwC fields as possible. You can augment your table with extra DwC columns, but only DwC terms from this list. Some DwC terms can appear in both the sampling event and the species occurrence records. As a general rule, try not to repeat a term in both tables with the same values. Different values are fine, for example if you want to define the location of an event and then define more specific locations for individual occurrences. When the location of an individual occurrence isn't supplied, the occurrence inherits its location from the event.
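The inheritance rule described above can be pictured with a small sketch. This is an illustration of the rule only, not how GBIF implements it; the function name, the choice of `decimalLatitude` and `decimalLongitude` as the location terms, and the sample records are assumptions.

```python
# Hypothetical choice of location terms to resolve; extend as needed.
LOCATION_TERMS = ("decimalLatitude", "decimalLongitude")


def effective_location(occurrence, events_by_id):
    """Return the occurrence's location, falling back to its event's values."""
    event = events_by_id[occurrence["eventID"]]
    resolved = {}
    for term in LOCATION_TERMS:
        value = occurrence.get(term)
        # Missing or empty values count as "not supplied" and are inherited.
        resolved[term] = event.get(term) if value in (None, "") else value
    return resolved


events_by_id = {
    "E1": {"eventID": "E1", "decimalLatitude": 60.0, "decimalLongitude": 10.0},
}
occ_inherits = {"occurrenceID": "O1", "eventID": "E1",
                "scientificName": "Parus major"}
occ_specific = {"occurrenceID": "O2", "eventID": "E1",
                "scientificName": "Parus major",
                "decimalLatitude": 60.001, "decimalLongitude": 10.002}

print(effective_location(occ_inherits, events_by_id))
print(effective_location(occ_specific, events_by_id))
```

The first occurrence has no location of its own and takes the event's coordinates; the second keeps its more specific coordinates.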

For extra guidance, you can look at the template populated with example data or the list of exemplar datasets.


Download Sampling Event Data Template

Populate it and upload it to the IPT.

Required DwC fields:

Recommended DwC fields:

Exemplar datasets:


Q. How do I indicate that a sampling event was part of a time series?

A. All sampling events at the same location must share the same locationID.
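To show why a shared locationID matters, the sketch below groups event records by locationID and orders each group by eventDate, which yields one time series per location. The function name and sample records are hypothetical, and the sort assumes every eventDate is a full date in YYYY-MM-DD form.

```python
from collections import defaultdict


def time_series_by_location(events):
    """Group events by locationID and order each group chronologically."""
    series = defaultdict(list)
    for event in events:
        series[event["locationID"]].append(event)
    for visits in series.values():
        # Assumes YYYY-MM-DD dates, which sort correctly as plain strings.
        visits.sort(key=lambda e: e["eventDate"])
    return dict(series)


events = [
    {"eventID": "E3", "locationID": "plot-A", "eventDate": "2016-07-01"},
    {"eventID": "E1", "locationID": "plot-A", "eventDate": "2016-05-01"},
    {"eventID": "E2", "locationID": "plot-B", "eventDate": "2016-05-02"},
]

series = time_series_by_location(events)
for location_id, visits in series.items():
    print(location_id, [v["eventID"] for v in visits])
```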

Q. How do I publish a hierarchy of events (recursive data type) using parentEventID?

A. The classic example is sub-sampling of a larger plot. To group all (child) sub-sampling events under the (parent) sampling event, the parentEventID of each sub-sampling event must be set to the eventID of the (parent) sampling event. To be valid, every parentEventID must reference the eventID of a record defined in the same dataset. Otherwise, the parentEventID must be a globally unique identifier (e.g. DOI, HTTP URI, etc.) that resolves to an event record described elsewhere. Ideally, all (child) sub-sampling events share the same date and location as the (parent) event they reference.
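A pre-publication check of this rule could look like the sketch below. The function names, the sample records, and the example.org URL are hypothetical, and `looks_global` is only a crude prefix test: it does not verify that an identifier actually resolves to an event record.

```python
def looks_global(identifier):
    """Crude, hypothetical test for a DOI or HTTP URI style identifier."""
    return identifier.startswith(("http://", "https://", "doi:"))


def check_parent_event_ids(events):
    """Return (eventID, parentEventID) pairs that violate the rule."""
    known = {e["eventID"] for e in events}
    problems = []
    for event in events:
        parent = event.get("parentEventID")
        if not parent:
            continue  # top-level event, nothing to check
        if parent not in known and not looks_global(parent):
            problems.append((event["eventID"], parent))
    return problems


events = [
    {"eventID": "P1"},                                # parent plot
    {"eventID": "P1-A", "parentEventID": "P1"},       # valid: same dataset
    {"eventID": "P1-B", "parentEventID": "P1"},       # valid: same dataset
    {"eventID": "X-1", "parentEventID": "P9"},        # invalid: unknown local ID
    {"eventID": "X-2",
     "parentEventID": "https://example.org/event/123"},  # external identifier
]

problems = check_parent_event_ids(events)
print(problems)
```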

Q. How do I publish absence data?

A. Step #1: Include sampling event records even if the sampling yielded no derived species occurrences. This allows species absences to be inferred. This example sampling event dataset from Norway demonstrates how this looks.

Alternatively, you can make species absences explicit by adding a species occurrence record for each species that could have been observed at the time and place of sampling, but was not observed, by setting the following fields:


Optional (provide one or both):

Warning: Currently GBIF indexes all species occurrences, whether they are "present" or "absent". Until this issue is fixed, GBIF recommends excluding all species absences by using the following filter on the IPT’s Occurrence Mapping page:

Filter: afterTranslation -> occurrenceStatus -> NotEquals -> absent

More information about how to apply a filter can be found in the IPT User Manual here.
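For readers who want to preview what such a filter leaves in the occurrence table, here is a rough Python equivalent. It is not IPT code: the function name and sample records are hypothetical, the comparison is an exact string match, and keeping records that have no occurrenceStatus at all is an assumption of this sketch.

```python
def exclude_absences(occurrences):
    """Keep every record whose occurrenceStatus is not exactly 'absent'."""
    # Assumption: records without an occurrenceStatus value are kept.
    return [o for o in occurrences if o.get("occurrenceStatus") != "absent"]


occurrences = [
    {"occurrenceID": "O1", "scientificName": "Parus major",
     "occurrenceStatus": "present"},
    {"occurrenceID": "O2", "scientificName": "Erithacus rubecula",
     "occurrenceStatus": "absent"},
    {"occurrenceID": "O3", "scientificName": "Sitta europaea"},
]

kept = exclude_absences(occurrences)
print([o["occurrenceID"] for o in kept])
```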

Step #2: Define the taxonomic scope of all sampling events included in the dataset. It is recommended to publish a timestamped checklist together with the sampling event dataset, representing the species composition that could be observed at the time and place of sampling given the sampling protocol (and/or the taxonomic coverage of the study and the expertise of the personnel carrying out identification). This allows accurate presence/absence data to be recorded. In addition to the normal (expected) species composition, the checklist could include invasive (unexpected) species. For taxonomic and biogeographical/ecological reasons, however, this checklist would exist solely within the context of the sampling event dataset.
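Taken together, the two steps let a data user infer absences per event: any checklist species with no occurrence record for an event is treated as not observed there. The sketch below illustrates that inference under the assumption that names in the checklist and in the occurrence records match exactly; the function name and sample data are hypothetical.

```python
def infer_absences(checklist, events, occurrences):
    """Map each eventID to the checklist species not recorded for that event."""
    observed = {event["eventID"]: set() for event in events}
    for occ in occurrences:
        observed[occ["eventID"]].add(occ["scientificName"])
    # Events with no occurrences stay in the result, so every checklist
    # species is inferred absent for them.
    return {event_id: sorted(set(checklist) - seen)
            for event_id, seen in observed.items()}


checklist = ["Erithacus rubecula", "Parus major", "Sitta europaea"]
events = [{"eventID": "E1"}, {"eventID": "E2"}]
occurrences = [
    {"eventID": "E1", "scientificName": "Parus major"},
    {"eventID": "E1", "scientificName": "Sitta europaea"},
]

absences = infer_absences(checklist, events, occurrences)
print(absences)
```

Event E2 has no occurrence records, so all three checklist species are inferred absent for it, which is why empty sampling events should still be published.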

Instructions on how to create a checklist can be found here. Detailed metadata should be included with the checklist describing a) the people who performed the identifications and their taxonomic expertise and b) how it was decided that these species were detectable and identifiable at the time and place of sampling.

To link the checklist to the sampling event dataset, add the checklist to the dataset metadata in the External links section.