Table of contents
- How to transform your data into occurrence data
- Required DwC fields
- Recommended DwC fields
- Exemplar datasets
Occurrence datasets are resources that present evidence of the occurrence of a species at a particular place, normally on a specified date. These datasets expand on most Checklist Data because they contribute to mapping the historical or current distribution of a species. At the most basic, such datasets may provide only general locality information (even limited to a country identifier). Ideally they also include coordinates and a coordinate precision to support fine-scale mapping. In many cases, these datasets may separately record multiple individuals of the same species. Examples include databases of specimens in natural history collections, citizen science observations, and data from species atlas projects. If sufficient information exists in the source dataset (or applies consistently to all occurrences in the dataset), it is recommended that these datasets be presented as Sampling Event Data. These datasets include the same basic descriptive information included under Resource Metadata.
How to transform your data into occurrence data
Ultimately your data needs to be transformed into a table structure using Darwin Core (DwC) term names as column names.
Alternatively, if your data is stored in a supported database, you can write an SQL table (view) using DwC column names. Be careful to include all required DwC fields and add as many recommended DwC fields as possible.
For extra guidance, you can look at the exemplar datasets.
You can augment your table with extra DwC columns, but only with DwC terms from this list.
Populate it and upload it to the IPT. Try to augment it with as many DwC terms as you can.
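As a rough illustration of the transformation described above, the following Python sketch renames source columns to DwC term names and writes the result as a CSV table. The source column names (`record_id`, `species`, `date_observed`, `lat`, `lon`) and the mapping itself are hypothetical; adapt them to your own data.

```python
import csv
import io

# Hypothetical mapping from source column names to Darwin Core term names.
COLUMN_MAP = {
    "record_id": "occurrenceID",
    "species": "scientificName",
    "date_observed": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_dwc_rows(rows):
    """Rename the keys of each source row to DwC term names, dropping unmapped columns."""
    return [
        {dwc: row[src] for src, dwc in COLUMN_MAP.items() if src in row}
        for row in rows
    ]


def write_dwc_csv(rows, handle):
    """Write DwC rows as CSV, using the DwC term names as the header row."""
    writer = csv.DictWriter(
        handle, fieldnames=list(COLUMN_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Made-up example record, for illustration only.
source_rows = [
    {
        "record_id": "obs-1",
        "species": "Puma concolor",
        "date_observed": "2018-05-01",
        "lat": "4.6",
        "lon": "-74.1",
        "notes": "not a DwC term, so it is dropped",
    }
]
dwc_rows = to_dwc_rows(source_rows)
buffer = io.StringIO()
write_dwc_csv(dwc_rows, buffer)
```

The text in `buffer` could then be saved to a file for upload. Columns without a mapping are dropped rather than passed through, so that only DwC term names appear in the header.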
Required DwC fields:
Recommended DwC fields:
- taxonRank - to substantiate scientificName
- kingdom - and other higher taxonomy if possible
- decimalLatitude & decimalLongitude & geodeticDatum - to provide a specific point location
- individualCount / organismQuantity & organismQuantityType - to record the quantity of a species occurrence
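Before publishing point locations, it can help to sanity-check them. The sketch below is one possible check, not an official validation routine: it assumes each record is a Python dictionary keyed by DwC term name, and the function name `check_point_location` is made up for this example.

```python
def check_point_location(record):
    """Return a list of problems found in the point-location fields of one record."""
    try:
        lat = float(record.get("decimalLatitude", ""))
        lon = float(record.get("decimalLongitude", ""))
    except (TypeError, ValueError):
        return ["decimalLatitude and decimalLongitude must both be numbers"]

    problems = []
    # Latitude is measured from -90 to 90 degrees, longitude from -180 to 180.
    if not -90 <= lat <= 90:
        problems.append("decimalLatitude is outside the range -90 to 90")
    if not -180 <= lon <= 180:
        problems.append("decimalLongitude is outside the range -180 to 180")
    # Coordinates are ambiguous without the datum they refer to.
    if not record.get("geodeticDatum"):
        problems.append("geodeticDatum is missing")
    return problems
```

An empty list means the three fields passed these basic checks; it does not guarantee that the coordinates are correct.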
Q. How do I indicate a species was absent?
Q. How can I generalize sensitive species occurrence data?
A. How you generalize sensitive species data (e.g. restrict the resolution of the data) depends on the species' category of sensitivity. Where there is low risk of perverse outcomes, unrestricted publication of sensitive species data remains appropriate. Note that it is the responsibility of the publisher to protect sensitive species occurrence data. For guidance, please refer to this best-practice guide. You can also refer to this recent essay in Science, which presents a simplified scheme for assessing the risks of publishing sensitive species data.
When generalizing data, you should try not to reduce the value of the data for analysis, and you should make users aware of how and why the original record was modified, using the Darwin Core term informationWithheld.
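One way to keep the generalization and its documentation together is sketched below. It assumes a record is a Python dictionary keyed by DwC term name; the function name, the choice of rounding decimal degrees, and the wording of the informationWithheld note are all illustrative assumptions, not a prescribed method.

```python
def generalize_record(record, decimals=1):
    """Return a copy of a record with rounded coordinates and a note explaining the change."""
    generalized = dict(record)  # leave the original record untouched
    for term in ("decimalLatitude", "decimalLongitude"):
        generalized[term] = round(float(record[term]), decimals)
    # Tell users how and why the record was modified (example wording only).
    generalized["informationWithheld"] = (
        f"Coordinates rounded to {decimals} decimal place(s) "
        "because the species is sensitive"
    )
    return generalized
```

Returning a copy means the precise record stays available to the data custodian, while only the generalized version is published.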
As indicated in the best-practice guide, you should also publish a checklist of the sensitive species being generalized. For each species you should explain:
- the rationale for inclusion in the list
- the geographic coverage of sensitivity
- its sensitivity category
- the date to review its sensitivity
This will help alert other data custodians that these species are regarded as potentially sensitive in a certain area and that they should take the sensitivity into account when publishing the results of their analyses, etc.
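The four points above could be captured in a simple structure such as the following sketch. The class and field names are hypothetical and are not part of Darwin Core; they only mirror the items in the list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SensitiveSpeciesEntry:
    """One row of a sensitive-species checklist (illustrative field names)."""

    scientific_name: str
    rationale: str             # why the species is included in the list
    geographic_coverage: str   # where the sensitivity applies
    sensitivity_category: str  # how sensitive the species is considered
    review_date: date          # when the sensitivity should be reviewed


def due_for_review(entries, today):
    """Return the entries whose review date is on or before the given day."""
    return [entry for entry in entries if entry.review_date <= today]
```

A helper like `due_for_review` makes it easier to act on the review dates instead of leaving them as static text.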
Helpful formulas for generalizing point location
The following formula obscures a point location by snapping it to a 5000 m grid. Note that pointX and pointY must be provided as lengths in meters, not in decimal degrees, and that TRUNC truncates a number to an integer by removing its decimal part:
pointX = TRUNC(pointX / 5000) * 5000
pointY = TRUNC(pointY / 5000) * 5000
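The same calculation can be written in Python, where `math.trunc` plays the role of TRUNC. This is a minimal sketch: the function name and the `grid_size_m` parameter are assumptions added for illustration, and the sample coordinates are made up.

```python
import math


def generalize_point(point_x, point_y, grid_size_m=5000):
    """Snap a point given in meters to a grid by truncating toward zero."""
    x = math.trunc(point_x / grid_size_m) * grid_size_m
    y = math.trunc(point_y / grid_size_m) * grid_size_m
    return x, y


# Made-up coordinates in meters.
generalized = generalize_point(603217.4, 4512890.9)  # (600000, 4510000)
```

Because truncation removes the decimal part, negative coordinates move toward zero: a value of -12345.6 becomes -10000, not -15000.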