Data Quality Flags

Dan Stoner edited this page Aug 14, 2018 · 32 revisions

Purpose

This document describes how iDigBio identifies known data quality issues in ingested specimen data and how it represents them in the iDigBio Search API. During ingestion, iDigBio often encounters data that are missing, inconsistent, factually incorrect, or out of compliance with metadata standards and controlled vocabularies. To facilitate indexing, iDigBio corrects these data and flags them in the Search API. For example, missing taxonomic names are added from the GBIF Backbone Taxonomy, and common misspellings are replaced (e.g. "Flordia" is replaced with "Florida").

The following summary shows how often each flag has been assigned to records in iDigBio:

http://search.idigbio.org/v2/summary/top/records?top_fields=[%22flags%22]&count=1000
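The same summary request can be built programmatically. The Python sketch below relies only on the endpoint and the two query parameters visible in the link above: `top_fields`, a JSON array of field names, and `count`, left at the link's value of 1000. The function name is hypothetical. `urlencode` percent-encodes the brackets as well as the quotes; the server is expected to decode that the same way as the hand-written `%22` form, but that is an assumption worth checking.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the summary link above.
SUMMARY_TOP_URL = "http://search.idigbio.org/v2/summary/top/records"


def flag_summary_url(count: int = 1000) -> str:
    """Build the flag-frequency summary URL (hypothetical helper name)."""
    params = {
        # top_fields is a JSON array; compact separators match the link above.
        "top_fields": json.dumps(["flags"], separators=(",", ":")),
        "count": count,
    }
    return SUMMARY_TOP_URL + "?" + urlencode(params)


print(flag_summary_url())
```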

General guidelines for flag names:

  1. a flag whose name includes "added" means the field was empty in the provided data and iDigBio added a value to help fully populate the record. This enhances searching and discovery.
  2. a flag named with "replaced" means the field contained data from the provider and iDigBio attempted to make it more consistent by replacing the value. Note that the original data values are always available in the raw data. The replaced values are designed to enhance searching and discovery.
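These two naming conventions can be checked mechanically. A minimal Python sketch, assuming (as holds for every such flag in the table below) that the keyword appears as a trailing `_added` or `_replaced` suffix; flags with other endings, such as `geopoint_bounds` or `rev_geocode_flip`, fall into an "other" group. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def flag_action(flag: str) -> Optional[str]:
    """Return 'added' or 'replaced' if the flag follows one of the two
    naming guidelines, otherwise None."""
    for action in ("added", "replaced"):
        if flag.endswith("_" + action):
            return action
    return None


def split_by_action(flags: Iterable[str]) -> Dict[str, List[str]]:
    """Group a record's flags into added / replaced / other buckets."""
    groups: Dict[str, List[str]] = {"added": [], "replaced": [], "other": []}
    for flag in flags:
        groups[flag_action(flag) or "other"].append(flag)
    return groups


print(split_by_action(["dwc_class_added", "dwc_class_replaced", "geopoint_bounds"]))
```

Grouping this way separates values iDigBio filled in from values it changed; in both cases the provider's original values remain in the raw data.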

Flags

The table below describes the flags that might be added to records in iDigBio:

Flag Definition
datecollected_bounds Date Collected is outside the range 1700-01-02 to the date of indexing.
dwc_basisofrecord_paleo_conflict Basis of Record was not FossilSpecimen, but the record contains paleo context terms.
dwc_acceptednameusageid_added tbd
dwc_basisofrecord_invalid tbd
dwc_basisofrecord_removed tbd
dwc_class_added Darwin Core Class Added. http://terms.tdwg.org/wiki/dwc:class
dwc_class_replaced Darwin Core Class Corrected.
dwc_continent_added Darwin Core Continent Added. http://terms.tdwg.org/wiki/dwc:continent
dwc_continent_replaced Darwin Core Continent Corrected.
dwc_country_added Darwin Core Country Added. http://terms.tdwg.org/wiki/dwc:country
dwc_country_replaced Darwin Core Country Corrected.
dwc_datasetid_added tbd
dwc_datasetid_replaced tbd
dwc_family_added tbd
dwc_family_replaced tbd
dwc_genus_added tbd
dwc_genus_replaced tbd
dwc_infraspecificepithet_added tbd
dwc_infraspecificepithet_replaced tbd
dwc_kingdom_added Darwin Core Kingdom Added. http://terms.tdwg.org/wiki/dwc:kingdom
dwc_kingdom_replaced Darwin Core Kingdom Corrected.
dwc_kingdom_suspect tbd
dwc_multimedia_added tbd
dwc_order_added Darwin Core Order Added. http://terms.tdwg.org/wiki/dwc:order
dwc_order_replaced Darwin Core Order Corrected.
dwc_originalnameusageid_added tbd
dwc_parentnameusageid_added tbd
dwc_phylum_added Darwin Core Phylum Added. http://terms.tdwg.org/wiki/dwc:phylum
dwc_phylum_replaced Darwin Core Phylum Corrected.
dwc_scientificnameauthorship_added tbd
dwc_specificepithet_added tbd
dwc_specificepithet_replaced tbd
dwc_stateprovince_replaced Darwin Core State or Province Corrected.
dwc_taxonid_added tbd
dwc_taxonid_replaced tbd
dwc_taxonomicstatus_added tbd
dwc_taxonomicstatus_replaced tbd
dwc_taxonrank_added tbd
dwc_taxonrank_invalid The supplied Taxon Rank is not in the controlled vocabulary.
dwc_taxonrank_removed tbd
dwc_taxonrank_replaced tbd
dwc_taxonremarks_added tbd
dwc_taxonremarks_replaced tbd
gbif_canonicalname_added tbd
gbif_genericname_added tbd
gbif_reference_added tbd
gbif_taxon_corrected tbd
gbif_vernacularname_added tbd
geopoint_0_coord Geographic Coordinate had literal '0' values.
geopoint_bounds Geographic Coordinate was out of bounds.
geopoint_datum_error Geographic Coordinate has an invalid Geodetic Datum. http://terms.tdwg.org/wiki/geodeticDatum
geopoint_datum_missing Geographic Coordinate is missing a Geodetic Datum (assumed to be WGS84).
geopoint_low_precision Geographic Coordinate has Low Precision.
geopoint_pre_flip Prior to examining other factors, the magnitude of the latitude was determined to be greater than 90 and the magnitude of the longitude less than 90, so their values were swapped.
geopoint_similar_coord Geographic Coordinate had similar latitude and longitude (+/- lat == +/- lon).
idigbio_isocountrycode_added iDigBio ISO 3166-1 alpha-3 Country Code Added (see the iDigBio correction table).
rev_geocode_both_sign Geographic Coordinate had its Latitude and Longitude negated to place it in the correct country.
rev_geocode_corrected The reverse geocoding process was able to find a coordinate operation that placed the point within the stated country.
rev_geocode_eez The reverse-geocoded point does not fall within the land borders of a country, but does fall inside a country's exclusive economic zone (EEZ) water boundary (approx. 200 nautical miles from shore).
rev_geocode_eez_corrected The reverse geocoding process was able to find a coordinate operation that placed the point within the stated country's exclusive economic zone.
rev_geocode_failure The point could not be reverse geocoded to any country.
rev_geocode_flip Geographic Coordinate had its Latitude and Longitude swapped to place it in the correct country.
rev_geocode_flip_both_sign Geographic Coordinate had its Latitude and Longitude both swapped and negated to place it in the correct country.
rev_geocode_flip_lat_sign Geographic Coordinate had its Latitude and Longitude swapped, and its Latitude negated, to place it in the correct country.
rev_geocode_flip_lon_sign Geographic Coordinate had its Latitude and Longitude swapped, and its Longitude negated, to place it in the correct country.
rev_geocode_lat_sign Geographic Coordinate had its Latitude negated to place it in the correct country.
rev_geocode_lon_sign Geographic Coordinate had its Longitude negated to place it in the correct country.
rev_geocode_mismatch Geographic Coordinate did not reverse geocode to the correct country.
scientificname_added Scientific name added by concatenating genus and species.
taxon_match_failed tbd
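To make the geographic flags more concrete, here is a simplified Python sketch. It is not iDigBio's ingestion code, and several choices in it are assumptions: `geopoint_0_coord` is raised when either coordinate is exactly 0; the pre-flip test swaps when the latitude is outside +/-90 but the longitude would be a valid latitude; the order in which coordinate operations are tried is invented; in the `flip_*_sign` operations the negation is applied after the swap; and `in_country` is a hypothetical predicate standing in for a real lookup against country boundary polygons.

```python
from typing import Callable, Set, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def geopoint_flags(lat: float, lon: float) -> Tuple[Point, Set[str]]:
    """Simplified geopoint_* checks; returns the (maybe swapped) point and flags."""
    flags: Set[str] = set()
    # Assumption: either coordinate being exactly 0 raises geopoint_0_coord.
    if lat == 0 or lon == 0:
        flags.add("geopoint_0_coord")
    # +/- lat == +/- lon
    if abs(lat) == abs(lon):
        flags.add("geopoint_similar_coord")
    # Latitude is impossible but longitude would be a valid latitude: swap.
    if abs(lat) > 90 and abs(lon) < 90:
        lat, lon = lon, lat
        flags.add("geopoint_pre_flip")
    # Valid ranges: latitude within +/-90, longitude within +/-180.
    if abs(lat) > 90 or abs(lon) > 180:
        flags.add("geopoint_bounds")
    return (lat, lon), flags


# Coordinate operations mapped to the rev_geocode_* flag each one implies.
# Assumptions: this trial order, and negation applied after a swap.
OPERATIONS = [
    ("rev_geocode_lat_sign", lambda lat, lon: (-lat, lon)),
    ("rev_geocode_lon_sign", lambda lat, lon: (lat, -lon)),
    ("rev_geocode_both_sign", lambda lat, lon: (-lat, -lon)),
    ("rev_geocode_flip", lambda lat, lon: (lon, lat)),
    ("rev_geocode_flip_lat_sign", lambda lat, lon: (-lon, lat)),
    ("rev_geocode_flip_lon_sign", lambda lat, lon: (lon, -lat)),
    ("rev_geocode_flip_both_sign", lambda lat, lon: (-lon, -lat)),
]


def reverse_geocode_fix(
    lat: float, lon: float, in_country: Callable[[float, float], bool]
) -> Tuple[Point, Set[str]]:
    """Try each operation until the point lands in the stated country.

    in_country is a hypothetical predicate: "does this point lie inside
    the country named in the record?"
    """
    if in_country(lat, lon):
        return (lat, lon), set()
    for flag, op in OPERATIONS:
        new_lat, new_lon = op(lat, lon)
        if in_country(new_lat, new_lon):
            return (new_lat, new_lon), {flag, "rev_geocode_corrected"}
    return (lat, lon), {"rev_geocode_mismatch"}


# Toy stand-in for a country: a lat/lon bounding box, not real borders.
def in_box(lat: float, lon: float) -> bool:
    return 20 <= lat <= 30 and -90 <= lon <= -80


print(geopoint_flags(120.0, 45.0))
print(reverse_geocode_fix(25.0, 85.0, in_box))
```

Returning the candidate point together with its flags mirrors how the table describes each correction: the stored coordinate changes, and the flag records which operation produced it. The EEZ and failure cases are left out of this sketch.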

Query Examples

Searching for records with the flag scientificname_added:

{
  "flags":"scientificname_added"
}
http://search.idigbio.org/v2/search/records?rq={%22flags%22:%22scientificname_added%22}
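The URL above is just the JSON query passed as the `rq` parameter. A minimal Python sketch of building it and running the search, using only the endpoint shown above; the helper names are hypothetical, and `search_records` makes a real network call.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_RECORDS_URL = "http://search.idigbio.org/v2/search/records"


def search_url(rq: dict) -> str:
    """Serialize an rq query object and percent-encode it as the rq parameter."""
    rq_json = json.dumps(rq, separators=(",", ":"))  # compact, like the link above
    return SEARCH_RECORDS_URL + "?" + urlencode({"rq": rq_json})


def search_records(rq: dict) -> dict:
    """Run the search and return the decoded JSON response (network call)."""
    with urlopen(search_url(rq)) as resp:
        return json.load(resp)


print(search_url({"flags": "scientificname_added"}))
```

Calling `search_records({"flags": "scientificname_added"})` returns the parsed response. We assume it carries a total match count and a list of records (`itemCount` and `items`); confirm this against a live response before depending on it.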

Searching records in a specific recordset (identified by its UUID) that are flagged with scientificname_added:

{
  "flags":"scientificname_added",
  "recordset":"c38b867b-05f3-4733-802e-d8d2d3324f84"
}
http://search.idigbio.org/v2/search/records?rq={%22flags%22:%22scientificname_added%22,%22recordset%22:%22c38b867b-05f3-4733-802e-d8d2d3324f84%22}
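To collect every flagged record in a recordset, results can be paged. This sketch assumes the search endpoint accepts `limit` and `offset` query parameters and that each response carries `itemCount` (total matches) and `items` (the current page); verify both against the Search API documentation. The `fetch` argument lets the paging logic run without a network connection, and `fetch_page` and `iter_flagged` are illustrative names.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_RECORDS_URL = "http://search.idigbio.org/v2/search/records"


def fetch_page(rq: dict, limit: int, offset: int) -> dict:
    """Fetch one page of results (limit/offset are assumed parameters)."""
    qs = urlencode({
        "rq": json.dumps(rq, separators=(",", ":")),
        "limit": limit,
        "offset": offset,
    })
    with urlopen(SEARCH_RECORDS_URL + "?" + qs) as resp:
        return json.load(resp)


def iter_flagged(
    recordset: str,
    flag: str,
    limit: int = 100,
    fetch: Callable[[dict, int, int], dict] = fetch_page,
) -> Iterator[dict]:
    """Yield every record in `recordset` carrying `flag`, page by page."""
    rq = {"flags": flag, "recordset": recordset}
    offset = 0
    while True:
        page = fetch(rq, limit, offset)
        items = page.get("items", [])
        yield from items
        offset += len(items)
        # Stop on an empty page or once itemCount records have been seen.
        if not items or offset >= page.get("itemCount", 0):
            break
```

For example, `for rec in iter_flagged("c38b867b-05f3-4733-802e-d8d2d3324f84", "scientificname_added"): ...` walks the same query as the URL above, one page at a time.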