Pelagios Gazetteer Interconnection Format

John Muccigrosso edited this page Jun 29, 2018 · 43 revisions

Version: 1.0.1

Gazetteers form the backbone of the Pelagios initiative. Through shared gazetteer references, we create connections between otherwise disconnected datasets.

There are many gazetteers out there, and there are good reasons for this diversity: geographical and temporal coverage, granularity, cultural focus, technical emphasis (e.g. on names vs. geometry), scholarly quality, community, and more.

This is why Pelagios needs different gazetteers to interoperate at a basic level, so that we can build tools and infrastructure that allow everyone to:

  • search across different gazetteers
  • find enough information in order to identify and disambiguate places
  • annotate data with stable URIs to the most appropriate gazetteer

Our goal is not to define The One unified data model to represent gazetteers. What we aim for is simply a uniform way to build links between different gazetteers, along with just enough additional metadata to support the three requirements above.

[Screenshot: Pelagios API 2.0]

A current reference implementation of cross-gazetteer search is part of the upcoming Peripleo API. (See screenshot above, which shows the overview page for Carnuntum, as covered by the different gazetteers linked to Pelagios.)
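This page does not describe how Peripleo groups records from different gazetteers. As a hypothetical illustration only, one simple approach is to treat every skos:exactMatch link (used in the example dump file below to express identity) as an undirected edge and compute connected components with union-find; each component then corresponds to one entry on an overview page like the Carnuntum one. Ignoring skos:closeMatch is an assumption of this sketch, and the example.org URIs are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def cluster_exact_matches(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs into clusters of places considered identical.

    `links` holds (place URI, matched URI) pairs, one per
    skos:exactMatch triple. Links are treated as undirected.
    """
    parent: Dict[str, str] = {}

    def find(uri: str) -> str:
        parent.setdefault(uri, uri)  # unseen URIs start as their own root
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    # Union step: merge the trees of both ends of every link.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect every URI under the root of its tree.
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return list(clusters.values())


example_links = [
    # From the example dump file below: mygazetteer -> Pleiades
    ("http://www.mygazetteer.org/place/Athens",
     "http://pleiades.stoa.org/places/579885"),
    # A hypothetical second gazetteer linking to the same Pleiades place
    ("http://example.org/gazetteer/athenai",
     "http://pleiades.stoa.org/places/579885"),
    # An unrelated, hypothetical pair
    ("http://example.org/gazetteer/carnuntum",
     "http://example.org/other/carnuntum"),
]

if __name__ == "__main__":
    for cluster in cluster_exact_matches(example_links):
        print(sorted(cluster))
```

Treating closeMatch links as edges too would merge, for example, an ancient city with the modern town now located on its site, which is exactly the distinction the two properties are meant to keep apart.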

A Simple Example

To publish a gazetteer to Pelagios, you need to create an RDF summary of it and publish it online as a dump file. The example below is a dump file, in Turtle syntax, with just a single place.

@prefix cito: <http://purl.org/spar/cito/> .
@prefix cnt: <http://www.w3.org/2011/content#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geosparql: <http://www.opengis.net/ont/geosparql#> .
@prefix gn: <http://www.geonames.org/ontology#> .
@prefix lawd: <http://lawd.info/ontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://www.mygazetteer.org/place/Athens> a lawd:Place ;

  # Don't think of label and description in terms of a 
  # 'primary name' or detailed abstract. Think of this in
  # terms of UI: what do you want users to see about your
  # place in a list of search results? 

  rdfs:label "Athens"@en ;

  dcterms:description "A major Greek city-state"@en ;

  # Optional: a present-day (ISO-3166 alpha2) country code
  gn:countryCode "GR" ;

  # Don't think of this in terms of 'how long your place
  # existed'. Use it to specify the period your gazetteer
  # is concerned with and/or provides attestations for.
  # In terms of format, use ISO 8601 (YYYY[-MM-DD]) or a time
  # interval (<start>/<end>).
  dcterms:temporal "-750/640" ;

  # Additionally, we encourage the use of one or more
  # PeriodO identifiers to denote time periods.
  dcterms:temporal <http://n2t.net/ark:/99152/p03wskd389m> ; # Greco-Roman

  # Use closeMatch to express 'vague' matches, e.g. to link 
  # to a modern-day town now located there
  skos:closeMatch <http://sws.geonames.org/264371/> ;

  # Use exactMatch to express (geographical, temporal, cultural)
  # identity
  skos:exactMatch <http://pleiades.stoa.org/places/579885> ;

  # Attestations can apply to individual names (as in the example
  # below), but they may also apply to the place as a whole.
  # You can also provide variant names using lawd:variantForm.
  # For language tags, use the RFC 5646 format.
  lawd:hasName [ lawd:primaryForm "Athens"@en ];
  lawd:hasName [ lawd:primaryForm "Athenae" ] ;
  lawd:hasName [ 
    lawd:primaryForm "Αθήνα"@el ; 
    lawd:hasAttestation <http://www.mygazetteer.org/att/0001>
  ] ;

  # Optional: a representative point coordinate
  geo:location [ geo:lat 52.05 ; geo:long 5.16 ] ;

  # Optional: detail geometry as WKT string
  # (alternatively, use osgeo:asGeoJSON for a GeoJSON string)
  geosparql:hasGeometry [
    geosparql:asWKT "LINESTRING (5.16 52.05, 5.17 52.05, 5.16 52.06)" ;
  ] ;

  foaf:primaryTopicOf
    <http://www.mygazetteer.org/place/Athens.html> ;

  dcterms:isPartOf <http://www.mygazetteer.org/place/Greece> ;
  .

<http://www.mygazetteer.org/att/0001> a lawd:Attestation ;
  dcterms:publisher <http://www.mygazetteer.org/> ;
  cito:citesAsEvidence
    <http://www.mygazetteer.org/documents/01234> ;
  cnt:chars "Αθήνα" 
  .
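The dcterms:temporal comment in the example allows either a single ISO 8601 date (YYYY[-MM-DD]) or a <start>/<end> interval. A minimal Python sketch of how a consumer might turn such a literal into a pair of signed years follows. It assumes year granularity is enough, accepts unpadded years such as "640" because the example itself uses them (strict ISO 8601 expects four digits), and returns negative years exactly as written, leaving the question of astronomical year numbering (where a year 0 exists) to the caller. PeriodO URIs given as dcterms:temporal values are not handled here; they would have to be resolved separately.

```python
import re
from typing import Tuple

# A signed year (1 to 4 digits, leniently) with an optional -MM-DD suffix,
# e.g. "-750", "640" or "1453-05-29". Month and day are matched but
# discarded: this sketch works at year granularity only.
_DATE = re.compile(r"^(-?\d{1,4})(?:-\d{2}-\d{2})?$")


def parse_year(value: str) -> int:
    """Return the signed year of a YYYY[-MM-DD] value."""
    match = _DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a YYYY[-MM-DD] value: {value!r}")
    return int(match.group(1))


def parse_temporal(value: str) -> Tuple[int, int]:
    """Return (start_year, end_year) for a dcterms:temporal literal.

    A single date is treated as an interval that starts and ends
    in the same year.
    """
    if "/" in value:
        start, end = value.split("/", 1)
        bounds = (parse_year(start), parse_year(end))
    else:
        year = parse_year(value)
        bounds = (year, year)
    if bounds[0] > bounds[1]:
        raise ValueError(f"interval starts after it ends: {value!r}")
    return bounds


if __name__ == "__main__":
    print(parse_temporal("-750/640"))  # (-750, 640)
```

Rejecting intervals whose start lies after their end is a deliberate choice here, so that malformed dumps fail loudly instead of producing empty ranges.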

Advanced Techniques

The example above shows just some of the very basics. If you want, you can pack in a lot more data about your places!

Resources

Open Questions

Modern vs. Historical Names

There has been discussion about whether we need to distinguish between modern name(s) and historical names. Most historical gazetteers focus on historical names only; modern names would be considered 'finding aids' for users, rather than actual gazetteer data. One suggestion, therefore, has been to use dcterms:spatial for modern names:

dcterms:spatial "Athens"@en ;

or

dcterms:spatial [ rdfs:label "Athens"@en ] ;

Note that dcterms:spatial is defined with dcterms:Location as its range, so a literal object, as in the first example, does not fit the term's definition, although the triple is still syntactically valid RDF. Literal objects nonetheless seem to be widely used in the wild (see e.g. their use in Europeana).
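Whichever form is recommended, a consumer harvesting dumps from several gazetteers may meet both. As a hypothetical sketch, normalizing the two forms to a single list of language-tagged names could look like the following; the Literal and Node classes are stand-ins for whatever RDF library you actually use and are not part of any specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal with an optional language tag."""
    value: str
    lang: Optional[str] = None


@dataclass
class Node:
    """Hypothetical stand-in for a resource or blank node, reduced to a
    mapping from predicate URI to the list of its objects."""
    properties: Dict[str, List["Term"]] = field(default_factory=dict)


Term = Union[Literal, Node]


def modern_names(spatial_values: List[Term]) -> List[Literal]:
    """Collect modern names from dcterms:spatial objects, accepting both
    the literal form and the [ rdfs:label ... ] form."""
    names: List[Literal] = []
    for value in spatial_values:
        if isinstance(value, Literal):
            names.append(value)  # literal form: use it directly
        else:
            # Resource form: take its rdfs:label literals, if any.
            names.extend(
                label for label in value.properties.get(RDFS_LABEL, [])
                if isinstance(label, Literal)
            )
    return names


# The two variants discussed above, as this sketch would represent them.
literal_form = [Literal("Athens", "en")]
node_form = [Node({RDFS_LABEL: [Literal("Athens", "en")]})]
```

Both inputs yield the same list of names, so downstream search code does not need to know which form a given gazetteer chose.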

Constraining isPartOf Relation by Time

Our hierarchy model is deliberately kept as simple as possible (using dcterms:isPartOf or dcterms:hasPart). However, there is a clear use case for constraining the relation by time, e.g. for administrative units: "Place A was part of Place B between year X and year Y". It is an open question how to model this as simply and straightforwardly as possible.

In essence, what is needed is something like the following "RDF pseudo-code", which carries both the parent resource and the time constraint as its payload:

dcterms:isPartOf "<http://maps.cga.harvard.edu/tgaz/placename/hvd_113652> part of Xihan 西汉 from -154 to -118 " ;

.

Note (RSi): compare qualified relations.
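In the qualified relations pattern, a direct link is supplemented (or replaced) by an intermediate node that carries the related resource plus any qualifying data, much as PROV-O pairs direct properties like prov:wasDerivedFrom with qualified forms like prov:qualifiedDerivation. No vocabulary for a qualified isPartOf has been agreed here, so the Python sketch below only shows the shape of such a payload and how a consumer could query it. The class and field names are hypothetical, and treating both year bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QualifiedPartOf:
    """Hypothetical qualified isPartOf relation: the parent place plus the
    period during which the relation held. None means open-ended."""
    parent: str
    start: Optional[int] = None  # first year, inclusive (assumption)
    end: Optional[int] = None    # last year, inclusive (assumption)

    def holds_in(self, year: int) -> bool:
        """True if the place was part of `parent` in the given year."""
        if self.start is not None and year < self.start:
            return False
        if self.end is not None and year > self.end:
            return False
        return True


def parents_in(relations: List[QualifiedPartOf], year: int) -> List[str]:
    """Return the parents a place belonged to in a given year."""
    return [r.parent for r in relations if r.holds_in(year)]


# Parent resource and years taken from the pseudo-code example above.
PARENT_URI = "http://maps.cga.harvard.edu/tgaz/placename/hvd_113652"
relations = [QualifiedPartOf(parent=PARENT_URI, start=-154, end=-118)]
```

Keeping an unqualified dcterms:isPartOf alongside the qualified node would let simple consumers ignore the time constraint entirely, which fits the goal of keeping the basic hierarchy model simple.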