Pelagios Gazetteer Interconnection Format
Gazetteers form the backbone of the Pelagios initiative. Through shared gazetteer references, we create connections between otherwise disconnected datasets.
There are many gazetteers out there, and there are good reasons for this diversity: geographical and temporal coverage, granularity, cultural focus, technical emphasis (e.g. emphasis on names vs. geometry), scholarly quality, community, and so on.
This is why Pelagios needs different gazetteers to interoperate with each other at a basic level, so that we can build tools and infrastructure that allow everyone to:
- search across different gazetteers
- find enough information in order to identify and disambiguate places
- annotate data with stable URIs to the most appropriate gazetteer
Our goal is not to define The One unified data model to represent gazetteers. What we aim for is simply a uniform way to build links between different gazetteers, along with just enough additional metadata to support the three requirements above.
A current reference implementation of cross-gazetteer search is part of the upcoming Peripleo API. (See screenshot above, which shows the overview page for Carnuntum, as covered by the different gazetteers linked to Pelagios.)
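As a rough illustration of what cross-gazetteer search needs from the links between gazetteers, the following Python sketch groups place URIs that are connected by match links into clusters. It is not a description of how Peripleo works internally. The function name `cluster_places` is made up for illustration, and treating every link as transitive is a simplifying assumption (reasonable for skos:exactMatch, questionable for skos:closeMatch).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def cluster_places(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs into clusters, given (uri_a, uri_b) match links.

    Uses a small union-find structure; every link is treated as
    symmetric and transitive (an assumption, see the text above).
    """
    parent: Dict[str, str] = {}

    def find(uri: str) -> str:
        # Register unseen URIs as their own cluster root.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    clusters: Dict[str, Set[str]] = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return list(clusters.values())
```

Each resulting cluster could then back a single search result that lists the place as covered by several gazetteers.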
A Simple Example
To publish a gazetteer to Pelagios, you need to create a summary of it in RDF, and publish it online as a dump file. The example below is a "dump file" with just a single place.
@prefix cito: <http://purl.org/spar/cito/> .
@prefix cnt: <http://www.w3.org/2011/content#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geosparql: <http://www.opengis.net/ont/geosparql#> .
@prefix gn: <http://www.geonames.org/ontology#> .
@prefix lawd: <http://lawd.info/ontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://www.mygazetteer.org/place/Athens> a lawd:Place ;

  # Don't think of label and description in terms of a
  # 'primary name' or detailed abstract. Think of this in
  # terms of UI: what do you want users to see about your
  # place in a list of search results?
  rdfs:label "Athens"@en ;
  dcterms:description "A major Greek city-state"@en ;

  # Optional: a present-day (ISO-3166 alpha2) country code
  gn:countryCode "GR" ;

  # Don't think of this in terms of 'how long your place
  # existed'. Use it to specify the period your gazetteer
  # is concerned with and/or provides attestations for.
  # In terms of format, use ISO 8601 (YYYY[-MM-DD]) or time
  # interval (<start>/<end>).
  dcterms:temporal "-750/640" ;

  # Additionally, we encourage the use of (one or multiple)
  # PeriodO identifiers to denote time periods
  dcterms:temporal <http://n2t.net/ark:/99152/p03wskd389m> ; # Greco-Roman

  # Use closeMatch to express 'vague' matches, e.g. to link
  # to a modern-day town now located there
  skos:closeMatch <http://sws.geonames.org/264371/> ;

  # Use exactMatch to express (geographical, temporal, cultural)
  # identity
  skos:exactMatch <http://pleiades.stoa.org/places/579885> ;

  # Attestations can apply to individual names (as in the example
  # below), but they may also apply to the place as a whole.
  # You can also provide variant names using lawd:variantForm.
  # For language encoding, use RFC 5646 format.
  lawd:hasName [ lawd:primaryForm "Athens"@en ] ;
  lawd:hasName [ lawd:primaryForm "Athenae" ] ;
  lawd:hasName [
    lawd:primaryForm "Αθήνα"@el ;
    lawd:hasAttestation <http://www.mygazetteer.org/att/0001>
  ] ;

  # Optional: a representative point coordinate
  geo:location [ geo:lat 5.16 ; geo:long 52.05 ] ;

  # Optional: detail geometry as WKT string
  # (alternatively, use osgeo:asGeoJSON for a GeoJSON string)
  geosparql:hasGeometry [
    geosparql:asWKT "LINESTRING (5.16 52.05, 5.17 52.05, 5.16 52.06)" ;
  ] ;

  foaf:primaryTopicOf <http://www.mygazetteer.org/place/Athens.html> ;
  dcterms:isPartOf <http://www.mygazetteer.org/place/Greece> ;
  .

<http://www.mygazetteer.org/att/0001> a lawd:Attestation ;
  dcterms:publisher <http://www.mygazetteer.org/> ;
  cito:citesAsEvidence <http://www.mygazetteer.org/documents/01234> ;
  cnt:chars "Αθήνα" .
The example above shows just some of the very basics. If you want, you can pack in a lot more data about your places!
- Example: adding image links
- Example: publishing bibliographic references
- Adding timestamps to names or geometries
- Adding source information to names or geometries
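The comments in the basic example ask for dcterms:temporal literals that are either a single ISO 8601 date or a `<start>/<end>` interval. A consumer might reduce such a literal to a pair of years as sketched below. This is a minimal Python sketch under assumptions: the names `parse_year` and `parse_temporal` are hypothetical, only the year part is kept, and no conversion between year-numbering conventions for BC dates is attempted.

```python
import re
from typing import Tuple

# Assumed shape: optional minus sign, year digits, optional -MM and -DD parts.
_DATE = re.compile(r"^(-?\d+)(?:-(\d{2})(?:-(\d{2}))?)?$")


def parse_year(value: str) -> int:
    """Return the (possibly negative) year of a single date string."""
    match = _DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a recognised date: {value!r}")
    return int(match.group(1))


def parse_temporal(value: str) -> Tuple[int, int]:
    """Return (start_year, end_year) for a dcterms:temporal literal.

    A single date yields the same year twice; an interval is split
    on the '/' separator.
    """
    start, separator, end = value.partition("/")
    if separator:
        return parse_year(start), parse_year(end)
    year = parse_year(start)
    return year, year
```

For the literal used in the example, `parse_temporal("-750/640")` yields the pair `(-750, 640)`. PeriodO URIs used as dcterms:temporal objects are resources rather than literals and are outside the scope of this sketch.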
The Digital Atlas of the Roman Empire publishes its gazetteer in Pelagios format. The dump file (gzipped RDF/Turtle, 1.5MB, ~27,000 places) is available at http://dare.ht.lu.se/export_pelagios3.ttl.gz
We maintain a Python script that converts the native dump format of the iDAI gazetteer to Pelagios here. The script will not be re-usable directly, but should provide a good starting point for other conversions.
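To give an idea of the shape such a conversion can take, here is a minimal Python sketch that turns one place record into a Turtle snippet using the properties from the basic example. It is not the iDAI script: the record layout (`uri`, `label`, `lang`, `names`, `exact_matches`, `lat`, `lon`) and the function names are assumptions made for illustration, and the output presumes that the prefixes from the basic example are declared at the top of the dump file.

```python
from typing import Optional


def turtle_literal(text: str, lang: Optional[str] = None) -> str:
    """Quote a string as a Turtle literal, with an optional language tag."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    literal = f'"{escaped}"'
    return f"{literal}@{lang}" if lang else literal


def place_to_turtle(record: dict) -> str:
    """Serialise one (hypothetical) native record as a lawd:Place."""
    lines = [
        f"<{record['uri']}> a lawd:Place ;",
        f"  rdfs:label {turtle_literal(record['label'], record.get('lang'))} ;",
    ]
    # Each name is a (text, language tag or None) pair.
    for name, lang in record.get("names", []):
        lines.append(
            f"  lawd:hasName [ lawd:primaryForm {turtle_literal(name, lang)} ] ;"
        )
    for uri in record.get("exact_matches", []):
        lines.append(f"  skos:exactMatch <{uri}> ;")
    if "lat" in record and "lon" in record:
        lines.append(
            f"  geo:location [ geo:lat {record['lat']} ; geo:long {record['lon']} ] ;"
        )
    lines.append("  .")
    return "\n".join(lines)
```

A real converter would loop over all records of the native dump, write the prefix declarations once, and append one such snippet per place.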
An on-line service that validates your .ttl file, so you know whether you've done it right.
Modern vs. Historical Names
There has been discussion about whether we need to distinguish between modern and historical names. Most historical gazetteers will usually focus on historical names only; modern names would be considered 'finding aids' for users, rather than actual gazetteer data. One suggestion, therefore, has been to use dcterms:spatial for modern names, in one of these two forms:
dcterms:spatial "Athens"@en ;
dcterms:spatial [ rdfs:label "Athens"@en ] ;
dcterms:spatial, by definition, expects an object of type dcterms:Location. The former example, with a plain literal as object, is therefore syntactically valid RDF but arguably at odds with the property's intended range. Still, it seems literal objects are widely used in the wild. (See e.g. use in Europeana.)
Constraining isPartOf Relation by Time
Our hierarchy model is deliberately kept as simple as possible (using dcterms:isPartOf or dcterms:hasPart). However, there is a clear use case for constraining the relation by time. ("Place A has been part of Place B between year X and Y", e.g. in terms of administrative units.) It's an open question how to model this as simply and straightforwardly as possible.
I.e., essentially what's needed is something like this ("RDF pseudo-code") that carries both the parent resource and the constraint as a payload.
dcterms:isPartOf "<http://maps.cga.harvard.edu/tgaz/placename/hvd_113652> part of Xihan 西汉 from -154 to -118 " ;
Note (RSi): compare qualified relations.
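To make the requirement concrete, the following Python sketch models a part-of relation that carries both the parent resource and a time constraint, and can answer "was A part of B in year Y?". It is only one possible reading of the open question, not an agreed Pelagios model. The class `PartOf`, the helper `parents_in` and the place URIs used with them are hypothetical, and `to_turtle` shows just one candidate encoding (standard RDF reification plus dcterms:temporal), assuming the rdf: and dcterms: prefixes are declared.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class PartOf:
    """A part-of relation, optionally limited to a range of years."""

    child: str                   # URI of the contained place
    parent: str                  # URI of the containing place
    start: Optional[int] = None  # first year the relation holds (None = open)
    end: Optional[int] = None    # last year the relation holds (None = open)

    def holds_in(self, year: int) -> bool:
        """True if the relation is valid in the given year (inclusive bounds)."""
        after_start = self.start is None or self.start <= year
        before_end = self.end is None or year <= self.end
        return after_start and before_end

    def to_turtle(self) -> str:
        """One candidate encoding: a reified statement with a time interval."""
        lines = [
            "[] a rdf:Statement ;",
            f"  rdf:subject <{self.child}> ;",
            "  rdf:predicate dcterms:isPartOf ;",
            f"  rdf:object <{self.parent}> ;",
        ]
        if self.start is not None and self.end is not None:
            lines.append(f'  dcterms:temporal "{self.start}/{self.end}" ;')
        lines.append("  .")
        return "\n".join(lines)


def parents_in(relations: Iterable[PartOf], child: str, year: int) -> List[str]:
    """Return the parents of `child` that are valid in `year`."""
    return [r.parent for r in relations if r.child == child and r.holds_in(year)]
```

With a relation such as `PartOf("http://www.mygazetteer.org/place/A", "http://www.mygazetteer.org/place/B", -154, -118)`, the parent is reported for any year inside the interval and omitted otherwise.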