sommergeo/roceeh2wiki

roceeh2wiki

This repo contains tools to publish geodata from the ROCEEH Out of Africa Database (ROAD) to Wikipedia maps. The tools query data from ROAD's SPARQL endpoint and convert the results to the JSON schema of Kartographer, Wikipedia's map extension. The JSON files can be pasted into Wikimedia Commons and are then linked from the Wikis.

Workflow of the roceeh2wiki package

Results

The following Wikis are currently provided:

| ROAD Content             | Wikimedia | Wikipedia         |
|--------------------------|-----------|-------------------|
| Ahmarian                 | Link      |                   |
| Aurignacian              | Link      | en                |
| Aterian                  | Link      | en de fr it es pt |
| Chatelperronian          | Link      | en de fr it es pt |
| Early Upper Paleolithic  | Link      |                   |
| Fauresmith               | Link      | en de fr es       |
| Gravettian               | Link      | en de fr it es pt |
| Howiesonspoort           | Link      | en de fr it       |
| Initial Upper Paleolithic| Link      |                   |
| Levantine Aurignacian    | Link      |                   |
| Micoquian                | Link      | en de fr es       |
| Proto-Aurignacian        | Link      |                   |
| Solutrean                | Link      |                   |
| Still Bay                | Link      | en fr es          |
| Uluzzian                 | Link      | en de fr es pt    |

Use

The program uses the script roceeh2wiki.py to iterate through wiki_cultures.xlsx, a list of archaeological cultures to be queried from the ROAD database and published on Wikipedia. The spreadsheet contains 5 columns:

  • use indicates whether a culture should be queried or not. We aim to provide a high level of quality and ask specialists to evaluate the resulting maps. Therefore the maps are released step by step.
  • enwiki_title and dewiki_title are the Wikis in the English and German Wikipedia that correspond to cultures stored in ROAD. They are the targets of our maps.
  • road_culture contains cultures to be queried from the ROAD attribute archaeological_layer$archstratigraphy_idarchstrat (or its RDF realisation road:ArchaeologicalLayer\#archstratigraphyIdArchstrat).
  • description contains a JSON-style dictionary with the map's names in different languages.
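
As a minimal sketch of how one spreadsheet row could be interpreted, assume the use flag is stored as the string "T" and the description cell holds valid JSON. The helper name parse_row and the sample row are hypothetical and are not taken from roceeh2wiki.py.

```python
import json


def parse_row(row):
    """Return (road_culture, descriptions) for a row flagged for use, else None.

    Assumption: `use` is the string "T" for rows to process and `description`
    is a JSON object mapping language codes to map names.
    """
    if str(row.get("use", "")).strip().upper() != "T":
        return None
    descriptions = json.loads(row["description"])
    return row["road_culture"], descriptions


# Hypothetical sample row with a subset of the spreadsheet columns
sample = {
    "use": "T",
    "road_culture": "Uluzzian",
    "description": '{"de": "Fundstellen des Uluzzien", "en": "Uluzzian sites"}',
}
result = parse_row(sample)
print(result)
```

Rows that are not flagged return None, so a caller can skip them without further checks.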

For each row flagged use==T, the script writes a TXT file containing Wikipedia-flavoured JSON to the output folder. This JSON can be copied manually to Wikimedia Commons at https://commons.wikimedia.org/wiki/Data:ROCEEH/*.map. The Wikimedia Commons file is then linked within the target Wiki's <mapframe>. We plan to replace this manual last step with a wiki bot.

.
├── scripts                  
│   └── roceeh2wiki.py       # Query cultures from ROAD and export to JSON
├── data
│   └── wiki_cultures.xlsx   # List with ROAD cultures and corresponding Wikis to process
└── output
    ├── Ahmarian.txt         # 1st culture
    ├── Aterian.txt          # 2nd culture
    └── ...                  # Many more results

graph LR;
    A(Start) --> B[road_query];
    B --> C[wiki_json];
    C --> D[Export to /output/road_culture.txt];
    D --> E{Last item?};
    E --> |yes| F[Manually paste to Wikimedia];
    F --> G(End);
    H(/data/wiki_cultures.xlsx) --> |road_culture| B;
    H --> |description| C;
    E --> |no| H;
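
The loop in the flowchart can be sketched roughly as follows. This is an illustration under stated assumptions, not the code of roceeh2wiki.py: the function run_pipeline, its parameters, and the stand-in functions fake_query and fake_json are hypothetical, and the use flag is assumed to be the string "T". The query and conversion steps are passed in as callables so the sketch runs without network access.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(rows, road_query, wiki_json, output_dir):
    """Write one TXT file per flagged culture and return the written paths.

    `road_query` and `wiki_json` are passed in as callables so the sketch
    stays independent of the real implementations.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in rows:
        if row["use"] != "T":  # assumption: "T" marks rows to process
            continue
        sites = road_query(row["road_culture"])
        document = wiki_json(sites, row["description"])
        path = output_dir / f"{row['road_culture']}.txt"
        text = json.dumps(document, indent=4, ensure_ascii=False)
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


# Stand-ins for the real query and conversion steps (hypothetical)
def fake_query(culture):
    return [{"title": "Uluzzo C", "lon": 17.96, "lat": 40.15}]


def fake_json(sites, description):
    return {"description": description, "sites": sites}


rows = [
    {"use": "T", "road_culture": "Uluzzian", "description": {"en": "Uluzzian sites"}},
    {"use": "F", "road_culture": "Solutrean", "description": {"en": "Solutrean sites"}},
]
with tempfile.TemporaryDirectory() as tmp:
    paths = run_pipeline(rows, fake_query, fake_json, tmp)
    names = [p.name for p in paths]
    print(names)
```

Pasting the resulting files into Wikimedia Commons remains a manual step outside this loop.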

Background

Maps in Wikipedia

Web maps are implemented in Wikipedia by a <mapframe> element. The element's text attribute is used as the map's caption and contains a name in the respective language, the license and the source name. The mapframe points to a Wikimedia Commons file, which is referenced in the "title" field of its JSON body.

Screenshot from Wikimedia Commons

<mapframe text="Selected Uluzzian sites from the [https://www.roceeh.uni-tuebingen.de/roadweb ROAD database] (CC BY-SA 4.0 ROCEEH)" longitude="16.3" latitude="41.5" zoom="5" width="450" height="350">
{
  "type": "ExternalData",
  "service": "page",
  "title": "ROCEEH/Uluzzian.map"
}
</mapframe>
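
Since the element has the same shape for every culture, it could be generated from a few parameters. The following is a minimal sketch; the function name mapframe and its default values are assumptions chosen to match the example above, and the sketch does not escape double quotes inside the caption text.

```python
import json


def mapframe(title, text, longitude, latitude, zoom=5, width=450, height=350):
    """Return <mapframe> wikitext that loads geodata from a Commons page."""
    attributes = {
        "text": text,
        "longitude": longitude,
        "latitude": latitude,
        "zoom": zoom,
        "width": width,
        "height": height,
    }
    # Attributes are separated by spaces only
    attr_text = " ".join(f'{key}="{value}"' for key, value in attributes.items())
    body = json.dumps(
        {"type": "ExternalData", "service": "page", "title": title}, indent=2
    )
    return f"<mapframe {attr_text}>\n{body}\n</mapframe>"


tag = mapframe("ROCEEH/Uluzzian.map", "Selected Uluzzian sites", 16.3, 41.5)
print(tag)
```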

Geodata in Wikimedia Commons

Geodata for Wikipedia are collected in Wikimedia Commons for two reasons. First, GeoJSON files can be very long depending on their content, which impairs readability in the Wikipedia text editor. Second, content in Wikimedia Commons can be accessed from Wikis in all languages, so no cross-posting is needed.

Screenshot from Wikimedia Commons

URL

All ROAD contents follow the URL scheme https://commons.wikimedia.org/wiki/Data:ROCEEH/*.map, where * denotes the content title. Within Wikipedia, the file is referenced as ROCEEH/*.map.
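
As a small illustration, both forms of the reference can be derived from the content title. The helper names commons_title and commons_url are hypothetical. Replacing spaces with underscores follows the usual MediaWiki URL convention and is an assumption about how multi-word titles such as "Still Bay" are stored.

```python
from urllib.parse import quote

COMMONS_DATA_PREFIX = "https://commons.wikimedia.org/wiki/Data:"


def commons_title(content_title):
    """Title as used inside a <mapframe>, e.g. ROCEEH/Uluzzian.map."""
    return f"ROCEEH/{content_title}.map"


def commons_url(content_title):
    """Full URL of the data page on Wikimedia Commons."""
    # Assumption: spaces are written as underscores, as usual in MediaWiki URLs
    page = commons_title(content_title).replace(" ", "_")
    return COMMONS_DATA_PREFIX + quote(page, safe="/")


print(commons_title("Uluzzian"))
print(commons_url("Uluzzian"))
```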

JSON

The following code is an example from the Uluzzian culture, shortened to show just one site, "Uluzzo C". The Kartographer schema uses a JSON file, which can be divided into general map information and geodata.

  • General map information:
    • The "license" for all ROAD data is CC BY-SA 4.0 and therefore complies with Wikipedia's terms of use.
    • The "description" is shown in Wikimedia Commons as a subheading. Different languages can be used to translate to the target Wiki's title, e.g. English "Uluzzian" vs. German "Uluzzien".
    • The "sources" tag is a standard text. The date of export is updated automatically.
    • The tags "zoom", "latitude" and "longitude" are optional and can be used to set the map's initial extent. The map engine however is smart enough to set a suitable extent automatically.
  • Geodata:
    • The "data" tag is at the heart of Wikipedia's JSON schema and contains a standard GeoJSON object. Most of its contents are standardized. The appearance of the popup is defined in the features' "properties".
      • The "title" contains the name of the site as exported from ROAD. It is planned to optionally link to other Wikis, where available.
      • The "description" always contains a link to the site's Summary Data Sheet, a PDF generated with the URL https://www.roceeh.uni-tuebingen.de/roadweb/tcpdf/localityInfoPDF/localityInfoPDF.php?locality=*, where * denotes the site's name. It is planned to optionally include existing Wikipedia images and other contents, where available.
{
    "license": "CC-BY-SA-4.0",
    "description": {
        "de": "Fundstellen des Uluzzien",
        "en": "Uluzzian sites"
    },
    "sources": "Data retrieved from the [https://www.roceeh.uni-tuebingen.de/roadweb ROCEEH Out Of Africa Database (ROAD)].",
    "zoom": 5,
    "latitude": 41.5,
    "longitude": 16.3,
    "data": {
        "type": "FeatureCollection",
        "name": "uluzzian_road",
        "crs": {
            "type": "name",
            "properties": {
                "name": "urn:ogc:def:crs:OGC:1.3:CRS84"
            }
        },
        "features": [
            {
                "type": "Feature",
                "properties": {
                    "title": "Uluzzo C",
                    "description": "[[File:Grotta di Uluzzo C 4.jpg|150px|alt=Grotta di Uluzzo C]]</br>[https://www.roceeh.uni-tuebingen.de/roadweb/tcpdf/localityInfoPDF/localityInfoPDF.php?locality=Uluzzo%20C Summary Data Sheet]"
                },
                "geometry": {
                    "type": "Point",
                    "coordinates": [
                        17.96,
                        40.15
                    ]
                }
            }
        ]
    }
}
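
A file of this shape could be assembled from a list of site records along the following lines. This is a sketch under assumptions, not the conversion code of roceeh2wiki.py: the function names site_feature and map_document and the record keys title, lon and lat are hypothetical, and the optional tags (zoom, latitude, longitude, name, crs) are left out for brevity.

```python
import json
from urllib.parse import quote

PDF_URL = (
    "https://www.roceeh.uni-tuebingen.de/roadweb/tcpdf/localityInfoPDF/"
    "localityInfoPDF.php?locality="
)


def site_feature(title, lon, lat):
    """Build one GeoJSON point feature with the popup properties."""
    # The site name is percent-encoded so that spaces become %20 in the link
    link = f"[{PDF_URL}{quote(title)} Summary Data Sheet]"
    return {
        "type": "Feature",
        "properties": {"title": title, "description": link},
        # GeoJSON stores coordinates as [longitude, latitude]
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
    }


def map_document(descriptions, sites):
    """Wrap the features in the general map information."""
    return {
        "license": "CC-BY-SA-4.0",
        "description": descriptions,
        "sources": "Data retrieved from the "
        "[https://www.roceeh.uni-tuebingen.de/roadweb "
        "ROCEEH Out Of Africa Database (ROAD)].",
        "data": {
            "type": "FeatureCollection",
            "features": [site_feature(**site) for site in sites],
        },
    }


doc = map_document(
    {"de": "Fundstellen des Uluzzien", "en": "Uluzzian sites"},
    [{"title": "Uluzzo C", "lon": 17.96, "lat": 40.15}],
)
print(json.dumps(doc, indent=4, ensure_ascii=False))
```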

ROAD SPARQL endpoint

The ROAD database is implemented as a relational SQL database that can be accessed through a web portal with many tools for querying, analyzing and visualizing. The database is also regularly exported to RDF files, which can be queried through ROAD's SPARQL endpoint at https://www.roceeh.uni-tuebingen.de/road/.

Screenshot from Wikimedia Commons

SPARQL queries can be submitted through a web interface, which can export results as an HTML table, JSON, XML or CSV file. The following example queries archaeological sites associated with the Uluzzian culture and returns their names and geocoordinates. roceeh2wiki uses the Python library sparql-dataframe to request data directly.

PREFIX road: <https://www.roceeh.uni-tuebingen.de/road/>
PREFIX wgs84_pos: <https://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT  DISTINCT (?culture) ?title ?lon ?lat
WHERE {
  ?x a road:ArchaeologicalLayer.
  ?x road:ArchaeologicalLayer\#archstratigraphyIdArchstrat "Uluzzian".
  ?x road:ArchaeologicalLayer\#localityId ?title.
  ?y a road:Locality.
  ?y road:Locality\#id ?title.
  ?y wgs84_pos:long ?lon.
  ?y wgs84_pos:lat ?lat.
} ORDER BY ?title
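
A request from Python with sparql-dataframe might look like the sketch below. The helper names build_query and road_query are hypothetical. The endpoint URL is the one quoted above, and whether it accepts programmatic requests at exactly that address is an assumption. The sketch selects only ?title, ?lon and ?lat, because ?culture is not bound anywhere in the WHERE clause.

```python
ENDPOINT = "https://www.roceeh.uni-tuebingen.de/road/"

# Raw string so that the backslashes in the property names survive unchanged
QUERY_TEMPLATE = r"""
PREFIX road: <https://www.roceeh.uni-tuebingen.de/road/>
PREFIX wgs84_pos: <https://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT DISTINCT ?title ?lon ?lat
WHERE {
  ?x a road:ArchaeologicalLayer.
  ?x road:ArchaeologicalLayer\#archstratigraphyIdArchstrat "CULTURE_NAME".
  ?x road:ArchaeologicalLayer\#localityId ?title.
  ?y a road:Locality.
  ?y road:Locality\#id ?title.
  ?y wgs84_pos:long ?lon.
  ?y wgs84_pos:lat ?lat.
} ORDER BY ?title
"""


def build_query(culture):
    """Insert the culture name into the query text."""
    if '"' in culture or "\\" in culture:
        raise ValueError("culture name must not contain quotes or backslashes")
    return QUERY_TEMPLATE.replace("CULTURE_NAME", culture)


def road_query(culture, endpoint=ENDPOINT):
    """Return the matching sites as a pandas DataFrame (needs network access)."""
    import sparql_dataframe  # third-party: pip install sparql-dataframe

    return sparql_dataframe.get(endpoint, build_query(culture))


query = build_query("Uluzzian")
print(query)
```

Calling road_query("Uluzzian") would send the query to the endpoint. Building the query text is kept in a separate function so that it can be inspected without a network connection.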
