
Joining Wikidata IDs to OSM #242

Closed
abhisheksaikia opened this issue Oct 27, 2016 · 22 comments

10 participants
@abhisheksaikia

commented Oct 27, 2016

Objective: Join matched Wikidata IDs to OSM city and town POIs

Worksheet: https://docs.google.com/spreadsheets/d/1Gh2-T20DuXMoRhUwx_L6FYcaHufOwcwpHm-fe0gkRxI/edit#gid=0

OSM diary: Scaling multilingual name tags with Wikidata


Why join Wikidata and OSM?

An OpenStreetMap feature linked to a Wikidata item is augmented with the extra information stored in the Wikidata database, and the amount of information that becomes available for a linked OSM feature is immense. For example, using the Wikidata API you can get name translations for places in OSM or look up the population of countries.
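As a minimal sketch of what "using the API" can look like, the Python below reads the JSON that Wikidata serves for an item at `Special:EntityData/<QID>.json` and extracts its labels (name translations) and the first population (P1082) statement. The helper names are hypothetical, the functions work on an already-parsed dict so they run offline, and `population` takes the first statement rather than the preferred or most recent one.

```python
from decimal import Decimal

# Wikidata serves each item as JSON at this URL pattern.
ENTITY_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def entity_url(qid):
    """Return the JSON URL for a Wikidata item such as 'Q1930'."""
    return ENTITY_URL.format(qid=qid)


def labels(entity):
    """Map language code -> label, i.e. the item's name translations."""
    return {lang: v["value"] for lang, v in entity.get("labels", {}).items()}


def population(entity):
    """Return the first population (P1082) amount as an int, or None.

    Hypothetical helper: it ignores statement ranks and qualifiers such as dates.
    """
    for claim in entity.get("claims", {}).get("P1082", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            # Quantity amounts are signed decimal strings such as "+1234".
            return int(Decimal(snak["datavalue"]["value"]["amount"]))
    return None
```

To fetch a live item, something like `json.load(urllib.request.urlopen(entity_url("Q1930")))["entities"]["Q1930"]` yields the dict these helpers expect.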

Workflow:

We have created spreadsheets of OSM city and town features that have corresponding Wikidata entries with exact name matches.
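Each spreadsheet row carries a link that opens the OSM feature in JOSM with the wikidata tag ready to add. A plausible way to build such a link is sketched below, assuming it uses JOSM's remote-control `load_object` command with its `addtags` parameter on the default port 8111; the worksheet's actual URLs may differ, and `josm_link` is a hypothetical helper. JOSM must be running with remote control enabled, and it shows the proposed tags for confirmation before adding them.

```python
from urllib.parse import urlencode

# Assumption: JOSM remote control listening on its default local port.
JOSM_REMOTE = "http://127.0.0.1:8111/load_object"


def josm_link(osm_type, osm_id, qid):
    """Build a remote-control URL that loads one OSM object and proposes wikidata=<qid>."""
    prefix = {"node": "n", "way": "w", "relation": "r"}[osm_type]
    params = {
        "objects": f"{prefix}{osm_id}",   # e.g. n958710338
        "addtags": f"wikidata={qid}",     # multiple tags would be joined with "|"
    }
    return f"{JOSM_REMOTE}?{urlencode(params)}"
```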

  1. Click a link in the JOSM column to display the OSM feature in JOSM together with its corresponding wikidata tag. Click "Add selected tags" to add the wikidata tag.


  2. Before uploading, check which case applies:
    • Case 1: Match distance is close to 0
      • Check that the Wikidata type matches City > Town > Commune > human settlement > Municipality
      • Check that the address matches in both OSM and Wikidata
      • Check that the description matches a city or town
      • Case 1A: If there is only one match candidate, add the wikidata tag to OSM
      • Case 1B: If there is still more than one match, Wikidata has duplicate entries for the place, e.g. OSM feature id 958710338; Wikidata entries Q1993763, Q2001643
    • Case 2: Match distance is high (>10 km)
      • Check that the Wikidata type matches City > Town > Commune > human settlement > Municipality
      • Check that the address matches in both OSM and Wikidata
      • Case 2A: If there is only one match, the coordinate in Wikidata is wrong; add the wikidata ID to OSM
    • Case 3: Anything else has no match in Wikidata. A manual search is needed to find out whether a Wikidata entry exists.
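The decision cases above can be summarized as a small triage function. This is a sketch, not the team's tool: the 1 km cutoff for "close to 0" is an assumption, the type and address checks are reduced to precomputed fields, and the description check is left to the reviewer. As @1ec5 points out further down for the Case 1B example, apparent duplicates can be distinct places with the same name, so that branch only flags candidates for human review.

```python
from dataclasses import dataclass

NEAR_KM = 1.0    # assumption: what "close to 0" means
FAR_KM = 10.0    # "match distance is high (>10km)"
ACCEPTED_TYPES = {"city", "town", "commune", "human settlement", "municipality"}


@dataclass
class Candidate:
    qid: str
    distance_km: float      # distance between the OSM node and the Wikidata coordinate
    wd_type: str            # Wikidata type label, lower-cased
    address_matches: bool   # result of the manual address comparison


def triage(candidates):
    """Return (action, qids) where action is 'add', 'duplicates' or 'manual'."""
    plausible = [c for c in candidates
                 if c.wd_type in ACCEPTED_TYPES and c.address_matches]
    near = [c for c in plausible if c.distance_km <= NEAR_KM]
    far = [c for c in plausible if c.distance_km > FAR_KM]
    if len(near) == 1:                        # Case 1A
        return "add", [near[0].qid]
    if len(near) > 1:                         # Case 1B: review by hand
        return "duplicates", [c.qid for c in near]
    if len(far) == 1:                         # Case 2A: Wikidata coordinate is off
        return "add", [far[0].qid]
    return "manual", []                       # Case 3
```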

Changeset comment: Adding Wikidata tag to Cities and Towns https://github.com/mapbox/mapping/issues/242
Changeset source: Wikidata

In the "added" column of the spreadsheet, enter:

  • yes = added the match
  • manual = added a Wikidata match after a manual search
  • no = did not add a wikidata tag

Resources:

cc @mapbox/team-data

@bhousel


commented Oct 27, 2016

Nice work! This is a great step forward for OSM.
cc @pigsonthewing & @1ec5, I thought you would find this interesting.

@pigsonthewing


commented Oct 27, 2016

Thank you; yes. I've requested access to the worksheet.

@1ec5

Member

commented Oct 27, 2016

Should city boundary relations be tagged with wikidata too?

If there is still more than one match, Wikidata has duplicate entries for the place. eg. OSM Feature id 958710338; Wikidata entries: Q1993763, Q2001643

This turns out to be a village of the same name as a larger town nearby. Hopefully cases like this are rare.

Wikidata type matches to a City> Town> Commune> human settlement > Municipality

Note that Wikidata’s subclass hierarchy is a little unexpected. Normally I have to perform this search in the Wikidata Query Service:

SELECT ?settlement
WHERE {
  # Municipalities (and any subclasses thereof)
  { ?settlement wdt:P31/wdt:P279* wd:Q15284 . }
  # and cities (and any subclasses thereof)
  UNION { ?settlement wdt:P31/wdt:P279* wd:Q515 . }
  # but not metropolitan areas (or any subclasses thereof)
  FILTER NOT EXISTS { ?settlement wdt:P31/wdt:P279* wd:Q1907114 . }
  …
}

Note how city isn’t directly related to municipality, and how some metropolitan areas are also tagged as cities.
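To make the `wdt:P31/wdt:P279*` path concrete, the sketch below walks a subclass graph (zero or more P279 hops from each P31 type) and applies the same include/exclude logic as the query. The class IDs Q15284, Q515 and Q1907114 come from the query; any graph passed in is assumed to be a local toy copy, not live Wikidata, and `is_settlement` is a hypothetical name.

```python
from collections import deque

MUNICIPALITY = "Q15284"
CITY = "Q515"
METROPOLITAN_AREA = "Q1907114"


def reachable_classes(start, subclass_of):
    """Classes reachable from `start` via zero or more P279 edges (wdt:P279*)."""
    seen = {start}
    queue = deque([start])
    while queue:
        for parent in subclass_of.get(queue.popleft(), ()):
            if parent not in seen:   # also guards against cycles in the graph
                seen.add(parent)
                queue.append(parent)
    return seen


def is_settlement(instance_of, subclass_of):
    """Municipality or city (via P31/P279*), but not a metropolitan area.

    instance_of: the item's P31 values; subclass_of: dict class -> list of P279 parents.
    """
    classes = set()
    for cls in instance_of:
        classes |= reachable_classes(cls, subclass_of)
    return (MUNICIPALITY in classes or CITY in classes) and METROPOLITAN_AREA not in classes
```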

@planemad

Contributor

commented Oct 28, 2016

@1ec5 Fewer than 20% of the Wikidata entries have a subclass (P31), but most of them have a description, which is really useful for context. Since the description is free-form text, though, it requires a manual check. You can see here how some names match Wikidata items for electoral districts and railway stations perfectly, but the subclass is not defined.

Except for these outliers, most entries seem to match perfectly even when they have no subclass.

[Screenshot: 2016-10-28 17:33:57]

@planemad

Contributor

commented Oct 28, 2016

Over 5,300 Cities and Towns have been joined to Wikidata in the last 4 days using this workflow.

[Screenshot: 2016-10-28 17:42:12]
http://overpass-turbo.eu/s/jGy

@1ec5

Member

commented Oct 28, 2016

lesser than 20% of the WIkidata entries have a subclass (P31) , most of them have a description though which are really useful for context.

You mean an entry has an “is a” (P31) statement, but it’s too generic to associate with an OSM place tag? Like how Q526757, depicted in the screenshot above as having no wd_type, is generically tagged as a human settlement? That’s true; my suggestion was only to make sure to look at municipalities and cities but not metropolitan areas. If you’re already looking at all human settlements, including Q526757, then that’s fine. 👍

@planemad

Contributor

commented Nov 9, 2016

Another 3,100 places got updated in the last two days.

[Screenshot: 2016-11-09 17:08:14]

http://overpass-turbo.eu/s/jWM
@maning

Contributor

commented Nov 17, 2016

The OSM Canada community has started working on this for all place=town/city features in the country!

@maning

Contributor

commented Nov 17, 2016

I have also asked @mapconcierge -san to help us with places in Japan.

@bsrinivasa


commented Nov 17, 2016

Here are the spreadsheet links for the following cities, which don't have English names and need Wikidata matching for neighbourhoods and suburbs.

cc: @maning @mapconcierge -san

@DenisCarriere


commented Nov 18, 2016

@maning Thanks for the post; we've also started on all of Africa. We've only been mapping for about 1-2 hours with 3-5 people, and we've already gotten pretty far.

http://tasks.osmcanada.ca/project/42

We're using "Add #wikidata to #Africa places" to track our changesets.

Here's the SPARQL query we are running:

http://tinyurl.com/zhs93zu

SELECT DISTINCT ?place ?location ?distance ?placeDescription ?name_en ?name_fr ?name_es ?name_de ?name_it ?name_ru WHERE { 
  # Search Instance of & Subclasses
  ?place wdt:P31/wdt:P279* ?subclass
  FILTER (?subclass in (wd:Q486972))

  # Search by Nearest
  SERVICE wikibase:around { 
    ?place wdt:P625 ?location . 
    bd:serviceParam wikibase:center "Point(13.397 52.514)"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "15" . 
    bd:serviceParam wikibase:distance ?distance .
  }

  # Fetch labels in each language (any of them may be missing)
  OPTIONAL {?place rdfs:label ?name_en FILTER (lang(?name_en) = "en") . }
  OPTIONAL {?place rdfs:label ?name_fr FILTER (lang(?name_fr) = "fr") . }
  OPTIONAL {?place rdfs:label ?name_es FILTER (lang(?name_es) = "es") . }
  OPTIONAL {?place rdfs:label ?name_de FILTER (lang(?name_de) = "de") . }
  OPTIONAL {?place rdfs:label ?name_it FILTER (lang(?name_it) = "it") . }
  OPTIONAL {?place rdfs:label ?name_ru FILTER (lang(?name_ru) = "ru") . }

  # Keep places whose label exactly matches the name in at least one language
  FILTER (regex(?name_en, "^Berlin$") || regex(?name_fr, "^Berlin$") || regex(?name_es, "^Berlin$") || regex(?name_de, "^Berlin$") || regex(?name_it, "^Berlin$") || regex(?name_ru, "^Berlin$")) .

  # Get Descriptions
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en,fr,es,de,it,ru"
  }

} ORDER BY ASC(?distance)

We went with 15 km because a lot of cities weren't getting tagged; we could always drop it down to 10 km if we wanted to, since it's a parameter.
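The radius, centre point, name and languages are all parameters of the query above. A minimal Python sketch that generates it is shown below; `build_query` is a hypothetical helper, not part of any tool mentioned here. It sorts by `?distance`, the variable bound by `wikibase:around`, and it compares `STR(?label)` with the name instead of using an anchored regex, which is equivalent for plain names and avoids having to escape regex metacharacters in place names.

```python
def build_query(name, lon, lat, radius_km=15,
                langs=("en", "fr", "es", "de", "it", "ru"), cls="Q486972"):
    """Return a SPARQL query for `cls` items within radius_km of (lon, lat) labelled `name`."""
    literal = name.replace("\\", "\\\\").replace('"', '\\"')  # escape for a SPARQL string
    label_vars = " ".join(f"?name_{lang}" for lang in langs)
    optionals = "\n".join(
        f'  OPTIONAL {{ ?place rdfs:label ?name_{lang} FILTER(lang(?name_{lang}) = "{lang}") }}'
        for lang in langs)
    # An unbound label makes its comparison an error, which || treats like false
    # as long as another language matches.
    exact = " || ".join(f'STR(?name_{lang}) = "{literal}"' for lang in langs)
    lang_list = ",".join(langs)
    return f"""SELECT DISTINCT ?place ?location ?distance ?placeDescription {label_vars} WHERE {{
  ?place wdt:P31/wdt:P279* wd:{cls} .
  SERVICE wikibase:around {{
    ?place wdt:P625 ?location .
    bd:serviceParam wikibase:center "Point({lon} {lat})"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "{radius_km}" .
    bd:serviceParam wikibase:distance ?distance .
  }}
{optionals}
  FILTER({exact})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang_list}" }}
}} ORDER BY ASC(?distance)"""
```

Note the WKT point order is longitude first, as in the original query: `build_query("Berlin", 13.397, 52.514)`.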

I've built a JavaScript geocoder that runs the SPARQL query mentioned above.

https://github.com/deniscarriere/geocoder-geojson

$ npm install -g geocoder-geojson
$ geocode --provider wikidata --nearest '[-75.7,45.4]' --radius 15 Ottawa | jq '.features[0].id'
"Q1930"
@woodpeck


commented Nov 18, 2016

Everyone, please be aware that if you run a query against another database to retrieve tags that you then add to objects in OSM, you're doing a mechanical edit that requires prior consultation with the community.

If you edit an individual object in OSM that you have knowledge of and add the Wikidata tag, that's of course no problem at all. But editing objects that you don't know in areas you have never been to, just because a SPARQL query suggests that you edit something, is questionable. The addition of Wikidata tags by people without knowledge of the region regularly leads to data quality issues (e.g. OSM has separate entities for the area administered by a village and for the village itself, but there is a single Wikipedia entry for both, and the Wikidata entry created from it claims to be just the village).

@DenisCarriere


commented Nov 18, 2016

@woodpeck_repair has begun reverting.

https://www.openstreetmap.org/changeset/43780386

Frederik Ramm (@woodpeck), who works for Geofabrik, has been removing users' wikidata and wikipedia tags.

Changeset 43673242 had one edit (clearly a manual edit with local knowledge).

http://osm.mapki.com/history/way.php?id=211202482


http://osm.mapki.com/history/way.php?id=46225505

@woodpeck


commented Nov 18, 2016

@DenisCarriere just to be clear, I am removing undiscussed mechanical edits in my capacity as a member of the Data Working Group. It is near midnight in Europe, not a time when I am at work, so the fact that my day job is at Geofabrik has very little to do with this. You have made a habit of stressing my employment situation whenever you complain about me; I think this is an unnecessary intrusion on your part that doesn't further your argument in the least.

The undiscussed mechanical edits I am reverting are easy to restore; I am keeping a correspondence table of OSM ID to Wikidata tag. If you can be bothered to secure the agreement of the community for this, it will be easy to add these tags back in; however, some concerns have been voiced regarding the licensing situation, as any process that relies on Wikidata location data might be problematic. All this would normally be discussed widely before someone runs a worldwide mechanical edit. Clicking thumbs-down on that doesn't make it less of a fact, even though I hear facts aren't that popular in the world nowadays.

@bhousel


commented Nov 19, 2016

@woodpeck, @DenisCarriere,
I'm very sorry to hear that this work is being reverted, but acknowledge and respect that @woodpeck is just trying to do his job.

I'm locking this thread to collaborators in order to prevent any further bickering and escalation, while we regroup from the Mapbox side and decide how to proceed. Thanks.

mapbox locked and limited conversation to collaborators Nov 19, 2016

@planemad

Contributor

commented Dec 1, 2016

We have a validator tool that helps spot possible tag mismatches by comparing the locations of the OSM and Wikidata features and highlighting those that are more than 1 km apart.

http://osmlab.github.io/wikidata-osm

Most of the mismatches seem to come from large regions whose node location doesn't match the Wikidata coordinate. The tool still needs some improvement but is already quite useful.
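The distance check behind such a validator can be sketched with the haversine (great-circle) formula. This assumes a spherical Earth with a mean radius of 6371 km and the 1 km threshold described above; the osmlab tool's actual implementation may differ, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_mismatches(pairs, threshold_km=1.0):
    """Return (osm_id, qid, distance_km) for pairs farther apart than threshold_km.

    pairs: iterable of (osm_id, qid, (lon, lat) in OSM, (lon, lat) in Wikidata).
    """
    flagged = []
    for osm_id, qid, osm_pt, wd_pt in pairs:
        d = haversine_km(*osm_pt, *wd_pt)
        if d > threshold_km:
            flagged.append((osm_id, qid, round(d, 2)))
    return flagged
```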

cc @woodpeck @DenisCarriere

mapbox unlocked this conversation Dec 21, 2016

@bsrinivasa


commented Jan 16, 2017

After adding Wikidata tags to OSM cities, towns and POIs, the team resumed linking Wikidata items to OSM neighbourhoods and suburbs across 20 popular world cities.

Here is the spreadsheet containing the list of OSM neighbourhoods.

Workflow:

  1. Neighbourhoods and suburbs in OpenStreetMap for popular cities are queried using Overpass.
  2. Places that already have a Wikidata item tagged are filtered out.
  3. Wikidata items for these neighbourhoods are queried using a SPARQL query. (Sample query)
  4. The results from both platforms are compared to find potential matches.
  5. The matched entities are verified manually and linked to OpenStreetMap using the JOSM API URL in the spreadsheet, which makes it quick to add tags in JOSM.
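Steps 2 and 4 can be sketched in a few lines of Python, assuming the Overpass results are in Overpass JSON element form (an `id` plus a `tags` dict) and the SPARQL results have been reduced to `qid`/`label` pairs. Matching here is an exact comparison after case-folding and accent stripping, which is an assumption; the later progress updates describe a fuzzier name-similarity threshold. Step 5, manual verification, stays manual. Both function names are hypothetical.

```python
import unicodedata


def normalize(name):
    """Case-fold and drop accents so that 'Café' and 'cafe' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().strip()


def candidate_matches(osm_elements, wd_items):
    """Return [(osm_id, [qid, ...]), ...] for unlinked OSM places with a same-name item.

    osm_elements: Overpass JSON elements, e.g. {"type": "node", "id": 1, "tags": {...}}
    wd_items: dicts like {"qid": "Q...", "label": "..."} from a SPARQL query
    """
    by_name = {}
    for item in wd_items:
        by_name.setdefault(normalize(item["label"]), []).append(item["qid"])

    matches = []
    for el in osm_elements:
        tags = el.get("tags", {})
        name = tags.get("name")
        if not name or "wikidata" in tags:   # step 2: skip unnamed or already linked places
            continue
        qids = by_name.get(normalize(name))  # step 4: potential matches by name
        if qids:
            matches.append((el["id"], qids))
    return matches
```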

cc: @planemad @maning @chtnha @kupendrayadav

@bsrinivasa


commented Jan 16, 2017

Progress update

@chtnha, @kupendrayadav and I verified and linked Wikidata entities to OSM neighbourhoods, mostly around London. We chose features whose name strings on the two platforms matched by more than 90%. The added tags were then reviewed, and 99% of them were found to be good. For the features we tagged, the distance between the OSM and Wikidata coordinates did not exceed about 1.5 km, with a single exception at 1.8 km.

  • Total Wikidata tags added = 117
  • Number of unique neighbourhoods = 129
  • Total reviewed = 186

We were able to add tags to 90.7% of the unique neighbourhoods and suburbs reviewed today.
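The 90.7% figure corresponds to the tags added divided by the unique neighbourhoods, not by the 186 total reviews:

```latex
\frac{117}{129} \approx 0.907 = 90.7\%
```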

cc: @planemad @maning

@bsrinivasa


commented Jan 17, 2017

Progress update

On 17 January, a team of three verified a total of 197 features and added Wikidata tags to 128 neighbourhoods in various cities.

Here is the spreadsheet that tracks progress and contains the list of OSM neighbourhoods.

@bsrinivasa


commented Feb 22, 2017

The team has resumed matching Wikidata entities to OSM features that lack them. Here is the workflow:

  1. Use the Wikimama tool, which:
    • Generates a list of features from OpenStreetMap using an Overpass query for nodes tagged place=city.
    • Generates a list of items that are instances of city in Wikidata using a SPARQL query.
    • Compares the two lists and finds potential matches based on name matches and the distance between the coordinates on both platforms. It also scores each pair by string match (100 indicates a perfect match, and the score decreases as the name labels vary).
  2. Import the output file, output.csv, into a spreadsheet.
  3. Use basic filtering, sorting and conditional formatting to isolate the exact matches with small distances.
  4. Verify each match before adding it, and use the JOSM URL to quickly add the wikidata tag to the OSM feature.
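Wikimama's scoring algorithm isn't described here, so the sketch below substitutes Python's `difflib.SequenceMatcher` ratio for the 0 to 100 name score and reproduces the filtering step: keep exact name matches (score 100) within a distance cutoff, nearest first. The column names (`osm_name`, `wd_label`, `distance_km`) and the 1 km cutoff are hypothetical; adjust them to whatever output.csv actually contains.

```python
from difflib import SequenceMatcher


def name_score(osm_name, wd_label):
    """0 to 100 similarity; 100 means the names are identical ignoring case."""
    ratio = SequenceMatcher(None, osm_name.casefold(), wd_label.casefold()).ratio()
    return round(100 * ratio)


def shortlist(rows, min_score=100, max_km=1.0):
    """Keep rows with a name score >= min_score within max_km, nearest first.

    rows: dicts with hypothetical keys 'osm_name', 'wd_label', 'distance_km' (float).
    """
    keep = [r for r in rows
            if name_score(r["osm_name"], r["wd_label"]) >= min_score
            and r["distance_km"] <= max_km]
    return sorted(keep, key=lambda r: r["distance_km"])
```

When reading output.csv with `csv.DictReader`, convert the distance column to `float` before calling `shortlist`, since every CSV field arrives as a string.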

This is the spreadsheet used by the team for matching and verifying wikidata tags.

cc: @abhisheksaikia @chtnha @kupendrayadav @planemad

@kupendrayadav


commented Feb 24, 2017

@bsrinivasa, @chtnha, @abhisheksaikia and I worked for two days on matching Wikidata tags to cities. We spent 3 hours per day and reviewed 913 cities in total.

Breakdown of the Wikidata tags added:

| Date | Cities reviewed | Cities where tags were added | Duplicates |
|---|---|---|---|
| 22-02-2017 | 426 | 421 | 5 |
| 23-02-2017 | 487 | 485 | 2 |
| Total | 913 | 906 | 7 |

cc @planemad

@bsrinivasa


commented Nov 14, 2017

Thank you, everyone, for joining and contributing to the effort of linking Wikidata IDs to OpenStreetMap places. 👏 😄

The current Wikidata coverage (as of Oct 2017) for various important place tags in OpenStreetMap is captured in this spreadsheet.

The following two charts 💹 show the number of places in OSM and the number of those places linked to Wikidata. The pattern is as expected: the most important features, such as countries and cities, which are also fewer in number, have been linked, while places at lower admin levels, which have a huge number of features, still show a gap and need many more Wikidata matches.

[Chart 1: Number of places at various admin levels and the number linked with Wikidata]

[Chart 2: Percentage of the total number of places in various admin divisions]

With that, I'm closing this issue. Let's focus together on filling these gaps and enriching Wikidata coverage in OSM in the coming days. Feel free to reopen this issue if there are any ongoing Wikidata matching projects, or to start a relevant discussion. 🚀

bsrinivasa closed this Nov 14, 2017
