Add links to locations in wikidata using dct:spatial #42

Closed
acka47 opened this issue Jul 8, 2014 · 16 comments

Comments

acka47 commented Jul 8, 2014

In addition to the OSM and GeoNames links (see #17), we should also link NWBib resources to the corresponding Wikidata resource.

acka47 commented Oct 23, 2014

We should also do this for values in field 700n=96|97, see #53. As we have run into some problems with lookups against the Nominatim API and with linking to LinkedGeoData, we might rather use Wikidata as the default.

acka47 commented Oct 23, 2014

Here is an example search for "Köln" in the Wikidata API:
curl "https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Köln&language=de&format=json"

Result:

{
    "searchinfo": {
        "search": "Köln"
    },
    "search": [
        {
            "id": "Q365",
            "url": "//www.wikidata.org/wiki/Q365",
            "description": "Großstadt in Nordrhein-Westfalen",
            "label": "Köln"
        },
        {
            "id": "Q690441",
            "url": "//www.wikidata.org/wiki/Q690441",
            "description": "Leichter Kreuzer Köln (1928)",
            "label": "Köln"
        },
...
    ],
    "search-continue": 7,
    "success": 1
}
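
A minimal Python sketch of the same lookup, for illustration only: it builds the request URL (percent-encoding the "ö") and pulls the candidate IDs out of a response shaped like the one above. The function names `search_url` and `candidates` are made up for this example, and the sample response contains only the two hits shown in this comment.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"

def search_url(term, language="de"):
    """Build a wbsearchentities URL; urlencode percent-encodes non-ASCII characters."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "format": "json",
    }
    return API + "?" + urlencode(params)

def candidates(response):
    """Return (id, label, description) tuples from a parsed search response."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]

# Sample copied from the result above (only the two hits that are shown).
SAMPLE_RESPONSE = {
    "searchinfo": {"search": "Köln"},
    "search": [
        {"id": "Q365", "url": "//www.wikidata.org/wiki/Q365",
         "description": "Großstadt in Nordrhein-Westfalen", "label": "Köln"},
        {"id": "Q690441", "url": "//www.wikidata.org/wiki/Q690441",
         "description": "Leichter Kreuzer Köln (1928)", "label": "Köln"},
    ],
}

if __name__ == "__main__":
    print(search_url("Köln"))
    for qid, label, description in candidates(SAMPLE_RESPONSE):
        print(qid, label, description)
```

Note that the first hit is not guaranteed to be the place we want (the second hit here is a ship), so the candidates still need filtering.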

acka47 commented Oct 27, 2014

I've played around a bit with Wikidata Query. One can quite easily extract the Wikidata subset of all settlements in North Rhine-Westphalia.

Here are the classes and properties needed for this:

  • Q1198: Nordrhein-Westfalen
  • Q486972: settlement
  • P31: instance of
  • P131: is in the administrative territorial entity
  • P279: subclass of

Queries:

  1. All items that are settlements, i.e. instances of settlement or of one of its subclasses (DON'T OPEN THE LINK IN FIREFOX!): curl -g "https://wdq.wmflabs.org/api?q=claim[31:(TREE[486972][][279])]"
  2. All items that are in, or are subunits of, the administrative territorial entity North Rhine-Westphalia: curl -g "https://wdq.wmflabs.org/api?q=tree[1198][150][131]"
  3. Combination of 1. and 2.: curl -g "https://wdq.wmflabs.org/api?q=tree[1198][150][131]%20AND%20claim[31:%28TREE[486972][][279]%29]" (Remember to escape the spaces.)
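
The escaping in query 3 can also be done programmatically. The following is a small Python sketch, assuming the endpoint accepts the query in the `q` parameter exactly as in the curl calls above; the helper name `wdq_url` and the two constants are made up for this example.

```python
from urllib.parse import quote

WDQ_API = "https://wdq.wmflabs.org/api"

# Query fragments taken verbatim from queries 1 and 2 above.
SETTLEMENTS = "claim[31:(TREE[486972][][279])]"
IN_NRW = "tree[1198][150][131]"

def wdq_url(*clauses):
    """Join the clauses with AND and percent-encode spaces and parentheses.

    Brackets and colons are left untouched so the result matches the
    hand-written URL in query 3.
    """
    query = " AND ".join(clauses)
    return WDQ_API + "?q=" + quote(query, safe="[]:")

if __name__ == "__main__":
    print(wdq_url(IN_NRW, SETTLEMENTS))
```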

As far as I can see, one cannot query this API for labels. Thus, to map NWBib spatial labels to Wikidata, one has to combine this with other techniques.

Proposal:

a) Get the data for the 6756 Wikidata items returned by the query curl -g "https://wdq.wmflabs.org/api?q=tree[1198][150][131]%20AND%20claim[31:%28TREE[486972][][279]%29]" and compare each NWBib location string with the labels of the Wikidata items on the list.

OR

b) Query the "official" Wikidata API à la curl "https://www.wikidata.org/w/api.php?action=wbsearchentities&search=altenrath&language=de&format=json" and take the first result that can be found on the list from query 3.
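
Proposal b) could be sketched in Python roughly as follows. This assumes the list from query 3 has already been fetched and turned into a set of IDs of the form "Q365"; the function name `first_allowed_hit` is made up for this example.

```python
def first_allowed_hit(search_response, allowed_ids):
    """Return the ID of the first search hit that is on the allowed list.

    search_response: parsed JSON from a wbsearchentities call.
    allowed_ids: set of item IDs such as {"Q365"} (assumed to come from query 3).
    Returns None if no hit is on the list.
    """
    for hit in search_response.get("search", []):
        if hit.get("id") in allowed_ids:
            return hit["id"]
    return None

if __name__ == "__main__":
    # Hits in the order a search might return them; the ship comes first here
    # on purpose to show that it is skipped.
    response = {"search": [{"id": "Q690441"}, {"id": "Q365"}]}
    print(first_allowed_hit(response, {"Q365"}))
```

Filtering against the list keeps search hits that are not settlements in NRW from being linked, at the cost of one extra set lookup per hit.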

@acka47 acka47 added the ready label Oct 29, 2014

acka47 commented Nov 3, 2014

This is how to get the JSON data for the Wikidata item with ID Q123: https://www.wikidata.org/wiki/Special:EntityData/Q123.json

For example "Cologne": https://www.wikidata.org/wiki/Special:EntityData/Q365.json
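
As a rough Python sketch of what can be read from that JSON: the label sits under `labels`, and geo coordinates, if present, under the claims for P625 (coordinate location). The function names are made up for this example, and the coordinate values in the sample are placeholders, not the real values for Q365.

```python
def entity_url(qid):
    """URL of the JSON export for one item, following the pattern above."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

def label_and_coordinates(entity_data, qid, language="de"):
    """Return (label, (lat, lon)) for an item; either part may be None."""
    entity = entity_data["entities"][qid]
    label = entity.get("labels", {}).get(language, {}).get("value")
    coordinates = None
    for claim in entity.get("claims", {}).get("P625", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if value:
            coordinates = (value["latitude"], value["longitude"])
            break
    return label, coordinates

# Heavily trimmed sample in the shape of the EntityData export.
# The latitude/longitude numbers are illustrative placeholders.
SAMPLE = {
    "entities": {
        "Q365": {
            "labels": {"de": {"language": "de", "value": "Köln"}},
            "claims": {
                "P625": [
                    {"mainsnak": {"datavalue": {
                        "value": {"latitude": 50.9, "longitude": 6.9}}}}
                ]
            },
        }
    }
}

if __name__ == "__main__":
    print(entity_url("Q365"))
    print(label_and_coordinates(SAMPLE, "Q365"))
```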

acka47 commented Nov 3, 2014

Edge cases and tests (to be amended):

@dr0i dr0i added working and removed ready labels Nov 4, 2014

dr0i commented Nov 4, 2014

Hm.
I could easily create a concordance list wikidataID<->labels, but a lot of the labels we have in our data seem to be missing. Many of them (this seems to be true for most settlements whose name consists of more than one word) are not searchable, e.g. [Kreuztal Burgholdinghausen](https://www.wikidata.org/w/api.php?action=wbsearchentities&search=kreutal burgholdinghausen&language=de&format=json), even though the query terms Burgholdinghausen and Kreuztal do exist in that data. Maybe I missed something. Also, is it possible to restrict this free-text search to something like NRW?
As it is, we had much better results with Nominatim.

A possible solution would be to store all the Wikidata items and query that data locally to build a concordance list.
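
A local concordance of this kind could look roughly like the Python sketch below. It is only an illustration: the normalisation rules, the fallback for multi-word names (try the last word first, assuming it is the more specific place) and the IDs "QX1"/"QX2" are all assumptions made for this example.

```python
import unicodedata
from collections import defaultdict

def normalise(label):
    """Case-fold, treat hyphens as spaces and collapse whitespace."""
    folded = unicodedata.normalize("NFKC", label).casefold()
    return " ".join(folded.replace("-", " ").split())

def build_concordance(items):
    """items: iterable of (item_id, [labels and aliases]) -> label index."""
    index = defaultdict(set)
    for item_id, labels in items:
        for label in labels:
            index[normalise(label)].add(item_id)
    return index

def lookup(index, place):
    """Exact match first; otherwise try single words, last word first."""
    key = normalise(place)
    if key in index:
        return sorted(index[key])
    for token in reversed(key.split()):
        if token in index:
            return sorted(index[token])
    return []

if __name__ == "__main__":
    # Placeholder IDs, not real Wikidata items.
    index = build_concordance([
        ("QX1", ["Burgholdinghausen"]),
        ("QX2", ["Kreuztal"]),
    ])
    print(lookup(index, "Kreuztal Burgholdinghausen"))
```

Because the index only contains the items stored locally, the lookup is implicitly restricted to that subset, e.g. to settlements in NRW.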

dr0i added a commit to lobid/lodmill that referenced this issue Nov 6, 2014
See hbz/nwbib#42.

* update test
* update flux' (i.e. remove dynamical lookup of OSM)
dr0i added a commit to lobid/lodmill that referenced this issue Nov 7, 2014
See hbz/nwbib#42.

* update test
* update flux' (i.e. remove dynamical lookup of OSM)
dr0i added a commit to lobid/lodmill that referenced this issue Nov 12, 2014
See hbz/nwbib#42.

* update test
* update flux' (i.e. remove dynamical lookup of OSM)
dr0i added a commit to hbz/lobid that referenced this issue Nov 12, 2014

dr0i commented Nov 17, 2014

This now uses Wikidata to link resources to Wikidata entities and their geo coordinates. The coordinates are stored directly on the resource with the property lv:subjectLocation; the Wikidata entity is linked using dct:spatial.
See e.g. http://test.lobid.org/resource/BT000084171.
Number of resources with a geo location: 203,226 (using the query below).
Not so bad, as #17 (comment) states an optimum of 234,082. And there is some potential to optimize the lookup database.

(Query: "how many resources have a field lv:subjectLocation?"):

curl -XGET 'http://193.30.112.171:9200/lobid-resources-staging/_search' -d '
{
    "query": {
        "wildcard": {
            "@graph.http://purl.org/lobid/lv#subjectLocation.@value" :"*"
        }
    }
}'

@dr0i dr0i assigned acka47 and unassigned dr0i Nov 17, 2014
@dr0i dr0i removed the processing label Nov 17, 2014

dr0i commented Nov 17, 2014

Deployed to staging.

acka47 commented Nov 17, 2014

We should use the "concept URI" for the RDF, e.g. http://www.wikidata.org/entity/Q4094.
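
For illustration, a small Python sketch that turns an item ID or one of the page URLs mentioned in this thread into the concept URI. The function name `concept_uri` is made up for this example, and the accepted input forms are an assumption.

```python
import re

# Matches an item ID at the end of the string, optionally followed by ".json".
_ITEM_ID = re.compile(r"(?:^|/)(Q[1-9]\d*)(?:\.json)?$")

def concept_uri(reference):
    """Map "Q365", ".../wiki/Q365" or ".../EntityData/Q365.json" to the concept URI."""
    match = _ITEM_ID.search(reference)
    if match is None:
        raise ValueError(f"no Wikidata item ID found in {reference!r}")
    return "http://www.wikidata.org/entity/" + match.group(1)

if __name__ == "__main__":
    print(concept_uri("//www.wikidata.org/wiki/Q365"))
    print(concept_uri("Q4094"))
```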

dr0i added a commit to lobid/lodmill that referenced this issue Nov 17, 2014
See hbz/nwbib#42#issuecomment-63292242

dr0i commented Nov 17, 2014

@acka47 Have a look at the test N-Triples in the above commit. Merge it if you're happy with it.

acka47 commented Nov 18, 2014

I'll leave this issue open for now as we will probably have to improve the mapping to Wikidata.

acka47 commented Nov 20, 2014

Correcting #42 (comment): According to hbz/lobid#91 (comment), the optimum is 250,583. That means that 250,583 - 203,226 = 47,357 resources are still without a geo location.
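
A quick check of these figures in Python, using only the two numbers quoted in this thread; the variable names are made up for this example.

```python
OPTIMUM = 250_583   # optimum according to hbz/lobid#91 (comment)
WITH_GEO = 203_226  # resources that currently have lv:subjectLocation

missing = OPTIMUM - WITH_GEO
coverage = WITH_GEO / OPTIMUM

if __name__ == "__main__":
    print(f"{missing} resources below the optimum, coverage {coverage:.1%}")
```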

acka47 commented Nov 20, 2014

Now I would like to know how many of the linked Wikidata resources actually have geo coordinates. So, what does the query look like for all resources that have a dct:spatial triple but do not have a lv:subjectLocation triple?

dr0i commented Nov 20, 2014

curl -XGET 'http://193.30.112.171:9200/lobid-resources-staging/_search' -d '
{
  "query": {
    "bool": {
      "must": {
        "wildcard": { "@graph.http://purl.org/dc/terms/spatial.@id": "*" }
      },
      "must_not": {
        "wildcard": { "@graph.http://purl.org/lobid/lv#subjectLocation.@value": "*" }
      }
    }
  }
}'

=>135
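
The same request body can be generated instead of written by hand, which makes it easier to swap the fields. A minimal Python sketch, assuming the field paths used in the curl call above; the helper names are made up for this example.

```python
import json

SPATIAL = "@graph.http://purl.org/dc/terms/spatial.@id"
SUBJECT_LOCATION = "@graph.http://purl.org/lobid/lv#subjectLocation.@value"

def has_field(field):
    """Wildcard clause that matches documents with any value in the field."""
    return {"wildcard": {field: "*"}}

def has_but_lacks(present, absent):
    """Query body: documents that have `present` but do not have `absent`."""
    return {
        "query": {
            "bool": {
                "must": has_field(present),
                "must_not": has_field(absent),
            }
        }
    }

if __name__ == "__main__":
    # Prints the JSON that would be passed to curl with -d.
    print(json.dumps(has_but_lacks(SPATIAL, SUBJECT_LOCATION), indent=2))
```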

acka47 commented Nov 20, 2014

I found an error: All the titles under http://lobid.org/resource?q=Q2176071 should link to http://www.wikidata.org/entity/Q1736878 instead of http://www.wikidata.org/entity/Q2176071.

This will probably be corrected as soon as we adjust the mapping to Wikidata.

acka47 commented Nov 24, 2014

Closing this issue as we have already added the Wikidata links. As noted, the mapping still has to be improved, so I created #83 for this.

@acka47 acka47 closed this as completed Nov 24, 2014
@acka47 acka47 removed the review label Nov 24, 2014
dr0i added a commit to hbz/lobid that referenced this issue Jan 21, 2015