
Spatial notation "99" is (ab)used as an additional free-text field in source data #17

Closed
acka47 opened this issue May 27, 2014 · 20 comments

@acka47
Contributor

acka47 commented May 27, 2014

The notation "99" is used in the data (see http://test.lobid.org/nwbib/search?query=n99) but doesn't show up in the classification (SKOS, legacy classification).

This is because the notation is used as a free-text field for information on places: subfield 'a' of field 700n contains 99 and subfield 'b' contains the place as free text. In MAB, this looks like: 700n |a 99 |b Hilden
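A minimal sketch of pulling the free text out of such a field, assuming the MAB field is available as one string in the `700n |a 99 |b Hilden` form shown above, with each subfield introduced by `|` plus a one-letter code and a space. The function names are hypothetical and not taken from the lodmill code.

```python
from typing import Dict, Optional


def parse_mab_subfields(field: str) -> Dict[str, str]:
    """Split a MAB field like '700n |a 99 |b Hilden' into {'a': '99', 'b': 'Hilden'}.

    Assumes every subfield starts with '|', a one-letter code and a space.
    """
    subfields: Dict[str, str] = {}
    # Everything before the first '|' is the field tag (e.g. '700n'); skip it.
    for chunk in field.split("|")[1:]:
        chunk = chunk.strip()
        if not chunk:
            continue
        code, _, value = chunk.partition(" ")
        subfields[code] = value.strip()
    return subfields


def free_text_place(field: str) -> Optional[str]:
    """Return the free-text place name if the field uses notation '99', else None."""
    sub = parse_mab_subfields(field)
    if sub.get("a") == "99":
        return sub.get("b")
    return None


print(free_text_place("700n |a 99 |b Hilden"))  # Hilden
```

Multi-word place names survive because only the first space after the subfield code is used as the separator.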

Example values are "Hilden", "Köln", "Recklinghausen" and "Nettetal".

Mapping notation '99' to http://purl.org/lobid/nwbib-spatial#n99 is therefore not correct, and we should think of a way to extract the free text and add it to the RDF.

@acka47
Contributor Author

acka47 commented May 27, 2014

We could use http://purl.org/dc/elements/1.1/coverage for this purpose, e.g.:

<http://lobid.org/resource/BT000084171> <http://purl.org/dc/elements/1.1/coverage> "Hilden" .
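Emitting such a triple from an extracted place string mostly comes down to escaping the literal correctly. A small sketch under the assumption that output is N-Triples 1.1 (which allows raw UTF-8, so "Köln" needs no escaping); the helper names are hypothetical.

```python
DC_COVERAGE = "http://purl.org/dc/elements/1.1/coverage"


def escape_nt_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal.

    Backslashes are escaped first so the later replacements are not doubled.
    """
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def coverage_triple(resource_uri: str, place: str) -> str:
    """Build one dc:coverage triple with the free-text place as a plain literal."""
    return f'<{resource_uri}> <{DC_COVERAGE}> "{escape_nt_literal(place)}" .'


print(coverage_triple("http://lobid.org/resource/BT000084171", "Hilden"))
```

This prints exactly the example triple from the comment above.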

@acka47 acka47 added the bug label May 27, 2014
@acka47
Contributor Author

acka47 commented May 28, 2014

An even better approach, imo, would be to map the strings to actual resources and use http://purl.org/dc/terms/spatial for the linking, e.g.:
<http://lobid.org/resource/BT000084171> <http://purl.org/dc/terms/spatial> <http://sws.geonames.org/2904795/> .

@dr0i
Member

dr0i commented Jul 7, 2014

Which property should be used to link to openstreetmap.org, be it a relation, a node or a way?

@acka47
Contributor Author

acka47 commented Jul 8, 2014

To have this information recorded somewhere, here are some numbers on the coverage of notation "99" in NWBib data.

As of today:

@acka47
Contributor Author

acka47 commented Jul 8, 2014

As Simon Schneider suggested at the Friday meeting, we should rather link to http://linkedgeodata.org, although some properties in the data don't dereference, i.e. they are not part of a documented vocabulary. Relation URIs look like http://linkedgeodata.org/triplify/relation66555.

I propose using the same property (dct:spatial) as for the link to GeoNames. Example:

<http://lobid.org/resource/BT000084171> <http://purl.org/dc/terms/spatial> <http://sws.geonames.org/2904795/> , <http://linkedgeodata.org/triplify/relation66555> .
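The example uses Turtle's comma-separated object list. N-Triples has no such shorthand, so the same statement becomes one triple per linked place. A sketch of that expansion, with a hypothetical helper name and duplicate links dropped:

```python
from typing import Iterable, List

DCT_SPATIAL = "http://purl.org/dc/terms/spatial"


def spatial_triples(resource_uri: str, place_uris: Iterable[str]) -> List[str]:
    """One N-Triples line per linked place (GeoNames, LinkedGeoData, ...)."""
    seen = set()
    lines = []
    for uri in place_uris:
        if uri in seen:  # the same place linked twice adds nothing
            continue
        seen.add(uri)
        lines.append(f"<{resource_uri}> <{DCT_SPATIAL}> <{uri}> .")
    return lines


for line in spatial_triples(
    "http://lobid.org/resource/BT000084171",
    ["http://sws.geonames.org/2904795/",
     "http://linkedgeodata.org/triplify/relation66555"],
):
    print(line)
```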

@dr0i
Member

dr0i commented Aug 5, 2014

With lobid/lodmill#499 deployed, the data is partly there, see e.g. http://lobid.org/resource/HT015617942.
It's only partly there because the data is built via API lookups, and each result is stored in an SQL database once a lookup succeeds. When API lookups are made too often, the server returns "420 calm down". At some point the concordance storage will be complete, so new lookups will no longer be needed or will become very rare. There may also be other causes of unsuccessful lookups, e.g. strange characters in the lookup URI (TODO: clean the lookup strings).
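The build-up described above (look up once, store successful results, try again later after a 420) can be sketched as a cached lookup. This is not the lodmill implementation: `fetch` and `RateLimited` are hypothetical stand-ins for the real HTTP call, a dict stands in for the SQL storage, and the exponential backoff between retries is a swapped-in technique the thread does not mention.

```python
import time
from typing import Callable, Dict, Optional


class RateLimited(Exception):
    """Hypothetical: raised by `fetch` when the geo API answers HTTP 420."""


def cached_lookup(
    literal: str,
    cache: Dict[str, str],
    fetch: Callable[[str], str],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return a stored result, or fetch it, backing off exponentially on HTTP 420."""
    if literal in cache:  # already looked up successfully: no new API call
        return cache[literal]
    for attempt in range(max_retries):
        try:
            result = fetch(literal)
        except RateLimited:
            if attempt + 1 < max_retries:
                sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
            continue
        cache[literal] = result  # only successful lookups are stored
        return result
    return None  # still rate-limited; a later run will try again
```

Because only successes are stored, each run needs fewer API calls than the previous one, which matches the expectation that new lookups eventually become very rare.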

Some stats @acka47:

  • number of distinct strings that are used with notation "99": 4,385 (this number will grow)
  • number of links created so far: 155,545 (of 234,082 theoretically possible), i.e. 66% of the theoretical maximum
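For the record, the percentage in the second bullet follows directly from the two counts (66.4%, rounded to 66%); the snippet also gives the number of possible links still missing.

```python
# Figures from the stats above; only the arithmetic is new.
created = 155_545   # links created so far
possible = 234_082  # theoretically possible links

share = created / possible
missing = possible - created
print(f"{share:.1%} linked, {missing:,} still without links")
# 66.4% linked, 78,537 still without links
```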

@dr0i
Member

dr0i commented Aug 12, 2014

After re-indexing on staging, we now have 236,838 records with newly generated links to GeoNames and LinkedGeoData (slightly more than the 234,082 stated above).
Try e.g. http://test.lobid.org/resource/BT000084171.

@acka47
Contributor Author

acka47 commented Aug 13, 2014

Looks great. I found one entry where it didn't work, though: http://lobid.org/resource/HT017502874/about.

@dr0i
Member

dr0i commented Aug 13, 2014

Yes, that is one of the 393 problematic cases where the lookup fails. There are two types of errors: HTTP status codes 400 and 420. The former indicates a real problem with the URL, e.g. http://nominatim.openstreetmap.org/search/de/nrw/'horn?&limit=1&format=json&addressdetails=1&json_callback=D'horn, and the logs reveal 5 such problematic URLs. The other 388 return code 420, which is not a standard HTTP status code and may mean "calm down" because there were too many lookups per second, so these will disappear over time (see the comment above). Example of a 420:
java.io.IOException: Server returned HTTP response code: 420 for URL: http://nominatim.openstreetmap.org/search%3Fq%3DM%C3%B6hnesee+Ort&limit=1&format=json&addressdetails=1&json_callback=MöhneseeOrt'
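In the logged 420 URL, the `?` and `=` of the query string appear percent-encoded as `%3F` and `%3D`, so in that form the server sees no query parameters at all; in the 400 URL, the unencoded `'horn` sits in the path. A sketch of building the URL so that only the parameter values are encoded, using Python's `urllib.parse.urlencode`. The parameters are the ones visible in the logged URLs; the function name and the callback value are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

NOMINATIM_SEARCH = "http://nominatim.openstreetmap.org/search"


def nominatim_query_url(place: str, callback: str) -> str:
    """Build a Nominatim search URL; only the parameter values get percent-encoded."""
    params = {
        "q": place,
        "limit": 1,
        "format": "json",
        "addressdetails": 1,
        "json_callback": callback,
    }
    return NOMINATIM_SEARCH + "?" + urlencode(params)


url = nominatim_query_url("Möhnesee Ort", "MoehneseeOrt")
print(url)
# http://nominatim.openstreetmap.org/search?q=M%C3%B6hnesee+Ort&limit=1&format=json&addressdetails=1&json_callback=MoehneseeOrt
print(parse_qs(urlsplit(url).query)["q"])  # ['Möhnesee Ort']
```

Apostrophes such as the one in "D'horn" are encoded as `%27`, so they can no longer break the URL.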

@acka47
Contributor Author

acka47 commented Aug 19, 2014

+1 for this. We still have to fix notations 97 and 96, though; see #53.

@acka47 acka47 assigned dr0i and unassigned acka47 Aug 19, 2014
@acka47 acka47 added deploy and removed review labels Aug 19, 2014
@dr0i
Member

dr0i commented Aug 19, 2014

Ok, these are other issues. Closing this one.

@acka47
Contributor Author

acka47 commented Sep 4, 2014

Just noticed that there are now a lot of errors in the data.

Examples:

These weren't there from the beginning, but I obviously failed to test properly before saying "+1"... Reopening.

@acka47 acka47 reopened this Sep 4, 2014
@acka47 acka47 added working and removed deploy labels Sep 4, 2014
dr0i added three commits to lobid/lodmill that referenced this issue Sep 4, 2014
@dr0i
Member

dr0i commented Sep 9, 2014

I remember there was a problem with data parsing. The error was fixed, but the false results of the geo API lookups made before the fix had been stored in the database, and the database was not cleansed.
I have now recreated the database, and e.g. http://test.lobid.org/nwbib/HT018312567 is correct.

@dr0i dr0i assigned acka47 and unassigned dr0i Sep 9, 2014
@dr0i dr0i added review and removed working labels Sep 9, 2014
@acka47
Contributor Author

acka47 commented Sep 9, 2014

The links for "Düsseldorf" are better. "Köln" links to "Altstadt-Nord" in GeoNames, though. See for example http://test.lobid.org/nwbib/BT000088884.

@dr0i
Member

dr0i commented Sep 9, 2014

This "wrong" link to GeoNames is a result of the way we build these links. There must be a better way, but for now it works like this:
Look up Nominatim using the literals, parse lat/lon from the result and use these to query the GeoNames API.
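The two-step lookup can be sketched as follows, assuming Nominatim's JSON output (a list of results whose `lat`/`lon` fields are strings) and the GeoNames `findNearbyPlaceNameJSON` web service. The function names and the `username` value are placeholders, the sample response is dummy data, and the HTTP calls themselves are left out. Since the second step only sees a coordinate, it returns whatever named place GeoNames considers nearest to that point, which would explain how a city-level literal like "Köln" can end up linked to a district such as "Altstadt-Nord".

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode

GEONAMES_NEARBY = "http://api.geonames.org/findNearbyPlaceNameJSON"


def first_lat_lon(nominatim_json: str) -> Optional[Tuple[float, float]]:
    """Step 1: take lat/lon from the first Nominatim search result, if any."""
    results = json.loads(nominatim_json)
    if not results:
        return None
    return float(results[0]["lat"]), float(results[0]["lon"])


def geonames_nearby_url(lat: float, lon: float, username: str) -> str:
    """Step 2: reverse-geocode the coordinate via GeoNames (parameter is 'lng', not 'lon')."""
    return GEONAMES_NEARBY + "?" + urlencode({"lat": lat, "lng": lon, "username": username})


# Dummy response for illustration only, not a real Nominatim answer.
sample = '[{"lat": "50.5", "lon": "7.25", "display_name": "Example"}]'
coords = first_lat_lon(sample)
if coords:
    print(geonames_nearby_url(*coords, username="demo"))
    # http://api.geonames.org/findNearbyPlaceNameJSON?lat=50.5&lng=7.25&username=demo
```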

I remember that querying GeoNames directly with the literals too often results in no result at all.
From that it's clear: the linkedgeodata.org links should be less error-prone than the GeoNames links we created. (But then, the LinkedGeoData links are worth much more, as they yield more geo coordinates, don't they?)

@acka47
Contributor Author

acka47 commented Sep 9, 2014

You are right that LinkedGeoData is better in terms of geo coordinates, as we get polygons from it. On the other hand, GeoNames is better for multilingual labels (which we currently don't need, though)...

@acka47
Contributor Author

acka47 commented Sep 9, 2014

We have to improve this in the future. Maybe we should leave out the Geonames links in the HTML for now...

@fsteeg
Member

fsteeg commented Sep 9, 2014

I still think we invested in this prematurely, and that we should not put additional work into it until more pressing issues are solved (first and foremost, reliable daily updates with monitoring).

@dr0i dr0i removed the review label Sep 9, 2014
dr0i added a commit to lobid/lodmill that referenced this issue Sep 29, 2014
See hbz/nwbib#17

* pass-through of '<' and '>' to be used by following morphs
* ',' and especially '-' are necessary when looking up the Nominatim API
* avoid conflation of words by substituting '(','/',')' with '%20'
* add tests
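A rough Python equivalent of the cleaning rules listed in this commit (the actual change is in the lodmill morph definitions, which are not shown here): '(', '/' and ')' act as word separators so that adjacent words are not conflated, while ',', '-', '<' and '>' pass through untouched. As an assumption, the separators become plain spaces instead of a literal '%20', since the query is expected to be percent-encoded afterwards; collapsing repeated whitespace is an addition of this sketch. The function name and example literal are hypothetical.

```python
import re

# Characters that glue words together in the literal and should act as separators.
SEPARATORS = re.compile(r"[()/]")


def clean_lookup_literal(literal: str) -> str:
    """Turn '(', '/' and ')' into spaces and collapse the resulting whitespace.

    ',' and '-' (useful to the geocoder) and '<', '>' (used by later morph steps)
    are deliberately left untouched.
    """
    spaced = SEPARATORS.sub(" ", literal)
    return " ".join(spaced.split())


print(clean_lookup_literal("Mülheim/Ruhr(Stadt)"))  # Mülheim Ruhr Stadt
```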
dr0i added a commit to lobid/lodmill that referenced this issue Oct 1, 2014
Pass the unmodified literal to build the query. Clean the
literal to be used as JSONP callback ID in the geo morph,
NOT already in the preceding morph.

See hbz/nwbib#17.
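This commit separates the two uses of the literal: the query gets it unmodified, and only the JSONP callback ID is cleaned. A minimal sketch of that split, assuming the callback must be a plain identifier (ASCII letters, digits, underscore, not starting with a digit); the exact cleaning rule used in lodmill is not shown in the thread, so this rule and the function names are hypothetical.

```python
import re
from typing import Tuple

_NON_IDENT = re.compile(r"[^A-Za-z0-9_]")


def jsonp_callback_id(literal: str) -> str:
    """Derive a safe JSONP callback name from a place literal (hypothetical rule).

    Keeps only ASCII letters, digits and '_', and prefixes '_' if the result
    would otherwise be empty or start with a digit.
    """
    ident = _NON_IDENT.sub("", literal)
    if not ident or ident[0].isdigit():
        ident = "_" + ident
    return ident


def query_and_callback(literal: str) -> Tuple[str, str]:
    """The query keeps the unmodified literal; only the callback ID is cleaned."""
    return literal, jsonp_callback_id(literal)


print(query_and_callback("D'horn"))        # ("D'horn", 'Dhorn')
print(query_and_callback("Möhnesee Ort"))  # ('Möhnesee Ort', 'MhneseeOrt')
```

Keeping the query literal unmodified matters because the cleaned form is lossy ("Möhnesee" loses its "ö"), which is harmless for a callback name but would change what the geocoder searches for.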
@acka47
Contributor Author

acka47 commented Oct 13, 2014

BTW, it is similar with notations 96 & 97, see #53.

@acka47
Contributor Author

acka47 commented Oct 27, 2014

Closing, as the original issue is solved.


3 participants