Maximize number of containedIn statements #253
Comments
This would be possible with the Mapzen API [1, 2], but it would significantly increase both our Mapzen call count and the transformation runtime. The current approach based on the Gemeindeschlüssel is much more efficient. Since we also want to improve RS/AGS (see #134), I think we should stay with the current approach and try to improve the numbers from that side. [1] https://search.mapzen.com/v1/search?text=50676+Köln&sources=geonames&layers=coarse
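For reference, a minimal sketch of how a query like [1] could be assembled per organisation. The function name `build_search_url` and its two arguments are assumptions for illustration; only the endpoint and the `text`, `sources` and `layers` parameters are taken from the URL above. It builds the URL only and sends no request.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL [1] in this comment.
MAPZEN_SEARCH = "https://search.mapzen.com/v1/search"


def build_search_url(postal_code, city):
    """Build a coarse GeoNames-only search URL for one postal code and city.

    Hypothetical helper: the parameter values mirror the example URL,
    everything else is an assumption.
    """
    params = {
        "text": "%s %s" % (postal_code, city),
        "sources": "geonames",
        "layers": "coarse",
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8
    # and turns spaces into '+'.
    return MAPZEN_SEARCH + "?" + urlencode(params)


print(build_search_url("50676", "Köln"))
```

One such request per organisation is what would drive up the call count mentioned above.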
With the increased number of AGS values on staging, we have more containedIn values, too: http://beta.lobid.org/organisations/search?q=containedIn:* These are still far fewer than the AGS values. Should there be a GeoNames value for every AGS? Are the missing ones absent from https://raw.githubusercontent.com/hbz/lookup-tables/master/data/geonames-map.csv? @SBRitter, where did you get that data from?
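One way to answer the "is it missing in the CSV" question would be to diff the AGS values we have against the keys of the lookup table. This is a rough sketch only: the two-column layout (`ags`, `geonames_id`) and the sample rows are assumptions, not the actual layout of geonames-map.csv.

```python
import csv
import io

# Illustrative sample only; the real file may use other columns and values.
SAMPLE_CSV = """ags,geonames_id
05315000,2886242
11000000,2950159
"""


def load_geonames_map(handle):
    """Read an AGS -> GeoNames ID mapping from a CSV with a header row."""
    reader = csv.DictReader(handle)
    return {row["ags"]: row["geonames_id"] for row in reader}


def ags_without_geonames(ags_values, geonames_map):
    """Return the sorted AGS values that have no entry in the map."""
    return sorted(set(ags_values) - set(geonames_map))


geonames_map = load_geonames_map(io.StringIO(SAMPLE_CSV))
# Hypothetical AGS values as they might appear in the organisation data.
seen_ags = ["05315000", "11000000", "09162000"]
print(ags_without_geonames(seen_ags, geonames_map))
```

Any AGS reported here would point to a gap in the lookup table rather than in the organisation data.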
Here is the current status from the perspective of entries missing a http://beta.lobid.org/organisations/search?q=_missing_:containedIn
The lodmill repo contains the file https://github.com/lobid/lodmill/blob/master/lodmill-rd/src/main/resources/geonames_DE.csv. It's a simple CSV; the source must be http://download.geonames.org/export/dump/DE.zip. This CSV is transformed in the lodmill repo using Metamorph (https://github.com/lobid/lodmill/blob/master/lodmill-rd/src/main/resources/morphGeonamesCsv2ld.xml) to create triples, which are linked by the Gemeindeschlüssel object (found in ISIL field
Ah, and the lodmill CSV has far more entries (~180k) than the lookup-tables repo (~11k).
OK, trying to understand how to proceed here and how it's all connected. From #253 (comment) it seems the issue is that we are missing too many http://test.lobid.org/organisations/search?q=_missing_:containedIn A potential solution would be to use https://github.com/lobid/lodmill/blob/master/lodmill-rd/src/main/resources/geonames_DE.csv, which contains data occurring in
Links to GeoNames address some interesting use cases, e.g. you can get data on population size and make queries like: get all museums in places with < 10,000 residents. As this will be easy to fix as soon as #268 is finished, we shouldn't close it.
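To make the use case concrete, here is a small in-memory sketch of the museum query. The record fields (`name`, `type`, `containedIn`), the place URIs and the population figures are made up for illustration; a real query would run against the index and GeoNames data.

```python
def museums_in_small_places(organisations, population_by_place, max_residents=10000):
    """Return names of museums whose containedIn place has fewer residents
    than max_residents. Organisations without a containedIn link, or with
    a place of unknown population, are skipped."""
    result = []
    for org in organisations:
        if org.get("type") != "Museum":
            continue
        population = population_by_place.get(org.get("containedIn"))
        if population is not None and population < max_residents:
            result.append(org["name"])
    return result


# Hypothetical sample data.
populations = {"place:a": 8500, "place:b": 1000000}
orgs = [
    {"name": "Village Museum", "type": "Museum", "containedIn": "place:a"},
    {"name": "City Museum", "type": "Museum", "containedIn": "place:b"},
    {"name": "Village Library", "type": "Library", "containedIn": "place:a"},
    {"name": "Unlinked Museum", "type": "Museum"},
]
print(museums_in_small_places(orgs, populations))
```

The "Unlinked Museum" case shows why the missing containedIn links matter: such entries silently drop out of this kind of query.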
With our own pelias service running, we could do it like @fsteeg suggested in #253 (comment).
See also lobid/lodmill#488.
Currently, we have ~6700 entries with a containedIn link to GeoNames. We get this by querying GeoNames with the Gemeindeschlüssel; see the current morph-enriched, lines 331-335. The link is missing for ~15k entries, see http://beta.lobid.org/organisations/search?q=_missing_:containedIn. As Mapzen also provides GeoNames data, we should probably query it with the address plus the Gemeindeschlüssel (if available).
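The proposal could be sketched roughly as follows: keep the cheap Gemeindeschlüssel lookup as the first step and only fall back to a geocoder when it yields nothing. All names here (`resolve_contained_in`, the `ags` and `address` fields, the `geocode` callable) are hypothetical; the geocoder is passed in as a function so the sketch makes no assumption about the actual Mapzen client.

```python
def resolve_contained_in(org, ags_table, geocode):
    """Return a GeoNames link for an organisation, or None.

    1. Try the Gemeindeschlüssel (AGS) lookup table.
    2. Otherwise, if there is an address, ask the geocoder, passing the
       AGS along as an extra hint when one is available.
    """
    ags = org.get("ags")
    if ags and ags in ags_table:
        return ags_table[ags]
    address = org.get("address")
    if not address:
        return None
    return geocode(address, ags)


# Stub geocoder standing in for a real service call (assumption).
calls = []

def fake_geocode(address, ags):
    calls.append((address, ags))
    return "geonames:from-geocoder"


table = {"05315000": "geonames:from-table"}
print(resolve_contained_in({"ags": "05315000", "address": "X"}, table, fake_geocode))
print(resolve_contained_in({"address": "50676 Köln"}, table, fake_geocode))
```

With this order, the geocoder is only called for the entries that currently lack a link, which keeps the extra call count bounded by that number.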