OSM import stops without error message #107
Hi @moosi, did you manage to resolve the issue?
Hey @missinglink, the problem is still present. The import stopped without an error, and the pelias CLI removed the container, so it assumes the import finished. I did some more research on this issue: I used a bigger machine for the import (32 cores, 128 GB RAM) and the import stopped at exactly the same document count (100148808 docs). Today I will download a new .pbf and start another import to make sure my OSM file is not corrupted. I will keep you updated.
@missinglink I downloaded the latest OSM file (planet-190527.osm.pbf) and checked the md5 sum. The import stopped again after ~100 million documents (103504866 docs), so it is safe to say that the .pbf file is not corrupted.
100 million documents sounds about right for openstreetmap. I'm trying to understand: what is the problem?
Okay, so I may have been confused by the documentation saying a full planet import should be around 600 million documents. The current pelias dashboard says 538.4 million addresses are imported. In turn this would mean that ~440 million addresses are imported from openaddresses.io. Or does a single document contain more than one address?
That sounds about right; I think openaddresses has between 400 and 500 million records worldwide. A single document only contains one 'thing'.
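To see how the indexed documents actually break down by data source, a terms aggregation can be run against Elasticsearch. This is a sketch, not from the thread: it assumes the index is named `pelias` and that each document carries a `source` field (as the record examples later in this thread do); depending on the mapping and Elasticsearch version, the field name may need to be `source.keyword` instead.

```json
{
  "size": 0,
  "aggs": {
    "by_source": {
      "terms": { "field": "source" }
    }
  }
}
```

Saved as `agg.json`, this could be sent with something like `curl -s 'localhost:9200/pelias/_search' -d @agg.json`; the returned bucket counts then show how many documents came from `openstreetmap` versus `openaddresses` and the other importers.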
In this case it seems like pelias is only usable for address search when importing both OSM and openaddresses.io data (compared to, e.g., nominatim, which relies only on OSM data). Thanks a lot for the input and the fast reply!
You can choose which data you want to import, but yes, if you only import OSM then you'll have a similar amount of addresses as nominatim. For our geocode.earth cloud service we also import all of openaddresses and then we additionally generate a planet-wide interpolation index which includes TIGER block ranges too. Hopefully that brings us to pretty much complete coverage in the USA and most of Europe, depending on how willing the governments are in each country to provide open data :)
@missinglink Can I safely ignore these "denormalize failed on relation xxxx...." errors? Is this supposed to happen, or am I missing something? I can see the record count increasing in elasticsearch, by the way. Thanks!
@missinglink I still think there is a problem with the OSM import. I did some tests on my instance and benchmarked the results against api.geocode.earth. Whenever I query an address that is provided by openaddresses, both return the correct result. When I query an address that is based on openstreetmap, the api.geocode.earth API will find the address while my instance often returns a fallback (a whosonfirst result).
OSM addresses found:
OSM addresses not found:
All full-text addresses are correctly parsed into street, number and city. It seems like for some cities it is working fine, while some cities are not imported at all. Also interesting: an address that returns results from both openstreetmap and openaddresses via api.geocode.earth will return a fallback on my instance. That's why I think there is a problem with the OSM import. Is there a way to find out when the last successful import on api.geocode.earth took place?
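One quick way to spot fallback results like this is to inspect the `properties.source` field of the GeoJSON features a Pelias `/v1/search` query returns. A small hypothetical helper (the function name and the trimmed sample response are illustrative, not from the thread):

```javascript
// Count how many features in a Pelias /v1/search response came from each
// data source (openstreetmap, openaddresses, whosonfirst, ...).
function summarizeSources(response) {
  const counts = {};
  for (const feature of response.features || []) {
    const src = (feature.properties && feature.properties.source) || 'unknown';
    counts[src] = (counts[src] || 0) + 1;
  }
  return counts;
}

// Example with a trimmed-down response:
const sample = {
  features: [
    { properties: { source: 'openaddresses' } },
    { properties: { source: 'whosonfirst' } },
    { properties: { source: 'whosonfirst' } }
  ]
};
console.log(summarizeSources(sample));
```

If a query that should match an OSM address reports only `whosonfirst` features here, the instance returned an administrative-area fallback rather than a real address hit.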
Hi @moosi, I just checked our current config. You mentioned in the issue description that admin-lookup is disabled; is that still the case?
There is another full planet build running now which we hope to have available in the next few days. |
I did the OSM import using "planet-190527.osm.pbf" and an openaddresses download from June 6th. Admin lookup is enabled. Here is a diff comparison for the search "/v1/search?text=Sebastianstraße 53": https://jsoncompare.com/#!/diff/id=53fe004f1c0d35ed382ab53858f55c4c/ The address in "Bonn" is missing and the address in "Dinslaken" is different/missing.
It seems that your build is missing that record. Querying it on our build:

```
$ curl -s 'localhost:9200/pelias/address/way%2F284476365?pretty'
{
  "_index" : "pelias",
  "_type" : "address",
  "_id" : "way/284476365",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "center_point" : {
      "lon" : 8.555305,
      "lat" : 51.54399
    },
    "parent" : {
      "continent" : [ "Europe" ],
      "country" : [ "Germany" ],
      "macrocounty_a" : [ null ],
      "country_a" : [ "DEU" ],
      "locality_a" : [ null ],
      "region_id" : [ "85682513" ],
      "county" : [ "Paderborn" ],
      "locality" : [ "Büren" ],
      "continent_a" : [ null ],
      "region_a" : [ "NRW" ],
      "macrocounty" : [ "Detmold" ],
      "county_id" : [ "102063835" ],
      "locality_id" : [ "101810347" ],
      "continent_id" : [ "102191581" ],
      "region" : [ "Nordrhein-Westfalen" ],
      "macrocounty_id" : [ "404227571" ],
      "country_id" : [ "85633111" ],
      "county_a" : [ "PD" ]
    },
    "name" : {
      "default" : "53 Sebastianstraße"
    },
    "address_parts" : {
      "zip" : "33142",
      "number" : "53",
      "street" : "Sebastianstraße"
    },
    "source" : "openstreetmap",
    "source_id" : "way/284476365",
    "layer" : "address"
  }
}
```

When the pip-service starts up, you should see something like this:

```
info: [wof-pip-service:master] macrocounty worker loaded 371 features in 0.742 seconds
info: [wof-pip-service:master] county worker loaded 40639 features in 42.127 seconds
```
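If the pip-service workers load fine but documents were indexed while admin-lookup was disabled, those documents will simply lack `parent` fields. A hypothetical check (assuming the index is named `pelias`; the exact field path may differ by schema version) counts documents without a locality id:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": {
        "exists": { "field": "parent.locality_id" }
      }
    }
  }
}
```

POSTed to `localhost:9200/pelias/_search`, a `hits.total` that is large relative to the overall document count would suggest the admin hierarchy was never attached during import.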
This is the output of the pip-service, so they seem to load correctly:
It looks as though the source data has changed. I'm going to have to close this issue because it's quickly gotten off-topic and turned into a general support thread. If you need any additional help setting up a full planet build, you can contact us for consultancy.
I was also thinking about this, and it would definitely be good to add a message to the OSM importer that indicates a successful (or unsuccessful) completion of the import process. We discussed that in pelias/pelias#255 and it would be nice to use a standard message across all our importers as described there. If anyone wants to take a quick try at adding it, we'd be happy to help them get started with a pull request.
I was able to run the import of a small area (portland-metro) without any problems, but when I try to perform a full planet import, the OSM importer stops after ~100 million documents without any error message. Right now I do not run the importers in parallel; only the OSM importer is running. I use the default pelias / docker-compose configuration (starting with the Elasticsearch 2.4 configuration; I also tried the updated Elasticsearch 5.6 configuration). To speed up debugging I disabled admin lookup.
VM Specs:
16 cores
32 GB RAM
1 TB Elasticsearch storage (>1.5 TB for tmp storage)
Elasticsearch status:
100148808 docs
46379422702 bytes store size
OSM importer log tail:
Elasticsearch log tail: