
Request Timeout during Import #85

Closed
ktjaco opened this issue Apr 11, 2016 · 11 comments


ktjaco commented Apr 11, 2016

I'm in the process of installing Pelias without vagrant on my Ubuntu machine as well as a Centos machine.

On the Ubuntu machine I ran npm install for the OpenAddresses importer and then node import.js, using node v0.10.38, which resulted in the following error.

node import.js
2016-04-11T21:59:53.991Z - info: [openaddresses] Importing 1 files.
2016-04-11T21:59:54.202Z - info: [openaddresses] Creating read stream for: /home/user/pelias/openaddresses/data/au/countrywide.csv
2016-04-11T22:08:41.242Z - error: [dbclient] esclient error Error: Request Timeout after 120000ms
    at /home/kent/pelias/openaddresses/node_modules/pelias-dbclient/node_modules/elasticsearch/src/lib/transport.js:340:15
    at null.<anonymous> (/home/kent/pelias/openaddresses/node_modules/pelias-dbclient/node_modules/elasticsearch/src/lib/transport.js:369:7)
    at Timer.listOnTimeout [as ontimeout] (timers.js:112:15)
2016-04-11T22:08:41.243Z - error: [dbclient] invalid resp from es bulk index operation
2016-04-11T22:08:41.243Z - info: [dbclient] retrying batch [500]

This was resolved by switching to a newer version of node (v0.12.0). I could import a large dataset like Australia (countrywide) without an issue. I've also tried using both master and production branches.

I tried to do the same on a Centos machine and got the above error with both versions of node. Could this be an issue with the OS I'm using, or a RAM problem?
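(Editor's note: the "Request Timeout after 120000ms" in the log above is the elasticsearch-js client's default requestTimeout. pelias-dbclient builds its client from the esclient section of pelias.json, so raising the timeout there is a possible mitigation when the cluster is merely slow rather than down. A sketch only; the value is illustrative:)

```json
{
  "esclient": {
    "requestTimeout": 360000
  }
}
```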

@orangejulius (Member)

Hey @ktjaco,
This is interesting. I tried the OSM importer just now with node 0.10.38, and I wasn't able to reproduce the issue. Can you paste the output of npm ls as well as git show?


ktjaco commented Apr 13, 2016

@orangejulius
Umm, I was able to start an import without the runtime error on the Centos machine. I'm not sure what resolved the issue. Possibly removing node_modules...

Also, I was told that the production branches would be stable and not be updated, but I'm having some issues querying certain place names. Here's the link to what I'm referring to:

pelias/api#498

ktjaco closed this as completed Apr 13, 2016
@orangejulius (Member)

Ah yes, clearing the node_modules directory and rerunning npm install fixes everything.
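(Editor's note: for anyone following along, the fix above boils down to two commands, run from the importer's checkout; the path below is illustrative:)

```shell
# From the importer's checkout, e.g. ~/pelias/openaddresses.
# Drop dependencies whose native modules were built against the old node
# version, then reinstall them against the node currently on the PATH.
rm -rf node_modules
npm install
```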


ktjaco commented Apr 13, 2016

@orangejulius
Also, do you have an idea of how big a global OpenAddresses import might be? I'm in the process of importing every address dataset OpenAddresses supports, and I've been using ElasticHQ to analyse the database's growth during the import. I'm currently on New York City, and ElasticHQ reports my Pelias index at a total and primary size of 44.7 GB.
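(Editor's note: if you'd rather not rely on ElasticHQ, Elasticsearch's cat API reports the same doc-count and store-size figures directly. This assumes Elasticsearch on its default port 9200:)

```shell
# Lists each index with document count and store size; prints a fallback
# message if nothing is listening on the assumed default port.
curl -s 'http://localhost:9200/_cat/indices?v' \
  || echo 'elasticsearch not reachable on localhost:9200'
```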

@orangejulius (Member)

It's quite large, well over 200 million records. Our full planet build comes to 150GB, so with just OA, you're probably looking at around 100GB.


ktjaco commented Apr 13, 2016

@orangejulius
Thank you! Do you know how long a full OA process should take?

So I just ran into the timeout error/bug again. It looped through the error I posted above for 5-10 minutes, then the import restarted at us/oh/adams.csv. I'm not sure whether this is an error, but this isn't the first time it has stopped and restarted.

@orangejulius (Member)

It will take quite a while: our full build takes almost two days. We run all of our importers in parallel, and until recently OA was the one that took the longest. Since then we've actually split the OA import into two parts which we run in parallel.

Now that I think about it, I've seen this timeout issue when the machine doing the importing is overloaded. This can be because the combination of Elasticsearch and the importer is too much for one machine, or because Elasticsearch is running out of memory, so definitely make sure you're watching the utilization of your hardware. Our dev cluster uses 4 r3.xlarge instances, for a total of 120GB of memory. Near the end of the import process, when there's lots of data in play, they are using most of their 4 CPUs' capacity.
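(Editor's note: a few stock commands are enough to watch for the overload described above; these are standard Linux tools, nothing Pelias-specific:)

```shell
# Memory and swap usage in MB -- watch for swap creeping upward during the import.
free -m
# Load averages; sustained load above your core count means the box is saturated.
uptime
# Disk headroom, relevant for the Elasticsearch data path as the index grows.
df -h
```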


ktjaco commented Apr 13, 2016

@orangejulius
Thanks for the info. I was starting to get worried about this timeout error.


ktjaco commented Apr 14, 2016

@orangejulius
So I've completed a full OpenAddresses import; it took about 31 hours.

I tried to query for an address in Toronto, Ontario, and received the following error. I've searched around and can't find any other Pelias users reporting this issue.

{
  "geocoding": {
    "version": "0.1",
    "attribution": "http://pelias.mapzen.com/v1/attribution",
    "query": {
      "text": "8 yonge street",
      "parsed_text": { "number": "8", "street": "yonge street", "regions": [] },
      "size": 10,
      "private": false,
      "querySize": 20
    },
    "errors": [
      {
        "status": 503,
        "displayName": "ServiceUnavailable",
        "message": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed]"
      }
    ],
    "engine": { "name": "Pelias", "author": "Mapzen", "version": "1.0" },
    "timestamp": 1460631925507
  },
  "type": "FeatureCollection",
  "features": []
}
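(Editor's note: for debugging it can help to take the browser out of the loop and re-issue the failing search directly against the API. This assumes the Pelias API's default port of 3100; adjust if you've changed it:)

```shell
# Re-issue the failing search straight against the Pelias API; prints a
# fallback message if nothing is listening on the assumed port.
curl -s 'http://localhost:3100/v1/search?text=8+yonge+street' \
  || echo 'pelias api not reachable on localhost:3100'
```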

@orangejulius (Member)

That might be Elasticsearch telling you it's completely overloaded. You can probably find more info in the API logs (they go to the console by default) or the Elasticsearch logs (I can never remember where those go; it depends on your system).
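(Editor's note: a couple of places worth checking; log paths vary by install method, so treat these as common defaults rather than gospel:)

```shell
# Package installs usually log here; adjust for your distro or layout.
tail -n 50 /var/log/elasticsearch/elasticsearch.log 2>/dev/null \
  || echo 'log not at the default path'
# Elasticsearch's own view of its health (status "red" means shards are
# unavailable, which matches the "all shards failed" error above).
curl -s 'http://localhost:9200/_cluster/health?pretty' \
  || echo 'elasticsearch not reachable on localhost:9200'
```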


ktjaco commented Apr 14, 2016

@orangejulius
There appeared to be nothing in the Pelias API logs. I tried restarting Elasticsearch (service elasticsearch restart). Following that, I started the Pelias API (npm start) and made a query. While making the query I ran tail -f elasticsearch.log, and this was displayed:

[2016-04-14 08:37:14,396][INFO ][node                     ] [Warhawk] stopping ...
[2016-04-14 08:38:10,967][INFO ][node                     ] [Margo Damian] version[1.7.3], pid[11170], build[05d4530/2015-10-15T09:14:17Z]
[2016-04-14 08:38:10,968][INFO ][node                     ] [Margo Damian] initializing ...
[2016-04-14 08:38:11,065][INFO ][plugins                  ] [Margo Damian] loaded [], sites []
[2016-04-14 08:38:11,103][INFO ][env                      ] [Margo Damian] using [1] data paths, mounts [[/ (/dev/mapper/vg_livedvd-lv_root)]], net usable_space [93.6gb], net total_space [150.5gb], types [ext4]
[2016-04-14 08:38:13,569][INFO ][node                     ] [Margo Damian] initialized
[2016-04-14 08:38:13,569][INFO ][node                     ] [Margo Damian] starting ...
[2016-04-14 08:38:13,741][INFO ][transport                ] [Margo Damian] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.102:9300]}
[2016-04-14 08:38:13,758][INFO ][discovery                ] [Margo Damian] elasticsearch/tv6a95nkSr6qKfVes8e_1A
[2016-04-14 08:38:17,529][INFO ][cluster.service          ] [Margo Damian] new_master [Margo Damian][tv6a95nkSr6qKfVes8e_1A][localhost.localdomain][inet[/192.168.1.102:9300]], reason: zen-disco-join (elected_as_master)
[2016-04-14 08:38:17,630][INFO ][http                     ] [Margo Damian] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.102:9200]}
[2016-04-14 08:38:17,631][INFO ][node                     ] [Margo Damian] started
[2016-04-14 08:38:17,666][INFO ][gateway                  ] [Margo Damian] recovered [1] indices into cluster_state
[2016-04-14 08:38:54,266][DEBUG][action.search.type       ] [Margo Damian] All shards failed for phase: [query_fetch]
org.elasticsearch.index.shard.IllegalIndexShardStateException: [pelias][0] CurrentState[RECOVERING] operations only allowed when started/relocated
    at org.elasticsearch.index.shard.IndexShard.readAllowed(IndexShard.java:1004)
    at org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:797)
    at org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:793)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:564)
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:544)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:385)
    at org.elasticsearch.search.action.SearchServiceTransportAction$11.call(SearchServiceTransportAction.java:333)
    at org.elasticsearch.search.action.SearchServiceTransportAction$11.call(SearchServiceTransportAction.java:330)
    at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

It may very well be an overload. I have another VM doing an import, so RAM is running very close to full capacity.
