
Index failing with connection reset by peer #30

Open
bnewbold opened this issue Jan 31, 2020 · 1 comment

@bnewbold

I twice attempted to import over 140 million documents into a local, single-node ES 6.8 cluster using a command like the following:

zcat /srv/fatcat/snapshots/release_export_expanded.json.gz |  pv -l | parallel -j20 --linebuffer --round-robin --pipe ./fatcat_transform.py elasticsearch-releases - - | esbulk -verbose -size 10000 -id ident -w 6 -index qa_release_v03b -type release

This is with esbulk 0.5.1. I will retry with the latest 0.6.0.

The index almost completed, but after more than 100 million documents it failed with an error like:

2020/01/31 11:49:40 Post http://localhost:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp [::1]:56970->[::1]:9200: write: connection reset by peer                                                                      
Warning: unable to close filehandle properly: Broken pipe during global destruction

(the "Warning" part might be one of the other pipeline commands)

I suspect this is actually a problem on the Elasticsearch side... maybe something like a GC pause? I looked in the ES logs and saw garbage collections up until the time of failure, and none after, but no particularly large or noticeable GC right around the failure itself.

I would expect the esbulk HTTP retries to resolve any such issues; I assume that in this case all the retries failed. Perhaps more retries, longer delays, or exponential back-off would help. Unfortunately, I suspect this failure may be difficult to reproduce reliably, as it has only occurred with these very large imports.
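To make the back-off idea concrete, here is a rough standalone sketch (not esbulk's actual retry code) of what exponential back-off around a single bulk request could look like, using curl against the same endpoint; batch.ndjson is a hypothetical file standing in for one bulk batch:

# illustrative only: retry a bulk request with exponential back-off (1s, 2s, 4s, ...)
delay=1
for attempt in 1 2 3 4 5; do
    if curl -sf -H 'Content-Type: application/x-ndjson' \
            --data-binary @batch.ndjson \
            http://localhost:9200/_bulk > /dev/null; then
        break
    fi
    echo "bulk request failed, retrying in ${delay}s (attempt ${attempt})" >&2
    sleep "$delay"
    delay=$((delay * 2))
done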

esbulk has been really useful; thank you for making it available and for any maintenance time you can spare!

@bnewbold
Author

As a follow-up on this issue: if I recall correctly, the root cause was individual bulk batches that were too large in bytes (not in number of documents), which ES would refuse. I worked around this by decreasing the batch size.
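For anyone hitting the same thing: since the problem was payload size in bytes (presumably bumping into Elasticsearch's http.max_content_length request limit, 100mb by default), the workaround maps to lowering esbulk's -size flag in the original command. The value 1000 below is only an illustration, not the exact number used:

zcat /srv/fatcat/snapshots/release_export_expanded.json.gz | pv -l | parallel -j20 --linebuffer --round-robin --pipe ./fatcat_transform.py elasticsearch-releases - - | esbulk -verbose -size 1000 -id ident -w 6 -index qa_release_v03b -type release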
