
Index failing with connection reset by peer #30

Open
bnewbold opened this issue Jan 31, 2020 · 1 comment

@bnewbold

I twice attempted to import over 140 million documents into a local, single-node ES 6.8 cluster using a command like the following:

zcat /srv/fatcat/snapshots/release_export_expanded.json.gz |  pv -l | parallel -j20 --linebuffer --round-robin --pipe ./fatcat_transform.py elasticsearch-releases - - | esbulk -verbose -size 10000 -id ident -w 6 -index qa_release_v03b -type release

This is with esbulk 0.5.1. I will retry with the latest 0.6.0.

The index almost completed, but after more than 100 million documents it failed with an error like:

2020/01/31 11:49:40 Post http://localhost:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp [::1]:56970->[::1]:9200: write: connection reset by peer                                                                      
Warning: unable to close filehandle properly: Broken pipe during global destruction

(the "Warning" part might be one of the other pipeline commands)

I suspect this is actually a problem on the Elasticsearch side... maybe something like a GC pause? I looked in the ES logs and saw garbage collections up until the time of failure, and none after, but no particularly large or noticeable GC right around the failure itself.

I would expect the esbulk HTTP retries to resolve any such issues; I assume that in this case all the retries failed. Perhaps more retries, longer delays, or exponential back-off would help. Unfortunately, I suspect this failure may be difficult to reproduce reliably, as it has only occurred with these very large imports.
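To make the back-off idea concrete, here is a rough standalone sketch (not esbulk's actual retry code) of what exponential back-off around a single bulk request could look like, using curl against the same endpoint; batch.ndjson is a hypothetical file standing in for one bulk batch:

# illustrative only: retry a bulk request with exponential back-off (1s, 2s, 4s, ...)
delay=1
for attempt in 1 2 3 4 5; do
    if curl -sf -H 'Content-Type: application/x-ndjson' \
            --data-binary @batch.ndjson \
            http://localhost:9200/_bulk > /dev/null; then
        break
    fi
    echo "bulk request failed, retrying in ${delay}s (attempt ${attempt})" >&2
    sleep "$delay"
    delay=$((delay * 2))
done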

esbulk has been really useful; thank you for making it available and for any maintenance time you can spare!

@bnewbold
Author

As a follow-up on this issue: if I recall correctly, the root cause was individual bulk batches that were too large in bytes (not in number of documents), which ES would refuse. I worked around this by decreasing the batch size.
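For anyone hitting the same thing: since the problem was payload size in bytes (presumably bumping into Elasticsearch's http.max_content_length request limit, 100mb by default), the workaround maps to lowering esbulk's -size flag in the original command. The value 1000 below is only an illustration, not the exact number used:

zcat /srv/fatcat/snapshots/release_export_expanded.json.gz | pv -l | parallel -j20 --linebuffer --round-robin --pipe ./fatcat_transform.py elasticsearch-releases - - | esbulk -verbose -size 1000 -id ident -w 6 -index qa_release_v03b -type release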
