
Retry bulk chunk requests to Elasticsearch 2, 5, and 6 #1626

Closed
wants to merge 7 commits

Conversation


@soundofjw soundofjw commented Jan 23, 2019

Implements retries when saving bulk documents to Elasticsearch.
This is useful for transient connection problems and other failures which are temporary in nature.

The current behavior will cause the entire "Save to Elasticsearch" step to fail and retry, which creates redundant saves to the cluster. There is no existing method to enable a single failed chunk to retry.

This behavior is implemented with two new options, withMaxRetries and withRetryPause, which can be passed in the configuration as "maxRetries" and "retryPause" respectively.

For completeness, this also patches a typo in the name of the "Write to Elasticesarch" transform ;)


TODO:

  • Implement FluentBackoff. See comments.

Here we see the logs for the re-attempts in action:

[Screenshot: worker logs showing the retry attempts]


codecov bot commented Jan 23, 2019

Codecov Report

Merging #1626 into master will decrease coverage by 3.8%.
The diff coverage is 0%.


@@            Coverage Diff            @@
##           master   #1626      +/-   ##
=========================================
- Coverage    72.7%   68.9%   -3.81%     
=========================================
  Files         186     174      -12     
  Lines        5764    5268     -496     
  Branches      329     321       -8     
=========================================
- Hits         4191    3630     -561     
- Misses       1573    1638      +65
Impacted Files Coverage Δ
...m/spotify/scio/elasticsearch/ElasticsearchIO.scala 0% <0%> (ø) ⬆️
...m/spotify/scio/elasticsearch/ElasticsearchIO.scala 0% <0%> (ø) ⬆️
...m/spotify/scio/elasticsearch/ElasticsearchIO.scala 0% <0%> (ø) ⬆️
.../spotify/scio/bigquery/client/BigQueryConfig.scala 0% <0%> (-70.59%) ⬇️
...scala/com/spotify/scio/bigquery/client/Cache.scala 0% <0%> (-66.67%) ⬇️
.../spotify/scio/bigquery/BigQueryPartitionUtil.scala 0% <0%> (-58.98%) ⬇️
...scala/com/spotify/scio/bigquery/BigQueryUtil.scala 50% <0%> (-50%) ⬇️
...la/com/spotify/scio/bigquery/client/QueryOps.scala 0.79% <0%> (-47.15%) ⬇️
...com/spotify/scio/bigquery/types/BigQueryType.scala 37.5% <0%> (-25%) ⬇️
...la/com/spotify/scio/bigquery/client/TableOps.scala 0% <0%> (-22.37%) ⬇️
... and 85 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b0d8103...d8c1724.


regadas commented Jan 23, 2019

Hi @soundofjw, thanks for your PR! 😄 When I added the ES6 module this was one of the things on my todo list! Thank you!

At first look this seems OK; however, I would like to see this using the FluentBackoff already provided by Apache Beam. WDYT? Do you think it's feasible for you to reflect this in your PR?
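For context, FluentBackoff is Apache Beam's builder for exponential backoff policies (org.apache.beam.sdk.util.FluentBackoff). A minimal, self-contained sketch of how it is typically used; the retry count and durations below are illustrative, not values from this PR:

// Sketch only: shows the FluentBackoff / BackOff / BackOffUtils trio from Beam.
import java.io.IOException;

import org.apache.beam.sdk.util.BackOff;
import org.apache.beam.sdk.util.BackOffUtils;
import org.apache.beam.sdk.util.FluentBackoff;
import org.apache.beam.sdk.util.Sleeper;
import org.joda.time.Duration;

public class FluentBackoffSketch {
  public static void main(String[] args) throws IOException, InterruptedException {
    // Describe the policy once: up to 3 retries, starting at 5s and growing exponentially.
    final FluentBackoff backoffConfig = FluentBackoff.DEFAULT
        .withMaxRetries(3)
        .withInitialBackoff(Duration.standardSeconds(5));

    // Each unit of work gets its own BackOff instance.
    final BackOff backoff = backoffConfig.backoff();

    // next() sleeps for the next interval and returns false once the policy says stop.
    while (BackOffUtils.next(Sleeper.DEFAULT, backoff)) {
      System.out.println("retrying...");
    }
  }
}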

Josh Whelchel added 2 commits January 23, 2019 16:04
@soundofjw (Author)

@regadas I had no prior knowledge of this - I think it's a great idea.
I'll add this to the PR proper in the coming week. Can you put a "WIP" label on this sucker?

@regadas regadas added the WIP label Jan 23, 2019
Commit: …rch method
This exceeds the max number of params in Scala style, so rely on it coming as a cmdLine arg.
This reverts commit 00c299f.

@regadas regadas left a comment


@soundofjw cool! 👍 let's use FluentBackoff

@soundofjw (Author)

@regadas FluentBackoff changes are in - I did not go through the trouble of adding every FluentBackoff argument. Let me know if you think this is desired - my experience suggests the defaults may be OK.
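For reference, the remaining FluentBackoff knobs that stay at their Beam defaults would look roughly like this. The values are illustrative, and mapping retryPause onto withInitialBackoff in milliseconds is an assumption, not something settled in this PR:

// Requires org.apache.beam.sdk.util.FluentBackoff and org.joda.time.Duration.
this.backoffConfig = FluentBackoff.DEFAULT
    .withMaxRetries(maxRetries)                              // retries beyond the first attempt
    .withInitialBackoff(Duration.millis(retryPause))         // first pause (unit assumed to be ms)
    .withExponent(1.5)                                       // growth factor between pauses
    .withMaxBackoff(Duration.standardMinutes(1))             // cap on any single pause
    .withMaxCumulativeBackoff(Duration.standardMinutes(5));  // cap on total time spent waiting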

this.maxBulkRequestSize = maxBulkRequestSize;
this.clientSupplier = new ClientSupplier(clusterName, servers);
this.toActionRequests = toActionRequests;
this.error = error;

this.backoffConfig = FluentBackoff.DEFAULT
.withMaxRetries(maxRetries)
Contributor:

Suggested change:
- .withMaxRetries(maxRetries)
+ .withMaxRetries(maxRetries - 1)

Author:

If the param were maxAttempts, this would make sense - but when we say "3 retries," we are effectively saying 4 attempts. (FluentBackoff's withMaxRetries counts retries after the initial attempt, so withMaxRetries(3) allows up to 4 attempts.) I'd resolve to keep this as is.


do {
try {
final BulkRequest bulkRequest = new BulkRequest().add(chunk).refresh(false);
Contributor:

@soundofjw no need for .refresh(false) it's already the default.

Author:

Yes, though it should be noted that when we are indexing with a refresh interval of -1, we're getting refreshes anyway (after a certain point, probably hitting the maximum amount of docs that can hang in the bulk queue) - it's really hard to know why this is happening. But to your point, changing this had no effect.

@regadas commented Feb 22, 2019:

Yeah, refresh at indexing time can happen due to several things ... document sizes, number of documents, queues... I guess at some point we can offer support, but let's not do it in the same PR, next one! 😄

final InetSocketAddress[] servers,
final long numOfShard,
final int maxBulkRequestSize,
final ThrowingConsumer<BulkExecutionException> error) {
Contributor:

@soundofjw can you undo the unrelated changes, keep the format?

this.maxBulkRequestSize = maxBulkRequestSize;
this.clientSupplier = new ClientSupplier(clusterName, servers);
this.toActionRequests = toActionRequests;
this.error = error;

this.backoffConfig = FluentBackoff.DEFAULT
Contributor:

@soundofjw This actually needs to be inside a

@Setup 
public void setup() throws Exception {
...
}
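A minimal sketch of that suggestion, building the backoff configuration in the DoFn lifecycle hook instead of the constructor. The field names come from the diff above; wiring retryPause into withInitialBackoff, and its millisecond unit, are assumptions:

// Requires org.apache.beam.sdk.transforms.DoFn.Setup, org.apache.beam.sdk.util.FluentBackoff
// and org.joda.time.Duration.
@Setup
public void setup() throws Exception {
  this.backoffConfig = FluentBackoff.DEFAULT
      .withMaxRetries(maxRetries)
      .withInitialBackoff(Duration.millis(retryPause));
}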


final BackOff backoff = backoffConfig.backoff();

do {
Contributor:

@soundofjw WDYT of removing this do { ... } while (true) in favor of a

while (BackOffUtils.next(Sleeper.DEFAULT, backoff)) {
 ...
}

removes the need for all of this

try {
  final long sleepTime = backoff.nextBackOffMillis();
  if (sleepTime == BackOff.STOP) {
    break;
  }
  Thread.sleep(sleepTime);
} catch (InterruptedException | IOException e1) {
  LOG.error("Interrupt during backoff", e1);
  break;
}

Author:

Your proposal will introduce a sleep at the beginning of every attempt:

  public static boolean next(Sleeper sleeper, BackOff backOff)
      throws InterruptedException, IOException {
    long backOffTime = backOff.nextBackOffMillis();
    if (backOffTime == BackOff.STOP) {
      return false;
    }
    sleeper.sleep(backOffTime);
    return true;
  }

Contributor:

Yeah, sorry for the poor explanation; you're correct. What I was trying to say is that I think we can refactor this block by decoupling the logic a little bit while still leveraging that util for the retry logic ... I think we get better semantics and readability.
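One possible shape of that refactor, attempting the bulk request first and only calling BackOffUtils.next after a failure so no pause happens before the first attempt. The clientSupplier.get() call, LOG, chunk, and the failure handling are assumptions based on the surrounding diff, not the PR's final code:

// Assumes org.apache.beam.sdk.util.{BackOff, BackOffUtils, Sleeper} and
// org.elasticsearch.action.bulk.{BulkRequest, BulkResponse} are imported.
final BackOff backoff = backoffConfig.backoff();
while (true) {
  try {
    final BulkRequest bulkRequest = new BulkRequest().add(chunk);
    final BulkResponse response = clientSupplier.get().bulk(bulkRequest).actionGet();
    if (!response.hasFailures()) {
      break; // chunk written successfully
    }
    LOG.warn("Bulk request had failures: {}", response.buildFailureMessage());
  } catch (Exception e) {
    LOG.warn("Bulk request threw", e);
  }
  // Sleeps for the next interval; returns false once maxRetries has been exhausted.
  if (!BackOffUtils.next(Sleeper.DEFAULT, backoff)) {
    throw new IOException("Bulk request still failing after all retries");
  }
}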



do {
try {
final BulkRequest bulkRequest = new BulkRequest().add(chunk)
.setRefreshPolicy(WriteRequest.RefreshPolicy.NONE);
Contributor:

Same here ... this policy is already the default.
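So the chunk can simply be added without setting any refresh policy at all (sketch):

final BulkRequest bulkRequest = new BulkRequest().add(chunk); // RefreshPolicy.NONE is already the default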



@jto jto added this to the 0.8.0 milestone Mar 22, 2019
@nevillelyh nevillelyh force-pushed the master branch 2 times, most recently from 9bc5e21 to b0d8103 Compare March 29, 2019 15:06
@jto jto requested a review from regadas April 9, 2019 09:47

jto commented Apr 9, 2019

@regadas what's the status of this?


regadas commented Apr 9, 2019

Closed via #1826
