
Exponential backoff retries for ProvisionedThroughputExceeded errors #222

Merged: 1 commit merged into pynamodb:devel from JohnEmhoff:retry-throughput-errors on Mar 15, 2017

Conversation

@JohnEmhoff (Contributor):

Currently a ProvisionedThroughputExceededException is a hard failure -- this change retries with exponential backoff instead. Handling it client side is hard because batch_write.commit() clears its list of pending operations on failure.

Related issue: #218
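
A minimal sketch of the retry-with-backoff behavior this PR adds, written as a standalone client-side wrapper assuming botocore-style errors; the function name, delay constants, and jitter below are illustrative assumptions, not PynamoDB's actual internals:

```python
import random
import time

import botocore.exceptions

def call_with_backoff(operation, max_attempts=5, base_delay=0.05):
    """Retry `operation` on throttling, sleeping roughly 2**attempt * base_delay."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except botocore.exceptions.ClientError as exc:
            code = exc.response.get('Error', {}).get('Code', '')
            # Only ProvisionedThroughputExceededException is retried here;
            # other 4xx errors are assumed to fail in perpetuity (see the diff below).
            if code != 'ProvisionedThroughputExceededException' or attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) * (1.0 + random.random()))
```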

@coveralls commented Jan 18, 2017

Coverage remained the same at 97.912% when pulling 47eb37f on JohnEmhoff:retry-throughput-errors into 24e14ca on jlafon:devel.

```diff
-elif response.status_code < 500:
+# We don't retry on a ConditionalCheckFailedException or other 4xx because we assume they will
+# fail in perpetuity. Retrying when there is already contention could cause other problems
+elif response.status_code < 500 and code != 'ProvisionedThroughputExceededException':
```
Review comment from a contributor:

let's make it configurable
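
For reference, retry tuning did eventually become configurable: current PynamoDB documents max_retry_attempts and base_backoff_ms as Model Meta settings. A sketch assuming those names (they postdate this PR, so treat them as assumptions here):

```python
from pynamodb.models import Model
from pynamodb.attributes import UnicodeAttribute

class Thread(Model):
    class Meta:
        table_name = 'Thread'
        max_retry_attempts = 4  # how many times to retry throttled requests
        base_backoff_ms = 25    # base delay for the exponential backoff

    forum_name = UnicodeAttribute(hash_key=True)
```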

@brandond commented Mar 1, 2017

@JohnEmhoff Are you still working on this?

@JohnEmhoff (Contributor, Author) commented Mar 1, 2017 via email

@bedge (Contributor) commented Mar 11, 2017

I"m doing this (some ugly variant anyway) in all my clients already, so definitely a +1 to merge.
I see no reason not to do this for all cases.

@lukedeo commented Mar 11, 2017

+1 here as well

@bedge (Contributor) commented Mar 15, 2017

Any updates on this? Merging this addresses a known issue, which as of right now is a PITA to deal with.
Is there really any downside to retrying?

@danielhochman (Contributor):
At Lyft we prefer backpressure to go to the client. I will merge since we actually don't use the retry within PynamoDB in most cases.

It actually seems like a bigger bug that batch_write clears its pending operations on error -- a retry won't guarantee success, particularly when experiencing ProvisionedThroughputExceededExceptions.

Can someone create an issue to track that in more detail?
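
A sketch of the client-side pattern such an issue would cover: DynamoDB's BatchWriteItem returns UnprocessedItems rather than failing outright, and callers are expected to re-drive those with backoff. This uses boto3 directly rather than PynamoDB's batch_write wrapper; the function name and delay constants are illustrative:

```python
import time

import boto3

def batch_write_with_redrive(table_name, put_items, max_attempts=5):
    """Re-submit UnprocessedItems from BatchWriteItem with exponential backoff.

    put_items are DynamoDB-typed attribute maps, e.g. {'id': {'S': 'abc'}}.
    Note: BatchWriteItem accepts at most 25 items per request; this sketch
    assumes len(put_items) <= 25.
    """
    client = boto3.client('dynamodb')
    pending = [{'PutRequest': {'Item': item}} for item in put_items]
    for attempt in range(max_attempts):
        response = client.batch_write_item(RequestItems={table_name: pending})
        pending = response.get('UnprocessedItems', {}).get(table_name, [])
        if not pending:
            return
        # Back off before re-driving the leftovers; delay doubles each attempt.
        time.sleep(0.05 * (2 ** attempt))
    raise RuntimeError('%d items still unprocessed after %d attempts'
                       % (len(pending), max_attempts))
```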

@danielhochman merged commit e94d496 into pynamodb:devel on Mar 15, 2017
@bedge (Contributor) commented Mar 16, 2017

Agree that the batch_write clear-on-error is a more significant issue. However, this addresses a specific case we see regularly: capacity provisioning lags the DynamoDB I/O load, and retries early in the capacity scale-up phase eliminate a bunch of client-side retries.

thanks!

@danielhochman (Contributor):
@JohnEmhoff @bedge @lukedeo @brandond released as part of 2.1.5
