Service stops when an exception occurs in table.py:ensure_provisioning #134
I noticed this exception in my log. It looks like my reads of CloudWatch metrics were throttled, but I'm not sure why; I'm not aware of a read limit on CloudWatch metrics. The service is checking every 5 minutes, and I don't have any other code that's reading a bunch of CloudWatch metrics either.

I think the correct behavior would be to catch all exceptions, send the exception to the SNS topic, and then restart the service.

Comments
Thanks for the report. I have looked into it, and the exception should now be caught; the daemon shouldn't die, but rather retry 3 times before failing. I have released a test version with this fix.
This has now been released in version 1.10.2. Thanks for the bug report!
I installed 1.10.2b3, and overnight I still ran into the same issue.
Thanks for getting back on this. I think there are two problems here: firstly, that the service stopped when this error occurred, and secondly, that we seem to run into some limit on the number of requests (?) to CloudWatch. I'll contact AWS to get more information and a suggested approach to manage the exceeded rate limit. Currently, Dynamic DynamoDB will contact CloudWatch 4 times per table or GSI that we are looking at. That shouldn't be too much, in my opinion, but I'll see how we can handle it. Has the first problem been fixed (it will catch the error three times and then crash if the problem persists)?
Could you give me a hint about how many tables + GSIs you have Dynamic DynamoDB configured to monitor?
Yeah, it doesn't seem like it should be anywhere near a CloudWatch rate limit. It also only happens about once a day, and I don't see anything in the log file that indicates it was attempted more than once. Instead of crashing, could a message be sent to the SNS topic? Otherwise I have to keep checking whether it's still running.
Currently, SNS topics are configured on a per-table/GSI basis. I had planned, and have now opened an issue (#136), to implement SNS topic support on a global level. That would make it possible to get a notification when an error occurs. I will not be able to implement that right now, but maybe during next week.
I'm currently monitoring 3 tables and 1 GSI, but once I get all my variables worked out, I intend to monitor quite a bit more. Also, I've got a couple of quick questions for you.
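For a rough sense of scale, here is an editorial back-of-the-envelope estimate of the CloudWatch request volume this setup implies, assuming the 4-calls-per-table/GSI figure and 5-minute check interval mentioned above:

```python
# Back-of-the-envelope estimate (assumed figures from the thread, not measured):
# 3 tables + 1 GSI, 4 CloudWatch calls per entity, one check every 5 minutes.
entities = 3 + 1
calls_per_entity = 4
check_interval_minutes = 5.0

calls_per_check = entities * calls_per_entity                # 16 calls per run
calls_per_minute = calls_per_check / check_interval_minutes  # 3.2 calls/min

print("calls per check: %d" % calls_per_check)
print("calls per minute: %.1f" % calls_per_minute)
```

At roughly three calls a minute, this supports the observation above that the setup should be nowhere near a rate limit.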
Thanks, I'll contact AWS to get more info about the CloudWatch issue.
Please file issues for the separate features if you'd like them to get on the roadmap; I think they are interesting ideas.
1a) I think this is the way I'd most want to use increasing tables: just increase when throttling occurs. Another option I was thinking about: it would be nice to decrease to a percentage of consumed capacity. For example, if my current provisioned throughput is 200 and the last CloudWatch reading was 100, it would be nice to set it to 125% of consumed, so in this case set provisioning to 125. I think this would maximize the effectiveness of decreases.
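To make that suggestion concrete, here is a minimal sketch of the proposed calculation; the function name and parameters are hypothetical illustrations, not part of Dynamic DynamoDB's API:

```python
def target_provisioning(consumed, target_pct=125, floor=1):
    """Scale the last consumed-capacity reading by target_pct.

    Hypothetical helper illustrating the suggestion above; not part of
    Dynamic DynamoDB.
    """
    return max(floor, int(round(consumed * target_pct / 100.0)))

# Example from the comment: provisioned at 200, last CloudWatch reading
# shows 100 consumed -> decrease straight to 125 instead of a fixed step.
print(target_provisioning(100))  # 125
```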
I have opened #137 to address question 1.
The CloudWatch team will be looking into the reason behind the exceeded rate limit in your request above. To make Dynamic DynamoDB handle this as well as possible, I will implement a retry mechanism with an exponential backoff strategy.
I have now implemented an exponential backoff strategy. I cannot reproduce the error in my environment, so I would be happy if you could test this and see how it works for you. All the best!
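For readers following along, this is a minimal sketch of what an exponential backoff retry around a throttled CloudWatch call can look like; fetch_metrics and the retry parameters are hypothetical, not the actual Dynamic DynamoDB code:

```python
import random
import time

def with_backoff(fn, retries=5, base_delay=1.0, max_delay=60.0):
    """Call fn(), retrying with exponential backoff on failure."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:  # in practice, catch only the throttling error
            if attempt == retries - 1:
                raise  # out of attempts; surface the error to the caller
            # Sleep 1s, 2s, 4s, ... plus jitter, capped at max_delay.
            time.sleep(min(max_delay, base_delay * 2 ** attempt)
                       + random.uniform(0, 1))

# Usage (fetch_metrics is a stand-in for the real CloudWatch call):
# metrics = with_backoff(lambda: fetch_metrics(table_name))
```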
Did the config file format change? I'm getting this error when trying to start the service after installing 1.10.6a1:

2014-04-01T13:07:28.104109778: do_start:Starting dynamicdynamodb
Nah, this issue was smashed in a separate branch. Test
Ok, I got it running. I'll let you know if I hit the CloudWatch throttling; it usually happened within a day.
Ok, great, thanks! |
@doapp-jeremy did you see this exception in the latest version? Have a great weekend!
Nope, haven't seen it. Thanks! |
I have now released 1.10.7, containing the fix for this issue. Thanks for the report and testing. |