
Conversation

allmightyspiff
Member

Allows wait_for_ready to gracefully handle exceptions and raises the delay after each exception. Raised the default delay from 1s to 10s.

def wait_for_ready(self, instance_id, limit, delay=10, pending=False):
When an exception is encountered, delay will be multiplied by 2.
Will still return False if a ready state was not encountered before limit is reached.
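
The description above maps to a loop of roughly this shape. This is only a minimal sketch of the described behavior, with check_ready standing in for the real status query; it is not the actual SoftLayer.managers.vs implementation:

import logging
import time

LOGGER = logging.getLogger(__name__)

def wait_for_ready_sketch(check_ready, limit, delay=10):
    """Illustrative sketch only: poll check_ready() until it returns True,
    doubling the delay whenever an exception is caught, and give up once
    the total wait time exceeds limit seconds."""
    until = time.time() + limit
    while time.time() < until:
        try:
            if check_ready():
                return True
        except Exception as ex:
            LOGGER.info("Exception: %s", ex)
            delay = delay * 2  # back off after each exception
        LOGGER.info("Auto retry in %s seconds", delay)
        time.sleep(delay)
    return False  # ready state never reached before limit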

@coveralls

Coverage Status

Coverage increased (+0.01%) to 85.159% when pulling a58c26b on allmightyspiff:odyhunter-delay_exponential into fc0a44a on softlayer:master.

@allmightyspiff
Member Author

allmightyspiff commented Aug 31, 2017

@gltdeveloper / @odyhunter

  • In my implementation I've gone with simply increasing the delay by a factor of 2. I'm not really convinced we need random staggering here, or a higher exponential factor than 2.

  • I've removed the attempts variable in favor of just increasing the delay per error. Since wait_for_ready already has a time limit I think this is a bit nicer.

  • I've changed the default delay from 1s to 10s. This is still configurable of course.

Comments are of course welcome.

@allmightyspiff allmightyspiff self-assigned this Aug 31, 2017
@coveralls

Coverage Status

Coverage increased (+0.01%) to 85.159% when pulling 1b70b2b on allmightyspiff:odyhunter-delay_exponential into fc0a44a on softlayer:master.

@odyhunter

@allmightyspiff This solution will solve our problem, but I think we still need to add attempts :)

Consider this case: if a customer uses a 15 sec delay and a 3600 sec total wait time, then on exceptions the retry will be triggered 8 times, at 15, 30, 60, 120, 240, 480, 960... seconds.

If this is a man-made error, e.g. an incorrect VSI ID was given, or even a network outage on the customer's side, this code will still be running...
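
Working that example through (15 sec initial delay, 3600 sec limit, delay doubled on every exception) with a quick throwaway calculation:

delay, waited, retries = 15, 0, 0
while waited < 3600:
    retries += 1
    waited += delay
    print("retry %d: sleep %d sec (total %d sec)" % (retries, delay, waited))
    delay *= 2
# Prints 8 retries, with sleeps of 15, 30, 60, 120, 240, 480, 960 and 1920
# seconds, at which point the running total passes the 3600 sec limit and
# the loop stops.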

@gltdeveloper

  • On the removal of the randomization: as a datapoint, the customer that drove this request is driving 100 concurrent provisions (all individually), all at the same time. If they all fall into the same wait_for_ready cadence of 15 seconds each, and there is an API limit on things, it's possible they stay on that same cadence. That's what drove some aspects of the randomization.
  • Note, other cloud providers use this algorithm for exponential backoff, and there's no reason to be that different IMHO for the sake of thinking we are doing it better. To some extent, people using both clouds would desire similar behavior.
  • Changing the default delay is fine (and I like that it's configurable).
  • You could also cap the retry delay at the max retry wait time (to avoid going over what the caller intended).
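
For reference, the jittered, capped backoff being described usually looks something like the "full jitter" style used by other SDKs. This is a generic sketch, not SoftLayer code, and base and cap are illustrative values:

import random

def jittered_delay(attempt, base=10, cap=300):
    # Exponential growth, capped, with a random component so that many
    # concurrent clients do not fall into the same cadence.
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Example: delays for the first five failed attempts
for attempt in range(5):
    print(round(jittered_delay(attempt), 1))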

@allmightyspiff
Member Author

Thanks for the input.

specifying attempts

For both of those situations I would expect the user to want to ctrl-c out of the script before it hits the attempts limit. Since the exception catcher logs exceptions, I don't think it's too much to ask that users monitor the logs to see if their script is working correctly.
I think I will add a log entry for each successful loop as well, though.

randomization

This is a good point. I think I'll add a random 0-9s delay on each exception in addition to multiplying by 2.

exponential backoff

delay = delay * 2
represents f(x) = 2^x, so we should be good there. With the random number added, the function will be
delay = (delay * 2) + random.randint(0, 9)

capped retry

wait_for_ready will only run at most the time specified by limit, no matter how many exceptions are thrown.
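
As a quick sanity check of that formula, simulating a few consecutive exceptions (illustrative only, not the library code):

import random

delay = 10
for failure in range(4):
    delay = (delay * 2) + random.randint(0, 9)
    print("Auto retry in %s seconds" % delay)
# One possible run: 24, 50, 103, 211 -- growth is roughly 2^x plus a small
# stagger, while the loop as a whole remains bounded by the limit argument.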

@gltdeveloper

I agree with all the comments. On the random side, I keep thinking maybe you could tie it to the current delay value, but what you have is simple, and I do like the K.I.S.S. principle. This is a good place to start.

@coveralls

Coverage Status

Coverage increased (+0.02%) to 85.163% when pulling 0bac762 on allmightyspiff:odyhunter-delay_exponential into fc0a44a on softlayer:master.

@allmightyspiff
Member Author

Real testing

import SoftLayer
import logging

# Standard API client; credentials come from the environment or config file
client = SoftLayer.Client()
vs_manager = SoftLayer.VSManager(client)

# Send INFO-level (20) log output to stderr so the retry messages are visible
LOGGER = logging.getLogger()
LOGGER.addHandler(logging.StreamHandler())
LOGGER.setLevel(20)

# Wait up to 5000 seconds for guest 38614529 to be ready, starting with a 10s delay
ready = vs_manager.wait_for_ready(38614529, 5000, 10)
print(ready)

Testing exception

$ python wait_for_ready_test.py
POST https://api.softlayer.com/xmlrpc/v3.1/SoftLayer_Virtual_Guest
Exception: SoftLayerAPIError(SoftLayer_Exception_ObjectNotFound): Unable to find object with id of '309835167'.
Auto retry in 4 seconds
POST https://api.softlayer.com/xmlrpc/v3.1/SoftLayer_Virtual_Guest
Exception: SoftLayerAPIError(SoftLayer_Exception_ObjectNotFound): Unable to find object with id of '309835167'.
Auto retry in 17 seconds
POST https://api.softlayer.com/xmlrpc/v3.1/SoftLayer_Virtual_Guest
Exception: SoftLayerAPIError(SoftLayer_Exception_ObjectNotFound): Unable to find object with id of '309835167'.
Auto retry in 41 seconds
POST https://api.softlayer.com/xmlrpc/v3.1/SoftLayer_Virtual_Guest
Exception: SoftLayerAPIError(SoftLayer_Exception_ObjectNotFound): Unable to find object with id of '309835167'.
Auto retry in 36.471503496170044 seconds
POST https://api.softlayer.com/xmlrpc/v3.1/SoftLayer_Virtual_Guest
Exception: SoftLayerAPIError(SoftLayer_Exception_ObjectNotFound): Unable to find object with id of '309835167'.
False

Real VM being provisioned

$ python wait_for_ready_test.py
POST https://api.softlayer.com/xmlrpc/v3.1/SoftLayer_Virtual_Guest
38614529 not ready.
Auto retry in 10 seconds
POST https://api.softlayer.com/xmlrpc/v3.1/SoftLayer_Virtual_Guest
38614529 not ready.
Auto retry in 10 seconds
POST https://api.softlayer.com/xmlrpc/v3.1/SoftLayer_Virtual_Guest
38614529 not ready.
Auto retry in 10 seconds
POST https://api.softlayer.com/xmlrpc/v3.1/SoftLayer_Virtual_Guest
True

@odyhunter

Looking good :)

@allmightyspiff allmightyspiff merged commit e249c70 into softlayer:master Sep 5, 2017
@allmightyspiff allmightyspiff deleted the odyhunter-delay_exponential branch August 31, 2020 22:26