Fix wait() logic wrt timeout #67
mbruzek
reviewed
Apr 10, 2015
| ready = False | ||
| try: | ||
| + now = time.time() | ||
| with helpers.timeout(timeout): | ||
| # Make sure we're in a 'started' state across the board |
mbruzek
Apr 10, 2015
Contributor
The previous code does not actually check for "started" here. I would recommend removing this comment or changing it to reflect what the code is actually waiting for in status.
|
@AdamIsrael I am not a reviewer for amulet code base, just curious really. I looked this over and it appears the side effect is the code will check status more frequently, but does seem to be better waiting logic. |
|
Thanks for the feedback Matt. I'm no longer convinced that this new logic fixes the underlying issue. I'll update the pull request after I've done further testing. |
|
Hey @mbruzek, thanks for the feedback! I've dug into the issue deeper and updated this pull request. There were two issues: wait_for_status only checked to see if a unit had a public-address before considering it to be ready, but at least one provider (amazon) returns the public-address while the unit is still being allocated. This led to a condition where calling add_unit() would return before the unit was available, and subsequent calls against the unit could fail. |
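The readiness problem described above can be sketched as follows. This is a minimal illustration, not the amulet implementation: the function name `units_ready` is hypothetical, and the status layout (a `units` mapping keyed by unit name, each with `agent-state` and `public-address`) is an assumption based on the keys visible in the diff fragments of this pull request.

```python
def units_ready(status, services):
    """Return True only when every unit of every service reports 'started'.

    Hypothetical helper. Assumes `status` looks like:
    {'services': {name: {'units': {unit_name: {'agent-state': ...}}}}}
    """
    for service in services:
        units = status.get('services', {}).get(service, {}).get('units')
        if not units:
            # No units reported yet, so the service cannot be ready.
            return False
        for unit in units.values():
            # A public-address alone is not enough: some providers report
            # it while the unit is still being allocated.
            if unit.get('agent-state') != 'started':
                return False
    return True
```

With this check, a unit that already has a `public-address` but whose `agent-state` is still `pending` is treated as not ready.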
chuckbutler
reviewed
Apr 20, 2015
| @@ -235,7 +240,7 @@ def wait_for_status(self, juju_env, services, timeout=300): | ||
| ready = True | ||
| status = waiter.status(juju_env) | ||
| for service in services: | ||
| - if not 'units' in status['services'][service]: | ||
| + if 'units' not in status['services'][service]: |
chuckbutler
reviewed
Apr 20, 2015
| - while not ready: | ||
| + now = time.time() | ||
| + while not ready and now + timeout < time.time(): | ||
| + waiter.wait(timeout=15) |
AdamIsrael
Apr 20, 2015
Contributor
Totally arbitrary. I thought about calculating a sane value based on timeout, but 15 seemed sane enough.
chuckbutler
reviewed
Apr 20, 2015
| - break | ||
| - else: | ||
| - ready = True | ||
| + if unit['agent-state'] == 'started': |
chuckbutler
reviewed
Apr 20, 2015
| + if status is None: | ||
| + ready = False | ||
| + break | ||
| + if 'hook' in status and status['hook']: |
chuckbutler
Apr 20, 2015
Contributor
This is wrt the new extended status support that's coming, right? +1 again on this. Very nice.
AdamIsrael
Apr 20, 2015
Contributor
If you mean the 'hook' in status, that was existing. I'm not exactly sure what the use case for it is, so I left it as is.
|
I have a few comments for clarity, but overall this LGTM |
tvansteenburgh
reviewed
Apr 20, 2015
| - waiter.wait(timeout=timeout) | ||
| - while not ready: | ||
| + now = time.time() | ||
| + while not ready and now + timeout < time.time(): |
tvansteenburgh
Apr 20, 2015
Member
I'm not sure you want the extra time checking here. Clients are expecting this method (wait()) to raise if units aren't ready within the timeout. I think just while not ready is better here.
AdamIsrael
added some commits
Apr 23, 2015
|
While reviewing @tvansteenburgh's comments, I discovered that I wasn't solving the underlying problem.
PR #68 adds a test for calling |
AdamIsrael commented Apr 9, 2015
This PR addresses lp:1430488.
The waiter.wait call blocks for the full duration of the timeout, and helpers.timeout creates a signal-based timeout that's triggered when the timeout expires. This led to juju_agent() being run against a unit still installing, which would throw an exception.
I made waiter.wait block for a smaller, arbitrary value of time (15 seconds), check the unit status, and repeat until it's ready or the timeout has expired.
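The poll-and-repeat approach described here can be sketched as below. This is an illustration under stated assumptions, not the code from this pull request: `wait_until_ready`, `WaitTimeout`, and the injectable `sleep` and `clock` parameters are hypothetical names. Note that the loop condition shown in the diff, `now + timeout < time.time()`, appears to be false on the first check for any positive timeout, so the sketch compares against a computed deadline instead. It also raises on expiry, since the review notes that clients expect wait() to raise if units aren't ready within the timeout.

```python
import time


class WaitTimeout(Exception):
    """Raised when readiness is not reached in time (hypothetical name)."""


def wait_until_ready(check_ready, timeout=300, poll_interval=15,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll check_ready() in short intervals until it returns True.

    Raises WaitTimeout once `timeout` seconds have elapsed without success.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if check_ready():
            return
        remaining = deadline - clock()
        if remaining <= 0:
            raise WaitTimeout('not ready after %s seconds' % timeout)
        # Never sleep past the overall deadline.
        sleep(min(poll_interval, remaining))
```

A caller might pass a closure that fetches the current status and checks each unit, for example `wait_until_ready(lambda: my_status_check(), timeout=300)`, where `my_status_check` is a placeholder for whatever readiness test applies.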