Remove the broken status history squashing. #8144

Merged
merged 1 commit into from Nov 28, 2017

Conversation

Projects
None yet
3 participants
Owner

howbazaar commented Nov 28, 2017

The squashing of matching status history values is broken and is removing some values that it shouldn't be. The whole point of squashing the status history values was to remove the repeated update-status calls, and the idle following them. Given that we are no longer recording the update-status calls, the squashing should be unnecessary.

QA steps

juju deploy ./acceptancetests/repository/charms/update-status-tester tester
# wait 15 minutes or so so multiple update-status calls are made
juju show-status-history tester/0

Bug reference

https://bugs.launchpad.net/juju/+bug/1642031

Owner

jameinel commented Nov 28, 2017

So this has a small issue if you try to trigger the changes via 'juju run' instead of via 'update-status' hooks.

To start with, it is very nice to use the right log formatter, because now the status states are also colored (maintenance is yellow, executing/idle are green). Oddly "waiting" is not colored, and stands out.

However, if you use juju-run, this is what I see:

$ juju show-status-log t/0
Time                        Type       Status       Message
28 Nov 2017 13:36:44+04:00  workload   waiting      installing agent
28 Nov 2017 13:36:46+04:00  workload   waiting      agent initializing
28 Nov 2017 13:36:47+04:00  workload   maintenance  installing charm software
28 Nov 2017 13:36:47+04:00  juju-unit  executing    running install hook
28 Nov 2017 13:36:47+04:00  workload   maintenance  installing
28 Nov 2017 13:36:47+04:00  juju-unit  executing    running leader-elected hook
28 Nov 2017 13:36:47+04:00  juju-unit  executing    running config-changed hook
28 Nov 2017 13:36:47+04:00  juju-unit  executing    running start hook
28 Nov 2017 13:36:47+04:00  juju-unit  idle
28 Nov 2017 13:40:33+04:00  juju-unit  executing    running action juju-run
28 Nov 2017 13:40:33+04:00  juju-unit  idle
28 Nov 2017 13:44:18+04:00  juju-unit  executing    running action juju-run
28 Nov 2017 13:44:18+04:00  workload   maintenance  working
28 Nov 2017 13:44:18+04:00  juju-unit  idle
28 Nov 2017 13:44:43+04:00  juju-unit  executing    running action juju-run
28 Nov 2017 13:44:43+04:00  juju-unit  idle
28 Nov 2017 13:44:48+04:00  juju-unit  executing    running action juju-run
28 Nov 2017 13:44:48+04:00  juju-unit  idle
28 Nov 2017 13:44:53+04:00  juju-unit  executing    running action juju-run
28 Nov 2017 13:44:53+04:00  juju-unit  idle

vs with juju-2.3-beta3

TIME                            TYPE            STATUS          MESSAGE
28 Nov 2017 13:36:44+04:00      workload        waiting         installing agent
28 Nov 2017 13:36:46+04:00      workload        waiting         agent initializing
28 Nov 2017 13:36:47+04:00      workload        maintenance     installing charm software
28 Nov 2017 13:36:47+04:00      juju-unit       executing       running install hook
28 Nov 2017 13:36:47+04:00      workload        maintenance     installing
28 Nov 2017 13:36:47+04:00      juju-unit       executing       running leader-elected hook
28 Nov 2017 13:36:47+04:00      juju-unit       executing       running config-changed hook
28 Nov 2017 13:36:47+04:00      juju-unit       executing       running start hook
28 Nov 2017 13:36:47+04:00      juju-unit       idle
28 Nov 2017 13:40:33+04:00      juju-unit       executing       running action juju-run
28 Nov 2017 13:45:04+04:00                      idle            last 2 statuses repeated 1 times
28 Nov 2017 13:44:18+04:00      workload        maintenance     working
28 Nov 2017 13:44:18+04:00      juju-unit       idle
28 Nov 2017 13:45:04+04:00                      idle            last 2 statuses repeated 2 times

Now, it probably isn't 'standard' to use juju-run, but it is something that ends up collapsing a little bit better.

Now, if I just show the workload status:
$ juju show-status-log t/0 --type=workload
Time Type Status Message
28 Nov 2017 13:35:48+04:00 workload waiting waiting for machine
28 Nov 2017 13:36:44+04:00 workload waiting installing agent
28 Nov 2017 13:36:46+04:00 workload waiting agent initializing
28 Nov 2017 13:36:47+04:00 workload maintenance installing charm software
28 Nov 2017 13:36:47+04:00 workload maintenance installing
28 Nov 2017 13:44:18+04:00 workload maintenance working

The new code does what you expect, in that it doesn't trigger anything for repeated 'status-set' calls, and I do see the timestamp on "juju status --format=yaml" shows the latest status while "show-status-log" shows the first status.

$ juju show-status-log t/0 --type=workload
Time                        Type      Status       Message
...
28 Nov 2017 13:44:18+04:00  workload  maintenance  working

vs

$ juju status --format=yaml t/0
...
units:
t/0:
workload-status:
current: maintenance
message: working
since: 28 Nov 2017 13:53:43+04:00
juju-status:
current: idle
since: 28 Nov 2017 13:53:43+04:00
...

However, even after waiting for nearly 20 minutes, and after doing `juju model-config update-status-hook-interval=1m` I still had not seen *any* update-status hooks actually being called.

I did eventually see it trigger, but it was 13:57 after deploying at 13:35. So it took roughly 20 min from the time we finished `installing` before we first called `update-status`.

$ juju show-status-log t/0 --type=workload
Time Type Status Message
28 Nov 2017 13:35:48+04:00 workload waiting waiting for machine
28 Nov 2017 13:36:44+04:00 workload waiting installing agent
28 Nov 2017 13:36:46+04:00 workload waiting agent initializing
28 Nov 2017 13:36:47+04:00 workload maintenance installing charm software
28 Nov 2017 13:36:47+04:00 workload maintenance installing
28 Nov 2017 13:44:18+04:00 workload maintenance working
28 Nov 2017 13:57:44+04:00 workload active the message

(I filed https://bugs.launchpad.net/juju/+bug/1734863 because it appears update-status-hook-interval doesn't actually do anything.)

After letting it run for *another* 30 min, I ended up with:

$ juju show-status-log t/0
Time Type Status Message
...
28 Nov 2017 13:44:18+04:00 workload maintenance working
28 Nov 2017 13:44:18+04:00 juju-unit idle
28 Nov 2017 13:44:43+04:00 juju-unit executing running action juju-run
28 Nov 2017 13:44:43+04:00 juju-unit idle
28 Nov 2017 13:44:48+04:00 juju-unit executing running action juju-run
28 Nov 2017 13:44:48+04:00 juju-unit idle
28 Nov 2017 13:44:53+04:00 juju-unit executing running action juju-run
28 Nov 2017 13:44:53+04:00 juju-unit idle
28 Nov 2017 13:50:13+04:00 juju-unit executing running config-changed hook
28 Nov 2017 13:50:13+04:00 juju-unit idle
28 Nov 2017 13:53:43+04:00 juju-unit executing running action juju-run
28 Nov 2017 13:53:43+04:00 juju-unit idle
28 Nov 2017 13:57:44+04:00 workload active the message

vs

$ /usr/bin/juju show-status-log t/0
TIME TYPE STATUS MESSAGE
...
28 Nov 2017 13:40:33+04:00 juju-unit executing running action juju-run
28 Nov 2017 14:22:31+04:00 idle last 2 statuses repeated 1 times
28 Nov 2017 13:44:18+04:00 workload maintenance working
28 Nov 2017 13:44:18+04:00 juju-unit idle
28 Nov 2017 13:44:43+04:00 juju-unit executing running action juju-run
28 Nov 2017 14:22:31+04:00 idle last 2 statuses repeated 2 times
28 Nov 2017 13:44:53+04:00 juju-unit idle
28 Nov 2017 13:50:13+04:00 juju-unit executing running config-changed hook
28 Nov 2017 13:50:13+04:00 juju-unit idle
28 Nov 2017 13:53:43+04:00 juju-unit idle

Note that the new one actually shows "workload active the message" while juju 2.3-beta3 doesn't even show that value.
I can also confirm that update-status hook *was* run multiple times during this time period

$ juju debug-log --include t/0 --replay --no-tail | grep "executing.*run update-status"
298:unit-t-0: 13:57:44 DEBUG juju.worker.uniter.operation executing operation "run update-status hook"
319:unit-t-0: 14:01:56 DEBUG juju.worker.uniter.operation executing operation "run update-status hook"
340:unit-t-0: 14:07:04 DEBUG juju.worker.uniter.operation executing operation "run update-status hook"
361:unit-t-0: 14:11:39 DEBUG juju.worker.uniter.operation executing operation "run update-status hook"
382:unit-t-0: 14:16:55 DEBUG juju.worker.uniter.operation executing operation "run update-status hook"
403:unit-t-0: 14:21:15 DEBUG juju.worker.uniter.operation executing operation "run update-status hook"



I'm pretty convinced that this is better than what we have, so I'll merge.
I do think we have some bugs remaining around 'update-status' in that the first hook didn't even fire for 20 minutes, and then setting 'update-status-hook-interval' to 1min had no effect.

@@ -30,8 +32,15 @@ func NewStatusHistoryCommand() cmd.Command {
return modelcmd.Wrap(&statusHistoryCommand{})
}
+// HistoryAPI is the API surface for the show-status-log command.
+type HistoryAPI interface {
@jameinel

jameinel Nov 28, 2017

Owner

Given the commands are now "show-status-log" rather than "status-history" likely we should be updating naming to match. I certainly mistype the command fairly often because I'm thinking the wrong thing.
Anyway, not something that needs to be fixed here.

Owner

jameinel commented Nov 28, 2017

$$merge$$

Contributor

jujubot commented Nov 28, 2017

Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju

@jujubot jujubot merged commit c4587a9 into juju:develop Nov 28, 2017

1 check passed

continuous-integration/jenkins/pr-merge This commit looks good
Details

@howbazaar howbazaar deleted the howbazaar:fix-status-history branch Nov 28, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment