fix #5739 refactor(nimbus): remove pause timeout special case #5838

lmorchard · 2021-06-24T00:40:32Z

Because:

we want to make pausing enrollment a manual task soon

This commit:

makes pausing enrollment in a live experiment subject to timeout in
remote settings just like other updates

Because: * we want to make pausing enrollment a manual task soon This commit: * makes pausing enrollment in a live experiment subject to timeout in remote settings just like other updates

lmorchard · 2021-06-24T00:48:35Z

Just gonna shotgun some review requests out there. I'll be out for 2 weeks and this blocks the end enrollment epic, so we might want to get it merged soon if it doesn't look disastrous. Alternatively, someone can take the baton and massage this to merge if needed

lmorchard · 2021-06-24T01:02:50Z

Oh yeah, and some notes on the behavior here and triggering it in local dev:

Push an experiment live (locally)
Go into /admin/ and edit that experiment: set the duration to something very long, then trigger enrollment pause by setting all the changelog dates back by a month or so
Soon, that experiment should get put into a LIVE/WAITING/LIVE state
If you wait for it to timeout (which it will), it will timeout but then immediately re-enter LIVE/WAITING/LIVE automatically
Now, you approve a different experiment for launch
Wait for the pausing experiment to timeout again - you should then see the new approved-for-launch experiment get moved into waiting status instead of seeing the pausing experiment automatically re-pushed

This should mean that, in advance of implementing the manual UI, any automatically pausing experiment that times out will keep getting retried unless some other experiment gets approved

LZoog · 2021-06-24T22:13:06Z

FWIW, the code changes LGTM.

Thank you for the detailed instructions 😀 Assuming we're going by status/publishStatus/statusNext, I saw the experiment go from LIVE/IDLE/null to LIVE/WAITING/LIVE. However, I seem to be stuck at:

If you wait for it to timeout (which it will), it will timeout but then immediately re-enter LIVE/WAITING/LIVE automatically

We're talking about the request to end the experiment, right? I seem to just get hung up on this screen, I refreshed periodically after a few minutes for more than 10 minutes, and nothing:

Feels like I'm not doing something right, I'll take another look tomorrow morning 🤔

lmorchard · 2021-06-24T23:23:03Z

We're talking about the request to end the experiment, right?

It says "end this experiment" but that's also another known bug in this epic: #5726 / https://mozilla-hub.atlassian.net/browse/EXP-1354

This will probably be fixed after #5742 / https://mozilla-hub.atlassian.net/browse/EXP-1349

I seem to just get hung up on this screen, I refreshed periodically after a few minutes for more than 10 minutes, and nothing:

I should have written that part better. TL;DR: Automated pause should just keep getting re-queued after timeout until it's approved in remote settings. But, it will step aside after timeout when some other experiment is reviewed for launch or ending.

It will time out, but it immediately re-enters LIVE/WAITING/LIVE automatically. So, the display will look hung, because the automated pause put it right back into the queue without giving a chance for the timeout to be displayed. (The timeout happens here, but then the re-queue happens later in the same task.)

But, if you approve a different experiment for launch, that takes priority over pausing. (The Celery tasks checks queues for launch, end, and pause - in that order)

So, once the request to pause times out, the new request to launch goes ahead. And the automated pause will wait. Only then will the pausing experiment show a visible timeout message

jodyheavener · 2021-06-25T20:04:30Z

@lmorchard just wanted to give a quick update since i'm heading out shortly - I tried running through your steps listed but couldn't get past step 3 where, after setting back the changelog dates, it should go into LIVE/WAITING/LIVE. It remained unchanged. But I might be doing something wrong here. I'll coordinate with Lauren on it!

LZoog · 2021-06-25T22:06:10Z

Following up here too before the break, I tried this out again this morning and for some reason got hung up when trying to push a different experiment live (step 5). I was able to see the re-requesting (step 4, your clarifying comment helped so much), but every time I tried to request a different experiment for review, I didn't see it come up in RS. 🤨 Probably a me/local problem, but I got tied up in other things today and haven't had a chance to run through it all again. Since we probably wanted @jaredlockhart's eyes anyway, I think it's fine for this to hang out until then.

jaredlockhart · 2021-07-06T16:25:53Z

Oh damn @lmorchard this wasn't exactly what I meant, I should've put clearer instructions in the ticket. This ticket should remove the automatically triggered pausing altogether so they're only manually triggered the same way launch/end are, the way it's laid out in the new kinto diagrams. After we do this, the only way to pause an experiment will be through the admin until we do the next ticket that adds the buttons to the UI, but we only get a handful of pauses in a week so that should be fine until we add the UI.

jaredlockhart · 2021-07-06T19:58:19Z

Oh @lmorchard is out this week, I'll fill in. Closing this, will refile.

lmorchard · 2021-07-12T16:56:15Z

Oh damn @lmorchard this wasn't exactly what I meant, I should've put clearer instructions in the ticket. This ticket should remove the automatically triggered pausing altogether so they're only manually triggered the same way launch/end are, the way it's laid out in the new kinto diagrams. After we do this, the only way to pause an experiment will be through the admin until we do the next ticket that adds the buttons to the UI, but we only get a handful of pauses in a week so that should be fine until we add the UI.

Didn't do that in this PR in order to keep status-quo functionality and just get rid of the timeout exception per the issue. Was intending to remove the automated pause along with adding the manual UI (#5742).

fix #5739 refactor(nimbus): remove pause timeout special case

30118d7

Because: * we want to make pausing enrollment a manual task soon This commit: * makes pausing enrollment in a live experiment subject to timeout in remote settings just like other updates

data-sync-user mentioned this pull request Jun 24, 2021

Add end enrollment UI flow #5742

Closed

lmorchard requested review from LZoog, jaredlockhart and jodyheavener June 24, 2021 00:47

jaredlockhart closed this Jul 6, 2021

LZoog mentioned this pull request Jul 7, 2021

fix #5742 feat(nimbus): implement UI for manual enrollment end #5852

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix #5739 refactor(nimbus): remove pause timeout special case #5838

fix #5739 refactor(nimbus): remove pause timeout special case #5838

lmorchard commented Jun 24, 2021

lmorchard commented Jun 24, 2021

lmorchard commented Jun 24, 2021 •

edited

LZoog commented Jun 24, 2021

lmorchard commented Jun 24, 2021 •

edited

jodyheavener commented Jun 25, 2021

LZoog commented Jun 25, 2021

jaredlockhart commented Jul 6, 2021

jaredlockhart commented Jul 6, 2021

lmorchard commented Jul 12, 2021 •

edited

fix #5739 refactor(nimbus): remove pause timeout special case #5838

fix #5739 refactor(nimbus): remove pause timeout special case #5838

Conversation

lmorchard commented Jun 24, 2021

lmorchard commented Jun 24, 2021

lmorchard commented Jun 24, 2021 • edited

LZoog commented Jun 24, 2021

lmorchard commented Jun 24, 2021 • edited

jodyheavener commented Jun 25, 2021

LZoog commented Jun 25, 2021

jaredlockhart commented Jul 6, 2021

jaredlockhart commented Jul 6, 2021

lmorchard commented Jul 12, 2021 • edited

lmorchard commented Jun 24, 2021 •

edited

lmorchard commented Jun 24, 2021 •

edited

lmorchard commented Jul 12, 2021 •

edited