Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #5739 refactor(nimbus): remove pause timeout special case #5838

Closed
wants to merge 1 commit into from

Conversation

lmorchard
Copy link
Contributor

Because:

  • we want to make pausing enrollment a manual task soon

This commit:

  • makes pausing enrollment in a live experiment subject to timeout in
    remote settings just like other updates

Because:

* we want to make pausing enrollment a manual task soon

This commit:

* makes pausing enrollment in a live experiment subject to timeout in
  remote settings just like other updates
@lmorchard
Copy link
Contributor Author

Just gonna shotgun some review requests out there. I'll be out for 2 weeks and this blocks the end enrollment epic, so we might want to get it merged soon if it doesn't look disastrous. Alternatively, someone can take the baton and massage this to merge if needed

@lmorchard
Copy link
Contributor Author

lmorchard commented Jun 24, 2021

Oh yeah, and some notes on the behavior here and triggering it in local dev:

  1. Push an experiment live (locally)
  2. Go into /admin/ and edit that experiment: set the duration to something very long, then trigger enrollment pause by setting all the changelog dates back by a month or so
  3. Soon, that experiment should get put into a LIVE/WAITING/LIVE state
  4. If you wait for it to timeout (which it will), it will timeout but then immediately re-enter LIVE/WAITING/LIVE automatically
  5. Now, you approve a different experiment for launch
  6. Wait for the pausing experiment to timeout again - you should then see the new approved-for-launch experiment get moved into waiting status instead of seeing the pausing experiment automatically re-pushed

This should mean that, in advance of implementing the manual UI, any automatically pausing experiment that times out will keep getting retried unless some other experiment gets approved

@LZoog
Copy link
Contributor

LZoog commented Jun 24, 2021

FWIW, the code changes LGTM.

Thank you for the detailed instructions 😀 Assuming we're going by status/publishStatus/statusNext, I saw the experiment go from LIVE/IDLE/null to LIVE/WAITING/LIVE. However, I seem to be stuck at:

If you wait for it to timeout (which it will), it will timeout but then immediately re-enter LIVE/WAITING/LIVE automatically

We're talking about the request to end the experiment, right? I seem to just get hung up on this screen, I refreshed periodically after a few minutes for more than 10 minutes, and nothing:
image

Feels like I'm not doing something right, I'll take another look tomorrow morning 🤔

@lmorchard
Copy link
Contributor Author

lmorchard commented Jun 24, 2021

We're talking about the request to end the experiment, right?

It says "end this experiment" but that's also another known bug in this epic: #5726 / https://mozilla-hub.atlassian.net/browse/EXP-1354

This will probably be fixed after #5742 / https://mozilla-hub.atlassian.net/browse/EXP-1349

I seem to just get hung up on this screen, I refreshed periodically after a few minutes for more than 10 minutes, and nothing:

I should have written that part better. TL;DR: Automated pause should just keep getting re-queued after timeout until it's approved in remote settings. But, it will step aside after timeout when some other experiment is reviewed for launch or ending.

It will time out, but it immediately re-enters LIVE/WAITING/LIVE automatically. So, the display will look hung, because the automated pause put it right back into the queue without giving a chance for the timeout to be displayed. (The timeout happens here, but then the re-queue happens later in the same task.)

But, if you approve a different experiment for launch, that takes priority over pausing. (The Celery tasks checks queues for launch, end, and pause - in that order)

So, once the request to pause times out, the new request to launch goes ahead. And the automated pause will wait. Only then will the pausing experiment show a visible timeout message

@jodyheavener
Copy link
Contributor

@lmorchard just wanted to give a quick update since i'm heading out shortly - I tried running through your steps listed but couldn't get past step 3 where, after setting back the changelog dates, it should go into LIVE/WAITING/LIVE. It remained unchanged. But I might be doing something wrong here. I'll coordinate with Lauren on it!

@LZoog
Copy link
Contributor

LZoog commented Jun 25, 2021

Following up here too before the break, I tried this out again this morning and for some reason got hung up when trying to push a different experiment live (step 5). I was able to see the re-requesting (step 4, your clarifying comment helped so much), but every time I tried to request a different experiment for review, I didn't see it come up in RS. 🤨 Probably a me/local problem, but I got tied up in other things today and haven't had a chance to run through it all again. Since we probably wanted @jaredlockhart's eyes anyway, I think it's fine for this to hang out until then.

@jaredlockhart
Copy link
Collaborator

Oh damn @lmorchard this wasn't exactly what I meant, I should've put clearer instructions in the ticket. This ticket should remove the automatically triggered pausing altogether so they're only manually triggered the same way launch/end are, the way it's laid out in the new kinto diagrams. After we do this, the only way to pause an experiment will be through the admin until we do the next ticket that adds the buttons to the UI, but we only get a handful of pauses in a week so that should be fine until we add the UI.

@jaredlockhart
Copy link
Collaborator

Oh @lmorchard is out this week, I'll fill in. Closing this, will refile.

@lmorchard
Copy link
Contributor Author

lmorchard commented Jul 12, 2021

Oh damn @lmorchard this wasn't exactly what I meant, I should've put clearer instructions in the ticket. This ticket should remove the automatically triggered pausing altogether so they're only manually triggered the same way launch/end are, the way it's laid out in the new kinto diagrams. After we do this, the only way to pause an experiment will be through the admin until we do the next ticket that adds the buttons to the UI, but we only get a handful of pauses in a week so that should be fine until we add the UI.

Didn't do that in this PR in order to keep status-quo functionality and just get rid of the timeout exception per the issue. Was intending to remove the automated pause along with adding the manual UI (#5742).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants