New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix #5739 refactor(nimbus): remove pause timeout special case #5838
Conversation
Because: * we want to make pausing enrollment a manual task soon This commit: * makes pausing enrollment in a live experiment subject to timeout in remote settings just like other updates
Just gonna shotgun some review requests out there. I'll be out for 2 weeks and this blocks the end enrollment epic, so we might want to get it merged soon if it doesn't look disastrous. Alternatively, someone can take the baton and massage this to merge if needed |
Oh yeah, and some notes on the behavior here and triggering it in local dev:
This should mean that, in advance of implementing the manual UI, any automatically pausing experiment that times out will keep getting retried unless some other experiment gets approved |
It says "end this experiment" but that's also another known bug in this epic: #5726 / https://mozilla-hub.atlassian.net/browse/EXP-1354 This will probably be fixed after #5742 / https://mozilla-hub.atlassian.net/browse/EXP-1349
I should have written that part better. TL;DR: Automated pause should just keep getting re-queued after timeout until it's approved in remote settings. But, it will step aside after timeout when some other experiment is reviewed for launch or ending. It will time out, but it immediately re-enters LIVE/WAITING/LIVE automatically. So, the display will look hung, because the automated pause put it right back into the queue without giving a chance for the timeout to be displayed. (The timeout happens here, but then the re-queue happens later in the same task.) But, if you approve a different experiment for launch, that takes priority over pausing. (The Celery tasks checks queues for launch, end, and pause - in that order) So, once the request to pause times out, the new request to launch goes ahead. And the automated pause will wait. Only then will the pausing experiment show a visible timeout message |
@lmorchard just wanted to give a quick update since i'm heading out shortly - I tried running through your steps listed but couldn't get past step 3 where, after setting back the changelog dates, it should go into LIVE/WAITING/LIVE. It remained unchanged. But I might be doing something wrong here. I'll coordinate with Lauren on it! |
Following up here too before the break, I tried this out again this morning and for some reason got hung up when trying to push a different experiment live (step 5). I was able to see the re-requesting (step 4, your clarifying comment helped so much), but every time I tried to request a different experiment for review, I didn't see it come up in RS. 🤨 Probably a me/local problem, but I got tied up in other things today and haven't had a chance to run through it all again. Since we probably wanted @jaredlockhart's eyes anyway, I think it's fine for this to hang out until then. |
Oh damn @lmorchard this wasn't exactly what I meant, I should've put clearer instructions in the ticket. This ticket should remove the automatically triggered pausing altogether so they're only manually triggered the same way launch/end are, the way it's laid out in the new kinto diagrams. After we do this, the only way to pause an experiment will be through the admin until we do the next ticket that adds the buttons to the UI, but we only get a handful of pauses in a week so that should be fine until we add the UI. |
Oh @lmorchard is out this week, I'll fill in. Closing this, will refile. |
Didn't do that in this PR in order to keep status-quo functionality and just get rid of the timeout exception per the issue. Was intending to remove the automated pause along with adding the manual UI (#5742). |
Because:
This commit:
remote settings just like other updates