Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try run details are not being added to a bug #201

Closed
tomrittervg opened this issue Sep 30, 2021 · 3 comments
Closed

Try run details are not being added to a bug #201

tomrittervg opened this issue Sep 30, 2021 · 3 comments

Comments

@tomrittervg
Copy link
Collaborator

Observed in https://bugzilla.mozilla.org/show_bug.cgi?id=1733013

The problem sequence is:

  1. Library has new release A.
  2. We notice it, process it, file a bug, send in a try run
  3. We cron again, and maybe we need to retrigger jobs so we do
  4. Library has new release B
  5. We cron again, and see the new library release.
  6. We do not find an existing job for this new release.
  7. We process the new job

At this point whether or not we process the new release we will not make any follow up comment on the old release's bug. If we do process the new release we will close the old bug. If we don't, we leave it up but don't comment.

If we have never summarized the try run results, we should do so and add them to the bug - whether or not the bug is open or it is the newest release. Before summarizing them, we should retrigger them if needed. (Not retriggering them may make some sense but it's easier to not complicated the control flow and I think retriggering them also makes some sense.)

@tomrittervg tomrittervg added this to the v1 milestone Sep 30, 2021
@tomrittervg
Copy link
Collaborator Author

The whole clean_up_old_job stuff is already messy and this has the distinct possibility of making it messier. So we need to plan this patch out a bit.

We've got a few aspects of Updatebot this interacts with:

  • The clean_up_old_job logic inside of _process_new_job
  • The notion that we can only have one Updatebot job 'running' (i.e. not in the DONE state at any one time)
  • The notion of STATUS and OUTCOMES
  • Especially the special status of RELINQUISHED

Here are some assertions we've made that we're going to start violating. We'll need to change our code to accept and deal with these new violations:

  • Multiple Updatebot jobs may be 'running' (i.e. Not Done.)
    • However, only one of them (call if N) should be considered 'tip'. When we create a new job (call it M), we should close N's bug in favor of M's.
  • A job that has been relinquished may still need to be processed.
    • Relinquished can (and should) be thought of as "Do not modify this bug's Open/Close state". Not "Do nothing with this job".
  • A job can have the outcome of ABORTED
    • We will no longer abort jobs.

@tomrittervg
Copy link
Collaborator Author

There seems to be three aspects to this work.

  1. Change what we do w.r.t. prior jobs when we see a new revision to process.
  2. Change how we find existing jobs that we need to take action on.
  3. Decide what things we will do with an existing job depending on its state.

(2) seems easy. For each library that has Updatebot tasks, we can just ask the database for all the existing jobs for that library, and iterate through them, taking actions. (What actions to be decided by (3).) We should do this before we look for any new revisions, that way we don't have to exempt a job we just created.

(3) To start with, let's outline the actions we could take for a job:

  • we could close its bugzilla bug and mark it a duplicate of a newer bug.
  • we can the firefox versions affected by it and mark that on the bugzilla bug
  • if we're in a two-platform configuration, we can trigger the second set of platforms to run
  • we can retrigger failed tests
  • we can summarize the try run and post it as a comment
  • we can flag a patch for review
  • we can flag a bug for needinfo
  • we can assign a bug
  • we can abandon a phabricator revision
  • we can update its state in the database
  • we can do nothing to it

And let's define how we know a job's state:

  • the bug could be open or closed.
  • there could be a new library revision that has superseeded it (but we won't know that unless we check for a new revision before processing existing jobs...)
  • we could have marked its state in the database in some way

The problem we want to avoid is what Relinquished was designed to solve: we absolutely don't want Updatebot and Humans getting into a resolution-flipping loop where Updatebot closes something, a human re-opens it, and Updatebot sees it's open and tries to close it. I do not think we can reliably use Bugzilla status to determine this, so we need to have some indicator in the Updatebot database. For the sake of continuity, I'm going to refer to these jobs as Relinquished jobs for now...

The rules I am seeing emerge from this are:

a. If we have a new revision, and the prior job is not Relinquished and the job's bug is Open, we should abandon the patch, close the bug, supersede it by a new bug, and mark it as Relinquished in the database. (An abandoned patch can be commandeered and reclaimed, although this will require a second human to signoff on the patch.)
b. If a job's bug is closed, the only two things we should do are to summarize any existing try runs if we haven't already and abandon the revision (if the revision is still open). We should not trigger any new taskcluster jobs, update affected versions, mark it as a dupe, or place any flags. Summarizing the try runs ideally includes adding text saying that we would have done something had it not been closed (like retrigger failures.)
c. If a job's bug is open, we could mark FF versions as affected, trigger new taskcluster jobs, summarize the try run and post a comment, place flags (review, needinfo), and assign it.

Relinquished status only affects whether or not we will do the two-step process of closing a bug and abandoning a revision. It will not affect any other aspect of behavior. A job that has been relinquished can still do all the actions indicated in (b) or (c).

@tomrittervg
Copy link
Collaborator Author

I got most of this written, but then hit an unfortunate situation.

Job A runs, sends it to try.
A new job, Job B, runs before Job A has try results.
Job B supersedes Job A, closes its bug, abandons the phab revision, updates Job A's outcome to aborted and relinquishes Job A.
Job A now has a status of 'Pending' but an outcome of 'Aborted'.

We can't change its status - we need it to process the try run results later.

I think the solution to this is to remove the 'Aborted' outcome. It seems like it would be helpful to know that a job was actively superseded by another bug, but in practice I don't think we care about that outcome at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant