Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build/cmd/gopherbot: don't fight humans #21312

Open
josharian opened this Issue Aug 4, 2017 · 17 comments

Comments

Projects
None yet
8 participants
@josharian
Copy link
Contributor

josharian commented Aug 4, 2017

In #20892, I re-opened an issue, only to have @gopherbot re-close it. This is an issue to follow-up on what happened there. cc @andybons @kevinburke

@josharian josharian added this to the Unreleased milestone Aug 4, 2017

@gopherbot gopherbot added the Builders label Aug 4, 2017

@ALTree

This comment has been minimized.

Copy link
Member

ALTree commented Aug 4, 2017

In #21280 it tagged the issue as documentation and when I removed it (it wasn't clear at that point if it was a doc issue or a problem requiring code changes) it tagged again as doc. I thought it was supposed to not add a tag that was previously removed (and supposed to not close an issue that was just re-opened).

@josharian josharian removed the Builders label Aug 4, 2017

@gopherbot gopherbot added the Builders label Aug 4, 2017

@josharian

This comment has been minimized.

Copy link
Contributor Author

josharian commented Aug 4, 2017

Reproduced here--I just removed the Builders label on this issue and gopherbot put it right back.

@kevinburke

This comment has been minimized.

Copy link
Contributor

kevinburke commented Aug 4, 2017

(deleted)

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Aug 4, 2017

That's try to minimize memes and GIFs in comments on bugs. There's a slack channel for that.

@dmitshur

This comment has been minimized.

Copy link
Member

dmitshur commented Nov 2, 2017

In #21280 it tagged the issue as documentation and when I removed it (it wasn't clear at that point if it was a doc issue or a problem requiring code changes) it tagged again as doc.

This also happened in #22302:

image

When I removed it once more about 45 minutes later, the issue didn't happen again:

image

I thought it was supposed to not add a tag that was previously removed (and supposed to not close an issue that was just re-opened).

That's right. I've checked the code, and it certainly has the code to not fight with humans. For example, for the labelDocumentationIssues task, it's:

func (b *gopherbot) labelDocumentationIssues(ctx context.Context) error {
	return b.gorepo.ForeachIssue(func(gi *maintner.GitHubIssue) error {
		if gi.Closed || !isDocumentationTitle(gi.Title) || gi.HasLabel("Documentation") || gi.HasEvent("unlabeled") {
			return nil
		}
		printIssue("label-documentation", gi)
		if *dryRun {
			return nil
		}
		return b.addLabel(ctx, gi, "Documentation")
	})
}

The gi.HasEvent("unlabeled") part is what makes it not apply the documentation label again if a human has unlabeled it already.

It does work the second time; the bot seems to fight at most once per issue/task.

It feels like a timing/race condition bug (not a Go race condition, but a general one). It might happen that the labelDocumentationIssues task runs after the label was removed, just before it's aware of the "unlabeled" event has happened, so gi.HasEvent("unlabeled") returns false... but the issue doesn't already have "documentation" label, so it thinks it should be added.

I'm not completely sure how it's possible that the documentation label is removed yet gi.HasEvent("unlabeled") returns false; I would expect those 2 things to happen atomically.

Perhaps it's worth looking more into that direction; I just wanted to share what I know so far.

@dmitshur

This comment has been minimized.

Copy link
Member

dmitshur commented Nov 29, 2017

Looks like it happened to @griesemer in #22695 (comment):

image

@griesemer

This comment has been minimized.

Copy link
Contributor

griesemer commented Nov 29, 2017

@shurcooL Not sure. I think I fat-fingered this one.

@FiloSottile FiloSottile changed the title x/build: gopherbot too close-happy x/build/gopherbot: too close-happy Apr 12, 2018

@dmitshur

This comment has been minimized.

Copy link
Member

dmitshur commented Apr 15, 2018

Yet again in #24875:

image

@dmitshur dmitshur changed the title x/build/gopherbot: too close-happy x/build/cmd/gopherbot: too close-happy Apr 19, 2018

@dmitshur

This comment has been minimized.

Copy link
Member

dmitshur commented Apr 19, 2018

Yet another thing I noticed recently that I suspect is related.

In issue #24850, if you look at the issue events API, you can see gopherbot actually applied the "Proposal" and "Documentation" labels three times each. It wasn't fighting humans, but I suspect the underlying reason is similar: it might not receive the "label has already been applied" information quickly enough and ends up iterating over "this label should be applied on this issue because it's not there yet" code 3 times before finally seeing that the label has been applied.

It's not visible in GitHub's UI because it collapses nearby "apply label" events into one, but effectively what happened was this (in the span of 2 seconds):

image

I reported this to GitHub separately (i.e., it should probably be disallowed at API level to apply the same label more than once in a row). But gopherbot shouldn't be trying to do that either.

(One hacky idea for a fix is to add some delays between gopherbot taking actions, so it's slower (like a human) and has less chances of making repeat actions it shouldn't. But of course that's not as nice as a more proper that doesn't doesn't depend on timing.)

@mark-rushakoff

This comment has been minimized.

Copy link
Contributor

mark-rushakoff commented Aug 17, 2018

I filed #27058 but I'm closing it as a dupe of this. In the issue referenced there, I did a "gopherbot please remove label documentation" and then saw

gopherbot added Documentation and removed Documentation labels

(indicating that the label was removed and then immediately re-added) presumably because the issue title contained the word "documentation" at that time.

@josharian

This comment has been minimized.

Copy link
Contributor Author

josharian commented Aug 26, 2018

Another: #26623

@ALTree ALTree added the NeedsFix label Sep 22, 2018

@dmitshur dmitshur changed the title x/build/cmd/gopherbot: too close-happy x/build/cmd/gopherbot: don't fight humans Oct 15, 2018

@dmitshur

This comment has been minimized.

Copy link
Member

dmitshur commented Oct 15, 2018

Another: #28103. I was expecting it to happen, so I fired up a local copy of gopherbot running in dry-mode and watched it fight me (one time), as well as looked at gopherbot and maintnerd logs at the same time.

I have a good grasp of the underlying cause now. The problem is that maintner provides an inconsistent world-view during very short periods of time, and gopherbot takes (incorrect) action during those very short periods of time. Edit: Filed #28226 for this. /cc @bradfitz

It's as I guessed in #21312 (comment), except now I have data to back up that hypothesis. gopherbot runs in a loop, most of the time blocked in a corpus.Update call. As soon as that call returns after receiving new data, gopherbot runs all of its tasks.

Here's a log of gopherbot activity with annotations (the mutations that are received by the maintner client):

2018/10/15 14:46:12 Updating data from log *maintner.netMutSource ...
2018/10/15 14:46:46 Downloading 47 bytes of https://maintner.golang.org/logs/41 ...
2018/10/15 14:46:46 wrote /Users/dmitshur/Library/Caches/golang-maintner/0041.growing.mutlog
2018/10/15 14:46:46 Reloaded data from log *maintner.netMutSource.
2018/10/15 14:46:46 got corpus update after 34.209702362s
    mutation: github_issue:<owner:"golang" repo:"go" number:28103 updated:<seconds:1539629204 > remove_label:223401461 > 

              aka "the issue 28103 no longer has "Builders" label
              but the "a human has unlabeled this issue" mutation isn't here yet, so it applies the Builders label...
label-Builders [dry-run]
	https://golang.org/issue/28103  x/build/version: add $GOROOT/bin to $PATH when running go run/generate
2018/10/15 14:46:46 gopherbot ran in 8.874186ms
2018/10/15 14:46:46 Updating data from log *maintner.netMutSource ...
2018/10/15 14:46:49 Downloading 42 bytes of https://maintner.golang.org/logs/41 ...
2018/10/15 14:46:49 wrote /Users/dmitshur/Library/Caches/golang-maintner/0041.growing.mutlog
2018/10/15 14:46:49 Reloaded data from log *maintner.netMutSource.
2018/10/15 14:46:49 got corpus update after 2.476171757s
    mutation: github_issue:<owner:"golang" repo:"go" number:28103 comment_status:<server_date:<seconds:1539629209 > > > 

              some internal info about server time of latest comments
              ... still haven't received the "a human has unlabeled this issue" mutation yet, so...
label-Builders [dry-run]
	https://golang.org/issue/28103  x/build/version: add $GOROOT/bin to $PATH when running go run/generate
2018/10/15 14:46:49 gopherbot ran in 9.168746ms
2018/10/15 14:46:49 Updating data from log *maintner.netMutSource ...
2018/10/15 14:46:49 Downloading 130 bytes of https://maintner.golang.org/logs/41 ...
2018/10/15 14:46:49 wrote /Users/dmitshur/Library/Caches/golang-maintner/0041.growing.mutlog
2018/10/15 14:46:49 Reloaded data from log *maintner.netMutSource.
2018/10/15 14:46:49 got corpus update after 154.471501ms
    mutation: github_issue:<owner:"golang" repo:"go" number:28103 event:<id:1904921842 event_type:"unlabeled" actor_id:1924134 created:<seconds:1539629204 > label:<name:"Builders" > > event:<id:1904921913 event_type:"labeled" actor_id:8566911 created:<seconds:1539629206 > label:<name:"Builders" > > event_status:<server_date:<seconds:1539629209 > > > 

              finally it received the "there was an unlabel event" mutation, so gopherbot knows it shouldn't
              fight humans, hence it no longer tries to apply the "Builders" label to the issue
2018/10/15 14:46:49 gopherbot ran in 8.387027ms
2018/10/15 14:46:49 Updating data from log *maintner.netMutSource ...

What happens is the mutation for "this issue no longer has a label [that it should have according to the rules]" comes in a few seconds before the "there was an unlabel event for this issue [because a human has unlabeled it]" mutation. gopherbot acts immediately after receiving the first mutation and re-applies the label. (It tries to apply the Builders label after second mutation too, but it does nothing since the label is already applied.)

Suggested Workaround

I'm going to file a maintner issue for the underlying problem (edit: opened #28226), but I propose the following workaround for gopherbot specifically until then: give it a slower (gopher-like) reaction time, rather than having it garden issues instantaneously. A 5 second buffer should make it avoid acting during a short inconsistent maintner view of the world.

I'm going to test this approach and confirm it works as expected.

@dmitshur

This comment has been minimized.

Copy link
Member

dmitshur commented Oct 16, 2018

I tested the approach (locally) and it worked in solving the problem. Going to send a CL. Edit: Sent CL 142362.

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Oct 16, 2018

Change https://golang.org/cl/142362 mentions this issue: cmd/gopherbot: reduce gardening reaction time

@dmitshur

This comment has been minimized.

Copy link
Member

dmitshur commented Oct 16, 2018

Starting to have second thoughts about the proposed workaround. In the CL 142362 commit message, I wrote:

The goal of this change is to reduce occurrences of gopherbot
fighting people (issue #21312, which causes confusion
and is distracting), without introducing serious drawbacks.
A slightly slower reaction time for gardening is more human-like
(or gopher-like) and should not cause problems.

It's true that a slower reaction time for gardening new issues is unlikely to be problematic. However, I've realized gopherbot is often used in other ways, where it being very responsive is important.

For example, I notice people sometimes apply the NeedsFix label and rely on gopherbot to remove the NeedsDecision label. Or when asking gopherbot to backport issues.

image

If gopherbot ends up taking 15-30 seconds to respond (conceivable if there's a lot of continuous corpus activity), it may seem that it's broken or lagging, and as a result quite detrimental to the user experience and the ability to rely on it.

@josharian

This comment has been minimized.

Copy link
Contributor Author

josharian commented Oct 16, 2018

@dmitshur what if gopherbot immediately added itself as an assignee of the issue, but didn’t take action on it for 60sec, and then removed itself as an assignee. Then there’d be immediate feedback “gopherbot is listening!”.

@mark-rushakoff

This comment has been minimized.

Copy link
Contributor

mark-rushakoff commented Dec 6, 2018

There's another documentation label fight at #29127.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.