Skip to content
This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Way to approve contributer pull requests for automation #9295

Closed
kbrosnan opened this issue Mar 20, 2020 · 16 comments
Closed

Way to approve contributer pull requests for automation #9295

kbrosnan opened this issue Mar 20, 2020 · 16 comments
Labels
eng:automation Build automation, Continuous integration, .. P1 Current sprint wontfix

Comments

@kbrosnan
Copy link
Contributor

kbrosnan commented Mar 20, 2020

Do to some configuration choices we do not run pull requests from people outside an approved group. It would be good to have some way to mark the PR as approved for testing.

Possibly a label or comment that triggers the automation?

Right now the only way I know of to run the automation would be to fork the users PR and submit a request under an approved member.

┆Issue is synchronized with this Jira Task

@kbrosnan kbrosnan added eng:automation Build automation, Continuous integration, .. needs:group-triage labels Mar 20, 2020
@github-actions github-actions bot added the needs:triage Issue needs triage label Mar 20, 2020
@liuche liuche self-assigned this Mar 24, 2020
@liuche liuche removed the needs:triage Issue needs triage label Mar 27, 2020
@liuche
Copy link
Contributor

liuche commented Mar 27, 2020

Having some discussion on slack with @JohanLorenzo and @grigoryk about the automation solutions around here.

There are two stages of goals that we have here:

  1. Short-term: we can trigger CI for contributor PRs (this issue). Speed of CI completing is not a big concern.
  2. Longer-term: adding reliable, fast(?!!) autolanding workflow

For 1, we could just add bors back, and only use bors for triggering contributor builds. @sblatz do you know how much work that would be? If it's quick, we could just throw that back in right now.

For 2, we could look into a different autoland bot, or we could put more time into bors to make it work for Fenix. (bors works for other projects, like Servo because it doesn't use GH Checks, and AC bc no long-running UI tests that fail or bors times out on.)

  • Add a single "completion" task that bors can check (see discussion) that doesn't block merging if we want to try this out
  • ...?

@JohanLorenzo
Copy link
Contributor

  1. Agreed to bring bors back! (marking Fix bors problems or decide to remove it #6139 to link both issues)

  2. I think Servo doesn't use the same bors as we did. There are several bors projects:

That said, the fact that A-C works is maybe a hint that we need to make UI tests faster. #6117 will help bors to report errors if one comes up, but it won't make the overall execution time faster.

@boek boek added P1 Current sprint and removed needs:group-triage labels Mar 31, 2020
@liuche
Copy link
Contributor

liuche commented Apr 1, 2020

Sounds good, @JohanLorenzo! Do you know if getting bors to run again is as simple as putting the bors.toml back in from the removal PR, and not even adding it to the TC yaml? If we only want to manually run it, we probably don't need to to hook it into any events.

@JohanLorenzo
Copy link
Contributor

I guess so! I'm positive we need the TC changes. Otherwise, too many jobs will be scheduled by bors. I think we also need to set https://github.com/mozilla-mobile/fenix/pull/6157/files#diff-2f8c9db9253b64f9a332c09f492cd54eL1 to an empty list and lower https://github.com/mozilla-mobile/fenix/pull/6157/files#diff-2f8c9db9253b64f9a332c09f492cd54eL16 down to 0.

@JohanLorenzo
Copy link
Contributor

For the record, @tarekziade wanted to make this simple change #9826 and his PR remain stuck. Do you need me to help with bors or would you all prefer to handle it?

liuche added a commit to liuche/fenix that referenced this issue Apr 10, 2020
JohanLorenzo added a commit that referenced this issue Apr 10, 2020
… PRs (#9843)

Co-authored-by: Johan Lorenzo <jlorenzo@mozilla.com>
@hwine
Copy link

hwine commented Apr 10, 2020

Unfortunately, we do not want to run bohrs on sensitive repos. (See bug 1599394#c5 for latest details.) I've rejected @JohanLorenzo request for installation. We do not have a good way to make this information more widely known at present.

Mergify does have conditional approval.

In mozilla-mobile, there is also the option of using GitHub Actions -- I have no idea if there is a solution there. We have not done an RRA on GitHub Actions, as no one had yet requested one.

JohanLorenzo added a commit that referenced this issue Apr 10, 2020
liuche pushed a commit that referenced this issue Apr 10, 2020
@liuche liuche mentioned this issue Apr 13, 2020
32 tasks
liuche added a commit that referenced this issue Apr 13, 2020
… PRs (#9843)

Co-authored-by: Johan Lorenzo <jlorenzo@mozilla.com>
liuche pushed a commit that referenced this issue Apr 13, 2020
@liuche
Copy link
Contributor

liuche commented Apr 16, 2020

Hey @JohanLorenzo I'm trying to think of other options and whether they would work. Let me run some by you:

  • Can we have CI run on a specific label like "run-CI"? Contributors can't add labels. (variation: editing the title, I don't remember why this wasn't possible)
  • Is there a way to have a separate set of scopes for CI that is run if the PR author isn't part of the org?
  • I don't remember the original reason that we turned off "editing the title" triggers CI. Do you know who knows more about this, so I can try to see what alternatives we can take? I'm not sure the threat is decreased because the workaround is to pull down the PR and then repush it, which just takes more time to do.

Thanks again for your help on this issue! Also happy to chat over a video call if it's easier to explain/show in person.

@JohanLorenzo
Copy link
Contributor

  • Can we have CI run on a specific label like "run-CI"? Contributors can't add labels. (variation: editing the title, I don't remember why this wasn't possible)

At first, I don't think so. As far as I know, labels are different from a project to another. This means we need to store the list of labels to watch in the fenix repo (for instance). Any contributor who opens a PR can edit any file in a repo, including the list of labels to watch. Thus, even though they can't add labels, they can easily by-pass the check by shrinking the list down to [].

  • Is there a way to have a separate set of scopes for CI that is run if the PR author isn't part of the org?

I'm sorry, not as far as I know. Although, @owlishDeveloper is the person who owns the integration between Github and Taskcluster. I defer to her 🙂

  • I don't remember the original reason that we turned off "editing the title" triggers CI. Do you know who knows more about this, so I can try to see what alternatives we can take? I'm not sure the threat is decreased because the workaround is to pull down the PR and then repush it, which just takes more time to do.

I know this one 😃. The simplest wat to retrigger CI is by closing and reopening a PR. No text field to edit, no commits to push again, just a button to click 👍.
Context: The original problem was https://bugzilla.mozilla.org/show_bug.cgi?id=1608153. I seized the opportunity to make all mobile projects behave the same.

Let me know if you need more information or have any more ideas. I'll be happy to help 😃

@liuche
Copy link
Contributor

liuche commented Apr 22, 2020

Hmmm....I may need to read more about TC/GH actions.

Seems like we need a task that exists outside of our repo, or at least a step outside of our repo. @JohanLorenzo a few more questions:

  • Is there a place I can see what sorts of Taskcluster policies are available? Maybe there is a different one that we could use? (possible as a separate task)
  • Along that vein, what is stopping someone right now from editing the taskcluster policy pullRequests: collaborators from above to not use include that limitation? (e.g. could they delete the pullRequests line, or use a different policy?)
  • Would it be possible for me to build a TC cron job (or trigger-able hook with mobile scopes, so it's more manual) that has a different policy, that, say, looks for a particular label in a repo, and then runs those CI? Something that wouldn't live in our repo 🤔 (this might be more of a pain for us to maintain, but I could see other repos needing this ability to trigger CI for non-GH-collaborators too!)
  • Now for weirder ideas: What about chaining a task - something in the taskcluster.yaml file in the Fenix repo that should only allow collaborators (maybe by pushing a commit to the contributor branch?), but that kicks off a second task that independently verifies that the caller of the first task is in the collaborator group, before calling back into the Fenix repo?

These ideas are getting a little weirder, but it's because I'm a little fuzzy here on what TC knows or can learn - maybe I'm assuming TC is more magical than it is in reality!

@hwine
Copy link

hwine commented Apr 22, 2020

If I'm understanding the goal of this issue, we're trying to find a workable way to maintain the security invariant that no non-committer-approved-changes are subject to the full CI suite.

Assuming there aren't other goals here, maybe it's worth getting back to basics (from a kibitzer's viewpoint:

Right now the only way I know of to run the automation would be to fork the users PR and submit a request under an approved member.

Fwiw, there really isn't a fork needed. The manual steps, which could be scripted, are:

  • pull the contributor's PR into a local repo of someone with write perms to the production repo
  • push that branch (perhaps renamed) to the production repo, which will trigger a full CI build
  • drop a comment in the real PR pointing to the results of the proxy branch

There's likely a detail missing here, but it may not be outrageous to have the committer-who-decides-the-PR-is-safe-for-full-CI copy/paste the PR's location/sha into a script and have things done. (or, it could be a bot-message in the PR comment chain). That person needs to take some action to indicate their approval.

@liuche
Copy link
Contributor

liuche commented Apr 22, 2020

@hwine currently we do this workaround, and Kevin mentioned it in his initial comment, but this is a huge pain point for the team. Let me explain some of these a little more:

  • At any given time we have around 10-20 contributor PRs open. There's significant overhead to doing this for every PR
  • There's also the added friction that a lot of these PRs from new contributors might fail CI or lint immediately - if eng has done the review in the original branch, and then cloned and re-pushed, it requires back-and-forth to happen in 2 different PRs, and likely 2 reviews (one for content, and then one for nits/CI/lint). It's hard to ask contributors to follow along with our convoluted process (or for eng to feel bought in)
  • If we just automatically move everything over to run CI/lint first (after a code sanity check), then there is still a lot of overhead. The engineer mentor needs to either a) cherry pick any new changes over from the first contributor branch, or b) engineers themselves make the commits to the cloned branch (bc the contributors don't have read/write privs on that branch).

Our ideal proposal would be, we could do a quick lookover to make sure nothing sketchy and then trigger CI manually.

@hwine
Copy link

hwine commented Apr 22, 2020

I think we're saying the same thing, so obviously I didn't express myself well. My apologies. I mixed security concerns with implementation details. Let me stick to the security concerns, as we're in full agreement about the UX & friction issues.

From the security pov, the concern about running CI on non-committer PRs is the leakage of secrets (usually credentials for external services). There is no issue with running no-secret-needed steps on every commit, such as linters, compilers, etc. (Some CI tools automatically provide this by removing secrets when processing a PR from a non-committer. The build simply fails after all "public" steps.)

I am using the term "full CI" to mean "CI that requires secrets to run". There are suggestions in this thread about applying some attribute/flag/label to the PR to signal that "it's okay to run full CI on this PR"

My position is that the "trigger full CI" must be done for specific commit references (i.e. sha values, not HEAD) as long as a non-committer has write access to the PR branch. I.e. "okay to run full CI" is not an attribute of a PR, it is an attribute of a specific commit that is contained in the PR. And, yes, that seems to require a manual intervention for each full CI run. (I would love to be shown I'm wrong about the manual intervention requirement.)

@liuche
Copy link
Contributor

liuche commented Apr 23, 2020

Okay thanks - those guidelines make sense.

@JohanLorenzo I'm still unsure on how Taskcluster policy works, but could we make a separate policy for different steps in the TC chain, like Hal suggests? so the linter and compile are fine, but maybe the UI tests or unit tests don't run. To be clear, that's only worth it if splitting those non-secrets tasks are easy, because in the long run, we'd like to have the whole CI chain run if the engineer wants to trigger it.

And if that's not an option, how about one of the ideas I had above? (Maybe the label + manual trigger one - and that label should be removed after restarting CI so we don't accidentally trigger anything we don't want to....but as I explain the risks it seems less like a good idea 😝 ).

@JohanLorenzo
Copy link
Contributor

Hi there!

Sadly, we cannot separate policies. The only values we can use are these ones: https://docs.taskcluster.net/docs/reference/integrations/github/taskcluster-yml-v1#who-can-trigger-jobs-. TL;DR: The policy is enforced before the decision task runs. The decision task can be edited from the PR but the policy can't (it has to land on master, first). The decision task is in charge of deciding whether "full CI" or "partial CI without UI tests" is triggered. So, if the policy allows the decision task to run, an attacker can edit the decision task to run the "full CI". This is one of the limit of the Taskcluster model: it's great to edit when you're a trusted user, but we have no simple way to let non-trusted users run things without being sure they didn't edit anything.

* Is there a place I can see what sorts of [Taskcluster policies](https://github.com/mozilla-mobile/fenix/blob/master/.taskcluster.yml#L4) are available? 

Replied above, just pasting the link here too, for the sake of completion 🙂 https://docs.taskcluster.net/docs/reference/integrations/github/taskcluster-yml-v1#who-can-trigger-jobs-

* Along that vein, what is stopping someone right now from editing the taskcluster policy `pullRequests: collaborators` from above to not use include that limitation? 

That's a great question: that piece of config is the only read from the master branch and not the PR itself. That's how we ensure it's not modified.

* Would it be possible for me to build a TC cron job (or trigger-able hook with mobile scopes, so it's more manual) that has a different policy, that, say, looks for a particular label in a repo, and then runs those CI? 

That's interesting, I hadn't thought of that! I think it's technically feasible but I suspect it's not going to be a great experience. Every time someone wants to schedule such a job, they'll have to manually edit the hook with the right repo URL, the right branch, and the right revision. The hook will eventually be reset to its initial values, so that may require this someone to re-enter the same values once again.

* Now for weirder ideas: What about chaining a task - something in the taskcluster.yaml file in the Fenix repo that should only allow collaborators (maybe by pushing a commit to the contributor branch?), but that kicks off a second task that independently verifies that the caller of the first task is in the `collaborator` group, before calling **back** into the Fenix repo?

If I understand correctly, an attacker can edit the second task to make it dummy and return true.

All these answers combined make me realize we probably need to bump the priority of https://bugzilla.mozilla.org/show_bug.cgi?id=1601752. I don't see any viable workaround we can use in the meantime. I'll start an email thread.

@rpappalax
Copy link
Contributor

Okay thanks - those guidelines make sense.

@JohanLorenzo I'm still unsure on how Taskcluster policy works, but could we make a separate policy for different steps in the TC chain, like Hal suggests? so the linter and compile are fine, but maybe the UI tests or unit tests don't run. To be clear, that's only worth it if splitting those non-secrets tasks are easy, because in the long run, we'd like to have the whole CI chain run if the engineer wants to trigger it.

And if that's not an option, how about one of the ideas I had above? (Maybe the label + manual trigger one - and that label should be removed after restarting CI so we don't accidentally trigger anything we don't want to....but as I explain the risks it seems less like a good idea 😝 ).

If possible, I'd like to discourage this option. Having test not run for a given commit, defers big headaches later on.

There's are several options being considered here which also seem to be revisiting previous discussions. Does it make sense to meet up on this and summarize what those are, what is and what isn't possible?

liuche added a commit to liuche/fenix that referenced this issue Sep 10, 2020
…on contributor PRs (mozilla-mobile#9843)

Co-authored-by: Johan Lorenzo <jlorenzo@mozilla.com>
JohanLorenzo added a commit that referenced this issue Sep 21, 2020
… PRs (#9843) (#14967)

Co-authored-by: Johan Lorenzo <jlorenzo@mozilla.com>
@stale
Copy link

stale bot commented Jun 15, 2021

See: #17373 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jun 15, 2021
@stale stale bot closed this as completed Jun 22, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
eng:automation Build automation, Continuous integration, .. P1 Current sprint wontfix
Projects
None yet
Development

No branches or pull requests

6 participants