Skip to content

Conversation

samestep
Copy link
Contributor

@samestep samestep commented Mar 25, 2021

Context: As of #53719 (which replaced an equivalent solution from before), we use the ref input for actions/checkout to checkout the PR tip instead of a merge commit.

  • Upside: This means that if the PR author bases off of viable/strict instead of master, the repo checkout for GHA runs on the PR won't pull in stuff from the latest (often-broken) master.
  • Downside: This breaks atomicity, because even though the checkout comes from the PR tip, the actual workflow YAML files are still pulled from the merge commit of the PR with master. So if master adds a new workflow and some scripts that that workflow uses, any PR based on a commit behind that will fail in GHA. A couple examples:

Instead of breaking atomicity by checking out something besides the merge commit, this PR resolves the problem by forcing the merge commit to be the same as the PR head commit.

Specifically, it adds a workflow which listens for activity on PRs against master (such as when they're opened or when new commits are pushed), and updates the PR's base branch to be base/hhhhhhh, where hhhhhhh is the 7-digit prefix of the SHA1 hash of the merge-base of the PR head with pytorch:master. It also adds the "base pinned" label to the PR (if it doesn't have the label already). Because this new workflow is triggered by pull_request_target rather than pull_request, it must be merged before it can take effect. For demonstration purposes, I have manually set the base branch and "base pinned" label for this PR, but those would both be done automatically after it is landed.

Important: Before this PR is merged, a GitHub token called GITHUB_BASE_BRANCH_TOKEN must be added to the pytorch/pytorch repo. The new workflow uses said token to add a label in such a way that it will trigger a followup pull_request workflow run.

The timeline for a typical PR will look something like this:

  • user opens PR against master
  • pull_request workflows don't run because we restricted them to only run once they're pinned (or if they're ghstack PRs, which are always pinned anyway)
  • the new workflow pins the PR's base branch, and adds the "base pinned" label which triggers another pull_request event (this time of type labeled)
  • the pull_request workflows now run on that initial commit of the PR, using the PR head as the merge commit since the base branch has been pinned
  • user pushes new commits, possibly rebasing onto a different master commit
  • the new workflow switches the PR's base branch as necessary (only if the PR was rebased), but doesn't change the labels (assuming no one removed the "base pinned" label)
    • this is important because it ensures that the "Commits" and "Files" tabs only show the changes actually introduced by the PR, and not any that are already in the master history
  • on each commit after the initial one, the pull_request workflow runs as normal, with type synchronize, because the base branch matches base/* so even if the user rebases onto a more recent commit from master, the PR head and merge commits will still be exactly the same

A few caveats of this implementation:

  • if the user rebases on to a commit older than the one they were previously based on, the pull_request run immediately after that (before the "Pin base branch" workflow has a chance to run again) will use a merge commit different from the PR's head commit
    • this shouldn't usually happen since people usually rebase onto newer commits rather than older, and even if it does happen, it will only be for one commit: if the user pushes more commits after rebasing, the problem will go away because the "Pin base branch" workflow will have already fixed the problem
  • if anyone adds a different tag besides "base pinned", CI will be triggered on the PR even though no changes were pushed
    • this shouldn't matter as far as resources thanks to Add GHA to cancel redundant GHA workflows except on master #54689, and might actually be a good thing if we're already planning on using labels to decide which workflows to run on a PR
    • however, one other label that gets added to every PR is "cla signed", so this would add some noise
    • it may be possible to avoid this problem by using if to check the label that was added and cancel jobs if it's not "base pinned", but that would add more boilerplate
  • according to this script from Stack Overflow, there are currently 165 commits in PyTorch which require at least 8 digits of the hash to disambiguate, 12 of which actually require 9 digits; is 7 digits OK for now, or should we proactively set this workflow to use 9 or more digits?
    • even if we merge this as-is, there shouldn't really be a downside to increasing the number of digits in the hash later
  • this is a big one: the "Import to Phabricator" button currently doesn't work on this PR
  • a medium one: apparently if you rebase a PR whose base branch is pinned, it adds many code owners as reviewers, who then have to be manually removed

Other questions for reviewers:

  • is it OK that now these pull_request workflows are limited to PRs with base branch base/* or gh/**? what about PRs against other branches, such as release branches?

Test plan:

@janeyx99 and I tested this in the same GitHub repo used to test #54685, including with PRs from forks.

@samestep samestep added the base pinned The base branch for this pull request has been pinned to a specific commit, for determinism in CI. label Mar 25, 2021
@samestep samestep requested review from a team, janeyx99 and seemethere March 25, 2021 17:13
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Mar 25, 2021

💊 CI failures summary and remediations

As of commit 364a7e5 (more details on the Dr. CI page):


  • 1/2 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)
  • 1/2 broken upstream at merge base 9e6877c from Mar 27 until Mar 30

🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch (expand for instructions)

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@facebook-github-bot
Copy link
Contributor

@bigfootjon has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@codecov
Copy link

codecov bot commented Mar 25, 2021

Codecov Report

Merging #54693 (364a7e5) into base/9e6877c (9e6877c) will increase coverage by 0.00%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##           base/9e6877c   #54693   +/-   ##
=============================================
  Coverage         77.45%   77.45%           
=============================================
  Files              1894     1894           
  Lines            186403   186403           
=============================================
+ Hits             144374   144383    +9     
+ Misses            42029    42020    -9     

@samestep samestep requested a review from a team March 26, 2021 14:31
@walterddr
Copy link
Contributor

Looks good to me for this approach. Have a couple of questions and potential follow ups.

The timeline for a typical PR will look something like this:

  • user opens PR against master
  • pull_request workflows don't run because we restricted them to only run once they're pinned (or if they're ghstack PRs, which are always pinned anyway)

Have we tried this workflow on a ghstack PR. Specifically: Would ghstack PRs have the "base pinned" label added by this workflow? if not do we plan to implement this in ghstack or modify this workflow?

...

  • user pushes new commits, possibly rebasing onto a different master commit
  • the new workflow switches the PR's base branch as necessary (only if the PR was rebased), but doesn't change the labels (assuming no one removed the "base pinned" label)

will we advance the current base branch tip to the new master commit or do we create a new base branch? (e.g. do we increase # of branches on the pytorch repo?

...
A few caveats of this implementation:

  • if the user rebases on to a commit older than the one they were previously based on, the pull_request run immediately after that (before the "Pin base branch" workflow has a chance to run again) will use a merge commit different from the PR's head commit

I dont think we need to worry about this, because a typical git rebase on an older commit will be a no-op as it will rebase all commits newer than the old commit hash onto the new branch if they are both on master.

@samestep
Copy link
Contributor Author

@walterddr

Have we tried this workflow on a ghstack PR. Specifically: Would ghstack PRs have the "base pinned" label added by this workflow? if not do we plan to implement this in ghstack or modify this workflow?

The "Pin base branch" workflow only runs when the PR's base branch is already master or base/*, so this doesn't affect ghstack PRs (since their base branch always matches gh/**).

will we advance the current base branch tip to the new master commit or do we create a new base branch? (e.g. do we increase # of branches on the pytorch repo?

I don't understand what you mean here; could you clarify?

I dont think we need to worry about this, because a typical git rebase on an older commit will be a no-op as it will rebase all commits newer than the old commit hash onto the new branch if they are both on master.

Agreed.

@walterddr
Copy link
Contributor

The "Pin base branch" workflow only runs when the PR's base branch is already master or base/*, so this doesn't affect ghstack PRs (since their base branch always matches gh/**).

in this case, who would be the one that tags base pinned to ghstack PRs? according to

...

  • the new workflow pins the PR's base branch, and adds the "base pinned" label which triggers another pull_request event (this time of type labeled)

This means base pinned tag needs to be added to trigger another pull_request event right?


I don't understand what you mean here; could you clarify?

For example currently this PR is based off pytorch:base/6f8328e, if you rebase against a later commit on master, say ( pytorch:base/abcdefg, will you delete pytorch:base/6f8328e?

@samestep
Copy link
Contributor Author

samestep commented Mar 26, 2021

in this case, who would be the one that tags base pinned to ghstack PRs? according to

...

  • the new workflow pins the PR's base branch, and adds the "base pinned" label which triggers another pull_request event (this time of type labeled)

This means base pinned tag needs to be added to trigger another pull_request event right?

The "Pin base branch" workflow adds that label automatically, which is why it requires a special GITHUB_BASE_BRANCH_TOKEN separate from the standard GITHUB_TOKEN (otherwise the followup pull_request event wouldn't trigger another workflow).

For example currently this PR is based off pytorch:base/6f8328e, if you rebase against a later commit on master, say ( pytorch:base/abcdefg, will you delete pytorch:base/6f8328e?

No, because it is possible that there are other PRs still using that same branch as their base (which is one of the reasons @seemethere suggested using hash prefixes instead of PR numbers for this).

@seemethere
Copy link
Member

Perhaps a good follow up to this PR would be to create a workflow to clean up these base branches when there’s no longer pull requests associated with them?

@walterddr
Copy link
Contributor

The "Pin base branch" workflow adds that label automatically, which is why it requires a special GITHUB_BASE_BRANCH_TOKEN separate from the standard GITHUB_TOKEN (otherwise the followup pull_request event wouldn't trigger another workflow).

Hmm. I guess i didn't express my question correctly. let me rephrase. as you mentioned,

  1. this workflow only runs on master or base/*,
  2. this workflow automatically adds the base pinned tags.
  3. ghstack bases are always gh/*.

and I further assume based pinned tag is important to trigger other GHA workflows in the future

My question is,

  1. since ghstack PRs are based off gh/* base branches, this workflow doesn't automatically add the base pinned tags to those ghstack PRs, is that correct?
  2. if Q1 is correct, do we expect ghstack PR authors to add them manually?

@walterddr
Copy link
Contributor

Perhaps a good follow up to this PR would be to create a workflow to clean up these base branches when there’s no longer pull requests associated with them?

yes. but I am not sure if base/x got updated to base/y, we can delete base/x? since base/x might still be used by other PRs?

@samestep
Copy link
Contributor Author

yes. but I am not sure if base/x got updated to base/y, we can delete base/x? since base/x might still be used by other PRs?

Not necessarily, but there's probably a way to check and see whether or the branch is still being used by other PRs (not sure about race conditions though).

@samestep
Copy link
Contributor Author

Hmm. I guess i didn't express my question correctly. let me rephrase. as you mentioned,

  1. this workflow only runs on master or base/*,
  2. this workflow automatically adds the base pinned tags.
  3. ghstack bases are always gh/*.

and I further assume based pinned tag is important to trigger other GHA workflows in the future

My question is,

  1. since ghstack PRs are based off gh/* base branches, this workflow doesn't automatically add the base pinned tags to those ghstack PRs, is that correct?
  2. if Q1 is correct, do we expect ghstack PR authors to add them manually?

Ah, I see what you're saying. The answer is that the point of the "base pinned" tag is just to kick off CI, so we don't need it for ghstack PRs.

facebook-github-bot pushed a commit that referenced this pull request Mar 29, 2021
Summary:
We've been using [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action) to add annotations to PRs when they fail Flake8 or clang-tidy. Up until now, though, that functionality has only worked on PRs in pytorch/pytorch itself, not on PRs from forks. This PR fixes that using a technique from [this GitHub blog post](https://securitylab.github.com/research/github-actions-preventing-pwn-requests/) (also linked in a comment in this diff).

Pull Request resolved: #54779

Test Plan: janeyx99 and I tested this in the same GitHub repo used to test #54685 and #54693, including with PRs from forks.

Reviewed By: walterddr

Differential Revision: D27364777

Pulled By: samestep

fbshipit-source-id: a830d372d7bb3b2529fc633b707b44f2b6cf9baa
@facebook-github-bot
Copy link
Contributor

@samestep has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@samestep
Copy link
Contributor Author

Closing this for now (in favor of #54967) because it just introduces too many quirks.

@samestep samestep closed this Mar 30, 2021
facebook-github-bot pushed a commit that referenced this pull request Mar 30, 2021
Summary:
PRs #53652 and #54693 attempted to increase the consistency of our choice of commit (head vs merge) for CI on PRs, and have so far been unsuccessful. This PR takes a less ambitious approach to the problem by clarifying the choice in one specific way (see the following paragraph) and documenting it in `CONTRIBUTING.md`.

In addition to documentation, this PR also removes the current behavior of our GHA jobs that checkout the PR tip instead of the merge commit. At first glance, this behavior seems to increase consistency (by eliminating the special-case for `ghstack` PRs), but in reality, it actually just means that for non-`ghstack` PRs, the question "Which commit is used in CI?" has *two* answers instead of just one; see the description of #53652 for more details.

Once merged, this PR will unblock other PRs that add modify our GHA workflows in breaking ways, such as #54737.

Pull Request resolved: #54967

Test Plan: None.

Reviewed By: walterddr, seemethere

Differential Revision: D27435913

Pulled By: samestep

fbshipit-source-id: 405fb419cf015cf88107d5eb2498cfb5bcb7ce33
facebook-github-bot pushed a commit that referenced this pull request Mar 31, 2021
Summary:
We've been using [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action) to add annotations to PRs when they fail Flake8 or clang-tidy. Up until now, though, that functionality has only worked on PRs in pytorch/pytorch itself, not on PRs from forks. This PR fixes that using a technique from [this GitHub blog post](https://securitylab.github.com/research/github-actions-preventing-pwn-requests/) (also linked in a comment in this diff).

Pull Request resolved: #54779

Test Plan: janeyx99 and I tested this in the same GitHub repo used to test #54685 and #54693, including with PRs from forks.

Reviewed By: seemethere, xuzhao9

Differential Revision: D27470866

Pulled By: samestep

fbshipit-source-id: d165b8e875d412b910592aa897163fb938d23365
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

base pinned The base branch for this pull request has been pinned to a specific commit, for determinism in CI. cla signed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants