-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reapply [workflows] Split pr-code-format into two parts to make it more secure (#78215) #80495
Conversation
@llvm/pr-subscribers-github-workflow Author: Tom Stellard (tstellar) ChangesAfter bc06cd5 we started pulling code-fromat-helper.py from the pull request branch, but if the pull request branch is not up-to-date, then the job will fail, because this workflow is incompatible with the old version of the script. In order to fix this, we need to pull the script from the main branch like we were doing before to ensure that it always gets the latest version of the script. Full diff: https://github.com/llvm/llvm-project/pull/80495.diff 1 Files Affected:
diff --git a/.github/workflows/pr-code-format.yml b/.github/workflows/pr-code-format.yml
index 1475d872498d4..3aa40a29852c9 100644
--- a/.github/workflows/pr-code-format.yml
+++ b/.github/workflows/pr-code-format.yml
@@ -25,6 +25,20 @@ jobs:
separator: ","
skip_initial_fetch: true
+ # We need to pull the script from the main branch, so that we ensure
+ # we get a version of the script that supports the --wirte-comment-to-file
+ # option.
+ - name: Fetch code formatting utils
+ uses: actions/checkout@v4
+ with:
+ reository: ${{ github.repository }}
+ ref: ${{ github.base_ref }}
+ sparse-checkout: |
+ llvm/utils/git/requirements_formatting.txt
+ llvm/utils/git/code-format-helper.py
+ sparse-checkout-cone-mode: false
+ path: code-format-tools
+
- name: "Listed files"
env:
CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
@@ -42,10 +56,10 @@ jobs:
with:
python-version: '3.11'
cache: 'pip'
- cache-dependency-path: 'llvm/utils/git/requirements_formatting.txt'
+ cache-dependency-path: 'code-format-tools/llvm/utils/git/requirements_formatting.txt'
- name: Install python dependencies
- run: pip install -r llvm/utils/git/requirements_formatting.txt
+ run: pip install -r code-format-tools/llvm/utils/git/requirements_formatting.txt
- name: Run code formatter
env:
@@ -58,7 +72,7 @@ jobs:
# explicitly in code-format-helper.py and not have to diff starting at
# the merge base.
run: |
- python ./llvm/utils/git/code-format-helper.py \
+ python ./code-format-tools/llvm/utils/git/code-format-helper.py \
--write-comment-to-file \
--token ${{ secrets.GITHUB_TOKEN }} \
--issue-number $GITHUB_PR_NUMBER \
|
Wouldn't any PR that has an updated version of |
…re secure (llvm#78215) Actions triggered by pull_request_target events have access to all repository secrets, so it is unsafe to use them when executing untrusted code. The pr-code-format workflow does not execute any untrusted code, but it passes untrused input into clang-format. An attacker could use this to exploit a flaw in clang-format and potentially gain access to the repository secrets. By splitting the workflow, we can use the pull_request target which is more secure and isolate the issue write permissions in a separate job. The pull_request target also makes it easier to test changes to the code-format-helepr.py script, because the version of the script from the pull request will be used rather than the version of the script from main. Fixes llvm#77142
My guess is that since the older branches had the pull_request_target version of the workflow that caused GitHub Actions to pull the workflow file from main. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the relevant line in the Github documentation:
For pull requests from a forked repository to the base repository, GitHub sends the pull_request, issue_comment, pull_request_review_comment, pull_request_review, and pull_request_target events to the base repository. No pull request events occur on the forked repository.
So it uses the workflow definition from the base branch, but then tries to use the script from the PR branch, where it doesn't exist in most cases. So pulling the scripts from the main branch should fix the issue.
LGTM, other than a couple minor nits.
github.event.workflow_run.event == 'pull_request' | ||
steps: | ||
- name: 'Download artifact' | ||
uses: actions/download-artifact@6b208ae046db98c579e8a3aa621ab581ff575935 # v4.1.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this version different than the upload artifact action in the pull_request
job? 4.3.0 vs 4.1.1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those two actions are versioned differently.
OK, I looked into this in a bit more detail: However, the issue is that we're explicitly instructing it to checkout the SHA of the "head" commit, not the "merge" commit. In fact, I think this whole process gets a lot easier if we make use the auto-generated merge commit, because then we:
Another advantage: if we fix an issue in the formatting jobs, most PRs will get the new version immediately, without the author needing to rebase or merge the main branch themselves. That said, I think it's fine to defer these fixes to a separate PR: it becomes a lot easier to test this when we're no longer using pull_request_target! |
console.log(runInfo); | ||
|
||
|
||
// Query to find the number of the pull request that triggered this job. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need all this? How can it end up with multiple associated pull requests? Or have a different baseRepository?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The associated pull requests are based off of the branch name, so if you create a pull request for a branch, close it, and then create another pull request with the same branch, then this query will return two associated pull requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, ok. Can you add comments to the code explaining that?
.github/workflows/issue-write.yml
Outdated
await comments.forEach(function (comment) { | ||
if (comment.id) { | ||
// Security check: Ensure that this comment was created by | ||
// the github-actions bot, so a malisious input won't overwrite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: malicious
Right, and it doesn't run if there are any merge conflicts (one reason why the current design is the way it is). My point was mainly about where the workflow definition is pulled from that gets run. For some reason I thought it might have been the fork branch, but that isn't the case (based on the documentation). I agree that looking at the merge commit would make things a lot simpler. I think that's good to split out into a separate patch though so we can make sure that the current state in-tree is reliable after the split. That is another thing on my todo list. |
@jyknight @boomanaiden154 So it's OK to commit this as-is and then we switch to using the merge commit in a follow up patch? |
.github/workflows/pr-code-format.yml
Outdated
# PR for security reasons as we're using pull_request_target. Checkout | ||
# the target branch with the necessary files. | ||
# We need to pull the script from the main branch, so that we ensure | ||
# we get a version of the script that supports the --write-comment-to-file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think we should probably reword this to say something about making sure the script is updated rather than specifically mentioning the --write-comment-to-file
option, which I think would quickly become a dated comment.
I think it should be good. I would think any work being done to move this to checkout the default This just keeps the current state of things and doesn't introduce any regressions while being more secure, other than the (somewhat) edge case where there are merge conflicts, so seems fine to land to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it's fine to commit this as-is and do further cleanup in follow-on PRs.
console.log(runInfo); | ||
|
||
|
||
// Query to find the number of the pull request that triggered this job. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, ok. Can you add comments to the code explaining that?
…re secure (llvm#78215) (llvm#80495) Actions triggered by pull_request_target events have access to all repository secrets, so it is unsafe to use them when executing untrusted code. The pr-code-format workflow does not execute any untrusted code, but it passes untrused input into clang-format. An attacker could use this to exploit a flaw in clang-format and potentially gain access to the repository secrets. By splitting the workflow, we can use the pull_request target which is more secure and isolate the issue write permissions in a separate job. The pull_request target also makes it easier to test changes to the code-format-helepr.py script, because the version of the script from the pull request will be used rather than the version of the script from main. Fixes llvm#77142
- name: Fetch code formatting utils | ||
uses: actions/checkout@v4 | ||
with: | ||
reository: ${{ github.repository }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reository -> repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out!
…nnot (llvm#81142)" This reverts commit 124cd11. The job originally failed because any workflow run on a PR runs in a context that cannot write to the PR itself (otherwise people could damage the repo using a workflow in a PR). llvm#80495 recently added a job that is purely for commenting on issues, to solve a similair problem. This limits the damage the PR can do to adding a spam comment.
- name: 'Download artifact' | ||
uses: actions/download-artifact@6b208ae046db98c579e8a3aa621ab581ff575935 # v4.1.1 | ||
with: | ||
github-token: ${{ secrets.ISSUE_WRITE_DOWNLOAD_ARTIFACT }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tstellar, we use clang-format check in downstream repository and this split "breaks" some functionality. According to my understanding, we should have ISSUE_WRITE_DOWNLOAD_ARTIFACT
secret available to GitHub Actions in our repository in order to have comment from clang-format action. Could you clarify what permissions should be granted to ISSUE_WRITE_DOWNLOAD_ARTIFACT
secret, please?
BTW, why can't we use GITHUB_TOKEN
secret?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The token needs actions:read
permissions. You can't use 'GITHUB_TOKEN`, because it's only scoped for the current workflow run. See https://github.com/actions/download-artifact?tab=readme-ov-file#download-artifacts-from-other-workflow-runs-or-repositories
I do have a workaround that will allow us to download the artifact without a token, but that is part of a larger PR here. If you think it would be useful to avoid using the token, I can try to pull out this change into its own PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying!
I do have a workaround that will allow us to download the artifact without a token, but that is part of a larger PR here. If you think it would be useful to avoid using the token, I can try to pull out this change into its own PR.
Do you plan to use "Unprivileged Download Artifact" in issue-write? If no, we will use the token.
The only problem I have with introducing a new token is that it's kind of unexpected and requires some efforts to investigate why things are broken. If we can make it work out-of-box and keep the upstream project secure, it would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to use "Unprivileged Download Artifact" in issue-write? If no, we will use the token.
Yeah, I was planning to to do this eventually. I'll start looking into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your help with this. I really appreciate the efforts to make GitHub Actions CI useful for downstream forks. 👍
Actions triggered by pull_request_target events have access to all repository secrets, so it is unsafe to use them when executing untrusted code. The pr-code-format workflow does not execute any untrusted code, but it passes untrused input into clang-format. An attacker could use this to exploit a flaw in clang-format and potentially gain access to the repository secrets.
By splitting the workflow, we can use the pull_request target which is more secure and isolate the issue write permissions in a separate job. The pull_request target also makes it easier to test changes to the code-format-helepr.py script, because the version of the script from the pull request will be used rather than the version of the script from main.
Fixes #77142