Reapply [workflows] Split pr-code-format into two parts to make it more secure (#78215) #80495

tstellar · 2024-02-02T21:06:48Z

Actions triggered by pull_request_target events have access to all repository secrets, so it is unsafe to use them when executing untrusted code. The pr-code-format workflow does not execute any untrusted code, but it passes untrusted input into clang-format. An attacker could use this to exploit a flaw in clang-format and potentially gain access to the repository secrets.

By splitting the workflow, we can use the pull_request target, which is more secure, and isolate the issue write permissions in a separate job. The pull_request target also makes it easier to test changes to the code-format-helper.py script, because the version of the script from the pull request will be used rather than the version of the script from main.
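
As a rough sketch of the split described above (not the actual workflow files from this PR): the workflow names, file names, artifact name, and the placeholder formatter command are all hypothetical, and the secret name is borrowed from the review discussion. The first workflow handles untrusted input without write access; the second is triggered by workflow_run and is the only place that holds write permission.

```yaml
# Workflow 1 (hypothetical file: format-check.yml). Triggered by pull_request,
# so runs for fork PRs get a read-only token and no repository secrets.
name: Format check
on: pull_request
permissions:
  contents: read
jobs:
  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run formatter and save the comment text
        # Placeholder for the real formatting script.
        run: echo "formatter output" > comments
      - uses: actions/upload-artifact@v4
        with:
          name: workflow-args   # hypothetical artifact name
          path: comments
---
# Workflow 2 (hypothetical file: comment-write.yml). Triggered by workflow_run,
# so it uses the workflow definition from the default branch.
name: Comment write
on:
  workflow_run:
    workflows: ["Format check"]
    types: [completed]
permissions:
  contents: read
jobs:
  comment:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    if: github.event.workflow_run.event == 'pull_request'
    steps:
      - uses: actions/download-artifact@v4
        with:
          # Artifacts from another run need an explicit run id and a token.
          run-id: ${{ github.event.workflow_run.id }}
          github-token: ${{ secrets.ISSUE_WRITE_DOWNLOAD_ARTIFACT }}
      # Posting the downloaded text as a PR comment is omitted here.
```

With this layout, a flaw exploited through the formatter in the first workflow would presumably only influence the text of the comment that the second workflow posts.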

Fixes #77142

llvmbot · 2024-02-02T21:07:16Z

@llvm/pr-subscribers-github-workflow

Author: Tom Stellard (tstellar)

Changes

After bc06cd5 we started pulling code-format-helper.py from the pull request branch, but if the pull request branch is not up-to-date, then the job will fail, because this workflow is incompatible with the old version of the script.

In order to fix this, we need to pull the script from the main branch like we were doing before to ensure that it always gets the latest version of the script.

Full diff: https://github.com/llvm/llvm-project/pull/80495.diff

1 Files Affected:

(modified) .github/workflows/pr-code-format.yml (+17-3)

diff --git a/.github/workflows/pr-code-format.yml b/.github/workflows/pr-code-format.yml
index 1475d872498d4..3aa40a29852c9 100644
--- a/.github/workflows/pr-code-format.yml
+++ b/.github/workflows/pr-code-format.yml
@@ -25,6 +25,20 @@ jobs:
           separator: ","
           skip_initial_fetch: true
 
+      # We need to pull the script from the main branch, so that we ensure
+      # we get a version of the script that supports the --wirte-comment-to-file
+      # option.
+      - name: Fetch code formatting utils
+        uses: actions/checkout@v4
+        with:
+          reository: ${{ github.repository }}
+          ref: ${{ github.base_ref }}
+          sparse-checkout: |
+            llvm/utils/git/requirements_formatting.txt
+            llvm/utils/git/code-format-helper.py
+          sparse-checkout-cone-mode: false
+          path: code-format-tools
+
       - name: "Listed files"
         env:
           CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
@@ -42,10 +56,10 @@ jobs:
         with:
           python-version: '3.11'
           cache: 'pip'
-          cache-dependency-path: 'llvm/utils/git/requirements_formatting.txt'
+          cache-dependency-path: 'code-format-tools/llvm/utils/git/requirements_formatting.txt'
 
       - name: Install python dependencies
-        run: pip install -r llvm/utils/git/requirements_formatting.txt
+        run: pip install -r code-format-tools/llvm/utils/git/requirements_formatting.txt
 
       - name: Run code formatter
         env:
@@ -58,7 +72,7 @@ jobs:
         # explicitly in code-format-helper.py and not have to diff starting at
         # the merge base.
         run: |
-          python ./llvm/utils/git/code-format-helper.py \
+          python ./code-format-tools/llvm/utils/git/code-format-helper.py \
             --write-comment-to-file \
             --token ${{ secrets.GITHUB_TOKEN }} \
             --issue-number $GITHUB_PR_NUMBER \

boomanaiden154 · 2024-02-02T22:10:07Z

Wouldn't any PR that has an updated version of .github/workflows/pr-code-format.yml (so is actually running the code formatting job) also have an updated version of the python script, assuming they were both modified in the same commit?

tstellar · 2024-02-03T18:32:43Z

Wouldn't any PR that has an updated version of .github/workflows/pr-code-format.yml (so is actually running the code formatting job) also have an updated version of the python script, assuming they were both modified in the same commit?

My guess is that, since the older branches had the pull_request_target version of the workflow, GitHub Actions pulled the workflow file from main.

boomanaiden154

I think this is the relevant line in the GitHub documentation:

For pull requests from a forked repository to the base repository, GitHub sends the pull_request, issue_comment, pull_request_review_comment, pull_request_review, and pull_request_target events to the base repository. No pull request events occur on the forked repository.

So it uses the workflow definition from the base branch, but then tries to use the script from the PR branch, where it doesn't exist in most cases. So pulling the scripts from the main branch should fix the issue.

LGTM, other than a couple minor nits.

.github/workflows/pr-code-format.yml

boomanaiden154 · 2024-02-04T07:42:37Z

.github/workflows/issue-write.yml

+      github.event.workflow_run.event == 'pull_request'
+    steps:
+      - name: 'Download artifact'
+        uses: actions/download-artifact@6b208ae046db98c579e8a3aa621ab581ff575935 # v4.1.1


Why is this version different than the upload artifact action in the pull_request job? 4.3.0 vs 4.1.1.

Those two actions are versioned differently.

jyknight · 2024-02-06T21:12:45Z

OK, I looked into this in a bit more detail:
Firstly -- a pull_request run is always run in the context of the "merge" branch, not the PR head. The "merge" branch is auto-updated (if there are no file conflicts) to the latest revision of the target branch. So, a PR will generally automatically see a new .github/workflows file after it's committed -- and by default, we'd get all the rest of the new files too!

However, the issue is that we're explicitly instructing it to checkout the SHA of the "head" commit, not the "merge" commit.

In fact, I think this whole process gets a lot easier if we make use of the auto-generated merge commit, because then we:

1. Check out the merge commit (which is the default behavior of "checkout" if we weren't passing with: ref: ${{ github.event.pull_request.head.sha }}) with fetch-depth: 2. We don't need to deepen, because the 2 parents of the merge commit are the exact two things we need to compare!
2. For the diff between those two commit hashes: get the list of changed files, run the formatter on the diff, etc. We don't care about any intermediate commits; those two endpoints are all we need.
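
The merge-commit approach described above could look roughly like the following steps. This is only a sketch: the step names are made up, and it assumes the first parent of the merge commit is the target branch and the second is the PR head. It follows the comment in comparing the two parents; whether comparing HEAD^1 with HEAD would be the better choice is left open here.

```yaml
steps:
  - name: Check out the auto-generated merge commit
    uses: actions/checkout@v4
    with:
      # No `ref:` override, so the default pull_request merge ref is used.
      # A depth of 2 fetches the merge commit plus both of its parents.
      fetch-depth: 2

  - name: Compare the two parents of the merge commit
    run: |
      # HEAD^1: target-branch parent (assumed); HEAD^2: PR-head parent (assumed).
      git diff --name-only HEAD^1 HEAD^2
```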

Another advantage: if we fix an issue in the formatting jobs, most PRs will get the new version immediately, without the author needing to rebase or merge the main branch themselves.

That said, I think it's fine to defer these fixes to a separate PR: it becomes a lot easier to test this when we're no longer using pull_request_target!

jyknight · 2024-02-06T21:20:02Z

.github/workflows/issue-write.yml

+            console.log(runInfo);
+
+
+            // Query to find the number of the pull request that triggered this job.


Why do we need all this? How can it end up with multiple associated pull requests? Or have a different baseRepository?

The associated pull requests are based off of the branch name, so if you create a pull request for a branch, close it, and then create another pull request with the same branch, then this query will return two associated pull requests.

Ugh, ok. Can you add comments to the code explaining that?

jyknight · 2024-02-06T21:20:18Z

.github/workflows/issue-write.yml

+            await comments.forEach(function (comment) {
+              if (comment.id) {
+                // Security check: Ensure that this comment was created by
+                // the github-actions bot, so a malisious input won't overwrite


Typo: malicious

boomanaiden154 · 2024-02-06T23:05:24Z

Firstly -- a pull_request run is always run in the context of the "merge" branch, not the PR head. The "merge" branch is auto-updated (if there are no file conflicts) to the latest revision of the target branch. So, a PR will generally automatically see a new .github/workflows file after it's committed -- and by default, we'd get all the rest of the new files too!

Right, and it doesn't run if there are any merge conflicts (one reason why the current design is the way it is). My point was mainly about where the workflow definition is pulled from that gets run. For some reason I thought it might have been the fork branch, but that isn't the case (based on the documentation).

I agree that looking at the merge commit would make things a lot simpler. I think that's good to split out into a separate patch though so we can make sure that the current state in-tree is reliable after the split. That is another thing on my todo list.

tstellar · 2024-02-16T14:47:38Z

@jyknight @boomanaiden154 So it's OK to commit this as-is and then we switch to using the merge commit in a follow up patch?

boomanaiden154 · 2024-02-16T22:24:49Z

.github/workflows/pr-code-format.yml

-      # PR for security reasons as we're using pull_request_target. Checkout
-      # the target branch with the necessary files.
+      # We need to pull the script from the main branch, so that we ensure
+      # we get a version of the script that supports the --write-comment-to-file


Nit: I think we should probably reword this to say something about making sure the script is updated rather than specifically mentioning the --write-comment-to-file option, which I think would quickly become a dated comment.

boomanaiden154 · 2024-02-16T22:28:29Z

@jyknight @boomanaiden154 So it's OK to commit this as-is and then we switch to using the merge commit in a follow up patch?

I think it should be good. I would think any work being done to move this to checkout the default pull_request ref would either need to be done in this patch, or would be preferred to be done after this patch, so it should just land to make forward progress there.

This just keeps the current state of things and doesn't introduce any regressions while being more secure, other than the (somewhat) edge case where there are merge conflicts, so seems fine to land to me.

jyknight

Yes, I think it's fine to commit this as-is and do further cleanup in follow-on PRs.

bader · 2024-04-02T00:13:49Z

.github/workflows/pr-code-format.yml

      - name: Fetch code formatting utils
        uses: actions/checkout@v4
        with:
+          reository: ${{ github.repository }}


reository -> repository.

f6c87be

Thanks for pointing this out!

…nnot (llvm#81142)" This reverts commit 124cd11. The job originally failed because any workflow run on a PR runs in a context that cannot write to the PR itself (otherwise people could damage the repo using a workflow in a PR). llvm#80495 recently added a job that is purely for commenting on issues, to solve a similar problem. This limits the damage the PR can do to adding a spam comment.

bader · 2024-05-30T18:28:11Z

.github/workflows/issue-write.yml

+      - name: 'Download artifact'
+        uses: actions/download-artifact@6b208ae046db98c579e8a3aa621ab581ff575935 # v4.1.1
+        with:
+          github-token: ${{ secrets.ISSUE_WRITE_DOWNLOAD_ARTIFACT }}


@tstellar, we use the clang-format check in a downstream repository, and this split "breaks" some functionality. As I understand it, we need the ISSUE_WRITE_DOWNLOAD_ARTIFACT secret to be available to GitHub Actions in our repository in order to get the comment from the clang-format action. Could you clarify which permissions should be granted to the ISSUE_WRITE_DOWNLOAD_ARTIFACT secret, please?
BTW, why can't we use the GITHUB_TOKEN secret?

The token needs actions:read permissions. You can't use GITHUB_TOKEN, because it's only scoped for the current workflow run. See https://github.com/actions/download-artifact?tab=readme-ov-file#download-artifacts-from-other-workflow-runs-or-repositories

I do have a workaround that will allow us to download the artifact without a token, but that is part of a larger PR here. If you think it would be useful to avoid using the token, I can try to pull out this change into its own PR.

Thanks for clarifying!

I do have a workaround that will allow us to download the artifact without a token, but that is part of a larger PR here. If you think it would be useful to avoid using the token, I can try to pull out this change into its own PR.

Do you plan to use "Unprivileged Download Artifact" in issue-write? If not, we will use the token.

The only problem I have with introducing a new token is that it's kind of unexpected and requires some effort to investigate why things are broken. If we can make it work out of the box and keep the upstream project secure, it would be great.

Do you plan to use "Unprivileged Download Artifact" in issue-write? If not, we will use the token.

Yeah, I was planning to do this eventually. I'll start looking into this.

Thanks a lot for your help with this. I really appreciate the efforts to make GitHub Actions CI useful for downstream forks. 👍

llvmbot added the github:workflow label Feb 2, 2024

tstellar force-pushed the format-fix branch from f89641d to 3925d28 Compare February 3, 2024 07:52

tstellar changed the title ~~[workflows] Always pull code-format-helper.py from the main branch~~ Reapply [workflows] Split pr-code-format into two parts to make it more secure (#78215) Feb 3, 2024

tstellar requested review from boomanaiden154 and jyknight February 3, 2024 07:56

tstellar mentioned this pull request Feb 3, 2024

[workflows] Split pr-code-format into two parts to make it more secure #78216

Merged

tstellar requested a review from tru February 3, 2024 17:59

boomanaiden154 approved these changes Feb 4, 2024

jyknight reviewed Feb 6, 2024

Fix typos

0c678db

Merge branch 'main' into format-fix

da5a97d

boomanaiden154 reviewed Feb 16, 2024

jyknight approved these changes Feb 26, 2024

tstellar added 2 commits March 20, 2024 14:18

Update issue-write.yml

1d24e7e

Update pr-code-format.yml

e9f5935

tstellar merged commit 2120f57 into llvm:main Mar 22, 2024
4 of 5 checks passed

bader reviewed Apr 2, 2024

bader reviewed May 30, 2024
