[workflows] Split pr-code-format into two parts to make it more secure #78216

tstellar · 2024-01-16T00:20:15Z

Actions triggered by pull_request_target events have access to all repository secrets, so it is unsafe to use them when executing untrusted code. The pr-code-format workflow does not execute any untrusted code, but it passes untrusted input into clang-format. An attacker could use this to exploit a flaw in clang-format and potentially gain access to the repository secrets.

By splitting the workflow, we can use the pull_request event, which is more secure, and isolate the issue write permissions in a separate job. The pull_request event also makes it easier to test changes to the code-format-helper.py script, because the version of the script from the pull request will be used rather than the version of the script from main.

Fixes #77142
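
The hand-off between the two jobs can be pictured as a small JSON file that the unprivileged job writes and the privileged job later reads. The following is a minimal Python sketch of the producing side, not the code from this PR: the helper names `build_comment` and `write_comments` are hypothetical, while the field names `number`, `body`, and `id` and the file name `comments` are taken from the diff in this PR.

```python
import json
import os
import tempfile
from typing import Optional


def build_comment(pr_number: int, body: str, existing_id: Optional[int] = None) -> dict:
    """Describe one PR comment without posting it (hypothetical helper)."""
    comment = {"number": pr_number, "body": body}
    if existing_id is not None:
        # An id asks the privileged job to update that comment instead of
        # creating a new one.
        comment["id"] = existing_id
    return comment


def write_comments(comments: list, path: str = "comments") -> None:
    """Serialize the comments so they can be uploaded as a workflow artifact."""
    with open(path, "w") as f:
        json.dump(comments, f)


if __name__ == "__main__":
    # Write to a temporary directory so the example leaves no files behind.
    out = os.path.join(tempfile.mkdtemp(), "comments")
    write_comments([build_comment(78216, "formatting looks fine")], out)
    with open(out) as f:
        print(f.read())
```

Because the job that runs the formatter only writes a file, it needs no write permission on the repository; only the second job, which never touches the PR's code, holds the token that can post.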

llvmbot · 2024-01-16T00:20:45Z

@llvm/pr-subscribers-github-workflow

Author: Tom Stellard (tstellar)

Changes

Actions triggered by pull_request_target events have access to all repository secrets, so it is unsafe to use them when executing untrusted code. The pr-code-format workflow does not execute any untrusted code, but it passes untrusted input into clang-format. An attacker could use this to exploit a flaw in clang-format and potentially gain access to the repository secrets.

By splitting the workflow, we can use the pull_request event, which is more secure, and isolate the issue write permissions in a separate job. The pull_request event also makes it easier to test changes to the code-format-helper.py script, because the version of the script from the pull request will be used rather than the version of the script from main.

Fixes #77142

Full diff: https://github.com/llvm/llvm-project/pull/78216.diff

3 Files Affected:

(added) .github/workflows/issue-write.yml (+72)
(modified) .github/workflows/pr-code-format.yml (+12-18)
(modified) llvm/utils/git/code-format-helper.py (+27)

diff --git a/.github/workflows/issue-write.yml b/.github/workflows/issue-write.yml
new file mode 100644
index 00000000000000..acc625a3e02a3d
--- /dev/null
+++ b/.github/workflows/issue-write.yml
@@ -0,0 +1,72 @@
+name: Comment on an issue
+
+on:
+  workflow_run:
+    workflows: ["Check code formatting"]
+    types:
+      - completed
+
+permissions:
+  contents: read
+
+jobs:
+  pr-comment:
+    runs-on: ubuntu-latest
+    permissions:
+      pull-requests: write
+    if: >
+      github.event.workflow_run.event == 'pull_request'
+    steps:
+      - name: 'Download artifact'
+        # v7.0.1
+        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea
+        with:
+          script: |
+            let allArtifacts = await github.rest.actions.listWorkflowRunArtifacts({
+               owner: context.repo.owner,
+               repo: context.repo.repo,
+               run_id: context.payload.workflow_run.id,
+            });
+            let matchArtifact = allArtifacts.data.artifacts.filter((artifact) => {
+              return artifact.name == "workflow-args"
+            })[0];
+            let download = await github.rest.actions.downloadArtifact({
+               owner: context.repo.owner,
+               repo: context.repo.repo,
+               artifact_id: matchArtifact.id,
+               archive_format: 'zip',
+            });
+            let fs = require('fs');
+            fs.writeFileSync(`${process.env.GITHUB_WORKSPACE}/workflow-args.zip`, Buffer.from(download.data));
+
+      - run: unzip workflow-args.zip
+
+      - name: 'Comment on PR'
+        uses: actions/github-script@v3
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          script: |
+            var fs = require('fs');
+            const comments = JSON.parse(fs.readFileSync('./comments'));
+            if (!comments) {
+              return;
+            }
+            console.log(comments);
+            await comments.forEach(function (comment) {
+              if (comment.id) {
+                github.issues.updateComment({
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  issue_number: comment.number,
+                  comment_id: comment.id,
+                  body: comment.body
+                });
+              } else {
+                github.issues.createComment({
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  issue_number: comment.number,
+                  body: comment.body
+                });
+              }
+            });
diff --git a/.github/workflows/pr-code-format.yml b/.github/workflows/pr-code-format.yml
index 5223089ee8a93d..1475d872498d49 100644
--- a/.github/workflows/pr-code-format.yml
+++ b/.github/workflows/pr-code-format.yml
@@ -1,7 +1,5 @@
 name: "Check code formatting"
-on: pull_request_target
-permissions:
-  pull-requests: write
+on: pull_request
 
 jobs:
   code_formatter:
@@ -27,18 +25,6 @@ jobs:
           separator: ","
           skip_initial_fetch: true
 
-      # We need to make sure that we aren't executing/using any code from the
-      # PR for security reasons as we're using pull_request_target. Checkout
-      # the target branch with the necessary files.
-      - name: Fetch code formatting utils
-        uses: actions/checkout@v4
-        with:
-          sparse-checkout: |
-            llvm/utils/git/requirements_formatting.txt
-            llvm/utils/git/code-format-helper.py
-          sparse-checkout-cone-mode: false
-          path: code-format-tools
-
       - name: "Listed files"
         env:
           CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
@@ -56,10 +42,10 @@ jobs:
         with:
           python-version: '3.11'
           cache: 'pip'
-          cache-dependency-path: 'code-format-tools/llvm/utils/git/requirements_formatting.txt'
+          cache-dependency-path: 'llvm/utils/git/requirements_formatting.txt'
 
       - name: Install python dependencies
-        run: pip install -r code-format-tools/llvm/utils/git/requirements_formatting.txt
+        run: pip install -r llvm/utils/git/requirements_formatting.txt
 
       - name: Run code formatter
         env:
@@ -72,9 +58,17 @@ jobs:
         # explicitly in code-format-helper.py and not have to diff starting at
         # the merge base.
         run: |
-          python ./code-format-tools/llvm/utils/git/code-format-helper.py \
+          python ./llvm/utils/git/code-format-helper.py \
+            --write-comment-to-file \
             --token ${{ secrets.GITHUB_TOKEN }} \
             --issue-number $GITHUB_PR_NUMBER \
             --start-rev $(git merge-base $START_REV $END_REV) \
             --end-rev $END_REV \
             --changed-files "$CHANGED_FILES"
+
+      - uses: actions/upload-artifact@v2
+        if: always()
+        with:
+          name: workflow-args
+          path: |
+            comments
diff --git a/llvm/utils/git/code-format-helper.py b/llvm/utils/git/code-format-helper.py
index 8a29a57d8d16bd..f96c9da586dfcc 100755
--- a/llvm/utils/git/code-format-helper.py
+++ b/llvm/utils/git/code-format-helper.py
@@ -44,6 +44,7 @@ class FormatArgs:
     token: str = None
     verbose: bool = True
     issue_number: int = 0
+    write_comment_to_file: bool = False
 
     def __init__(self, args: argparse.Namespace = None) -> None:
         if not args is None:
@@ -53,12 +54,14 @@ def __init__(self, args: argparse.Namespace = None) -> None:
             self.token = args.token
             self.changed_files = args.changed_files
             self.issue_number = args.issue_number
+            self.write_comment_to_file = args.write_comment_to_file
 
 
 class FormatHelper:
     COMMENT_TAG = "<!--LLVM CODE FORMAT COMMENT: {fmt}-->"
     name: str
     friendly_name: str
+    comment: dict = None
 
     @property
     def comment_tag(self) -> str:
@@ -119,6 +122,16 @@ def update_pr(self, comment_text: str, args: FormatArgs, create_new: bool) -> No
         comment_text = self.comment_tag + "\n\n" + comment_text
 
         existing_comment = self.find_comment(pr)
+
+        if args.write_comment_to_file:
+            self.comment = {
+                'number' : pr.number,
+                'body' : comment_text
+            }
+            if existing_comment:
+                self.comment['id'] =  existing_comment.id
+            return
+
         if existing_comment:
             existing_comment.edit(comment_text)
         elif create_new:
@@ -309,6 +322,8 @@ def hook_main():
         if fmt.has_tool():
             if not fmt.run(args.changed_files, args):
                 failed_fmts.append(fmt.name)
+            if fmt.comment:
+              comments.append(fmt.comment)
         else:
             print(f"Couldn't find {fmt.name}, can't check " + fmt.friendly_name.lower())
 
@@ -349,6 +364,10 @@ def hook_main():
         type=str,
         help="Comma separated list of files that has been changed",
     )
+    parser.add_argument(
+        "--write-comment-to-file",
+        action='store_true',
+        help="Don't create a comments on the PR, instead write the comments and metadata a file called 'comment'"   )
 
     args = FormatArgs(parser.parse_args())
 
@@ -357,9 +376,17 @@ def hook_main():
         changed_files = args.changed_files.split(",")
 
     failed_formatters = []
+    comments = []
     for fmt in ALL_FORMATTERS:
         if not fmt.run(changed_files, args):
             failed_formatters.append(fmt.name)
+        if fmt.comment:
+            comments.append(fmt.comment)
+    
+    if len(comments):
+        with open('comments', 'w') as f:
+            import json
+            json.dump(comments, f)
 
     if len(failed_formatters) > 0:
         print(f"error: some formatters failed: {' '.join(failed_formatters)}")

github-actions · 2024-01-16T00:22:43Z

✅ With the latest revision this PR passed the Python code formatter.

boomanaiden154

Some comments. Seems to work at least somewhat given the comment on this PR.

Also, is there a way to use the pull_request_target event but without any permissions? That would allow code formatting to still run even if there is a merge conflict and would help alleviate security concerns about untrusted JSON input into the second job.

boomanaiden154 · 2024-01-16T06:36:52Z

.github/workflows/issue-write.yml

+                  repo: context.repo.repo,
+                  issue_number: comment.number,
+                  comment_id: comment.id,
+                  body: comment.body


I'm not sure input validation is that necessary here, but this change means that the JSON artifact sent over should be treated as completely untrusted, since anyone can theoretically modify the workflow and make it send whatever they want. I don't think this should happen much in practice (depending on the settings for whether new contributors need approval for workflow runs), but it theoretically gives anyone the ability to post a comment on any PR/issue under the GitHub moniker.

I was able to figure out how to get the PR number from the event payload, so that will prevent someone from posting a comment on any issue in the project. So now the only untrusted inputs are the comment id and the comment body.

Maybe we should ensure that the provided comment_id was previously created by this script, rather than a random other user's comment?
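
One way to picture the check suggested above is sketched below in Python. It is an illustration under stated assumptions, not code from this PR: the function `validate_comment` and the `existing_bodies` mapping (comment id to current comment body, which the privileged job would have to fetch itself) are hypothetical. The tag prefix mirrors `COMMENT_TAG` in code-format-helper.py. The sketch also requires the new body to start with that tag, which is an additional assumption.

```python
# Prefix of COMMENT_TAG from code-format-helper.py.
COMMENT_TAG_PREFIX = "<!--LLVM CODE FORMAT COMMENT:"


def validate_comment(comment, trusted_pr_number, existing_bodies):
    """Return a sanitized copy of an untrusted comment, or raise ValueError.

    trusted_pr_number comes from the event payload, never from the artifact.
    existing_bodies maps comment id -> body for comments already on the PR
    (hypothetical input, fetched by the privileged job).
    """
    if not isinstance(comment, dict):
        raise ValueError("comment must be a JSON object")

    body = comment.get("body")
    if not isinstance(body, str) or not body.startswith(COMMENT_TAG_PREFIX):
        raise ValueError("body must be a string that starts with the tool's tag")

    # Ignore any PR number in the artifact and use the trusted one.
    sanitized = {"number": trusted_pr_number, "body": body}

    if "id" in comment:
        comment_id = comment["id"]
        if isinstance(comment_id, bool) or not isinstance(comment_id, int):
            raise ValueError("id must be an integer")
        current = existing_bodies.get(comment_id)
        if current is None or not current.startswith(COMMENT_TAG_PREFIX):
            raise ValueError("refusing to edit a comment this script did not create")
        sanitized["id"] = comment_id

    return sanitized
```

Matching on the tag only shows that the existing comment looks like one of the formatter's; a stricter variant could also compare the comment's author against the bot account.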

tru

Thanks for working on this. I like the idea of having a reusable workflow that can write comments to issues and PRs so that future workflows can just "hook-in" to that. Will make it a lot easier when we want to add more in the future.

There are some smaller issues that need to be addressed, and I think it should be tested somewhere first. Also make sure to correctly format the Python code.

tru · 2024-01-16T07:19:45Z

.github/workflows/issue-write.yml

+          script: |
+            var fs = require('fs');
+            const comments = JSON.parse(fs.readFileSync('./comments'));
+            if (!comments) {


Should we handle errors from reading the JSON here and print the problems, so that we can debug them later if something goes wrong?

I added a step to always dump the comment file (pass or fail) at the end of the job.
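
The error handling asked about above could look roughly like the following. This is a Python analogue for illustration only (the step under review is JavaScript run by actions/github-script), and the function name `load_comments` is hypothetical. It treats a missing file, malformed JSON, and a non-list payload as "nothing to post" and logs the reason.

```python
import json
import sys


def load_comments(path="comments"):
    """Load the list of comments; return [] and log the reason on any problem."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError:
        print(f"no comments file at {path}; nothing to post", file=sys.stderr)
        return []
    except json.JSONDecodeError as err:
        print(f"could not parse {path}: {err}", file=sys.stderr)
        return []

    if not isinstance(data, list):
        print(f"expected a JSON list in {path}, got {type(data).__name__}",
              file=sys.stderr)
        return []
    return data
```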

github-actions · 2024-01-20T06:30:32Z

✅ With the latest revision this PR passed the C/C++ code formatter.

Actions triggered by pull_request_target events have access to all repository secrets, so it is unsafe to use them when executing untrusted code. The pr-code-format workflow does not execute any untrusted code, but it passes untrusted input into clang-format. An attacker could use this to exploit a flaw in clang-format and potentially gain access to the repository secrets.

By splitting the workflow, we can use the pull_request event, which is more secure, and isolate the issue write permissions in a separate job. The pull_request event also makes it easier to test changes to the code-format-helper.py script, because the version of the script from the pull request will be used rather than the version of the script from main.

Fixes llvm#77142

tstellar · 2024-01-20T06:41:16Z

> Some comments. Seems to work at least somewhat given the comment on this PR.

The comments on this PR are coming from the old version of the code format job that is currently in the main branch.

boomanaiden154

This LGTM. Since we're checking out the same ref as before (github.event.pull_request.head.sha), there shouldn't be any new issues. Smoke tests on that would be good though (as well as in general, to test that the whole setup works as expected).

There is somewhat of a security hole, as mentioned before, where someone could modify the workflow to produce whatever comment text/id they want. This should be caught before someone hits approve on the workflow, though, and if someone does get past that, given the current mitigations they can only post a comment on the PR, which I don't think is that big of a deal.

It would be good if we could get this landed. I think switching to pull_request and running code formatting over the merge commit would fix #79661, but that fix requires landing this first.

nickdesaulniers · 2024-02-02T19:37:20Z

Hi @tstellar, this is causing unrelated presubmit style check failures.

Example: https://github.com/llvm/llvm-project/actions/runs/7760656533/job/21167382114?pr=80344

Run python ./llvm/utils/git/code-format-helper.py \
usage: code-format-helper.py [-h] --token TOKEN [--repo REPO] --issue-number
                             ISSUE_NUMBER --start-rev START_REV --end-rev
                             END_REV [--changed-files CHANGED_FILES]
code-format-helper.py: error: unrecognized arguments: --write-comment-to-file
Error: Process completed with exit code 2.

tstellar · 2024-02-02T19:53:29Z

@nickdesaulniers Yes, I noticed that. I have a fix here: #80483

You could also fix it in the PR by merging the latest main branch into your PR branch.

…re secure (#78216)" This reverts commit bc06cd5. This caused the job to fail for PRs which still had an older version of code-format-helper.py in their tree.
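
The failure mode described here follows from how argparse treats unknown flags: a parser that does not define an option exits with status 2 when it sees it, which matches the log quoted above. The Python sketch below reproduces this with two stand-in parsers. The helper `make_parser` and its `supports_file_output` switch are hypothetical; only the flag names come from this PR.

```python
import argparse


def make_parser(supports_file_output: bool) -> argparse.ArgumentParser:
    """Build a stand-in for the script's parser (hypothetical helper).

    supports_file_output=False models an older copy of the script that
    predates the --write-comment-to-file flag.
    """
    parser = argparse.ArgumentParser(prog="code-format-helper.py")
    parser.add_argument("--issue-number", type=int, required=True)
    if supports_file_output:
        parser.add_argument("--write-comment-to-file", action="store_true")
    return parser


# The arguments the updated workflow passes, reduced to the relevant ones.
ARGV = ["--issue-number", "1", "--write-comment-to-file"]


def exit_code_for(parser: argparse.ArgumentParser) -> int:
    """Return 0 if ARGV parses, otherwise the exit status argparse requested."""
    try:
        parser.parse_args(ARGV)
    except SystemExit as exc:
        return exc.code
    return 0


if __name__ == "__main__":
    print("new script:", exit_code_for(make_parser(True)))
    print("old script:", exit_code_for(make_parser(False)))
```

A PR branch that has not merged the latest main still carries the older script, so the updated workflow invokes it with a flag it does not know.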

tstellar · 2024-02-02T20:05:44Z

I've reverted this now while I investigate.

jyknight · 2024-02-02T21:08:01Z

.github/workflows/issue-write.yml

+    if: >
+      github.event.workflow_run.event == 'pull_request'
+    steps:
+      - name: 'Download artifact'


I think if you use the newer upload-artifact and download-artifact actions, then you don't need to deal with all the listing/downloading/unzip code: they'll do it for you based on just a name and run_id.

tstellar · 2024-02-03T17:59:41Z

I have an updated version of this patch in #80495.

tstellar requested review from tru and boomanaiden154 January 16, 2024 00:20

llvmbot added the github:workflow label Jan 16, 2024

boomanaiden154 reviewed Jan 16, 2024

boomanaiden154 reviewed Jan 16, 2024

tru reviewed Jan 16, 2024

tstellar added 2 commits January 19, 2024 22:36

Dump comments file to make debugging easier

cd37e90

tstellar force-pushed the code-format-split branch from 2ccc828 to cd37e90 Compare January 20, 2024 06:37

tstellar added 4 commits January 19, 2024 22:43

Fix python formatting

8ab4b7d

Fetch PR number through API

1b98494

Fix python formatting

c375d8f

Fix python format

6bd48ed

boomanaiden154 approved these changes Feb 1, 2024

tstellar merged commit bc06cd5 into llvm:main Feb 2, 2024
4 of 5 checks passed

jyknight reviewed Feb 2, 2024
