Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[workflows] Split pr-code-format into two parts to make it more secure #78216

Merged
merged 6 commits into from
Feb 2, 2024

Conversation

tstellar
Copy link
Collaborator

Actions triggered by pull_request_target events have access to all repository secrets, so it is unsafe to use them when executing untrusted code. The pr-code-format workflow does not execute any untrusted code, but it passes untrused input into clang-format. An attacker could use this to exploit a flaw in clang-format and potentially gain access to the repository secrets.

By splitting the workflow, we can use the pull_request target which is more secure and isolate the issue write permissions in a separate job. The pull_request target also makes it easier to test changes to the code-format-helepr.py script, because the version of the script from the pull request will be used rather than the version of the script from main.

Fixes #77142

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 16, 2024

@llvm/pr-subscribers-github-workflow

Author: Tom Stellard (tstellar)

Changes

Actions triggered by pull_request_target events have access to all repository secrets, so it is unsafe to use them when executing untrusted code. The pr-code-format workflow does not execute any untrusted code, but it passes untrused input into clang-format. An attacker could use this to exploit a flaw in clang-format and potentially gain access to the repository secrets.

By splitting the workflow, we can use the pull_request target which is more secure and isolate the issue write permissions in a separate job. The pull_request target also makes it easier to test changes to the code-format-helepr.py script, because the version of the script from the pull request will be used rather than the version of the script from main.

Fixes #77142


Full diff: https://github.com/llvm/llvm-project/pull/78216.diff

3 Files Affected:

  • (added) .github/workflows/issue-write.yml (+72)
  • (modified) .github/workflows/pr-code-format.yml (+12-18)
  • (modified) llvm/utils/git/code-format-helper.py (+27)
diff --git a/.github/workflows/issue-write.yml b/.github/workflows/issue-write.yml
new file mode 100644
index 00000000000000..acc625a3e02a3d
--- /dev/null
+++ b/.github/workflows/issue-write.yml
@@ -0,0 +1,72 @@
+name: Comment on an issue
+
+on:
+  workflow_run:
+    workflows: ["Check code formatting"]
+    types:
+      - completed
+
+permissions:
+  contents: read
+
+jobs:
+  pr-comment:
+    runs-on: ubuntu-latest
+    permissions:
+      pull-requests: write
+    if: >
+      github.event.workflow_run.event == 'pull_request'
+    steps:
+      - name: 'Download artifact'
+        # v7.0.1
+        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea
+        with:
+          script: |
+            let allArtifacts = await github.rest.actions.listWorkflowRunArtifacts({
+               owner: context.repo.owner,
+               repo: context.repo.repo,
+               run_id: context.payload.workflow_run.id,
+            });
+            let matchArtifact = allArtifacts.data.artifacts.filter((artifact) => {
+              return artifact.name == "workflow-args"
+            })[0];
+            let download = await github.rest.actions.downloadArtifact({
+               owner: context.repo.owner,
+               repo: context.repo.repo,
+               artifact_id: matchArtifact.id,
+               archive_format: 'zip',
+            });
+            let fs = require('fs');
+            fs.writeFileSync(`${process.env.GITHUB_WORKSPACE}/workflow-args.zip`, Buffer.from(download.data));
+
+      - run: unzip workflow-args.zip
+
+      - name: 'Comment on PR'
+        uses: actions/github-script@v3
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          script: |
+            var fs = require('fs');
+            const comments = JSON.parse(fs.readFileSync('./comments'));
+            if (!comments) {
+              return;
+            }
+            console.log(comments);
+            await comments.forEach(function (comment) {
+              if (comment.id) {
+                github.issues.updateComment({
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  issue_number: comment.number,
+                  comment_id: comment.id,
+                  body: comment.body
+                });
+              } else {
+                github.issues.createComment({
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  issue_number: comment.number,
+                  body: comment.body
+                });
+              }
+            });
diff --git a/.github/workflows/pr-code-format.yml b/.github/workflows/pr-code-format.yml
index 5223089ee8a93d..1475d872498d49 100644
--- a/.github/workflows/pr-code-format.yml
+++ b/.github/workflows/pr-code-format.yml
@@ -1,7 +1,5 @@
 name: "Check code formatting"
-on: pull_request_target
-permissions:
-  pull-requests: write
+on: pull_request
 
 jobs:
   code_formatter:
@@ -27,18 +25,6 @@ jobs:
           separator: ","
           skip_initial_fetch: true
 
-      # We need to make sure that we aren't executing/using any code from the
-      # PR for security reasons as we're using pull_request_target. Checkout
-      # the target branch with the necessary files.
-      - name: Fetch code formatting utils
-        uses: actions/checkout@v4
-        with:
-          sparse-checkout: |
-            llvm/utils/git/requirements_formatting.txt
-            llvm/utils/git/code-format-helper.py
-          sparse-checkout-cone-mode: false
-          path: code-format-tools
-
       - name: "Listed files"
         env:
           CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
@@ -56,10 +42,10 @@ jobs:
         with:
           python-version: '3.11'
           cache: 'pip'
-          cache-dependency-path: 'code-format-tools/llvm/utils/git/requirements_formatting.txt'
+          cache-dependency-path: 'llvm/utils/git/requirements_formatting.txt'
 
       - name: Install python dependencies
-        run: pip install -r code-format-tools/llvm/utils/git/requirements_formatting.txt
+        run: pip install -r llvm/utils/git/requirements_formatting.txt
 
       - name: Run code formatter
         env:
@@ -72,9 +58,17 @@ jobs:
         # explicitly in code-format-helper.py and not have to diff starting at
         # the merge base.
         run: |
-          python ./code-format-tools/llvm/utils/git/code-format-helper.py \
+          python ./llvm/utils/git/code-format-helper.py \
+            --write-comment-to-file \
             --token ${{ secrets.GITHUB_TOKEN }} \
             --issue-number $GITHUB_PR_NUMBER \
             --start-rev $(git merge-base $START_REV $END_REV) \
             --end-rev $END_REV \
             --changed-files "$CHANGED_FILES"
+
+      - uses: actions/upload-artifact@v2
+        if: always()
+        with:
+          name: workflow-args
+          path: |
+            comments
diff --git a/llvm/utils/git/code-format-helper.py b/llvm/utils/git/code-format-helper.py
index 8a29a57d8d16bd..f96c9da586dfcc 100755
--- a/llvm/utils/git/code-format-helper.py
+++ b/llvm/utils/git/code-format-helper.py
@@ -44,6 +44,7 @@ class FormatArgs:
     token: str = None
     verbose: bool = True
     issue_number: int = 0
+    write_comment_to_file: bool = False
 
     def __init__(self, args: argparse.Namespace = None) -> None:
         if not args is None:
@@ -53,12 +54,14 @@ def __init__(self, args: argparse.Namespace = None) -> None:
             self.token = args.token
             self.changed_files = args.changed_files
             self.issue_number = args.issue_number
+            self.write_comment_to_file = args.write_comment_to_file
 
 
 class FormatHelper:
     COMMENT_TAG = "<!--LLVM CODE FORMAT COMMENT: {fmt}-->"
     name: str
     friendly_name: str
+    comment: dict = None
 
     @property
     def comment_tag(self) -> str:
@@ -119,6 +122,16 @@ def update_pr(self, comment_text: str, args: FormatArgs, create_new: bool) -> No
         comment_text = self.comment_tag + "\n\n" + comment_text
 
         existing_comment = self.find_comment(pr)
+
+        if args.write_comment_to_file:
+            self.comment = {
+                'number' : pr.number,
+                'body' : comment_text
+            }
+            if existing_comment:
+                self.comment['id'] =  existing_comment.id
+            return
+
         if existing_comment:
             existing_comment.edit(comment_text)
         elif create_new:
@@ -309,6 +322,8 @@ def hook_main():
         if fmt.has_tool():
             if not fmt.run(args.changed_files, args):
                 failed_fmts.append(fmt.name)
+            if fmt.comment:
+              comments.append(fmt.comment)
         else:
             print(f"Couldn't find {fmt.name}, can't check " + fmt.friendly_name.lower())
 
@@ -349,6 +364,10 @@ def hook_main():
         type=str,
         help="Comma separated list of files that has been changed",
     )
+    parser.add_argument(
+        "--write-comment-to-file",
+        action='store_true',
+        help="Don't create a comments on the PR, instead write the comments and metadata a file called 'comment'"   )
 
     args = FormatArgs(parser.parse_args())
 
@@ -357,9 +376,17 @@ def hook_main():
         changed_files = args.changed_files.split(",")
 
     failed_formatters = []
+    comments = []
     for fmt in ALL_FORMATTERS:
         if not fmt.run(changed_files, args):
             failed_formatters.append(fmt.name)
+        if fmt.comment:
+            comments.append(fmt.comment)
+    
+    if len(comments):
+        with open('comments', 'w') as f:
+            import json
+            json.dump(comments, f)
 
     if len(failed_formatters) > 0:
         print(f"error: some formatters failed: {' '.join(failed_formatters)}")

Copy link

github-actions bot commented Jan 16, 2024

✅ With the latest revision this PR passed the Python code formatter.

Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. Seems to work at least somewhat given the comment on this PR.

Also, is there a way to use the pull_request_target event but without any permissions? That would allow code formatting to still run even if there is a merge conflict and would help alleviate security concerns about untrusted JSON input into the second job.

repo: context.repo.repo,
issue_number: comment.number,
comment_id: comment.id,
body: comment.body
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure input validation is that necessary here, but this change means that the JSON artifact sent over should be completely untrusted as anyone can theoretically modify the workflow and make it send whatever. I don't think this should happen much in practice (depending upon the settings for whether or not new contributors need approval for workflow runs), but it theoretically gives the ability for anyone to post a comment on any PR/Issue under the Github moniker.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to figure out how to get the PR number from the event payload, so that will prevent someone from posting a comment on any issue in the project. So now the only untrusted inputs are the comment id and the comment body.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should ensure that the provided commit_id was previously created by this script, rather than a random other user's comment?

.github/workflows/pr-code-format.yml Show resolved Hide resolved
llvm/utils/git/code-format-helper.py Show resolved Hide resolved
Copy link
Collaborator

@tru tru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. I like the idea of having a reusable workflow that can write comments to issues and PRs so that future workflows can just "hook-in" to that. Will make it a lot easier when we want to add more in the future.

There are some smaller issues that needs to be addressed, and I think it should be tested somewhere first. Also make sure to correctly format the python code.

.github/workflows/pr-code-format.yml Show resolved Hide resolved
script: |
var fs = require('fs');
const comments = JSON.parse(fs.readFileSync('./comments'));
if (!comments) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we handle errors from reading the json here and print the problems? so that we can debug that later if something goes wrong?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a step to always dump the comment file (pass or fail) at the end of the job.

llvm/utils/git/code-format-helper.py Show resolved Hide resolved
Copy link

github-actions bot commented Jan 20, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Actions triggered by pull_request_target events have access to all
repository secrets, so it is unsafe to use them when executing
untrusted code.  The pr-code-format workflow does not execute
any untrusted code, but it passes untrused input into clang-format.
An attacker could use this to exploit a flaw in clang-format and
potentially gain access to the repository secrets.

By splitting the workflow, we can use the pull_request target which
is more secure and isolate the issue write permissions in a separate
job.  The pull_request target also makes it easier to test changes
to the code-format-helepr.py script, because the version of the
script from the pull request will be used rather than the
version of the script from main.

Fixes llvm#77142
@tstellar
Copy link
Collaborator Author

Some comments. Seems to work at least somewhat given the comment on this PR.

The comments on this PR are coming from the old version of the code format job that is currently in the main branch.

Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. Since we're checking out the same ref as before github.event.pull_request.head.sha, there shouldn't be any new issues. Smoke tests on that would be good though (as well as in general to test that the whole setup works as expected).

There is somewhat of a security hole as mentioned before where someone could modify the workflow to produce whatever comment text/id that they want, but this should be caught before someone hits approve on the workflow and if someone does get by that, given the current mitigations, they can only post a comment on the PR which I don't think is that big of a deal.

It would be good if we could get this landed. I think switching to pull_request and running code formatting over the merge commit would fix #79661, but that fix requires landing this first.

@tstellar tstellar merged commit bc06cd5 into llvm:main Feb 2, 2024
4 of 5 checks passed
@nickdesaulniers
Copy link
Member

nickdesaulniers commented Feb 2, 2024

Hi @tstellar this is causing unrelated presubmit style check failures.

Example: https://github.com/llvm/llvm-project/actions/runs/7760656533/job/21167382114?pr=80344

Run python ./llvm/utils/git/code-format-helper.py \
usage: code-format-helper.py [-h] --token TOKEN [--repo REPO] --issue-number
                             ISSUE_NUMBER --start-rev START_REV --end-rev
                             END_REV [--changed-files CHANGED_FILES]
code-format-helper.py: error: unrecognized arguments: --write-comment-to-file
Error: Process completed with exit code 2.

@tstellar
Copy link
Collaborator Author

tstellar commented Feb 2, 2024

@nickdesaulniers Yes, I noticed that. I have a fix here: #80483

You could also fix it in the PR by merging the latest main branch into your PR branch.

tstellar added a commit that referenced this pull request Feb 2, 2024
…re secure (#78216)"

This reverts commit bc06cd5.

This caused the job to fail for PRs which still had an older version
of code-format-helper.py in their tree.
@tstellar
Copy link
Collaborator Author

tstellar commented Feb 2, 2024

I've reverted this now while I investigate.

if: >
github.event.workflow_run.event == 'pull_request'
steps:
- name: 'Download artifact'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think i you use the newer upload-artifacts and download-artifacts packages, then you don't need to deal with all the listing/downloading/unzip code -- it'll do it for you just based on a name and run_id.

@tstellar
Copy link
Collaborator Author

tstellar commented Feb 3, 2024

I have an updated version of this patch in #80495.

agozillon pushed a commit to agozillon/llvm-project that referenced this pull request Feb 5, 2024
llvm#78216)

Actions triggered by pull_request_target events have access to all
repository secrets, so it is unsafe to use them when executing untrusted
code. The pr-code-format workflow does not execute any untrusted code,
but it passes untrused input into clang-format. An attacker could use
this to exploit a flaw in clang-format and potentially gain access to
the repository secrets.

By splitting the workflow, we can use the pull_request target which is
more secure and isolate the issue write permissions in a separate job.
The pull_request target also makes it easier to test changes to the
code-format-helepr.py script, because the version of the script from the
pull request will be used rather than the version of the script from
main.

Fixes llvm#77142
agozillon pushed a commit to agozillon/llvm-project that referenced this pull request Feb 5, 2024
…re secure (llvm#78216)"

This reverts commit bc06cd5.

This caused the job to fail for PRs which still had an older version
of code-format-helper.py in their tree.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Split pr-code-format.yml into separate untrusted+trusted workflows
6 participants