[workflows] Split pr-code-format into two parts to make it more secure #78216
@llvm/pr-subscribers-github-workflow
Author: Tom Stellard (tstellar)
Changes
Actions triggered by pull_request_target events have access to all repository secrets, so it is unsafe to use them when executing untrusted code. The pr-code-format workflow does not execute any untrusted code, but it passes untrusted input into clang-format. An attacker could use this to exploit a flaw in clang-format and potentially gain access to the repository secrets.
By splitting the workflow, we can use the pull_request event, which is more secure, and isolate the issue write permissions in a separate job. The pull_request event also makes it easier to test changes to the code-format-helper.py script, because the version of the script from the pull request will be used rather than the version of the script from main.
Fixes #77142
Full diff: https://github.com/llvm/llvm-project/pull/78216.diff
3 Files Affected:
diff --git a/.github/workflows/issue-write.yml b/.github/workflows/issue-write.yml
new file mode 100644
index 00000000000000..acc625a3e02a3d
--- /dev/null
+++ b/.github/workflows/issue-write.yml
@@ -0,0 +1,72 @@
+name: Comment on an issue
+
+on:
+ workflow_run:
+ workflows: ["Check code formatting"]
+ types:
+ - completed
+
+permissions:
+ contents: read
+
+jobs:
+ pr-comment:
+ runs-on: ubuntu-latest
+ permissions:
+ pull-requests: write
+ if: >
+ github.event.workflow_run.event == 'pull_request'
+ steps:
+ - name: 'Download artifact'
+ # v7.0.1
+ uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea
+ with:
+ script: |
+ let allArtifacts = await github.rest.actions.listWorkflowRunArtifacts({
+ owner: context.repo.owner,
+ repo: context.repo.repo,
+ run_id: context.payload.workflow_run.id,
+ });
+ let matchArtifact = allArtifacts.data.artifacts.filter((artifact) => {
+ return artifact.name == "workflow-args"
+ })[0];
+ let download = await github.rest.actions.downloadArtifact({
+ owner: context.repo.owner,
+ repo: context.repo.repo,
+ artifact_id: matchArtifact.id,
+ archive_format: 'zip',
+ });
+ let fs = require('fs');
+ fs.writeFileSync(`${process.env.GITHUB_WORKSPACE}/workflow-args.zip`, Buffer.from(download.data));
+
+ - run: unzip workflow-args.zip
+
+ - name: 'Comment on PR'
+ uses: actions/github-script@v3
+ with:
+ github-token: ${{ secrets.GITHUB_TOKEN }}
+ script: |
+ var fs = require('fs');
+ const comments = JSON.parse(fs.readFileSync('./comments'));
+ if (!comments) {
+ return;
+ }
+ console.log(comments);
+ await comments.forEach(function (comment) {
+ if (comment.id) {
+ github.issues.updateComment({
+ owner: context.repo.owner,
+ repo: context.repo.repo,
+ issue_number: comment.number,
+ comment_id: comment.id,
+ body: comment.body
+ });
+ } else {
+ github.issues.createComment({
+ owner: context.repo.owner,
+ repo: context.repo.repo,
+ issue_number: comment.number,
+ body: comment.body
+ });
+ }
+ });
diff --git a/.github/workflows/pr-code-format.yml b/.github/workflows/pr-code-format.yml
index 5223089ee8a93d..1475d872498d49 100644
--- a/.github/workflows/pr-code-format.yml
+++ b/.github/workflows/pr-code-format.yml
@@ -1,7 +1,5 @@
name: "Check code formatting"
-on: pull_request_target
-permissions:
- pull-requests: write
+on: pull_request
jobs:
code_formatter:
@@ -27,18 +25,6 @@ jobs:
separator: ","
skip_initial_fetch: true
- # We need to make sure that we aren't executing/using any code from the
- # PR for security reasons as we're using pull_request_target. Checkout
- # the target branch with the necessary files.
- - name: Fetch code formatting utils
- uses: actions/checkout@v4
- with:
- sparse-checkout: |
- llvm/utils/git/requirements_formatting.txt
- llvm/utils/git/code-format-helper.py
- sparse-checkout-cone-mode: false
- path: code-format-tools
-
- name: "Listed files"
env:
CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
@@ -56,10 +42,10 @@ jobs:
with:
python-version: '3.11'
cache: 'pip'
- cache-dependency-path: 'code-format-tools/llvm/utils/git/requirements_formatting.txt'
+ cache-dependency-path: 'llvm/utils/git/requirements_formatting.txt'
- name: Install python dependencies
- run: pip install -r code-format-tools/llvm/utils/git/requirements_formatting.txt
+ run: pip install -r llvm/utils/git/requirements_formatting.txt
- name: Run code formatter
env:
@@ -72,9 +58,17 @@ jobs:
# explicitly in code-format-helper.py and not have to diff starting at
# the merge base.
run: |
- python ./code-format-tools/llvm/utils/git/code-format-helper.py \
+ python ./llvm/utils/git/code-format-helper.py \
+ --write-comment-to-file \
--token ${{ secrets.GITHUB_TOKEN }} \
--issue-number $GITHUB_PR_NUMBER \
--start-rev $(git merge-base $START_REV $END_REV) \
--end-rev $END_REV \
--changed-files "$CHANGED_FILES"
+
+ - uses: actions/upload-artifact@v2
+ if: always()
+ with:
+ name: workflow-args
+ path: |
+ comments
diff --git a/llvm/utils/git/code-format-helper.py b/llvm/utils/git/code-format-helper.py
index 8a29a57d8d16bd..f96c9da586dfcc 100755
--- a/llvm/utils/git/code-format-helper.py
+++ b/llvm/utils/git/code-format-helper.py
@@ -44,6 +44,7 @@ class FormatArgs:
token: str = None
verbose: bool = True
issue_number: int = 0
+ write_comment_to_file: bool = False
def __init__(self, args: argparse.Namespace = None) -> None:
if not args is None:
@@ -53,12 +54,14 @@ def __init__(self, args: argparse.Namespace = None) -> None:
self.token = args.token
self.changed_files = args.changed_files
self.issue_number = args.issue_number
+ self.write_comment_to_file = args.write_comment_to_file
class FormatHelper:
COMMENT_TAG = "<!--LLVM CODE FORMAT COMMENT: {fmt}-->"
name: str
friendly_name: str
+ comment: dict = None
@property
def comment_tag(self) -> str:
@@ -119,6 +122,16 @@ def update_pr(self, comment_text: str, args: FormatArgs, create_new: bool) -> No
comment_text = self.comment_tag + "\n\n" + comment_text
existing_comment = self.find_comment(pr)
+
+ if args.write_comment_to_file:
+ self.comment = {
+ 'number' : pr.number,
+ 'body' : comment_text
+ }
+ if existing_comment:
+ self.comment['id'] = existing_comment.id
+ return
+
if existing_comment:
existing_comment.edit(comment_text)
elif create_new:
@@ -309,6 +322,8 @@ def hook_main():
if fmt.has_tool():
if not fmt.run(args.changed_files, args):
failed_fmts.append(fmt.name)
+ if fmt.comment:
+ comments.append(fmt.comment)
else:
print(f"Couldn't find {fmt.name}, can't check " + fmt.friendly_name.lower())
@@ -349,6 +364,10 @@ def hook_main():
type=str,
help="Comma separated list of files that has been changed",
)
+ parser.add_argument(
+ "--write-comment-to-file",
+ action='store_true',
+ help="Don't create a comments on the PR, instead write the comments and metadata a file called 'comment'" )
args = FormatArgs(parser.parse_args())
@@ -357,9 +376,17 @@ def hook_main():
changed_files = args.changed_files.split(",")
failed_formatters = []
+ comments = []
for fmt in ALL_FORMATTERS:
if not fmt.run(changed_files, args):
failed_formatters.append(fmt.name)
+ if fmt.comment:
+ comments.append(fmt.comment)
+
+ if len(comments):
+ with open('comments', 'w') as f:
+ import json
+ json.dump(comments, f)
if len(failed_formatters) > 0:
print(f"error: some formatters failed: {' '.join(failed_formatters)}")
✅ With the latest revision this PR passed the Python code formatter.
Some comments. Seems to work at least somewhat given the comment on this PR.
Also, is there a way to use the pull_request_target event but without any permissions? That would allow code formatting to still run even if there is a merge conflict and would help alleviate security concerns about untrusted JSON input into the second job.
repo: context.repo.repo,
issue_number: comment.number,
comment_id: comment.id,
body: comment.body
I'm not sure input validation is that necessary here, but this change means that the JSON artifact sent over should be treated as completely untrusted, as anyone can theoretically modify the workflow and make it send whatever they want. I don't think this should happen much in practice (depending upon the settings for whether or not new contributors need approval for workflow runs), but it theoretically gives anyone the ability to post a comment on any PR/Issue under the GitHub moniker.
I was able to figure out how to get the PR number from the event payload, so that will prevent someone from posting a comment on any issue in the project. So now the only untrusted inputs are the comment id and the comment body.
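A minimal sketch of the mitigation described above, written in Python purely for illustration (the workflow step itself is JavaScript run through actions/github-script). The function name `pin_issue_number` and the sample values are hypothetical; the only idea taken from the thread is that the issue number comes from the trusted event payload rather than from the artifact.

```python
def pin_issue_number(comments, trusted_pr_number):
    """Return copies of the artifact's comment records with 'number' forced
    to the PR number taken from the trusted event payload."""
    pinned = []
    for comment in comments:
        record = dict(comment)  # copy so the caller's list is not mutated
        record["number"] = trusted_pr_number  # ignore whatever the artifact claimed
        pinned.append(record)
    return pinned


# Hypothetical example: the artifact claims issue 1, but the event payload
# says this workflow run belongs to PR 78216.
untrusted = [{"number": 1, "body": "formatter output", "id": 42}]
print(pin_issue_number(untrusted, 78216))
```

With this shape, the comment id and body remain attacker-controlled, which matches the remaining exposure noted in the comment above.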
Maybe we should ensure that the provided comment_id was previously created by this script, rather than a random other user's comment?
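One way such a check could look, sketched in Python under stated assumptions. The marker prefix is taken from `COMMENT_TAG` in code-format-helper.py; the function name `may_update` and the default bot login `github-actions[bot]` are assumptions for illustration, not something this PR implements. A marker check alone would not be enough, since any user can paste the marker into their own comment, so the sketch also compares the comment author.

```python
# Prefix of COMMENT_TAG from code-format-helper.py; the formatter-specific
# suffix is deliberately not matched here.
TAG_PREFIX = "<!--LLVM CODE FORMAT COMMENT:"


def may_update(existing_body, author_login, bot_login="github-actions[bot]"):
    """True only if the existing comment was written by the bot account
    (assumed login) and starts with the code-format marker."""
    return author_login == bot_login and existing_body.startswith(TAG_PREFIX)


# A comment from another user must not be overwritten, even if it copies the marker.
print(may_update("<!--LLVM CODE FORMAT COMMENT: x-->\n\ntext", "someone-else"))
```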
Thanks for working on this. I like the idea of having a reusable workflow that can write comments to issues and PRs so that future workflows can just "hook-in" to that. Will make it a lot easier when we want to add more in the future.
There are some smaller issues that need to be addressed, and I think it should be tested somewhere first. Also make sure to correctly format the Python code.
script: |
var fs = require('fs');
const comments = JSON.parse(fs.readFileSync('./comments'));
if (!comments) {
Should we handle errors from reading the JSON here and print the problems, so that we can debug them later if something goes wrong?
I added a step to always dump the comment file (pass or fail) at the end of the job.
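For comparison, explicit error handling of the kind asked about could be sketched like this in Python. This is an illustration only: the workflow step is JavaScript, and the function name `load_comments` is hypothetical. The file name `comments` and the expectation of a JSON list come from the diff.

```python
import json
import sys


def load_comments(path="comments"):
    """Load the list of comment records, reporting problems on stderr
    instead of raising, so the job log shows what went wrong."""
    try:
        with open(path) as f:
            comments = json.load(f)
    except FileNotFoundError:
        print(f"no file at {path}; nothing to post", file=sys.stderr)
        return []
    except json.JSONDecodeError as err:
        print(f"could not parse {path}: {err}", file=sys.stderr)
        return []
    if not isinstance(comments, list):
        print(f"expected a JSON list in {path}, got {type(comments).__name__}",
              file=sys.stderr)
        return []
    return comments
```

Returning an empty list on every failure keeps the calling step simple: it can loop over the result without a separate error path.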
✅ With the latest revision this PR passed the C/C++ code formatter.
2ccc828 to cd37e90
The comments on this PR are coming from the old version of the code format job that is currently in the main branch.
This LGTM. Since we're checking out the same ref as before (github.event.pull_request.head.sha), there shouldn't be any new issues. Smoke tests on that would be good though (as well as in general to test that the whole setup works as expected).
There is somewhat of a security hole as mentioned before where someone could modify the workflow to produce whatever comment text/id that they want, but this should be caught before someone hits approve on the workflow and if someone does get by that, given the current mitigations, they can only post a comment on the PR which I don't think is that big of a deal.
It would be good if we could get this landed. I think switching to pull_request and running code formatting over the merge commit would fix #79661, but that fix requires landing this first.
Hi @tstellar this is causing unrelated presubmit style check failures. Example: https://github.com/llvm/llvm-project/actions/runs/7760656533/job/21167382114?pr=80344
@nickdesaulniers Yes, I noticed that. I have a fix here: #80483. You could also fix it in the PR by merging the latest main branch into your PR branch.
I've reverted this now while I investigate.
if: >
github.event.workflow_run.event == 'pull_request'
steps:
- name: 'Download artifact'
I think if you use the newer upload-artifact and download-artifact actions, then you don't need to deal with all the listing/downloading/unzip code: they'll do it for you based on just a name and run_id.
I have an updated version of this patch in #80495.
…re secure (llvm#78216)" This reverts commit bc06cd5. This caused the job to fail for PRs which still had an older version of code-format-helper.py in their tree.