Skip to content

Reland pr code format updates #71131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

boomanaiden154
Copy link
Contributor

This PR relands #69766 after it caused failures due to being unable to fetch branches from forks. I was using the wrong ref, so it tried to pull the same branch name from llvm/llvm-project rather than trying to pull the branch from the fork that the PR was created from. This is also why it didn't show up in my testing on my fork as I was only testing PRs on branches from within the same repository.

…lvm#69766)"

This commit relands 4aa12af. This was
originally reverted as it caused fetch failures due to the usage of the
wrong ref. I didn't catch this in testing as it was attempting to check
the branch out from origin, where it didn't exist, but did on my fork as
I never tested any cross-repo PRs.
@llvmbot
Copy link
Member

llvmbot commented Nov 3, 2023

@llvm/pr-subscribers-mlgo

@llvm/pr-subscribers-github-workflow

Author: Aiden Grossman (boomanaiden154)

Changes

This PR relands #69766 after it caused failures due to being unable to fetch branches from forks. I was using the wrong ref, so it tried to pull the same branch name from llvm/llvm-project rather than trying to pull the branch from the fork that the PR was created from. This is also why it didn't show up in my testing on my fork as I was only testing PRs on branches from within the same repository.


Full diff: https://github.com/llvm/llvm-project/pull/71131.diff

1 Files Affected:

  • (modified) .github/workflows/pr-code-format.yml (+40-13)
diff --git a/.github/workflows/pr-code-format.yml b/.github/workflows/pr-code-format.yml
index fa0e69211b14788..555081471a7ef7b 100644
--- a/.github/workflows/pr-code-format.yml
+++ b/.github/workflows/pr-code-format.yml
@@ -7,17 +7,44 @@ jobs:
   code_formatter:
     runs-on: ubuntu-latest
     steps:
-      - name: Fetch LLVM sources
-        uses: actions/checkout@v4
-        with:
-          fetch-depth: 2 # Fetches only the last 2 commits
-
+      # Get changed files before checking out the repository to force the action
+      # to analyze the diff from the Github API rather than looking at the
+      # shallow clone and erroring out, which is significantly more prone to
+      # failure.
       - name: Get changed files
         id: changed-files
         uses: tj-actions/changed-files@v39
         with:
           separator: ","
-          fetch_depth: 2000 # Fetches only the last 2000 commits
+
+      - name: Calculate number of commits to fetch
+        run: echo "PR_FETCH_DEPTH=$(( ${{ github.event.pull_request.commits }} + 1 ))" >> "${GITHUB_ENV}"
+
+      - name: Fetch PR sources
+        uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.pull_request.head.sha }}
+          fetch-depth: ${{ env.PR_FETCH_DEPTH }}
+          path: pr-sources
+
+      - name: Print commits being evaluated
+        working-directory: ./pr-sources
+        env:
+          PR_DEPTH: ${{ github.event.pull_request.commits }}
+        run: |
+          git log HEAD~$PR_DEPTH...HEAD
+
+      # We need to make sure that we aren't executing/using any code from the
+      # PR for security reasons as we're using pull_request_target. Checkout
+      # the target branch with the necessary files.
+      - name: Fetch LLVM Sources
+        uses: actions/checkout@v4
+        with:
+          sparse-checkout: |
+            llvm/utils/git/requirements_formatting.txt
+            llvm/utils/git/code-format-helper.py
+          sparse-checkout-cone-mode: false
+          path: llvm-sources
 
       - name: "Listed files"
         run: |
@@ -34,21 +61,21 @@ jobs:
         with:
           python-version: '3.11'
           cache: 'pip'
-          cache-dependency-path: 'llvm/utils/git/requirements_formatting.txt'
+          cache-dependency-path: 'llvm-sources/llvm/utils/git/requirements_formatting.txt'
 
       - name: Install python dependencies
-        run: pip install -r llvm/utils/git/requirements_formatting.txt
+        run: pip install -r llvm-sources/llvm/utils/git/requirements_formatting.txt
 
       - name: Run code formatter
         env:
           GITHUB_PR_NUMBER: ${{ github.event.pull_request.number }}
-          START_REV: ${{ github.event.pull_request.base.sha }}
-          END_REV: ${{ github.event.pull_request.head.sha }}
+          PR_DEPTH: ${{ github.event.pull_request.commits }}
           CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
+        working-directory: ./pr-sources
         run: |
-          python llvm/utils/git/code-format-helper.py \
+          python ../llvm-sources/llvm/utils/git/code-format-helper.py \
             --token ${{ secrets.GITHUB_TOKEN }} \
             --issue-number $GITHUB_PR_NUMBER \
-            --start-rev $START_REV \
-            --end-rev $END_REV \
+            --start-rev HEAD~$PR_DEPTH \
+            --end-rev HEAD \
             --changed-files "$CHANGED_FILES"

@boomanaiden154
Copy link
Contributor Author

One thing to note with these changes is that it still doesn't handle merging changes from main into the PR branch to update it very well. When someone does this, the n commits within the PR are not in the first n commits anymore. I'm not sure how common of a problem this is as the recommended workflow is to rebase and force push, but I have seen it happen in some cases.

This is somewhat of a regression from the old behavior, but not a complete one from what I understand. Since merging changes the order, especially if done multiple times (like I've seen in some PRs), starting and stopping at a specific revision will pull in more changes than just the commits in the PR. This is somewhat alleviated by only looking at a specific set of files though.

This should probably be fixed at some point (maybe even before this lands given @llvm-beanz's patch should help with quite a few cases). Not sure of the best way to do it though. It would be easiest if we could just pull the merge ref and diff the merge commit, but that doesn't work if there are any merge conflicts (fails to fetch the ref). We could iteratively fetch until we hit the merge base and the only run clang-format on individual commits, but I'd rather avoid the iterative fetching if possible.

@tstellar
Copy link
Collaborator

tstellar commented Nov 3, 2023

I know you can fetch a patch for the pull request via the API. Would that help with any of the corner cases?

@boomanaiden154 boomanaiden154 force-pushed the reland-pr-code-format-updates branch from c7ea36b to dbcd677 Compare November 3, 2023 07:00
@boomanaiden154
Copy link
Contributor Author

I know you can fetch a patch for the pull request via the API. Would that help with any of the corner cases?

Potentially, but even a tool like https://github.com/llvm/llvm-project/blob/main/clang/tools/clang-format/clang-format-diff.py relies on having a checkout where the diff applies cleanly, which would require finding the merge base.

I was able to get this working by diffing merge commits, with the caveat that the job doesn't run if there's a merge conflict. This matches the behavior when using pull_request, but isn't ideal as we should still be able to check code formatting when there are merge conflicts. I don't think that would result in code landing without ever being checked by the action however (or at least make it exceedingly rare).

I'll run some experiments with iteratively fetching until we hit the merge base and see what the performance is like.

I think I'd push to land this diffing merge commits and not working with merge conflicts with iterative fetching/script modifications to enable running the action on the merge conflicts case as a follow up patch, but interested to hear what others think. This isn't super urgent as #70946 reduces the number of failures to only a couple per day (n=1 day, didn't go through all of the past 24 hours).

@boomanaiden154 boomanaiden154 force-pushed the reland-pr-code-format-updates branch from 3c51c88 to c7bc9de Compare November 4, 2023 04:38
@boomanaiden154 boomanaiden154 force-pushed the reland-pr-code-format-updates branch from c7bc9de to 324fcd6 Compare November 4, 2023 04:39
@llvmbot llvmbot added the mlgo label Nov 4, 2023
@boomanaiden154 boomanaiden154 force-pushed the reland-pr-code-format-updates branch from c4f3adc to 324fcd6 Compare November 4, 2023 05:29
@boomanaiden154
Copy link
Contributor Author

A bit more hacking:

  • Apparently it's not as easy as I thought to check the mergability of a PR. It needs to be done through a separate API request specifically for that PR as the pull_request.mergeable field/associated fields are not populated in the CI run.
  • Weird behavior exists. For example, you can still fetch the merge ref of a PR that has conflicts, it's just the merge ref for the last version of the PR that didn't have any conflicts (from what I can tell). Failing on mergeability fixes this issue though.

This PR should be in a landable state, acknowledging the lack of functionality in the case where there are merge conflicts. I've kept the changed-files step after the checkout to hopefully avoid an additional API call. If that still causes issues (there are still occasional transient failures not even related to fetch depth issues) and we're running out of API requests, we can also switch to the llvm bot token given this uses pull_request_target.

@tstellar
Copy link
Collaborator

tstellar commented Nov 7, 2023

What happens when there are merge conflicts? Does the job just fail or does it give an informative error message?

@boomanaiden154
Copy link
Contributor Author

Currently it just fails in the step where it checks if the PR is mergeable. Would it be preferable to make it adjust the comment with a message?

Copy link
Collaborator

@tru tru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in theory. The test is always in production of course. But feel free to merge and test.

@tstellar
Copy link
Collaborator

tstellar commented Nov 8, 2023

Currently it just fails in the step where it checks if the PR is mergeable. Would it be preferable to make it adjust the comment with a message?

Yeah, I think it would be useful to produce a message like: "Failed to check code formatting, please rebase your PR on the main branch to verify the formatting."

I think most people get confused if they see a failed test, so we should try to help as much as possible.

@boomanaiden154
Copy link
Contributor Author

Closing in favor of #72020.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants