Skip to content

Conversation

@ilovepi
Copy link
Contributor

@ilovepi ilovepi commented Nov 5, 2025

As discussed on discourse
https://discourse.llvm.org/t/rfc-require-pull-requests-for-all-llvm-project-commits/88164

This is an attempt at a script to automatically make and land PRs.

It creates a branch (or a series of branches for stacks), and lands them one by one, without waiting for review or checks to pass.

It supports --auto-merge, for single PRs. For stacks, use Graphite or some other tool.

It can work with GitHub tokens for https or ssh keys.

Example:

GITHUB_TOKEN=$(gh auth token) python3 llvm_push_pr.py --upstream-remote origin

As discussed on discourse
https://discourse.llvm.org/t/rfc-require-pull-requests-for-all-llvm-project-commits/88164

This is an attempt at a script to automatically make and land PRs.

It creates a branch (or a series of branches for stacks), and lands them
one by one, without waiting for review or checks to pass.

It supports --auto-merge, for single PRs. For stacks, use Graphite or
some other tool.

It can work with GitHub tokens for https or ssh keys.

Example:
```console
GITHUB_TOKEN=$(gh auth token) python3 llvm_push_pr.py --upstream-remote origin
```
self.printer.print(f"On branch: {self.original_branch}")

try:
self._rebase_current_branch()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this part of the flow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me see what it looks like w/o a rebase here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think that without a stack, I shouldn't have to rebase to open my PR (just like I would normally).

(Not that it is a big deal either I guess)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, after we started rebasing reliably after every PR lands, running successive commands no longer runs into the class of problems I was seeing.

"For stacks, the script must merge sequentially.",
file=sys.stderr,
)
sys.exit(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that? When pushing branches to upstream, you can cascade pull-requests to create your stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to build support for every feature under the sun, just some MVP that people that will actually use this tool can enhance. certainly its possible in the API to set the base branch, but that was much more complicated IMO than what was asked for as an MVP on discourse.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to build support for every feature under the sun,

Sure, but then a good starting point would be to just not support stacks at all and bail out when there are multiple commits!

Otherwise I would argue we should implement just the most straightforward flow for stacks, which I think is a cascade of branches (which is the only way you can correctly setup a stack without merging it AFAIK).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, but landing a stack was specifically asked for in the discourse thread. I have no plans to implement a cascade of related PRs. IMO that's work out of scope for the MVP, while making it possible to land a stack was mostly straight forward, since it was mostly just a loop around the existing logic for a single PR.

Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming you've tested this somewhere?

"""Wrapper for run_command that passes the dry_run flag."""
return self.printer.run_command(command, read_only=read_only, **kwargs)

def _get_repo_slug(self) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates functionality below and should also be constant given we only need to support the monorepo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. Also handled pretty differently now. I just assumed a bunch of things about LLVM that are basically always going to be true.

self.printer.print(f"On branch: {self.original_branch}")

try:
self._rebase_current_branch()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to rebase on top of main. Github handles the actual merge for us, and I think not rebasing is one of the selling points of using this script.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a stack, its going to get rebased anyway. it seemed better to find out right away that it failed due to a merge conflict or something, but we can certainly how it works if I omit it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I found in my refactoring that if you don't have an accurate merge base w/ this approach, and you have a stack, you'll make redundant empty PRs if you don't say rebase your branch on top of main before trying to land again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shored this up, and it seems to be working fine now that any PR landing always has a rebase operation following it. so the initial rebase is no longer necessary.

def _get_commit_stack(self) -> List[str]:
"""Gets the stack of commits between the current branch's HEAD and its merge base with upstream."""
target = f"{self.args.upstream_remote}/{self.args.base}"
merge_base_result = self._run_cmd(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to find a merge base here? Why can't we just do a three dot diff between main and the user's branch?

I don't think this handles causes like someone rebasing a bunch of merged commits any more nicely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC it was to handle the case where main is ahead of your branch. IIRC I actually ran into this when prototyping.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three dot diff will handle this case. You only run into problems with that with a two dot diff.

self.printer.print("Cleaning up temporary local branches...")
self._run_cmd(["git", "branch", "-D"] + self.created_branches)
self.printer.print("Cleaning up temporary remote branches...")
self._run_cmd(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this delete branches that have automerge enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, maybe. this wasn't something I could easily test. maybe the delete operation should only happen if its not an automerge situation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the delete operation ever happens when we have automerge enabled now. Do double check my logic, but when I tested it, that didn't happen, and I didn't see it in the verbose logging. The repo where I tested it didn't have much in the way of CI though, so its was just some linting. Still, I didn't see it retry or anything under --verbose when I tried it out last.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Probably something that's easiest to test once it lands.

def _get_commit_details(self, commit_hash: str) -> tuple[str, str]:
"""Gets the title and body of a commit."""
result = self._run_cmd(
["git", "show", "-s", "--format=%s%n%n%b", commit_hash],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
["git", "show", "-s", "--format=%s%n%n%b", commit_hash],
["git", "show", "-s", "--format=%B", commit_hash],

Why not just %B?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prototyped this with a lot of 1 line commits (subject only) and some weirdly formatted bodies. having the subject line then \n\n and the rest being the body worked well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that you saw issues with %B?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the logic to use %B. Turns out I was overcomplicating the parsing behavior in my head. thanks.

Comment on lines 432 to 433
self._run_cmd(["git", "branch", "-f", branch_name, commit_hash])
push_command = ["git", "push", self.args.remote, branch_name]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we creating local branches? We can just push a sha directly:

Suggested change
self._run_cmd(["git", "branch", "-f", branch_name, commit_hash])
push_command = ["git", "push", self.args.remote, branch_name]
push_command = ["git", "push", self.args.remote, f"{commit_hash}:refs/head/{branch_name}"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I actually just used the same workflow I normally would. It seemed nicer for debugging, but maybe you're right, and we don't need that ATM.

"""Checks if git is installed and if inside a git repository."""
printer.print("Checking prerequisites...")
printer.run_command(["git", "--version"], capture_output=True, read_only=True)
if not os.getenv("GITHUB_TOKEN"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name it GITHUB_LLVM_TOKEN?
In general folks may want to segregate their tokens to be finer grain, and so having a name that is more specific allows them to put it in a .bashrc or similar without colliding with other token needs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

parser = argparse.ArgumentParser(
description="Create and land a stack of Pull Requests."
)
GITHUB_REMOTE_NAME = "origin"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why "origin" is the best name, instead of "fork" for example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're in your fork, isn't "origin" always your remote? or are you suggesting it's more common for folks to have a checkout of the upstream repository, and set "fork" to track their personal fork of LLVM (origin being llvm/llvm-project)? Or maybe I'm missing your point?

Copy link
Contributor

@boomanaiden154 boomanaiden154 Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always clone https://github.com/llvm/llvm-project, so origin points to upstream and then I have fork as a remote to my fork.

Not sure what's more common though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. I wrote this assuming forks are more common. That constant is just the default though, and you can override it w/ --remote and the upstream w/ --upstream-remote. Maybe there are better names for these though.

After starting to address a few comments, I basically realized I needed
a new approach. I by and large kept the structure the same, but have
revised the library usage to ust use things from the standard library.

I've slightly modified some of the initial classes to be provide more
useful abstractions. There's probably room for more improvements, but
I wanted to keep this as simple as possible.
@github-actions
Copy link

github-actions bot commented Nov 14, 2025

✅ With the latest revision this PR passed the Python code formatter.

@github-actions
Copy link

github-actions bot commented Nov 18, 2025

🐧 Linux x64 Test Results

  • 186344 tests passed
  • 4854 tests skipped

@ilovepi ilovepi marked this pull request as ready for review November 18, 2025 22:18
Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is pretty close.

def _get_commit_stack(self) -> List[str]:
"""Gets the stack of commits between the current branch's HEAD and its merge base with upstream."""
target = f"{self.args.upstream_remote}/{self.args.base}"
merge_base_result = self._run_cmd(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three dot diff will handle this case. You only run into problems with that with a two dot diff.

self.printer.print("Cleaning up temporary local branches...")
self._run_cmd(["git", "branch", "-D"] + self.created_branches)
self.printer.print("Cleaning up temporary remote branches...")
self._run_cmd(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Probably something that's easiest to test once it lands.

git_env["GIT_TERMINAL_PROMPT"] = "0"
return git_env

def _run_cmd(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is an extra function here really necessary?

parser.add_argument(
"--dry-run", action="store_true", help="Print commands without executing them."
)
group = parser.add_mutually_exclusive_group()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we can't name the two groups differently? I think that would keep things more clear even though it doesn't impact functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants