-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed regression when using fetch-depth = 1 #267
Comments
@winterqt Thanks for reporting this. I've previously pinpointed the problem in nixpkgs with changing the default ref that's checked out. Please try again without the non-default However, I don't understand how it could lead to such a speed regression. I'd actually expect that it wouldn't work. Do you have the logs of such a run for me? Perhaps these shed some light on what's going on. I might also be able to use the data of an actual run to reproduce the issue locally. |
Here are a few logs:
(Side note, are these workflow file URLs returning the page on GitHub for the file (with syntax highlighting etc.) or is it returning the raw content of the file? GitHub can't make up its mind seemingly :/ might be a browser thing.) |
Wow, I'm really surprised about those results. It's completely the opposite of what I'd expect. I'll need some time to dive into this. Thanks again for reporting it and collecting all this data. 🙇
👍 They appear with syntax highlighting on my browser (brave) |
I haven't tested it yet, but I think all that's missing is specifying a fetch depth of the fetches executed by the action. I guess currently it fetches all ancestor commits of the fetched commit but that's really unnecessary. This would explain why it takes so long 😌 See |
Copied for visibility from #268 (review) |
This week I've read a lot about shallow repos, and grafted commits. I've played around with some fetching strategies that could work for the backport-action. And I've thought about optimizing the action for performance on larger repos. TL;DRWe can optimize the action for fetch speed, but until then please consider cloning with the entire history (untested but may work): - uses: actions/checkout@v3
with:
fetch-depth: 0 The current problemsAt the moment, the action uses the git history to determine the relevant commits using In the logs you shared we see that fetching the base and head refs both take about 10 minutes while fetching the target only takes ~50s.
If we don't want to fetch the entire history but use a shallow repo, then we need to fetch an arbitrary number of commits from the base branch. This number would be dependent on the repo's speed to add more commits. For large repos like Nixpkgs, the base is growing very rapidly (I think it's at ~900 commits per week for Nixpkgs now). And so this may not be the best approach anymore. A different approachInstead of determining the commits to cherry-pick based on the repo's history (i.e. git mergebase), we can ask GitHub to tell us about the commits on the PR. Both the REST API as well as the GraphQL API provide methods for this.
When we know the exact commit shas, we can shallowly fetch only the relevant parts. This should provide optimal performance for repos with large histories, and doesn't negatively impact young repos. Cloning the repo shallowlyA current design choice is that the action can backport a PR to multiple targets. This means it needs to check out each target's latest commit and then cherry-pick the relevant commits from the PR on top of it. So we cannot simply clone the target branch shallowly. So we'll need to start with the PR head. Sadly, it is not possible with
Fetch only the relevant commits to cherry-pickNow we can fetch the relevant commits. To retrieve those, we can fetch To work around this, we can simply fetch 1 commit deeper:
(for each target:) Fetch the target and cherry-pick the commitsThe target can again be shallowly fetched as a grafted commit because we don't really care about that latest commit. We just want to cherry-pick on top of it.
We can verify whether the commits that we want to cherry-pick are available to use:
And now we can cherry-pick them.
In my experiments, I was unable to cherry-pick a range of commits this way. Perhaps this is because of the grafted commits, 🤷 But for now it's fine to reference them individually. If this command becomes too long we can even cherry-pick them individually.
What's blocking us from doing this?The tests are too coupled to the implementation and don't really allow such a refactoring. Recently, I've been working on https://github.com/korthout/backport-action-test as a way to test only the behavior. I don't want to make the old tests work with this new way. Instead, I'd prefer to replace them with more cases in backport-action-test. Sadly these tests are still a bit brittle and slow. I hope to improve that soon. But for now, there aren't really many test cases in place that support these changes. So we'd have to test these changes more or less in production. That may be okay, but: Is there anything I can try in the meantime?I think we can try to checkout the repo differently and potentially see a similar performance as v0.0.5. If we clone with fetch-depth 0 (entire history), I think that the following fetches should complete pretty much instantly. - uses: actions/checkout@v3
with:
fetch-depth: 0 @winterqt Could you please give this a try? |
Firstly, thank you for all the research and work you've put into this, I greatly appreciate it. Secondly, sure -- I assume you want me to bump the action to v0.0.8 and try keeping our |
👍 Yes that's correct. Curious to your results 🙇
❤️ Thanks, that's great to hear. |
It's taken me some time, but I'm happy to report that I was able to successfully perform a lot of successful tests with a new version of the backport-action that implements the approach I described above. 🎉 That means:
- uses: actions/checkout@v3
with:
fetch-depth: 1
I was able to test this:
@winterqt I'd love to receive some feedback on this from you. Does it really speed up the checkout and the backporting in a large project like Nixpkgs? Please try it out by switching the checkout action's - uses: actions/checkout@v3
with:
fetch-depth: 1
- name: Backport
uses: zeebe-io/backport-action@korthout-fetch-depth If these changes solve the problem as expected, this will form the basis of v1.0 of this action. |
@winterqt My team has switched to
Of course, I'm curious about the effect this release will have on Nixpkgs (430k commits), but I don't recommend switching the repository to this release candidate already, as it hasn't been battle-tested yet. I just wanted to keep you in the loop. |
There seems to be a speed regression between using
fetch-depth = 0
andfetch-depth = 1
alongside #162.We use this action on Nixpkgs, which is a very large repository. When using v0.5.0 with
fetch-depth = 0
, our checkout takes ~5m, while the actual backport takes ~15s. Unexpectedly, when switching tov0.8.0
and the defaultfetch-depth
value (1
), our checkout takes ~10s, and the backport can take over 10m.Using the logs, I've tracked the points where time is spent the most down to these three lines:
https://github.com/zeebe-io/backport-action/blob/7078123ca60e45c3452ca61d534dbd55c5319fec/src/backport.ts#L68-L69
https://github.com/zeebe-io/backport-action/blob/7078123ca60e45c3452ca61d534dbd55c5319fec/src/backport.ts#L95
which is all fetching. Is there anything that can be done to improve this? It seems weird to me that selectively fetching the history as done here can take double the time that fetching the entire history does. Thanks!
The text was updated successfully, but these errors were encountered: