New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build: Simplify and optimize git backend: New clone+fetch pattern #10430
Conversation
6447850
to
9d616f0
Compare
I've just discovered that We need to rely on I don't think it's nice to have a property called "verbose" as if it's human readable and used this way. But I think it's pretty consistent to what already happened with PR numbers :) |
…me if tags should work :/
…rbose_name if tags should work :/" This reverts commit 82178a3.
…me` fields are used
4c1fe9b
to
b3af824
Compare
Now that I think the clone and fetch methods look like they should, here's a test with a large repository (cpython): git cloneBefore: Total time: 90.6 seconds Details
After: Total time: 10.7 seconds Details
git fetchBefore: We are running this perhaps with an understanding that things already fetched during Total time: 92.3 seconds Details
After: Total time: 91.3s For tags and branches, we are now adding a remote specifier to the fetch command. Most PRs have this, too (GitHub+GitLab). It doesn't make a big difference. I think that's because we have Details
A little extra experimentWe can cut off almost the entire 1½ minute from the fetch command by just fetching 1 commit and no tags. I think the problem is that Total time: 0.7s Details
I get this assumption confirmed by running the command with
Total time: 76.5s Details
If we want to keep This command means "get the
Total time: 1.2s To make it work for a tag, we would have to do:
This means "get a tag (no need to guess that it's a tag, we use the explicit Details
Next step?With the current changes and scope of work in mind, I would also go ahead and remove @humitos what's your thoughts on this? I think that comparing it to all the other changes, it's such a small adjustment with a big effect. |
…ature/git-clone-fetch-checkout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current changes and scope of work in mind, I would also go ahead and remove --tags from the fetch command. I'm leaning towards slicing off the additional 1½ minute from git fetch by removing --tags, now that we are already doing all the work to test and verify.
@humitos what's your thoughts on this? I think that comparing it to all the other changes, it's such a small adjustment with a big effect.
Sounds good to me. To summarize, can you write here what are going to be the commands for each case? I understand that the set of commands will be exactly the same for all the cases, with the only difference being the fetched reference (at the end of the command)
- on branches:
{verbose_name}
- on tags:
refs/tags/{verbose_name}:refs/tags/{verbose_name}
- on PRs:
pull/{verbose_name}/head:external-{verbose_name}
Then, the set of commands would be:
git clone --depth 1 --no-checkout <repository url>
git fetch origin --depth 50 <reference generated by RTD>
git checkout <verbose_name>
If I am correct here, let's move forward with this approach. Then we can slowly roll this out and check if there are projects affected by this and fix those edge cases.
There is one more thing that you will need to think about. The case when there is no |
That's why the identifier is optional for the fetch command. I can definitely check the behavior for projects without default branches 👍 |
…o use verbose_name if tags should work :/"" This reverts commit a45b42b.
Co-authored-by: Manuel Kaufmann <humitos@gmail.com>
…docs/readthedocs.org into feature/git-clone-fetch-checkout
…ying to use it for debugging!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some comments and questions about various decisions. If you see anything you would like as an inline comment, LMK 👍
Feel free to resolve!
{branch.verbose_name for branch in repo.branches}, | ||
) | ||
|
||
def test_update_without_branch_name(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going in a good direction 👍🏼
I have some suggestions and opinions here about the implementation, code style, naming and others. I wrote some comments and questions that I'd like you to address before merging.
Besides, I'm pinging @stsewd to jump into the review of this code as well because it touches the build system, which is not trivial and it's easy to miss some use cases, and edge ones. The code itself it's not complex, but the context is. So, better to have more eyes here.
Co-authored-by: Manuel Kaufmann <humitos@gmail.com>
Once this is integrated, I will rebase: |
Co-authored-by: Manuel Kaufmann <humitos@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some comments around not breaking other VCSs backends. There are a lot of previous conversations I didn't take a look at, let me know if there are any relevant ones that my need my input.
…ature/git-clone-fetch-checkout
Did a bunch of local Q&A with the latest |
…ature/git-clone-fetch-checkout
I did some local QA and I was able to build all the versions I tested from Thanks for this awesome work 💯! This feature was not an easy one 👍🏼 |
Fixes #9736
Still in draft mode. But we've discussed the approach and this seems to be the best compromise between reaching goals to make the process faster and not risking too much breakage.
not
on the feature flagGitBackend.branches
andGitBackend.tags
is used. Old tests were assuming that it would return all branches after a clone. But we removed that by design, so this assumption has to be eliminated (we just need to force ls-remote then this is fine).stable
versions (it's stored a version_type=TAG, but such a git tag doesn't exist) = we need to user Version.machine to be aware that it's an auto-generated version and use its commit hash to fetch.TestGitBackendTwiceInARow
High-level
git clone
andgit fetch
have been changed.git clone
no longer fetches branch and tag information, it just clones with minimal information and doesn't check out anything because we added--no-checkout
. We know that we'll always callfetch
afterwards and fetch exactly what's needed. We also know that we'll callgit checkout
(with a single known exception).git fetch
no longer contains--tags
, meaning that it only fetches what is specified by the remote identifier.identifier
argument for.update()
VCS method)TODO
comments. For instance, the new pattern still contains some legacy calls tomake_clean_working_dir
.set_remote_url
.Changes:
git clone
git clone
fetches as little data as possible.The new command is:
git clone --no-checkout --depth 1 {repo_url} .
Changes:
git fetch
git fetch
doesn't have--tags
anymore, and for branch and tag builds, it will now add the remote specifier for the branch or tag that is being fetched.git fetch origin --force --prune --prune-tags --depth 50 {remote}
{Version.identifier}
refs/tags/{Version.verbose_name}:refs/tags/{Version.verbose_name}
pull/{Version.verbose_name}/head:external-{Version.verbose_name}
Edit: I noticed that auto-created "stable" versions need special treatment because they are marked as a git tag. But we don't really know which branch the "stable" originates from because we throw away that information when setting "stable". So we'll have to keep fetching everything and then checking out the commit.
Note for default branches
For newly imported projects without a branch name set, a build is automatically started.
So we are building without any information about branch, tag or commit. We already have a mechanism that skips the "checkout" operation since we don't know what to check out.
It's quite easily solved, as the new clone operation automatically sits at the "HEAD" of the remote repo. But we need to remove
--no-checkout
fromgit clone
in this case.Example 🎉
Follow-ups
After these changes, our clone+fetch operations are likely so fast that no one should notice it, even big projects.
But there are some changes that can reduce the complexity in the codebase:
--prune
,--force
,--prune-tags
and other options that throw in strange assumptions that somehow the working directory isn't clean?--depth 50
ongit fetch
SyncRepositoryMixin.sync_versions
to always use lsremote.GitBackend.branches
andGitBackend.tags
entirely, and make sure everything just uses lsremote.gitpython
dependency.