Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: Simplify and optimize git backend: New clone+fetch pattern #10430

Merged
merged 60 commits into from Jul 10, 2023
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
60 commits
Select commit Hold shift + click to select a range
f8f35ac
New git clone/fetch/checkout behavior first steps
benjaoming Jun 13, 2023
9d616f0
Use `identifier` argument for `.update()`, add lots of comments for f…
benjaoming Jun 13, 2023
614963d
Add missing str() cast
benjaoming Jun 13, 2023
82178a3
Remove the `identifier` keyword since we are forced to use verbose_na…
benjaoming Jun 13, 2023
a45b42b
Revert "Remove the `identifier` keyword since we are forced to use ve…
benjaoming Jun 14, 2023
746d94b
We need the identifier for branches, but not for tags and PRs
benjaoming Jun 14, 2023
b3af824
Update some comments after noticing how `Build.identifier/.verbose_na…
benjaoming Jun 14, 2023
1f538b6
Merge branch 'main' of github.com:readthedocs/readthedocs.org into fe…
benjaoming Jun 14, 2023
04f3caf
lint
benjaoming Jun 14, 2023
2f8d346
Revert "Revert "Remove the `identifier` keyword since we are forced t…
benjaoming Jun 15, 2023
c634ca4
Use version_identifier as an instance property
benjaoming Jun 15, 2023
6725dbb
Update readthedocs/vcs_support/backends/git.py
benjaoming Jun 15, 2023
e186e7f
Updates to return values, comments and logging
benjaoming Jun 15, 2023
7fcf937
Merge branch 'feature/git-clone-fetch-checkout' of github.com:readthe…
benjaoming Jun 15, 2023
8d13787
Restore common
benjaoming Jun 15, 2023
a41e352
Stop fetching --tags and fetch the direct remote reference
benjaoming Jun 15, 2023
0cb043a
Appease linter: Use raise-from pattern
benjaoming Jun 15, 2023
e9322df
Hello darker my old friend
benjaoming Jun 15, 2023
b6ba466
Restore error handling for clone operation
benjaoming Jun 15, 2023
230e04f
Update a few comments, add a bit more raise-from pattern (actually tr…
benjaoming Jun 15, 2023
e3fcbf5
WIP: refine some comments, log calls and small things while testing l…
benjaoming Jun 15, 2023
abf41b9
New test class with Feature flag defined
benjaoming Jun 19, 2023
c2b3d44
Adds a tearDown method after discovering that tests were wrongly pass…
benjaoming Jun 19, 2023
84492ad
Merge branch 'main' of github.com:readthedocs/readthedocs.org into fe…
benjaoming Jun 19, 2023
9587844
Log a warning if we aren't specifying the version type when using Git…
benjaoming Jun 19, 2023
9dbc8c3
Update test cases for new git clone+fetch pattern
benjaoming Jun 19, 2023
d8cbc50
Git fetch change: Fetch branches to :refs/remotes/origin/<branch-name…
benjaoming Jun 19, 2023
15eab05
Overwrite tests to call clone_ng and fetch_ng, and rewrite some entirely
benjaoming Jun 19, 2023
c5abe46
Update update() docstring
benjaoming Jun 19, 2023
577bf51
Always use lsremote for GIT_CLONE_FETCH_CHECKOUT_PATTERN
benjaoming Jun 20, 2023
b507d02
Builds need to know about Version.machine=True for tags "stable" so t…
benjaoming Jun 20, 2023
52f071c
Conditionally log a warning on stable versions
benjaoming Jun 20, 2023
693eef9
linting
benjaoming Jun 20, 2023
bd98bf6
add "machine" to version API test case
benjaoming Jun 20, 2023
21dd16e
Use "refspec" terminology more consistently
benjaoming Jun 20, 2023
b1cea44
Add test case for un-named default branches
benjaoming Jun 20, 2023
2eba2f5
Don't have "--no-checkout" when we *need* to checkout the remote HEAD
benjaoming Jun 20, 2023
176f802
Add test case to verify that we can clone+fetch special tag "stable" …
benjaoming Jun 20, 2023
4191e11
Update readthedocs/vcs_support/backends/git.py
benjaoming Jun 20, 2023
acb7dd9
Remove unused logging
benjaoming Jun 20, 2023
3fe7486
Use `git fetch <commit hash>` for special "stable" versions
benjaoming Jun 20, 2023
33fa7da
Building an unnamed default branch now skips `git fetch`
benjaoming Jun 20, 2023
6431c00
Remove unnecessary logging, add a lot of comments to tests
benjaoming Jun 20, 2023
e42f902
Apply suggestions from @humitos code review
benjaoming Jun 22, 2023
c573736
Update readthedocs/projects/models.py
benjaoming Jun 22, 2023
4343fcf
Always clean up the presumed repo working dir after a test case
benjaoming Jun 26, 2023
f02c8ff
Move `machine` to `VersionAdminSerializer`
benjaoming Jun 26, 2023
ae064b8
Add precision to comment
benjaoming Jun 26, 2023
2f3e6ca
Use identifier and verbose_name that looks more like actual data woul…
benjaoming Jun 26, 2023
f2fc2b0
remove unused local reference in git fetch for "stable" tag
benjaoming Jun 26, 2023
6628d4f
Don't use --no-checkout, just checkout whatever is at the FETCH_HEAD
benjaoming Jun 26, 2023
5d5a6f5
Add more logging
benjaoming Jun 26, 2023
85ac0e4
Update comment
benjaoming Jun 26, 2023
f1d426b
Fix logic condition: Always include `vcs_repository.supports_lsremote`
benjaoming Jun 27, 2023
19eb282
Log a warning when we try to generate a pr/mr refspec for an unsuppor…
benjaoming Jun 27, 2023
5b10400
improve doc strings
benjaoming Jun 27, 2023
b007df9
Merge branch 'main' of github.com:readthedocs/readthedocs.org into fe…
benjaoming Jun 27, 2023
a2d7223
Remove TestGitBackendTwiceInARow
benjaoming Jun 27, 2023
41e1f09
Use real commit hash in test case
benjaoming Jun 27, 2023
98597f2
Merge branch 'main' of github.com:readthedocs/readthedocs.org into fe…
humitos Jul 5, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion common
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
8 changes: 2 additions & 6 deletions readthedocs/api/v2/views/integrations.py
Expand Up @@ -15,11 +15,7 @@
from rest_framework.status import HTTP_400_BAD_REQUEST
from rest_framework.views import APIView

from readthedocs.builds.constants import (
EXTERNAL_VERSION_STATE_CLOSED,
EXTERNAL_VERSION_STATE_OPEN,
LATEST,
)
from readthedocs.builds.constants import LATEST
from readthedocs.core.signals import webhook_bitbucket, webhook_github, webhook_gitlab
from readthedocs.core.views.hooks import (
build_branches,
Expand Down Expand Up @@ -190,7 +186,7 @@ def get_response_push(self, project, branches):

:param project: Project instance
:type project: Project
:param branches: List of branch names to build
:param branches: List of branch/tag names to build
:type branches: list(str)
"""
to_build, not_building = build_branches(project, branches)
Expand Down
6 changes: 4 additions & 2 deletions readthedocs/builds/tasks.py
Expand Up @@ -292,8 +292,10 @@ def sync_versions_task(project_pk, tags_data, branches_data, **kwargs):
Creates new Version objects for tags/branches that aren't tracked in the database,
and deletes Version objects for tags/branches that don't exists in the repository.

:param tags_data: List of dictionaries with ``verbose_name`` and ``identifier``.
:param branches_data: Same as ``tags_data`` but for branches.
:param tags_data: List of dictionaries with ``verbose_name`` and ``identifier``
Example: ("v1.0.0", "67a9035990f44cb33091026d7453d51606350519").
:param branches_data: Same as ``tags_data`` but for branches (branch name, branch identifier).
Example: ("latest", "main")
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
:returns: `True` or `False` if the task succeeded.
"""
project = Project.objects.get(pk=project_pk)
Expand Down
6 changes: 3 additions & 3 deletions readthedocs/doc_builder/director.py
Expand Up @@ -209,10 +209,10 @@ def build(self):
def checkout(self):
"""Checkout Git repo and load build config file."""

log.info("Cloning repository.")
self.vcs_repository.update()

identifier = self.data.build_commit or self.data.version.identifier
log.info("Cloning and fetching.", identifier=identifier)
self.vcs_repository.update(identifier=identifier)
benjaoming marked this conversation as resolved.
Show resolved Hide resolved

log.info("Checking out.", identifier=identifier)
self.vcs_repository.checkout(identifier)

Expand Down
14 changes: 12 additions & 2 deletions readthedocs/projects/models.py
Expand Up @@ -995,7 +995,11 @@ def has_htmlzip(self, version_slug=LATEST, version_type=None):
)

def vcs_repo(
self, environment, version=LATEST, verbose_name=None, version_type=None
self,
environment,
version=LATEST,
verbose_name=None,
version_type=None,
):
"""
Return a Backend object for this project able to handle VCS commands.
Expand Down Expand Up @@ -1938,6 +1942,7 @@ def add_features(sender, **kwargs):
INDEX_FROM_HTML_FILES = 'index_from_html_files'

# Build related features
GIT_CLONE_FETCH_CHECKOUT_PATTERN = "git_clone_fetch_checkout"
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
DONT_CREATE_INDEX = "dont_create_index"
HOSTING_INTEGRATIONS = "hosting_integrations"
NO_CONFIG_FILE_DEPRECATED = "no_config_file"
Expand Down Expand Up @@ -2066,7 +2071,12 @@ def add_features(sender, **kwargs):
"sources"
),
),

(
GIT_CLONE_FETCH_CHECKOUT_PATTERN,
_(
"Build: Use simplified and optimized git clone + git fetch + git checkout patterns"
),
),
(
DONT_CREATE_INDEX,
_(
Expand Down
4 changes: 2 additions & 2 deletions readthedocs/vcs_support/backends/bzr.py
Expand Up @@ -16,8 +16,8 @@ class Backend(BaseVCS):
supports_tags = True
fallback_branch = ''

def update(self):
super().update()
def update(self, identifier=None):
super().update(identifier=identifier)
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
if self.repo_exists():
return self.up()
return self.clone()
Expand Down
135 changes: 123 additions & 12 deletions readthedocs/vcs_support/backends/git.py
Expand Up @@ -8,7 +8,7 @@
from git.exc import BadName, InvalidGitRepositoryError, NoSuchPathError
from gitdb.util import hex_to_bin

from readthedocs.builds.constants import EXTERNAL
from readthedocs.builds.constants import BRANCH, EXTERNAL, TAG
from readthedocs.config import ALL
from readthedocs.projects.constants import (
GITHUB_BRAND,
Expand Down Expand Up @@ -55,18 +55,122 @@ def _get_clone_url(self):
def set_remote_url(self, url):
return self.run('git', 'remote', 'set-url', 'origin', url)

def update(self):
"""Clone or update the repository."""
super().update()
if self.repo_exists():
self.set_remote_url(self.repo_url)
return self.fetch()
self.make_clean_working_dir()
# A fetch is always required to get external versions properly
def update(self, identifier=None):
"""
Clone or update the repository.

:param identifier: This is the optional identifier for git fetch - a branch or tag name.
PR references are generated automatically for certain Git providers.
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
:return:
"""
super().update(identifier=identifier)

if self.use_clone_fetch_checkout_pattern():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this method and just put the line instead. It will be clear we are relying on a feature flag:

if self.project.has_feature(Feature.GIT_CLONE_FETCH_CHECKOUT_PATTERN):


# New behavior: Clone is responsible for skipping the operation if the
# repo is already cloned.
self.clone_ng()
# New behavior: No confusing return value. We are not using return values
# in the callers.
self.fetch_ng(identifier=identifier)
benjaoming marked this conversation as resolved.
Show resolved Hide resolved

else:
if self.repo_exists():
self.set_remote_url(self.repo_url)
return self.fetch()
self.make_clean_working_dir()
# A fetch is always required to get external versions properly
if self.version_type == EXTERNAL:
self.clone()
return self.fetch()
return self.clone()

def get_remote_fetch_reference(self, identifier):
"""
Gets a valid remote reference for the identifier.

This method is terrible. It decides how to treat the incoming identifier from
knowledge of how the caller (the build process) uses build data.

Build.identifier = a branch name (branches)
Build.identifier = commit (tags)
Build.identifier = commit (external versions)
Build.verbose_name = branch alias, e.g. latest (branches)
Build.verbose_name = tag name (tags)
Build.verbose_name = PR number (external versions)

:param identifier: Should be a branch or tag name when building branches or tags.
:return: A reference valid for fetch operation
"""
# Branches have the branch identifier set by the caller who instantiated the
# Git backend
if self.version_type == BRANCH:
return identifier
# Tags
if self.version_type == TAG:
return self.verbose_name
if self.version_type == EXTERNAL:
self.clone()
return self.fetch()
return self.clone()

# TODO: We should be able to resolve this without looking up in oauth registry
git_provider_name = self.project.git_provider_name

# TODO: Why are these our only patterns?
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
if git_provider_name == GITHUB_BRAND:
return GITHUB_PR_PULL_PATTERN.format(id=self.verbose_name)
if self.project.git_provider_name == GITLAB_BRAND:
return GITLAB_MR_PULL_PATTERN.format(id=self.verbose_name)
benjaoming marked this conversation as resolved.
Show resolved Hide resolved

# This seems to be the default behavior when we don't know the remote
# reference for a PR/MR: Just fetch everything
# TODO: Provide more information about fetch operations without references
return None

def clone_ng(self):
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
# If the repository is already cloned, we don't do anything.
# This is likely legacy, but we may want to be able to call .update()
# several times in the same build
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
if self.repo_exists():
return

# --no-checkout: Makes it explicit what we are doing here. Nothing is checked out
# until it's explicitly done.
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
# --depth 1: Shallow clone, fetch as little data as possible.
cmd = ["git", "clone", "--no-checkout", "--depth", "1", self.repo_url, "."]

code, stdout, stderr = self.run(*cmd)
return code, stdout, stderr

def fetch_ng(self, identifier):
"""Implementation for new clone+fetch+checkout pattern."""

# --force: Likely legacy, it seems to be irrelevant to this usage
# --tags: We need to fetch tags in order to resolve these references in the checkout
# --prune: Likely legacy, we don't expect a previous fetch command to have run
# --prune-tags: Likely legacy, we don't expect a previous fetch command to have run
# --tags: This flag was used in the previous approach such that all tags were fetched
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
# in order to checkout a tag afterwards.
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
# --depth: This flag should be made unnecessary, it's downloading commit data that's
# never used.
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
cmd = [
"git",
"fetch",
"origin",
"--force",
"--tags",
"--prune",
"--prune-tags",
"--depth",
str(self.repo_depth),
]
remote_reference = self.get_remote_fetch_reference(identifier)

if remote_reference:
# TODO: We are still fetching the latest 50 commits.
# A PR might have another commit added after the build has started...
cmd.append(remote_reference)

code, stdout, stderr = self.run(*cmd)
return code, stdout, stderr

def repo_exists(self):
try:
Expand Down Expand Up @@ -150,6 +254,13 @@ def use_shallow_clone(self):
from readthedocs.projects.models import Feature
return not self.project.has_feature(Feature.DONT_SHALLOW_CLONE)

def use_clone_fetch_checkout_pattern(self):
"""Use the new and optimized clone / fetch / checkout pattern."""
from readthedocs.projects.models import Feature

# TODO: Remove the 'not'
return not self.project.has_feature(Feature.GIT_CLONE_FETCH_CHECKOUT_PATTERN)
benjaoming marked this conversation as resolved.
Show resolved Hide resolved

def fetch(self):
# --force lets us checkout branches that are not fast-forwarded
# https://github.com/readthedocs/readthedocs.org/issues/6097
Expand Down
4 changes: 2 additions & 2 deletions readthedocs/vcs_support/backends/hg.py
Expand Up @@ -12,8 +12,8 @@ class Backend(BaseVCS):
supports_branches = True
fallback_branch = 'default'

def update(self):
super().update()
def update(self, identifier=None):
super().update(identifier=identifier)
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
if self.repo_exists():
return self.pull()
return self.clone()
Expand Down
4 changes: 2 additions & 2 deletions readthedocs/vcs_support/backends/svn.py
Expand Up @@ -26,8 +26,8 @@ def __init__(self, *args, **kwargs):
else:
self.base_url = self.repo_url

def update(self):
super().update()
def update(self, identifier=None):
super().update(identifier=identifier)
if self.repo_exists():
return self.up()
return self.co()
Expand Down
2 changes: 1 addition & 1 deletion readthedocs/vcs_support/base.py
Expand Up @@ -83,7 +83,7 @@ def make_clean_working_dir(self):
safe_rmtree(self.working_dir, ignore_errors=True)
self.check_working_dir()

def update(self):
def update(self, identifier=None):
"""
Update a local copy of the repository in self.working_dir.

Expand Down