Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: Simplify and optimize git backend: New clone+fetch pattern #10430

Merged
merged 60 commits into from Jul 10, 2023
Merged
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
60 commits
Select commit Hold shift + click to select a range
f8f35ac
New git clone/fetch/checkout behavior first steps
benjaoming Jun 13, 2023
9d616f0
Use `identifier` argument for `.update()`, add lots of comments for f…
benjaoming Jun 13, 2023
614963d
Add missing str() cast
benjaoming Jun 13, 2023
82178a3
Remove the `identifier` keyword since we are forced to use verbose_na…
benjaoming Jun 13, 2023
a45b42b
Revert "Remove the `identifier` keyword since we are forced to use ve…
benjaoming Jun 14, 2023
746d94b
We need the identifier for branches, but not for tags and PRs
benjaoming Jun 14, 2023
b3af824
Update some comments after noticing how `Build.identifier/.verbose_na…
benjaoming Jun 14, 2023
1f538b6
Merge branch 'main' of github.com:readthedocs/readthedocs.org into fe…
benjaoming Jun 14, 2023
04f3caf
lint
benjaoming Jun 14, 2023
2f8d346
Revert "Revert "Remove the `identifier` keyword since we are forced t…
benjaoming Jun 15, 2023
c634ca4
Use version_identifier as an instance property
benjaoming Jun 15, 2023
6725dbb
Update readthedocs/vcs_support/backends/git.py
benjaoming Jun 15, 2023
e186e7f
Updates to return values, comments and logging
benjaoming Jun 15, 2023
7fcf937
Merge branch 'feature/git-clone-fetch-checkout' of github.com:readthe…
benjaoming Jun 15, 2023
8d13787
Restore common
benjaoming Jun 15, 2023
a41e352
Stop fetching --tags and fetch the direct remote reference
benjaoming Jun 15, 2023
0cb043a
Appease linter: Use raise-from pattern
benjaoming Jun 15, 2023
e9322df
Hello darker my old friend
benjaoming Jun 15, 2023
b6ba466
Restore error handling for clone operation
benjaoming Jun 15, 2023
230e04f
Update a few comments, add a bit more raise-from pattern (actually tr…
benjaoming Jun 15, 2023
e3fcbf5
WIP: refine some comments, log calls and small things while testing l…
benjaoming Jun 15, 2023
abf41b9
New test class with Feature flag defined
benjaoming Jun 19, 2023
c2b3d44
Adds a tearDown method after discovering that tests were wrongly pass…
benjaoming Jun 19, 2023
84492ad
Merge branch 'main' of github.com:readthedocs/readthedocs.org into fe…
benjaoming Jun 19, 2023
9587844
Log a warning if we aren't specifying the version type when using Git…
benjaoming Jun 19, 2023
9dbc8c3
Update test cases for new git clone+fetch pattern
benjaoming Jun 19, 2023
d8cbc50
Git fetch change: Fetch branches to :refs/remotes/origin/<branch-name…
benjaoming Jun 19, 2023
15eab05
Overwrite tests to call clone_ng and fetch_ng, and rewrite some entirely
benjaoming Jun 19, 2023
c5abe46
Update update() docstring
benjaoming Jun 19, 2023
577bf51
Always use lsremote for GIT_CLONE_FETCH_CHECKOUT_PATTERN
benjaoming Jun 20, 2023
b507d02
Builds need to know about Version.machine=True for tags "stable" so t…
benjaoming Jun 20, 2023
52f071c
Conditionally log a warning on stable versions
benjaoming Jun 20, 2023
693eef9
linting
benjaoming Jun 20, 2023
bd98bf6
add "machine" to version API test case
benjaoming Jun 20, 2023
21dd16e
Use "refspec" terminology more consistently
benjaoming Jun 20, 2023
b1cea44
Add test case for un-named default branches
benjaoming Jun 20, 2023
2eba2f5
Don't have "--no-checkout" when we *need* to checkout the remote HEAD
benjaoming Jun 20, 2023
176f802
Add test case to verify that we can clone+fetch special tag "stable" …
benjaoming Jun 20, 2023
4191e11
Update readthedocs/vcs_support/backends/git.py
benjaoming Jun 20, 2023
acb7dd9
Remove unused logging
benjaoming Jun 20, 2023
3fe7486
Use `git fetch <commit hash>` for special "stable" versions
benjaoming Jun 20, 2023
33fa7da
Building an unnamed default branch now skips `git fetch`
benjaoming Jun 20, 2023
6431c00
Remove unnecessary logging, add a lot of comments to tests
benjaoming Jun 20, 2023
e42f902
Apply suggestions from @humitos code review
benjaoming Jun 22, 2023
c573736
Update readthedocs/projects/models.py
benjaoming Jun 22, 2023
4343fcf
Always clean up the presumed repo working dir after a test case
benjaoming Jun 26, 2023
f02c8ff
Move `machine` to `VersionAdminSerializer`
benjaoming Jun 26, 2023
ae064b8
Add precision to comment
benjaoming Jun 26, 2023
2f3e6ca
Use identifier and verbose_name that looks more like actual data woul…
benjaoming Jun 26, 2023
f2fc2b0
remove unused local reference in git fetch for "stable" tag
benjaoming Jun 26, 2023
6628d4f
Don't use --no-checkout, just checkout whatever is at the FETCH_HEAD
benjaoming Jun 26, 2023
5d5a6f5
Add more logging
benjaoming Jun 26, 2023
85ac0e4
Update comment
benjaoming Jun 26, 2023
f1d426b
Fix logic condition: Always include `vcs_repository.supports_lsremote`
benjaoming Jun 27, 2023
19eb282
Log a warning when we try to generate a pr/mr refspec for an unsuppor…
benjaoming Jun 27, 2023
5b10400
improve doc strings
benjaoming Jun 27, 2023
b007df9
Merge branch 'main' of github.com:readthedocs/readthedocs.org into fe…
benjaoming Jun 27, 2023
a2d7223
Remove TestGitBackendTwiceInARow
benjaoming Jun 27, 2023
41e1f09
Use real commit hash in test case
benjaoming Jun 27, 2023
98597f2
Merge branch 'main' of github.com:readthedocs/readthedocs.org into fe…
humitos Jul 5, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion common
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
Submodule common updated 0 files
32 changes: 14 additions & 18 deletions readthedocs/api/v2/views/integrations.py
Expand Up @@ -15,11 +15,7 @@
from rest_framework.status import HTTP_400_BAD_REQUEST
from rest_framework.views import APIView

from readthedocs.builds.constants import (
EXTERNAL_VERSION_STATE_CLOSED,
EXTERNAL_VERSION_STATE_OPEN,
LATEST,
)
from readthedocs.builds.constants import LATEST
from readthedocs.core.signals import webhook_bitbucket, webhook_github, webhook_gitlab
from readthedocs.core.views.hooks import (
build_branches,
Expand Down Expand Up @@ -97,8 +93,8 @@ def post(self, request, project_slug):
if not Project.objects.is_active(self.project):
resp = {'detail': 'This project is currently disabled'}
return Response(resp, status=status.HTTP_406_NOT_ACCEPTABLE)
except Project.DoesNotExist:
raise NotFound('Project not found')
except Project.DoesNotExist as exc:
raise NotFound("Project not found") from exc
if not self.is_payload_valid():
log.warning('Invalid payload for project and integration.')
return Response(
Expand Down Expand Up @@ -190,7 +186,7 @@ def get_response_push(self, project, branches):

:param project: Project instance
:type project: Project
:param branches: List of branch names to build
:param branches: List of branch/tag names to build
:type branches: list(str)
"""
to_build, not_building = build_branches(project, branches)
Expand Down Expand Up @@ -365,7 +361,7 @@ def get_external_version_data(self):
return data
except KeyError as e:
key = e.args[0]
raise ParseError(f"Invalid payload. {key} is required.")
raise ParseError(f"Invalid payload. {key} is required.") from e

def is_payload_valid(self):
"""
Expand Down Expand Up @@ -500,8 +496,8 @@ def handle_webhook(self):
try:
branch = self._normalize_ref(self.data["ref"])
return self.get_response_push(self.project, [branch])
except KeyError:
raise ParseError('Parameter "ref" is required')
except KeyError as exc:
raise ParseError('Parameter "ref" is required') from exc

return None

Expand Down Expand Up @@ -581,7 +577,7 @@ def get_external_version_data(self):
return data
except KeyError as e:
key = e.args[0]
raise ParseError(f"Invalid payload. {key} is required.")
raise ParseError(f"Invalid payload. {key} is required.") from e

def handle_webhook(self):
"""
Expand Down Expand Up @@ -624,8 +620,8 @@ def handle_webhook(self):
try:
branch = self._normalize_ref(data["ref"])
return self.get_response_push(self.project, [branch])
except KeyError:
raise ParseError('Parameter "ref" is required')
except KeyError as exc:
raise ParseError('Parameter "ref" is required') from exc

if self.project.external_builds_enabled and event == GITLAB_MERGE_REQUEST:
if (
Expand Down Expand Up @@ -726,8 +722,8 @@ def handle_webhook(self):
)
log.debug("Triggered sync_versions.")
return self.sync_versions_response(self.project)
except KeyError:
raise ParseError('Invalid request')
except KeyError as exc:
raise ParseError("Invalid request") from exc
return None

def is_payload_valid(self):
Expand Down Expand Up @@ -809,8 +805,8 @@ def handle_webhook(self):
self.update_default_branch(default_branch)

return self.get_response_push(self.project, branches)
except TypeError:
raise ParseError('Invalid request')
except TypeError as exc:
raise ParseError("Invalid request") from exc

def is_payload_valid(self):
"""
Expand Down
12 changes: 10 additions & 2 deletions readthedocs/builds/tasks.py
Expand Up @@ -292,8 +292,16 @@ def sync_versions_task(project_pk, tags_data, branches_data, **kwargs):
Creates new Version objects for tags/branches that aren't tracked in the database,
and deletes Version objects for tags/branches that don't exists in the repository.

:param tags_data: List of dictionaries with ``verbose_name`` and ``identifier``.
:param branches_data: Same as ``tags_data`` but for branches.
:param tags_data: List of dictionaries with ``verbose_name`` and ``identifier``
Example: [
{"verbose_name": "v1.0.0",
"identifier": "67a9035990f44cb33091026d7453d51606350519"},
].
:param branches_data: Same as ``tags_data`` but for branches (branch name, branch identifier).
Example: [
{"verbose_name": "latest",
"identifier": "main"},
].
:returns: `True` or `False` if the task succeeded.
"""
project = Project.objects.get(pk=project_pk)
Expand Down
3 changes: 2 additions & 1 deletion readthedocs/doc_builder/director.py
Expand Up @@ -94,6 +94,7 @@ def setup_vcs(self):
environment=self.vcs_environment,
verbose_name=self.data.version.verbose_name,
version_type=self.data.version.type,
version_identifier=self.data.version.identifier,
)

# We can't do too much on ``pre_checkout`` because we haven't
Expand Down Expand Up @@ -209,7 +210,7 @@ def build(self):
def checkout(self):
"""Checkout Git repo and load build config file."""

log.info("Cloning repository.")
log.info("Cloning and fetching.")
self.vcs_repository.update()

identifier = self.data.build_commit or self.data.version.identifier
Expand Down
23 changes: 19 additions & 4 deletions readthedocs/projects/models.py
Expand Up @@ -995,7 +995,12 @@ def has_htmlzip(self, version_slug=LATEST, version_type=None):
)

def vcs_repo(
self, environment, version=LATEST, verbose_name=None, version_type=None
self,
environment,
version=LATEST,
verbose_name=None,
version_type=None,
version_identifier=None,
):
"""
Return a Backend object for this project able to handle VCS commands.
Expand All @@ -1014,8 +1019,12 @@ def vcs_repo(
repo = None
else:
repo = backend(
self, version, environment=environment,
verbose_name=verbose_name, version_type=version_type
self,
version,
environment=environment,
verbose_name=verbose_name,
version_type=version_type,
version_identifier=version_identifier,
)
return repo

Expand Down Expand Up @@ -1938,6 +1947,7 @@ def add_features(sender, **kwargs):
INDEX_FROM_HTML_FILES = 'index_from_html_files'

# Build related features
GIT_CLONE_FETCH_CHECKOUT_PATTERN = "git_clone_fetch_checkout"
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
DONT_CREATE_INDEX = "dont_create_index"
HOSTING_INTEGRATIONS = "hosting_integrations"
NO_CONFIG_FILE_DEPRECATED = "no_config_file"
Expand Down Expand Up @@ -2066,7 +2076,12 @@ def add_features(sender, **kwargs):
"sources"
),
),

(
GIT_CLONE_FETCH_CHECKOUT_PATTERN,
_(
"Build: Use simplified and optimized git clone + git fetch + git checkout patterns"
),
),
(
DONT_CREATE_INDEX,
_(
Expand Down
133 changes: 127 additions & 6 deletions readthedocs/vcs_support/backends/git.py
Expand Up @@ -8,7 +8,7 @@
from git.exc import BadName, InvalidGitRepositoryError, NoSuchPathError
from gitdb.util import hex_to_bin

from readthedocs.builds.constants import EXTERNAL
from readthedocs.builds.constants import BRANCH, EXTERNAL, TAG
from readthedocs.config import ALL
from readthedocs.projects.constants import (
GITHUB_BRAND,
Expand All @@ -35,6 +35,11 @@ class Backend(BaseVCS):
repo_depth = 50

def __init__(self, *args, **kwargs):
# The version_identifier is a Version.identifier value passed from the build process.
# It has a special meaning since it's unfortunately not consistent, you need to be aware of
# exactly how and where to use this.
# See more in the .get_remote_fetch_reference() docstring
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
self.version_identifier = kwargs.pop("version_identifier")
super().__init__(*args, **kwargs)
self.token = kwargs.get('token')
self.repo_url = self._get_clone_url()
Expand All @@ -56,8 +61,26 @@ def set_remote_url(self, url):
return self.run('git', 'remote', 'set-url', 'origin', url)

def update(self):
"""Clone or update the repository."""
"""
Clone or update the repository.

:param identifier: This is the optional identifier for git fetch - a branch or tag name.
PR references are generated automatically for certain Git providers.
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
:return:
"""
super().update()
from readthedocs.projects.models import Feature
benjaoming marked this conversation as resolved.
Show resolved Hide resolved

# TODO: Remove the 'not'
if not self.project.has_feature(Feature.GIT_CLONE_FETCH_CHECKOUT_PATTERN):
# New behavior: Clone is responsible for calling .repo_exists() and
# .make_clean_working_dir()
self.clone_ng()

# TODO: We are still using return values in this function that are legacy.
# This should be either explained or removed.
return self.fetch_ng()

if self.repo_exists():
self.set_remote_url(self.repo_url)
return self.fetch()
Expand All @@ -68,6 +91,104 @@ def update(self):
return self.fetch()
return self.clone()

def get_remote_fetch_reference(self):
"""
Gets a valid remote reference for the identifier.

This method sits on top of a lot of legacy design.
It decides how to treat the incoming ``Version.identifier`` from
knowledge of how the caller (the build process) uses build data.

Version.identifier = a branch name (branches)
Version.identifier = commit (tags)
Version.identifier = commit (external versions)
Version.verbose_name = branch alias, e.g. latest (branches)
Version.verbose_name = tag name (tags)
Version.verbose_name = PR number (external versions)

:return: A reference valid for fetch operation
"""
# Branches have the branch identifier set by the caller who instantiated the
# Git backend
if self.version_type == BRANCH:
return self.version_identifier
# Tags
if self.version_type == TAG and self.verbose_name:
return f"refs/tags/{self.verbose_name}:refs/tags/{self.verbose_name}"

if self.version_type == EXTERNAL:
# TODO: We should be able to resolve this without looking up in oauth registry
git_provider_name = self.project.git_provider_name

# Remote reference for Git providers where pull request builds are supported
if git_provider_name == GITHUB_BRAND:
return GITHUB_PR_PULL_PATTERN.format(id=self.verbose_name)
if self.project.git_provider_name == GITLAB_BRAND:
return GITLAB_MR_PULL_PATTERN.format(id=self.verbose_name)
benjaoming marked this conversation as resolved.
Show resolved Hide resolved

def clone_ng(self):
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
# If the repository is already cloned, we don't do anything.
# This is legacy from when cached the repository on disk,
# but we may want to be able to call .update()
# several times in the same build
if self.repo_exists():
return

# This seems to be required for test cases to work
self.make_clean_working_dir()

# --no-checkout: Makes it explicit what we are doing here. Nothing is checked out
# until it's explicitly done.
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
# --depth 1: Shallow clone, fetch as little data as possible.
cmd = ["git", "clone", "--no-checkout", "--depth", "1", self.repo_url, "."]
try:
# TODO: Explain or remove the return value
code, stdout, stderr = self.run(*cmd)
return code, stdout, stderr
except RepositoryError as exc:
raise RepositoryError(RepositoryError.CLONE_ERROR()) from exc

def fetch_ng(self):
"""Implementation for new clone+fetch+checkout pattern."""

# --force: Likely legacy, it seems to be irrelevant to this usage
# --prune: Likely legacy, we don't expect a previous fetch command to have run
# --prune-tags: Likely legacy, we don't expect a previous fetch command to have run
# --tags: This flag was used in the previous approach such that all tags were fetched
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
# in order to checkout a tag afterwards.
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
# --depth: This flag should be made unnecessary, it's downloading commit data that's
# never used.
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
cmd = [
"git",
"fetch",
"origin",
"--force",
"--prune",
"--prune-tags",
"--depth",
str(self.repo_depth),
]
remote_reference = self.get_remote_fetch_reference()

if remote_reference:
# TODO: We are still fetching the latest 50 commits.
# A PR might have another commit added after the build has started...
cmd.append(remote_reference)
else:
# We are doing a fetch without knowing the remote reference.
# This is expensive, so log the event.
log.info(
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
"Git fetch: Could not decide a remote reference for version. Is it an empty default branch?",
project=getattr(self.project, "id", "unknown"),
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
verbose_name=self.verbose_name,
version_type=self.version_type,
identifier=self.version_identifier,
benjaoming marked this conversation as resolved.
Show resolved Hide resolved
)

# TODO: Explain or remove the return value
code, stdout, stderr = self.run(*cmd)
return code, stdout, stderr

def repo_exists(self):
try:
self._repo
Expand Down Expand Up @@ -178,10 +299,10 @@ def checkout_revision(self, revision):
try:
code, out, err = self.run('git', 'checkout', '--force', revision)
return [code, out, err]
except RepositoryError:
except RepositoryError as exc:
raise RepositoryError(
RepositoryError.FAILED_TO_CHECKOUT.format(revision),
)
) from exc

def clone(self):
"""Clones the repository."""
Expand Down Expand Up @@ -213,8 +334,8 @@ def clone(self):
# )

return code, stdout, stderr
except RepositoryError:
raise RepositoryError(RepositoryError.CLONE_ERROR())
except RepositoryError as exc:
raise RepositoryError(RepositoryError.CLONE_ERROR()) from exc

def lsremote(self, include_tags=True, include_branches=True):
"""
Expand Down
12 changes: 6 additions & 6 deletions readthedocs/vcs_support/base.py
Expand Up @@ -100,16 +100,16 @@ def run(self, *cmd, **kwargs):

try:
build_cmd = self.environment.run(*cmd, **kwargs)
except BuildCancelled:
except BuildCancelled as exc:
# Catch ``BuildCancelled`` here and re raise it. Otherwise, if we
# raise a ``RepositoryError`` then the ``on_failure`` method from
# Celery won't treat this problem as a ``BuildCancelled`` issue.
raise BuildCancelled
except BuildUserError as e:
raise BuildCancelled from exc
except BuildUserError as exc:
# Re raise as RepositoryError to handle it properly from outside
if hasattr(e, "message"):
raise RepositoryError(e.message)
raise RepositoryError
if hasattr(exc, "message"):
raise RepositoryError(exc.message) from exc
raise RepositoryError from exc

# Return a tuple to keep compatibility
return (build_cmd.exit_code, build_cmd.output, build_cmd.error)
Expand Down