Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Git backend: Improve heuristic checking if GITLAB_MR_PULL_PATTERN is needed #10397

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
31 changes: 27 additions & 4 deletions readthedocs/vcs_support/backends/git.py
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,28 @@ def use_shallow_clone(self):
from readthedocs.projects.models import Feature
return not self.project.has_feature(Feature.DONT_SHALLOW_CLONE)

@staticmethod
def _is_trigger_integration_gitlab(project):
# If there is only one integration associated with the project, we
# can unambiguously conclude that the external build was triggered by
# a merge request.
#
# https://github.com/readthedocs/readthedocs.org/issues/9464

# To avoid error like the following, the value corresponding to
# Integration.GITLAB_WEBHOOK is hardcoded.
#
# ImportError: cannot import name 'Project' from partially initialized module
# 'readthedocs.projects.models' (most likely due to a circular import)
_gitlab_webhook = "gitlab_webhook" # Integration.GITLAB_WEBHOOK
Comment on lines +164 to +166
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a circular import issue, we could import this constant from inside this static method to solve this problem.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I will update accordingly


if (
project.integrations.count() == 1
and project.integrations.first().integration_type == _gitlab_webhook
):
Comment on lines +168 to +171
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure to understand the context behind checking for .count() == 1 here? Isn't there another way to be 100% sure it was triggered by a merge request? Can't we check the body of the original request to decide this or similar?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have access to this webhook data during the build process?

The current behavior will guess the provider type by pattern matching the repo URL with supported oauth integrations.

It does seem a bit strange to have to look up which webhook integrations exist in order to resolve the Git provider's type in each and every build. But I'm not sure if there's currently much else available for the build process.

It could seem like a good approach to store the type of Git provider either when the webhook is added or when it's called. Or have a manually configured Project setting.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review 🙏

To provide some background, the current approach was added to minimize the amount of changes, avoid breaking existing workflow and slightly improve the support for self-hosted GitLab instance.

What is needed ?

Add support for "triggering context"

To properly support this, we would probably need to support querying details about the triggering context.

This information would then then be used to:

  • customize constants like GITLAB_URL, GITLAB_COMMIT_URL, GITLAB_MERGE_REQUEST_COMMIT_URL and GITLAB_MERGE_REQUEST_URL currently set in projects/constants.py 1
  • query the source of the trigger in vcs_support/backends/git.py 2

Support for updating build status

To support updating status on instance different from the official GitLab.com and GitHub.com, I think we following could be done:

  • the Integration model could be updated to have an additional field called api_token3 to be used in in oauth.services.base.(create_session|get_session) 4 itself called from oauth.services.gitlab.send_build_status 5
  • the integration details UI could then be updated to include a new "API Token" field

Footnotes

  1. https://github.com/readthedocs/readthedocs.org/blob/9.13.2/readthedocs/projects/constants.py#L368-L383

  2. https://github.com/readthedocs/readthedocs.org/blob/9.13.2/readthedocs/vcs_support/backends/git.py#L153-L172

  3. https://github.com/readthedocs/readthedocs.org/blob/9.13.2/readthedocs/integrations/models.py#L264-L327

  4. https://github.com/readthedocs/readthedocs.org/blob/9.13.2/readthedocs/oauth/services/base.py#L77-L116

  5. https://github.com/readthedocs/readthedocs.org/blob/9.13.2/readthedocs/oauth/services/gitlab.py#L522-L587

Copy link
Author

@jcfr jcfr Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I 'm not sure to understand the context behind checking for .count() == 1 here?

Since it seems there is no way to have details about the triggering context1, the proposed patch is done by assuming that:

  • if there is only one integration
  • and it the integration version type is EXTERNAL
  • and if the integration type is Integration.GITLAB_WEBHOOK
  • and if it is NOT coming from the gitlab.com provider

In that specific case, it is then "safe" to consider it was triggered by a merge request coming from a GitLab hosted instance.

Hence the function _is_trigger_integration_gitlab in this PR

Isn't there another way to be 100% sure it was triggered by a merge request?

Given the current implementation, I didn't find a way.

Can't we check the body of the original request to decide this or similar?

This could indeed be done where the webhook is handled2 by looking for X-Gitlab-Instance (e.g X-Gitlab-Instance: https://gitlab.kitware.com).

Possible patch to readthedocs/api/v2/views/integrations.py for adding get_external_provider_url()
diff --git a/readthedocs/api/v2/views/integrations.py b/readthedocs/api/v2/views/integrations.py
index 3ff940dd9..c610b7e71 100644
--- a/readthedocs/api/v2/views/integrations.py
+++ b/readthedocs/api/v2/views/integrations.py
@@ -50,6 +50,7 @@ GITLAB_MERGE_REQUEST_MERGE = "merge"
 GITLAB_MERGE_REQUEST_OPEN = "open"
 GITLAB_MERGE_REQUEST_REOPEN = "reopen"
 GITLAB_MERGE_REQUEST_UPDATE = "update"
+GITLAB_INSTANCE_HEADER= "HTTP_X_GITLAB_INSTANCE"
 GITLAB_TOKEN_HEADER = "HTTP_X_GITLAB_TOKEN"
 GITLAB_PUSH = "push"
 GITLAB_NULL_HASH = "0" * 40
@@ -138,6 +139,10 @@ class WebhookMixin:
         """Handle webhook payload."""
         raise NotImplementedError
 
+    def get_external_provider_url(self):
+        """Get External provider URL from payload header."""
+        raise NotImplementedError
+
     def get_external_version_data(self):
         """Get External Version data from payload."""
         raise NotImplementedError
@@ -237,6 +242,7 @@ class WebhookMixin:
         :param project: Project instance
         :type project: readthedocs.projects.models.Project
         """
+        provider_url = self.get_external_provider_url()
         version_data = self.get_external_version_data()
         # create or get external version object using `verbose_name`.
         external_version = get_or_create_external_version(
@@ -569,6 +575,10 @@ class GitLabWebhookView(WebhookMixin, APIView):
             return False
         return token == secret
 
+    def get_external_provider_url(self):
+        """Get External provider URL from payload header."""
+        return self.request.META.get(GITLAB_INSTANCE_HEADER, "https://gitlab.com")
+
     def get_external_version_data(self):
         """Get commit SHA and merge request number from payload."""
         try:

What is not clear is how to pass the all the way down to the git backend. Hence the discussion above to support passing down the triggering context details.

Footnotes

  1. That information does not seem to be associated with any of the objects currently available from vcs_support.backends.git.fetch() (see https://github.com/readthedocs/readthedocs.org/blob/9.13.2/readthedocs/vcs_support/base.py#L57-L75 and https://github.com/readthedocs/readthedocs.org/blob/9.13.2/readthedocs/vcs_support/backends/git.py)

  2. https://github.com/readthedocs/readthedocs.org/blob/9.13.2/readthedocs/api/v2/views/integrations.py.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, it would be great to have some tests cases for this.

Ditto. Not being familiar with the code base, I would appreciate some guidance (e.g similar to test to use an example)

return True
return False

def fetch(self):
# --force lets us checkout branches that are not fast-forwarded
# https://github.com/readthedocs/readthedocs.org/issues/6097
Expand All @@ -166,10 +188,11 @@ def fetch(self):
GITHUB_PR_PULL_PATTERN.format(id=self.verbose_name)
)

if self.project.git_provider_name == GITLAB_BRAND:
cmd.append(
GITLAB_MR_PULL_PATTERN.format(id=self.verbose_name)
)
if (
self.project.git_provider_name == GITLAB_BRAND
or self._is_trigger_integration_gitlab(self.project)
):
cmd.append(GITLAB_MR_PULL_PATTERN.format(id=self.verbose_name))

code, stdout, stderr = self.run(*cmd)
return code, stdout, stderr
Expand Down