workflow: optimize external link checks #22894

Merged: ti-chi-bot[bot] merged 3 commits into pingcap:master from qiancai:optimize-link-check on May 15, 2026

qiancai (Collaborator) commented on May 15, 2026

What is changed, added or deleted? (Required)

This PR optimizes the lychee link-check workflows as follows:

  • Updates the weekly full-repository link check to focus on external URLs and exclude file:// internal links.
  • Adds shared scripts to extract site href URLs and changed Markdown lines with link candidates.
  • Converts non-HTTP href values such as href="/tidbcloud/tidb-cloud-quickstart" into URLs based on DOCS_SITE_BASE_URL before checking them.
  • Changes the PR link check to scan only added/modified lines that contain link candidates, instead of every changed Markdown file.
  • Keeps lychee cache for the weekly full scan, but caches only successful 2xx responses so failed links are rechecked in later runs.
  • Adds ignore rules for bot-unfriendly or auth-gated external sites reported in recent link-check issues.
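To illustrate the PR-check change (scanning only added or modified lines that contain link candidates), here is a minimal Python sketch that filters a unified diff. It is not the Perl script added in this PR. The `LINK_CANDIDATE` pattern and the `changed_link_lines` helper are hypothetical, and the sketch assumes the diff comes from something like `git diff -U0 <base>...<head>`.

```python
import re
from typing import Iterator, Optional, Tuple

# Hypothetical heuristic for a "link candidate": a Markdown link target "](",
# an href attribute, or a bare http(s) URL. The PR's script may use other rules.
LINK_CANDIDATE = re.compile(r"\]\(|href=|https?://")


def changed_link_lines(diff_text: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, line) for added Markdown lines that contain link candidates.

    A modified line appears in a unified diff as a removed "-" line plus an
    added "+" line, so keeping only "+" lines covers added and modified content.
    """
    current_path: Optional[str] = None
    in_header = False
    for raw in diff_text.splitlines():
        if raw.startswith("diff --git "):
            # A new file section starts; its header runs until the first "@@".
            in_header = True
            current_path = None
            continue
        if in_header:
            if raw.startswith("+++ "):
                target = raw[4:]
                # "+++ /dev/null" means the file was deleted: nothing to check.
                current_path = target[2:] if target.startswith("b/") else None
            elif raw.startswith("@@"):
                in_header = False
            continue
        if raw.startswith("@@"):
            continue  # later hunk headers in the same file
        if (
            current_path is not None
            and current_path.endswith(".md")
            and raw.startswith("+")
            and LINK_CANDIDATE.search(raw)
        ):
            yield current_path, raw[1:]
```

With this kind of filter, an edit that deletes aliases across many files yields no lines, so the link checker has nothing to fetch for it.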

Benefits:

  • Reduces false positives from docs site route links that lychee previously treated as missing local files.
  • Makes PR checks much faster for broad edits that do not add or modify links, such as deleting aliases across many files.
  • Keeps full scheduled scans reasonably fast while still rechecking previously failed links.
  • Centralizes the href extraction logic so the workflow can be reused more easily in the Chinese docs repository by changing DOCS_SITE_BASE_URL.
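The following Python sketch shows one way to convert site `href` values into checkable URLs based on a base URL. It is a hypothetical illustration, not the PR's `extract-site-hrefs.pl`. The `DOCS_SITE_BASE_URL` value is a placeholder, the regex only matches double-quoted `href` attributes, and relative hrefs without a leading `/` are deliberately skipped because they depend on the route of the page that contains them.

```python
import re
from typing import List
from urllib.parse import urljoin, urlsplit

# Placeholder value (assumption): the real workflow sets DOCS_SITE_BASE_URL
# itself, and another docs repository would supply a different value.
DOCS_SITE_BASE_URL = "https://docs.example.com"

# Matches double-quoted href attributes only (a simplifying assumption).
HREF = re.compile(r'href="([^"]+)"')


def site_href_urls(text: str, base_url: str = DOCS_SITE_BASE_URL) -> List[str]:
    """Return checkable URLs for the href values found in text."""
    urls = []
    for href in HREF.findall(text):
        scheme = urlsplit(href).scheme.lower()
        if scheme in ("http", "https"):
            urls.append(href)  # already an external URL
        elif scheme or href.startswith("#"):
            continue  # file:, mailto:, and in-page anchors are not site routes
        elif href.startswith("/"):
            # Site route such as /tidbcloud/tidb-cloud-quickstart:
            # resolve it against the docs site root.
            urls.append(urljoin(base_url, href))
        # Other relative hrefs depend on the containing page and are skipped.
    return urls
```

Because the base URL is a parameter, the same function can serve another site by passing a different value, which mirrors the reuse goal described above.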

Which TiDB version(s) do your changes apply to? (Required)

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)

What is the related PR or file link(s)?

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

ti-chi-bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on May 15, 2026

ti-chi-bot commented on May 15, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added missing-translation-status This PR does not have translation status info. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 15, 2026
@qiancai qiancai marked this pull request as ready for review May 15, 2026 02:44
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 15, 2026

gemini-code-assist (Contributor) left a comment

Code Review

This pull request introduces two Perl scripts, extract-changed-markdown-lines.pl and extract-site-hrefs.pl, designed to automate the extraction and validation of links within documentation. It also updates the .lycheeignore file to include several new external domains and corrects a regex pattern for TiDB operator versions. Feedback on the extract-site-hrefs.pl script identifies critical issues including a global record separator conflict that affects file reading, incorrect resolution of relative links in subdirectories, and the need to ignore the file: protocol to prevent invalid URL generation.
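The relative-link issue raised in this review can be illustrated with a short Python sketch. It assumes a placeholder base URL, and the page route and hrefs in the usage are hypothetical. Prefixing every non-HTTP href with the site root resolves `other-page` on `/tidbcloud/tidb-cloud-quickstart` into the wrong directory, whereas resolving it against the page's own URL (as a browser does) keeps it under `/tidbcloud/`. Hrefs with a non-web scheme such as `file:` return `None`, so they never become invalid site URLs.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit

BASE = "https://docs.example.com"  # placeholder for the site base URL (assumption)


def naive_resolve(href: str) -> str:
    # Buggy pattern: prefix every non-HTTP href with the site root, which
    # ignores the directory of the page that contains the link.
    return BASE + "/" + href.lstrip("/")


def resolve_href(page_route: str, href: str) -> Optional[str]:
    """Resolve href as a browser would from the page served at page_route.

    Returns None for hrefs that must not become web URLs (file:, mailto:, ...).
    """
    scheme = urlsplit(href).scheme.lower()
    if scheme in ("http", "https"):
        return href  # already absolute
    if scheme:
        return None  # file:, mailto:, and other non-web schemes
    page_url = urljoin(BASE, page_route)
    # RFC 3986 reference resolution handles "other-page", "../x", and "/x".
    return urljoin(page_url, href)
```

`urljoin` replaces the last path segment of the page URL and applies `..` segments, so the result matches what a reader's browser would request.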

Review thread on .github/scripts/extract-site-hrefs.pl (outdated)
qiancai changed the title from "optimize external link checks" to "workflow: optimize external link checks" on May 15, 2026

Oreoxmt (Collaborator) commented on May 15, 2026

/cc @Oreoxmt

ti-chi-bot requested a review from Oreoxmt on May 15, 2026 02:58
qiancai self-assigned this on May 15, 2026
qiancai added the translation/no-need label (no need to translate this PR) on May 15, 2026
ti-chi-bot removed the missing-translation-status label on May 15, 2026
qiancai added the needs-cherry-pick-release-8.5 label (should cherry-pick this PR to the release-8.5 branch) on May 15, 2026
ti-chi-bot added the needs-1-more-lgtm label (indicates a PR needs 1 more LGTM) on May 15, 2026

ti-chi-bot commented on May 15, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-15 06:41:51.143239874 +0000 UTC m=+420679.676019183: ☑️ agreed by Oreoxmt.

qiancai (Collaborator, Author) commented on May 15, 2026

/approve

qiancai added the lgtm label on May 15, 2026
ti-chi-bot added the approved label on May 15, 2026
qiancai added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) and removed the approved label on May 15, 2026
ti-chi-bot added the approved label on May 15, 2026

ti-chi-bot commented on May 15, 2026

@qiancai: You cannot manually add or delete the cherry-pick approval state labels; only I and the trusted members have permission to do so. You can approve it on the internal platform.

In response to removing label named approved.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Review thread on .github/workflows/link-fail-fast.yaml (outdated)

qiancai (Collaborator, Author) commented on May 15, 2026

/approve

qiancai removed the do-not-merge/hold label on May 15, 2026

ti-chi-bot commented on May 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qiancai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

qiancai added and then removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on May 15, 2026
ti-chi-bot merged commit 145d861 into pingcap:master on May 15, 2026
12 checks passed

ti-chi-bot (Member) commented:

In response to a cherrypick label: new pull request created to branch release-8.5: #22896.
But this PR has conflicts, please resolve them!

Labels

  • approved
  • lgtm
  • needs-1-more-lgtm: Indicates a PR needs 1 more LGTM.
  • needs-cherry-pick-release-8.5: Should cherry-pick this PR to the release-8.5 branch.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
  • translation/no-need: No need to translate this PR.

3 participants