scripts: refactor tag parsing and code comment filtering #22825

Merged: ti-chi-bot[bot] merged 2 commits into pingcap:master from lilin90:update-tag-script on Apr 28, 2026

Conversation

@lilin90 lilin90 (Member) commented Apr 28, 2026

What is changed, added or deleted? (Required)

  • Improve HTML/tag detection and content filtering for Markdown files.
  • Add a TAG_PATTERN regex and use it for scanning; simplify stack_tag closing logic to only pop matching top-of-stack entries.
  • Replace ad-hoc frontmatter handling with a single regex, add filter_html_comments, and refactor filter_backticks into dedicated filter_fenced_code_blocks and filter_inline_code_spans helpers that preserve line counts and abort on unclosed fences.
  • Remove the old tag_is_wrapped logic and update the main loop to use the new filters and patterns to avoid false positives inside code blocks, inline code, and HTML comments.
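A minimal sketch of the "preserve line counts" idea from the list above, so that any tag reported later still points at the right line. The function names `filter_html_comments` and `filter_inline_code_spans` come from the description; their bodies, the regexes, and the helpers `filter_frontmatter`, `_keep_newlines`, and `clean` are assumptions for illustration and may not match scripts/check-tags.py.

```python
import re

# All three regexes are illustrative guesses, not the script's real patterns.
FRONTMATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# A run of N backticks closed by a run of exactly N backticks on the same line.
INLINE_CODE = re.compile(r"(?<!`)(`+)(?!`)(.+?)(?<!`)\1(?!`)")


def _keep_newlines(match):
    """Replace a match with only its newlines so line numbers stay stable."""
    return "\n" * match.group(0).count("\n")


def filter_frontmatter(text):  # hypothetical helper name
    # Only a block at the very start of the file counts as frontmatter.
    return FRONTMATTER.sub(_keep_newlines, text, count=1)


def filter_html_comments(text):
    return HTML_COMMENT.sub(_keep_newlines, text)


def filter_inline_code_spans(text):
    return INLINE_CODE.sub(_keep_newlines, text)


def clean(text):  # hypothetical helper name
    """Apply the filters in order; the number of lines never changes."""
    for step in (filter_frontmatter, filter_html_comments, filter_inline_code_spans):
        text = step(text)
    return text
```

Because every filter replaces a match with exactly the newlines it contained, a tag found on line N of the cleaned text is also on line N of the original file.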

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

Improve HTML/tag detection and content filtering for Markdown files. Add a TAG_PATTERN regex and use it for scanning; simplify stack_tag closing logic to only pop matching top-of-stack entries. Replace ad-hoc frontmatter handling with a single regex, add filter_html_comments, and refactor filter_backticks into dedicated filter_fenced_code_blocks and filter_inline_code_spans helpers that preserve line counts and abort on unclosed fences. Remove the old tag_is_wrapped logic and update the main loop to use the new filters and patterns to avoid false positives inside code blocks, inline code, and HTML comments.
@lilin90 lilin90 requested a review from qiancai April 28, 2026 07:46
@lilin90 lilin90 self-assigned this Apr 28, 2026
@ti-chi-bot ti-chi-bot Bot added the missing-translation-status and size/L labels Apr 28, 2026
@lilin90 lilin90 added the translation/no-need label Apr 28, 2026
@ti-chi-bot ti-chi-bot Bot removed the missing-translation-status label Apr 28, 2026
@gemini-code-assist gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request refactors scripts/check-tags.py to improve the robustness of HTML tag checking in Markdown files. It replaces manual parsing with regex-based filtering for frontmatter and HTML comments and introduces a more sophisticated mechanism for handling fenced code blocks and inline code spans. Feedback includes ensuring closing code fences match the length of opening fences according to CommonMark specifications, allowing multi-line HTML tags in the regex pattern, and refining error messages to accurately identify the specific fence character used in unclosed blocks.
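The fence-length point in the review can be sketched as follows. In CommonMark, a closing fence must use the same character as the opening fence and be at least as long. The body below is an illustrative assumption, not the code in scripts/check-tags.py, and it ignores some CommonMark details (for example, restrictions on info strings). Note that the follow-up commit on this PR describes relaxing the closing check to preserve historical behavior, so the merged script may be less strict than this sketch.

```python
import re

# Opening fence: up to 3 spaces of indent, then 3+ backticks or 3+ tildes.
FENCE_OPEN = re.compile(r"^ {0,3}(`{3,}|~{3,})")
# Closing candidate: a fence run followed only by spaces or tabs.
FENCE_CLOSE = re.compile(r"^ {0,3}(`+|~+)[ \t]*$")


def filter_fenced_code_blocks(lines):
    """Blank out fenced code blocks, keeping one output line per input line.

    Raises ValueError if a fence is still open at the end of the input.
    """
    out = []
    fence = None  # (char, length, line_no) of the currently open fence
    for line_no, line in enumerate(lines, 1):
        if fence is None:
            m = FENCE_OPEN.match(line)
            if m:
                marker = m.group(1)
                fence = (marker[0], len(marker), line_no)
                out.append("")
            else:
                out.append(line)
            continue
        char, length, _ = fence
        m = FENCE_CLOSE.match(line)
        # Close only on the same character, repeated at least as many times.
        if m and m.group(1)[0] == char and len(m.group(1)) >= length:
            fence = None
        out.append("")
    if fence is not None:
        char, length, line_no = fence
        raise ValueError(
            f"unclosed code fence {char * length!r} opened on line {line_no}"
        )
    return out
```

With this rule, a three-backtick line inside a four-backtick block is treated as content rather than as the end of the block, and the error message names the actual fence marker that was left open.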

Comment thread scripts/check-tags.py
Comment thread scripts/check-tags.py Outdated
Comment thread scripts/check-tags.py Outdated
Make tag parsing more robust: tighten TAG_PATTERN to not span newlines in attributes, add TAG_NAME_PATTERN, and refactor stack_tag to extract tag names, correctly handle self-closing tags, and simplify open/close logic. Also relax fenced code block closing detection to accept any fence character repeated at least three times (preserving historical behavior) and update the error message to show the actual fence marker. Minor cleanup and retained debug traces.
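A rough sketch of the tag handling this commit message describes. The names `TAG_PATTERN`, `TAG_NAME_PATTERN`, and `stack_tag` come from the message; the regex bodies, the way unmatched closing tags are recorded, and the `find_unbalanced_tags` wrapper are assumptions made for illustration only.

```python
import re

# Illustrative guesses. Attributes may not contain a newline, so a tag
# split across lines is simply not matched.
TAG_PATTERN = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(?:[ \t][^<>\n]*)?/?>")
TAG_NAME_PATTERN = re.compile(r"</?([A-Za-z][A-Za-z0-9-]*)")


def stack_tag(tag, stack):
    """Update the stack of open tag names for one matched tag."""
    name = TAG_NAME_PATTERN.match(tag).group(1)
    if tag.endswith("/>"):
        return  # self-closing tags neither open nor close anything
    if tag.startswith("</"):
        if stack and stack[-1] == name:
            stack.pop()  # only a matching top-of-stack entry is popped
        else:
            # Assumption: keep the stray closing tag so it can be reported.
            stack.append("/" + name)
    else:
        stack.append(name)


def find_unbalanced_tags(text):  # hypothetical wrapper
    """Return the tags left on the stack after scanning the text."""
    stack = []
    for match in TAG_PATTERN.finditer(text):
        stack_tag(match.group(0), stack)
    return stack
```

An empty result means every opening tag was closed in order. This sketch does not special-case void elements such as `<br>` written without a trailing slash.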
@lilin90 lilin90 mentioned this pull request Apr 28, 2026
13 tasks
@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm label Apr 28, 2026
ti-chi-bot Bot commented Apr 28, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-28 08:15:02.050421515 +0000 UTC m=+2672107.255781562: ☑️ agreed by qiancai.

@lilin90 lilin90 added the lgtm label Apr 28, 2026
lilin90 (Member, Author) commented Apr 28, 2026

/approve

ti-chi-bot Bot commented Apr 28, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lilin90

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the approved label Apr 28, 2026
@ti-chi-bot ti-chi-bot Bot merged commit a16adf2 into pingcap:master Apr 28, 2026
12 checks passed

Labels

approved, lgtm, needs-1-more-lgtm, size/L, translation/no-need

2 participants