scripts: refactor tag parsing and code comment filtering #22825
ti-chi-bot[bot] merged 2 commits into pingcap:master
Improve HTML/tag detection and content filtering for Markdown files. Add a TAG_PATTERN regex and use it for scanning; simplify stack_tag closing logic to only pop matching top-of-stack entries. Replace ad-hoc frontmatter handling with a single regex, add filter_html_comments, and refactor filter_backticks into dedicated filter_fenced_code_blocks and filter_inline_code_spans helpers that preserve line counts and abort on unclosed fences. Remove the old tag_is_wrapped logic and update the main loop to use the new filters and patterns to avoid false positives inside code blocks, inline code, and HTML comments.
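The filtering approach described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the function names filter_html_comments, filter_fenced_code_blocks, and filter_inline_code_spans come from the description, but filter_frontmatter, filter_all, every regex, and the error text are assumptions. Each filter replaces the removed text with the same number of newlines, so line numbers reported later still match the original file.

```python
import re

# Hypothetical patterns; the regexes in scripts/check-tags.py may differ.
FRONTMATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
FENCE = re.compile(r"\s*(`{3,}|~{3,})")
INLINE_CODE = re.compile(r"(`+)[^\n]*?\1")


def _blank(match):
    """Keep only the newlines of the matched text so line counts stay stable."""
    return "\n" * match.group(0).count("\n")


def filter_frontmatter(text):
    return FRONTMATTER.sub(_blank, text, count=1)


def filter_html_comments(text):
    return HTML_COMMENT.sub(_blank, text)


def filter_fenced_code_blocks(text):
    """Blank out fenced blocks line by line; raise on an unclosed fence."""
    out = []
    fence_char = None  # "`" or "~" while inside a block, else None
    opened_at = 0
    for lineno, line in enumerate(text.split("\n"), start=1):
        m = FENCE.match(line)
        if fence_char is None:
            if m:
                fence_char, opened_at = m.group(1)[0], lineno
                out.append("")
            else:
                out.append(line)
        else:
            # Assumption: any run of 3+ of the opening character closes the block.
            if m and m.group(1)[0] == fence_char:
                fence_char = None
            out.append("")
    if fence_char is not None:
        raise ValueError(
            f"unclosed {fence_char * 3} fence opened on line {opened_at}"
        )
    return "\n".join(out)


def filter_inline_code_spans(text):
    # Inline spans are matched within a single line, so no newlines are lost.
    return INLINE_CODE.sub("", text)


def filter_all(text):
    text = filter_frontmatter(text)
    text = filter_html_comments(text)
    text = filter_fenced_code_blocks(text)
    return filter_inline_code_spans(text)
```

Running the filters before tag scanning means tags inside frontmatter, comments, fenced blocks, and inline code never reach the tag stack, which is how the false positives mentioned above would be avoided in this sketch.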
Code Review
This pull request refactors scripts/check-tags.py to improve the robustness of HTML tag checking in Markdown files. It replaces manual parsing with regex-based filtering for frontmatter and HTML comments and introduces a more sophisticated mechanism for handling fenced code blocks and inline code spans. Feedback includes ensuring closing code fences match the length of opening fences according to CommonMark specifications, allowing multi-line HTML tags in the regex pattern, and refining error messages to accurately identify the specific fence character used in unclosed blocks.
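The fence-length feedback refers to the CommonMark rule that a closing fence must use the same character as the opening fence, be at least as long, and carry no info string. A small sketch of that check follows; the names CLOSE_FENCE and is_closing_fence are hypothetical and do not come from the script.

```python
import re

# Up to three spaces of indentation, a run of backticks or tildes,
# then only trailing spaces or tabs (a closing fence has no info string).
CLOSE_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})[ \t]*$")


def is_closing_fence(line, opening):
    """Return True if `line` closes a block opened by the marker `opening`.

    `opening` is the opening fence marker only, for example "````".
    """
    m = CLOSE_FENCE.match(line)
    if not m:
        return False
    marker = m.group(1)
    # Same fence character, and at least as many of them as the opener.
    return marker[0] == opening[0] and len(marker) >= len(opening)
```

Under this rule a four-backtick block can contain a three-backtick line without being closed by it, which is what allows Markdown examples to be nested inside a longer fence.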
Make tag parsing more robust: tighten TAG_PATTERN to not span newlines in attributes, add TAG_NAME_PATTERN, and refactor stack_tag to extract tag names, correctly handle self-closing tags, and simplify open/close logic. Also relax fenced code block closing detection to accept any fence character repeated at least three times (preserving historical behavior) and update the error message to show the actual fence marker. Minor cleanup and retained debug traces.
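The tag handling described in this commit could look roughly like the sketch below. The names TAG_PATTERN, TAG_NAME_PATTERN, and stack_tag are taken from the description, but the regexes, the function signature, and the unclosed_tags helper are assumptions made for illustration. In particular, silently ignoring a closing tag that does not match the top of the stack is one possible reading of "only pop matching top-of-stack entries".

```python
import re

# Hypothetical patterns. Attributes may not contain a newline, so a stray
# "<" on one line cannot pair up with a ">" several lines later.
TAG_PATTERN = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(?:[ \t][^<>\n]*)?/?>")
TAG_NAME_PATTERN = re.compile(r"</?([A-Za-z][A-Za-z0-9-]*)")


def stack_tag(tag, stack):
    """Update `stack` (a list of open tag names) for one matched tag."""
    name = TAG_NAME_PATTERN.match(tag).group(1)
    if tag.endswith("/>"):
        return  # self-closing tags such as <br/> never stay open
    if tag.startswith("</"):
        # Only pop when the closer matches the most recently opened tag.
        if stack and stack[-1] == name:
            stack.pop()
        return
    stack.append(name)


def unclosed_tags(text):
    """Return the names of tags still open after scanning `text`."""
    stack = []
    for match in TAG_PATTERN.finditer(text):
        stack_tag(match.group(0), stack)
    return stack
```

For example, scanning a file that opens a SimpleTab element, then opens and closes a div, leaves only SimpleTab on the stack, which a checker could then report as unclosed.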
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: lilin90