Add checks for broken docs urls by carlosabadia · Pull Request #6448 · reflex-dev/reflex

carlosabadia · 2026-05-04T11:36:35Z

No description provided.

codspeed-hq · 2026-05-04T11:40:08Z

Merging this PR will not alter performance

✅ 17 untouched benchmarks
⏩ 2 skipped benchmarks¹

Comparing carlos/docs-links-ci (873b592) with main (70ab07e)

¹ 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

greptile-apps · 2026-05-04T11:46:01Z

Greptile Summary

This PR adds a new GitHub Actions workflow and Python script that validate /docs/* Markdown links against the Reflex app's generated sitemap.xml, catching broken URLs and underscore-in-path violations before they reach production. The implementation is well-structured, correctly strips fragments/query strings before the underscore check, and ships good test coverage — including the fragment-underscore false-positive regression case from prior review.

The LINK_RE regex only handles double-quoted Markdown link titles ("..."), not the single-quoted ('...') or parenthesised ((...)) forms. A link like [text](/docs/foo 'My Title') would have its title text absorbed into the captured URL (raw), so every such link reports a spurious "not found in sitemap" error.

Confidence Score: 4/5

Safe to merge after addressing the single-quoted title regex gap; otherwise the tool works correctly.

One P1 logic issue: single-quoted Markdown link titles are not stripped from the captured URL, causing false-positive "not found in sitemap" errors. All other logic (fragment/query stripping for the underscore check, sitemap prefix normalization, skip-dirs) is correct and well-tested.

docs/app/scripts/check_doc_links.py — specifically the LINK_RE constant on line 25.

Important Files Changed

Filename	Overview
.github/workflows/check_doc_links.yml	New CI workflow that builds the Reflex frontend to generate sitemap.xml, then runs the link-checker script; triggers on changes to docs/**/*.md, the script, and the workflow file itself.
docs/app/scripts/check_doc_links.py	New script scanning .md files for /docs/* links and validating them against sitemap.xml; correctly strips fragment/query before underscore check, handles both /docs-prefixed and non-prefixed sitemaps.
docs/app/tests/test_doc_links.py	Comprehensive unit tests covering valid links, missing links, underscore detection, fragment handling, skip-dirs, and both sitemap prefix styles; includes the fragment-underscore false-positive regression test.
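
To make the "both sitemap prefix styles" point concrete, here is a minimal sketch of how sitemap URLs could be reduced to comparable path keys. It is not the code from the PR: the helper names `normalize`, `strip_docs_prefix`, and `load_paths` and their bodies are assumptions for illustration, and the only thing taken from the summary is the idea that /docs-prefixed and non-prefixed entries should map to the same key.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def normalize(path: str) -> str:
    """Drop a trailing slash so /docs/foo/ and /docs/foo compare equal."""
    return path.rstrip("/") or "/"


def strip_docs_prefix(path: str) -> str:
    """Map /docs/foo and /foo to the same key, /foo (hypothetical helper)."""
    if path == "/docs":
        return "/"
    if path.startswith("/docs/"):
        return path[len("/docs"):]
    return path


def load_paths(sitemap_xml: str) -> set[str]:
    """Parse sitemap XML text into a set of normalized, prefix-free paths."""
    root = ET.fromstring(sitemap_xml)
    paths = set()
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text:
            # Only the path matters; scheme and host are discarded.
            path = urlparse(loc.text.strip()).path
            paths.add(strip_docs_prefix(normalize(path)))
    return paths
```

With this normalization, a sitemap entry for `https://example.com/docs/getting-started/` and one for `https://example.com/getting-started` would both yield the key `/getting-started`.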

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[GitHub Actions Trigger\npull_request / push to main\nwith docs path filter] --> B[Checkout & Setup Build Env\npython 3.14 + uv sync]
    B --> C[uv run reflex export\n--frontend-only --no-zip\nGenerates .web/public/sitemap.xml]
    C --> D[uv run python\nscripts/check_doc_links.py]
    D --> E[load_sitemap_paths\nParse sitemap.xml → set of normalized paths]
    D --> F[iter_md_files\nrglob *.md, skip SKIP_DIRS]
    F --> G[iter_md_links\nMatch LINK_RE on each line]
    G --> H{For each raw URL}
    H --> I{Underscore in path_only?}
    I -- Yes --> J[Append underscore error]
    I -- No --> K{sitemap_key in valid_paths?}
    J --> K
    K -- No --> L[Append not-found error]
    K -- Yes --> M[OK]
    L --> N{Any errors?}
    J --> N
    M --> N
    N -- Yes --> O[Print errors to stderr\nExit 1 → CI fails]
    N -- No --> P[Print success\nExit 0]
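
The per-link branch of the flowchart can be sketched as a single function. This is an illustrative reading of the diagram, not the script itself: the function name `check_link`, the error message wording, and the key format (no /docs prefix, no trailing slash) are assumptions.

```python
from urllib.parse import urlparse


def check_link(raw: str, valid_paths: set[str]) -> list[str]:
    """Return error messages for one raw /docs link (empty list means OK)."""
    errors = []
    # Keep only the path: fragments and query strings may legitimately
    # contain underscores and must not trigger the underscore rule.
    path_only = urlparse(raw).path
    if "_" in path_only:
        errors.append(f"underscore in path: {raw}")
    # Assumed sitemap key format: no /docs prefix, no trailing slash.
    key = path_only.rstrip("/")
    if key == "/docs":
        key = "/"
    elif key.startswith("/docs/"):
        key = key[len("/docs"):]
    # As in the flowchart, the sitemap lookup runs even after an
    # underscore error, so one link can produce both messages.
    if key not in valid_paths:
        errors.append(f"not found in sitemap: {raw}")
    return errors
```

A caller would collect the errors from every link, print them to stderr, and exit with status 1 if the combined list is non-empty.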

Reviews (2): Last reviewed commit: "updates"

masenf · 2026-05-04T17:45:33Z

@greptile-apps re-review

greptile-apps · 2026-05-04T17:49:28Z

+from pathlib import Path
+from urllib.parse import urlparse
+
+LINK_RE = re.compile(r"\]\(\s*(/docs(?=[/)#?\s])[^)]*?)(?:\s+\"[^\"]*\")?\s*\)")


The regex captures single-quoted Markdown link titles into raw. For a link like [text](/docs/foo 'My Title') the optional title group (?:\s+"[^"]*")? requires double quotes, so it won't strip the '...' text. Instead, [^)]*? absorbs the trailing space + title, making raw = "/docs/foo 'My Title'". The subsequent sitemap lookup then tries _strip_docs_prefix(_normalize("/docs/foo 'My Title'")) → /foo 'My Title' which is never in the sitemap, producing a spurious "not found" error for every single-quoted-title link in the docs.

Suggested change

-LINK_RE = re.compile(r"\]\(\s*(/docs(?=[/)#?\s])[^)]*?)(?:\s+\"[^\"]*\")?\s*\)")
+LINK_RE = re.compile(r"\]\(\s*(/docs(?=[/)#?\s])[^)]*?)(?:\s+(?:\"[^\"]*\"|'[^']*'|\([^)]*\)))?\s*\)")
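
A quick way to sanity-check the suggested pattern is to run it over one sample per title form. The sample lines and the helper `extract_docs_urls` below are hypothetical; the pattern is the suggested one, split across two string literals only for readability.

```python
import re

# Suggested pattern: the optional title may be "...", '...' or (...).
LINK_RE = re.compile(
    r"\]\(\s*(/docs(?=[/)#?\s])[^)]*?)"
    r"(?:\s+(?:\"[^\"]*\"|'[^']*'|\([^)]*\)))?\s*\)"
)

# Hypothetical sample lines: four /docs links and two that must not match.
SAMPLES = [
    "[a](/docs/foo)",
    '[b](/docs/foo "My Title")',
    "[c](/docs/foo 'My Title')",
    "[d](/docs/foo (My Title))",
    "[e](/docs-old/foo)",
    "[f](https://example.com/docs/foo)",
]


def extract_docs_urls(lines: list[str]) -> list[str]:
    """Collect the captured URL of every /docs link found in the lines."""
    return [m.group(1) for line in lines for m in LINK_RE.finditer(line)]


print(extract_docs_urls(SAMPLES))
```

All four /docs samples should yield the bare `/docs/foo`, while the `/docs-old` and absolute-URL samples are skipped because of the lookahead and the leading `/docs` requirement.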

Add checks for broken docs urls

2187840

carlosabadia requested review from a team and Alek99 as code owners May 4, 2026 11:36

carlosabadia mentioned this pull request May 4, 2026

ENG-9414: Remove hardcoded docs urls #6395

Closed

carlosabadia added the documentation Improvements or additions to documentation label May 4, 2026

greptile-apps Bot reviewed May 4, 2026


Comment thread docs/app/scripts/check_doc_links.py Outdated

Comment thread docs/app/tests/test_doc_links.py

updates

873b592

greptile-apps Bot reviewed May 4, 2026

carlosabadia wants to merge 2 commits into main from carlos/docs-links-ci


2 participants
