Add comprehensive historical alias verification and fix 8 missing aliases #16295
This adds a new set of scripts to verify that ALL historical paths in git history have proper aliases, catching missing aliases that the branch-diff verification misses (pre-reorg moves, multi-hop moves, cross-branch moves).

New scripts:
- verify-all-historical-aliases.py: Scans the complete git history (6 months) for all file moves and checks both frontmatter aliases and S3 redirects
- generate-historical-fixes.py: Parses verification output and generates fix data in JSON format
- apply-historical-fixes.py: Applies fixes by updating frontmatter aliases

Updated README.md with comprehensive documentation of the new workflow, including when to use historical verification vs. branch verification.

Fixes missing aliases from pre-reorg moves that happened on master before the documentation reorganization branch was created.

Related: #16292, #16119

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
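The multi-hop case is the subtle one: if a page moved A → B → C, the current page needs aliases for both A and B, not just its most recent predecessor. Below is a minimal sketch of that bookkeeping. It assumes renames have already been extracted from git in chronological order and converted to URL paths. `historical_paths` and `missing_aliases` are hypothetical names, not functions from the scripts in this PR, and S3 redirects (which the real script also checks) are left out.

```python
def historical_paths(renames: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each current URL to every earlier URL it was moved from.

    Assumes `renames` holds (old_url, new_url) pairs, oldest first.
    """
    history: dict[str, set[str]] = {}
    for old, new in renames:
        # Carry forward everything the old location had already inherited,
        # so A -> B -> C leaves C with both A and B.
        prior = history.pop(old, set())
        prior.add(old)
        history.setdefault(new, set()).update(prior)
    return history


def missing_aliases(
    renames: list[tuple[str, str]],
    aliases_by_url: dict[str, set[str]],
) -> dict[str, list[str]]:
    """Return {current_url: [old URLs that have no alias on that page]}."""
    missing: dict[str, list[str]] = {}
    for current, old_urls in historical_paths(renames).items():
        declared = aliases_by_url.get(current, set())
        gaps = sorted(url for url in old_urls if url not in declared)
        if gaps:
            missing[current] = gaps
    return missing


# Example: /docs/a/ moved to /docs/b/, then to /docs/c/; only /docs/b/ is aliased.
renames = [("/docs/a/", "/docs/b/"), ("/docs/b/", "/docs/c/")]
print(missing_aliases(renames, {"/docs/c/": {"/docs/b/"}}))  # {'/docs/c/': ['/docs/a/']}
```

A branch-diff check that only compares the current branch against master would see just the last hop (B → C), which is exactly how the A alias gets lost.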
The original script missed files that were moved and significantly rewritten in the same commit (git detected them as a delete plus an add instead of a rename, because git's default rename threshold is 50% similarity). This updates the script to use git's `-M30%` flag, which detects renames even when files are only 30% similar.

This now catches the missing OIDC files from the May 2024 documentation revamp, where content was significantly changed during the move:
- /docs/pulumi-cloud/access-management/oidc/provider/aws/
- /docs/pulumi-cloud/access-management/oidc/provider/azure/
- /docs/pulumi-cloud/access-management/oidc/provider/gcp/
- /docs/pulumi-cloud/access-management/oidc/provider/ (index)

Results:
- Before: 82 files with 82 missing aliases
- After: 92 files with 93 missing aliases (+10 files, +11 aliases)

Updated README to document the 30% similarity detection feature and why it's necessary for catching content revamps.

Fixes #16292 (partially: provides tooling to detect the issue)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
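A sketch of how such a rename scan might look, assuming it reads the output of `git log --name-status`, where each detected rename is printed as `R<score>`, the old path, and the new path, separated by tabs. The helper names and the `content/` pathspec are hypothetical. The script in this PR also uses `--follow`, which git accepts only with a single file path, so this version scans all renames under a directory instead.

```python
import re
import subprocess

# One rename record from `git log --name-status`: R<score>\t<old path>\t<new path>
# (git quotes paths with unusual characters; this sketch does not unquote them)
RENAME_RE = re.compile(r"^R(\d+)\t([^\t]+)\t([^\t]+)$")


def git_rename_command(ref: str = "origin/master", threshold: int = 30) -> list[str]:
    """Build a git log invocation listing renames at >= `threshold`% similarity."""
    return [
        "git", "log", "--reverse",  # oldest commits first
        f"-M{threshold}%",          # lower than git's 50% default
        "--diff-filter=R",          # only report renames
        "--name-status",
        "--pretty=format:",         # suppress commit headers
        ref, "--", "content/",
    ]


def parse_renames(output: str) -> list[tuple[int, str, str]]:
    """Extract (similarity, old_path, new_path) tuples from git's output."""
    renames = []
    for line in output.splitlines():
        match = RENAME_RE.match(line)
        if match:
            renames.append((int(match.group(1)), match.group(2), match.group(3)))
    return renames


def renames_from_git(repo_dir: str, ref: str = "origin/master") -> list[tuple[int, str, str]]:
    """Run the scan in `repo_dir` and return the parsed renames."""
    result = subprocess.run(
        git_rename_command(ref), cwd=repo_dir,
        capture_output=True, text=True, check=True,
    )
    return parse_renames(result.stdout)
```

Keeping the similarity score in each tuple makes it easy to flag the low-similarity matches (for example anything under 50%) for manual review, since those are where false-positive "renames" are most likely.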
The script was checking ALL branches (`--all` flag), including development branches that were never merged to master. This caused roughly 70% false positives from paths that only existed during development and were never published.

Example: The CamSoper/content-reorg development branch had paths like /docs/get-started/iac/gcp/, which were restructured to /docs/iac/get-started/gcp/ before merging. These dev-only paths don't need aliases because no external links were ever created to them.

Changed from `--all` to `origin/master` to only check published history.

Results:
- Before: 93 missing aliases (65 false positives from /docs/get-started/*)
- After: 21 missing aliases (all legitimate)
- 77% fewer reported missing aliases (93 → 21)

The 21 remaining are real issues, including the OIDC provider files from the May 2024 documentation revamp.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
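The paths above are site URLs, while git reports repository file paths, so a verifier has to map one to the other before comparing against aliases. A minimal sketch, assuming Hugo's default pretty URLs, a `content/` root, and no `url` or `slug` overrides in frontmatter; `content_path_to_url` is a hypothetical helper, not part of the scripts in this PR.

```python
from pathlib import PurePosixPath


def content_path_to_url(path: str) -> str:
    """Map a Hugo content file to its default (pretty) URL.

    Assumes the content root is `content/`, uglyURLs is off, and the page
    sets no `url` or `slug` in its frontmatter.
    """
    p = PurePosixPath(path)
    if p.parts and p.parts[0] == "content":
        p = PurePosixPath(*p.parts[1:])
    if p.name in ("_index.md", "index.md"):
        p = p.parent  # section/bundle index: the URL is the directory
    else:
        p = p.with_suffix("")  # regular page: drop the .md extension
    rel = p.as_posix()
    return "/" if rel == "." else f"/{rel}/"
```

For example, `content/docs/iac/get-started/gcp/_index.md` maps to `/docs/iac/get-started/gcp/`, the published URL from the example above.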
Applied fixes from the comprehensive historical alias verification, adding 8 missing aliases that the branch-diff verification missed.

Key fixes:
- OIDC provider files (AWS, Azure, GCP, index) from the May 2024 documentation revamp, where files were significantly rewritten (33-50% similarity)
- OIDC client kubernetes-eks file
- Developer portals templates file (including an expected path that never existed)
- Deployments post-automation and API reference files

These were found using git's 30% similarity detection to catch low-similarity renames (files moved and significantly rewritten in the same commit).

Also updated README.md to document the critical importance of reviewing generated fixes for false positives before committing, including common patterns to watch for and how to remove incorrect aliases.

Note: The original report found 21 missing aliases, but 13 were false positives from similarity matching (Doppler/Infisical files, CLI command pseudo-renames, etc.) and were manually removed.

Related: #16292, #16119

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
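Applying a fix amounts to appending the old URL to the page's `aliases` frontmatter list. The sketch below is a hypothetical, dependency-free version that only understands YAML frontmatter delimited by `---` lines with a block-style `aliases:` list; a real tool would more likely round-trip the frontmatter through a YAML parser. It skips aliases that are already present, so re-running it is harmless.

```python
def add_aliases(markdown: str, new_aliases: list[str]) -> str:
    """Append missing entries to a block-style `aliases:` list in YAML frontmatter.

    Only handles frontmatter delimited by `---` lines and aliases written as
        aliases:
          - /old/url/
    Aliases that are already present are skipped, so repeated runs are no-ops.
    """
    lines = markdown.split("\n")
    if lines[0] != "---":
        raise ValueError("expected YAML frontmatter starting with '---'")
    end = lines.index("---", 1)  # closing delimiter; ValueError if absent
    front = lines[1:end]

    key_idx, last_item, existing = None, None, set()
    for i, line in enumerate(front):
        stripped = line.strip()
        if key_idx is None:
            if line.startswith("aliases:"):
                key_idx = last_item = i
        elif stripped.startswith("- "):
            existing.add(stripped[2:].strip())
            last_item = i
        else:
            break  # first line after the aliases list

    # De-duplicate the input while keeping its order, then drop known aliases.
    to_add = [a for a in dict.fromkeys(new_aliases) if a not in existing]
    if not to_add:
        return markdown
    items = [f"  - {a}" for a in to_add]
    if key_idx is None:
        front += ["aliases:", *items]  # no aliases key yet: create one
    else:
        front[last_item + 1:last_item + 1] = items  # append to the existing list
    return "\n".join(["---", *front, *lines[end:]])
```

Because the function returns the input unchanged when nothing is missing, a false-positive alias removed by hand during review stays removed only if it is also dropped from the generated fix data.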
Pull Request Review

This PR adds comprehensive historical alias verification tooling and fixes 8 files with missing aliases. The implementation is solid and addresses real SEO issues. I've identified several minor issues that should be addressed before merging.

Issues Found

1. Grammar error in frontmatter (line 5)
File: content/docs/idp/developer-portals/templates/_index.md:5
The meta_desc field contains a typo:
2. Inconsistent numbered list formatting in documentation (line 220)
File: scripts/alias-verification/README.md:220
Per AGENTS.md, ordered lists must use "1." for all items to minimize diff noise. Lines 220-225 currently use sequential numbering (1, 1, 1, 1, 1, 1) but should consistently use "1." format as specified in the style guide.

3. Missing newline at end of files
Files: All three new Python scripts
Per AGENTS.md absolute prohibitions: "New files: Must always end with a newline." Please verify that these files end with a newline character:
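A check for the newline rule could be as small as the sketch below. The function names are hypothetical, and it treats an empty file as compliant, since an empty file contains no unterminated line.

```python
from pathlib import Path
from typing import Iterable


def has_final_newline(data: bytes) -> bool:
    """True if the content is empty or its last byte is a newline."""
    return not data or data.endswith(b"\n")


def files_missing_newline(paths: Iterable[Path]) -> list[str]:
    """Return the paths whose contents do not end with a newline."""
    return [str(p) for p in paths if not has_final_newline(Path(p).read_bytes())]
```

For example, `files_missing_newline(Path("scripts/alias-verification").glob("*.py"))` would list any of the new scripts that still lack a final newline. Reading bytes rather than text avoids any newline translation that text mode could apply.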
Strengths
Recommendations
The core implementation is sound and the tooling will be valuable for preventing future SEO issues. The issues identified are minor and easily addressed.
Your site preview for commit fd4af6a is ready! 🎉 http://www-testing-pulumi-docs-origin-pr-16295-fd4af6a6.s3-website.us-west-2.amazonaws.com.
@claude Fix the issues you found, please.
Claude finished @CamSoper's task
Fixed the identified issues ✓
All issues from the review have been addressed:
Summary
Fixed:
Already correct (no changes needed):
Note about linting:
Commit: 3b54b9c
Job: https://github.com/pulumi/docs/actions/runs/12346789
Co-authored-by: Cam Soper <CamSoper@users.noreply.github.com>
Your site preview for commit 3b54b9c is ready! 🎉 http://www-testing-pulumi-docs-origin-pr-16295-3b54b9ce.s3-website.us-west-2.amazonaws.com.
Your site preview for commit 1b24435 is ready! 🎉 http://www-testing-pulumi-docs-origin-pr-16295-1b24435c.s3-website.us-west-2.amazonaws.com.
… must always end with a newline
….com/pulumi/docs into add-comprehensive-alias-verification
Your site preview for commit bf4a397 is ready! 🎉 http://www-testing-pulumi-docs-origin-pr-16295-bf4a3978.s3-website.us-west-2.amazonaws.com.
Your site preview for commit f8763bf is ready! 🎉 http://www-testing-pulumi-docs-origin-pr-16295-f8763bf3.s3-website.us-west-2.amazonaws.com.
Summary
Adds a comprehensive historical alias verification toolset that catches missing aliases the branch-diff verification missed, and applies fixes for 8 files with missing historical aliases.
Problem
After the documentation reorganization (PR #16119), we discovered missing aliases that the branch-diff verification script didn't catch:
- `/docs/pulumi-cloud/developer-portals/templates/` missing
- `/docs/pulumi-cloud/access-management/oidc/provider/aws/` (and Azure, GCP) missing

The branch-diff verification script only checks files renamed in the current branch vs. master, but misses:
- Pre-reorg moves
- Multi-hop moves
- Cross-branch moves
Solution
Created a comprehensive historical verification toolset with 3 new scripts:
1. `verify-all-historical-aliases.py`
   - Uses `git log --follow -M30%` to track renames even when content changed significantly
   - Scans only `origin/master` to avoid false positives from development branches
   Key features:

2. `generate-historical-fixes.py`

3. `apply-historical-fixes.py`

Results
Verification Results:
Fixes Applied:
Files Fixed
OIDC Provider Files (May 2024 revamp, caught by 30% similarity)
- `deployments/deployments/oidc/_index.md`
- `deployments/deployments/oidc/aws.md` ← Original issue
- `deployments/deployments/oidc/azure.md`
- `deployments/deployments/oidc/gcp.md`

OIDC Client File
- `administration/access-identity/oidc-client/kubernetes-eks.md`

Developer Portals Templates ← Issue #16292 (Link in Pulumi Cloud broken)
- `idp/developer-portals/templates/_index.md`

Deployments/Reference Files
- `deployments/deployments/using/post-automation.md`
- `reference/cloud-rest-api/deployments/_index.md`

Documentation
Updated `scripts/alias-verification/README.md` with:

Testing
Breaking Changes
None - only adds missing aliases (redirects)
Related Issues
🤖 Generated with Claude Code