@CamSoper
Contributor

Summary

Adds a comprehensive historical alias verification toolset that catches missing aliases the branch-diff verification missed, and applies fixes for 8 files with missing historical aliases.

Problem

After the documentation reorganization (PR #16119), we discovered missing aliases that the branch-diff verification script didn't catch:

  1. Issue #16292 ("Link in Pulumi Cloud broken"): alias for /docs/pulumi-cloud/developer-portals/templates/ is missing
  2. OIDC files: aliases for /docs/pulumi-cloud/access-management/oidc/provider/aws/ (and the Azure and GCP equivalents) are missing

The branch-diff verification script only checks files renamed in the current branch vs master, but misses:

  • Pre-reorg moves: Files moved on master before the reorg branch was created
  • Multi-hop moves: Files moved multiple times (A→B→C where only B was aliased)
  • Low-similarity renames: Files significantly rewritten during moves (DELETE+ADD operations that are actually renames)
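
The multi-hop case can be illustrated with a small sketch. It assumes the rename history is already available as (old_path, new_path) pairs; the function name and the sample paths are hypothetical and are not taken from the scripts in this PR.

```python
def historical_paths(current, renames):
    """Walk rename edges backwards from `current`, collecting every earlier path.

    `renames` is a list of (old_path, new_path) pairs.
    """
    # Map each destination to the sources that were renamed into it.
    sources = {}
    for old, new in renames:
        sources.setdefault(new, []).append(old)

    seen = []
    stack = [current]
    while stack:
        path = stack.pop()
        for old in sources.get(path, []):
            if old not in seen and old != current:
                seen.append(old)
                stack.append(old)
    return seen


# A -> B -> C: the file now at C needs aliases for both B and A.
renames = [("docs/a.md", "docs/b.md"), ("docs/b.md", "docs/c.md")]
print(historical_paths("docs/c.md", renames))  # ['docs/b.md', 'docs/a.md']
```

A check that only looks at the most recent rename would report B and never see A, which is the gap described above.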

Solution

Created a comprehensive historical verification toolset with 3 new scripts:

1. verify-all-historical-aliases.py

  • Scans the git history of every file on master, limited to the past 6 months
  • Uses git log --follow -M30% to track renames even when content changed significantly
  • Checks both frontmatter aliases AND S3 redirect files
  • Only checks origin/master to avoid false positives from development branches
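
A minimal sketch of how the rename detection described above could be driven from Python. The git flags (`--follow`, `-M30%`, `--name-status`, `--format=`, `--since`) are standard `git log` options; the function names, the way the 6-month limit is passed, and the sample paths are assumptions for illustration and may differ from what verify-all-historical-aliases.py actually does.

```python
import subprocess


def parse_renames(name_status_output):
    """Extract (similarity, old_path, new_path) from `git log --name-status` text."""
    renames = []
    for line in name_status_output.splitlines():
        parts = line.split("\t")
        # Rename entries look like "R087<TAB>old/path<TAB>new/path".
        if len(parts) == 3 and parts[0].startswith("R") and parts[0][1:].isdigit():
            renames.append((int(parts[0][1:]), parts[1], parts[2]))
    return renames


def rename_history(path, rev="origin/master", since="6 months ago"):
    """Run git for one file; must be called from inside the repository."""
    cmd = [
        "git", "log", "--follow", "-M30%", "--name-status",
        "--format=", f"--since={since}", rev, "--", path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_renames(result.stdout)


# Hypothetical sample output: one low-similarity rename and one plain modification.
sample = "R033\tcontent/docs/old/aws.md\tcontent/docs/new/aws.md\nM\tcontent/docs/new/aws.md\n"
print(parse_renames(sample))
```

Keeping the parser separate from the git call makes it possible to test the parsing on captured output without a repository.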

Key features:

  • 30% similarity detection: Catches files moved and rewritten in same commit (May 2024 OIDC revamp: 33% similar)
  • Master-only checking: Prevents false positives from dev-only paths
  • S3 redirect awareness: Doesn't flag aliases already in redirect files
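
The coverage check behind these features might look roughly like the following. This is a sketch under two assumptions that the PR text does not confirm: that a markdown path maps to its URL by the usual Hugo convention (content/docs/x/y.md publishes at /docs/x/y/), and that the S3 redirect files have already been reduced to a collection of source URLs. All names and sample paths here are hypothetical.

```python
from pathlib import PurePosixPath


def path_to_url(md_path):
    """Map content/docs/x/_index.md -> /docs/x/ and content/docs/x/y.md -> /docs/x/y/."""
    p = PurePosixPath(md_path)
    parts = list(p.parts)
    if parts and parts[0] == "content":
        parts = parts[1:]
    if p.name in ("_index.md", "index.md"):
        parts = parts[:-1]
    else:
        parts[-1] = p.stem
    return "/" + "/".join(parts) + "/" if parts else "/"


def missing_aliases(old_paths, frontmatter_aliases, redirect_sources):
    """Return old URLs covered by neither frontmatter aliases nor redirects."""
    covered = {a.rstrip("/") for a in frontmatter_aliases}
    covered |= {r.rstrip("/") for r in redirect_sources}
    return [u for u in map(path_to_url, old_paths) if u.rstrip("/") not in covered]


print(missing_aliases(
    ["content/docs/a/old.md", "content/docs/b/old.md"],
    frontmatter_aliases=["/docs/a/old/"],
    redirect_sources=["/docs/c/"],
))  # ['/docs/b/old/']
```

Comparing with trailing slashes stripped avoids flagging an alias that differs from the historical URL only by a final "/".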

2. generate-historical-fixes.py

  • Parses verification output
  • Generates JSON with current + missing aliases for each file
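
As an illustration of this step, the sketch below turns report lines into a JSON structure holding current and missing aliases per file. The report line format ("MISSING <file> <alias>") and the JSON key names are invented for the example; the real output of the verification script and the real schema are not shown in this PR description.

```python
import json


def build_fixes(report_lines, current_aliases):
    """Group missing aliases by file, alongside the aliases the file already has."""
    fixes = {}
    for line in report_lines:
        fields = line.split()
        # Only lines of the assumed form "MISSING <file> <alias>" are used.
        if len(fields) != 3 or fields[0] != "MISSING":
            continue
        _, md_file, alias = fields
        entry = fixes.setdefault(md_file, {
            "current_aliases": list(current_aliases.get(md_file, [])),
            "missing_aliases": [],
        })
        if alias not in entry["missing_aliases"]:
            entry["missing_aliases"].append(alias)
    return fixes


report = [
    "MISSING content/docs/example/page.md /docs/old-example/page/",
    "OK content/docs/example/other.md",
]
existing = {"content/docs/example/page.md": ["/docs/older/page/"]}
print(json.dumps(build_fixes(report, existing), indent=2))
```

Writing the fixes to an intermediate file gives a reviewer a place to delete false positives before anything touches the content.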

3. apply-historical-fixes.py

  • Reads fix data
  • Updates frontmatter in place
  • Prompts for confirmation
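
One way to update frontmatter in place with a confirmation prompt is sketched below. It edits the YAML frontmatter as text rather than through a YAML library, and it assumes the alias list uses simple "- /path/" items directly under an "aliases:" key. The function names are hypothetical and this is not the code in apply-historical-fixes.py.

```python
def add_aliases(text, new_aliases):
    """Return `text` with `new_aliases` appended to the frontmatter `aliases:` list."""
    lines = text.split("\n")
    if lines[0].strip() != "---":
        raise ValueError("no YAML frontmatter found")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("unterminated YAML frontmatter")

    key = next((i for i in range(1, end) if lines[i].rstrip() == "aliases:"), None)
    if key is None:
        # No alias list yet: create one just before the closing delimiter.
        lines[end:end] = ["aliases:"]
        key, end = end, end + 1

    # Walk over the existing "- item" lines that follow the key.
    last = key
    while last + 1 < end and lines[last + 1].lstrip().startswith("- "):
        last += 1
    existing = {lines[i].lstrip()[2:].strip().strip("'\"") for i in range(key + 1, last + 1)}
    first = lines[key + 1] if last > key else ""
    indent = first[: len(first) - len(first.lstrip())] if last > key else "  "
    additions = [f"{indent}- {a}" for a in new_aliases if a not in existing]
    lines[last + 1:last + 1] = additions
    return "\n".join(lines)


def apply_fix(path, new_aliases, ask=input):
    """Rewrite one file after the user confirms; returns True if it was changed."""
    with open(path, encoding="utf-8") as fh:
        original = fh.read()
    updated = add_aliases(original, new_aliases)
    if updated == original:
        return False
    if ask(f"Update aliases in {path}? [y/N] ").strip().lower() != "y":
        return False
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(updated)
    return True


doc = "---\ntitle: Example\naliases:\n  - /docs/old/\n---\nBody\n"
print(add_aliases(doc, ["/docs/older/", "/docs/old/"]))
```

Aliases that are already present are skipped, so running the step twice leaves the file unchanged.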

Results

Verification Results:

  • Total markdown files scanned: 699
  • Files with historical moves: 227
  • Files with complete aliases: ✓ 206
  • Files with missing aliases: ❌ 21

Fixes Applied:

  • 21 missing aliases identified
  • 13 false positives removed (Doppler/Infisical, CLI command pseudo-renames)
  • 8 legitimate aliases added across 8 files

Files Fixed

  1. OIDC Provider Files (May 2024 revamp - caught by 30% similarity)

    • deployments/deployments/oidc/_index.md
    • deployments/deployments/oidc/aws.md (original issue)
    • deployments/deployments/oidc/azure.md
    • deployments/deployments/oidc/gcp.md
  2. OIDC Client File

    • administration/access-identity/oidc-client/kubernetes-eks.md
  3. Developer Portals Templates (#16292, "Link in Pulumi Cloud broken")

    • idp/developer-portals/templates/_index.md
    • Added expected path that never existed in git (typo fix)
  4. Deployments/Reference Files

    • deployments/deployments/using/post-automation.md
    • reference/cloud-rest-api/deployments/_index.md

Documentation

Updated scripts/alias-verification/README.md with:

  • Complete workflow for comprehensive historical verification
  • Explanation of 30% similarity detection
  • Master-only checking rationale
  • Critical section on reviewing for false positives with examples
  • Comparison table: branch verification vs historical verification

Testing

  • ✅ Script successfully identified all missing aliases including the reported issues
  • ✅ 30% similarity detection caught the OIDC files (33-50% similar after revamp)
  • ✅ Master-only checking eliminated 65 dev-only false positives, about 70% of the 93 aliases flagged before the change
  • ✅ All fixes applied successfully
  • ✅ Linting passed
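
The figures quoted in this PR and its commit messages can be cross-checked with a few lines of arithmetic. The numbers below are copied from the text; only the variable names are new.

```python
files_with_moves = 227
files_complete = 206
files_missing = files_with_moves - files_complete      # files still lacking an alias

flagged_aliases = 21
false_positives = 13
legitimate = flagged_aliases - false_positives          # aliases actually added

flagged_all_branches = 93    # flagged while scanning with --all
dev_only = 65                # flagged paths that only existed on dev branches
dev_only_share = round(100 * dev_only / flagged_all_branches)
overall_drop = round(100 * (flagged_all_branches - flagged_aliases) / flagged_all_branches)

print(files_missing, legitimate)      # 21 8
print(dev_only_share, overall_drop)   # 70 77
```

The 70% and 77% figures therefore measure different things: the share of dev-only paths among the 93 flagged aliases, and the overall drop in flagged aliases from 93 to 21.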

Breaking Changes

None - only adds missing aliases (redirects)

Related Issues

🤖 Generated with Claude Code

CamSoper and others added 4 commits October 16, 2025 16:08
This adds a new set of scripts to verify ALL historical paths in git history
have proper aliases, catching missing aliases that the branch-diff verification
misses (pre-reorg moves, multi-hop moves, cross-branch moves).

New scripts:
- verify-all-historical-aliases.py: Scans complete git history (6 months)
  for all file moves, checks both frontmatter aliases and S3 redirects
- generate-historical-fixes.py: Parses verification output and generates
  fix data in JSON format
- apply-historical-fixes.py: Applies fixes by updating frontmatter aliases

Updated README.md with comprehensive documentation of the new workflow,
including when to use historical verification vs branch verification.

Fixes missing aliases from pre-reorg moves that happened on master before
the documentation reorganization branch was created.

Related: #16292, #16119

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

The original script missed files that were moved and significantly rewritten
in the same commit (recorded as DELETE+ADD instead of RENAME). This updates
the script to use git's -M30% flag, which detects renames even when files
are only 30% similar.

This now catches the missing OIDC files from the May 2024 documentation
revamp where content was significantly changed during the move:
- /docs/pulumi-cloud/access-management/oidc/provider/aws/
- /docs/pulumi-cloud/access-management/oidc/provider/azure/
- /docs/pulumi-cloud/access-management/oidc/provider/gcp/
- /docs/pulumi-cloud/access-management/oidc/provider/ (index)

Results:
- Before: 82 files with 82 missing aliases
- After: 92 files with 93 missing aliases (+10 files, +11 aliases)

Updated README to document the 30% similarity detection feature and
why it's necessary for catching content revamps.

Fixes #16292 (partially - provides tooling to detect the issue)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

The script was checking ALL branches (--all flag) including development
branches that were never merged to master. This caused 70% false positives
from paths that only existed during development and were never published.

Example: The CamSoper/content-reorg development branch had paths like
/docs/get-started/iac/gcp/ which were restructured to /docs/iac/get-started/gcp/
before merging. These dev-only paths don't need aliases because no external
links were ever created to them.

Changed from `--all` to `origin/master` to only check published history.

Results:
- Before: 93 missing aliases (65 false positives from /docs/get-started/*)
- After: 21 missing aliases (all legitimate)
- 77% reduction in reported missing aliases (93 → 21)

The 21 remaining are real issues including the OIDC provider files from
the May 2024 documentation revamp.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Applied fixes from the comprehensive historical alias verification, adding
8 missing aliases that the branch-diff verification missed:

Key fixes:
- OIDC provider files (AWS, Azure, GCP, index) from May 2024 documentation
  revamp where files were significantly rewritten (33-50% similarity)
- OIDC client kubernetes-eks file
- Developer portals templates file (including expected path that never existed)
- Deployments post-automation and API reference files

These were found using git's 30% similarity detection to catch low-similarity
renames (files moved and significantly rewritten in the same commit).

Also updated README.md to document the critical importance of reviewing
generated fixes for false positives before committing, including common
patterns to watch for and how to remove incorrect aliases.

Note: The original report found 21 missing aliases, but 13 were false
positives from similarity matching (Doppler/Infisical files, CLI command
pseudo-renames, etc.) and were manually removed.

Related: #16292, #16119

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@claude
Contributor

claude bot commented Oct 16, 2025

Pull Request Review

This PR adds comprehensive historical alias verification tooling and fixes 8 files with missing aliases. The implementation is solid and addresses real SEO issues. I've identified several minor issues that should be addressed before merging.


Issues Found

1. Grammar error in frontmatter (line 5)

File: content/docs/idp/developer-portals/templates/_index.md:5

The meta_desc field contains a typo:

  • Current: "Lean how to build template projects"
  • Should be: "Learn how to build template projects"

2. Inconsistent numbered list formatting in documentation (line 220)

File: scripts/alias-verification/README.md:220

Per AGENTS.md, ordered lists must use "1." for all items to minimize diff noise. Lines 220-225 currently use sequential numbering (1, 1, 1, 1, 1, 1) but should consistently use "1." format as specified in the style guide.


3. Missing newline at end of files

Files: All three new Python scripts

Per AGENTS.md absolute prohibitions: "New files: Must always end with a newline." Please verify that these files end with a newline character:

  • scripts/alias-verification/verify-all-historical-aliases.py
  • scripts/alias-verification/generate-historical-fixes.py
  • scripts/alias-verification/apply-historical-fixes.py
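
A quick way to verify the trailing-newline rule is sketched below. The helper names are hypothetical, and the demo uses temporary files rather than the three scripts listed above so that it runs anywhere.

```python
import os
import tempfile


def ends_with_newline(path):
    """True if the file's last byte is a newline (an empty file counts as False)."""
    with open(path, "rb") as fh:
        data = fh.read()
    return data.endswith(b"\n")


def files_missing_newline(paths):
    """Return the subset of `paths` that do not end with a newline."""
    return [p for p in paths if not ends_with_newline(p)]


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.py")
    bad = os.path.join(tmp, "bad.py")
    with open(good, "wb") as fh:
        fh.write(b"print('ok')\n")
    with open(bad, "wb") as fh:
        fh.write(b"print('ok')")
    print([os.path.basename(p) for p in files_missing_newline([good, bad])])  # ['bad.py']
```

In the repository, the same function could be pointed at the three script paths from the list above.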

Strengths

  • Excellent documentation: The README.md additions clearly explain when and how to use the historical verification tools
  • Well-structured code: Python scripts follow good practices with proper error handling and user confirmations
  • Smart design choices:
    • 30% similarity detection catches content revamps
    • Master-only checking prevents false positives
    • S3 redirect awareness
  • Critical fixes: Addresses real SEO issues (Issue Link in Pulumi Cloud broken #16292) and catches missing aliases the branch-diff tool missed
  • Good .gitignore hygiene: Properly excludes generated files
  • Proper alias formatting: All updated markdown files use consistent YAML list formatting with proper indentation

Recommendations

  1. Fix the typo in meta_desc field (content/docs/idp/developer-portals/templates/_index.md:5)
  2. Verify that all new Python files end with newlines
  3. Consider running "make lint" to catch any additional formatting issues

The core implementation is sound and the tooling will be valuable for preventing future SEO issues. The issues identified are minor and easily addressed.

@CamSoper
Contributor Author

@claude Fix the issues you found, please.

@claude
Contributor

claude bot commented Oct 16, 2025

Claude finished @CamSoper's task


Fixed the identified issues ✓

All issues from the review have been addressed:

  • Read affected files to understand current state
  • Fix grammar error: "Lean" → "Learn" in meta_desc
  • Verify Python files for missing newlines (already correct)
  • Check numbered list formatting in README.md (already correct)
  • Commit and push changes

Summary

Fixed:

  • ✓ Grammar error in content/docs/idp/developer-portals/templates/_index.md:5: Changed "Lean how to build" to "Learn how to build"

Already correct (no changes needed):

  • ✓ All 3 Python files (verify-all-historical-aliases.py, generate-historical-fixes.py, apply-historical-fixes.py) already end with newlines
  • ✓ Numbered list in scripts/alias-verification/README.md already uses correct "1." format as specified in AGENTS.md

Note about linting:
I don't have permission to run make lint, but the change is minimal (single word fix) and follows the repository's style guide. The PR author can run linting if needed.

Commit: 3b54b9c


Job: https://github.com/pulumi/docs/actions/runs/12346789
Branch: add-comprehensive-alias-verification

Co-authored-by: Cam Soper <CamSoper@users.noreply.github.com>
@CamSoper CamSoper merged commit 8949e3a into master Oct 16, 2025
8 checks passed
@CamSoper CamSoper deleted the add-comprehensive-alias-verification branch October 16, 2025 17:42