Skip to content

feat(build): port automated ms.date freshness checking from azure-nvidia-robotics-reference-architecture #968

@rezatnoMsirhC

Description

@rezatnoMsirhC

Summary

Port the comprehensive automated documentation freshness checking infrastructure implemented in Azure-Samples/azure-nvidia-robotics-reference-architecture to hve-core for broader reuse across Microsoft projects. The system detects and tracks stale markdown files based on ms.date frontmatter timestamps using PowerShell validation scripts, GitHub Actions workflows, and automated issue creation.

Context

Successfully implemented and merged in Azure-Samples/azure-nvidia-robotics-reference-architecture#448 to address issue #164. The implementation completed 57 commits with comprehensive testing, creating a production-ready solution with PowerShell validation scripts, Pester test coverage, GitHub Actions workflows, and automated issue tracking.

Workflow test results: Successfully validated in workflow run 22880060875, creating 10 GitHub issues for stale documentation files.

Implementation Overview

The architecture uses a dual-orchestrator pattern separating weekly comprehensive scans (hard-fail mode with GitHub notifications) from PR-scoped validation (soft-fail warnings). Artifact-based data flow connects the validation workflow to the issue automation workflow, creating GitHub issues for each stale file with idempotent duplicate detection via automation markers.

Key Components

  1. PowerShell Validation Script (Invoke-MsDateFreshnessCheck.ps1)

    • Parses YAML frontmatter from markdown files
    • Calculates age in days since ms.date timestamp
    • Generates dual outputs: JSON artifact + markdown summary
    • Supports changed-files-only mode via Git merge-base comparison
    • Configurable staleness threshold (default: 90 days)
  2. GitHub Actions Workflows

    • msdate-freshness-check.yml — Reusable workflow with configurable threshold, soft-fail mode, and changed-files-only filtering
    • weekly-validation.yml — Scheduled Monday 9 AM UTC scans with hard-fail mode triggering notifications
    • pr-validation.yml — Immediate developer feedback during code review without blocking merges
    • create-stale-docs-issues.yml — Issue automation creating/updating GitHub issues using automation markers
  3. Pester Test Suite

    • File discovery and exclusion patterns
    • YAML frontmatter parsing with date format validation
    • Report generation (JSON + markdown output formats)
    • Integration tests covering full processing pipeline
    • CI annotation verification using mock patterns
    • Environment isolation with conditional execution based on PowerShell-Yaml availability

Execution Contexts

  1. Weekly scans: Scheduled for Monday 9 AM UTC, checks all markdown files, uses hard-fail mode to trigger GitHub email notifications when stale files exceed threshold
  2. PR checks: Runs automatically on pull requests, checks only modified files, uses soft-fail mode to warn without blocking merges
  3. Manual dispatch: Available via GitHub Actions UI workflow_dispatch on both orchestrator workflows

Technical Learnings

PowerShell Parameter Binding Issues (30+ commits of debugging)

The commit history documents extensive iterative debugging of PowerShell 7.4.13 array parameter binding issues in GitHub Actions:

  • Root cause: PowerShell 7.4.13 bug on ubuntu-latest runners with [string[]] array defaults
  • Approaches tested: hashtable splatting, array splatting, string conversion, Invoke-Expression, direct parameter binding
  • Final solution: Moved parameter defaults outside param block to runtime assignment with [AllowNull()] and [AllowEmptyCollection()] attributes

This debugging pattern demonstrates environment-specific edge cases invisible in local development, highlighting the value of CI-based validation.

Files to Port

From PR #448 (9 files changed, 1185 additions):

Core Implementation

  • scripts/linting/Invoke-MsDateFreshnessCheck.ps1 — PowerShell validation script
  • scripts/tests/linting/Invoke-MsDateFreshnessCheck.Tests.ps1 — Pester test suite
  • .github/workflows/msdate-freshness-check.yml — Reusable workflow
  • .github/workflows/weekly-validation.yml — Weekly scheduled scan orchestrator
  • .github/workflows/pr-validation.yml — PR validation orchestrator (integration point)
  • .github/workflows/create-stale-docs-issues.yml — Issue automation workflow

Documentation

  • docs/contributing/documentation-maintenance.md — Updated with implementation details and remediation workflow

Supporting Configuration

  • .markdownlint-cli2.jsonc — Exclusion of logs/** directory from linting
  • .cspell/general-technical.txt — Added msdate to custom dictionary

Acceptance Criteria for hve-core Port

  • All PowerShell scripts ported to hve-core with repository-agnostic paths
  • Pester test suite passing in hve-core CI environment
  • GitHub Actions workflows generalized for reuse across Microsoft projects
  • Documentation created explaining integration steps for consuming repositories
  • Issue automation workflow templates provided with clear customization points
  • PowerShell parameter binding workarounds documented for future maintenance
  • Validation results artifact retention configurable (default: 30 days)
  • Staleness threshold configurable per repository (default: 90 days)
  • Test coverage matches or exceeds original implementation (comprehensive Pester suite)
  • Example integration provided (similar to hve-core instruction file patterns)

Validation Metrics from Original Implementation

Documentation Impact

  • No documentation changes needed
  • Documentation should be created explaining:
    • Integration steps for consuming repositories
    • Workflow customization parameters
    • PowerShell environment requirements
    • Issue automation marker conventions
    • Troubleshooting PowerShell parameter binding issues

Priority Justification

This infrastructure improves compliance with the OpenSSF documentation_current criterion through automated staleness detection. The manual approach of relying on contributor memory is error-prone. Successfully tested implementation with production validation results.

References

Additional Notes

The porting effort should focus on generalizing repository-specific paths and configuration while preserving the dual-orchestrator pattern and artifact-based data flow. The PowerShell parameter binding workarounds are critical for ubuntu-latest GitHub Actions runners and should be documented for future maintenance.

Issue automation markers (<!-- automation:stale-docs -->) enable idempotent duplicate detection across workflow runs. This pattern could be valuable for other hve-core automation scenarios.

Metadata

Metadata

Assignees

Labels

ci/cddocumentationImprovements or additions to documentationenhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions