kb-validate: declarative excludes or warnings for optional/external markdown link targets #188

@omar-diop

Story Statement

As a knowledge base maintainer or developer running pair-cli / kb-validate on a KB-only repository
I want a declarative way (config file, CLI flags, and/or glob rules) to treat relative markdown links that point outside the KB tree as optional or external
So that the same KB can be validated in isolation and, when placed next to the full codebase, we do not have to disable broad checks (e.g. skipping entire registries) and lose validation on everything else

Where: CLI kb-validate, resolving internal markdown links under the directory passed to --path.

Epic Context

Parent Epic: TBD — link when known
Status: Refined
Priority: P1 (Should-Have)

Status Workflow

  • Refined: Story is detailed, estimated, and ready for development
  • In Progress: Story is actively being developed
  • Done: Story delivered and accepted

Acceptance Criteria

Functional Requirements

Given-When-Then Format:

  1. Given a pair.config.json (or config.json) with a link_validation.optional_link_patterns array containing glob patterns (e.g. ["../../apps/**", "../../packages/**"])
    When kb-validate encounters a relative internal link whose resolved path matches one of those patterns AND the target file does not exist on disk
    Then the link is reported as a warning (not an error), the overall validation run does NOT fail, and the CLI output clearly labels it as "optional link (pattern-matched)"

  2. Given a --optional-link-patterns CLI flag with comma-separated globs (e.g. --optional-link-patterns "../../apps/**,../../packages/**")
    When kb-validate encounters a missing internal link matching one of those globs
    Then behavior is identical to AC-1 (warning, not error), and CLI-provided patterns are merged with any config-file patterns

  3. Given a relative internal link that does NOT match any optional pattern
    When the target is missing under the KB root
    Then current behavior is preserved: the link is reported as an error and the run fails

  4. Given optional patterns configured (config and/or CLI) and --strict mode enabled
    When a pattern-matched link target is missing
    Then the link is treated as an error (strict overrides optional), preserving strict mode's guarantee of zero tolerance

  5. Given a link target that matches an optional pattern but the file DOES exist on disk
    When validation runs
    Then the link is treated as valid (no warning, no error) — the pattern only affects missing targets

  6. Given no link_validation config section and no --optional-link-patterns CLI flag
    When validation runs
    Then behavior is fully backward-compatible: all missing internal links are errors

Business Rules

  • Config-file patterns and CLI patterns are merged (union), not overridden
  • Patterns use glob syntax consistent with existing include patterns in pair.config.json
  • --strict mode always overrides optional treatment (errors, not warnings)
  • The link_validation config key is a new top-level key in pair.config.json / config.json, coexisting with asset_registries
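
The acceptance criteria and business rules above reduce to a small decision table. A minimal sketch, assuming hypothetical names (`classifyLink`, `LinkFacts`, `LinkOutcome`) that are not taken from the codebase:

```typescript
// Hypothetical names for illustration; not the actual link-checker API.
type LinkOutcome = 'valid' | 'warning' | 'error'

interface LinkFacts {
  targetExists: boolean // does the resolved target exist on disk?
  matchesOptionalPattern: boolean // does the path match any optional glob?
  strict: boolean // was --strict passed?
}

function classifyLink(facts: LinkFacts): LinkOutcome {
  // AC-5: patterns only affect missing targets.
  if (facts.targetExists) return 'valid'
  // AC-1 / AC-2: missing + pattern match is downgraded, unless strict (AC-4).
  if (facts.matchesOptionalPattern && !facts.strict) return 'warning'
  // AC-3 / AC-4 / AC-6: everything else keeps the current failing behavior.
  return 'error'
}
```

With no patterns configured, `matchesOptionalPattern` is always false, so every missing link stays an error and the default behavior is unchanged.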

Edge Cases and Error Handling

  • Overlapping patterns: When several patterns match the same link, a single warning is produced (no duplicates). Matching uses first-match semantics, which must be documented.
  • Invalid glob syntax: CLI warns about malformed patterns and skips them (does not crash the run)
  • Empty pattern list: Treated as "no optional patterns" — fully backward-compatible
  • Anchor-only links: Continue to be skipped (existing behavior unchanged)
  • External (http/https) links: Not affected by optional patterns (existing behavior unchanged)
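
One way to handle the invalid-glob, empty-list, and overlapping-pattern cases above is to compile patterns once, up front. The sketch below is only an illustration: `compilePatterns`, `matchesAny`, and the injected `compile` callback are hypothetical, and `compile` stands in for whichever glob library is eventually chosen.

```typescript
// Hypothetical helpers; `compile` stands in for the chosen glob library.
type Matcher = (path: string) => boolean

interface CompiledPatterns {
  matchers: Matcher[]
  skipped: string[] // malformed patterns, to be reported as CLI warnings
}

function compilePatterns(
  patterns: readonly string[],
  compile: (pattern: string) => Matcher,
): CompiledPatterns {
  const matchers: Matcher[] = []
  const skipped: string[] = []
  for (const pattern of patterns) {
    try {
      matchers.push(compile(pattern))
    } catch {
      // A malformed pattern is skipped; it must not crash the run.
      skipped.push(pattern)
    }
  }
  return { matchers, skipped }
}

// `some` stops at the first matching pattern, so a link matched by several
// overlapping patterns still yields a single result (one warning).
function matchesAny(path: string, matchers: readonly Matcher[]): boolean {
  return matchers.some(matches => matches(path))
}
```

An empty pattern list produces no matchers, so `matchesAny` returns false for every path and all missing links remain errors.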

Definition of Done Checklist

Development Completion

  • All 6 acceptance criteria implemented and verified
  • Code follows project coding standards (ESLint, Prettier, TypeScript strict)
  • Code review completed and approved
  • Unit tests written and passing for link-checker optional pattern logic
  • Integration tests: handler passes optional patterns through to link-checker
  • CLI help text updated for --optional-link-patterns flag
  • kb-validate metadata updated with new option
  • Security: no path traversal beyond baseDir (glob matching on resolved paths)

Quality Assurance

  • All acceptance criteria tested with dedicated test cases
  • Edge cases tested (overlapping patterns, invalid globs, empty list, strict override)
  • Existing link-checker tests still pass (backward compatibility)
  • pnpm quality-gate passes for @pair/pair-cli
  • Smoke test scenario updated or extended

Deployment and Release

  • Feature merged to main
  • CLI command reference docs updated (apps/website/content/docs/reference/cli/commands.mdx)
  • Changeset entry created for version bump

Story Sizing and Sprint Readiness

Refined Story Points

Final Story Points: M (3 points)
Confidence Level: High
Sizing Justification: The link-checker is well-isolated (~200 LOC). Changes touch 4 files (link-checker, parser, handler, metadata) plus tests. Glob matching is a known pattern in the codebase (include fields). At most one small glob library (picomatch or minimatch) is added; simple prefix matching would need no new dependency. Config loader already supports arbitrary top-level keys via [key: string]: unknown.

Sprint Capacity Validation

Sprint Fit Assessment: Yes, fits in a single sprint
Development Time Estimate: 2-3 days
Testing Time Estimate: 1 day
Total Effort Assessment: Fits within sprint capacity

Story Splitting Recommendations

Not needed — story is M-sized and self-contained.

Dependencies and Coordination

Story Dependencies

Prerequisite Stories: None
Dependent Stories: None identified
Shared Components: link-checker.ts (owned by this story), config/loader.ts (read-only usage of existing merge logic)

Team Coordination

Development Roles Involved:

  • Backend/CLI: Link-checker logic, parser, handler, config loading
  • QA: Unit + integration tests, smoke test update

External Dependencies

Third-party Integrations: None
Infrastructure Requirements: None
Compliance Requirements: None

Validation and Testing Strategy

Acceptance Testing Approach

Testing Methods: Unit tests (vitest) for link-checker with InMemoryFileSystemService; integration test for handler wiring; existing smoke test scenario extended
Test Data Requirements: In-memory filesystem fixtures with missing links matching/not matching patterns
Environment Requirements: Standard dev environment (pnpm test --filter @pair/pair-cli)

User Validation

User Feedback Collection: Validate with KB-only repo scenario (knowledge-hub dataset validated standalone)
Success Metrics: Zero false-positive link errors on KB-only checkout with patterns configured; all real broken links still caught
Rollback Plan: Remove link_validation config key and --optional-link-patterns flag; backward-compatible by default

Notes and Additional Context

Refinement Session Insights: The Config interface already has [key: string]: unknown index signature, so adding link_validation requires no breaking change. The mergeConfigs function in loader.ts merges non-asset_registries keys via shallow copy, which works for this use case. Glob matching can use minimatch (already common in Node.js) or a simple prefix/startsWith check for the MVP — decision deferred to implementation.

Future Considerations: If more link validation options emerge (e.g., ignore-by-source-file, severity levels), the link_validation config section can grow. Consider extracting a LinkValidationConfig type.

Technical Analysis

Implementation Approach

Technical Strategy: Add a link_validation.optional_link_patterns config section to pair.config.json and a --optional-link-patterns CLI flag. The link-checker's validateInternalLink function gains pattern-matching logic: when a link target is missing and matches an optional pattern, it emits a warning instead of an error. Strict mode (--strict) overrides this to emit errors.

Key Components:

  • link-checker.ts: Core logic change — validateInternalLink checks optional patterns before classifying missing links
  • parser.ts: New optionalLinkPatterns field in KbValidateCommandConfig
  • handler.ts: Passes optional patterns from config + CLI to validateLinks
  • metadata.ts: New --optional-link-patterns option definition
  • config/loader.ts: No changes needed (existing merge handles arbitrary keys)

Data Flow:

  1. CLI parses --optional-link-patterns flag into string[]
  2. Handler loads config, extracts link_validation.optional_link_patterns (if present)
  3. Handler merges CLI patterns with config patterns (union)
  4. Merged patterns passed to validateLinks via extended LinkValidationOptions
  5. validateInternalLink resolves the link path, checks existence, and if missing, checks patterns
  6. Pattern match + missing = warning (or error if strict); no match + missing = error

Integration Points: pair.config.json / config.json (existing config loading), CLI Commander options

Technical Requirements

  • Glob matching for optional patterns (minimatch or picomatch — both zero-dep-ish, well-known)
  • Backward compatible: no link_validation key = existing behavior
  • --strict overrides optional treatment
  • Pattern matching on the resolved relative path from baseDir (not raw link text)

Technical Risks and Mitigation

| Risk | Impact | Probability | Mitigation Strategy |
| --- | --- | --- | --- |
| Glob library adds bundle size | Low | Low | Use picomatch (~3KB) or simple prefix matching for MVP |
| Pattern matching on wrong path form (relative vs absolute) | Medium | Medium | Match on path relative to baseDir; document and test both ./ and ../ patterns |
| Config merge drops nested objects | Low | Low | link_validation is a simple object; existing shallow merge handles it; add test |
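
The path-form risk is easier to test with a concrete helper. The sketch below is dependency-free and purely illustrative: `relativeFromBase` and `normalisePattern` are hypothetical names, and both inputs are assumed to be absolute, '/'-separated, already-normalised paths. In the actual CLI, Node's `path.relative` would be the more likely choice.

```typescript
// Hypothetical helper: POSIX-style relative path from baseDir to target.
// Assumes both inputs are absolute, '/'-separated and already normalised.
function relativeFromBase(baseDir: string, target: string): string {
  const from = baseDir.split('/').filter(part => part.length > 0)
  const to = target.split('/').filter(part => part.length > 0)
  // Count the leading segments shared by both paths.
  let common = 0
  while (common < from.length && common < to.length && from[common] === to[common]) {
    common++
  }
  // One '..' per remaining baseDir segment, then the rest of the target.
  const ups = from.slice(common).map(() => '..')
  return ups.concat(to.slice(common)).join('/')
}

// Strip leading './' so './docs/**' and 'docs/**' behave the same.
function normalisePattern(pattern: string): string {
  return pattern.replace(/^(\.\/)+/, '')
}
```

For example, with baseDir `/repo/kb/.pair` a link resolving to `/repo/apps/cli/README.md` becomes `../../apps/cli/README.md`, which is the form a pattern such as `../../apps/**` is written against.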

Spike Requirements

Required Spikes: None — all technologies are well-understood in the codebase.


Refinement Completed By: AI agent (pair-process-refine-story)
Refinement Date: 2026-04-11
Review and Approval: Pending product owner review


Task Breakdown

  • T-1: Add optionalLinkPatterns to parser and CLI metadata
  • T-2: Extend LinkValidationOptions and validateInternalLink with optional pattern matching
  • T-3: Wire optional patterns through handler (config + CLI merge)
  • T-4: Unit tests for link-checker optional pattern logic
  • T-5: Update CLI docs and smoke test

Dependency Graph

T-1 ──┐
      ├── T-3 ── T-5
T-2 ──┤
      └── T-4

T-1 and T-2 are independent (parser vs link-checker). T-3 depends on both (wires them together). T-4 depends on T-2 (tests the link-checker). T-5 depends on T-3 (docs reflect final behavior).

AC Coverage

| AC | Tasks |
| --- | --- |
| AC-1 (config optional patterns → warning) | T-2, T-3, T-4 |
| AC-2 (CLI flag → merge with config) | T-1, T-3, T-4 |
| AC-3 (unmatched missing link → error) | T-2, T-4 |
| AC-4 (strict overrides optional) | T-2, T-4 |
| AC-5 (existing target + pattern → valid) | T-2, T-4 |
| AC-6 (no config → backward compatible) | T-2, T-3, T-4 |

T-1: Add optionalLinkPatterns to parser and CLI metadata

Priority: P0 | Estimated Hours: 2h | Bounded Context: CLI command parsing

Summary: Add --optional-link-patterns CLI flag to kb-validate command parser and metadata.

Type: Feature Implementation

Description: Extend KbValidateCommandConfig with an optionalLinkPatterns?: string[] field. Add the --optional-link-patterns <patterns> option to Commander via metadata.ts. Parse the comma-separated string into an array in parser.ts, following the same pattern as --skip-registries.

Acceptance Criteria:

  • Primary deliverable: optionalLinkPatterns field in config type, parsed from CLI
  • Quality standard: TypeScript strict, follows existing parser conventions
  • Integration requirement: Field available for handler to consume
  • Verification method: Unit test in parser.test.ts

Technical Requirements:

  • Functionality: Comma-separated glob patterns parsed into string[]
  • Performance: N/A (parsing is trivial)
  • Security: No user input executed; patterns are data only

Implementation Approach:

  • Technical Design: Mirror skipRegistries pattern in parser
  • Bounded Context & Modules: CLI command parsing
  • Files to Modify/Create:
    • apps/pair-cli/src/commands/kb-validate/parser.ts — add field + parsing
    • apps/pair-cli/src/commands/kb-validate/metadata.ts — add option definition
    • apps/pair-cli/src/commands/kb-validate/parser.test.ts — add test cases
  • Technical Standards: code-design-guidelines

Dependencies:

  • Technical: None
  • Tasks: None (independent)

Implementation Steps:

  1. Add optionalLinkPatterns?: string[] to KbValidateCommandConfig interface
  2. Add optionalLinkPatterns?: string to ParseKbValidateOptions
  3. Parse comma-separated string into array (mirror parseSkipRegistriesOption pattern)
  4. Add --optional-link-patterns <patterns> to kbValidateMetadata.options
  5. Add unit tests for parsing (empty, single, multiple, whitespace trimming)
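
The parsing step could look roughly like the following. This is a sketch only: the function name `parseOptionalLinkPatternsOption` is hypothetical, and the real `parseSkipRegistriesOption` it is meant to mirror is not shown in this issue.

```typescript
// Hypothetical function name, modelled on the description of how
// --skip-registries is parsed; the real helper may differ.
function parseOptionalLinkPatternsOption(raw: string | undefined): string[] | undefined {
  if (raw === undefined) return undefined
  const patterns = raw
    .split(',')
    .map(part => part.trim()) // tolerate "a, b" as well as "a,b"
    .filter(part => part.length > 0) // drop empty entries such as "a,,b"
  // An empty or whitespace-only flag value is treated as "flag not given".
  return patterns.length > 0 ? patterns : undefined
}
```

One consequence of splitting on commas is that brace-style globs containing commas (for example `{apps,packages}/**`) cannot be passed through the CLI flag in this form; they would have to go in the config file or be written as separate patterns.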

Testing Strategy:

  • Unit Tests: Parser correctly splits comma-separated patterns; empty string yields undefined
  • Integration Tests: N/A (parser is pure function)

Notes: Follow exact same pattern as --skip-registries for consistency.


T-2: Extend LinkValidationOptions and validateInternalLink with optional pattern matching

Priority: P0 | Estimated Hours: 4h | Bounded Context: Link validation engine

Summary: Add optional pattern matching to the link-checker so missing links matching patterns produce warnings instead of errors. Strict mode overrides.

Type: Feature Implementation

Description: Extend LinkValidationOptions with optionalLinkPatterns?: string[]. In validateInternalLink (or the caller validateFileLinks), when a link target is missing and its resolved path (relative to baseDir) matches an optional pattern, push to warnings instead of errors. When strict is true, always push to errors regardless of pattern match. Add a glob matching utility (picomatch or minimatch).

Acceptance Criteria:

  • Primary deliverable: Missing links matching patterns are warnings; unmatched are errors; strict overrides
  • Quality standard: Pure function, no side effects, glob matching on relative path from baseDir
  • Integration requirement: LinkValidationOptions extended; LinkValidationResult unchanged (already has warnings array)
  • Verification method: Unit tests in link-checker.test.ts

Technical Requirements:

  • Functionality: Glob pattern matching on resolved relative paths; strict override; union semantics
  • Performance: Pattern matching is O(patterns * links) — acceptable for KB sizes
  • Security: Pattern matching on resolved paths only; no path traversal beyond baseDir

Implementation Approach:

  • Technical Design: After validateInternalLink returns { valid: false }, check if the link's resolved path (relative to baseDir) matches any optional pattern. Use picomatch for glob matching. If match and not strict: warning. If match and strict: error. If no match: error (existing behavior).
  • Bounded Context & Modules: Link validation engine
  • Files to Modify/Create:
    • apps/pair-cli/src/commands/kb-validate/link-checker.ts — extend options, add pattern matching logic
  • Technical Standards: testing-strategy

Dependencies:

  • Technical: picomatch (or minimatch) — add as dependency to apps/pair-cli
  • Tasks: None (independent of T-1)

Implementation Steps:

  1. Add optionalLinkPatterns?: string[] to LinkValidationOptions
  2. Add picomatch (or minimatch) as a dependency
  3. In validateFileLinks, after detecting a missing internal link, resolve the link path relative to baseDir
  4. Check if the relative path matches any optional pattern via glob
  5. If matched and not strict: push to warnings with "Optional link (pattern-matched): ..."
  6. If matched and strict: push to errors with "Broken internal link: ..." (existing message)
  7. If not matched: push to errors (existing behavior)
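
Steps 3 to 7 can be sketched as below. The names `globToRegExp`, `reportMissingLink`, and `LinkReport` are hypothetical, and `globToRegExp` is a deliberately tiny stand-in for picomatch or minimatch that understands only `**`, `*`, and `?`. The message strings are the ones quoted in the steps above.

```typescript
// Stand-in for picomatch/minimatch: supports only '**', '*' and '?'.
function globToRegExp(glob: string): RegExp {
  let source = ''
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i)
    if (ch === '*') {
      if (glob.charAt(i + 1) === '*') {
        source += '.*' // '**' may cross directory separators
        i++
      } else {
        source += '[^/]*' // '*' stays within one path segment
      }
    } else if (ch === '?') {
      source += '[^/]' // exactly one non-separator character
    } else {
      source += ch.replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    }
  }
  return new RegExp('^' + source + '$')
}

// Hypothetical result shape; the issue only states that a warnings array exists.
interface LinkReport {
  errors: string[]
  warnings: string[]
}

// Called only after the target was found to be missing.
function reportMissingLink(
  relativePath: string, // link target relative to baseDir
  optionalLinkPatterns: readonly string[],
  strict: boolean,
  report: LinkReport,
): void {
  const matched = optionalLinkPatterns.some(p => globToRegExp(p).test(relativePath))
  if (matched && !strict) {
    report.warnings.push(`Optional link (pattern-matched): ${relativePath}`)
  } else {
    report.errors.push(`Broken internal link: ${relativePath}`)
  }
}
```

A real glob library also handles braces, negation, and character classes, so this stand-in is suitable only for reasoning about the control flow, not as a replacement.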

Testing Strategy:

  • Unit Tests: See T-4 for comprehensive test cases
  • Integration Tests: N/A (tested via unit tests with InMemoryFileSystemService)

Notes: Pattern matching should be on the path relative to baseDir (e.g., ../../apps/foo/bar.md), not on the absolute resolved path. This matches how users think about their links.


T-3: Wire optional patterns through handler (config + CLI merge)

Priority: P0 | Estimated Hours: 3h | Bounded Context: Command handler orchestration

Summary: Load link_validation.optional_link_patterns from config, merge with CLI-provided patterns, and pass to validateLinks.

Type: Feature Implementation

Description: In handleKbValidateCommand, after loading config via loadConfigWithOverrides, extract link_validation.optional_link_patterns (if present). Merge with config.optionalLinkPatterns from CLI (union, deduplicated). Pass merged patterns to validateLinks via the extended LinkValidationOptions.

Acceptance Criteria:

  • Primary deliverable: Config + CLI patterns merged and passed to link-checker
  • Quality standard: Defensive coding (missing config section = empty array); no duplicates
  • Integration requirement: Connects T-1 (parser) with T-2 (link-checker)
  • Verification method: Unit test in handler.test.ts

Technical Requirements:

  • Functionality: Read link_validation.optional_link_patterns from loaded config; merge with CLI patterns; pass to validateLinks
  • Performance: N/A
  • Security: Patterns are data only; no execution

Implementation Approach:

  • Technical Design: Extract config patterns via optional chaining on loaded config object. CLI patterns come from config.optionalLinkPatterns. Merge with [...new Set([...configPatterns, ...cliPatterns])]. Pass as optionalLinkPatterns in validateLinks call.
  • Bounded Context & Modules: Command handler
  • Files to Modify/Create:
    • apps/pair-cli/src/commands/kb-validate/handler.ts — extract config patterns, merge, pass through
    • apps/pair-cli/src/commands/kb-validate/handler.test.ts — add test for pattern merging
  • Technical Standards: code-design-guidelines

Dependencies:

  • Technical: None
  • Tasks: T-1 (parser provides CLI patterns), T-2 (link-checker accepts patterns)

Implementation Steps:

  1. After loadConfigWithOverrides, extract link_validation?.optional_link_patterns from the loaded config object (typed as string[] | undefined)
  2. Get CLI patterns from config.optionalLinkPatterns, where config is the parsed KbValidateCommandConfig from T-1 (not the loaded config file)
  3. Merge: [...new Set([...(configPatterns || []), ...(cliPatterns || [])])]
  4. Pass optionalLinkPatterns to validateLinks call
  5. Add handler test: config patterns + CLI patterns merged correctly
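
The extraction and merge steps might be written as follows. This is a sketch under assumptions: `readConfigPatterns` and `mergePatterns` are hypothetical helper names, and the `Config` interface shows only the index signature described in this issue, with all other fields omitted.

```typescript
// Only the index signature mentioned in this issue; other fields omitted.
interface Config {
  [key: string]: unknown
}

// Hypothetical helper: read link_validation.optional_link_patterns defensively,
// since the index signature types the value as unknown.
function readConfigPatterns(config: Config): string[] {
  const section = config['link_validation']
  if (typeof section !== 'object' || section === null) return []
  const patterns = (section as Record<string, unknown>)['optional_link_patterns']
  if (!Array.isArray(patterns)) return []
  // Keep only string entries; anything else in the array is ignored.
  return patterns.filter((p): p is string => typeof p === 'string')
}

// Union of config and CLI patterns, deduplicated, config patterns first.
function mergePatterns(configPatterns: string[] = [], cliPatterns: string[] = []): string[] {
  return [...new Set([...configPatterns, ...cliPatterns])]
}
```

Note that the merge always returns an array, so when neither source supplies patterns the result is an empty array rather than undefined.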

Testing Strategy:

  • Unit Tests: Handler merges config + CLI patterns; empty config = CLI only; empty CLI = config only; both empty = empty array (no optional patterns)
  • Integration Tests: End-to-end flow tested via handler test with mocked fs and config

Notes: The Config interface has [key: string]: unknown index signature, so link_validation is accessible without type changes to the config loader.


T-4: Unit tests for link-checker optional pattern logic

Priority: P0 | Estimated Hours: 3h | Bounded Context: Testing

Summary: Comprehensive unit tests for all 6 acceptance criteria in link-checker.test.ts.

Type: Testing

Description: Add a new describe('optional link patterns') block in link-checker.test.ts covering: pattern-matched missing links produce warnings; unmatched missing links produce errors; strict overrides patterns; existing files with pattern match are valid; no patterns = backward-compatible; CLI + config merge (tested at handler level).

Acceptance Criteria:

  • Primary deliverable: All 6 ACs covered by at least one test each
  • Quality standard: InMemoryFileSystemService fixtures; clear test names mapping to ACs
  • Integration requirement: Tests validate the public validateLinks API
  • Verification method: pnpm --filter @pair/pair-cli test passes

Technical Requirements:

  • Functionality: Test cases for each AC and each edge case
  • Performance: Tests should be fast (in-memory FS)

Implementation Approach:

  • Technical Design: New describe block with sub-describes per AC
  • Bounded Context & Modules: Testing
  • Files to Modify/Create:
    • apps/pair-cli/src/commands/kb-validate/link-checker.test.ts — add optional pattern tests
  • Technical Standards: testing-strategy

Dependencies:

  • Technical: None
  • Tasks: T-2 (implementation to test)

Implementation Steps:

  1. Add describe('optional link patterns') block
  2. Test: missing link matching pattern → warning, not error (AC-1)
  3. Test: missing link NOT matching pattern → error (AC-3)
  4. Test: strict + matching pattern → error (AC-4)
  5. Test: existing file + matching pattern → valid, no warning (AC-5)
  6. Test: no patterns provided → backward compatible (AC-6)
  7. Test: multiple patterns, only one matches → warning
  8. Test: overlapping patterns → single warning (no duplicates)
  9. Test: invalid/empty patterns → no crash

Testing Strategy:

  • Unit Tests: This IS the unit test task
  • Integration Tests: Handler-level tests in T-3

Notes: Use the existing InMemoryFileSystemService and test helper patterns from the current test file.


T-5: Update CLI docs and smoke test

Priority: P1 | Estimated Hours: 2h | Bounded Context: Documentation and CI

Summary: Update CLI command reference docs and extend the kb-validate smoke test scenario.

Type: Documentation

Description: Update the website CLI commands reference (commands.mdx) to document --optional-link-patterns. Update the smoke test script to exercise the new flag. Create a changeset entry for the version bump.

Acceptance Criteria:

  • Primary deliverable: Docs reflect new flag; smoke test exercises it; changeset created
  • Quality standard: Docs match implementation; smoke test passes in CI
  • Integration requirement: Smoke test runs in CI pipeline
  • Verification method: pnpm smoke-tests passes; docs build succeeds

Technical Requirements:

  • Functionality: Accurate documentation of new flag behavior
  • Compatibility: Docs consistent with metadata.ts option definition

Implementation Approach:

  • Technical Design: Add --optional-link-patterns to CLI reference; add smoke test case with a KB fixture that has an intentionally-missing link matching a pattern
  • Bounded Context & Modules: Documentation, CI
  • Files to Modify/Create:
    • apps/website/content/docs/reference/cli/commands.mdx — add flag documentation
    • scripts/smoke-tests/scenarios/kb-validate.sh — add optional patterns test case
    • Changeset file via pnpm changeset
  • Technical Standards: collaboration templates

Dependencies:

  • Technical: None
  • Tasks: T-3 (feature complete before documenting)

Implementation Steps:

  1. Add --optional-link-patterns to CLI commands reference in commands.mdx
  2. Document behavior: glob patterns, comma-separated, merge semantics, strict override
  3. Add smoke test case: create temp KB with missing link + pattern → validate passes
  4. Run pnpm changeset to create version bump entry
  5. Verify pnpm smoke-tests passes

Testing Strategy:

  • Unit Tests: N/A
  • Integration Tests: Smoke test validates end-to-end CLI behavior
  • Manual Testing: Review docs page renders correctly

Notes: Smoke test should cover the happy path (pattern matches, warning not error) and the backward-compatible path (no patterns, error as before).

Metadata

Labels: user story (work item representing a user story)
