Skip to content

Refactor sanitizer APIs to separate artifact identifiers from code identifiers#27584

Merged
pelikhan merged 5 commits intomainfrom
copilot/refactor-semantic-function-clustering-another-one
Apr 21, 2026
Merged

Refactor sanitizer APIs to separate artifact identifiers from code identifiers#27584
pelikhan merged 5 commits intomainfrom
copilot/refactor-semantic-function-clustering-another-one

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 21, 2026

This issue identified ambiguous sanitizer naming across pkg/workflow and pkg/stringutil, where similarly named functions serve different domains (hyphenated artifact/user-agent IDs vs underscore-based programming identifiers). This change clarifies those boundaries with explicit API names and docs.

  • Workflow sanitizer naming made domain-explicit

    • Renamed workflow.SanitizeIdentifier to workflow.SanitizeArtifactIdentifier.
    • Updated workflow-side call sites and tests to use the new name.
    • Clarified package-level and function-level docs to describe artifact/user-agent semantics.
  • Stringutil identifier sanitizer promoted and documented

    • Exported stringutil.sanitizeIdentifierName as stringutil.SanitizeIdentifierName.
    • Documented intended usage for programming-language identifiers and contrasted with workflow artifact sanitization.
    • Kept existing behavior intact for SanitizeParameterName and SanitizePythonVariableName by routing them through the exported helper.
  • Focused test updates

    • Renamed workflow sanitizer test to match the new API.
    • Added dedicated tests for stringutil.SanitizeIdentifierName to lock in underscore behavior, numeric-prefix handling, and optional extra allowed characters.
// workflow domain (hyphenated artifact/user-agent ID)
userAgent := workflow.SanitizeArtifactIdentifier("Weekly v2.0")
// "weekly-v2-0"

// code identifier domain (underscore-based variable/param ID)
param := stringutil.SanitizeIdentifierName("my-param", nil)
// "my_param"

Copilot AI changed the title [WIP] Refactor opportunities identified in pkg/ based on semantic clustering Refactor sanitizer APIs to separate artifact identifiers from code identifiers Apr 21, 2026
Copilot AI requested a review from pelikhan April 21, 2026 14:27
@pelikhan pelikhan marked this pull request as ready for review April 21, 2026 15:00
Copilot AI review requested due to automatic review settings April 21, 2026 15:00
@github-actions github-actions Bot mentioned this pull request Apr 21, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors sanitizer APIs to disambiguate workflow artifact/user-agent identifiers (hyphenated) from programming-language identifiers (underscore-based), with updated call sites, docs, and tests.

Changes:

  • Renamed workflow.SanitizeIdentifier to workflow.SanitizeArtifactIdentifier and updated workflow renderer + tests.
  • Exported stringutil.SanitizeIdentifierName (formerly private) and routed existing helpers through it.
  • Added unit tests and README documentation for the newly exported identifier sanitizer.
Show a summary per file
File Description
pkg/workflow/strings.go Renames and documents the workflow-domain sanitizer as SanitizeArtifactIdentifier.
pkg/workflow/mcp_renderer_github.go Updates user-agent fallback to use SanitizeArtifactIdentifier.
pkg/workflow/codex_engine_test.go Renames the workflow sanitizer test and updates assertions to the new API.
pkg/stringutil/sanitize.go Exports SanitizeIdentifierName and reroutes existing sanitizers through it.
pkg/stringutil/sanitize_test.go Adds dedicated tests for SanitizeIdentifierName.
pkg/stringutil/README.md Documents SanitizeIdentifierName in the package README.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (2)

pkg/workflow/strings.go:419

  • There are still references to the old name SanitizeIdentifier in repository docs/comments (e.g., pkg/workflow/strings.go:135,196, pkg/workflow/README.md, and the linked scratchpad docs). Since this function is now SanitizeArtifactIdentifier, these references should be updated to avoid confusing API guidance/search results.

This issue also appears on line 448 of the same file.

// SanitizeArtifactIdentifier sanitizes a workflow name to create a safe identifier
// suitable for use as a user agent string or similar context.
//
// This is a SANITIZE function (character validity pattern). Use this when creating
// identifiers that must be purely alphanumeric with hyphens, with no special characters
// preserved. Unlike SanitizeWorkflowName which preserves dots and underscores, this

pkg/workflow/strings.go:456

  • The debug log messages still say "identifier" even though the exported API is now domain-specific (SanitizeArtifactIdentifier). Consider updating the log messages to mention "artifact identifier" (and using %q for the input) to make debug output unambiguous and resilient to whitespace/newlines in names.
func SanitizeArtifactIdentifier(name string) string {
	stringsLog.Printf("Sanitizing identifier: %s", name)
	result := SanitizeName(name, &SanitizeOptions{
		PreserveSpecialChars: []rune{},
		TrimHyphens:          true,
		DefaultValue:         "github-agentic-workflow",
	})
	if result != name {
		stringsLog.Printf("Sanitized identifier: %s -> %s", name, result)
  • Files reviewed: 6/6 changed files
  • Comments generated: 3

Comment thread pkg/stringutil/sanitize.go Outdated
Comment thread pkg/stringutil/README.md Outdated
Comment thread pkg/workflow/mcp_renderer_github.go Outdated
@github-actions
Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 90/100

Excellent test quality

Metric Value
New/modified tests analyzed 2
✅ Design tests (behavioral contracts) 2 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 2 (100%)
Duplicate test clusters 0
Test inflation detected ⚠️ Yes (sanitize_test.go: 35 added vs sanitize.go: 12 added, ratio 2.9:1)
🚨 Coding-guideline violations 0

Test Classification Details

View All Test Classifications
Test File Classification Notes
TestSanitizeIdentifierName pkg/stringutil/sanitize_test.go ✅ Design Table-driven, 3 cases including edge cases (numeric prefix, extra allowed chars)
TestSanitizeArtifactIdentifier pkg/workflow/codex_engine_test.go ✅ Design Renamed from TestSanitizeIdentifier; 10 cases including empty input and special-char-only edge cases

Test Detail Notes

TestSanitizeIdentifierName — New test covering the newly extracted SanitizeIdentifierName function. All three table rows exercise a distinct contract: default replacement, numeric-prefix handling (edge case), and opt-in extra-allowed characters. Verifies observable return values, not internals.

TestSanitizeArtifactIdentifier — Pure rename following the API refactor. Existing 10-row table remains intact with solid edge-case coverage: empty string, whitespace-only, special-character-only, and mixed separator inputs. No behavioral regression risk.


Flagged Items — Informational Only

⚠️ Test Inflation: pkg/stringutil/sanitize_test.go (informational)

35 lines added vs 12 lines in sanitize.go (ratio ≈ 2.9:1, threshold: 2:1). However, this is a table-driven test covering multiple distinct edge cases, which naturally produces verbose test code. The extra lines represent real behavioral coverage (3 separate contract-relevant scenarios), not padding. No action required.


Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 2 tests — both unit (//go:build !integration) ✅ build tags present on all files
  • 🟨 JavaScript (*.test.cjs, *.test.js): 0 tests changed

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). Both new/modified tests enforce behavioral contracts with good edge-case coverage.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §24729751175

🧪 Test quality analysis by Test Quality Sentinel · ● 395.8K ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 90/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). Both changed tests enforce behavioral contracts with solid edge-case coverage. Minor test inflation noted in sanitize_test.go (2.9:1 ratio) but justified by meaningful table-driven edge cases.

pelikhan and others added 3 commits April 21, 2026 08:12
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@pelikhan pelikhan merged commit d30542e into main Apr 21, 2026
15 of 17 checks passed
@pelikhan pelikhan deleted the copilot/refactor-semantic-function-clustering-another-one branch April 21, 2026 15:13
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 21, 2026

Smoke CI completed successfully!

@github-actions
Copy link
Copy Markdown
Contributor

✅ smoke-ci: safeoutputs CLI comment only run (24730350270)

Generated by Smoke CI for issue #27584 ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[refactor] Semantic Function Clustering: Refactoring Opportunities in pkg/

3 participants