
refactor: eliminate near-duplicate string utilities, promote general-purpose helpers to pkg/stringutil #30249

Merged
pelikhan merged 4 commits into main from copilot/refactor-semantic-function-clustering-another-one
May 5, 2026

Contributor

Copilot AI commented May 5, 2026

Addresses three findings from semantic function clustering analysis: one confirmed near-duplicate, two general-purpose utilities misplaced in the workflow package.

Changes

Remove extractStringSlice duplicate (compiler_experiments.go)

  • Replaced both extractStringSlice(raw) calls with parseStringSliceAny(raw, nil) — the canonical version already in parse_helpers.go
  • Deleted extractStringSlice; the canonical version also handles nil input and pre-allocates the result slice

Promote formatList → stringutil.FormatList

  • Moved from private workflow/strings.go to exported pkg/stringutil/stringutil.go
  • Fixed a latent recursion bug: 3-item input previously produced "a and b, and c" instead of "a, b, and c"; replaced recursion with strings.Join:
    // before (recursive, buggy for 3+ items)
    return fmt.Sprintf("%s, and %s", formatList(items[:len(items)-1]), items[len(items)-1])
    
    // after (correct Oxford comma)
    return strings.Join(items[:len(items)-1], ", ") + ", and " + items[len(items)-1]
  • Updated tools_validation_github_toolsets.go to use stringutil.FormatList
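To see why the recursive form misplaces the commas, here is a minimal side-by-side sketch. Only the two `return` lines come from the PR description; the function names `formatListRecursive` and `formatListJoined` and the base cases for zero, one, and two items are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// formatListRecursive mirrors the "before" return line. The base cases are
// assumed, since the PR description quotes only the recursive step.
func formatListRecursive(items []string) string {
	switch len(items) {
	case 0:
		return ""
	case 1:
		return items[0]
	case 2:
		return items[0] + " and " + items[1]
	}
	// The prefix is formatted as a complete list, so its own "and" leaks in.
	return fmt.Sprintf("%s, and %s", formatListRecursive(items[:len(items)-1]), items[len(items)-1])
}

// formatListJoined mirrors the "after" return line with the same assumed
// base cases.
func formatListJoined(items []string) string {
	switch len(items) {
	case 0:
		return ""
	case 1:
		return items[0]
	case 2:
		return items[0] + " and " + items[1]
	}
	// Join every item except the last with commas, then add the Oxford comma.
	return strings.Join(items[:len(items)-1], ", ") + ", and " + items[len(items)-1]
}

func main() {
	items := []string{"a", "b", "c"}
	fmt.Printf("%q\n", formatListRecursive(items)) // "a and b, and c"
	fmt.Printf("%q\n", formatListJoined(items))    // "a, b, and c"
}
```

The recursion treats the first n-1 items as a finished list, which is why every extra item adds another "and"; joining the prefix with a plain separator avoids that.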

Promote normalizeLeadingWhitespace → stringutil.NormalizeLeadingWhitespace

  • Moved from private unified_prompt_step.go to exported pkg/stringutil/stringutil.go
  • Fixed indentation measurement to use strings.TrimLeft(line, " \t") instead of strings.TrimLeft(line, " ") so that tab-indented and mixed tab/space indented content is correctly dedented
  • Updated both call sites in unified_prompt_step.go and the existing test
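The effect of the cutset change can be sketched with a small dedent routine. This is not the actual `stringutil.NormalizeLeadingWhitespace`; `dedent` is a hypothetical name, and the sketch assumes the helper strips the minimum common indentation, counting each leading space or tab as one character.

```go
package main

import (
	"fmt"
	"strings"
)

// dedent is a hypothetical sketch of a minimum-indentation stripper. It is
// an assumption about the helper's behavior, not the code from the PR.
func dedent(content string) string {
	lines := strings.Split(content, "\n")
	minIndent := -1
	for _, line := range lines {
		if strings.TrimSpace(line) == "" {
			continue // blank lines do not influence the minimum
		}
		// The cutset " \t" counts both spaces and tabs as indentation.
		indent := len(line) - len(strings.TrimLeft(line, " \t"))
		if minIndent == -1 || indent < minIndent {
			minIndent = indent
		}
	}
	if minIndent <= 0 {
		return content // nothing to strip
	}
	for i, line := range lines {
		if len(line) >= minIndent {
			lines[i] = line[minIndent:]
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	line := "\tfoo"
	// With a spaces-only cutset the tab is not counted as indentation.
	fmt.Println(len(line) - len(strings.TrimLeft(line, " ")))   // 0
	fmt.Println(len(line) - len(strings.TrimLeft(line, " \t"))) // 1

	fmt.Printf("%q\n", dedent("\tfoo\n\t\tbar")) // "foo\n\tbar"
}
```

With the spaces-only cutset, a tab-indented line measures as zero indentation, so the minimum is zero and nothing gets stripped.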

Tests

  • Added TestFormatList and TestNormalizeLeadingWhitespace to pkg/stringutil/stringutil_test.go, including tab-indentation and mixed-indentation test cases

No change needed

  • close_entity_helpers.go already contains the architectural comment explaining the registry-vs-per-file choice

Copilot AI and others added 2 commits May 5, 2026 01:09
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/df78e305-93b0-4605-b9bc-451845689bfc

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
…malizeLeadingWhitespace to pkg/stringutil

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/df78e305-93b0-4605-b9bc-451845689bfc

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Refactor semantic function clustering analysis for outliers and duplicates" to "refactor: eliminate near-duplicate string utilities, promote general-purpose helpers to pkg/stringutil" May 5, 2026
Copilot AI requested a review from gh-aw-bot May 5, 2026 01:24
Contributor

github-actions Bot commented May 5, 2026

Hey @copilot-swe-agent 👋 — great work on the string-utility refactoring! Eliminating the extractStringSlice duplicate, fixing the latent Oxford-comma recursion bug in formatList, and promoting both helpers to pkg/stringutil are all solid, well-described changes. Tests are included too — nice.

One thing worth tidying up before merge:

  • Unrelated files in the diff — the PR includes changes to .github/workflows/metrics-collector.lock.yml, docs/src/content/docs/agent-factory-status.mdx, and docs/src/content/docs/reference/frontmatter-full.md, none of which are explained in the PR body. If these were auto-generated side-effects of running the pipeline, a brief note in the description would clarify intent; if they were picked up unintentionally, they should be reverted or split into a separate PR.

If you'd like a hand cleaning this up, here's a ready-to-run prompt:

In PR #30249 (refactor: eliminate near-duplicate string utilities), three files appear to be unrelated to the string-utility changes:
- .github/workflows/metrics-collector.lock.yml
- docs/src/content/docs/agent-factory-status.mdx
- docs/src/content/docs/reference/frontmatter-full.md

Investigate whether these files were changed intentionally as part of this refactor:
1. If they are auto-generated artifacts that were unintentionally picked up, revert them to their state on the base branch.
2. If they are intentional updates unrelated to the refactor, move them to a separate PR.
3. If they are legitimately coupled to the refactor, add a brief explanation to the PR body describing why.

Generated by Contribution Check · ● 1.4M ·

@pelikhan pelikhan marked this pull request as ready for review May 5, 2026 01:27
Copilot AI review requested due to automatic review settings May 5, 2026 01:27
Contributor

Copilot AI left a comment


Pull request overview

Refactors duplicate/general-purpose string helpers into pkg/stringutil, updates workflow call sites, and removes one duplicate experiment slice parser. The diff also includes some unrelated docs/workflow metadata edits outside the stated refactor scope.

Changes:

  • Moved formatList and normalizeLeadingWhitespace into pkg/stringutil and updated workflow usages/tests.
  • Replaced duplicate extractStringSlice calls in experiment parsing with the canonical parseStringSliceAny.
  • Also changed OTLP reference docs, agent-factory status docs, and one generated workflow lockfile.
Summary per file:

| File | Description |
| --- | --- |
| pkg/workflow/unified_prompt_step.go | Swaps local whitespace normalization calls to stringutil.NormalizeLeadingWhitespace. |
| pkg/workflow/unified_prompt_step_test.go | Updates the prompt-step test to call the moved helper. |
| pkg/workflow/tools_validation_github_toolsets.go | Replaces local list formatting with stringutil.FormatList in validation errors. |
| pkg/workflow/strings.go | Removes the old private formatList implementation. |
| pkg/workflow/compiler_experiments.go | Reuses parseStringSliceAny instead of the deleted duplicate slice parser. |
| pkg/stringutil/stringutil.go | Adds exported FormatList and NormalizeLeadingWhitespace. |
| pkg/stringutil/stringutil_test.go | Adds direct tests for the promoted string utilities. |
| docs/src/content/docs/reference/frontmatter-full.md | Expands OTLP endpoint reference docs; appears unrelated to the refactor. |
| docs/src/content/docs/agent-factory-status.mdx | Adds three agent-factory table rows; appears unrelated to the refactor. |
| .github/workflows/metrics-collector.lock.yml | Adds one workflow env version line in a generated lockfile. |

Copilot's findings


  • Files reviewed: 10/10 changed files
  • Comments generated: 3

Comment thread pkg/stringutil/stringutil.go Outdated
Comment on lines +95 to +101
    // Find minimum leading whitespace (excluding empty lines)
    minLeadingSpaces := -1
    for _, line := range lines {
        if strings.TrimSpace(line) == "" {
            continue // Skip empty lines
        }
        leadingSpaces := len(line) - len(strings.TrimLeft(line, " "))
Comment thread docs/src/content/docs/reference/frontmatter-full.md
Comment on lines +6031 to +6072
# OTLP endpoint configuration. Accepts a plain URL string (backward-compat), a
# single {url, headers} object, or an array of {url, headers} objects for
# multi-endpoint concurrent fan-out. Encoded as GH_AW_OTLP_ENDPOINTS (JSON array).
# (optional)
# This field supports multiple formats (oneOf):

# Option 1: OTLP collector endpoint URL (e.g. 'https://traces.example.com:4317').
# Supports GitHub Actions expressions such as ${{ secrets.OTLP_ENDPOINT }}. When a
# static URL is provided, its hostname is automatically added to the network
# firewall allowlist.
endpoint: "example-value"

# Option 2: A single OTLP endpoint with a URL and optional per-endpoint headers.
endpoint:
# OTLP collector endpoint URL (e.g. 'https://traces.example.com:4317'). Supports
# GitHub Actions expressions such as ${{ secrets.OTLP_ENDPOINT }}. When a static
# URL is provided, its hostname is automatically added to the network firewall
# allowlist.
url: "example-value"

# (optional)
# This field supports multiple formats (oneOf):

# Option 1: Map of HTTP header names to values. Values support GitHub Actions
# expressions such as ${{ secrets.TOKEN }}.
headers:
{}

# Option 2: Deprecated: use the map form instead. Comma-separated list of
# key=value HTTP headers (e.g. 'Authorization=Bearer <token>'). Supports GitHub
# Actions expressions such as ${{ secrets.OTLP_HEADERS }}.
headers: "example-value"

# Option 3: Multiple OTLP collector endpoints to export traces to concurrently.
# Each entry has its own URL and optional per-endpoint headers.
endpoint: []
# Array items: A single OTLP endpoint with a URL and optional per-endpoint
# headers.

# HTTP headers for the backward-compat string endpoint form. Only used when
# endpoint is a plain string; object/array endpoint entries carry their own
# per-endpoint headers.
Comment thread docs/src/content/docs/agent-factory-status.mdx
| [Daily File Diet](https://github.com/github/gh-aw/blob/main/.github/workflows/daily-file-diet.md) | copilot | [![Daily File Diet](https://github.com/github/gh-aw/actions/workflows/daily-file-diet.lock.yml/badge.svg)](https://github.com/github/gh-aw/actions/workflows/daily-file-diet.lock.yml) | `daily around 13:00 on weekdays` | - |
| [Daily Firewall Logs Collector and Reporter](https://github.com/github/gh-aw/blob/main/.github/workflows/daily-firewall-report.md) | copilot | [![Daily Firewall Logs Collector and Reporter](https://github.com/github/gh-aw/actions/workflows/daily-firewall-report.lock.yml/badge.svg)](https://github.com/github/gh-aw/actions/workflows/daily-firewall-report.lock.yml) | - | - |
| [Daily Go Function Namer](https://github.com/github/gh-aw/blob/main/.github/workflows/daily-function-namer.md) | claude | [![Daily Go Function Namer](https://github.com/github/gh-aw/actions/workflows/daily-function-namer.lock.yml/badge.svg)](https://github.com/github/gh-aw/actions/workflows/daily-function-namer.lock.yml) | - | - |
| [Daily Grafana OTel Instrumentation Advisor](https://github.com/github/gh-aw/blob/main/.github/workflows/daily-grafana-otel-instrumentation-advisor.md) | claude | [![Daily Grafana OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/workflows/daily-grafana-otel-instrumentation-advisor.lock.yml/badge.svg)](https://github.com/github/gh-aw/actions/workflows/daily-grafana-otel-instrumentation-advisor.lock.yml) | - | - |
Collaborator

pelikhan commented May 5, 2026

@copilot review all comments

…elated doc/lockfile changes

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/27539f97-fb71-4d10-a2de-fb1a279cd0a9

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 5, 2026

@copilot review all comments

Addressed in 9a7a99e:

  1. Tab indentation in NormalizeLeadingWhitespace: Fixed the indentation counting to use strings.TrimLeft(line, " \t") instead of strings.TrimLeft(line, " "), so tabs and mixed tab/space indentation are now handled correctly. Added test cases for both.

  2. Unrelated docs/lockfile changes: Reverted docs/src/content/docs/agent-factory-status.mdx, docs/src/content/docs/reference/frontmatter-full.md, and .github/workflows/metrics-collector.lock.yml to their original state — these were accidentally included from other in-flight changes in the branch.

Copilot AI requested a review from pelikhan May 5, 2026 02:25
@pelikhan pelikhan merged commit d64d32e into main May 5, 2026
@pelikhan pelikhan deleted the copilot/refactor-semantic-function-clustering-another-one branch May 5, 2026 02:26
@github-actions github-actions Bot mentioned this pull request May 5, 2026
Development

Successfully merging this pull request may close these issues.

[refactor] Semantic Function Clustering Analysis: Outliers, Near-Duplicates, and Pattern Inconsistencies

4 participants