
[refactor] Semantic Function Clustering Analysis #5340

@github-actions

Analysis of repository: github/gh-aw-mcpg

Executive Summary

Analyzed 127 non-test Go source files across 23 packages in internal/, cataloging 801 functions and methods. The codebase is generally well-organized with clear separation of concerns. This analysis identified 4 notable outlier functions and 1 near-duplicate rate-limit parsing pattern. No major structural issues were found — the findings below are targeted, actionable improvements.


Full Report

Function Inventory

By Package

| Package | Files | Primary Purpose |
|---|---|---|
| internal/auth | 1 | Auth header parsing |
| internal/cmd | 11 | CLI (Cobra) commands & flags |
| internal/config | 12 | Config parsing, validation, guard policy |
| internal/difc | 8 | Decentralized Information Flow Control |
| internal/envutil | 4 | Environment variable utilities |
| internal/guard | 10 | Security guards (Noop, Wasm, WriteSink) |
| internal/httputil | 2 | Shared HTTP helpers |
| internal/launcher | 4 | Backend process management |
| internal/logger | 14 | Debug logging framework |
| internal/mcp | 10 | MCP protocol types & transport |
| internal/middleware | 1 | jq schema processing middleware |
| internal/oidc | 1 | GitHub Actions OIDC provider |
| internal/proxy | 6 | GitHub API filtering proxy |
| internal/server | 17 | HTTP server (routed/unified modes) |
| internal/strutil | 4 | String & formatting utilities |
| internal/syncutil | 1 | Concurrency utilities |
| internal/sys | 2 | System utilities |
| internal/testutil/mcptest | 4 | Test utilities |
| internal/tracing | 2 | OpenTelemetry tracing |
| internal/tty | 1 | Terminal detection |
| internal/version | 1 | Version management |

Identified Issues

1. Outlier: SessionSuffix in internal/logger/session_helpers.go

File: internal/logger/session_helpers.go
Function: SessionSuffix(sessionID string) string
Issue: This is a pure string formatting utility with no dependency on any logger types. It formats a session ID into a log suffix string (" for session '<id>'") and resides in the logger package only because it's used in log messages, but it has no logger-specific knowledge.

// Current location: internal/logger/session_helpers.go
func SessionSuffix(sessionID string) string {
    if sessionID == "" {
        return ""
    }
    return fmt.Sprintf(" for session '%s'", sessionID)
}

Callers (all in non-logger packages):

  • internal/mcp/errors.go
  • internal/launcher/log_helpers.go (3 usages)

Recommendation: Move SessionSuffix to internal/strutil/ as a general string formatting utility, or inline at the 4 call sites (the function body is trivial). If kept in logger, the file is appropriately named but callers outside the package create a cross-package dependency on a string-only utility.

Estimated Impact: Low — purely organizational, no functional change.
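As a rough illustration of the "move to strutil" option, the sketch below reproduces the function body quoted above and shows how a caller would compose a message with it. It uses package main only so that it runs standalone; in the repository the function would sit in package strutil. The log message text is hypothetical and not taken from the codebase.

```go
package main

import "fmt"

// SessionSuffix has the same body as the function quoted in this issue.
// It depends only on fmt, which is why it could live in a string
// utility package rather than in the logger package.
func SessionSuffix(sessionID string) string {
	if sessionID == "" {
		return ""
	}
	return fmt.Sprintf(" for session '%s'", sessionID)
}

func main() {
	// Hypothetical call sites: the message prefix is illustrative only.
	fmt.Println("backend started" + SessionSuffix("abc123"))
	fmt.Println("backend started" + SessionSuffix(""))
}
```

Inlining at the 4 call sites would avoid the cross-package import entirely, at the cost of repeating the empty-ID check in each place.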


2. Near-Duplicate Domain: Rate-Limit Reset Parsing

Files:

  • internal/httputil/github_http.go: ParseRateLimitResetHeader(value string) time.Time
  • internal/server/circuit_breaker.go: parseRateLimitResetFromText(text string) time.Time

Issue: Rate-limit parsing logic is split across two packages. Both functions translate GitHub rate limit signals into time.Time values but handle different input formats:

// internal/httputil/github_http.go — parses Unix timestamp from HTTP header
func ParseRateLimitResetHeader(value string) time.Time { ... }

// internal/server/circuit_breaker.go — parses "rate reset in Ns" text pattern
func parseRateLimitResetFromText(text string) time.Time { ... }

Recommendation: Consider moving parseRateLimitResetFromText to internal/httputil/github_http.go (or a new internal/httputil/ratelimit.go) to centralize all rate limit time parsing in one place. They serve the same conceptual purpose and centralizing them improves discoverability.

Estimated Impact: Medium — improves cohesion of rate limit utilities.
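A minimal sketch of what a centralized rate-limit file could look like once both parsers sit side by side. The function bodies are elided in this report, so everything inside them is an assumption: the regular expression, the zero time.Time returned on failure, and the explicit now parameter on the text parser (added here only to make the example deterministic; the real function takes just the text). ParseRateLimitResetText is a hypothetical exported name.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// resetInPattern matches text such as "rate reset in 30s".
// The exact wording matched by the real code is an assumption.
var resetInPattern = regexp.MustCompile(`rate reset in (\d+)s`)

// ParseRateLimitResetHeader interprets value as Unix seconds.
// Returning the zero time on failure is an assumption of this sketch.
func ParseRateLimitResetHeader(value string) time.Time {
	secs, err := strconv.ParseInt(value, 10, 64)
	if err != nil || secs <= 0 {
		return time.Time{}
	}
	return time.Unix(secs, 0)
}

// ParseRateLimitResetText extracts a relative delay from text and adds
// it to now. Hypothetical name and signature, not the repository's.
func ParseRateLimitResetText(text string, now time.Time) time.Time {
	m := resetInPattern.FindStringSubmatch(text)
	if m == nil {
		return time.Time{}
	}
	secs, err := strconv.Atoi(m[1])
	if err != nil {
		return time.Time{}
	}
	return now.Add(time.Duration(secs) * time.Second)
}

func main() {
	now := time.Unix(1000, 0)
	fmt.Println(ParseRateLimitResetHeader("1700000000").Unix())
	fmt.Println(ParseRateLimitResetText("limit hit, rate reset in 30s", now).Unix())
}
```

With both functions in one file, a reader looking for rate-limit reset handling has a single place to check, which is the discoverability benefit the recommendation describes.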


3. Outlier: ensureTracingConfig in internal/cmd/root.go

File: internal/cmd/root.go
Function: ensureTracingConfig(cfg *config.Config) *config.TracingConfig
Issue: A tracing-specific helper living in root.go alongside CLI root command wiring. The cmd package already has internal/cmd/tracing.go specifically for tracing-related setup logic (initTracingProviderWithFallback, registerTracingFlags, shutdownTracingProviderWithTimeout).

// Current location: internal/cmd/root.go
func ensureTracingConfig(cfg *config.Config) *config.TracingConfig {
    if cfg.Gateway.Tracing == nil {
        cfg.Gateway.Tracing = &config.TracingConfig{}
    }
    return cfg.Gateway.Tracing
}

Recommendation: Move ensureTracingConfig to internal/cmd/tracing.go where the other tracing initialization functions reside.

Estimated Impact: Low — small organization improvement, no functional change.
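The helper is a lazy-initialization accessor, and its behavior does not depend on which file in the cmd package holds it. The sketch below demonstrates that behavior with stand-in types, since the real config.Config and config.TracingConfig definitions are not shown in this report. The Endpoint field and the value-typed Gateway field are assumptions.

```go
package main

import "fmt"

// Stand-ins for the types in internal/config. Field layout is assumed.
type TracingConfig struct {
	Endpoint string // hypothetical field, for demonstration only
}

type GatewayConfig struct {
	Tracing *TracingConfig
}

type Config struct {
	Gateway GatewayConfig
}

// ensureTracingConfig has the same body as the function quoted in this
// issue, adapted to the stand-in types above.
func ensureTracingConfig(cfg *Config) *TracingConfig {
	if cfg.Gateway.Tracing == nil {
		cfg.Gateway.Tracing = &TracingConfig{}
	}
	return cfg.Gateway.Tracing
}

func main() {
	cfg := &Config{}
	// First call allocates the tracing section; later calls reuse it.
	ensureTracingConfig(cfg).Endpoint = "localhost:4317"
	fmt.Println(ensureTracingConfig(cfg).Endpoint)
}
```

Because the function is unexported and both files belong to the same package, moving it requires no changes at call sites.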


4. Outlier: clientAddr in internal/cmd/proxy.go

File: internal/cmd/proxy.go
Function: clientAddr(addr string) string
Issue: A general-purpose address normalization utility (converts 0.0.0.0:port to localhost:port) embedded in the proxy command file. This is a network string utility with no inherent coupling to proxy command logic.

func clientAddr(addr string) string {
    host, port, err := net.SplitHostPort(addr)
    ...
    switch host {
    case "", "0.0.0.0", "::", "[::]":
        return net.JoinHostPort("localhost", port)
    }
    ...
}

Recommendation: If the utility could be useful elsewhere, move to internal/httputil/httputil.go. If it's truly proxy-command-specific, it's acceptable where it is.

Estimated Impact: Low — minor organization concern.
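For reference when deciding on placement, here is a standalone sketch of the normalization described above. It is not the repository's function: the name normalizeClientAddr is hypothetical, and the two branches elided in the quoted snippet (the error path and the fall-through) are filled with an assumed "return the input unchanged" behavior.

```go
package main

import (
	"fmt"
	"net"
)

// normalizeClientAddr rewrites wildcard listen addresses to a form a
// local client can dial. Hypothetical name; fallbacks are assumptions.
func normalizeClientAddr(addr string) string {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		// Assumption: leave unparsable input untouched.
		return addr
	}
	switch host {
	// net.SplitHostPort strips the brackets from "[::]:8080", so the
	// host arrives here as "::".
	case "", "0.0.0.0", "::":
		return net.JoinHostPort("localhost", port)
	}
	// Assumption: specific hosts are returned unchanged.
	return addr
}

func main() {
	for _, a := range []string{":8080", "0.0.0.0:8080", "[::]:8080", "127.0.0.1:9000"} {
		fmt.Println(a, "->", normalizeClientAddr(a))
	}
}
```

The function depends only on the standard net package, which supports the view that it has no inherent coupling to the proxy command.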


Clustering Results

Well-Organized Clusters ✓

The following clusters are well-organized and require no changes:

  • Validation cluster (internal/config/validation*.go, guard_policy_validation.go): 27 validate* functions cleanly distributed across purpose-specific files.
  • Parsing cluster (internal/config/guard_policy_parse.go): Parse* functions for guard policies co-located.
  • DIFC cluster (internal/difc/): Labels, agents, evaluator, capabilities all in separate, clearly-named files.
  • Strutil cluster (internal/strutil/): Truncate, FormatDuration, DeduplicateStrings, RandomHex — each in its own file, all pure string utilities.
  • Logger cluster (internal/logger/): Extensive but well-separated: file_logger.go, jsonl_logger.go, markdown_logger.go, rpc_formatter.go, rpc_helpers.go, etc.
  • Server cluster (internal/server/): Clear purpose per file — auth.go, circuit_breaker.go, hmac.go, session.go, transport.go, etc.

No Significant Duplicates Detected

  • ParseRateLimitResetHeader and parseRateLimitResetFromText parse different input formats — near-duplicate in domain but not in implementation.
  • requireSession in internal/mcp/connection.go and internal/server/session.go serve different layers (MCP protocol vs. HTTP session management) — not duplicates.
  • Truncate* functions: strutil.Truncate (general), TruncateSessionID (auth-specific), TruncateSecret (sanitization) — each has a distinct concern.
  • Multiple init() functions (11 total) — idiomatic Go for flag registration; not an issue.

Refactoring Recommendations

Priority 1: Low Effort, Organizational Clarity

  1. Move ensureTracingConfig from cmd/root.go to cmd/tracing.go

    • All tracing lifecycle functions in one place
    • Estimated effort: 5 minutes
  2. Move parseRateLimitResetFromText from server/circuit_breaker.go to httputil/github_http.go

    • Centralizes rate limit parsing utilities
    • Estimated effort: 15 minutes

Priority 2: Evaluate and Decide

  1. SessionSuffix location: Decide whether to keep in logger (acceptable if only used for logging), move to strutil, or inline at call sites.

  2. clientAddr location: If proxy command is the only consumer, acceptable where it is. If useful elsewhere, move to httputil.


Implementation Checklist

  • Move ensureTracingConfig to internal/cmd/tracing.go
  • Move parseRateLimitResetFromText to internal/httputil/github_http.go (or new ratelimit.go)
  • Evaluate SessionSuffix placement — inline, move to strutil, or accept in logger
  • Evaluate clientAddr placement — accept in proxy.go or move to httputil
  • Run make agent-finished after any changes

Analysis Metadata

| Metric | Value |
|---|---|
| Total Go Files Analyzed | 127 |
| Total Functions/Methods Cataloged | 801 |
| Packages Analyzed | 23 |
| Function Clusters Identified | 7 major clusters |
| Outliers Found | 4 |
| Duplicates Detected | 0 exact, 1 near-duplicate domain |
| Detection Method | Naming pattern analysis + cross-package reference tracing |
| Analysis Date | 2026-05-08 |

Generated by Semantic Function Refactoring
