Skip to content

[refactor] Semantic function clustering analysis: 4 refactoring opportunities identified #6767

@github-actions

Description

@github-actions

Analysis of repository: github/gh-aw-mcpg

Executive Summary

Analysis of 135 non-test Go files (845 functions) across 27 packages in internal/. The codebase is generally well-organized with clear file-level separation of concerns. This analysis identified 4 concrete refactoring opportunities: a near-duplicate pagination helper used in both production and test code, map[string]interface{} string-extraction spread across three packages, owner/repo argument extraction implemented in two different ways in the same proxy package, and rate-limit timestamp parsing split across two packages with different strategies.

No egregious outlier functions or large-scale structural problems were found. The logger package's intentional parallel structure (5 logger types × 4 functions each) is a design pattern, not duplication. The internal/server package's splitting into focused files (auth, hmac, circuit_breaker, difc_log, etc.) is exemplary.

Function Inventory

By Package (selected)

Package Files Functions Notes
internal/logger 16 106 Well-structured, parallel logger design
internal/server 19 131 Well-split across focused files
internal/difc 9 114 Well-organized label/evaluator split
internal/config 14 92 Well-organized per-concern files
internal/guard 11 69 WASM/noop/write-sink well-separated
internal/mcp 7 68 Good split: connection/methods/transport
internal/proxy 8 42 Minor duplication identified

Identified Issues

1. Near-Duplicate Pagination Generic Function

Severity: Low-Medium — logic duplication with identical guard patterns

Issue: Two independently implemented generic paginate[T] functions exist — one in production code, one in test utilities — with nearly identical page-count and cycle-detection logic.

Occurrence 1: internal/mcp/pagination.go:paginateAll[T any]

func paginateAll[T any](
    serverID, itemKind string,
    fetch func(cursor string) (paginatedPage[T], error),
) ([]T, error) {
    // page limit: paginateAllMaxPages (100)
    // cycle detection: seenCursors map
    ...
}

Occurrence 2: internal/testutil/mcptest/validator.go:paginate[T any]

func paginate[T any](ctx context.Context,
    fetch func(ctx context.Context, cursor string) ([]T, string, error),
) ([]T, error) {
    // page limit: validatorPaginationMaxPages
    // cycle detection: seenCursors map
    ...
}

Similarity: Both implement identical safety invariants (cycle detection via seenCursors map[string]struct{}, max page guard), just with different function signatures (callback shapes differ).

Root Cause: The test validator cannot import internal/mcp without creating a circular dependency, so it re-implements the pagination logic.

Recommendation:

  • Extract the pagination safety logic into internal/syncutil or a new internal/pagination package that neither mcp nor testutil imports from, enabling both to share it.
  • Alternatively, document the intentional duplication with a comment linking the two implementations so future changes to one prompt updating the other.

Estimated Impact: Low risk, reduces maintenance surface for pagination invariants.


2. map[string]interface{} String Field Extraction Pattern in Three Packages

Severity: Low — repeated inline pattern, no shared utility

Issue: Extracting string values from map[string]interface{} is done via inline type assertions in three separate locations with no shared helper.

Occurrence 1internal/server/difc_log.go:getFilteredItemStringField (public helper):

func getFilteredItemStringField(m map[string]interface{}, fields ...string) string {
    for _, field := range fields {
        if v, ok := m[field].(string); ok && v != "" {
            return v
        }
    }
    return ""
}

Occurrence 2internal/proxy/proxy.go:extractOwnerRepoNumber (inline):

owner, _ = argsMap[ownerKey].(string)
repo, _ = argsMap[repoKey].(string)

Occurrence 3internal/proxy/collaborator_permission.go:ParseCollaboratorPermissionArgs (inline):

owner, _ = argsMap["owner"].(string)
repo, _ = argsMap["repo"].(string)

Recommendation: Promote getFilteredItemStringField (or a renamed version like StringFromMap) into internal/strutil and use it consistently across server and proxy packages. The strutil package already holds string utility functions (Truncate, DeduplicateStrings, RandomHex, etc.).

Estimated Impact: Minor reduction in boilerplate; improves testability of the extraction logic.


3. extractOwnerRepo Implemented Twice in internal/proxy

Severity: Low — same package, different signatures for same conceptual operation

Issue: Two functions in internal/proxy extract owner and repo from different input types, but share the same underlying intent and some code.

Occurrence 1internal/proxy/graphql.go:extractOwnerRepo(variables map[string]interface{}, query string) (string, string)

  • Extracts from GraphQL variables map (tries "owner", "name", "repo") or falls back to regex on the query string.

Occurrence 2internal/proxy/proxy.go:extractOwnerRepoNumber(argsMap map[string]interface{}, ownerKey, repoKey, numberKey, toolName string) (string, string, string, error)

  • Extracts from REST tool args map; also handles float64 → string coercion for numeric fields.

Both functions do map[string]interface{} → string extraction for owner/repo with slightly different key names and fallback behavior.

Recommendation:

  • Extract a shared stringFromArgs(m map[string]interface{}, key string) string helper (could be the same StringFromMap from finding Lpcox/initial implementation #2).
  • Both functions can delegate to this helper for the map lookup while keeping their own signature and fallback logic.

Estimated Impact: Minor. Reduces repetition, makes intent clearer.


4. Rate-Limit Reset Time Parsing Split Across Two Packages

Severity: Low-Medium — functionally related, different parsing strategies in separate packages

Issue: Two distinct rate-limit reset time parsers exist in different packages with no shared abstraction:

Occurrence 1internal/httputil/github_http.go:ParseRateLimitResetHeader

  • Parses a Unix timestamp from the HTTP X-RateLimit-Reset header value.

Occurrence 2internal/server/circuit_breaker.go:parseRateLimitResetFromText

  • Parses a relative duration from MCP tool response text (e.g., "rate reset in 30s").

These solve different sub-problems (header vs text body), but both produce time.Time rate-limit reset values. The circuit breaker also uses ParseRateLimitResetHeader indirectly via httputil.

Current state: httputil.ParseRateLimitResetHeader is already in the right place (HTTP utilities). parseRateLimitResetFromText is embedded in circuit_breaker.go and is only called from within that file.

Recommendation:

  • Extract parseRateLimitResetFromText to its own file internal/server/rate_limit.go alongside isRateLimitText and isRateLimitToolResult, separating rate-limit detection logic from circuit-breaker state machine logic in circuit_breaker.go.
  • This would make circuit_breaker.go focused on state transitions and improve discoverability of the rate-limit parsing logic.

Estimated Impact: Low risk, improves circuit_breaker.go focus and readability.


Refactoring Recommendations

Priority 1: Medium Impact — Extract Rate-Limit Helpers to Dedicated File

Action: Move isRateLimitText, parseRateLimitResetFromText, isRateLimitToolResult, and extractRateLimitErrorText from internal/server/circuit_breaker.go into a new internal/server/rate_limit.go file.

Rationale: circuit_breaker.go is already 340+ lines. Separating the rate-limit text parsing logic from the state machine logic follows the existing pattern in this codebase (e.g., connection_logging.go was split from connection.go in internal/mcp).

Estimated effort: 30 minutes — move functions, no logic changes.

Priority 2: Low Impact — Shared Map String Helper in strutil

Action: Add GetStringFromMap(m map[string]interface{}, key string) string to internal/strutil and update proxy package usages.

Estimated effort: 1–2 hours including test updates.

Priority 3: Low Impact — Document or Centralize Pagination Logic

Action: Either add a cross-reference comment in both paginate implementations, or extract core pagination invariants to a shared package.

Estimated effort: 30 minutes (comment) or 2–3 hours (extraction).


Implementation Checklist

  • Move rate-limit helpers from circuit_breaker.gorate_limit.go in internal/server
  • Add GetStringFromMap to internal/strutil, update proxy usages
  • Add cross-reference comments between mcp/pagination.go and testutil/mcptest/validator.go
  • Consider extractOwnerRepo consolidation during next proxy refactor

Analysis Metadata

  • Total Go Files Analyzed: 135 (excluding test files)
  • Total Functions Cataloged: 845
  • Packages Inspected: 27
  • Outliers Found: 0 (all functions appear in appropriate files)
  • Duplicate/Near-Duplicate Clusters Identified: 4
  • Detection Method: Naming pattern analysis + code similarity review across all internal/ packages
  • Analysis Date: 2026-05-30

Generated by Semantic Function Refactoring · sonnet46 3.6M ·

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions