Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
# ADR-34797: Fast-Path YAML Bypass for Permission Scope-Name Validation

**Date**: 2026-05-26
**Status**: Draft
**Deciders**: pelikhan (copilot-swe-agent)

---

## Part 1 — Narrative (Human-Friendly)

### Context

A validation microbenchmark regressed to ~29.7µs/op (+17.9% vs historical baseline). Profiling identified `ValidatePermissionScopeNames` on the hot path of workflow compilation, dominated by two costs: (a) per-call construction of the full known-scope list (`GetAllPermissionScopes` + `GetAllGitHubAppOnlyScopes` + meta-key map + scope preview string), and (b) `yaml.Unmarshal` on the permissions block, which allocates a generic `map[string]any` even when the input is the trivial common case — a flat block-style mapping of scope → access level. Because permissions blocks in real workflows are nearly always written in block style with a small number of keys, the YAML decoder is doing far more work than the validation actually needs.

### Decision

We will short-circuit `ValidatePermissionScopeNames` with a fast extraction path that reads top-level scope keys directly from block-style YAML without invoking `yaml.Unmarshal`, falling back to the existing YAML-based parsing only when the content uses flow mappings, sequences, or other non-trivial YAML structures (detected by presence of `{`, `}`, `[`, `]`). Reusable validation state — the full scope list, the meta-key set (`all`, `read-all`, `write-all`, `none`), and the preformatted scope preview string used in error messages — is hoisted to package-level `var` initialization so it is built once at startup rather than on every call. Single-key validation logic is centralized in `validateSinglePermissionScopeKey` so the fast and fallback paths share identical error semantics.

### Alternatives Considered

#### Alternative 1: Keep `yaml.Unmarshal` and cache only the package-level state

Hoisting the scope list, meta-key set, and preview string to `var` initialization alone would eliminate the per-call list construction (a meaningful chunk of the regression) without touching the YAML parsing path. This was considered because it is a strictly smaller change with no behavioral risk: every input continues through the same `yaml.Unmarshal` code path. It was rejected because profiling showed `yaml.Unmarshal` of the permissions block was itself a material cost (allocating `map[string]any` and decoder state on every call), and the common-case input is so structurally simple — a flat list of `key: level` lines — that parsing it as full YAML is disproportionate work.

#### Alternative 2: Replace YAML parsing entirely with a hand-rolled parser

Removing `yaml.Unmarshal` from this function altogether and parsing every input — including flow mappings and quoted keys with embedded colons — with custom code would maximize the speed-up. This was rejected because the GitHub Actions YAML spec allows several non-block forms (flow mappings like `{contents: read, issues: write}`, anchors, multi-line scalars) that are tedious and error-prone to reimplement. The hybrid approach — fast path for the easy common case, real YAML decoder for everything else — captures most of the speed-up while preserving exact behavior for edge cases that already work today.

#### Alternative 3: Cache parsed results across calls

A keyed cache from permissions-YAML string to validation result would skip both the YAML parse and the validation work on repeat inputs. This was not chosen because the validator is called from a compile pipeline that already memoizes higher-level work (see ADR-28560), and the inputs vary enough per workflow that a cache hit rate would not justify the cache plumbing, eviction policy, and concurrent-access surface area.

### Consequences

#### Positive
- The hot common case (block-style permissions with a handful of keys) avoids `yaml.Unmarshal` entirely, removing the dominant per-call allocation and decoder cost.
- Package-level cached scope list, meta-key set, and preview string remove the repeated `GetAllPermissionScopes`/`GetAllGitHubAppOnlyScopes` traversals and slice allocations from every validation call.
- `validateSinglePermissionScopeKey` centralizes the meta-key check, exact-match check, case-only suggestion, and fuzzy-match error message — both the fast path and the YAML fallback emit identical errors, removing a class of drift bugs.
- Flow-mapping inputs (`{contents: read, issues: write}`) and quoted-key inputs (`"contents": read`) are covered by added regression tests, so the dual-path design has explicit behavioral pinning.

#### Negative
- There are now two code paths through `ValidatePermissionScopeNames` (hand-rolled fast extraction + YAML fallback). Future changes to validation semantics must be made in both `validateSinglePermissionScopeKey` callers and the fast path's structural assumptions.
- The fast-path detector keys off the literal presence of `{`, `}`, `[`, `]` anywhere in the content. A block-style permissions value containing these characters in a comment (e.g., `# example: {contents: read}`) will unnecessarily fall back to YAML parsing. This is a correctness-preserving false positive, not a behavior change, but it means the speed-up does not apply to inputs with such comments.
- Package-level initialization runs at import time, including `safeAllocationCapacity` and the join over the preview slice. Any future panic in those builders would now manifest at package init rather than first call.

#### Neutral
- The function continues to return `nil` for inputs that are not maps (shorthand strings like `read-all`), matching prior behavior.
- The fast path explicitly accepts quoted keys by trimming `"` and `'` from extracted keys; quoted-key inputs that previously went through `yaml.Unmarshal` are now handled by the fast path with no observable difference.
- `countLeadingSpacesAndTabs` and `extractTopLevelPermissionScopeKeys` are new package-private helpers; their scope is intentionally narrow to this one validator and they are not exported.

---

## Part 2 — Normative Specification (RFC 2119)

> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).

### Fast-Path Extraction

1. `ValidatePermissionScopeNames` **MUST** attempt fast-path extraction of top-level scope keys via `extractTopLevelPermissionScopeKeys` before invoking `yaml.Unmarshal`.
2. The fast path **MUST** decline (return `parsedFast == false`) when the content contains any of the characters `{`, `}`, `[`, or `]`, so flow mappings and sequences are routed to the YAML fallback.
3. The fast path **MUST** decline when any non-empty, non-comment line does not contain a `:` character at column > 0, or begins with `-` after trimming, so non-mapping inputs are routed to the YAML fallback.
4. When the fast path succeeds, it **MUST** trim surrounding `"` and `'` from each extracted key before validation, so quoted keys are accepted as equivalent to their unquoted form.
5. When the fast path succeeds and all extracted keys validate, `ValidatePermissionScopeNames` **MUST** return `nil` without invoking `yaml.Unmarshal`.

### YAML Fallback Path

1. When the fast path declines, `ValidatePermissionScopeNames` **MUST** fall back to `yaml.Unmarshal` of the content into a `map[string]any`.
2. If `yaml.Unmarshal` returns an error, the function **MUST** return `nil` (treating the input as a non-map shorthand), preserving prior behavior.
3. The fallback **MUST** invoke `validateSinglePermissionScopeKey` for each key in the resulting map, so error messages and suggestions are identical to the fast path.

### Centralized Single-Key Validation

1. Both the fast path and the YAML fallback **MUST** route per-key validation through `validateSinglePermissionScopeKey`; neither path may inline its own meta-key, exact-match, or fuzzy-match logic.
2. `validateSinglePermissionScopeKey` **MUST** accept any key present in `permissionScopeValidationMeta` (currently `all`, `read-all`, `write-all`, `none`) without error.
3. `validateSinglePermissionScopeKey` **MUST** accept any key present in `validPermissionScopes` without error.
4. For unknown keys that differ from a valid scope only by letter case, `validateSinglePermissionScopeKey` **MUST** return an error suggesting the lowercase form.
5. For unknown keys with no case-only match, `validateSinglePermissionScopeKey` **MUST** consult `stringutil.FindClosestMatches` against `permissionScopeValidationAllScopes` with a limit of 3 suggestions, and **MUST** return `nil` (silent accept) when no suggestions are found.

### Package-Level Cached State

1. `permissionScopeValidationAllScopes`, `permissionScopeValidationMeta`, and `permissionScopeValidationValidScopesPreview` **MUST** be initialized once at package load time as package-level `var`s.
2. `ValidatePermissionScopeNames` and `validateSinglePermissionScopeKey` **MUST NOT** rebuild the full scope list, meta-key map, or preview string on each call.
3. The preview string **MUST** contain at most the first 10 entries of `permissionScopeValidationAllScopes` joined by `", "`, followed by `"..."`.

### Conformance

An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance.

---

*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/26425418484) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*
11 changes: 11 additions & 0 deletions pkg/workflow/permissions_scope_validation_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,17 @@ pull-requests: read`,
issues: write`,
wantErr: false,
},
{
name: "quoted scope keys are valid",
yaml: `"contents": read
"pull-requests": write`,
wantErr: false,
},
{
name: "flow mapping shorthand remains valid",
yaml: `{contents: read, issues: write}`,
wantErr: false,
},
{
name: "typo in scope name suggests correction",
yaml: `contnts: read`,
Expand Down
177 changes: 127 additions & 50 deletions pkg/workflow/permissions_validation.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,20 @@ import (
"github.com/goccy/go-yaml"
)

var (
permissionScopeValidationAllScopes = buildPermissionScopeValidationAllScopes()
permissionScopeValidationMeta = map[string]struct{}{
"all": {},
"read-all": {},
"write-all": {},
"none": {},
}
permissionScopeValidationValidScopesPreview = strings.Join(
permissionScopeValidationAllScopes[:min(10, len(permissionScopeValidationAllScopes))],
", ",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"..." is unconditionally appended to the scopes preview even when all scopes fit within the 10-item cap.

💡 Details and fix

If buildPermissionScopeValidationAllScopes() ever returns ≤ 10 entries — possible in tests or if the scope list is trimmed — the error message claims there are more valid scopes than shown, which is misleading.

// Fix:
if len(permissionScopeValidationAllScopes) > 10 {
    permissionScopeValidationValidScopesPreview = strings.Join(
        permissionScopeValidationAllScopes[:10], ", ",
    ) + "..."
} else {
    permissionScopeValidationValidScopesPreview = strings.Join(
        permissionScopeValidationAllScopes, ", ",
    )
}

) + "..."
)

// PermissionsValidationResult contains the result of permissions validation
type PermissionsValidationResult struct {
MissingPermissions map[PermissionScope]PermissionLevel // Permissions required but not granted
Expand Down Expand Up @@ -402,27 +416,6 @@ func ValidatePermissionScopeNames(permissionsYAML string) error {

permissionsValidationLog.Print("Validating permission scope names")

// Collect all valid scope names for fuzzy matching
ghTokenScopes := GetAllPermissionScopes()
appOnlyScopes := GetAllGitHubAppOnlyScopes()
// +1 for copilot-requests which is not in GetAllPermissionScopes
allScopes := make([]string, 0, safeAllocationCapacity(len(ghTokenScopes), len(appOnlyScopes), 1))
for _, scope := range ghTokenScopes {
allScopes = append(allScopes, string(scope))
}
for _, scope := range appOnlyScopes {
allScopes = append(allScopes, string(scope))
}
// copilot-requests is valid even though not in GetAllPermissionScopes
allScopes = append(allScopes, string(PermissionCopilotRequests))
// "all" is a meta-key that is always valid in shorthand contexts
validMeta := map[string]bool{
"all": true,
"read-all": true,
"write-all": true,
"none": true,
}

// Strip optional "permissions:" prefix so we can parse just the map content
content := strings.TrimSpace(permissionsYAML)
if strings.HasPrefix(content, "permissions:") {
Expand All @@ -435,50 +428,134 @@ func ValidatePermissionScopeNames(permissionsYAML string) error {
}
}

// Try to parse the content as a YAML map of scope → level
scopeKeys, parsedFast := extractTopLevelPermissionScopeKeys(content)
if parsedFast {
for _, scopeKey := range scopeKeys {
if err := validateSinglePermissionScopeKey(scopeKey); err != nil {
return err
}
}
return nil
}

// Fallback to YAML map parsing for non-standard formats (flow mappings, etc.).
var permsMap map[string]any
if err := yaml.Unmarshal([]byte(content), &permsMap); err != nil {
// Not a map (e.g., a shorthand like "read-all"); nothing to validate
return nil
}

for scopeKey := range permsMap {
if validMeta[scopeKey] {
continue
if err := validateSinglePermissionScopeKey(scopeKey); err != nil {
return err
}
}

return nil
}

func validateSinglePermissionScopeKey(scopeKey string) error {
if _, ok := permissionScopeValidationMeta[scopeKey]; ok {
return nil
}
if _, ok := validPermissionScopes[scopeKey]; ok {
return nil
}

// Unknown scope key — check for a case-only difference first (e.g. "Contents" → "contents")
lowerScopeKey := strings.ToLower(scopeKey)
if lowerScopeKey != scopeKey {
if _, ok := validPermissionScopes[lowerScopeKey]; ok {
return fmt.Errorf(
"unknown permission scope %q.\n\nDid you mean: %s?\n\nValid permission scopes include: %s\n\nSee: %s",
scopeKey,
lowerScopeKey,
permissionScopeValidationValidScopesPreview,
constants.DocsPermissionsURL,
)
}
if _, ok := validPermissionScopes[scopeKey]; ok {
}

// Check for a close fuzzy match
permissionsValidationLog.Printf("Unknown permission scope key: %q", scopeKey)
suggestions := stringutil.FindClosestMatches(scopeKey, permissionScopeValidationAllScopes, 3)
if len(suggestions) == 0 {
return nil // too different to be a typo, ignore silently
}

return fmt.Errorf(
"unknown permission scope %q.\n\nDid you mean: %s?\n\nValid permission scopes include: %s\n\nSee: %s",
scopeKey,
strings.Join(suggestions, ", "),
permissionScopeValidationValidScopesPreview,
constants.DocsPermissionsURL,
)
}

func extractTopLevelPermissionScopeKeys(content string) ([]string, bool) {
if content == "" {
return nil, false
}
// Flow-style mappings ("{contents: read}") and sequence values are uncommon here.
// Let the YAML decoder handle those to preserve behavior.
if strings.Contains(content, "{") || strings.Contains(content, "}") || strings.Contains(content, "[") || strings.Contains(content, "]") {
return nil, false
}

lines := strings.Split(content, "\n")
keys := make([]string, 0, 4)
baseIndent := -1

for _, line := range lines {
trimmed := strings.TrimSpace(line)
if trimmed == "" || strings.HasPrefix(trimmed, "#") {
continue
}
colonIdx := strings.IndexByte(trimmed, ':')
if colonIdx <= 0 {
return nil, false
}

// Unknown scope key — check for a case-only difference first (e.g. "Contents" → "contents")
lowerScopeKey := strings.ToLower(scopeKey)
if lowerScopeKey != scopeKey {
if _, ok := validPermissionScopes[lowerScopeKey]; ok {
return fmt.Errorf(
"unknown permission scope %q.\n\nDid you mean: %s?\n\nValid permission scopes include: %s\n\nSee: %s",
scopeKey,
lowerScopeKey,
strings.Join(allScopes[:min(10, len(allScopes))], ", ")+"...",
constants.DocsPermissionsURL,
)
}
indent := countLeadingSpacesAndTabs(line)
if baseIndent == -1 {
baseIndent = indent
}
if indent != baseIndent {
// Nested block content (value continuation), not a top-level scope key.
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fast path silently skips deeper-indented key-value lines instead of falling back, diverging from YAML semantics.

💡 Details and fix

When indent != baseIndent, the code does continue. This is fine for plain scalar continuations, but if a deeper-indented line is itself a key: value pair (e.g. from an accidentally-indented permission or a nested map), the fast path silently drops it and returns parsedFast=true. The YAML fallback — which would surface the nested key or return a parse error — never runs.

Example: contents: read\n badscope: write is invalid YAML (the YAML parser would error → ValidatePermissionScopeNames returns nil). The fast path returns ["contents"] and parsedFast=true, silently accepting the document.

Fix: when a deeper-indented line contains a colon, bail out to the YAML parser rather than skipping:

if indent != baseIndent {
    if colonIdx > 0 {
        // Looks like a nested map key — hand off to YAML parser.
        return nil, false
    }
    continue // plain scalar value continuation, safe to skip
}

}

// Check for a close fuzzy match
permissionsValidationLog.Printf("Unknown permission scope key: %q", scopeKey)
suggestions := stringutil.FindClosestMatches(scopeKey, allScopes, 3)
if len(suggestions) == 0 {
continue // too different to be a typo, ignore silently
key := strings.TrimSpace(trimmed[:colonIdx])
if key == "" || strings.HasPrefix(key, "-") {
return nil, false
}
key = strings.Trim(key, `"'`)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strings.Trim strips all leading/trailing quote characters rather than one balanced pair, silently normalising malformed keys.

💡 Details and fix

strings.Trim(key, "\"'") removes every leading and trailing " or ' character. A malformed key like ""contents"" becomes contents and validates successfully. A key like "write-all (unbalanced quote, meaning a workflow authoring error) loses its leading quote and is validated as write-all — a valid meta key — masking the typo.

The intent is to unquote YAML keys that were written with explicit quotes, but only a single balanced pair should be removed:

if len(key) >= 2 {
    q := key[0]
    if (q == '"' || q == '\'' ) && key[len(key)-1] == q {
        key = key[1 : len(key)-1]
    }
}

If the quotes are unbalanced, leaving the key unchanged (or returning nil, false) lets the YAML fallback handle it correctly.


return fmt.Errorf(
"unknown permission scope %q.\n\nDid you mean: %s?\n\nValid permission scopes include: %s\n\nSee: %s",
scopeKey,
strings.Join(suggestions, ", "),
strings.Join(allScopes[:min(10, len(allScopes))], ", ")+"...",
constants.DocsPermissionsURL,
)
keys = append(keys, key)
}

return nil
return keys, len(keys) > 0
}

func countLeadingSpacesAndTabs(s string) int {
i := 0
for i < len(s) && (s[i] == ' ' || s[i] == '\t') {
i++
}
return i
}

func buildPermissionScopeValidationAllScopes() []string {
ghTokenScopes := GetAllPermissionScopes()
appOnlyScopes := GetAllGitHubAppOnlyScopes()
allScopes := make([]string, 0, safeAllocationCapacity(len(ghTokenScopes), len(appOnlyScopes), 1))
for _, scope := range ghTokenScopes {
allScopes = append(allScopes, string(scope))
}
for _, scope := range appOnlyScopes {
allScopes = append(allScopes, string(scope))
}
// copilot-requests is valid even though not in GetAllPermissionScopes
allScopes = append(allScopes, string(PermissionCopilotRequests))
return allScopes
}