Skip to content

feat: auto-extract shell injection expressions from run: steps into env vars#28998

Merged
pelikhan merged 5 commits intomainfrom
copilot/update-compiler-review-shell-sources
Apr 28, 2026
Merged

feat: auto-extract shell injection expressions from run: steps into env vars#28998
pelikhan merged 5 commits intomainfrom
copilot/update-compiler-review-shell-sources

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 28, 2026

Summary

When a workflow step's run: field contains GitHub Actions expressions like ${{ github.event.issue.title }} or ${{ steps.foo.outputs.bar }}, an attacker who controls those values can inject arbitrary shell commands — a technique known as template injection or script injection.

This PR updates the compiler to automatically fix this pattern instead of failing with a hard error. For every ${{ ... }} expression found directly in a run: script (outside heredoc blocks), the compiler now:

  1. Generates a deterministic GH_AW_<EXPR> environment variable name
  2. Moves the expression to the step's env: block
  3. Replaces the inline ${{ }} occurrence in the shell script with $GH_AW_...
  4. Emits a compiler warning explaining exactly what was changed and why

Example transformation:

Before (unsafe — shell injection risk):

- name: Print title
  run: echo "${{ github.event.issue.title }}"

After (safe — expression passed as env var):

- name: Print title
  env:
    GH_AW_GITHUB_EVENT_ISSUE_TITLE: ${{ github.event.issue.title }}
  run: echo "$GH_AW_GITHUB_EVENT_ISSUE_TITLE"

Warning emitted:

⚠ extracted ${{ github.event.issue.title }} from run: script in step "Print title" into env var GH_AW_GITHUB_EVENT_ISSUE_TITLE to prevent shell injection

Changes

File Change
pkg/workflow/run_step_sanitizer.go NewsanitizeRunStepExpressions() (step map → sanitized map), sanitizeCustomStepsYAML() (raw YAML string → sanitized string)
pkg/workflow/template_injection_utils.go NewreplaceOutsideQuotedHeredocs() helper that restricts expression replacement to non-quoted-heredoc regions
pkg/workflow/run_step_sanitizer_test.go New — comprehensive unit tests covering all code paths including collision handling, heredoc-preservation, sanitizeCustomStepsYAML, and property tests
pkg/workflow/run_step_sanitizer_fuzz_test.go NewFuzzSanitizeRunStepExpressions and FuzzSanitizeCustomStepsYAML fuzz tests with 30+ seed corpus entries
pkg/workflow/compiler_yaml_step_conversion.go Hook into renderStepFromMap() — struct-based step rendering path; map keys sorted for stable output
pkg/workflow/compiler_yaml_main_job.go Hook into addCustomStepsAsIs() and addCustomStepsWithRuntimeInsertion() via new sanitizeAndWarnCustomSteps() helper

Security Rationale

The safe pattern requires the GitHub Actions runner to evaluate the expression value and store it as data in the environment before the shell sees it. When ${{ ... }} appears inline in a run: script, the runner substitutes the raw string value directly into the shell command before execution — allowing an attacker who controls that value (e.g., via an issue title or PR description) to break out of quoted strings and execute arbitrary commands.

Collision Handling

If a step's env: block already defines a GH_AW_* key with the same expression value, the sanitizer reuses it without adding a duplicate entry. If the key exists with a different value, the sanitizer picks an alternate name (GH_AW_..._2, _3, …, capped at 100 attempts) rather than silently overwriting the user's variable.

Quoted Heredoc Safety

Replacements are performed only outside quoted heredoc regions (<< 'EOF'EOF). Quoted heredocs suppress shell variable expansion, so writing $GH_AW_VAR inside them would produce the literal string in the output file instead of the expression value. Unquoted heredocs (<< EOF) are still rewritten because the shell expands variables there.

Test Coverage

All unit tests pass. 203/203 workflow files recompile cleanly with no new warnings (existing workflows already use the safe env: pattern).

Fuzz Tests

Two fuzz tests are included in run_step_sanitizer_fuzz_test.go:

  • FuzzSanitizeRunStepExpressions — feeds arbitrary run: script content and verifies: no panic, non-heredoc portion of the sanitized script contains no ${{ }}, env block is populated with GH_AW_ keys, warnings mention "shell injection", and the input map is never mutated.
  • FuzzSanitizeCustomStepsYAML — feeds arbitrary YAML strings and verifies: no panic, malformed YAML returns the input unchanged, and warning messages contain "shell injection".

The fuzz tests caught a real issue in the initial test logic: expressions inside heredoc blocks are intentionally left in place (correct behaviour), so the postcondition check now correctly applies removeHeredocContent() before asserting the absence of ${{.

Copilot AI and others added 2 commits April 28, 2026 22:41
…nv vars

When the compiler encounters ${{ ... }} expressions directly in the run:
field of a step (outside heredoc blocks), it now automatically:

1. Generates a deterministic GH_AW_<EXPR> environment variable name
2. Moves the expression to the step's env: block
3. Replaces the inline ${{ }} reference in the shell script with $VAR
4. Emits a compiler warning explaining what was changed and why

This prevents shell injection attacks (template injection) where
user-controlled data in expressions like github.event.issue.title or
steps.*.outputs.* can be used to execute arbitrary commands.

The sanitizer is applied:
- In renderStepFromMap() for struct-based step rendering
- In addCustomStepsAsIs() for raw-YAML user-authored steps
- In addCustomStepsWithRuntimeInsertion() for steps with runtime injection

A new run_step_sanitizer.go file provides the core logic, and
run_step_sanitizer_test.go provides comprehensive test coverage.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/6f0951d1-ca98-48dd-9f00-6596d812879b

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Rename renderStepFromMap parameter from 'yamlBuilder' to 'out' to remove
  variable aliasing (all internal uses updated to 'out')
- Clarify comment about heredoc replacement: expressions found exclusively
  inside heredocs are never added to 'ordered', so they're never replaced
- Add debug log via runStepSanitizerLog when sanitizeCustomStepsYAML skips
  due to YAML parse failure
- Rename test case to 'expression in heredoc not extracted' (expectChanged: false)

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/6f0951d1-ca98-48dd-9f00-6596d812879b

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot add more tests, add fuzz tests

Additional unit tests for sanitizeRunStepExpressions:
- PR/comment body expressions extracted
- Expression with extra whitespace
- Complex expressions → hash-based env var names
- Mixed heredoc + non-heredoc expressions
- Injection attempt pattern extracted safely
- Empty run string and no expression marker
- Three expressions in a single command
- Existing env map immutability check
- Broad property test: non-heredoc run portion is ${{ }}-free after sanitization

New unit tests for sanitizeCustomStepsYAML:
- Single/multiple steps, unchanged vs changed scenarios
- Expression already in env: block not double-extracted
- Malformed YAML returned unchanged without error
- Warning content verified

Fuzz tests (run_step_sanitizer_fuzz_test.go):
- FuzzSanitizeRunStepExpressions: 30+ seed entries covering safe, unsafe,
  heredoc, complex, duplicate, unicode, and injection-attempt patterns;
  verifies no panic, correct postconditions (non-heredoc portion is clean,
  env: populated, warnings reference "shell injection", input not mutated)
- FuzzSanitizeCustomStepsYAML: verifies no panic, correct behaviour on
  malformed YAML, warning text

The fuzz test found a real bug in the initial test logic: expressions inside
heredoc blocks are intentionally left in place (correct behavior), so the
postcondition check now uses removeHeredocContent() before checking for ${{.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/3b0b6b26-d5cb-495e-9ada-888b9960a5b9

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 28, 2026

@copilot add more tests, add fuzz tests

Done in 6de63ba.

More unit tests added (run_step_sanitizer_test.go):

  • PR/comment body expressions, expressions with extra whitespace
  • Complex expressions with operators → hash-based GH_AW_EXPR_... names
  • Mixed heredoc + non-heredoc expressions in the same script
  • Injection attempt patterns extracted safely
  • Empty run string, no-expression-marker, 3-expressions-in-one-command
  • sanitizeCustomStepsYAML coverage: single/multiple steps, safe-already, malformed YAML, warnings content
  • Env map immutability check

Fuzz tests added (run_step_sanitizer_fuzz_test.go):

  • FuzzSanitizeRunStepExpressions — 30+ seed entries; verifies no panic, non-heredoc run portion is ${{-free after sanitization, env: is populated, warnings mention "shell injection", input map is not mutated
  • FuzzSanitizeCustomStepsYAML — verifies no panic, malformed YAML returns input unchanged

The fuzz test found an actual bug in the initial test logic: expressions inside heredoc blocks are intentionally left in place, so the postcondition now correctly uses removeHeredocContent() before checking for ${{.

@pelikhan pelikhan marked this pull request as ready for review April 28, 2026 23:19
Copilot AI review requested due to automatic review settings April 28, 2026 23:19
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an automatic mitigation for GitHub Actions template/script injection by rewriting ${{ ... }} expressions found inline in run: scripts into deterministic GH_AW_... env vars, and emitting compiler warnings describing the rewrite.

Changes:

  • Introduces sanitizeRunStepExpressions / sanitizeCustomStepsYAML to extract ${{ ... }} from run: into env: and replace inline occurrences with $GH_AW_....
  • Hooks the sanitizer into both the struct/map-based step rendering path and the custom-steps YAML insertion path, emitting formatted warnings and incrementing compiler warning counts.
  • Adds extensive unit + fuzz coverage for sanitizer behavior (including heredoc handling and immutability), and updates a repo workflow to the new safe pattern.
Show a summary per file
File Description
pkg/workflow/run_step_sanitizer.go Implements run-step and custom-steps YAML sanitization + warning message generation.
pkg/workflow/run_step_sanitizer_test.go Unit tests covering extraction behavior, heredoc cases, env immutability, and warning content.
pkg/workflow/run_step_sanitizer_fuzz_test.go Fuzz tests for sanitizer robustness and key safety properties.
pkg/workflow/compiler_yaml_step_conversion.go Emits warnings and applies sanitization before rendering map-based steps.
pkg/workflow/compiler_yaml_main_job.go Sanitizes custom-steps YAML before writing and emits warnings via a helper.
.github/workflows/design-decision-gate.lock.yml Updates an existing workflow step to use extracted env vars instead of inline expressions in run:.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 6/6 changed files
  • Comments generated: 5

Comment on lines +134 to +138
existingEnv, _ := step["env"].(map[string]any)
newEnv := make(map[string]any, len(existingEnv)+len(ordered))
maps.Copy(newEnv, existingEnv)
for _, s := range ordered {
newEnv[s.EnvVar] = s.Original
Comment on lines +186 to +190
// The returned string is a replacement for the input customSteps that is safe to
// write to the generated workflow YAML. The returned warnings slice contains one
// entry per unique expression that was extracted across all steps.
//
// When no expressions are found the original customSteps string is returned
Comment on lines +149 to +153
case map[string]any:
// For complex fields like "with" or "env"
fmt.Fprintf(yaml, "%s:\n", field)
fmt.Fprintf(out, "%s:\n", field)
for key, val := range v {
fmt.Fprintf(yaml, "%s %s: %v\n", indent, key, val)
fmt.Fprintf(out, "%s %s: %v\n", indent, key, val)
Comment on lines +251 to +252
// err is only non-nil for YAML re-serialisation failures; the function
// always returns a non-empty string (either original or sanitized).
Comment thread pkg/workflow/run_step_sanitizer.go Outdated
Comment on lines +141 to +145
// Replace every occurrence of each expression in the full run: string.
// This includes content inside heredoc blocks: the runner evaluates the
// env: block assignment before passing heredoc content to the shell, so
// the variable reference is resolved correctly even inside a heredoc.
// Note: only expressions detected in the non-heredoc portion of the run:
…pressions

Documents the architectural decision to automatically extract ${{ ... }}
expressions from run: step scripts into env: variables at compile time
to prevent shell injection attacks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Commit pushed: c2c5be4

🏗️ ADR gate enforced by Design Decision Gate 🏗️

@github-actions
Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 77/100

⚠️ Acceptable — with minor suggestions

Metric Value
New/modified tests analyzed 9 (+ 2 fuzz tests)
✅ Design tests (behavioral contracts) 9 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 5 (56%)
Duplicate test clusters 0
Test inflation detected YES — run_step_sanitizer_test.go 2.54:1 ratio (threshold: 2:1)
🚨 Coding-guideline violations 4 bare assertions without messages (minor)

Test Classification Details

View all 9 test classifications
Test File Classification Notes
TestSanitizeRunStepExpressions (21 table rows) run_step_sanitizer_test.go:13 ✅ Design Full behavioral coverage: extraction, dedup, heredoc exclusion, edge inputs, injection pattern, warnings
TestSanitizeRunStepExpressionsOriginalNotMutated run_step_sanitizer_test.go:326 ✅ Design Immutability contract — critical for callers that reuse the input map
TestSanitizeRunStepExpressionsExistingEnvNotMutated run_step_sanitizer_test.go:343 ✅ Design Immutability of caller-owned env map — behavioral side-effect contract
TestSanitizeRunStepExpressionsEnvVarNameGeneration (4 rows) run_step_sanitizer_test.go:360 ✅ Design Observable naming contract for 4 known expression patterns
TestSanitizeRunStepExpressions_ComplexExpressionHashName run_step_sanitizer_test.go:390 ✅ Design Edge case: hash-based fallback naming for complex/operator expressions
TestSanitizeRunStepExpressions_MultilineRun run_step_sanitizer_test.go:417 ✅ Design Multi-expression extraction across lines, correct warning count
TestSanitizeRunStepExpressions_SanitizedRunHasNoInlineExpressions run_step_sanitizer_test.go:445 ✅ Design Property test: no ${{ }} survives in non-heredoc output
TestSanitizeCustomStepsYAML (8 table rows) run_step_sanitizer_test.go:471 ✅ Design Full YAML-level coverage including malformed input, empty, multi-step, already-safe steps
TestSanitizeCustomStepsYAML_WarningsDescribeExtraction run_step_sanitizer_test.go:611 ✅ Design Warning message contract: expression name, env var name, step name, "shell injection" text

Flagged Items — Minor Suggestions

⚠️ Missing assertion messages — TestSanitizeRunStepExpressions_ComplexExpressionHashName and TestSanitizeRunStepExpressions_MultilineRun

Lines: run_step_sanitizer_test.go:381, 396, 402, 429

The project guideline requires a descriptive message on every assertion call. The following bare assertions were found:

require.True(t, ok)       // lines 381, 402 — no message
require.True(t, changed)  // lines 396, 429 — no message

Suggested fix:

require.True(t, ok, "env field should be map[string]any")
require.True(t, changed, "expected sanitizer to detect and extract the complex expression")

These are minor — they don't affect correctness — but make failure output harder to diagnose in CI.


Score Calculation

Component Weight Raw Score
Behavioral Coverage 40 pts 9/9 = 100% design tests 40
Error/Edge Case Coverage 30 pts 5/9 = 56% have edge cases 17
Low Duplication 20 pts 0 duplicate clusters 20
Proportional Growth 10 pts run_step_sanitizer_test.go 625 vs 246 lines = 2.54:1 0
Total 100 77

Note on inflation: The test/production ratio exceeds 2:1, but run_step_sanitizer.go is a security-critical shell-injection sanitizer with complex heredoc and expression parsing. The comprehensive coverage — 21 table rows plus 6 standalone tests plus 2 fuzz tests — is intentionally thorough and genuinely valuable here. The inflation flag is a mechanical threshold; the tests themselves are high quality.


Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 9 Test* functions, 2 Fuzz* functions — unit (//go:build !integration)
  • 🟨 JavaScript (*.test.cjs, *.test.js): 0 tests

Build tags: ✅ Both new test files correctly declare //go:build !integration on line 1.
Mock libraries: ✅ No gomock, testify/mock, .EXPECT(), or .On() usage detected.


Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). All 9 analyzed tests verify behavioral contracts of the new run_step_sanitizer.go with strong coverage of heredoc edge cases, mutation safety, env-var naming contracts, and shell-injection property invariants. The 4 bare require.True calls without descriptive messages are a minor style nit worth fixing but do not affect test value.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §25082700826

🧪 Test quality analysis by Test Quality Sentinel · ● 912.8K ·

@github-actions
Copy link
Copy Markdown
Contributor

🏗️ Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (1,208 new lines in pkg/) but does not have a linked Architecture Decision Record (ADR).

AI has analyzed the PR diff and generated a draft ADR to help you get started:

📄 Draft ADR: docs/adr/28998-auto-extract-shell-injection-expressions-into-env-vars.md

What to do next

  1. Review the draft ADR committed to your branch — it was generated from the PR diff
  2. Complete the missing sections — add context the AI couldn't infer, refine the decision rationale, and list real alternatives you considered
  3. Commit the finalized ADR to docs/adr/ on your branch
  4. Reference the ADR in this PR body by adding a line such as:

    ADR: ADR-28998: Auto-Extract Shell Injection Expressions from run: Steps into Env Vars

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

Why ADRs Matter

"AI made me procrastinate on key design decisions. Because refactoring was cheap, I could always say 'I'll deal with this later.' Deferring decisions corroded my ability to think clearly."

ADRs create a searchable, permanent record of why the codebase looks the way it does. Future contributors (and your future self) will thank you.


📋 Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

  • Context — What is the problem? What forces are at play?
  • Decision — What did you decide? Why?
  • Alternatives Considered — What else could have been done?
  • Consequences — What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number (e.g., 28998-auto-extract-shell-injection-expressions-into-env-vars.md for PR #28998).

🔒 This PR cannot merge until an ADR is linked in the PR body.

References: §25082700747

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · ● 157.9K ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 77/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). All 9 analyzed tests verify behavioral contracts with strong coverage of the new shell-injection sanitizer. Minor: 4 bare require.True calls lack descriptive messages (lines 381, 396, 402, 429).

@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot review all comments

1. Collision handling (run_step_sanitizer.go:134-138)
   - When the generated GH_AW_ key already exists in env: with the same value,
     reuse it silently (no overwrite, no duplicate entry).
   - When the key exists with a different value, pick GH_AW_..._2 (then _3, …)
     rather than silently overwriting the user's env var.
   - Bound the collision search loop to 100 iterations to prevent any
     pathological infinite loop; skip extraction and log a debug message if
     all suffixes are exhausted.

2. Quoted-heredoc replacement (run_step_sanitizer.go:141-145)
   - Replace strings.ReplaceAll with new replaceOutsideQuotedHeredocs() helper
     (template_injection_utils.go) that skips quoted heredoc regions.
   - Quoted heredocs suppress shell variable expansion, so replacing
     ${{ expr }} with $GH_AW_VAR inside them would write the literal var name
     to the output file instead of the expression value.
   - Unquoted heredocs are still replaced (shell expands vars there).

3. Docstring (run_step_sanitizer.go:186-190)
   - Corrected: warnings are per-step (one per unique expression within each
     step); no global deduplication across steps.

4. Non-deterministic map ordering (compiler_yaml_step_conversion.go:149-153)
   - Sorted map keys with slices.Sorted(maps.Keys(v)) before writing with/env
     blocks to produce stable YAML output.

5. Inaccurate fuzz test comment (run_step_sanitizer_fuzz_test.go:251-252)
   - Corrected: the function may return an empty string when input is empty;
     it returns the original string unchanged only on re-serialisation error.

New tests:
- TestSanitizeRunStepExpressions_CollisionReusesSameValue
- TestSanitizeRunStepExpressions_CollisionDifferentValueGetsAlternateName
- TestSanitizeRunStepExpressions_ExpressionInQuotedHeredocNotReplaced
- TestReplaceOutsideQuotedHeredocs_NoHeredoc
- TestReplaceOutsideQuotedHeredocs_QuotedHeredocPreserved
- TestReplaceOutsideQuotedHeredocs_UnquotedHeredocIsReplaced

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/cfcea2c1-c8ba-4931-9e12-00a57eab22db

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 28, 2026

@copilot review all comments

All five reviewer comments addressed in a7db751:

  1. Collision handling — when the generated GH_AW_ key already exists in env: with the same value it's reused silently; when it has a different value, an alternate name (_2, _3, …, capped at 100) is used rather than overwriting. Two new tests cover both cases.

  2. strings.ReplaceAll inside quoted heredocs — replaced with replaceOutsideQuotedHeredocs() in template_injection_utils.go. Quoted heredoc bodies (<< 'EOF') are now preserved verbatim; unquoted heredocs still get the replacement (shell expands $VAR there). Three new tests (TestReplaceOutsideQuotedHeredocs_*) and a new integration test cover this.

  3. DocstringsanitizeCustomStepsYAML comment now correctly says warnings are per-step (one per unique expression within each step), not globally deduplicated.

  4. Non-deterministic map ordering — both map[string]any rendering blocks in renderStepFromMap now use slices.Sorted(maps.Keys(v)) for stable output.

  5. Inaccurate fuzz test comment — corrected to say the function may return an empty string when input is empty.

@pelikhan pelikhan merged commit 5f8c662 into main Apr 28, 2026
58 checks passed
@pelikhan pelikhan deleted the copilot/update-compiler-review-shell-sources branch April 28, 2026 23:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants