Skip to content

feat: standardise front matter sanitization via SanitizeConfig/SanitizeContent traits#210

Merged
jamesadevine merged 6 commits intomainfrom
feat/front-matter-config-trait
Apr 15, 2026
Merged

feat: standardise front matter sanitization via SanitizeConfig/SanitizeContent traits#210
jamesadevine merged 6 commits intomainfrom
feat/front-matter-config-trait

Conversation

@jamesadevine
Copy link
Copy Markdown
Collaborator

Summary

Replaces ad-hoc, per-field sanitization of front matter values with two dedicated, derivable traits that systematically cover all textual fields across the codebase.

Problem

Every time a new textual field was added to a front matter struct or safe-output config, developers had to remember to manually add sanitization — this was error-prone and led to repeated issues with unsanitized config values flowing into YAML templates and ADO API calls.

Solution

Two traits with distinct purposes

Trait Purpose Pipeline
SanitizeContent (renamed from Sanitize) Agent-generated content (safe-output results) Full: HTML escaping, @mention wrapping, bot triggers, URL protocols
SanitizeConfig (new) Operator config values (front matter, safe-output configs) Light: control chars + pipeline commands (##vso[) + content limits

Derive macros (ado-aw-derive crate)

Both traits are derivable via proc macros that auto-handle String, Option<String>, Option<Vec<String>>, Vec<String>, and HashMap<String, String> fields. Supported attributes:

  • #[sanitize_config(skip)] / #[sanitize_content(skip)] — skip a field
  • #[sanitize_config(nested)] / #[sanitize_content(nested)] — recurse into nested structs
  • #[sanitize_config(sanitize_keys)] — sanitize HashMap keys (config only)
  • #[sanitize_content(light)] — control char removal only (content only, e.g. wiki paths)

Compile-time enforcement

  • get_tool_config<T: SanitizeConfig>() — adding a new config struct without implementing the trait causes a compile error
  • compile_pipeline() — calls front_matter.sanitize_config_fields() before any template substitution

Changes

  • New crate: ado-aw-derive/ — proc macro crate with #[derive(SanitizeConfig)] and #[derive(SanitizeContent)]
  • src/sanitize.rs: New SanitizeConfig trait + sanitize_config() + sanitize_light() functions; renamed SanitizeSanitizeContent
  • src/compile/types.rs: SanitizeConfig on all front matter types (derive on structs, manual impls for 6 enums)
  • src/compile/mod.rs: Sanitization call in compile pipeline
  • src/safeoutputs/result.rs: SanitizeConfig bound on get_tool_config()
  • 18 src/safeoutputs/*.rs files: SanitizeSanitizeContent rename + SanitizeConfig derive on all Config structs

Testing

All 819 existing tests pass with no regressions. New unit tests added for sanitize_config() and sanitize_light().

jamesadevine and others added 5 commits April 15, 2026 09:26
…g structs

Add #[derive(SanitizeConfig)] from the ado_aw_derive crate to Config
structs across 9 safe-output modules:

- CreateWorkItemConfig (with #[sanitize_config(sanitize_keys)] on
  custom_fields and #[sanitize_config(nested)] on artifact_link)
- ArtifactLinkConfig
- CreateWikiPageConfig
- UpdateWikiPageConfig
- UpdateWorkItemConfig
- CreatePrConfig
- CommentOnWorkItemConfig
- AddPrCommentConfig
- QueueBuildConfig
- LinkWorkItemsConfig

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…choke-point

- Add SanitizeConfig impls for all FrontMatter sub-types (manual impls
  for enums: PoolConfig, ScheduleConfig, EngineConfig, McpConfig,
  CacheMemoryToolConfig, AzureDevOpsToolConfig; derive for structs)
- Add manual SanitizeConfig impl for FrontMatter delegating to all
  nested types (skips opaque serde_yaml::Value fields)
- Add SanitizeConfig bound to ExecutionContext::get_tool_config<T>()
  as a compile-time forcing function for config sanitization
- Call front_matter.sanitize_config_fields() in compile_pipeline()
  and check_pipeline() before template substitution
- Add Option<Vec<String>> support to both derive macros

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add SanitizeConfig trait impls for LeanRuntimeConfig (enum, manual),
LeanOptions (derive), and RuntimesConfig (manual). Update FrontMatter
impl to sanitize the new runtimes field.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — the architecture is clean and the compile-time forcing function via trait bounds is a solid pattern. One genuine footgun to fix, and one minor inconsistency.


Findings

🐛 Bugs / Logic Issues

  • ado-aw-derive/src/lib.rs (~line 260) — Hardcoded name skip is a silent footgun

    The derive_sanitize_content macro unconditionally skips any field named name on every struct it's applied to:

    // The `name` field on tool result structs is a tool identifier, not content.
    if field_name == "name" {
        continue;
    }

    This assumption is baked in globally rather than at the call site. Right now it happens to be correct because name in every tool_result!-generated struct is always a static constant (<Self as ToolResult>::NAME.to_string()). But anyone adding a future SanitizeContent struct with user-controlled content in a field called name would silently skip sanitization with no warning.

    Fix: Remove the global skip and instead annotate the name field in the tool_result! macro expansion with #[sanitize_content(skip)]. The macro already derives other things; one extra attribute keeps intent at the definition site rather than buried in the derive implementation.


⚠️ Suggestions

  • src/compile/types.rs (EngineOptions) — Manual impl where derive would suffice

    EngineOptions has a hand-rolled SanitizeConfig impl that only touches model. The other fields (max_turns, timeout_minutes) are Option<u32> and would be correctly ignored by the macro automatically. Using #[derive(SanitizeConfig)] on the struct directly would make this consistent with all the other option structs in the file and remove the risk of a future String field being silently unhandled.

  • ado-aw-derive/src/lib.rscrate::sanitize:: path coupling should be documented

    The generated code references crate::sanitize::sanitize_config, crate::sanitize::sanitize_light, and crate::sanitize::SanitizeConfig — hardcoded to the consumer crate's module layout. Since this is an internal proc macro the coupling is intentional, but a short doc comment on the crate (or in lib.rs) explaining this constraint would prevent confusion if the crate is ever considered for extraction.


✅ What Looks Good

  • The two-trait split (SanitizeConfig for operator-controlled config vs SanitizeContent for agent-generated content) correctly models the different trust boundaries and sanitization needs.
  • Compile-time enforcement via the SanitizeConfig bound on get_tool_config<T> is the right approach — forgetting to implement the trait is a compile error, not a silent gap.
  • FrontMatter::sanitize_config_fields coverage is comprehensive, and the comments explaining intentional skips (safe_outputs, steps/teardown) are clear.
  • The check_pipeline path also calls sanitize_config_fields(), keeping compile and check consistent.
  • sanitize_light intentionally omits ##vso[ neutralization for wiki paths (API payloads, not pipeline YAML), and the test makes this contract explicit.

Generated by Rust PR Reviewer for issue #210 · ● 1.2M ·

- Remove hardcoded 'name' field skip from SanitizeContent derive —
  the skip was a silent footgun for future structs with user-controlled
  'name' fields. Tool result structs use manual SanitizeContent impls
  that already explicitly list fields, so this skip was redundant.
- Replace manual EngineOptions SanitizeConfig impl with derive — the
  manual impl only touched model (Option<String>) which the derive
  handles automatically, and risked missing future String fields.
- Add doc comment to ado-aw-derive explaining the crate::sanitize::
  path coupling constraint for maintainability.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jamesadevine jamesadevine merged commit 85ac3ab into main Apr 15, 2026
4 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Well-designed systematic improvement — a few maintenance footguns in the derive macro worth being aware of.

Findings

⚠️ Suggestions

  • ado-aw-derive/src/lib.rs:40-51 — Silent attribute typo acceptance

    parse_field_attrs uses let _ = attr.parse_nested_meta(|..| Ok(())) which silently accepts any unrecognized attribute ident. A developer who writes #[sanitize_config(skipp)] or #[sanitize_content(nesting)] gets a compile that succeeds but the intended skip/nesting silently never happens. The field ends up sanitized when it shouldn't be (or vice versa). Suggest returning an error for unrecognized idents:

    } else {
        return Err(meta.error("unknown sanitize_config attribute"));
    }
  • ado-aw-derive/src/lib.rs:298-326SanitizeContent derive silently skips HashMap fields

    derive_sanitize_content has no HashMap<String, String> branch (unlike derive_sanitize_config). If a future result struct adds a HashMap field and uses #[derive(SanitizeContent)], the values would be silently unsanitized. Currently safe because all affected impls (QueueBuildResult.parameters) are manual, but worth a comment or a diagnostic:

    // else: skip non-string types (HashMap not supported — implement manually if needed)
  • src/safeoutputs/result.rs:71 — Deserialization errors silently swallowed in get_tool_config()

    serde_json::from_value(...).ok().unwrap_or_default() means an operator typo in a config struct (e.g., wrong field type) silently falls back to defaults with no diagnostic. This is pre-existing, but now that SanitizeConfig is a required bound, it would be a natural moment to add a log::warn! on the Err branch:

    .and_then(|v| serde_json::from_value(v.clone())
        .map_err(|e| { log::warn!("Invalid config for '{}': {e}", tool_name); e })
        .ok())

✅ What Looks Good

  • Compile-time enforcement via SanitizeConfig bound on get_tool_config() is exactly the right pattern — adding a new config struct without implementing the trait produces a clear compile error, which directly addresses the original "easy to forget" problem.
  • Two-trait split (SanitizeContent / SanitizeConfig) is clean and well-motivated: the lighter config pipeline correctly avoids HTML-escaping identifiers like area paths and emails.
  • Manual FrontMatter impl correctly defers safe_outputs (opaque JSON) to Stage 2, with a clear comment explaining why. The McpOptions.env keys relying on is_valid_env_var_name() validation at generation time is also appropriate.
  • Generic support in the derive (String, Option(String), Vec(String), Option(Vec(String)), HashMap) covers the full surface area of existing structs.

Generated by Rust PR Reviewer for issue #210 · ● 1.7M ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant