Skip to content

Conversation

@k-fish
Copy link
Member

@k-fish k-fish commented Aug 13, 2025

Summary

Attributes weren't having object rules applied since objects expect a value as a direct descendant.

The main problem is that the PatternType::KeyValue in our default scrubbers rules (eg. password KV regex, strip-fields a.k.a. sensitive fields) expects to walk an Object and encounter the Value (eg. secret132) as a direct descendant; Attributes gets walked but it's values are Attribute, which doesn't match the KV rule expected structure, Attribute contains a flattened substructure AttributeValue under value. For default scrubber rules we still want to target that inner .value.

So we have two behaviours we need to support for any schema using attributes:

  1. Default scrubbers KV values work (eg. an attribute called 'password' should be scrubbed)
  2. Advanced rules allowing more specific selection for the nested substructure in Attributes

For advanced rules when walking Attributes we want to be able to avoid ambiguity, we don't want the attribute to be selectable at $log.attributes.'my attribute name' and instead only explicitly selectable at $log.attributes.'my attribute name'.value for value. .type for type, etc.

Possible ways to fix this:

  1. Extend old rules to also apply to new rules to include more explicit selection. Possibly introduce some form of deeper selectors (eg. PatternType::ParentKeyValue), update RedactPair and Password to also apply parent rules, then also always pass state.key() and parent_state.key() to apply_rules functions so ParentKeyValue can test against parent_state.key(). This still has problems with sensitive fields as users may define 'scary_attribute' as a sensitive field and expect an attribute to be filtered out, instead of having to do scary_attribute.value.
  2. Rewrite this all to have less branching conditions, hooks and instead use some existing standard for selectors (eg. jsonpath) 😄
  3. [Implemented in this PR] Special case default scrubbing rules (eg. password scrubber, sensitive fields etc.) vs. advanced data scrubbing rules (modal based rules) to change whether we apply the selector with an implicit attribute name or an explicit attribute name. Moving forward with logs and other TraceItem type data selectors we then should not expose the selector syntax in the UI for advanced data scrubbing, so we can instead always save attribute scrubbing as $log.attributes.X.value. We can always still do option 1 in the future on top of this approach to clean up the branching.

Attributes weren't having object rules applied since objects expect a value as a direct descendant, but attributes have a hidden value to hold the enum for TraceItem variants of attrs. This uses a custom ProcessValue impl to pull the sub .value out.
@k-fish k-fish requested a review from a team as a code owner August 13, 2025 23:33
cursor[bot]

This comment was marked as outdated.

Copy link
Member

@Dav1dde Dav1dde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for Codeowners, no review.

@k-fish k-fish added this pull request to the merge queue Aug 25, 2025
Merged via the queue into master with commit 8cd67f2 Aug 25, 2025
29 checks passed
@k-fish k-fish deleted the ref/ourlogs/improve-pii-scrubbing-attributes branch August 25, 2025 02:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants