Contributor

@olksdr olksdr commented Jul 28, 2025

  • Remove $logentry.formatted from DATASCRUBBER_IGNORE in convert.rs
  • Implement smart_scrub_logentry_formatted() function that:
    • Replaces emails with [email]
    • Replaces credit cards with [creditcard]
    • Replaces SSNs with [ssn]
    • Replaces IBANs with [iban]
    • Applies user-configured PII rules

Avoid replacing the entire message, preserving as much context as possible.

Fixes INGEST-464

- Remove $logentry.formatted from DATASCRUBBER_IGNORE in convert.rs
- Implement smart_scrub_logentry_formatted() function that:
  - Replaces emails with [Email]
  - Replaces credit cards with [CreditCard]
  - Replaces SSNs with [SSN]
  - Replaces IBANs with [IBAN]
  - Applies user-configured PII rules
- Integrate smart scrubbing into PiiProcessor.process_string()
- Avoid replacing the entire message with [Filtered]
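As a rough illustration of the idea in the lists above (replace only the matching substrings and keep the rest of the message), here is a std-only Rust sketch. It is not Relay's `smart_scrub_logentry_formatted()`. The function names, the whitespace tokenization, and the simplified matchers (a loose email check, `NNN-NN-NNNN` SSNs, Luhn-checked card numbers of 13 to 19 digits, mod-97-checked IBANs) are all assumptions. It uses the lowercase placeholders from the first list, and user-configured PII rules are not modeled.

```rust
// Hypothetical sketch, NOT Relay's smart_scrub_logentry_formatted():
// the matchers are simplified stand-ins for real PII patterns.

/// Loose email check: one '@', non-empty local part, dotted domain.
fn is_email(tok: &str) -> bool {
    match tok.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty() && !domain.contains('@') && domain.contains('.')
                && !domain.starts_with('.') && !domain.ends_with('.')
        }
        None => false,
    }
}

/// US SSN in the NNN-NN-NNNN layout.
fn is_ssn(tok: &str) -> bool {
    let parts: Vec<&str> = tok.split('-').collect();
    parts.len() == 3
        && [3usize, 2, 4].iter().zip(&parts)
            .all(|(&n, p)| p.len() == n && p.bytes().all(|b| b.is_ascii_digit()))
}

/// 13 to 19 digits (dashes allowed) that pass the Luhn checksum.
fn is_credit_card(tok: &str) -> bool {
    let digits: Vec<u8> = tok.bytes().filter(|&b| b != b'-').collect();
    if !(13..=19).contains(&digits.len()) || !digits.iter().all(u8::is_ascii_digit) {
        return false;
    }
    // Double every second digit from the right; subtract 9 if the result exceeds 9.
    let sum: u32 = digits.iter().rev().enumerate().map(|(i, &b)| {
        let d = u32::from(b - b'0');
        if i % 2 == 1 { if d * 2 > 9 { d * 2 - 9 } else { d * 2 } } else { d }
    }).sum();
    sum % 10 == 0
}

/// IBAN shape (country code, check digits, uppercase alphanumerics) plus the mod-97 check.
fn is_iban(tok: &str) -> bool {
    let b = tok.as_bytes();
    if !(15..=34).contains(&b.len())
        || !b[..2].iter().all(u8::is_ascii_uppercase)
        || !b[2..4].iter().all(u8::is_ascii_digit)
        || !b.iter().all(|c| c.is_ascii_digit() || c.is_ascii_uppercase())
    {
        return false;
    }
    // Move the first four chars to the end, map A..Z to 10..35, reduce mod 97.
    let mut rem: u32 = 0;
    for &c in b[4..].iter().chain(&b[..4]) {
        rem = if c.is_ascii_digit() {
            (rem * 10 + u32::from(c - b'0')) % 97
        } else {
            (rem * 100 + u32::from(c - b'A') + 10) % 97
        };
    }
    rem == 1
}

/// Replace only the matching tokens; the rest of the message is kept verbatim.
fn smart_scrub(message: &str) -> String {
    let matchers: [(fn(&str) -> bool, &str); 4] = [
        (is_email, "[email]"),
        (is_ssn, "[ssn]"),
        (is_credit_card, "[creditcard]"),
        (is_iban, "[iban]"),
    ];
    message
        .split(' ')
        .map(|tok| {
            // Leave trailing punctuation outside the match, e.g. "a@b.com,".
            let core = tok.trim_end_matches(|c: char| matches!(c, ',' | '.' | ';' | ':' | '!' | '?'));
            let rest = &tok[core.len()..];
            match matchers.iter().find(|(is_match, _)| is_match(core)) {
                Some((_, placeholder)) => format!("{placeholder}{rest}"),
                None => tok.to_string(),
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let msg = "User jane@example.com paid with 4111111111111111, SSN 123-45-6789, IBAN GB82WEST12345698765432.";
    // Prints: User [email] paid with [creditcard], SSN [ssn], IBAN [iban].
    println!("{}", smart_scrub(msg));
}
```

Scrubbing token by token is what keeps the surrounding words intact. The obvious cost is that values split across spaces (for example a card number written as `4111 1111 1111 1111`) are missed, which a regex-based rule set would handle.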
@olksdr olksdr self-assigned this Jul 28, 2025
@olksdr olksdr requested a review from a team as a code owner July 28, 2025 11:52
linear bot commented Jul 28, 2025

@olksdr olksdr marked this pull request as draft July 28, 2025 11:58
olksdr added 3 commits July 28, 2025 14:30
Also added another test to make sure that logentry.formatted is not scrubbed
even when it contains the word "password".
@olksdr olksdr marked this pull request as ready for review July 29, 2025 09:51
@olksdr olksdr requested a review from a team July 29, 2025 09:51
cursor[bot]

This comment was marked as outdated.

@olksdr olksdr requested review from a team, Dav1dde and loewenheim July 29, 2025 11:22
cursor[bot]

This comment was marked as outdated.

@Dav1dde Dav1dde changed the title Implement smart PII scrubbing for logentry.formatted feat(pii): Implement smart PII scrubbing for logentry.formatted Jul 30, 2025
CHANGELOG.md Outdated
**Features**:

- Always emit a span usage metric, independent of span feature flags. ([#4976](https://github.com/getsentry/relay/pull/4976))
- Avoid replacing the entire value of `logentry.formatted` during PII scrubbing. ([#4985](https://github.com/getsentry/relay/pull/4985))
Member

"Avoid replacing" implies that we were already replacing it, but previously it wasn't scrubbed at all.

Contributor Author

yeah, thanks to @loewenheim 👼
I will try to change this a bit more


let mut processor = PiiProcessor::new(config.compiled());
process_value(&mut event, &mut processor, ProcessingState::root()).unwrap();
assert_annotated_snapshot!(event);
Member

I'm a fan of having these snapshots inlined; it makes them easier to review. But it's no big deal if you prefer it this way.

Contributor Author

I can look into it, if that's easier to understand 👍

});

insta::assert_json_snapshot!(pii_config, @r###"
insta::assert_json_snapshot!(pii_config, @r#"
Member

I think `###` is the 'new' syntax insta wants, and `cargo insta --force-update-snapshots` would change it back.

Contributor Author

I always thought it was the other way around, but I will run it and see what changes 👍

Contributor Author

hm, it seems like I do not have the `--force-update-snapshots` option in my local installation

Contributor Author

ok, found 😄

Contributor Author

So it seems like I was right: running the latest version of insta (cargo-insta 1.43.1) with `cargo insta test --force-update-snapshots --all-features --workspace` changes all the tests to a single `#`, or removes the hashes altogether where they are not needed.

@Dav1dde should we do it in a separate PR to match the latest insta, and keep that change separate from this PR?

Member

That's strange, when I ran it, it did the opposite: https://github.com/getsentry/relay/pull/4908/files

Member

Okay, I am on an older version and they changed behavior in 1.41 -_-

yeah let's keep the insta stuff separate 👍
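For context on the `###` vs `#` question: in a Rust raw string literal the number of `#` only has to be large enough that the closing `"` plus hashes cannot appear inside the content, so `r###"..."###` and `r#"..."#` produce the same string and the snapshot churn is purely cosmetic. The sketch below computes that minimum. `min_raw_hashes` and `to_raw_literal` are hypothetical helpers, and whether cargo-insta applies exactly this rule when rewriting inline snapshots is an assumption here.

```rust
/// Hypothetical helper (not insta's code): the fewest `#` needed to write
/// `content` as a Rust raw string literal. A raw string opened with n hashes
/// closes at the first `"` followed by n `#`, so every `"` in the content
/// must be followed by fewer than n `#`.
fn min_raw_hashes(content: &str) -> usize {
    let bytes = content.as_bytes();
    let mut needed: usize = 0;
    for (i, &b) in bytes.iter().enumerate() {
        if b == b'"' {
            // Length of the run of '#' directly after this quote.
            let run = bytes[i + 1..].iter().take_while(|&&c| c == b'#').count();
            needed = needed.max(run + 1);
        }
    }
    needed
}

/// Render `content` as a raw string literal using the minimal hash count.
fn to_raw_literal(content: &str) -> String {
    let hashes = "#".repeat(min_raw_hashes(content));
    format!("r{hashes}\"{content}\"{hashes}")
}

fn main() {
    // Extra hashes do not change the value: both literals are the same string.
    let with_three = r###"{"key": 1}"###;
    let with_one = r#"{"key": 1}"#;
    assert_eq!(with_three, with_one);

    println!("{}", to_raw_literal("plain text")); // r"plain text"
    println!("{}", to_raw_literal(with_one)); // r#"{"key": 1}"#
}
```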

assert event["spans"][0]["sentry_tags"]["user.geo.subregion"] == "**"


def test_logentry_formatted_smart_scrubbing_email(mini_sentry, relay):
Member

These tests are very similar; maybe it's worth consolidating them into a single parameterized test.

Contributor Author

makes sense, thanks! Will do

@olksdr olksdr added this pull request to the merge queue Aug 4, 2025
Merged via the queue into master with commit b5fd7d3 Aug 4, 2025
29 checks passed
@olksdr olksdr deleted the fix/pii-logentry-formatted branch August 4, 2025 07:47