-
Notifications
You must be signed in to change notification settings - Fork 104
feat(pii): Implement smart PII scrubbing for logentry.formatted
#4985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Remove $logentry.formatted from DATASCRUBBER_IGNORE in convert.rs - Implement smart_scrub_logentry_formatted() function that: - Replaces emails with [Email] - Replaces credit cards with [CreditCard] - Replaces SSNs with [SSN] - Replaces IBANs with [IBAN] - Applies user-configured PII rules - Integrate smart scrubbing into PiiProcessor.process_string() - Avoid replacing the entire message with [Filtered]
Also added another test to make sure that logentry.formatted is not scrubbed even when we have a word "password" in there.
logentry.formattedlogentry.formatted
CHANGELOG.md
Outdated
| **Features**: | ||
|
|
||
| - Always emit a span usage metric, independent of span feature flags. ([#4976](https://github.com/getsentry/relay/pull/4976)) | ||
| - Avoid replacing the entire value of `logentry.formatted` during PII scrubbing. ([#4985](https://github.com/getsentry/relay/pull/4985)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid replacing implies we have been replacing it already, but before it wasn't scrubbed at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, thanks to @loewenheim 👼
I will try to change this a bit more
relay-pii/src/processor.rs
Outdated
|
|
||
| let mut processor = PiiProcessor::new(config.compiled()); | ||
| process_value(&mut event, &mut processor, ProcessingState::root()).unwrap(); | ||
| assert_annotated_snapshot!(event); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a fan of having these snapshots inlined, makes it easier to review, but also nbd if you prefer this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can look into it, if that's easier to understand 👍
| }); | ||
|
|
||
| insta::assert_json_snapshot!(pii_config, @r###" | ||
| insta::assert_json_snapshot!(pii_config, @r#" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ### is the 'new' syntax insta wants and cargo insta --force-update-snapshots would change it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always thought it's another way around, but I will run it and see how changes: 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, it seems like I do not have this option --force-update-snapshots in my local installation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, found 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it seems like i was right and running latest version of insta with cargo insta test --force-update-snapshots --all-features --workspace
cargo-insta 1.43.1
changes all the tests to single # or remove it all together when needed.
@Dav1dde should be do it in separate PR to match the latest insta and , make that change separate from this PR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's strange, when I ran it, it did the opposite: https://github.com/getsentry/relay/pull/4908/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I am on an older version and they changed behavior in 1.41 -_-
yeah let's keep the insta stuff separate 👍
tests/integration/test_pii.py
Outdated
| assert event["spans"][0]["sentry_tags"]["user.geo.subregion"] == "**" | ||
|
|
||
|
|
||
| def test_logentry_formatted_smart_scrubbing_email(mini_sentry, relay): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are very similar maybe worth consolidating into a single parameterized test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, thanks! Will do
DATASCRUBBER_IGNOREinconvert.rssmart_scrub_logentry_formatted()function that:[email][creditcard][ssn][iban]Avoid replacing the entire message and preserve as much context as possible
Fixes INGEST-464