Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(tx-processor): Apply transaction rules after UUID scrubbing #1964

Merged
merged 11 commits into from
Mar 29, 2023
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,10 @@

- Allow monitor checkins to paass `monitor_config` for monitor upserts. ([#1962](https://github.com/getsentry/relay/pull/1962))

**Internal**:

- Apply transaction clustering rules before UUID scrubbing rules. ([#1964](https://github.com/getsentry/relay/pull/1964))

## 23.3.1

**Features**:
Expand Down
1 change: 1 addition & 0 deletions py/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
- Add `thread.state` field to protocol. ([#1896](https://github.com/getsentry/relay/pull/1896))
- PII scrub `span.data` by default. ([#1953](https://github.com/getsentry/relay/pull/1953))
- Scrub sensitive cookies. ([#1951](https://github.com/getsentry/relay/pull/1951))
- Apply transaction clustering rules before UUID scrubbing rules. ([#1964](https://github.com/getsentry/relay/pull/1964))

## 0.8.19

Expand Down
159 changes: 131 additions & 28 deletions relay-general/src/store/transactions/processor.rs
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,13 @@ impl<'r> TransactionsProcessor<'r> {

if let Some((rule, result)) = result {
if *transaction != result {
meta.set_original_value(Some(transaction.clone()));
// If another rule was applied before, we don't want to
// rename the transaction name to keep the original one.
// We do want to continue adding remarks though, in
// order to keep track of all rules applied.
if meta.original_value().is_none() {
meta.set_original_value(Some(transaction.clone()));
}
Comment on lines +64 to +66
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do this with get_or_insert_with, but I couldn't figure out how. Happy to change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is nice and explicit.

// add also the rule which was applied to the transaction name
meta.add_remark(Remark::new(RemarkType::Substituted, rule));
*transaction = result;
Expand Down Expand Up @@ -342,26 +348,28 @@ impl Processor for TransactionsProcessor<'_> {
}

if self.name_config.scrub_identifiers {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should we rename the scrub_identifiers flag in config to normalize?

I've been thinking about naming for a while, I think we need unambiguous names to refer to the different parts of transaction name handling but haven't been able to come up with good ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the names could be better, but I don't have any suggestions (besides being very explicit and setting extremely long names, which doesn't really improve things I guess). Since I expect to remove these feature flags soonish, I didn't rename them. That said, happy to refactor the names if you come up with a good suggestion.

// Apply the rule if any found
self.apply_transaction_rename_rule(
&mut event.transaction,
event.transaction_info.value_mut(),
)?;

// Normalize transaction names for URLs and Sanitized transaction sources.
// This in addition to renaming rules can catch some high cardinality parts.
if matches!(
event.get_transaction_source(),
&TransactionSource::Url | &TransactionSource::Sanitized
) {
scrub_identifiers(&mut event.transaction)?;
if self.name_config.mark_scrubbed_as_sanitized {
event
.transaction_info
.get_or_insert_with(Default::default)
.source
.set_value(Some(TransactionSource::Sanitized));
}
}

self.apply_transaction_rename_rule(
iker-barriocanal marked this conversation as resolved.
Show resolved Hide resolved
&mut event.transaction,
event.transaction_info.value_mut(),
)?;

if matches!(event.get_transaction_source(), &TransactionSource::Url)
&& self.name_config.mark_scrubbed_as_sanitized
Comment on lines +365 to +366
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, we might want to add an an additional condition which is scrub_identifiers_changed_something OR rename_rules.len() > 0, to prevent projects that do not have any rules yet from marking everything as sanitized. But that is actually out of scope for this PR.

{
event
.transaction_info
.get_or_insert_with(Default::default)
.source
.set_value(Some(TransactionSource::Sanitized));
}
}

Expand Down Expand Up @@ -449,7 +457,7 @@ mod tests {
use super::*;
use crate::processor::process_value;
use crate::protocol::{Contexts, SpanId, TraceContext, TraceId, TransactionSource};
use crate::store::LazyGlob;
use crate::store::{LazyGlob, RedactionRule, RuleScope};
use crate::testutils::assert_annotated_snapshot;
use crate::types::Object;

Expand Down Expand Up @@ -1675,7 +1683,7 @@ mod tests {
let json = r#"
{
"type": "transaction",
"transaction": "/foo/2fd4e1c67a2d28fced849ee1bb76e7391b93eb12/user/123/0/",
"transaction": "/foo/rule-target/user/123/0/",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for these changes is to avoid UUID scrubbing rules running first in all fields. They are already running for 123 so that is good enough.

"transaction_info": {
"source": "url"
},
Expand Down Expand Up @@ -1722,8 +1730,8 @@ mod tests {
&mut event,
&mut TransactionsProcessor::new(TransactionNameConfig {
scrub_identifiers: true,
mark_scrubbed_as_sanitized: false, // ensure `source` is set by rule application
rules: rules.as_ref(),
..Default::default()
}),
ProcessingState::root(),
)
Expand Down Expand Up @@ -1758,12 +1766,18 @@ mod tests {
"transaction": {
"": {
"rem": [
[
"int",
"s",
22,
25
],
[
"/foo/*/user/*/**",
"s"
]
],
"val": "/foo/2fd4e1c67a2d28fced849ee1bb76e7391b93eb12/user/123/0/"
"val": "/foo/rule-target/user/123/0/"
}
}
}
Expand All @@ -1779,8 +1793,8 @@ mod tests {
&mut event,
&mut TransactionsProcessor::new(TransactionNameConfig {
scrub_identifiers: true,
mark_scrubbed_as_sanitized: false, // ensure `source` is set by rule application
rules: rules.as_ref(),
..Default::default()
}),
ProcessingState::root(),
)
Expand Down Expand Up @@ -1815,18 +1829,18 @@ mod tests {
"transaction": {
"": {
"rem": [
[
"/foo/*/**",
"s"
],
[
"int",
"s",
12,
15
22,
25
],
[
"/foo/*/**",
"s"
]
],
"val": "/foo/*/user/123/0/"
"val": "/foo/rule-target/user/123/0/"
}
}
}
Expand Down Expand Up @@ -1910,7 +1924,7 @@ mod tests {
let json = r#"
{
"type": "transaction",
"transaction": "/foo/2fd4e1c67a2d28fced849ee1bb76e7391b93eb12/user",
"transaction": "/foo/rule-target/user",
"transaction_info": {
"source": "url"
},
Expand Down Expand Up @@ -1984,7 +1998,7 @@ mod tests {
"s"
]
],
"val": "/foo/2fd4e1c67a2d28fced849ee1bb76e7391b93eb12/user"
"val": "/foo/rule-target/user"
}
}
}
Expand Down Expand Up @@ -2139,4 +2153,93 @@ mod tests {
"open-12345-close",
"open-12345-close"
);

#[test]
fn test_scrub_identifiers_before_rules() {
// There's a rule matching the transaction name. However, the UUID
// should be scrubbed first. Scrubbing the UUID makes the rule to not
// match the transformed transaction name anymore.

let mut event = Annotated::<Event>::from_json(
r#"{
"type": "transaction",
"transaction": "/remains/rule-target/1234567890",
"transaction_info": {
"source": "url"
},
"timestamp": "2021-04-26T08:00:00+0100",
"start_timestamp": "2021-04-26T07:59:01+0100",
"contexts": {
"trace": {
"trace_id": "4c79f60c11214eb38604f4ae0781bfb2",
"span_id": "fa90fdead5f74053"
}
}
}"#,
)
.unwrap();

process_value(
&mut event,
&mut TransactionsProcessor::new(TransactionNameConfig {
scrub_identifiers: true,
rules: &[TransactionNameRule {
pattern: LazyGlob::new("/remains/*/1234567890/".to_owned()),
expiry: Utc.with_ymd_and_hms(3000, 1, 1, 1, 1, 1).unwrap(),
scope: RuleScope::default(),
redaction: RedactionRule::default(),
}],
..Default::default()
}),
ProcessingState::root(),
)
.unwrap();

// Annotate the snapshot instead of comparing transaction names, to also
// make sure the event's _meta is correct.
assert_annotated_snapshot!(event);
}

#[test]
fn test_scrub_identifiers_and_apply_rules() {
// Ensure rules are applied after scrubbing identifiers. Rules are only
// applied when `transaction.source="url"`, so this test ensures this
// value isn't set as part of identifier scrubbing.
let mut event = Annotated::<Event>::from_json(
r#"{
"type": "transaction",
"transaction": "/remains/rule-target/1234567890",
"transaction_info": {
"source": "url"
},
"timestamp": "2021-04-26T08:00:00+0100",
"start_timestamp": "2021-04-26T07:59:01+0100",
"contexts": {
"trace": {
"trace_id": "4c79f60c11214eb38604f4ae0781bfb2",
"span_id": "fa90fdead5f74053"
}
}
}"#,
)
.unwrap();

process_value(
&mut event,
&mut TransactionsProcessor::new(TransactionNameConfig {
scrub_identifiers: true,
mark_scrubbed_as_sanitized: true,
rules: &[TransactionNameRule {
pattern: LazyGlob::new("/remains/*/**".to_owned()),
expiry: Utc.with_ymd_and_hms(3000, 1, 1, 1, 1, 1).unwrap(),
scope: RuleScope::default(),
redaction: RedactionRule::default(),
}],
}),
ProcessingState::root(),
)
.unwrap();

assert_annotated_snapshot!(event);
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
---
source: relay-general/src/store/transactions/processor.rs
expression: event
---
{
"type": "transaction",
"transaction": "/remains/*/*",
"transaction_info": {
"source": "sanitized"
},
"timestamp": 1619420400.0,
"start_timestamp": 1619420341.0,
"contexts": {
"trace": {
"trace_id": "4c79f60c11214eb38604f4ae0781bfb2",
"span_id": "fa90fdead5f74053",
"op": "default",
"type": "trace"
}
},
"spans": [],
"_meta": {
"transaction": {
"": {
"rem": [
[
"int",
"s",
21,
31
],
[
"/remains/*/**",
"s"
]
],
"val": "/remains/rule-target/1234567890"
}
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
---
source: relay-general/src/store/transactions/processor.rs
expression: event
---
{
"type": "transaction",
"transaction": "/remains/rule-target/*",
"transaction_info": {
"source": "url"
},
"timestamp": 1619420400.0,
"start_timestamp": 1619420341.0,
"contexts": {
"trace": {
"trace_id": "4c79f60c11214eb38604f4ae0781bfb2",
"span_id": "fa90fdead5f74053",
"op": "default",
"type": "trace"
}
},
"spans": [],
"_meta": {
"transaction": {
"": {
"rem": [
[
"int",
"s",
21,
31
]
],
"val": "/remains/rule-target/1234567890"
}
}
}
}