Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(tx-processor): Apply transaction rules after UUID scrubbing #1964

Merged
merged 11 commits into from
Mar 29, 2023

Conversation

iker-barriocanal
Copy link
Contributor

@iker-barriocanal iker-barriocanal commented Mar 23, 2023

Applying transaction rules before UUID scrubbing makes it impossible to track what rules were applied. It's interesting to know what rules are useful to reduce the high cardinality, so that we can use them again over other rules that are less interesting or provide no value. Applying these rules after UUID scrubbing leaves no room for further modification of the transaction name, so the resulting transaction name is guaranteed to be the direct output of a rule.

This is the first step in the process of discarding the rules that provide no value and prioritizing those that do.

Ref: https://github.com/getsentry/team-ingest/issues/14.

@iker-barriocanal iker-barriocanal self-assigned this Mar 23, 2023
@iker-barriocanal iker-barriocanal requested a review from a team March 23, 2023 12:13
.get_or_insert_with(Default::default)
.source;
source.set_value(Some(TransactionSource::Sanitized));
if self.name_config.scrub_identifiers {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should we rename the scrub_identifiers flag in config to normalize?

I've been thinking about naming for a while, I think we need unambiguous names to refer to the different parts of transaction name handling but haven't been able to come up with good ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the names could be better, but I don't have any suggestions (besides being very explicit and setting extremely long names, which doesn't really improve things I guess). Since I expect to remove these feature flags soonish, I didn't rename them. That said, happy to refactor the names if you come up with a good suggestion.

Comment on lines 352 to 362
if self.name_config.mark_scrubbed_as_sanitized {
// TODO(iker): we also mark transaction as sanitized after
// applying renaming rules. Once we remove this feature
// flag, relay should only mark transactions as sanitized
// once.
event
.transaction_info
.get_or_insert_with(Default::default)
.source
.set_value(Some(TransactionSource::Sanitized));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not matter in practice, but I would put this block after apply_transaction_rename_rule. Basically

  1. Apply sanitation step 1
  2. Apply sanitation step 2
  3. Mark as sanitized

Right now the order is

  1. Apply sanitation step 1
  2. Mark as sanitized
  3. Apply sanitation step 2

which feels unnatural to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs of mark_scrubbed_as_sanitized are as follows:

True if transaction names scrubbed by regex patterns should be marked as TransactionSource::Sanitized.

Transaction names modified by clusterer rules are always marked as such.

So the code follows these steps:

  1. Apply sanitation step 1.
  2. Mark as sanitized, resulting from step 1.
  3. Apply sanitation step 2 (which also marks as sanitized, so no 4 needed).

I think 1 and 2 belong together. I agree that this could be improved, and that's why I added the TODO, but I wanted to minimize refactoring as much as possible until we have more information on how this looks and what we do with the feature flags..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True if transaction names scrubbed by regex patterns should be marked as TransactionSource::Sanitized.

That doc was written before you removed the if changed condition though (by which it actually belonged to pattern-based matching). By removing that condition, the real consequence of mark_scrubbed_as_sanitized is that every url transaction will be marked as sanitized as long as the normalize-transaction-names feature flag is on.

Comment on lines +64 to +66
if meta.original_value().is_none() {
meta.set_original_value(Some(transaction.clone()));
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do this with get_or_insert_with, but I couldn't figure out how. Happy to change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is nice and explicit.

@@ -1675,7 +1684,7 @@ mod tests {
let json = r#"
{
"type": "transaction",
"transaction": "/foo/2fd4e1c67a2d28fced849ee1bb76e7391b93eb12/user/123/0/",
"transaction": "/foo/rule-target/user/123/0/",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for these changes is to avoid UUID scrubbing rules running first in all fields. They are already running for 123 so that is good enough.

Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +64 to +66
if meta.original_value().is_none() {
meta.set_original_value(Some(transaction.clone()));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is nice and explicit.

relay-general/src/store/transactions/processor.rs Outdated Show resolved Hide resolved
Comment on lines +365 to +366
if matches!(event.get_transaction_source(), &TransactionSource::Url)
&& self.name_config.mark_scrubbed_as_sanitized
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, we might want to add an an additional condition which is scrub_identifiers_changed_something OR rename_rules.len() > 0, to prevent projects that do not have any rules yet from marking everything as sanitized. But that is actually out of scope for this PR.

@iker-barriocanal iker-barriocanal merged commit dac9b06 into master Mar 29, 2023
@iker-barriocanal iker-barriocanal deleted the iker/ref/reorder-tx-rules branch March 29, 2023 07:51
jan-auer added a commit that referenced this pull request Mar 29, 2023
* master:
  ref(tx-processor): Apply transaction rules after UUID scrubbing (#1964)
jan-auer added a commit that referenced this pull request Mar 30, 2023
* master:
  ref(buffering): Add BufferService with SQLite backend (#1920)
  ref(server): Move from actix-web to axum (#1938)
  ref(tx-processor): Apply transaction rules after UUID scrubbing (#1964)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants