feat(transactions): Adding support for transaction (re)naming rules #1695

olksdr · 2022-12-14T11:56:04Z

These changes introduce support for transaction naming rules, which can currently be applied only to the `url` transaction source:

* Relay always applies the first matching rule.
* Rules are applied in light normalization, before extracting any information from the event.
* The original transaction name is preserved in meta, and a remark is added with the rule that was applied.
* `transaction_info.source` is changed to `sanitized` after a rule is applied.

This PR also contains changes to the current implementation of `Glob`, adding the ability to apply the matching rule to replace parts of the matching pattern, which could perhaps be done separately.

A few existing structs were also re-used / re-purposed to support the new rule format, which is also up for discussion.
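
For readers following along, the behaviour described above can be sketched roughly as below. This is a simplified illustration, not the code in this PR: `Rule`, `Renamed`, `replace_stars` and `apply_rules` are hypothetical names, expiry is a plain integer timestamp, and the matcher works on `/`-separated segments instead of the regex-backed `Glob`.

```rust
/// Hypothetical, reduced stand-in for the transaction source.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    Url,
    Route,
    Sanitized,
}

/// Hypothetical rule: a segment-based pattern, a source scope and an expiry timestamp.
struct Rule {
    pattern: &'static str,
    source: Source,
    expiry: u64,
}

/// Result of a rename: new name, preserved original name and updated source.
#[derive(Debug, PartialEq)]
struct Renamed {
    name: String,
    original: String,
    source: Source,
}

/// Simplified matcher: `*` matches exactly one segment and is replaced,
/// `**` matches the rest of the name and leaves it unchanged.
fn replace_stars(pattern: &str, name: &str, replacement: &str) -> Option<String> {
    let mut out: Vec<&str> = Vec::new();
    let mut segments = name.split('/');
    for part in pattern.split('/') {
        if part == "**" {
            out.extend(segments.by_ref());
            return Some(out.join("/"));
        }
        let segment = segments.next()?;
        match part {
            "*" => out.push(replacement),
            literal if literal == segment => out.push(segment),
            _ => return None,
        }
    }
    // The pattern is exhausted, so the name must be exhausted as well.
    if segments.next().is_none() {
        Some(out.join("/"))
    } else {
        None
    }
}

/// Applies the first rule that is in scope, not expired and matching.
fn apply_rules(name: &str, source: Source, now: u64, rules: &[Rule]) -> Option<Renamed> {
    // Match against a copy with a trailing slash so that `/x/*/**` also matches `/x/1`.
    let added_slash = !name.ends_with('/');
    let slashed = if added_slash { format!("{}/", name) } else { name.to_owned() };
    rules
        .iter()
        .filter(|rule| rule.source == source && rule.expiry > now)
        .find_map(|rule| replace_stars(rule.pattern, &slashed, "*"))
        .map(|mut renamed| {
            if added_slash {
                renamed.pop(); // drop the slash that was only added for matching
            }
            Renamed { name: renamed, original: name.to_owned(), source: Source::Sanitized }
        })
}
```

With a rule `/users/*/**` scoped to `Source::Url`, a `url` transaction named `/users/123/profile` would come out as `/users/*/profile`, with the original name kept alongside it.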

closes: #1657

@getsentry/owners-ingest I am also requesting advice on structuring the code better, since I am currently re-using a few utility structs and an existing protocol struct (like TransactionSource) for new things in order to keep the changes smaller.

jjbayer · 2022-12-14T13:39:07Z

relay-common/src/utils.rs

        Glob {
            value: glob.to_string(),
            pattern: Regex::new(&pattern).unwrap(),
+            replacer: Regex::new(&replacer).unwrap(),


Rather than compiling and storing an additional regex, should we just add an option to Glob::new to tell it which groups to capture?

Thanks for pointing this out!

It was a quick test on my side, to make sure the replacements and patterns are working. But you're definitely right, it does not make any sense to keep two regexes inside the struct. I made the Glob configurable through a builder, with the default behaviour as it was before (though it might be a good idea to switch the builder around), and added a custom deserialize function, see b90a8ba
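
A rough idea of what such a builder could look like is sketched below. It is an assumption-laden illustration, not the code from b90a8ba: `GlobPatternBuilder` is a hypothetical name, it only produces the regex source string (no `Regex` is compiled), the `^`/`$` anchoring and the `(?:...)` non-capturing form for disabled captures are assumptions, and the `capture_*` method names are borrowed from the test suggestion later in this review.

```rust
/// Hypothetical builder that turns a glob into regex source text and lets the
/// caller choose which wildcard kinds become capture groups.
struct GlobPatternBuilder<'g> {
    value: &'g str,
    capture_star: bool,
    capture_double_star: bool,
    capture_question_mark: bool,
}

impl<'g> GlobPatternBuilder<'g> {
    /// All captures are enabled by default.
    fn new(value: &'g str) -> Self {
        Self { value, capture_star: true, capture_double_star: true, capture_question_mark: true }
    }

    fn capture_star(mut self, enable: bool) -> Self {
        self.capture_star = enable;
        self
    }

    fn capture_double_star(mut self, enable: bool) -> Self {
        self.capture_double_star = enable;
        self
    }

    fn capture_question_mark(mut self, enable: bool) -> Self {
        self.capture_question_mark = enable;
        self
    }

    /// Builds the regex source string for the glob.
    fn build(self) -> String {
        // Wrap the body in a capturing or a non-capturing group.
        let group = |capture: bool, body: &str| -> String {
            if capture { format!("({})", body) } else { format!("(?:{})", body) }
        };
        let mut pattern = String::from("^");
        let mut chars = self.value.chars().peekable();
        while let Some(c) = chars.next() {
            match c {
                '*' if chars.peek() == Some(&'*') => {
                    chars.next(); // consume the second `*`
                    pattern.push_str(&group(self.capture_double_star, ".*?"));
                }
                '*' => pattern.push_str(&group(self.capture_star, "[^/]*?")),
                '?' => pattern.push_str(&group(self.capture_question_mark, ".")),
                // Escape a small set of regex metacharacters; everything else is literal.
                '.' | '+' | '(' | ')' | '[' | ']' | '{' | '}' | '^' | '$' | '|' | '\\' => {
                    pattern.push('\\');
                    pattern.push(c);
                }
                other => pattern.push(other),
            }
        }
        pattern.push('$');
        pattern
    }
}
```

For example, `GlobPatternBuilder::new("/foo/*/**").capture_double_star(false).build()` yields a pattern in which only the single `*` is a capture group.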

jjbayer · 2022-12-14T14:05:24Z

relay-general/src/store/transactions/processor.rs

+    ) -> ProcessingResult {
+        let now = Utc::now();
+        transaction.apply(|transaction, meta| {
+            let tran = format!("{}/", transaction.as_str());


Haven't thought through the consequences, but shouldn't we only add a / if it's not already there?

Theoretically it should not matter whether there are one or two slashes at the end. The ** pattern at the end of each rule will match anything anyway, if I see it correctly.

jjbayer · 2022-12-14T14:09:50Z

relay-general/src/store/transactions/rules.rs

+pub struct RuleScope {
+    /// The source of the transaction.
+    pub source: TransactionSource,
+}


nit: Let's put this above TransactionNameRule, see https://www.notion.so/sentry/HOWTO-Code-Rust-at-Sentry-7b35f165b10b4492bb95ebe1471a9ada#2695321713704369bccc5fa83ebedfa8.

done in b90a8ba

jjbayer · 2022-12-14T14:11:14Z

relay-general/src/store/transactions/rules.rs

+    /// Date time when the rule expires and it should not be applied anymore.
+    pub expiry: DateTime<Utc>,
+    /// Object containing transaction attributes the rules must only be applied to.
+    #[serde(default)]


Does it even make sense to default these two fields when they are missing? I would just make them required.

I was using our docs for the rule definition format, and set all the defaults that Relay is supposed to use. From my point of view it's better to get all of those set on deserialization rather than using Option and then additionally checking whether the value is set and unwrapping with some defaults.

E.g. RuleScope is completely optional, but we default to url, and it is much more convenient to set it straight away to what we expect the default to be.

Let me know what you think.

I personally would not set any defaults. If relay can't deserialize a rule because it doesn't know about it, then ignore the rule.

jjbayer · 2022-12-15T12:59:36Z

relay-server/src/actors/project.rs

    #[serde(skip_serializing_if = "BTreeSet::is_empty")]
    pub features: BTreeSet<Feature>,
+    /// Transaction renaming rules.
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    pub tx_name_rules: Vec<TransactionNameRule>,


Please also add to LimitedProjectConfig.

jjbayer · 2022-12-15T13:08:03Z

relay-common/src/utils.rs

@@ -56,6 +121,26 @@ impl Glob {
        self.pattern.is_match(value)
    }

+    /// Currently support replacing only all `*` in the input string with provided replacement.
+    /// If no match is found, then a copy of the string is returned unchanged.
+    pub fn apply(&self, input: &str, replacement: &str) -> String {


nit

Suggested change

-    pub fn apply(&self, input: &str, replacement: &str) -> String {
+    pub fn replace_captures(&self, input: &str, replacement: &str) -> String {

iker-barriocanal

Still pending review of the glob builder, the rules module, and tests. Leaving some feedback for now.

iker-barriocanal · 2022-12-15T23:30:41Z

relay-general/src/protocol/transaction.rs

@@ -20,6 +23,8 @@ pub enum TransactionSource {
    View,
    /// Named after a software component, such as a function or class name.
    Component,
+    /// The transaction name was updated to remove high cardinality parts.


Probably a nit -- a "sanitized" transaction name doesn't necessarily require getting high cardinality parts removed. Currently, we are only going to support the removal feature, but let's not limit ourselves to that.

Suggested change

-    /// The transaction name was updated to remove high cardinality parts.
+    /// The transaction name was updated to reduce the name cardinality.

iker-barriocanal · 2022-12-15T23:32:27Z

relay-general/src/protocol/transaction.rs

 use crate::processor::ProcessValue;
 use crate::protocol::Timestamp;
 use crate::types::{Annotated, Empty, ErrorKind, FromValue, IntoValue, SkipSerialization, Value};

 /// Describes how the name of the transaction was determined.
-#[derive(Clone, Debug, Eq, PartialEq)]
+#[derive(Clone, Debug, Eq, PartialEq, Serialize, Deserialize)]
+#[serde(rename_all = "kebab-case")]


lol TIL kebab-case, I didn't know there was a name for this. It's made my day 😂

iker-barriocanal · 2022-12-15T23:47:59Z

relay-general/src/store/transactions/processor.rs

+            let slash_is_present = transaction
+                .chars()
+                .last()
+                .map(|c| c == '/')
+                .unwrap_or_default();
+
+            // Add new `/` at the end of the transaction if there isn't one.
+            if !slash_is_present {
+                transaction.push('/');
+            }


What do you think about using a context manager here? The overall idea is:

    with_slashed_transaction(|transaction_name| {
        // do the logic below
    });

Then, with_slashed_transaction automatically handles this logic of (not) adding the / at the end.

I personally find this much easier to read and follow the semantics.

It might make the code a little bit prettier in this one case, but one would have to jump to a different function to check what's going on. We could maybe come back to this in follow-up PRs.

This could also be complicated to do, since we have a few cases: if a rule is applied we should remove the added / and save the results into the proper places, and in the other case we clean up the added slash afterwards.

Could you sketch out how you envision the code of with_slashed_transaction would look?

Should this function be added to the Event protocol, or kept here with the event as the input data?

My thought was to leave it here, but adding it to the Event might not be a bad idea either. My idea is the following, where closure is a function with the logic below:

    def with_slashed_transaction(tx, closure):
        if slash_is_present(tx):
            closure()
        else:
            add_slash(tx)
            closure()
            remove_slash(tx)
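
The pseudocode above could translate to Rust roughly as follows. This is only a sketch of the idea: the signature is an assumption, and it relies on the closure leaving the trailing slash in place, since the cleanup simply pops the last character.

```rust
/// Runs `closure` on the transaction name with a guaranteed trailing `/`,
/// then removes the slash again if this function was the one that added it.
///
/// Assumption: the closure does not remove the trailing slash itself.
fn with_slashed_transaction<R>(
    tx: &mut String,
    closure: impl FnOnce(&mut String) -> R,
) -> R {
    let added = !tx.ends_with('/');
    if added {
        tx.push('/');
    }
    let result = closure(tx);
    if added {
        tx.pop();
    }
    result
}
```

The caller would put the rule matching and replacement logic into the closure and would no longer need to track `slash_is_present` itself.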

iker-barriocanal · 2022-12-15T23:50:22Z

relay-general/src/store/transactions/processor.rs

+    /// Applies the rule if any found to the transaction name.
+    ///
+    /// It find the first rule matching the criteria:
+    /// - source matchining the one provided in the rule sorce, default `url`


There's actually no default in the code in the method below, and that's correct. Using url by default is something controlled by the rule generation, and that lives in sentry and should be independent of how relay behaves.

Suggested change

-    /// - source matchining the one provided in the rule sorce, default `url`
+    /// - source matchining the one provided in the rule sorce

iker-barriocanal · 2022-12-16T00:00:48Z

relay-general/src/store/transactions/processor.rs

+                    let _ = transaction.pop();
+                    let _ = result.pop();


Suggested change

-                    let _ = transaction.pop();
-                    let _ = result.pop();
+                    transaction.pop();
+                    result.pop();

Why not? Same on line 83.

iker-barriocanal · 2022-12-16T00:27:53Z

relay-common/src/utils.rs

-pub struct Glob {
-    value: String,
-    pattern: Regex,
+/// Glob options is used to configure the behaviour underlying regex.


I find this sentence difficult to understand, and without the example below I've not been able to. I suggest the following but feel free to modify it to a different one.

Suggested change

-/// Glob options is used to configure the behaviour underlying regex.
+/// Glob options represent the underlying regex emulating the globs.

iker-barriocanal · 2022-12-16T00:35:30Z

relay-common/src/utils.rs

+    /// Create a new builder with all the captures enabled by default.
+    pub fn new(value: &'g str) -> Self {
+        let opts = GlobPatternOpts {
+            star: "([^/]*?)",


Why the ? in star here? The ? tries to find as few matching groups as possible, so in /abc/ the non-null groups are a, b, and c. We're interested in a matching group of abc, and we accomplish that by removing the ?.

Am I missing something?

This regex will match everything except /.

iker-barriocanal · 2022-12-16T00:37:04Z

relay-common/src/utils.rs

+    pub fn new(value: &'g str) -> Self {
+        let opts = GlobPatternOpts {
+            star: "([^/]*?)",
+            double_star: "(.*?)",


With the double star, the ? doesn't make any difference in this case. However, do we need it? I'm really not sure if I'm missing something.

This also supports proper globs, where you can have a glob like /foo/bar/**/this and ** matches any number of slashes and anything in between.

Note that these three patterns already exist on master branch, they were just copied to a different location:

relay/relay-common/src/utils.rs

Lines 33 to 35 in 6e2f7ae

"?" => pattern.push_str("(.)"),

"**" => pattern.push_str("(.*?)"),

"*" => pattern.push_str("([^/]*?)"),

iker-barriocanal · 2022-12-16T00:37:53Z

relay-common/src/utils.rs

+            star: "([^/]*?)",
+            double_star: "(.*?)",
+            question_mark: "(.)",


Fantastic idea to surround the regexes with parentheses!

iker-barriocanal · 2022-12-16T00:40:07Z

relay-common/src/utils.rs

+/// `GlobBuilder` provides the posibility to fine tune the final [`Glob`], mainly what capture
+/// groups will be enabled in the underlying regex.
+#[derive(Debug)]
+pub struct GlobBuilder<'g> {


I believe all this glob builder logic belongs to a different PR -- the complexity is enough to be on its own; it's easy to make mistakes when regexes, globs and custom logic is involved; and it makes reviewing the core functionality this PR is introducing more complicated.

There is not much logic, just a simple addition to make sure that the existing, or better to say required, rule application can be done, and this happens only in the replace_captures function. The rest is just helper code.

jjbayer · 2022-12-19T08:03:52Z

relay-common/src/utils.rs

-    pattern: Regex,
+/// Glob options represent the underlying regex emulating the globs.
+#[derive(Debug)]
+struct GlobPatternOpts<'g> {


nit: I would like to give this struct a more descriptive name but I don't have one. Maybe something like GlobPatternGroups or GlobPatternBuildingBlocks?

jjbayer · 2022-12-19T08:10:41Z

relay-common/src/utils.rs

+    pub fn new(value: &'g str) -> Self {
+        let opts = GlobPatternOpts {
+            star: "([^/]*?)",
+            double_star: "(.*?)",


Note that these three patterns already exist on master branch, they were just copied to a different location:

relay/relay-common/src/utils.rs

Lines 33 to 35 in 6e2f7ae

"?" => pattern.push_str("(.)"),

"**" => pattern.push_str("(.*?)"),

"*" => pattern.push_str("([^/]*?)"),

jjbayer · 2022-12-19T08:20:41Z

relay-common/src/utils.rs

+        assert_eq!(
+            g.replace_captures("/foo/testing/1/", "*"),
+            "/foo/testing/1/"
+        );


Should we add some more test coverage for replace_captures? We could do something like

    for (pattern, star, double_star, question_mark, expected_result) in [
        ("**", false, true, false, "*"),
        // ...
    ] {
        let g = Glob::builder(pattern)
            .capture_star(star)
            .capture_double_star(double_star)
            .capture_question_mark(question_mark)
            .build();
        // test
    }

jjbayer · 2022-12-19T08:29:38Z

relay-general/src/store/transactions/processor.rs

+                source_match(&rule.scope.source)
+                    && rule.expiry > now
+                    // Adding `/` at the end of the name, ensures that rules like /<something>/*/**
+                    // will always match the string.
+                    && rule.pattern.is_match(transaction)


How about we move this logic into a fn matches() in impl TransactionNameRule?

jjbayer · 2022-12-19T08:40:59Z

relay-general/src/store/transactions/rules.rs

+            }
+            _ => {
+                relay_log::trace!("Replacement rule type is unsupported!");
+                None


The only reason this can return None is because of the #[serde(other)] option on RedactionRule, right? I would instead change the signature to -> String, and return value.to_owned() here.

Ideally, we would remove unsupported replacement rules on deserialization. Then this branch practically becomes dead code.

Or, if we add a matches function as suggested below, that could automatically return false for any unsupported rule.
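
To illustrate the last option, a `matches` function that rejects unsupported redaction methods could look roughly like this. The types are reduced stand-ins, not the structs from this PR: `Unknown` plays the role of the `#[serde(other)]` variant, `prefix` replaces the glob pattern, and `expiry` is a plain integer timestamp.

```rust
/// Reduced stand-in for the redaction rule enum.
enum RedactionRule {
    Replace { substitution: String },
    /// Any method this Relay version does not know about.
    Unknown,
}

/// Reduced stand-in for the transaction name rule.
struct TransactionNameRule {
    /// Hypothetical simplification: a literal prefix instead of a glob.
    prefix: String,
    expiry: u64,
    redaction: RedactionRule,
}

impl TransactionNameRule {
    /// Returns `true` only for supported, unexpired rules whose pattern matches.
    fn matches(&self, name: &str, now: u64) -> bool {
        matches!(self.redaction, RedactionRule::Replace { .. })
            && self.expiry > now
            && name.starts_with(&self.prefix)
    }
}
```

With this shape, a rule carrying an unknown method is simply skipped during matching, so the apply step never has to handle it.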

jjbayer · 2022-12-19T08:50:35Z

relay-general/src/store/transactions/rules.rs

+#[serde(tag = "method", rename_all = "snake_case")]
+pub enum RedactionRule {
+    Replace(Replace),
+    #[serde(other, skip_serializing)]


Suggested change

-    #[serde(other, skip_serializing)]
+    #[serde(other)]

If we skip serializing, an internal Relay passes on this config to a downstream Relay which then assumes the default substitution, which is not what we want.

olksdr · 2022-12-19T10:54:56Z

@jjbayer @iker-barriocanal added more tests, reworked some functions to get your review comments in. PTAL!

jjbayer

Looking good, just one more note on the slash_is_present logic.

jjbayer · 2022-12-19T12:22:19Z

relay-general/src/store/transactions/processor.rs

+                    if !slash_is_present {
+                        transaction.pop();


I think we should copy these two lines to line 59, that is, re-remove the added slash even if no rule matches at all. In other words, the original, unmodified transaction name should always survive. Maybe it's safer to use a Cow for the transaction name, and evaluate the rule matching / replacement on a copy when slash_is_present.

Currently the slash is always removed if it was added, even if no rule matches; the check is on line 77.

I was thinking of copying the string and using the copy for the checks and manipulations, but I also want to avoid unnecessary copies if we can.

But I see that the code is now a bit longer, and keeping the code more compact would help here. I'll look into this.

Sorry, I missed that. Yeah, to make this logic more resilient against future refactors it would make sense to make it more compact. Either by using a Cow or by something like a context manager as @iker-barriocanal suggested. The rust equivalent to a context manager would be the RAII pattern.

@jjbayer have a look at commit 8199c5c. I tried to use Cow and hide the checks in the rule impl:

* it doesn't change the transaction
* it uses owned data only if needed
* it combines the check and apply into one function
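
The Cow idea discussed here can be sketched as follows. This is a minimal illustration and not the code from 8199c5c; `with_trailing_slash` is a hypothetical helper name.

```rust
use std::borrow::Cow;

/// Returns the name with a trailing `/`, borrowing the input when the slash
/// is already there and allocating a copy only when one has to be added.
fn with_trailing_slash(name: &str) -> Cow<'_, str> {
    if name.ends_with('/') {
        Cow::Borrowed(name)
    } else {
        Cow::Owned(format!("{}/", name))
    }
}
```

Because the original name is never mutated, there is no added slash to clean up afterwards, regardless of whether a rule matches.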

jjbayer · 2022-12-19T13:00:13Z

relay-general/src/store/transactions/processor.rs

+                    if !slash_is_present {
+                        transaction.pop();


Sorry, I missed that. Yeah, to make this logic more resilient against future refactors it would make sense to make it more compact. Either by using a Cow or by something like a context manager as @iker-barriocanal suggested. The rust equivalent to a context manager would be the RAII pattern.

jjbayer

nice!

jjbayer · 2022-12-19T15:28:23Z

relay-general/src/store/transactions/rules.rs

+            .chars()
+            .last()
+            .map(|c| c == '/')
+            .unwrap_or_default();


btw found https://doc.rust-lang.org/std/primitive.str.html#method.ends_with

shame on me 🤦

fixed in 33fed75
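
For comparison, the two ways of checking for a trailing slash side by side. The function names are made up for this illustration.

```rust
/// The original check from the diff above: inspect the last character.
fn slash_is_present_verbose(name: &str) -> bool {
    name.chars().last().map(|c| c == '/').unwrap_or_default()
}

/// The same check using `str::ends_with`.
fn slash_is_present(name: &str) -> bool {
    name.ends_with('/')
}
```

Both return `false` for an empty string, so the replacement does not change behaviour.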

olksdr requested a review from a team December 14, 2022 11:56

olksdr self-assigned this Dec 14, 2022

olksdr added 2 commits December 14, 2022 12:57

Update CHANGELOG

49ef92b

feat(transaction): use / in transaction name to make globs work

17c4ded

olksdr changed the title ~~Adding support for transaction (re)naming rules~~ feat(transactions): Adding support for transaction (re)naming rules Dec 14, 2022

jjbayer reviewed Dec 14, 2022

olksdr added 3 commits December 15, 2022 08:47

Merge branch 'master' into feat/trans-renaming-rules

8b27e09

feat(transactions): Add builder to Glob and custom deserializer for G…

b90a8ba

…lob in txname rule

Remove serde default from RedactionRule

967f2ed

olksdr requested a review from jjbayer December 15, 2022 10:02

Rearrange the code, log the trace for the wrong reduction type

5a5b1ca

jjbayer reviewed Dec 15, 2022

Address review comments

53d59a6

iker-barriocanal reviewed Dec 16, 2022

olksdr added 5 commits December 16, 2022 14:29

Merge branch 'master' into feat/trans-renaming-rules

642833c

Address some review comments

ffc24c2

Add roundtrip serialize/deserialize test

180e930

Clean up function args

689a08b

fix lint

0a6af11

jjbayer reviewed Dec 19, 2022

olksdr added 2 commits December 19, 2022 11:35

Address review comments

fc1a2f9

Add more tests for captures replace

596f3db

olksdr requested review from jjbayer and iker-barriocanal December 19, 2022 10:55

iker-barriocanal approved these changes Dec 19, 2022

jjbayer reviewed Dec 19, 2022

jjbayer approved these changes Dec 19, 2022

Use Cow on transaction name

8199c5c

jjbayer approved these changes Dec 19, 2022

Use "ends_with" on string

33fed75

olksdr enabled auto-merge (squash) December 19, 2022 15:58

olksdr disabled auto-merge December 19, 2022 15:59

olksdr merged commit d8020b0 into master Dec 20, 2022

olksdr deleted the feat/trans-renaming-rules branch December 20, 2022 05:56
