-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(transactions): Adding support for transaction (re)naming rules #1695
Conversation
These changes introduce the support for naming rules, which currently can be applied to only `url` transaction source: * relay always applys first matching rule * rules applied in light normalization, before extraxting any information from the event * original transaction name is preserved in meta, and also remark added with the rule, which was applied * `transaction_info.source` is changed to `sanitized` after rule is applied This PR also contains changes to the current implementation of `Glob`, addint the ability to apply the matching rule to replace parts of the matching pattern - which maybe could be done separatelly. Also few existing structs were re-used / re-purposed to support new rule format, which also up for the discussions.
relay-common/src/utils.rs
Outdated
Glob { | ||
value: glob.to_string(), | ||
pattern: Regex::new(&pattern).unwrap(), | ||
replacer: Regex::new(&replacer).unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than compiling and storing an additional regex, should we just add an option to Glob::new
to tell it which groups to capture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this!
It was a quick test on my side, to make sure the replacements and patterns are working. But you're definitely, it does not make any sense to keep 2 regexes inside the struct. I made the Glob
configurable through the builder, with default behaviour like it was before (but it might be good idea to switch the builder around) and added customer deserialize function, see b90a8ba
) -> ProcessingResult { | ||
let now = Utc::now(); | ||
transaction.apply(|transaction, meta| { | ||
let tran = format!("{}/", transaction.as_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't thought through the consequences, but shouldn't we only add a /
if it's not already there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically it should not matter, if there are 1 or 2 slashed at the end. The **
pattern at the end of each rule will match anything anyway. If I, of course, see it correctly.
pub struct RuleScope { | ||
/// The source of the transaction. | ||
pub source: TransactionSource, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Let's put this above TransactionNameRule
, see https://www.notion.so/sentry/HOWTO-Code-Rust-at-Sentry-7b35f165b10b4492bb95ebe1471a9ada#2695321713704369bccc5fa83ebedfa8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in b90a8ba
/// Date time when the rule expires and it should not be applied anymore. | ||
pub expiry: DateTime<Utc>, | ||
/// Object containing transaction attributes the rules must only be applied to. | ||
#[serde(default)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it even make sense to default
these two fields when they are missing? I would just make them required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm was using our docs for rules definition format, and set all the defaults as the relay supposed to use. From my point of few it's better to get all of those set on deserialization rather than using Option
and then check additionally if it's set and unwrap
with some defaults.
E.g. RuleScope
which is optional completely, but we default to url
- and it much more convenient to set it straight away to what we expect to be default.
Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally would not set any defaults. If relay can't deserialize a rule because it doesn't know about it, then ignore the rule.
#[serde(skip_serializing_if = "BTreeSet::is_empty")] | ||
pub features: BTreeSet<Feature>, | ||
/// Transaction renaming rules. | ||
#[serde(skip_serializing_if = "Vec::is_empty")] | ||
pub tx_name_rules: Vec<TransactionNameRule>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add to LimitedProjectConfig
.
relay-common/src/utils.rs
Outdated
@@ -56,6 +121,26 @@ impl Glob { | |||
self.pattern.is_match(value) | |||
} | |||
|
|||
/// Currently support replacing only all `*` in the input string with provided replacement. | |||
/// If no match is found, then a copy of the string is returned unchanged. | |||
pub fn apply(&self, input: &str, replacement: &str) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
pub fn apply(&self, input: &str, replacement: &str) -> String { | |
pub fn replace_captures(&self, input: &str, replacement: &str) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still pending review of the glob builder, the rules module, and tests. Leaving some feedback for now.
@@ -20,6 +23,8 @@ pub enum TransactionSource { | |||
View, | |||
/// Named after a software component, such as a function or class name. | |||
Component, | |||
/// The transaction name was updated to remove high cardinality parts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a nit -- a "sanitized" transaction name doesn't necessarily require getting high cardinality parts removed. Currently, we are only going to support the removal feature, but let's not limit ourselves to that.
/// The transaction name was updated to remove high cardinality parts. | |
/// The transaction name was updated to reduce the name cardinality. |
use crate::processor::ProcessValue; | ||
use crate::protocol::Timestamp; | ||
use crate::types::{Annotated, Empty, ErrorKind, FromValue, IntoValue, SkipSerialization, Value}; | ||
|
||
/// Describes how the name of the transaction was determined. | ||
#[derive(Clone, Debug, Eq, PartialEq)] | ||
#[derive(Clone, Debug, Eq, PartialEq, Serialize, Deserialize)] | ||
#[serde(rename_all = "kebab-case")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol TIL kebab-case, I didn't know there was a name for this. It's made my day 😂
let slash_is_present = transaction | ||
.chars() | ||
.last() | ||
.map(|c| c == '/') | ||
.unwrap_or_default(); | ||
|
||
// Add new `/` at the end of the transaction if there isn't one. | ||
if !slash_is_present { | ||
transaction.push('/'); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about using a context manager here? The overall idea is:
with_slashed_transaction(|transaction_name| {
// do the logic below
});
Then, with_slashed_transaction
automatically handles this logic of (not) adding the /
at the end.
I personally find this much easier to read and follow the semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make the code a little bit prettier in this one case but one will have to jump to different function to check what's going on. We could maybe come back to this in the followup PRs.
This also can be complicated to do, since we have few cases, if the rules applied we should remove the added /
and save the results in to proper places, or another case when we clean up the added slash afterwards.
If you could sketch out how you envision the code of the with_slashed_transaction
would look like?
Should this function be added to Event
protocol or keep it here and get event as the input data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought was to leave it here, but adding it to the Event
might not be a bad idea either. My idea is the following, where closure
is a function with the logic below:
def with_slashed_transaction(tx, closure):
if slash_is_present(tx):
closure()
else:
add_slash(tx)
closure()
remove_slash(tx)
/// Applies the rule if any found to the transaction name. | ||
/// | ||
/// It find the first rule matching the criteria: | ||
/// - source matchining the one provided in the rule sorce, default `url` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's actually no default in the code in the method below, and that's correct. Using url
by default is something controlled by the rule generation, and that lives in sentry and should be independent of how relay behaves.
/// - source matchining the one provided in the rule sorce, default `url` | |
/// - source matchining the one provided in the rule sorce |
let _ = transaction.pop(); | ||
let _ = result.pop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let _ = transaction.pop(); | |
let _ = result.pop(); | |
transaction.pop(); | |
result.pop(); |
Why not? Same on line 83.
relay-common/src/utils.rs
Outdated
pub struct Glob { | ||
value: String, | ||
pattern: Regex, | ||
/// Glob options is used to configure the behaviour underlying regex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this sentence difficult to understand, and without the example below I've not been able to. I suggest the following but feel free to modify it to a different one.
/// Glob options is used to configure the behaviour underlying regex. | |
/// Glob options represent the underlying regex emulating the globs. |
/// Create a new builder with all the captures enabled by default. | ||
pub fn new(value: &'g str) -> Self { | ||
let opts = GlobPatternOpts { | ||
star: "([^/]*?)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the ?
in star
here? The ?
tries to find as less matching groups as possible, so in /abc/
the non-null groups are a
, b
, and c
. We're interested in a matching group of abc
, and we accomplish that by removing ?
.
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex will match everything expect /
.
pub fn new(value: &'g str) -> Self { | ||
let opts = GlobPatternOpts { | ||
star: "([^/]*?)", | ||
double_star: "(.*?)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the double star, the ?
doesn't make any difference in this case. However, do we need it? I'm really not sure if I'm missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this also supports proper globs, when you can have the glob like /foo/bar/**/this
there **
matches any number of slashes and stuff in between.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that these three patterns already exist on master branch, they were just copied to a different location:
relay/relay-common/src/utils.rs
Lines 33 to 35 in 6e2f7ae
"?" => pattern.push_str("(.)"), | |
"**" => pattern.push_str("(.*?)"), | |
"*" => pattern.push_str("([^/]*?)"), |
star: "([^/]*?)", | ||
double_star: "(.*?)", | ||
question_mark: "(.)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic idea of surrounding the regexes with parenthesis!
/// `GlobBuilder` provides the posibility to fine tune the final [`Glob`], mainly what capture | ||
/// groups will be enabled in the underlying regex. | ||
#[derive(Debug)] | ||
pub struct GlobBuilder<'g> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe all this glob builder logic belongs to a different PR -- the complexity is enough to be on its own; it's easy to make mistakes when regexes, globs and custom logic is involved; and it makes reviewing the core functionality this PR is introducing more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no much logic, just a simple addition to make sure that existing or better to say, required rule application cane be done, and this happens only in replace_captures
function. The rest is just a helper code.
relay-common/src/utils.rs
Outdated
pattern: Regex, | ||
/// Glob options represent the underlying regex emulating the globs. | ||
#[derive(Debug)] | ||
struct GlobPatternOpts<'g> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would like to give this struct a more descriptive name but I don't have one. Maybe something like GlobPatternGroups
or GlobPatternBuildingBlocks
?
pub fn new(value: &'g str) -> Self { | ||
let opts = GlobPatternOpts { | ||
star: "([^/]*?)", | ||
double_star: "(.*?)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that these three patterns already exist on master branch, they were just copied to a different location:
relay/relay-common/src/utils.rs
Lines 33 to 35 in 6e2f7ae
"?" => pattern.push_str("(.)"), | |
"**" => pattern.push_str("(.*?)"), | |
"*" => pattern.push_str("([^/]*?)"), |
relay-common/src/utils.rs
Outdated
assert_eq!( | ||
g.replace_captures("/foo/testing/1/", "*"), | ||
"/foo/testing/1/" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add some more test coverage for replace_captures
? We could do something like
for (pattern, star, double_star, question_mark, expected_result) in [
("**", false, true, false, "*"),
// ...
] {
let g = Glob::builder(pattern)
.capture_star(star)
.capture_double_star(double_star)
.capture_question_mark(question_mark)
.build();
// test
}
source_match(&rule.scope.source) | ||
&& rule.expiry > now | ||
// Adding `/` at the end of the name, ensures that rules like /<something>/*/** | ||
// will always match the string. | ||
&& rule.pattern.is_match(transaction) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we move this logic into a fn matches()
in impl TransactionNameRule
?
} | ||
_ => { | ||
relay_log::trace!("Replacement rule type is unsupported!"); | ||
None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason this can return None
is because of the #[serde(other)]
option on RedactionRule
, right? I would instead change the signature to -> String
, and return value.to_owned()
here.
Ideally, we would remove unsupported replacement rules on deserialization. Then this branch practically becomes dead code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, if we add a matches
function as suggested below, that could automatically return false for any unsupported rule.
#[serde(tag = "method", rename_all = "snake_case")] | ||
pub enum RedactionRule { | ||
Replace(Replace), | ||
#[serde(other, skip_serializing)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[serde(other, skip_serializing)] | |
#[serde(other)] |
If we skip serializing, an internal Relay passes on this config to a downstream Relay which then assumes the default substitution, which is not what we want.
@jjbayer @iker-barriocanal added more tests, reworked some functions to get your review comments in. PTAL! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, just one more note on the slash_is_present
logic.
if !slash_is_present { | ||
transaction.pop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should copy these two lines to line 59, that is, re-remove the added slash even if no rule matches at all. In other words, the original, unmodified transaction name should always survive. Maybe it's safer to use a Cow
for the transaction name, and evaluate the rule matching / replacement on a copy when slash_is_present
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the slash is always removed if it was added, on the line 77
is the check, even if the rule is not matching.
I was thinking to copy the string and use it for the checks and manipulations - but I also want to avoid unnecessary copies if we can avoid it.
But I see that the code now a bit longer, and keeping code more compact would help here. I'll look into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed that. Yeah, to make this logic more resilient against future refactors it would make sense to make it more compact. Either by using a Cow
or by something like a context manager as @iker-barriocanal suggested. The rust equivalent to a context manager would be the RAII pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if !slash_is_present { | ||
transaction.pop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed that. Yeah, to make this logic more resilient against future refactors it would make sense to make it more compact. Either by using a Cow
or by something like a context manager as @iker-barriocanal suggested. The rust equivalent to a context manager would be the RAII pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
.chars() | ||
.last() | ||
.map(|c| c == '/') | ||
.unwrap_or_default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shame on me 🤦
fixed in 33fed75
These changes introduce the support for naming rules, which currently can be applied to only
url
transaction source:transaction_info.source
is changed tosanitized
after rule is appliedThis PR also contains changes to the current implementation of
Glob
, adding the ability to apply the matching rule to replace parts of the matching pattern - which maybe could be done separately.Also few existing structs were re-used / re-purposed to support new rule format, which also up for the discussions.
closes: #1657
@getsentry/owners-ingest also requesting advice on structuring the code better. Since I currently re-using few utility structs and also existing protocol struct (like
TransactionSource
) to keep changes smaller and also re-used existing protocol for new things.