Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(transactions): Adding support for transaction (re)naming rules #1695

Merged
merged 17 commits into from
Dec 20, 2022

Conversation

olksdr
Copy link
Contributor

@olksdr olksdr commented Dec 14, 2022

These changes introduce the support for naming rules, which currently can be applied to only url transaction source:

  • relay always applies first matching rule
  • rules applied in light normalization, before extraxting any information from the event
  • original transaction name is preserved in meta, and also remark added with the rule, which was applied
  • transaction_info.source is changed to sanitized after rule is applied

This PR also contains changes to the current implementation of Glob, adding the ability to apply the matching rule to replace parts of the matching pattern - which maybe could be done separately.

Also few existing structs were re-used / re-purposed to support new rule format, which also up for the discussions.

closes: #1657

@getsentry/owners-ingest also requesting advice on structuring the code better. Since I currently re-using few utility structs and also existing protocol struct (like TransactionSource) to keep changes smaller and also re-used existing protocol for new things.

These changes introduce the support for naming rules, which currently
can be applied to only `url` transaction source:

* relay always applys first matching rule
* rules applied in light normalization, before extraxting any
  information from the event
* original transaction name is preserved in meta, and also remark added
  with the rule, which was applied
* `transaction_info.source` is changed to `sanitized` after rule is
  applied

This PR also contains changes to the current implementation of `Glob`,
addint the ability to apply the matching rule to replace parts of the
matching pattern - which maybe could be done separatelly.

Also few existing structs were re-used / re-purposed to support new rule
format, which also up for the discussions.
@olksdr olksdr requested a review from a team December 14, 2022 11:56
@olksdr olksdr self-assigned this Dec 14, 2022
@olksdr olksdr changed the title Adding support for transaction (re)naming rules feat(transactions): Adding support for transaction (re)naming rules Dec 14, 2022
Glob {
value: glob.to_string(),
pattern: Regex::new(&pattern).unwrap(),
replacer: Regex::new(&replacer).unwrap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than compiling and storing an additional regex, should we just add an option to Glob::new to tell it which groups to capture?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this!

It was a quick test on my side, to make sure the replacements and patterns are working. But you're definitely, it does not make any sense to keep 2 regexes inside the struct. I made the Glob configurable through the builder, with default behaviour like it was before (but it might be good idea to switch the builder around) and added customer deserialize function, see b90a8ba

) -> ProcessingResult {
let now = Utc::now();
transaction.apply(|transaction, meta| {
let tran = format!("{}/", transaction.as_str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't thought through the consequences, but shouldn't we only add a / if it's not already there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically it should not matter, if there are 1 or 2 slashed at the end. The ** pattern at the end of each rule will match anything anyway. If I, of course, see it correctly.

pub struct RuleScope {
/// The source of the transaction.
pub source: TransactionSource,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in b90a8ba

/// Date time when the rule expires and it should not be applied anymore.
pub expiry: DateTime<Utc>,
/// Object containing transaction attributes the rules must only be applied to.
#[serde(default)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it even make sense to default these two fields when they are missing? I would just make them required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm was using our docs for rules definition format, and set all the defaults as the relay supposed to use. From my point of few it's better to get all of those set on deserialization rather than using Option and then check additionally if it's set and unwrap with some defaults.

E.g. RuleScope which is optional completely, but we default to url - and it much more convenient to set it straight away to what we expect to be default.

Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would not set any defaults. If relay can't deserialize a rule because it doesn't know about it, then ignore the rule.

relay-general/src/store/transactions/rules.rs Show resolved Hide resolved
@olksdr olksdr requested a review from jjbayer December 15, 2022 10:02
#[serde(skip_serializing_if = "BTreeSet::is_empty")]
pub features: BTreeSet<Feature>,
/// Transaction renaming rules.
#[serde(skip_serializing_if = "Vec::is_empty")]
pub tx_name_rules: Vec<TransactionNameRule>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add to LimitedProjectConfig.

@@ -56,6 +121,26 @@ impl Glob {
self.pattern.is_match(value)
}

/// Currently support replacing only all `*` in the input string with provided replacement.
/// If no match is found, then a copy of the string is returned unchanged.
pub fn apply(&self, input: &str, replacement: &str) -> String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
pub fn apply(&self, input: &str, replacement: &str) -> String {
pub fn replace_captures(&self, input: &str, replacement: &str) -> String {

Copy link
Contributor

@iker-barriocanal iker-barriocanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still pending review of the glob builder, the rules module, and tests. Leaving some feedback for now.

@@ -20,6 +23,8 @@ pub enum TransactionSource {
View,
/// Named after a software component, such as a function or class name.
Component,
/// The transaction name was updated to remove high cardinality parts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a nit -- a "sanitized" transaction name doesn't necessarily require getting high cardinality parts removed. Currently, we are only going to support the removal feature, but let's not limit ourselves to that.

Suggested change
/// The transaction name was updated to remove high cardinality parts.
/// The transaction name was updated to reduce the name cardinality.

use crate::processor::ProcessValue;
use crate::protocol::Timestamp;
use crate::types::{Annotated, Empty, ErrorKind, FromValue, IntoValue, SkipSerialization, Value};

/// Describes how the name of the transaction was determined.
#[derive(Clone, Debug, Eq, PartialEq)]
#[derive(Clone, Debug, Eq, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol TIL kebab-case, I didn't know there was a name for this. It's made my day 😂

Comment on lines 46 to 55
let slash_is_present = transaction
.chars()
.last()
.map(|c| c == '/')
.unwrap_or_default();

// Add new `/` at the end of the transaction if there isn't one.
if !slash_is_present {
transaction.push('/');
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using a context manager here? The overall idea is:

with_slashed_transaction(|transaction_name| {
  // do the logic below
});

Then, with_slashed_transaction automatically handles this logic of (not) adding the / at the end.

I personally find this much easier to read and follow the semantics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make the code a little bit prettier in this one case but one will have to jump to different function to check what's going on. We could maybe come back to this in the followup PRs.

This also can be complicated to do, since we have few cases, if the rules applied we should remove the added / and save the results in to proper places, or another case when we clean up the added slash afterwards.

If you could sketch out how you envision the code of the with_slashed_transaction would look like?

Should this function be added to Event protocol or keep it here and get event as the input data?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was to leave it here, but adding it to the Event might not be a bad idea either. My idea is the following, where closure is a function with the logic below:

def with_slashed_transaction(tx, closure):
  if slash_is_present(tx):
      closure()
  else:
      add_slash(tx)
      closure()
      remove_slash(tx)

/// Applies the rule if any found to the transaction name.
///
/// It find the first rule matching the criteria:
/// - source matchining the one provided in the rule sorce, default `url`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's actually no default in the code in the method below, and that's correct. Using url by default is something controlled by the rule generation, and that lives in sentry and should be independent of how relay behaves.

Suggested change
/// - source matchining the one provided in the rule sorce, default `url`
/// - source matchining the one provided in the rule sorce

Comment on lines 68 to 69
let _ = transaction.pop();
let _ = result.pop();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let _ = transaction.pop();
let _ = result.pop();
transaction.pop();
result.pop();

Why not? Same on line 83.

pub struct Glob {
value: String,
pattern: Regex,
/// Glob options is used to configure the behaviour underlying regex.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this sentence difficult to understand, and without the example below I've not been able to. I suggest the following but feel free to modify it to a different one.

Suggested change
/// Glob options is used to configure the behaviour underlying regex.
/// Glob options represent the underlying regex emulating the globs.

/// Create a new builder with all the captures enabled by default.
pub fn new(value: &'g str) -> Self {
let opts = GlobPatternOpts {
star: "([^/]*?)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the ? in star here? The ? tries to find as less matching groups as possible, so in /abc/ the non-null groups are a, b, and c. We're interested in a matching group of abc, and we accomplish that by removing ?.

Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex will match everything expect /.

pub fn new(value: &'g str) -> Self {
let opts = GlobPatternOpts {
star: "([^/]*?)",
double_star: "(.*?)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the double star, the ? doesn't make any difference in this case. However, do we need it? I'm really not sure if I'm missing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also supports proper globs, when you can have the glob like /foo/bar/**/this there ** matches any number of slashes and stuff in between.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that these three patterns already exist on master branch, they were just copied to a different location:

"?" => pattern.push_str("(.)"),
"**" => pattern.push_str("(.*?)"),
"*" => pattern.push_str("([^/]*?)"),

Comment on lines +29 to +31
star: "([^/]*?)",
double_star: "(.*?)",
question_mark: "(.)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic idea of surrounding the regexes with parenthesis!

/// `GlobBuilder` provides the posibility to fine tune the final [`Glob`], mainly what capture
/// groups will be enabled in the underlying regex.
#[derive(Debug)]
pub struct GlobBuilder<'g> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe all this glob builder logic belongs to a different PR -- the complexity is enough to be on its own; it's easy to make mistakes when regexes, globs and custom logic is involved; and it makes reviewing the core functionality this PR is introducing more complicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no much logic, just a simple addition to make sure that existing or better to say, required rule application cane be done, and this happens only in replace_captures function. The rest is just a helper code.

pattern: Regex,
/// Glob options represent the underlying regex emulating the globs.
#[derive(Debug)]
struct GlobPatternOpts<'g> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would like to give this struct a more descriptive name but I don't have one. Maybe something like GlobPatternGroups or GlobPatternBuildingBlocks?

pub fn new(value: &'g str) -> Self {
let opts = GlobPatternOpts {
star: "([^/]*?)",
double_star: "(.*?)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that these three patterns already exist on master branch, they were just copied to a different location:

"?" => pattern.push_str("(.)"),
"**" => pattern.push_str("(.*?)"),
"*" => pattern.push_str("([^/]*?)"),

assert_eq!(
g.replace_captures("/foo/testing/1/", "*"),
"/foo/testing/1/"
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add some more test coverage for replace_captures? We could do something like

for (pattern, star, double_star, question_mark, expected_result) in [
    ("**", false, true, false, "*"),
    // ...
] {
    let g = Glob::builder(pattern)
            .capture_star(star)
            .capture_double_star(double_star)
            .capture_question_mark(question_mark)
            .build();
    // test
}

Comment on lines 64 to 68
source_match(&rule.scope.source)
&& rule.expiry > now
// Adding `/` at the end of the name, ensures that rules like /<something>/*/**
// will always match the string.
&& rule.pattern.is_match(transaction)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we move this logic into a fn matches() in impl TransactionNameRule?

}
_ => {
relay_log::trace!("Replacement rule type is unsupported!");
None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason this can return None is because of the #[serde(other)] option on RedactionRule, right? I would instead change the signature to -> String, and return value.to_owned() here.

Ideally, we would remove unsupported replacement rules on deserialization. Then this branch practically becomes dead code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, if we add a matches function as suggested below, that could automatically return false for any unsupported rule.

#[serde(tag = "method", rename_all = "snake_case")]
pub enum RedactionRule {
Replace(Replace),
#[serde(other, skip_serializing)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[serde(other, skip_serializing)]
#[serde(other)]

If we skip serializing, an internal Relay passes on this config to a downstream Relay which then assumes the default substitution, which is not what we want.

@olksdr
Copy link
Contributor Author

olksdr commented Dec 19, 2022

@jjbayer @iker-barriocanal added more tests, reworked some functions to get your review comments in. PTAL!

Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, just one more note on the slash_is_present logic.

Comment on lines 62 to 63
if !slash_is_present {
transaction.pop();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should copy these two lines to line 59, that is, re-remove the added slash even if no rule matches at all. In other words, the original, unmodified transaction name should always survive. Maybe it's safer to use a Cow for the transaction name, and evaluate the rule matching / replacement on a copy when slash_is_present.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the slash is always removed if it was added, on the line 77 is the check, even if the rule is not matching.

I was thinking to copy the string and use it for the checks and manipulations - but I also want to avoid unnecessary copies if we can avoid it.

But I see that the code now a bit longer, and keeping code more compact would help here. I'll look into this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed that. Yeah, to make this logic more resilient against future refactors it would make sense to make it more compact. Either by using a Cow or by something like a context manager as @iker-barriocanal suggested. The rust equivalent to a context manager would be the RAII pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjbayer have a look into this commit 8199c5c, I tried to use Cow and hide the checks in the rule impl:

  • it doesn't change the transaction
  • it uses owned data only if needed
  • and also combined the check and apply into one function

Comment on lines 62 to 63
if !slash_is_present {
transaction.pop();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed that. Yeah, to make this logic more resilient against future refactors it would make sense to make it more compact. Either by using a Cow or by something like a context manager as @iker-barriocanal suggested. The rust equivalent to a context manager would be the RAII pattern.

Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

.chars()
.last()
.map(|c| c == '/')
.unwrap_or_default();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shame on me 🤦

fixed in 33fed75

@olksdr olksdr enabled auto-merge (squash) December 19, 2022 15:58
@olksdr olksdr disabled auto-merge December 19, 2022 15:59
@olksdr olksdr merged commit d8020b0 into master Dec 20, 2022
@olksdr olksdr deleted the feat/trans-renaming-rules branch December 20, 2022 05:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support transaction renaming rules
3 participants