-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(relay): Implement decaying rule [TET-607] #1692
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general concept makes sense to me, but I have some comments on the code structure.
relay-sampling/src/lib.rs
Outdated
// We round to 2 digits in order to avoid high-precision sample rates that are more difficult | ||
// to work with. Rounding is performed following the nearest integer. | ||
let sample_rate_difference = | ||
f64::round((from_sample_rate - self.external_params.base_sample_rate) * 100.0) / 100.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are rounded sample rates easier to work with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is easier to debug and in general I thought we were going to keep 2 decimal digits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By multiplying sample_rate_difference * inverse_progress
, you potentially get a lot more decimals again, right? If we want to do 1% steps of decay (e.g. 0.7, 0.69, 0.68, ...
) I would round when the computation is done, e.g. in get_sample_rate
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you are saying is correct but I did that on purpose.
I had at first rounded only the sample difference because it was behaving quite weird, in the sense that I was getting 0.7 - 0.2 = 0.4999999 due to floating arithmetic approximations I think. And this was going to screw up the whole calculation because we started doing that with an ill number.
I think that we could simplify everything and round at the end, with some more loss propagated through multiplications but that is fine. The only problem I see is that we will have weird numbers in tests if we choose for example the 50% progress time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do increased precision and floating point glitches really make this more difficult to work with? In the end, you can just make your "correct" calculation and use the result. If you ever have to debug print the value and receive 0.49999...
, it is still clear what the value is. Besides, interpolation will give you all sorts of values anyway.
Also, please do not round to integer percentages. We need more precision than that, considering that a realistic uniform sample rate is 1%. You can, of course, round your rule targets, but please do not round here.
I would suggest to remove complexity and not round anywhere in Relay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Thanks for the explanation Jan!
relay-sampling/src/lib.rs
Outdated
/// want to inversely reduce the sample rate in the interval (from - base_sample_rate). | ||
/// Thus, in the 70% case, we would take only 30% of the (from - base_sample_rate) which | ||
/// is equal to 0.06. | ||
fn execute_linear_decay(&self, from_sample_rate: f64) -> f64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: There are variables in this function that begin with start_
and end_
, so to clarify intent I would rename from_sample_rate
to start_sample_rate
, and create an alias let end_sample_rate = self.external_params.base_sample_rate
.
relay-sampling/src/lib.rs
Outdated
|
||
/// A struct containing the set of external params required by a decaying function. | ||
#[derive(Clone, Copy, Debug)] | ||
struct DecayingFunctionExternalParams { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that the types involved in this logic are too tightly coupled to the serialization format, in that we differentiate between internal and external params.
In order to decouple them, can we construct something like that from the SamplingRule
, e.g.
struct LinearDecay {
// whatever model parameters a linear function needs. In theory only two (`(k, d)` for `y = kx + d`),
// but it might make sense to store `(start, end)` timestamps and sample rates as well.
start_time: DateTime<Utc>,
end_time: DateTime<Utc>,
start_rate: f64,
end_rate: f64,
}
enum DecayFunc {
// ...
Linear(LinearDecay),
}
impl SamplingRule {
fn get_decay_fn(&self) -> DecayFunc {
// construct DecayFunc from rule here. We can also cache it on the sample rule
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you deserialize this? I wanted to keep my change as small as possible on the existing rule payload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your proposal makes total sense, indeed, it is the simplest way of doing it but this requires the removal of the timerange, which we can for sure do but we need to be careful with backward compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave the existing rule payload as-is, and create these helper structs independently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would end up creating duplication within the json. Do we want that? And how do we treat the TimeRange, because it won't be needed anymore if we keep intervals within the functions themselves.
relay-sampling/src/lib.rs
Outdated
/// | ||
/// A decaying function is responsible of decaying the sample rate from a value to another following | ||
/// a given curve. | ||
#[derive(Debug, Clone, Copy, PartialEq, Serialize, Deserialize)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[derive(Debug, Clone, Copy, PartialEq, Serialize, Deserialize)] | |
#[derive(Default, Debug, Clone, Copy, PartialEq, Serialize, Deserialize)] |
relay-sampling/src/lib.rs
Outdated
impl Default for DecayingFunction { | ||
fn default() -> Self { | ||
Self::Constant | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impl Default for DecayingFunction { | |
fn default() -> Self { | |
Self::Constant | |
} | |
} |
relay-sampling/src/lib.rs
Outdated
if let Some(params) = self.get_decaying_function_params(now) { | ||
self.decaying_fn.call(params) | ||
} else { | ||
self.sample_rate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that sample_rate
is not the base sample rate anymore, I believe that instead of falling back to this value we should make sure that invalid rules do not match any events. We already have the is_active
mechanism for that, should we extend that function so it returns false for invalid rules?
relay/relay-sampling/src/lib.rs
Lines 741 to 742 in 0e8bd22
if !rule.is_active() { | |
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jan-auer this is a good point, with the new implementation, we would need to decide what to do in case of invalid params. We could either:
- Implement a mechanism that automatically falls back either to
decayedSampleRate
orsampleRate
based on the function type. - Make automatically the rule invalid by performing a validation of params before
get_sample_rate
- Propagate some error but this is generally not suggested and looking at the current implementation it is not done
Just to clarify, this can happen for example if timeRange
has start
and not end
, irrespectively if we specify a decaying function or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could write a custom deserializer for SamplingConfig::rules
, which removes structurally invalid rules and logs an error for them. But get_decaying_function_params
also depends on now
, and that part of the "validation" should go into is_active
IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that first we would need to clarify whether we want to allow open intervals or not.
If we wanted to do it better, I would have redefined the timerange with only closed intervals and then automatically use the constant decaying if not specified. Then for each function build custom extra validation and fallback mechanism(s).
I don't know tho, if there is a proper rationale behind the decision of open intervals. For DS we don't need it (as of now).
CHANGELOG.md
Outdated
|
||
- Add support for `limits.keepalive_timeout` configuration. ([#1645](https://github.com/getsentry/relay/pull/1645)) | ||
- Add support for decaying functions in sampling rules. ([#1692](https://github.com/getsentry/relay/pull/1692)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Add support for decaying functions in sampling rules. ([#1692](https://github.com/getsentry/relay/pull/1692)) | |
- Add support for decaying functions in dynamic sampling rules. ([#1692](https://github.com/getsentry/relay/pull/1692)) |
@@ -427,57 +399,95 @@ impl SamplingRule { | |||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's update this doc comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ One line above, it says Returns whether the sampling rule is active
. We should change it to "Returns an ActiveRule
if the rule is active", or something like that.
relay-sampling/src/lib.rs
Outdated
end: Some(end), | ||
} = self.time_range | ||
{ | ||
if self.sample_rate > decayed_sample_rate && start < end && now >= start { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because time_range.contains
is not called in this branch anymore, we need to also check that now
is < end
(feel free to change <=
to <
and vice versa).
if self.sample_rate > decayed_sample_rate && start < end && now >= start { | |
if self.sample_rate > decayed_sample_rate && start <= now && now < end { |
We do not need to check start < end
explicitly then, because it follows from start <= now < end
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently this was not caught by any test case. Maybe we should add one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this was done with a purpose but for the previous implementation, now we should explicitly verify like you said because the check is in is_active
.
relay-sampling/src/lib.rs
Outdated
/// A sampling rule that has been successfully matched and that contains all the required data | ||
/// to return the sample rate. | ||
#[derive(Debug, Clone, Copy)] | ||
pub struct MatchingRule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ActiveRule
might actually be a better name now that I think of it, because it is created by is_active
.
pub struct MatchingRule { | |
pub struct ActiveRule { |
relay-sampling/src/lib.rs
Outdated
DecayingFunction::Linear { | ||
/// A struct representing the evaluation context of a sample rate. | ||
#[derive(Debug, Clone, Copy)] | ||
enum SampleRateEvaluator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually declare types before using them, so In this case, declare SampleRateEvaluator
, then MatchingRule
, then SamplingRule
. See internal docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of nitpicks, apart from those this looks good to me!
@@ -427,57 +399,95 @@ impl SamplingRule { | |||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ One line above, it says Returns whether the sampling rule is active
. We should change it to "Returns an ActiveRule
if the rule is active", or something like that.
// For consistency reasons we take a snapshot in time and use that time across all code that | ||
// requires it. | ||
let now = Utc::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call!
relay-sampling/src/lib.rs
Outdated
pub fn get_sample_rate(&self, now: DateTime<Utc>) -> f64 { | ||
self.evaluator.evaluate(now) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Because SamplingRule
uses active rule, I would change the declaration order to
SampleRateEvaluator
ActiveRule
SamplingRule
); | ||
|
||
assert_eq!( | ||
prepare_and_get_sampling_rule(1.0, EventType::Transaction, &project_state, now) | ||
.unwrap() | ||
.unwrap() | ||
.sample_rate, | ||
0.7 | ||
0.44999999999999996 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks flaky, should we rather do something like (expected - result).abs() < eps
instead? See https://rust-lang.github.io/rust-clippy/master/#float_cmp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good call! I totally overlooked the precision, especially considering that it can differ from architecture to architecture.
A decaying function is a function that interpolates the value of the sample rate based on a set of parameters (e.g.
start
,end
,fromSampleRate
). The rationale behind the decaying function is to be able to systematically change the sample rate based on multiple parameters. This should be especially helpful with the behavior of latest releases, in which the number of transactions grows non immediately and we want to inversely follow that trend.There exist currently two types of decaying functions:
constant
→ returns thesampleRate
irrespectively of the time of evaluation.linear
→ returns a decreasing sample rate fromsampleRate
todecayedSampleRate
based on the time of evaluation.By default, if no decaying function is specified, the
constant
function will be used, which returns thesampleRate
at any point within the specified time range.Both decaying functions require time ranges, that are fetched from the previously existing
timeRange
field:Depending on which
decayingFn
is used, thetimeRange
check will be different:constant
→ a valid time range can be open or closed (which means that you can have both intervals closed or at most one side of the interval open) and, if both are set,start < end
.linear
→ a valid time range must be closed andstart < end
.In case at least one of the required conditions are not met, the rule with be considered
inactive
and won’t be matched.Implementation
The implementation of decaying rules introduces the idea of an
ActiveRule
which is a rule returned only in case it is active. A rule is active if it satisfies a series of conditions that are checked in relation to thedecayingFn
used.An
ActiveRule
has theget_sample_rate(now)
method which returns the sample rate at point in timenow
.