-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(metrics): Remove/reject nul-bytes from metric strings [ingest-1204] #1235
Conversation
We are observing release names which are UTF-16 strings reinterpreted as UTF-8. This is causing issues downstream and does not line up with the releases that end up in the event payload, which for some reason are totally correct. In order to counteract this, we strip NUL-bytes from release values. In other string types (tag keys, metrics), there are "fewer excuses" for not getting those strings right, so we just drop them.
See getsentry/relay#1235, we want to ensure we don't accept release names with NUL-bytes in them. We might want to strip NUL-bytes instead of dropping the entire release but that'd be too much effort for now.
@@ -343,7 +343,7 @@ pub struct Event { | |||
/// can be the git SHA for the given project, or a product identifier with a semantic version. | |||
#[metastructure( | |||
max_chars = "tag_value", // release ends in tag | |||
deny_chars = "\r\n\x0c\t/\\", | |||
// release allowed chars are validated in the sentry-release-parser crate! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change does mean that the release name is only validated in the store normalizer. I think we will need to follow up with moving this part of store normalization to run before metrics extraction as well
- create ticket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it even make sense to remove the deny_chars
at this point? Or should we do that as part of the follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A separate PR that moves this in one go would've been slightly more semantic, but let's keep it this way in light of moving fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
relay-metrics/src/aggregation.rs
Outdated
@@ -1055,6 +1080,8 @@ impl Aggregator { | |||
let timestamp = key.timestamp; | |||
let project_key = key.project_key; | |||
|
|||
let key = key.validate_strings().map_err(|()| AggregateMetricsError)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will fail the entire batch. we have two options:
- map the error to
return Ok(())
- make the error type an enum with multiple variants, and have matching logic in
merge_all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jjbayer @jan-auer @iker-barriocanal I believe we should change merge_all. there is a related case involving get_bucket_timestamps
where the entire batch fails as well, for no discernible reason
asking for additional review to make sure |
relay-metrics/src/aggregation.rs
Outdated
@@ -1140,7 +1167,7 @@ impl Aggregator { | |||
I: IntoIterator<Item = Bucket>, | |||
{ | |||
for bucket in buckets.into_iter() { | |||
self.merge(project_key, bucket)?; | |||
let _result = self.merge(project_key, bucket); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense. Should we log to sentry or datadog when _result
is not Ok
though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i considered this but there's no information on the error type. I can make this a simple error enum. I know that @jan-auer would have opinions on how we do error handlig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd definitely not silently drop the error. An enum works for me, but not opinionated here.
relay-metrics/src/aggregation.rs
Outdated
@@ -716,8 +716,17 @@ impl Bucket { | |||
|
|||
/// Any error that may occur during aggregation. | |||
#[derive(Debug, Fail)] | |||
#[fail(display = "failed to aggregate metrics")] | |||
pub struct AggregateMetricsError; | |||
pub enum AggregateMetricsError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for introducing this, it makes sense at this point.
Please move this to an internal enum and leave the metrics aggregation error opaque to the public interface of relay-metrics
. I would not expose the enum at all, and rely on Display
outside of the crate. We should not match on the enum variant.
Introducing this enum variant could've been a good standalone PR to go before this one.
relay-metrics/src/aggregation.rs
Outdated
@@ -745,6 +754,31 @@ impl BucketKey { | |||
std::hash::Hash::hash(self, &mut hasher); | |||
hasher.finalize() as i64 | |||
} | |||
|
|||
/// Remove invalid characters from tags and metric names. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metric names and metric tags have different character allow lists, let's have separate validator functions. Especially, there is already is_valid_name
. parse_tags
doesn't have such logic yet.
More of a code style suggestion: BucketKey
is a plain data object used for bucketing. I would suggest to put such validator functions
on types that specify the public protocol or directly in the aggregator if it's easily possible.
relay-metrics/src/aggregation.rs
Outdated
/// | ||
/// Returns `Err` if the metric should be dropped. | ||
fn validate_strings(mut self) -> Result<Self, ()> { | ||
if self.metric_name.contains('\0') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you instead reject all control characters here?
relay-metrics/src/aggregation.rs
Outdated
fn validate_strings(mut self) -> Result<Self, ()> { | ||
if self.metric_name.contains('\0') { | ||
relay_log::error!("invalid metric name {:?}", self.metric_name); | ||
return Err(()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not log here but rather return AggregateMetricsError
that can be logged somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't go for that because it would bloat the size of AggregateMetricsError
but I can do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether you return ()
or the error enum shouldn't affect the result size here since Self
is larger anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I mean if I were to store the metric name in the error. and yeah even then it's not struct size but the extra allocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Maybe log this on debug!
then, we certainly don't want that in a production instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed it entirely now :/
relay-metrics/src/aggregation.rs
Outdated
return Err(()); | ||
} | ||
|
||
self.metric_name.retain(|c| c != '\0'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this redundant given that we're rejecting the name just before?
In order for #1235 to land, we need the latest release parser that actually rejects control characters in release names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
* master: fix: Add bucket key to extra (#1243) fix(metrics): Remove/reject nul-bytes from metric strings (#1235) chore: Bump sentry-release-parser (#1242) fix(metrics): Fix MRI validation in statsd protocol (#1241) ref(metrics): Refactor aggregation error, recover from errors more gracefully (#1240)
We are observing release names which are UTF-16 strings reinterpreted as
UTF-8. This is causing issues downstream and does not line up with the
releases that end up in the event payload, which for some reason are
totally correct.
In order to counteract this, we strip NUL-bytes from release values.
In other string types (tag keys, metrics), there are "fewer excuses" for
not getting those strings right, so we just drop them.