Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(metrics): Remove/reject nul-bytes from metric strings [ingest-1204] #1235

Merged
merged 24 commits into from
Apr 26, 2022

Conversation

untitaker
Copy link
Member

We are observing release names which are UTF-16 strings reinterpreted as
UTF-8. This is causing issues downstream and does not line up with the
releases that end up in the event payload, which for some reason are
totally correct.

In order to counteract this, we strip NUL-bytes from release values.

In other string types (tag keys, metrics), there are "fewer excuses" for
not getting those strings right, so we just drop them.

We are observing release names which are UTF-16 strings reinterpreted as
UTF-8. This is causing issues downstream and does not line up with the
releases that end up in the event payload, which for some reason are
totally correct.

In order to counteract this, we strip NUL-bytes from release values.

In other string types (tag keys, metrics), there are "fewer excuses" for
not getting those strings right, so we just drop them.
@untitaker untitaker requested a review from a team April 21, 2022 18:55
untitaker added a commit to getsentry/sentry-release-parser that referenced this pull request Apr 21, 2022
See getsentry/relay#1235, we want to ensure we
don't accept release names with NUL-bytes in them. We might want to
strip NUL-bytes instead of dropping the entire release but that'd be too
much effort for now.
@@ -343,7 +343,7 @@ pub struct Event {
/// can be the git SHA for the given project, or a product identifier with a semantic version.
#[metastructure(
max_chars = "tag_value", // release ends in tag
deny_chars = "\r\n\x0c\t/\\",
// release allowed chars are validated in the sentry-release-parser crate!
Copy link
Member Author

@untitaker untitaker Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change does mean that the release name is only validated in the store normalizer. I think we will need to follow up with moving this part of store normalization to run before metrics extraction as well

  • create ticket

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it even make sense to remove the deny_chars at this point? Or should we do that as part of the follow up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate PR that moves this in one go would've been slightly more semantic, but let's keep it this way in light of moving fast.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1055,6 +1080,8 @@ impl Aggregator {
let timestamp = key.timestamp;
let project_key = key.project_key;

let key = key.validate_strings().map_err(|()| AggregateMetricsError)?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will fail the entire batch. we have two options:

  • map the error to return Ok(())
  • make the error type an enum with multiple variants, and have matching logic in merge_all

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjbayer @jan-auer @iker-barriocanal I believe we should change merge_all. there is a related case involving get_bucket_timestamps where the entire batch fails as well, for no discernible reason

@untitaker
Copy link
Member Author

asking for additional review to make sure

@@ -1140,7 +1167,7 @@ impl Aggregator {
I: IntoIterator<Item = Bucket>,
{
for bucket in buckets.into_iter() {
self.merge(project_key, bucket)?;
let _result = self.merge(project_key, bucket);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense. Should we log to sentry or datadog when _result is not Ok though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i considered this but there's no information on the error type. I can make this a simple error enum. I know that @jan-auer would have opinions on how we do error handlig

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd definitely not silently drop the error. An enum works for me, but not opinionated here.

@@ -716,8 +716,17 @@ impl Bucket {

/// Any error that may occur during aggregation.
#[derive(Debug, Fail)]
#[fail(display = "failed to aggregate metrics")]
pub struct AggregateMetricsError;
pub enum AggregateMetricsError {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for introducing this, it makes sense at this point.

Please move this to an internal enum and leave the metrics aggregation error opaque to the public interface of relay-metrics. I would not expose the enum at all, and rely on Display outside of the crate. We should not match on the enum variant.

Introducing this enum variant could've been a good standalone PR to go before this one.

@@ -745,6 +754,31 @@ impl BucketKey {
std::hash::Hash::hash(self, &mut hasher);
hasher.finalize() as i64
}

/// Remove invalid characters from tags and metric names.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metric names and metric tags have different character allow lists, let's have separate validator functions. Especially, there is already is_valid_name. parse_tags doesn't have such logic yet.

More of a code style suggestion: BucketKey is a plain data object used for bucketing. I would suggest to put such validator functions
on types that specify the public protocol or directly in the aggregator if it's easily possible.

///
/// Returns `Err` if the metric should be dropped.
fn validate_strings(mut self) -> Result<Self, ()> {
if self.metric_name.contains('\0') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you instead reject all control characters here?

fn validate_strings(mut self) -> Result<Self, ()> {
if self.metric_name.contains('\0') {
relay_log::error!("invalid metric name {:?}", self.metric_name);
return Err(());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not log here but rather return AggregateMetricsError that can be logged somewhere else.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't go for that because it would bloat the size of AggregateMetricsError but I can do it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether you return () or the error enum shouldn't affect the result size here since Self is larger anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I mean if I were to store the metric name in the error. and yeah even then it's not struct size but the extra allocation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Maybe log this on debug! then, we certainly don't want that in a production instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed it entirely now :/

return Err(());
}

self.metric_name.retain(|c| c != '\0');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this redundant given that we're rejecting the name just before?

@untitaker untitaker changed the base branch from master to fix/mri-validation April 25, 2022 18:20
Base automatically changed from fix/mri-validation to master April 25, 2022 18:39
@untitaker untitaker requested a review from jan-auer April 25, 2022 18:40
untitaker added a commit that referenced this pull request Apr 25, 2022
In order for #1235 to land, we
need the latest release parser that actually rejects control characters
in release names.
Copy link
Member

@jan-auer jan-auer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@untitaker untitaker merged commit a66c0eb into master Apr 26, 2022
@untitaker untitaker deleted the fix/reject-null-values branch April 26, 2022 10:19
jan-auer added a commit that referenced this pull request Apr 27, 2022
* master:
  fix: Add bucket key to extra (#1243)
  fix(metrics): Remove/reject nul-bytes from metric strings (#1235)
  chore: Bump sentry-release-parser (#1242)
  fix(metrics): Fix MRI validation in statsd protocol (#1241)
  ref(metrics): Refactor aggregation error, recover from errors more gracefully (#1240)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants