Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(metrics): Add received_at timestamp in bucket metadata #3488

Merged
merged 32 commits into from May 6, 2024

Conversation

iambriccardo
Copy link
Member

@iambriccardo iambriccardo commented Apr 26, 2024

This PR adds a new received_at field to the BucketMetadata struct. The goal of this field is to eventually propagate it to Kafka to measure the end-to-end latency of our metrics pipeline. For this reason, the received_at measurement must happen in the outermost internal Relay.

The field received_at will be set/overridden by the outermost Relay via the option keep_metadata which when set to false will tell Relay to override the incoming metadata.

The merge operation on two received_at timestamps is defined as min(r1, r2) if both timestamps are non-null, otherwise the first non-null value will be taken. If both values are null, the merge result will still yield null.

Closes: https://github.com/getsentry/team-ingest/issues/265

@iambriccardo iambriccardo changed the title riccardo/feat/add timestamp feat(metrics): Add received_at timestamp in bucket metadata Apr 26, 2024
}

impl Default for SentryMetrics {
fn default() -> Self {
Self {
meta_locations_expiry: 15 * 24 * 60 * 60,
meta_locations_max: 5,
override_received_at_metadata: false,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default we do not want to override the metadata to avoid unexpected behaviors due to the missing config param.

@@ -524,13 +524,17 @@ struct SentryMetrics {
///
/// Defaults to 5.
pub meta_locations_max: usize,
/// Whether to override the [`received_at`] field in the [`BucketMetadata`] with the current
/// receive time of the instance.
pub override_received_at_metadata: bool,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In production, we would like to set this option to true only for pop relays, since we want to have the received_at metadata field set to the outermost relay timestamp that receives a given bucket.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have that handling in the processor as keep_metadata for example in handle_process_metrics which should already do the right thing.

@@ -56,13 +56,20 @@ where
continue;
};

// For extracted metrics we assume the `received_at` timestamp is equivalent to the time
// in which the metric is extracted.
#[cfg(not(test))]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of mocking time, we generate it differently if we are in test, in order to make the unit tests deterministic.

bucket.metadata = BucketMetadata::default();
}

if override_received_at_metadata {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We purposefully override the metadata after the decision to keep it has been made.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece of code could be optimized to avoid overwriting a field but I felt it's more comprehensible like this.

}

#[tokio::test]
async fn test_process_batched_metrics_bucket_metadata() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add integration tests in a follow up PR where we are also going to write the data to kafka.

@iambriccardo iambriccardo marked this pull request as ready for review April 30, 2024 10:42
@iambriccardo iambriccardo requested a review from a team as a code owner April 30, 2024 10:42
@@ -524,13 +524,17 @@ struct SentryMetrics {
///
/// Defaults to 5.
pub meta_locations_max: usize,
/// Whether to override the [`received_at`] field in the [`BucketMetadata`] with the current
/// receive time of the instance.
pub override_received_at_metadata: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have that handling in the processor as keep_metadata for example in handle_process_metrics which should already do the right thing.

relay-metrics/src/bucket.rs Outdated Show resolved Hide resolved
relay-server/src/metrics_extraction/generic.rs Outdated Show resolved Hide resolved
relay-server/src/metrics_extraction/generic.rs Outdated Show resolved Hide resolved
relay-metrics/src/bucket.rs Show resolved Hide resolved
relay-server/src/metrics_extraction/sessions/types.rs Outdated Show resolved Hide resolved
relay-server/src/services/processor.rs Outdated Show resolved Hide resolved
relay-server/src/services/processor.rs Outdated Show resolved Hide resolved
relay-server/src/services/store.rs Show resolved Hide resolved
relay-server/src/services/processor.rs Outdated Show resolved Hide resolved
@iambriccardo iambriccardo requested a review from Dav1dde May 2, 2024 13:29
Comment on lines 1303 to 1304
let mut buckets = serde_json::from_str::<Vec<Bucket>>(json).unwrap();
buckets[0].metadata = BucketMetadata::new(UnixTimestamp::from_secs(1615889440));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just put this into the json, right?

relay-server/src/services/processor.rs Outdated Show resolved Hide resolved
relay-server/src/services/processor.rs Outdated Show resolved Hide resolved
relay-server/src/services/processor.rs Outdated Show resolved Hide resolved
span_time = 9.910107
else:
span_time = 9.910106
span_time = 9.910106
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests seem to succeed with the actual value 9.910106.

metrics[metric["name"]] = metric
metrics["headers"][metric["name"]] = metric_headers

metrics_consumer.assert_empty()
return metrics


def metrics_without_keys(received_metrics, keys):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to opt for a generic solution that can remove any key.

@iambriccardo iambriccardo requested a review from Dav1dde May 3, 2024 11:57
@@ -155,6 +155,34 @@ pub fn create_test_processor(config: Config) -> EnvelopeProcessorService {
)
}

pub fn create_test_processor_with_addrs(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_test_processor should forward to this

@iambriccardo iambriccardo merged commit f5abf09 into master May 6, 2024
28 checks passed
@iambriccardo iambriccardo deleted the riccardo/feat/add-timestamp branch May 6, 2024 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants