Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(metrics): Extract user satisfaction as tag [INGEST-589] #1197

Merged
merged 10 commits into from
Mar 3, 2022
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,10 @@

## Unreleased

**Features**:

- Tag transaction metrics by user satisfaction. ([#1197](https://github.com/getsentry/relay/pull/1197))

**Internal**:

- Spread out metric aggregation over the aggregation window to avoid concentrated waves of metrics requests to the upstream every 10 seconds. Relay now applies jitter to `initial_delay` to spread out requests more evenly over time. ([#1185](https://github.com/getsentry/relay/pull/1185))
Expand Down
241 changes: 226 additions & 15 deletions relay-server/src/metrics_extraction/transactions.rs
Original file line number Diff line number Diff line change
@@ -1,22 +1,49 @@
use serde::{Deserialize, Serialize};
use std::collections::BTreeSet;
use std::collections::{BTreeMap, BTreeSet};

#[cfg(feature = "processing")]
use {
relay_common::UnixTimestamp,
relay_general::protocol::{AsPair, Event, EventType},
relay_general::store::{get_breakdown_measurements, normalize_dist, BreakdownsConfig},
relay_metrics::{Metric, MetricUnit, MetricValue},
std::collections::BTreeMap,
std::fmt,
std::fmt::Write,
};

/// Configuration in relation to extracting metrics from transaction events.
/// The metric on which the user satisfaction threshold is applied.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
enum SatisfactionMetric {
Duration,
Lcp,
}

/// Configuration for a single threshold.
#[derive(Debug, Clone, Serialize, Deserialize)]
struct SatisfactionThreshold {
metric: SatisfactionMetric,
threshold: f64,
}

/// Configuration for applying the user satisfaction threshold.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
struct SatisfactionConfig {
/// The project-wide threshold to apply.
project_threshold: SatisfactionThreshold,
/// Transaction-specific overrides of the project-wide threshold.
#[serde(default, skip_serializing_if = "BTreeMap::is_empty")]
transaction_thresholds: BTreeMap<String, SatisfactionThreshold>,
}

/// Configuration for extracting metrics from transaction payloads.
#[derive(Default, Debug, Clone, Serialize, Deserialize)]
#[serde(default, rename_all = "camelCase")]
pub struct TransactionMetricsConfig {
extract_metrics: BTreeSet<String>,
extract_custom_tags: BTreeSet<String>,
satisfaction_thresholds: Option<SatisfactionConfig>,
}

#[cfg(feature = "processing")]
Expand Down Expand Up @@ -57,6 +84,96 @@ fn extract_dist(transaction: &Event) -> Option<String> {
dist
}

/// Satisfaction value used for Apdex and User Misery
/// https://docs.sentry.io/product/performance/metrics/#apdex
#[cfg(feature = "processing")]
enum UserSatisfaction {
Satisfied,
Tolerated,
Frustrated,
}

#[cfg(feature = "processing")]
impl fmt::Display for UserSatisfaction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
UserSatisfaction::Satisfied => write!(f, "satisfied"),
UserSatisfaction::Tolerated => write!(f, "tolerated"),
UserSatisfaction::Frustrated => write!(f, "frustrated"),
}
}
}

#[cfg(feature = "processing")]
impl UserSatisfaction {
/// The frustration threshold is always four times the threshold
/// (see https://docs.sentry.io/product/performance/metrics/#apdex)
const FRUSTRATION_FACTOR: f64 = 4.0;

fn from_value(value: f64, threshold: f64) -> Self {
if value < threshold {
Self::Satisfied
} else if value < Self::FRUSTRATION_FACTOR * threshold {
jjbayer marked this conversation as resolved.
Show resolved Hide resolved
Self::Tolerated
} else {
Self::Frustrated
}
}
}

/// Get duration from timestamp and start_timestamp.
#[cfg(feature = "processing")]
fn get_duration_millis(transaction: &Event) -> Option<f64> {
let start = transaction.start_timestamp.value();
let end = transaction.timestamp.value();
match (start, end) {
(Some(start), Some(end)) => {
let start = start.timestamp_millis();
let end = end.timestamp_millis();
Some(end.saturating_sub(start) as f64)
}
_ => None,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question: Should a transaction without duration count towards satisfied users?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have to choose between satisfied, tolerated, and frustrated, I say yes. However, do we know what cases cause transactions without duration count? If there are cases where SDKs aren't able to measure the duration, we may want to include an additional satisfaction value. I don't know how feasible this is though. If a transaction couldn't be measured, I don't think counting it as satisfaction provides any value. It could take a long time, and having another value could lead to users updating SDK code to correctly capture these transactions and have more valuable data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not have to choose, we can simply leave the metric untagged. So I will leave None here. This is an exceptional case anyway, because both timestamp and start_timestamp are required fields: https://develop.sentry.dev/sdk/event-payloads/transaction/.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now refactored so that we do an early exit when the transaction is invalid. Metrics for invalid transactions are dropped later on anyway.

}
}

/// Get the value for a measurement, e.g. lcp -> event.measurements.lcp
#[cfg(feature = "processing")]
fn get_measurement(transaction: &Event, name: &str) -> Option<f64> {
if let Some(measurements) = transaction.measurements.value() {
for (measurement_name, annotated) in measurements.iter() {
if measurement_name == name {
if let Some(value) = annotated.value().and_then(|m| m.value.value()) {
return Some(*value);
}
}
}
}
None
}

/// Extract the the satisfaction value depending on the actual measurement/duration value
/// and the configured threshold.
#[cfg(feature = "processing")]
fn extract_user_satisfaction(
config: &Option<SatisfactionConfig>,
transaction: &Event,
) -> Option<UserSatisfaction> {
if let Some(config) = config {
let threshold = transaction
.transaction
.value()
.and_then(|name| config.transaction_thresholds.get(name))
.unwrap_or(&config.project_threshold);
if let Some(value) = match threshold.metric {
SatisfactionMetric::Duration => get_duration_millis(transaction),
SatisfactionMetric::Lcp => get_measurement(transaction, "lcp"),
} {
return Some(UserSatisfaction::from_value(value, threshold.threshold));
}
}
None
}

#[cfg(feature = "processing")]
pub fn extract_transaction_metrics(
config: &TransactionMetricsConfig,
Expand Down Expand Up @@ -172,18 +289,15 @@ pub fn extract_transaction_metrics(
}
}

// Duration
let start = event.start_timestamp.value();
let end = event.timestamp.value();
let duration_millis = match (start, end) {
(Some(start), Some(end)) => {
let start = start.timestamp_millis();
let end = end.timestamp_millis();
end.saturating_sub(start)
}
_ => 0,
let user_satisfaction = extract_user_satisfaction(&config.satisfaction_thresholds, event);
let tags_with_satisfaction = match user_satisfaction {
Some(satisfaction) => with_tag(&tags, "satisfaction", satisfaction),
None => tags.clone(),
};

// Duration
let duration_millis = get_duration_millis(event).unwrap_or(0.0);

// We always push the duration even if it's 0, because we use count(transaction.duration)
// to get the total number of transactions.
// This may need to be changed if it turns out that this skews the duration metric.
Expand All @@ -194,19 +308,25 @@ pub fn extract_transaction_metrics(
timestamp,
tags: match extract_transaction_status(event) {
Some(status) => with_tag(&tags, "transaction.status", status),
None => tags.clone(),
None => tags_with_satisfaction.clone(),
},
});

// User
if let Some(user) = event.user.value() {
if let Some(user_id) = user.id.as_str() {
// TODO: If we have a transaction duration of 0 or no transaction at all, does the user count as satisfied?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think transactions of duration 0 should count as satisfied. No transaction shouldn't count towards user satisfaction, that's misleading because we actually don't have data.

push_metric(Metric {
name: metric_name(&["user"]),
unit: MetricUnit::None,
value: MetricValue::set_from_str(user_id),
timestamp,
tags: tags.clone(),
// A single user might end up in multiple satisfaction buckets when they have
// some satisfying transactions and some frustrating transactions.
// This is OK as long as we do not add these numbers *after* aggregation:
// <WRONG>total_users = uniqIf(user, satisfied) + uniqIf(user, tolerated) + uniqIf(user, frustrated)</WRONG>
// <RIGHT>total_users = uniq(user)</RIGHT>
tags: tags_with_satisfaction,
});
}
}
Expand Down Expand Up @@ -424,4 +544,95 @@ mod tests {
assert_eq!(duration_metric.tags["environment"], "fake_environment");
assert_eq!(duration_metric.tags["transaction"], "mytransaction");
}

#[test]
fn test_user_satisfaction() {
let json = r#"
{
"type": "transaction",
"transaction": "foo",
"start_timestamp": "2021-04-26T08:00:00+0100",
"timestamp": "2021-04-26T08:00:01+0100",
"user": {
"id": "user123"
}
}
"#;

let event = Annotated::from_json(json).unwrap();

let config: TransactionMetricsConfig = serde_json::from_str(
r#"
{
"extractMetrics": [
"sentry.transactions.transaction.duration",
"sentry.transactions.user"
],
"satisfactionThresholds": {
"projectThreshold": {
"metric": "duration",
"threshold": 300
}
}
}
"#,
)
.unwrap();
let mut metrics = vec![];
extract_transaction_metrics(&config, None, event.value().unwrap(), &mut metrics);
assert_eq!(metrics.len(), 2);

for metric in metrics {
assert_eq!(metric.tags.len(), 2);
assert_eq!(metric.tags["satisfaction"], "tolerated");
}
}

#[test]
fn test_user_satisfaction_override() {
let json = r#"
{
"type": "transaction",
"transaction": "foo",
"start_timestamp": "2021-04-26T08:00:00+0100",
"timestamp": "2021-04-26T08:00:02+0100",
"measurements": {
"lcp": {"value": 41}
}
}
"#;

let event = Annotated::from_json(json).unwrap();

let config: TransactionMetricsConfig = serde_json::from_str(
r#"
{
"extractMetrics": [
"sentry.transactions.transaction.duration"
],
"satisfactionThresholds": {
"projectThreshold": {
"metric": "duration",
"threshold": 300
},
"transactionThresholds": {
"foo": {
"metric": "lcp",
"threshold": 42
}
}
}
}
"#,
)
.unwrap();
let mut metrics = vec![];
extract_transaction_metrics(&config, None, event.value().unwrap(), &mut metrics);
assert_eq!(metrics.len(), 1);

for metric in metrics {
assert_eq!(metric.tags.len(), 2);
assert_eq!(metric.tags["satisfaction"], "satisfied");
}
}
}
2 changes: 1 addition & 1 deletion tests/integration/test_metrics.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ def test_metrics(mini_sentry, relay):
metrics_payload = f"foo:42|c\nbar:17|c"
relay.send_metrics(project_id, metrics_payload, timestamp)

envelope = mini_sentry.captured_events.get(timeout=2)
envelope = mini_sentry.captured_events.get(timeout=3)
assert len(envelope.items) == 1

metrics_item = envelope.items[0]
Expand Down