Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add VAPID quality tracking metric messages #417

Merged
merged 27 commits into from
Jul 17, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
ad53a9b
feat: add VAPID quality tracking metric messages
jrconlin Jul 31, 2023
d4d8e2a
f add more data for vapid error
jrconlin Aug 2, 2023
46df55a
Merge branch 'master' of github.com:mozilla-services/autopush-rs into…
jrconlin Aug 3, 2023
55c6c66
f fmt
jrconlin Aug 3, 2023
88d1255
Merge branch 'master' of github.com:mozilla-services/autopush-rs into…
jrconlin Aug 11, 2023
87e08df
Merge branch 'master' of github.com:mozilla-services/autopush-rs into…
jrconlin Aug 28, 2023
4825dfc
Merge branch 'master' of github.com:mozilla-services/autopush-rs into…
jrconlin Aug 28, 2023
4c3a7b9
Merge branch 'master' into feat/SYNC-3842_metrics
jrconlin Aug 31, 2023
399733c
Merge branch 'master' of github.com:mozilla-services/autopush-rs into…
jrconlin Nov 29, 2023
c07f4b2
Merge branch 'master' into feat/SYNC-3842_metrics
jrconlin Dec 4, 2023
2f01052
Merge branch 'master' into feat/SYNC-3842_metrics
jrconlin Dec 5, 2023
fdbeed8
Merge branch 'master' into feat/SYNC-3842_metrics
jrconlin Dec 12, 2023
1c7e29d
Merge branch 'master' into feat/SYNC-3842_metrics
jrconlin Dec 14, 2023
e8e8032
Merge branch 'master' into feat/SYNC-3842_metrics
jrconlin Dec 14, 2023
5cae7d9
Merge branch 'master' into feat/SYNC-3842_metrics
jrconlin Dec 16, 2023
8ee216b
Merge branch 'master' into feat/SYNC-3842_metrics
jrconlin Jan 11, 2024
44aba72
Merge branch 'master' into feat/SYNC-3842_metrics
jrconlin Jan 18, 2024
3425494
Merge branch 'master' of github.com:mozilla-services/autopush-rs into…
jrconlin Jun 17, 2024
3bb1e9b
f cargo update
jrconlin Jun 20, 2024
b267585
f r's
jrconlin Jun 20, 2024
f977a1d
f r's
jrconlin Jun 20, 2024
8ce5d43
Merge branch 'master' of github.com:mozilla-services/autopush-rs into…
jrconlin Jun 20, 2024
5cca5bb
Merge branches 'feat/SYNC-3842_metrics' and 'master' of github.com:mo…
jrconlin Jul 16, 2024
41f54b5
f r
jrconlin Jul 16, 2024
8ef8e43
f clippy
jrconlin Jul 16, 2024
396ad5f
f reduce
jrconlin Jul 16, 2024
a1aae71
Merge branch 'master' into feat/SYNC-3842_metrics
pjenvey Jul 17, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

84 changes: 80 additions & 4 deletions autoendpoint/src/extractors/subscription.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
use std::borrow::Cow;
use std::error::Error;
use std::str::FromStr;

use actix_web::{dev::Payload, web::Data, FromRequest, HttpRequest};
Expand Down Expand Up @@ -67,7 +68,29 @@ impl FromRequest for Subscription {
.map(|vapid| extract_public_key(vapid, &token_info))
.transpose()?;

trace!("Vapid: {:?}", &vapid);
trace!("raw vapid: {:?}", &vapid);

// Capturing the vapid sub right now will cause too much cardinality. Instead,
// let's just capture if we have a valid VAPID, as well as what sort of bad sub
// values we get.
if let Some(ref header) = vapid {
let sub = header
.vapid
.sub()
.map_err(|e: VapidError| {
// Capture the type of error and add it to metrics.
let mut tags = Tags::default();
tags.tags
.insert("error".to_owned(), e.as_metric().to_owned());
metrics
.clone()
.incr_with_tags("notification.auth.error", Some(tags));
})
.unwrap_or_default();
// For now, record that we had a good (?) VAPID sub,
metrics.clone().incr("notification.auth.ok");
info!("VAPID sub: {:?}", sub)
Comment on lines +83 to +92
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things here:

  • The lack of sub isn't really an actual error, maybe call this metric something else? Could it be a tag on the existing notification.auth metric (moving all this into parse_vapid)?
  • If it doesn't make sense as part of parse_vapid I think ideally this extra work for metrics shouldn't occur until validate_vapid_jwt succeeds in validating
  • Is the new notification.auth.ok totally necessary? We emit updates.vapid.draft{version} for successfully validated vapid, we could get a similar number by subtracting the no sub metric from those 2 combined

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • While not strictly required by the RFC, we can require sub none the less, and already do. (Honestly, having an empty VAPID header with no data isn't very helpful for us, and the other platforms strictly require VAPID headers with all elements filled out, which kinda already renders them less "voluntary")
  • I was thinking that we would want to capture the sub as early as possible, notably because if we are seeing a lot of failed headers from a given provider, we could identify the source and either contact or filter them.
  • Doing metric math has been tricky in the past (partly because it can be difficult to get a good read on a value, so we wind up guessing or estimating things, particularly with things like stacked vs unstacked values).

};

match token_info.api_version {
ApiVersion::Version1 => version_1_validation(&token)?,
Expand Down Expand Up @@ -134,7 +157,13 @@ fn parse_vapid(token_info: &TokenInfo, metrics: &StatsdClient) -> ApiResult<Opti
None => return Ok(None),
};

let vapid = VapidHeader::parse(auth_header)?;
let vapid = VapidHeader::parse(auth_header).map_err(|e| {
metrics
.incr_with_tags("notification.auth.error")
.with_tag("error", e.as_metric())
.send();
e
})?;

metrics
.incr_with_tags("notification.auth")
Expand Down Expand Up @@ -223,6 +252,18 @@ fn version_2_validation(token: &[u8], vapid: Option<&VapidHeaderWithKey>) -> Api
Ok(())
}

// Perform a very brain dead conversion of a string to a CamelCaseVersion
fn term_to_label(term: &str) -> String {
term.split(' ').fold("".to_owned(), |prev, word: &str| {
format!(
"{}{}{}",
prev,
word.get(0..1).unwrap_or_default().to_ascii_uppercase(),
word.get(1..).unwrap_or_default()
)
})
}

/// Validate the VAPID JWT token. Specifically,
/// - Check the signature
/// - Make sure it hasn't expired
Expand Down Expand Up @@ -278,7 +319,33 @@ fn validate_vapid_jwt(
return Err(VapidError::InvalidVapid(e.to_string()).into());
}
_ => {
metrics.clone().incr("notification.auth.bad_vapid.other");
// Attempt to match up the majority of ErrorKind variants.
// The third-party errors all defer to the source, so we can
// use that to differentiate for actual errors.
let mut tags = Tags::default();
let label = if e.source().is_none() {
// These two have the most cardinality, so we need to handle
// them separately.
Comment on lines +327 to +328
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment and the other mention of cardinality a few lines below -- there's some sense of cardinality across all these ErrorKind variant since there's almost 20 of them, but the concern with these 2 variants and others is that their to_string includes whitespace which aren't valid for tag values

let mut label_name = e.to_string();
if label_name.contains(':') {
// if the error begins with a common tag e.g. "Missing required claim: ..."
// then convert it to a less cardinal version. This is lossy, but acceptable.
label_name =
term_to_label(label_name.split(':').next().unwrap_or_default());
} else if label_name.contains(' ') {
// if a space still snuck through somehow, remove it.
label_name = term_to_label(&label_name);
}
label_name
} else {
// If you need to dig into these, there's always the logs.
"Other".to_owned()
};
tags.tags.insert("error".to_owned(), label);
metrics
.clone()
.incr_with_tags("notification.auth.bad_vapid.other", Some(tags));
error!("Bad Aud: Unexpected VAPID error: {:?}", &e);
return Err(e.into());
}
},
Expand Down Expand Up @@ -330,7 +397,7 @@ fn validate_vapid_jwt(

#[cfg(test)]
pub mod tests {
use super::{validate_vapid_jwt, VapidClaims};
use super::{term_to_label, validate_vapid_jwt, VapidClaims};
use crate::error::ApiErrorKind;
use crate::extractors::subscription::repad_base64;
use crate::headers::vapid::{VapidError, VapidHeader, VapidHeaderWithKey, VapidVersionData};
Expand Down Expand Up @@ -571,4 +638,13 @@ pub mod tests {
ApiErrorKind::VapidError(VapidError::InvalidVapid(_))
])
}

#[test]
fn test_crapitalize() {
assert_eq!(
"LabelFieldWithoutData",
term_to_label("LabelField without data")
);
assert_eq!("UntouchedField", term_to_label("UntouchedField"));
}
}
52 changes: 52 additions & 0 deletions autoendpoint/src/headers/vapid.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ use std::fmt;

use base64::Engine;
use serde::{Deserialize, Serialize};
use serde_json::Value;
use thiserror::Error;

use crate::headers::util::split_key_value;
Expand Down Expand Up @@ -134,6 +135,29 @@ impl VapidHeader {
}
}

pub fn sub(&self) -> Result<String, VapidError> {
let data: HashMap<String, Value> = serde_json::from_str(&self.token).map_err(|e| {
warn!("🔐 Vapid: {:?}", e);
VapidError::SubInvalid
})?;

if let Some(sub_candiate) = data.get("sub") {
if let Some(sub) = sub_candiate.as_str() {
if !sub.starts_with("mailto:") || !sub.starts_with("https://") {
info!("🔐 Vapid: Bad Format {:?}", sub);
return Err(VapidError::SubBadFormat);
}
if sub.is_empty() {
info!("🔐 Empty Vapid sub");
return Err(VapidError::SubEmpty);
}
info!("🔐 Vapid: sub: {:?}", sub);
return Ok(sub.to_owned());
}
}
Err(VapidError::SubMissing)
}

pub fn claims(&self) -> Result<VapidClaims, VapidError> {
VapidClaims::try_from(self.clone())
}
Expand All @@ -159,6 +183,34 @@ pub enum VapidError {
FutureExpirationToken,
#[error("Unknown auth scheme")]
UnknownScheme,
#[error("Unparsable sub string")]
SubInvalid,
#[error("Improperly formatted sub string")]
SubBadFormat,
#[error("Empty sub string")]
SubEmpty,
#[error("Missing sub")]
SubMissing,
}

impl VapidError {
pub fn as_metric(&self) -> &str {
match self {
Self::MissingToken => "missing_token",
Self::InvalidVapid(_) => "invalid_vapid",
Self::MissingKey => "missing_key",
Self::InvalidKey(_) => "invalid_key",
Self::InvalidAudience => "invalid_audience",
Self::InvalidExpiry => "invalid_expiry",
Self::KeyMismatch => "key_mismatch",
Self::FutureExpirationToken => "future_expiration_token",
Self::UnknownScheme => "unknown_scheme",
Self::SubInvalid => "invalid_sub",
Self::SubBadFormat => "bad_format_sub",
Self::SubEmpty => "empty_sub",
Self::SubMissing => "missing_sub",
}
}
}

#[cfg(test)]
Expand Down