-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(server): Retry project configs asynchronously [INGEST-1220] #1263
Conversation
a bit all over the place with indicators of where to look
#[derive(Debug, Deserialize, Serialize)] | ||
#[serde(rename_all = "camelCase")] | ||
pub struct GetProjectStatesResponse { | ||
#[serde(default)] | ||
pub configs: HashMap<ProjectKey, ErrorBoundary<Option<ProjectState>>>, | ||
#[serde(default)] | ||
pub pending: Vec<ProjectKey>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it impossible to get into an inconsistent state here (project key appears in both collections), should we turn configs
into something like HashMap<ProjectKey, ErrorBoundary<MaybeProjectState>>
, with
enum MaybeProjectState {
None, Present(ProjectState), Pending
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also define precedence of when both are present, for instance that the supplied config always wins over the pending state.
The initial approach was to do exactly what you are suggesting. Making serialization work in an unobtrusive way (supporting both v2 and v3 without much overhead) is slightly harder in that scenario, but still possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the advantage of the currently implemented schema? the protocol is not backwards-compatible anyway, even if it is at the syntactical level, we cannot send v3 to a v2 relay. if we accidentally send v3 response to v2 relay, deserialization failure (unknown value "pending") is actually safer IMO than assuming the project does not exist. The latter causes silent data dropping while the former throws errors everywhere. (Presumably we will see alerts either way, but customer relays may have less monitoring)
also, will we write pending state into redis, and if so, how?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation was just massively easier to implement and read than the other one after trying both out.
In theory, we can send this down to a V2 Relay, and this is also how the endpoint is implemented. Such a Relay would assume the now missing project config as missing and stop polling. Since that would be incorrect in practice, the new key will never be sent to a V2 Relay based on the endpoint implementation.
* master: feat(sampling): Support custom tags and more contexts (#1268) deps: Update symbolic to pull in Unreal parser fixes (#1266) Bring in CLA Lite (#1257) release: 0.8.11 ref(profiling): Normalize all profiles (#1250) feat(profiling): Add a profile data category and count profiles in an envelope to apply rate limits (#1259) fix: Raise a new InvalidCompression Outcome for invalid Unreal compression (#1237) chore: Update logo for dark or light theme (#1262) ref: Use Duration to compute breakdowns and span durations (#1260)
Let's wait with the changelog entry until after May 15th, when the next OSS release runs. |
whoops, feel free to revert it. but why? |
No worries about the changelog. Once the OSS release runs, the existing section will be renamed to 22.5.0, so we'll have to move the changelog entry to a new This PR is ready to be reviewed. Before merging and deploying it, we will have to accept |
#[derive(Debug, Deserialize, Serialize)] | ||
#[serde(rename_all = "camelCase")] | ||
pub struct GetProjectStatesResponse { | ||
#[serde(default)] | ||
pub configs: HashMap<ProjectKey, ErrorBoundary<Option<ProjectState>>>, | ||
#[serde(default)] | ||
pub pending: Vec<ProjectKey>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the advantage of the currently implemented schema? the protocol is not backwards-compatible anyway, even if it is at the syntactical level, we cannot send v3 to a v2 relay. if we accidentally send v3 response to v2 relay, deserialization failure (unknown value "pending") is actually safer IMO than assuming the project does not exist. The latter causes silent data dropping while the former throws errors everywhere. (Presumably we will see alerts either way, but customer relays may have less monitoring)
also, will we write pending state into redis, and if so, how?
.map(|query| query.version == self.0) | ||
.unwrap_or(false) | ||
let query = VersionQuery::from_request(req); | ||
query.version >= ENDPOINT_V2 && query.version <= ENDPOINT_V3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the previous move from v1 to v2 has been implemented such that relay only supports one version, forwards all other requests at the HTTP layer, and all legacy compat stuff is handled in sentry. did we decide against continuing this path because we fear the increased traffic on python web workers coming from existing v2 customer relays? that's a bit unfortunate, it makes versioning more complicated. I think sentry will have to support all versions forever anyway. I thought it would be possible to weather through the increased traffic on python web workers until customers upgrade their relays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The basic model is still the same. Relay supports V2 and V3 now, handling all deployed versions of Relay without relying on the HTTP endpoint in Sentry. It will forward V4+ though, just like we used to forward future versions before. So the difference now is just that we also keep supporting the "old" V2 for performance reasons.
that's a bit unfortunate, it makes versioning more complicated.
Agreed, although we can drop V2 support some time in the future, returning to supporting just a single version. Given the current implementation, do you think this is too much added complexity in this case? Personally, to me this seems still quite low-overhead.
#[macro_export] | ||
macro_rules! project_states_version { | ||
() => { | ||
2 | ||
3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, this constant is now also defined as const
item further down
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We might also get rid of this macro now. The ENDPOINT_V3
constant wasn't set to this because V3 != current_version. Thoughts?
Closing & reopening as per suggestion from @chadwhitacre to get CLA checks working. |
* master: fix(ci): Use correct trigger condition for CLA (#1269) release: 22.5.0
Introduces version 3 of the endpoint protocol for project configs. It is
a superset version 2 that adds a list of
pending
keys. Keys in thatlist are computed asynchronously by the upstream and should be retried
after a short while.
The version 2 endpoint blocked the response of the entire batch request
on every requested project config. Especially for longer chains of
Relays, a single slow config could therefore fail or timeout an entire
batch. Under load or in extraordinary circumstances, this had the
potential to accelerate the degradation of ingestion.
The version 3 implementation is twofold:
pending
list back into the queue. It will be retried on the next fetch
cycle(s), until a config can be retrieved or the project exceeds its
retry limits.
continues to await the project state response and never populates
pending
. In version 3, it only responds with a config if the cachedstate is up-to-date (including soft outdated). Otherwise, version 3
always responds with
pending
.An example for the response schema is:
Requires getsentry/sentry#34588, do not deploy Relay before Sentry is deployed.