
feat(server): Retry project configs asynchronously [INGEST-1220] #1263

Merged
merged 10 commits into master from feat/upstream-pending on May 17, 2022

Conversation

@flub (Contributor) commented May 6, 2022

Introduces version 3 of the endpoint protocol for project configs. It is
a superset of version 2 that adds a list of pending keys. Keys in that
list are computed asynchronously by the upstream and should be retried
after a short while.

The version 2 endpoint blocked the response of the entire batch request
on every requested project config. Especially for longer chains of
Relays, a single slow config could therefore fail or time out an entire
batch. Under load or in extraordinary circumstances, this had the
potential to accelerate the degradation of ingestion.

The version 3 implementation is twofold:

  1. The upstream project source puts every project key from the pending
    list back into the queue. It will be retried on the next fetch
    cycle(s), until a config can be retrieved or the project exceeds its
    retry limits.
  2. The endpoint now supports both versions, 2 and 3. In version 2, it
    continues to await the project state response and never populates
    pending. In version 3, it only responds with a config if the cached
    state is up-to-date (including soft outdated). Otherwise, version 3
    always responds with pending.

An example of the response schema:

{
  "configs": {
    "50f0f59961bce997c872b45942c57e9f": { /* ... */ },
    "8f19b1b4d628e4ef0a3014c70379c180": { /* ... */ }
  },
  "pending": ["e1eef6aef6b20fcdb8190d703ab6a56b"]
}
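
For illustration, a minimal sketch of how a downstream consumer might split such a response into ready configs and keys to retry. The struct, the main function, and the crate usage below are hypothetical stand-ins rather than the actual relay-server code:

// Sketch only; assumes the serde and serde_json crates.
use std::collections::HashMap;

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct ProjectStatesResponse {
    // Configs that were computed in time for this response.
    #[serde(default)]
    configs: HashMap<String, serde_json::Value>,
    // Keys still being computed upstream; retry these on the next fetch cycle.
    #[serde(default)]
    pending: Vec<String>,
}

fn main() {
    let body = r#"{
        "configs": { "50f0f59961bce997c872b45942c57e9f": {} },
        "pending": ["e1eef6aef6b20fcdb8190d703ab6a56b"]
    }"#;

    let response: ProjectStatesResponse = serde_json::from_str(body).unwrap();

    for (key, config) in &response.configs {
        println!("config ready for {key}: {config}");
    }
    for key in &response.pending {
        println!("re-enqueue {key} for the next fetch cycle");
    }
}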

Requires getsentry/sentry#34588, do not deploy Relay before Sentry is deployed.

@jan-auer changed the title from "Feat/upstream pending" to "feat(server): Retry project configs asynchronously [INGEST-1220]" on May 9, 2022
#[derive(Debug, Deserialize, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct GetProjectStatesResponse {
    #[serde(default)]
    pub configs: HashMap<ProjectKey, ErrorBoundary<Option<ProjectState>>>,
    #[serde(default)]
    pub pending: Vec<ProjectKey>,
}
Member:

To make it impossible to get into an inconsistent state here (project key appears in both collections), should we turn configs into something like HashMap<ProjectKey, ErrorBoundary<MaybeProjectState>>, with

enum MaybeProjectState {
    None,
    Present(ProjectState),
    Pending,
}

Member:

We could also define precedence for when both are present, for instance that the supplied config always wins over the pending state.

The initial approach was to do exactly what you are suggesting. Making serialization work in an unobtrusive way (supporting both v2 and v3 without much overhead) is slightly harder in that scenario, but still possible.
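
As a rough illustration of that precedence idea (a sketch with placeholder types and names, not code from this PR): a key listed in both collections would simply not be treated as pending.

use std::collections::HashMap;

// Placeholder for the real project state type.
struct ProjectState;

// A supplied config always wins over an entry in the pending list.
fn effective_pending(
    configs: &HashMap<String, ProjectState>,
    pending: &[String],
) -> Vec<String> {
    pending
        .iter()
        .filter(|key| !configs.contains_key(*key))
        .cloned()
        .collect()
}

fn main() {
    let mut configs = HashMap::new();
    configs.insert("aaaa".to_owned(), ProjectState);

    // "aaaa" appears in both, so the config wins; only "bbbb" stays pending.
    let pending = vec!["aaaa".to_owned(), "bbbb".to_owned()];
    assert_eq!(effective_pending(&configs, &pending), vec!["bbbb".to_owned()]);
}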

Member:

What is the advantage of the currently implemented schema? The protocol is not backwards-compatible anyway: even if it is compatible at the syntactical level, we cannot send v3 to a v2 Relay. If we accidentally send a v3 response to a v2 Relay, a deserialization failure (unknown value "pending") is actually safer IMO than assuming the project does not exist. The latter causes silent data dropping while the former throws errors everywhere. (Presumably we will see alerts either way, but customer Relays may have less monitoring.)

Also, will we write the pending state into Redis, and if so, how?

Member:

After trying out both, this implementation was just massively easier to implement and read than the other one.

In theory, we can send this down to a V2 Relay, and this is also how the endpoint is implemented. Such a Relay would treat the now-missing project config as missing and stop polling. Since that would be incorrect, in practice the endpoint implementation never sends the new key to a V2 Relay.

relay-server/src/endpoints/project_configs.rs (resolved conversation not shown)
jan-auer and others added 4 commits May 12, 2022 12:27
* master:
  feat(sampling): Support custom tags and more contexts (#1268)
  deps: Update symbolic to pull in Unreal parser fixes (#1266)
  Bring in CLA Lite (#1257)
  release: 0.8.11
  ref(profiling): Normalize all profiles (#1250)
  feat(profiling): Add a profile data category and count profiles in an envelope to apply rate limits (#1259)
  fix: Raise a new InvalidCompression Outcome for invalid Unreal compression (#1237)
  chore: Update logo for dark or light theme (#1262)
  ref: Use Duration to compute breakdowns and span durations (#1260)
@jan-auer (Member) commented

Let's wait with the changelog entry until after May 15th, when the next OSS release runs.

@flub (Contributor, Author) commented May 13, 2022

> Let's wait with the changelog entry until after May 15th, when the next OSS release runs.

Whoops, feel free to revert it. But why?

@jan-auer marked this pull request as ready for review May 16, 2022 13:48
@jan-auer requested a review from a team May 16, 2022 13:48
@jan-auer (Member) commented

No worries about the changelog. Once the OSS release runs, the existing section will be renamed to 22.5.0, so we'll have to move the changelog entry to a new Unreleased section.

This PR is ready to be reviewed. Before merging and deploying it, we will have to accept version=3 requests in Sentry, though.


// Before (single supported version):
.map(|query| query.version == self.0)
.unwrap_or(false)

// After (accept both v2 and v3):
let query = VersionQuery::from_request(req);
query.version >= ENDPOINT_V2 && query.version <= ENDPOINT_V3
Member:

The previous move from v1 to v2 was implemented such that Relay only supports one version, forwards all other requests at the HTTP layer, and all legacy compat stuff is handled in Sentry. Did we decide against continuing down this path because we fear the increased traffic on Python web workers coming from existing v2 customer Relays? That's a bit unfortunate; it makes versioning more complicated. I think Sentry will have to support all versions forever anyway. I thought it would be possible to weather the increased traffic on Python web workers until customers upgrade their Relays.

Member:

The basic model is still the same. Relay supports V2 and V3 now, handling all deployed versions of Relay without relying on the HTTP endpoint in Sentry. It will forward V4+ though, just like we used to forward future versions before. So the difference now is just that we also keep supporting the "old" V2 for performance reasons.

> that's a bit unfortunate, it makes versioning more complicated.

Agreed, although we can drop V2 support some time in the future, returning to supporting just a single version. Given the current implementation, do you think this is too much added complexity? Personally, this still seems quite low-overhead to me.
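
To make the dispatch concrete, here is a rough sketch of the behavior described above, using placeholder names rather than the actual endpoint code: v2 and v3 are handled locally, anything newer is forwarded upstream.

const ENDPOINT_V2: u16 = 2;
const ENDPOINT_V3: u16 = 3;

#[derive(Debug, PartialEq)]
enum Handling {
    // Answer the request from this Relay's own project cache.
    HandleLocally(u16),
    // Pass the request through to the next Relay / Sentry.
    ForwardUpstream,
}

fn route(version: u16) -> Handling {
    if (ENDPOINT_V2..=ENDPOINT_V3).contains(&version) {
        Handling::HandleLocally(version)
    } else {
        Handling::ForwardUpstream
    }
}

fn main() {
    assert_eq!(route(2), Handling::HandleLocally(2));
    assert_eq!(route(3), Handling::HandleLocally(3));
    assert_eq!(route(4), Handling::ForwardUpstream);
}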

#[macro_export]
macro_rules! project_states_version {
    () => {
        3 // bumped from 2
    };
}
Member:

Nit: this constant is now also defined as a const item further down.

Member:

Good point. We might also get rid of this macro now. The ENDPOINT_V3 constant wasn't set to this macro because V3 != current_version. Thoughts?

@jan-auer (Member) commented

Closing & reopening as per suggestion from @chadwhitacre to get CLA checks working.

* master:
  fix(ci): Use correct trigger condition for CLA (#1269)
  release: 22.5.0
@jan-auer merged commit 7e63150 into master on May 17, 2022
@jan-auer deleted the feat/upstream-pending branch on May 17, 2022 11:54